Sellforte Experiments: Methodology (Geo Lift and A/B Test)
How Sellforte estimates the counterfactual using Bayesian synthetic control.
Geo Lift and A/B Test in Sellforte share the same underlying analytical methodology. This article explains how that methodology works, what it produces, and when it works best. Differences between the two experiment types are noted where relevant.
The core question: what is the counterfactual?
Incrementality experiments rarely produce a clean, simple comparison. Test and control groups are never perfectly identical. They differ in size, baseline performance, seasonality, and trend patterns.
Because of that, the key question is not simply whether performance changed during the experiment. The real question is:
What would have happened in the test group if the campaign or treatment had not run?
That hypothetical no-treatment trajectory is called the counterfactual. The treatment effect is the gap between what actually happened and what the model estimates would have happened without the intervention.
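As a minimal sketch of that definition, the per-period treatment effect is the observed value minus the counterfactual value, and the total effect is the sum of those gaps. The numbers and the function name `treatment_effect` are illustrative assumptions, not Sellforte's implementation.

```python
# Hypothetical daily sales for the test group (illustrative numbers only).
actual = [120.0, 135.0, 128.0, 140.0]
# Hypothetical model estimate of sales without the treatment (the counterfactual).
counterfactual = [110.0, 118.0, 115.0, 121.0]

def treatment_effect(actual, counterfactual):
    """Per-period gap between observed and estimated no-treatment outcomes."""
    if len(actual) != len(counterfactual):
        raise ValueError("series must cover the same periods")
    return [a - c for a, c in zip(actual, counterfactual)]

daily_effect = treatment_effect(actual, counterfactual)
total_effect = sum(daily_effect)  # incremental sales over the whole window
```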
Everything in the results dashboard — including incremental sales, iROAS, confidence, and credible intervals — is derived from how that counterfactual is estimated and how certain the model is about it.
Sellforte estimates the counterfactual using Bayesian synthetic control.
The synthetic control approach
Instead of selecting one 'most similar' control group, Sellforte builds a synthetic version of the test group as a weighted combination of the available control groups.
You can think of the control groups as ingredients, and the model as finding the combination that best reproduces the test group's behaviour during the pre-treatment period.
Two properties of these weights are important:
- Weights are non-negative and sum to 1. The synthetic control is always a blend of real groups, never an extrapolation beyond observed data.
- Each group's weight reflects its contribution to the match. These weights are shown in the Test and control groups section of the results dashboard, so you can see which groups contribute most to the synthetic control.
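The weight constraints above can be illustrated with a small, non-Bayesian sketch. It fits a single set of weights by exponentiated-gradient descent, a technique chosen here only because it keeps the weights non-negative and summing to 1 at every step. The function name `fit_simplex_weights`, the step size, and the data are assumptions for illustration. Sellforte's actual estimator is Bayesian and is not reproduced here.

```python
import math

def fit_simplex_weights(target, controls, steps=10000, lr=0.4):
    """Fit non-negative weights that sum to 1 by exponentiated-gradient descent.

    Minimises the mean squared pre-treatment gap between `target` and the
    weighted blend of `controls`. The default `lr` assumes series scaled
    to around 1.0; other scales would need a different step size.
    """
    n_ctrl, n_t = len(controls), len(target)
    w = [1.0 / n_ctrl] * n_ctrl  # start from an equal blend
    for _ in range(steps):
        synth = [sum(w[j] * controls[j][t] for j in range(n_ctrl)) for t in range(n_t)]
        resid = [synth[t] - target[t] for t in range(n_t)]
        grad = [2.0 / n_t * sum(resid[t] * controls[j][t] for t in range(n_t))
                for j in range(n_ctrl)]
        # Multiplicative update keeps every weight positive; renormalising
        # restores a sum of 1, so the blend stays a mix of real groups.
        g_min = min(grad)
        w = [w[j] * math.exp(-lr * (grad[j] - g_min)) for j in range(n_ctrl)]
        total = sum(w)
        w = [x / total for x in w]
    return w

# Illustrative indexed KPI series (1.0 = baseline) for three control groups.
controls = [
    [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.1],
    [0.8, 0.9, 1.1, 1.0, 0.9, 1.2, 1.0, 0.9],
    [1.1, 1.0, 1.0, 0.9, 1.1, 1.1, 1.2, 1.0],
]
# Test-group series built as a known 20/50/30 blend so the fit can be checked.
test_series = [0.2 * a + 0.5 * b + 0.3 * c for a, b, c in zip(*controls)]

weights = fit_simplex_weights(test_series, controls)
```

Because the test-group series here is an exact blend of the controls, the fitted weights should land close to the blend used to build it. With real data the match is never exact, which is why the fit quality and the uncertainty around the weights matter.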
If the synthetic control closely tracks the test group before the treatment begins, it becomes a credible estimate of what would have happened during the treatment period without the campaign. The R-squared shown on the dashboard summarises how close that pre-treatment fit is.
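For reference, the standard coefficient of determination can be computed as below. This is a sketch of the textbook formula applied to the pre-treatment period. Whether the dashboard uses exactly this variant is an assumption, and the series are made-up numbers.

```python
def r_squared(observed, fitted):
    """Standard R-squared: 1 minus residual variation over total variation."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series has no variation")
    return 1.0 - ss_res / ss_tot

# Hypothetical pre-treatment values: test group vs. its synthetic control.
pre_observed = [100.0, 110.0, 105.0, 120.0, 115.0]
pre_synthetic = [101.0, 108.0, 107.0, 118.0, 116.0]

fit = r_squared(pre_observed, pre_synthetic)  # 1 - 14/250 = 0.944
```

A value near 1 means the synthetic control explains most of the pre-treatment variation in the test group.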
Why the method is Bayesian
A single best-fit synthetic control gives you one counterfactual estimate, but not much insight into uncertainty.
In reality, pre-treatment data is noisy, and several different weight combinations may fit the historical data almost equally well. Each of those combinations implies a slightly different counterfactual.
The Bayesian approach accounts for this uncertainty directly. Instead of producing one fixed synthetic control, it produces a distribution of plausible synthetic controls that are all consistent with the pre-treatment data.
When those plausible controls are projected forward into the treatment period, they produce a distribution of plausible treatment effects. This is where the dashboard's uncertainty measures come from:
- The shaded band around the counterfactual line shows the credible range of no-treatment outcomes
- The iROAS distribution shows the spread of plausible return estimates, expressed as a 90% Highest Density Interval (HDI)
- Confidence shows how much of the treatment effect distribution supports a real positive effect rather than random noise
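To make these measures concrete, the sketch below summarises a set of simulated posterior draws of the treatment effect. The draws are generated from a normal distribution purely for illustration. Reading Confidence as the share of draws above zero is an assumption about the dashboard, not a documented formula, and the helper names `hdi` and `share_positive` are hypothetical.

```python
import math
import random

def hdi(samples, mass=0.90):
    """Narrowest interval that contains `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

def share_positive(samples):
    """Fraction of draws that imply a positive treatment effect."""
    return sum(1 for x in samples if x > 0) / len(samples)

# Simulated stand-in for posterior treatment-effect draws (not real output).
random.seed(0)
effect_draws = [random.gauss(1.5, 1.0) for _ in range(4000)]

low, high = hdi(effect_draws, mass=0.90)
confidence = share_positive(effect_draws)
```

Unlike an equal-tailed interval, the HDI is the shortest interval holding the requested share of the distribution, so it can sit off-centre when the distribution is skewed.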
What the methodology produces
For each experiment, the methodology produces:
- A counterfactual time series for the test group, with a credible range
- A cumulative treatment effect over the treatment window, with credible bounds
- Control weights showing how the synthetic control was constructed
- A pre-treatment fit metric (R-squared) indicating how reliable the projection is
- A confidence score based on the posterior treatment effect
- An iROAS distribution combining the estimated treatment effect with incremental media spend, when media data is available
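The last two outputs can be sketched together. Assuming iROAS is incremental sales divided by incremental media spend (the usual definition), each plausible counterfactual path yields one cumulative effect and therefore one iROAS value. All numbers and function names below are illustrative assumptions.

```python
def cumulative_effect(actual, counterfactual):
    """Running total of (actual - counterfactual) over the treatment window."""
    running, out = 0.0, []
    for a, c in zip(actual, counterfactual):
        running += a - c
        out.append(running)
    return out

def iroas_draws(actual, counterfactual_draws, incremental_spend):
    """One iROAS value per plausible counterfactual path."""
    if incremental_spend <= 0:
        raise ValueError("incremental spend must be positive")
    return [cumulative_effect(actual, cf)[-1] / incremental_spend
            for cf in counterfactual_draws]

# Hypothetical observed sales and three plausible counterfactual paths.
actual = [120.0, 135.0, 128.0, 140.0]
counterfactual_draws = [
    [110.0, 118.0, 115.0, 121.0],
    [112.0, 120.0, 117.0, 124.0],
    [108.0, 116.0, 113.0, 119.0],
]

iroas = iroas_draws(actual, counterfactual_draws, incremental_spend=20.0)
```

In practice there would be many more draws than three, and the spread of the resulting values is what an iROAS interval summarises.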
The role of the pre-treatment period
The pre-treatment period is the foundation of the analysis. The model uses it to learn the relationship between test and control groups before any intervention occurs. The quality of this learning directly determines how credible the counterfactual will be.
A longer pre-treatment period generally allows the model to find a more stable and reliable match. A very short pre-treatment window may not provide enough data to distinguish signal from noise.
Both Geo Lift and A/B Test let you set an explicit pre-treatment start date to limit the pre-treatment window, for example to exclude older data that may not be representative of current conditions. This setting is available as an advanced option in the experiment setup screen.
The cool-down period
The cool-down period extends the analysis window beyond the end of the treatment. It allows you to measure any carry-over effects that continue after the intervention ends.
Whether a cool-down period is relevant depends on your business context. Some channels or campaigns produce effects that persist for days or weeks after the treatment ends. Others produce effects that stop immediately. Reviewing the cumulative treatment effect chart is the clearest way to judge whether a cool-down period is adding meaningful information.
The appropriate cool-down length is customer-specific. If the KPI effect continues to accumulate after the treatment ends, include enough days to capture it fully. If there is no post-treatment effect, the cool-down period does not materially change the result.
For both Geo Lift and A/B Test, the cool-down period can be configured as an advanced option in the experiment setup screen.
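One simple way to judge whether a cool-down period adds information is to check how much of the cumulative effect accrues after the treatment ends. The sketch below does this for a made-up daily effect series. The function name `cooldown_share` is hypothetical and this check is not a dashboard feature.

```python
def cooldown_share(daily_effect, treatment_days):
    """Fraction of the total effect that accrues after the treatment ends."""
    total = sum(daily_effect)
    if total == 0:
        raise ValueError("total effect is zero; share is undefined")
    return sum(daily_effect[treatment_days:]) / total

# Hypothetical daily effects: 4 treatment days followed by 4 cool-down days.
daily_effect = [10.0, 12.0, 11.0, 9.0, 5.0, 2.0, 1.0, 0.0]

share = cooldown_share(daily_effect, treatment_days=4)  # 8 / 50 = 0.16
```

A share near zero suggests the effect stops with the treatment, while a larger share suggests carry-over worth capturing with a longer cool-down.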
When the methodology works well
Bayesian synthetic control works best when a few practical conditions are met:
- The available control groups can collectively reproduce the test group's pre-treatment behaviour
- Nothing unusual happens in the control groups during the treatment period, such as overlapping campaigns or external market shocks
- The pre-treatment period is long enough to estimate a stable match
- The treatment is strong enough to produce a measurable effect above the noise
When these conditions hold, the method provides a credible estimate of incremental impact that can support budget decisions, experiment readouts, and MMM calibration.
Differences between Geo Lift and A/B Test
The methodology is identical for both experiment types. The main differences are in how the data is sourced and what kinds of groupings are possible:
- Data source (Geo Lift): uses data already present in Sellforte. Groups are defined by a geographic dimension such as region or market. No data upload is required.
- Data source (A/B Test): uses data you upload directly. Groups can represent any meaningful split, such as customer segments, product categories, store tiers, or geographies, as long as the treatment is clearly isolated.
- Data granularity (Geo Lift): independent of the MMM data granularity. If more granular regional data is available in Sellforte, it can be used in Geo Lift even if the MMM runs at a higher level.
- Data granularity (A/B Test): accepts any grouping structure present in the uploaded data, giving full flexibility over how test and control groups are defined.
In both cases, the synthetic control is built from the control groups provided, and the counterfactual estimation process is the same.