Meta Incrementality and GeoLift Testing

by Francis Rozange | Jun 25, 2026 | Meta Ads (Facebook & Instagram)

Your ROAS says 4.0. Your boss is happy. But here is the uncomfortable question: how many of those sales would have happened anyway, without a single ad? That gap is what incrementality testing measures, and most advertisers never check it. They trust attribution, which counts conversions near an ad and assumes the ad caused them. It often did not. This guide walks through conversion lift, holdouts, geo testing and Meta’s open source GeoLift library. You will learn how a test group and a control group work, when each method fits, and what real 2025 lift figures look like. The goal is simple: stop guessing whether your ads create sales, and start proving it with a clean experiment instead of a hopeful dashboard.

Why ROAS does not prove causality

Here is the myth to kill first: a high ROAS proves your ads created sales. It does not. ROAS is an attribution metric. It takes the conversions that Meta credited to your ads and divides the revenue from those conversions by your spend. The problem hides in the word credited. Attribution sees a user click an ad, then buy two days later, and it assigns the sale to the ad. But that user may have been a loyal customer who searched your brand name, saw a retargeting ad on the way, and would have bought regardless. The ad collected the credit without creating the sale. Correlation between ad exposure and conversion is not causation, and ROAS, however precise it looks, cannot tell the two apart.

The most extreme case is brand retargeting. Imagine a coffee subscription brand running ads to people who already abandoned a cart. Attribution shows a glorious ROAS of 8.0 on that campaign. Pause it for a month, and revenue barely moves, because those people were coming back anyway. The campaign looked like the hero and was mostly free riding on existing intent. This is not a rare edge case. Branded search, retargeting warm audiences and broad prospecting to past buyers all suffer the same blind spot. Incrementality is the only way to separate the ads that genuinely move the needle from the ones that simply stand near the finish line and take a bow. A skincare brand, a gym chain, a B2B tool: every account hides at least one campaign like this.

What incrementality really measures

Incrementality measures the conversions that exist only because the ad existed. The clean way to find that number is a randomized controlled trial, the same design used to test medicine. You split your audience randomly into two groups. The test group is eligible to see your ads. The control group, called the holdout, is withheld from them entirely. Because the split is random, the two groups are, on average and given enough users, statistically equivalent in demographics, geography, prior behavior and intent. Any difference in conversion rate between them that is larger than chance can only come from one thing: the ads. That difference, expressed as a percentage of the holdout's conversion rate, is the lift. It is the cleanest causal evidence digital marketing can produce, and it is the same logic a pharmaceutical trial uses to prove a drug works.

A short worked example makes it concrete. Say the test group converts at 3.0 percent and the holdout converts at 2.2 percent. The incremental difference is 0.8 percentage points. Measured against the holdout baseline, that is a relative lift of about 36 percent (0.8 divided by 2.2). Measured against the test group, it means roughly 27 percent of the test group conversions (0.8 divided by 3.0) were caused by the ads and the rest would have happened anyway. If your test group generated 10,000 sales, only about 2,700 are truly incremental. That reframes everything. A campaign with a reported ROAS of 4.0 but only 27 percent incrementality has a true incremental ROAS closer to 1.1 (4.0 times 0.27). The difference between those two numbers is the difference between scaling a winner and pouring money into a campaign that mostly harvests sales you already had. The headline number flattered you; the incremental number tells the truth.
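The arithmetic above can be written out as a few lines of Python. This is a minimal sketch, not a Meta formula: the function name `incrementality_summary` is made up for illustration, and it assumes every conversion carries the same revenue, so that the reported ROAS can simply be scaled by the incremental share.

```python
def incrementality_summary(test_rate, holdout_rate, reported_roas):
    """Turn two conversion rates and a reported ROAS into incremental figures."""
    absolute_lift = test_rate - holdout_rate        # gap in percentage points
    relative_lift = absolute_lift / holdout_rate    # increase over the no-ads baseline
    incremental_share = absolute_lift / test_rate   # share of test conversions caused by ads
    # Assumption: every conversion carries the same revenue, so ROAS scales with the share.
    incremental_roas = reported_roas * incremental_share
    return {
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "incremental_share": incremental_share,
        "incremental_roas": incremental_roas,
    }


summary = incrementality_summary(test_rate=0.030, holdout_rate=0.022, reported_roas=4.0)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With the numbers from the example, the incremental share comes out near 0.27 and the incremental ROAS near 1.07, which is where the "closer to 1.1" figure comes from.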

Meta Conversion Lift: the user level holdout

Conversion Lift is Meta’s native incrementality tool, and it runs a user level randomized controlled trial inside Ads Manager. You define an objective, usually purchases, leads or app installs, then Meta randomly assigns eligible users to a test group that can see your ads or to a holdout that cannot. The randomization happens at the level of the individual user account, so a person is consistently held out across Facebook and Instagram for the whole study. According to Triple Whale’s documentation, conversions are measured through your Pixel, the Conversions API or offline event uploads, which means the quality of your tracking setup directly shapes how trustworthy the result will be. Garbage signal in, garbage lift out.

The holdout size is a balance. A larger holdout gives a stronger statistical signal, because the control group is big enough to produce a stable baseline. A smaller holdout sacrifices fewer potential customers to the control group. Triple Whale recommends a holdout between 10 and 20 percent for most studies, while other practitioners run 5 to 10 percent when conversion volume is high. The rule of thumb is straightforward: the more conversions you generate, the smaller the holdout you can afford while still detecting a real effect. Low volume accounts need a larger holdout and a longer window, which is why very small advertisers struggle to run a clean user level lift test at all and should look at the geo methods further down.
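To see the trade-off in numbers, here is a rough sketch using the textbook sample size approximation for comparing two proportions with an uneven split. It is not how Meta sizes its studies. The function name `required_audience`, the 5 percent significance level, the 80 percent power target and the two conversion rates are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_audience(baseline_rate, expected_rate, holdout_share, alpha=0.05, power=0.80):
    """Approximate total users needed to detect expected_rate against baseline_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    # Variance of the rate difference per user, given how the audience is split.
    variance = (
        expected_rate * (1 - expected_rate) / (1 - holdout_share)
        + baseline_rate * (1 - baseline_rate) / holdout_share
    )
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


for share in (0.05, 0.10, 0.20):
    users = required_audience(0.022, 0.030, share)
    print(f"holdout {share:.0%}: about {users:,} users in total")
```

Under these assumptions, a 5 percent holdout needs a noticeably larger total audience than a 20 percent holdout to detect the same effect, which is why small holdouts only work for high volume accounts.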

Reading the result: lift, confidence interval, p value

At readout, Conversion Lift gives you three numbers that matter. The lift is the percentage difference in conversion rate between test and holdout. The confidence interval shows the range your true lift plausibly sits in: a 95 percent interval is built with a method that captures the real value in 95 percent of repeated tests, so the narrower the band, the more precise the estimate. The p value tells you how likely a gap at least this large would be if the ads had no effect at all, and the standard threshold for statistical significance is below 0.05. A lift of 30 percent with a p value of 0.03 is a trustworthy signal. A lift of 30 percent with a wide confidence interval spanning negative to positive values is noise dressed as insight, and acting on it would be a costly mistake. Most tests run three to four weeks before the numbers stabilize.
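The three numbers can be reproduced from raw counts with a standard two-proportion z test. This is a sketch under that assumption only: Meta does not necessarily compute its readouts this way, and the function name `lift_readout` and the user and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_readout(test_conv, test_users, hold_conv, hold_users, confidence=0.95):
    """Relative lift, confidence interval on the rate gap, and two-sided p value."""
    p_test = test_conv / test_users
    p_hold = hold_conv / hold_users
    diff = p_test - p_hold
    # Unpooled standard error for the confidence interval.
    se = sqrt(p_test * (1 - p_test) / test_users + p_hold * (1 - p_hold) / hold_users)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    interval = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error for the test of "the ads changed nothing".
    pooled = (test_conv + hold_conv) / (test_users + hold_users)
    se_null = sqrt(pooled * (1 - pooled) * (1 / test_users + 1 / hold_users))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))
    return {"relative_lift": diff / p_hold, "interval": interval, "p_value": p_value}


# Same conversion rates (3.0 vs 2.2 percent), very different sample sizes.
large = lift_readout(test_conv=2700, test_users=90_000, hold_conv=220, hold_users=10_000)
small = lift_readout(test_conv=135, test_users=4_500, hold_conv=11, hold_users=500)
for label, readout in (("large", large), ("small", small)):
    low, high = readout["interval"]
    print(f"{label}: lift {readout['relative_lift']:.0%}, "
          f"gap between {low:.4f} and {high:.4f}, p = {readout['p_value']:.4f}")
```

Both studies show the same point estimate of lift, but only the large one has an interval that stays above zero. The small one is the "noise dressed as insight" case.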

Real readouts look like this. BrandAlley, a UK fashion e-commerce business, ran a Meta Conversion Lift Study over four weeks and found a Meta ROI of 4.00 with a 90 percent confidence interval between 2.91 and 5.09. That interval is the honest part. It says the true value is very likely above 2.91, which is enough to justify the spend, while admitting the point estimate of 4.00 is not a guarantee. Compare that to Shinola, the luxury retailer, whose incrementality test showed a 14.3 percent increase in online conversions from Facebook ads and revealed that standard attribution had underreported the channel by 413 percent. Same idea, opposite direction: here attribution undercounted, not overcounted, and the test rescued a channel that looked weak on paper.

Meta A/B tests, PSA and ghost ads

Conversion Lift is not the only randomized design Meta offers. The A/B test tool in Ads Manager splits your audience randomly to compare two variables, usually two creatives or two audiences, and it is a true experiment because the split is random rather than sequential. It answers which version wins, not whether advertising works at all, so it sits next to incrementality rather than replacing it. The deeper variants come from the academic side of ad measurement. PSA tests, short for public service announcement tests, show the control group a neutral non branded ad instead of withholding ads entirely, which controls for the simple fact that seeing any ad at all changes behavior.

Ghost ads push that logic further and fix the main weakness of PSA tests, which is paying for control group impressions. According to Tinuiti and Remerge, ghost ads log the moment when your ad would have won the auction for a control user, then withhold it, so you get a perfectly matched comparison without spending money to show that user anything. The exposed group saw your ad, the ghost group saw whatever organic content filled the slot, and the two are otherwise identical because both genuinely entered the auction. Ghost ads are considered the gold standard of clean, cost efficient incrementality, and they remove the selection bias that haunts naive holdouts where the control group never even qualified to see an ad in the first place.

GeoLift: incrementality without user tracking

User level holdouts have a growing problem: privacy. As tracking signal degrades, building a clean user level control group gets harder. GeoLift solves this from a different angle. It is Meta’s open source library, an R package hosted on GitHub under facebookincubator, and it measures incrementality at the market level instead of the user level. Rather than splitting people, you split geography. You pick a set of test regions where you turn ads up or on, and you compare them against control regions where nothing changes. Because it uses aggregated regional data, GeoLift is resilient to signal loss. It is transparent and reproducible because the code is public. And it never needs a single user identifier to work. That last point is why it keeps gaining ground in a privacy first world.

How synthetic control works

The clever part is the synthetic control method at GeoLift’s core. You rarely have a control region that perfectly mirrors your test region, so GeoLift builds one. It takes your untreated regions and combines them, assigning each an optimized weight, to construct an artificial region whose historical sales track your test region as closely as possible before the campaign starts. According to Meta’s GeoLift methodology documentation, this synthetic region becomes the counterfactual: the model’s best estimate of what your test region would have done with no campaign. After the campaign runs, the gap between actual test sales and the synthetic baseline is your incremental lift. No user data, no holdout of individuals, just regions and history doing the work that cookies used to do.
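The idea can be sketched in plain Python. To be clear about what this is: GeoLift itself is an R package with a proper optimizer, and nothing below is its API. The brute-force grid search, the function names `fit_synthetic_control` and `synthetic_series`, and all sales figures are hypothetical, chosen only to show weights being fitted on pre-campaign history and then used as the counterfactual.

```python
from itertools import product


def fit_synthetic_control(test_pre, controls_pre, grid_steps=20):
    """Brute-force search for non-negative weights summing to 1 that best match test_pre."""
    names = list(controls_pre)
    best_weights, best_error = None, float("inf")
    for parts in product(range(grid_steps + 1), repeat=len(names) - 1):
        last = grid_steps - sum(parts)
        if last < 0:
            continue  # this combination would need a negative weight
        weights = [p / grid_steps for p in (*parts, last)]
        error = 0.0
        for t, actual in enumerate(test_pre):
            synthetic = sum(w * controls_pre[n][t] for w, n in zip(weights, names))
            error += (actual - synthetic) ** 2
        if error < best_error:
            best_weights, best_error = weights, error
    return dict(zip(names, best_weights))


def synthetic_series(weights, controls):
    """Weighted sum of the control regions, period by period."""
    length = len(next(iter(controls.values())))
    return [sum(w * controls[n][t] for n, w in weights.items()) for t in range(length)]


# Hypothetical weekly sales before the campaign (eight weeks).
controls_pre = {
    "region_a": [100, 120, 90, 130, 110, 95, 125, 105],
    "region_b": [200, 180, 210, 190, 220, 185, 205, 195],
    "region_c": [50, 80, 40, 90, 45, 85, 55, 75],
}
test_pre = [140, 144, 138, 154, 154, 131, 157, 141]

# Hypothetical weekly sales during the campaign (four weeks).
controls_post = {
    "region_a": [110, 100, 120, 115],
    "region_b": [190, 210, 200, 205],
    "region_c": [60, 70, 50, 65],
}
test_post = [156, 158, 167, 166]

weights = fit_synthetic_control(test_pre, controls_pre)
baseline = synthetic_series(weights, controls_post)   # the counterfactual
incremental_sales = sum(test_post) - sum(baseline)
lift = incremental_sales / sum(baseline)
print(weights, round(incremental_sales, 1), f"{lift:.1%}")
```

A grid search like this only works for a handful of control regions, since the number of combinations explodes quickly. It is here to make the logic visible, not to replace the library.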

Power analysis and market selection

GeoLift is not a tool you launch on instinct. Before any test, you run a power analysis, and the library ships the functions to do it. GeoLiftMarketSelection simulates fake interventions across your historical data at varying effect sizes, then calculates the minimum detectable effect for each candidate combination of test and control markets, producing power curves across effect sizes from zero to roughly 25 percent. This tells you, before spending a euro, whether a given set of markets can even detect the lift you expect. Practitioners target a minimum detectable effect of around 2 to 5 percent for a well powered test. Skipping this step is how teams run a geo test for six weeks and end up unable to conclude anything at all.
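The logic of simulating fake interventions can be illustrated with a toy version. This is not what GeoLiftMarketSelection does internally: it uses a simple test-to-control ratio instead of a synthetic control, a single control market, and a crude "beat the worst placebo" rule for detection. The function names and the weekly sales history are hypothetical.

```python
def measured_lift(test, control, start, window):
    """Lift in [start, start + window) versus the test/control ratio before start."""
    pre_ratio = sum(test[:start]) / sum(control[:start])
    window_ratio = sum(test[start:start + window]) / sum(control[start:start + window])
    return window_ratio / pre_ratio - 1.0


def power_curve(test, control, window, effect_sizes, min_pre=4):
    """Share of simulated placements in which each injected effect beats the noise."""
    starts = range(min_pre, len(test) - window + 1)
    # Placebo runs: nothing injected, so any measured "lift" is pure noise.
    noise = max(abs(measured_lift(test, control, s, window)) for s in starts)
    curve = {}
    for effect in effect_sizes:
        detected = 0
        for s in starts:
            fake = list(test)
            for t in range(s, s + window):
                fake[t] *= 1.0 + effect  # inject a fake lift into the window
            if measured_lift(fake, control, s, window) > noise:
                detected += 1
        curve[effect] = detected / len(starts)
    return curve


def minimum_detectable_effect(curve, target_power=0.8):
    """Smallest simulated effect whose power reaches the target, or None."""
    powered = [effect for effect, power in curve.items() if power >= target_power]
    return min(powered) if powered else None


# Hypothetical weekly sales history for one test market and one control market.
control = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99]
test = [200, 206, 195, 203, 197, 207, 193, 201, 205, 195, 203, 197]

curve = power_curve(test, control, window=4, effect_sizes=[0.0, 0.01, 0.02, 0.05, 0.10, 0.25])
print(curve)
print("MDE:", minimum_detectable_effect(curve))
```

The point of the exercise is the shape of the output: effects smaller than the natural week to week wobble are never detected, large ones always are, and the minimum detectable effect sits where the curve crosses your power target.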

Duration follows from the same analysis. According to practitioner guides built on the GeoLift documentation, geo tests typically run four to six weeks, but only after the power analysis confirms the markets and the window can detect your target effect. Volatility matters here: a category with steady demand needs less time, while a seasonal or spiky business needs a longer window to separate the campaign signal from natural noise. A useful pattern is a holdout geo test, where you turn ads off in selected regions while keeping them on everywhere else, then measure the sales drop in the dark regions. That drop, scaled up from the dark regions to your full footprint, estimates the incremental contribution of the channel you paused, and it costs you only the sales you chose to forgo for the experiment.

Incremental Attribution: incrementality goes always on

The biggest 2025 shift is that Meta turned incrementality from an occasional study into a live optimization signal. In April 2025, Meta rolled out Incremental Attribution in Ads Manager, a feature that separates real ad driven conversions from those that would have happened naturally, and lets you optimize campaigns directly toward incremental conversions. Instead of running a manual lift test every quarter, you can ask the delivery system to chase the conversions that genuinely would not have occurred without the ad. According to coverage of Meta’s announcements, early adopters reported improvements above 20 percent in true performance once they switched the optimization target from raw conversions to incremental ones, which is a meaningful jump for a single setting change.

The headline figure came from Meta directly. On its Q1 2025 earnings call in April 2025, Meta told investors that advertisers optimizing toward incremental conversions were seeing an average 46 percent lift in performance. That number traces back to a set of 37 conversion lift studies run from July to October 2024 across 30 advertisers and 8 verticals, presented at Meta’s Performance Marketing Summit. Treat the 46 percent with healthy caution: it is a flattering average from Meta’s own studies on advertisers who chose to optimize this way, not an independent benchmark you should expect to replicate. Still, the direction is real, and it signals that incrementality is moving from a side audit to the core of how Meta wants you to optimize from now on.

The myth: incrementality is only for big brands

A stubborn belief says incrementality testing is a luxury for brands with millions in spend and a data science team. Half true, half wrong. The user level Conversion Lift does favor scale, because a clean lift needs enough conversions in both the test group and the holdout to reach statistical significance. A store doing 50 purchases a month will not power a meaningful user level study, full stop. But that does not mean incrementality is off limits. It means you pick the right tool. GeoLift was open sourced precisely so smaller teams could run market level tests without buying an enterprise platform, and a simple on off holdout in a few regions needs spreadsheet discipline far more than a big budget or a quant on staff.

There is an even cheaper entry point that any advertiser can run: the lightweight geo holdout. Pick two comparable regions, keep ads running in one and pause them in the other for a fixed window, then compare sales. It is less rigorous than a full synthetic control model, but it is directionally honest and costs nothing but the sales you forgo in the paused region. A regional bakery chain, a single location gym, a niche cosmetics brand: all can learn whether their ads create demand or merely follow it. The real barrier to incrementality is not budget, it is the willingness to discover that a campaign you love is not actually working. That fear, not money, is what keeps most accounts from ever testing in the first place.
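Comparing sales fairly means adjusting for whatever happened in both regions at once, such as a seasonal bump. One simple way to do that, sketched below, is to use the active region's trend to estimate what the paused region would have sold with ads on. The function name `geo_holdout_lift` and the weekly figures are hypothetical, and the method assumes the two regions would have moved in parallel without the pause.

```python
def geo_holdout_lift(active_pre, active_test, paused_pre, paused_test):
    """Estimate sales lost by pausing ads, using the active region as the trend."""
    # How much sales moved in the region where nothing changed.
    trend = sum(active_test) / sum(active_pre)
    # Assumption: without the pause, the paused region would have followed the same trend.
    expected_paused = sum(paused_pre) * trend
    lost_sales = expected_paused - sum(paused_test)
    return lost_sales, lost_sales / expected_paused


# Hypothetical weekly sales: four weeks before the pause, four weeks during it.
active_pre, active_test = [500, 520, 480, 500], [550, 540, 560, 550]
paused_pre, paused_test = [300, 310, 290, 300], [270, 260, 265, 261]

lost_sales, incremental_share = geo_holdout_lift(active_pre, active_test, paused_pre, paused_test)
print(f"sales lost to the pause: {lost_sales:.0f} ({incremental_share:.0%} of expected)")
```

In this made-up case, raw sales in the paused region fell by only 12 percent, but the active region grew 10 percent over the same window, so the ads were carrying about 20 percent of expected sales. Comparing raw totals alone would have understated it.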

A practical testing playbook

Start by picking the question that actually matters to your business. Do not test everything at once. The most valuable first test is usually your most expensive or most suspicious campaign, the one with a beautiful ROAS that you secretly fear is free riding on existing demand. A subscription meal kit brand might test its branded retargeting, a furniture retailer might test broad prospecting, a SaaS company might test its retargeting of trial users. Pick one, define the conversion that pays the bills, and choose your method by your volume: user level Conversion Lift if you have the conversions, GeoLift or a geo holdout if you do not. The wrong test on the wrong campaign teaches you nothing and burns a month.

Then protect the result from yourself. Decide the holdout size, the duration and the success threshold before you start, and write them down, because a lift result is painfully easy to rationalize after the fact. Set the confidence bar in advance: a p value below 0.05 and a confidence interval that stays clearly positive. Run the test long enough to clear the power analysis, never stopping early because the numbers look good on day four. When the readout lands, act on it even when it hurts. If a beloved campaign shows near zero lift, that is not a failed test, it is a successful one that just saved you money. Incrementality is only useful if you let it change what you do next, otherwise it is an expensive way to feel scientific.

Common mistakes that ruin a lift test

Four mistakes wreck most first attempts. The first is contamination: running other big changes during the test window, like a price drop, a new email campaign or a seasonal sale, so you can no longer tell what caused the lift. Lock the test window and freeze everything else you can. The second is peeking and stopping early, which is the single fastest way to fool yourself, because random noise can cross your threshold for a day or two before settling back. The third is testing a campaign too small to ever reach significance, then reading a coin flip as a verdict. The fourth is choosing test and control regions that never behaved alike historically, which poisons a geo test before it starts.
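The cost of peeking is easy to simulate. The sketch below models a test where the ads have no effect at all, so every "significant" result is a false alarm. It is a deliberate simplification: each day's standardized gap between test and holdout is drawn as independent standard normal noise, and the function name, the 28 day window and the seed are arbitrary choices.

```python
import random
from math import sqrt


def false_positive_rates(days=28, simulations=2000, z_crit=1.96, seed=7):
    """Compare 'stop the first day it looks significant' with 'read once at the end'."""
    rng = random.Random(seed)
    flagged_when_peeking = flagged_at_end = 0
    for _ in range(simulations):
        running_sum, crossed = 0.0, False
        for day in range(1, days + 1):
            running_sum += rng.gauss(0.0, 1.0)   # one day of pure noise, no real lift
            z = running_sum / sqrt(day)          # cumulative z score after this day
            if abs(z) > z_crit:
                crossed = True                   # a daily peeker would have stopped here
        flagged_when_peeking += crossed
        flagged_at_end += abs(z) > z_crit        # only the final reading counts
    return flagged_when_peeking / simulations, flagged_at_end / simulations


peeking_rate, single_look_rate = false_positive_rates()
print(f"false alarms with daily peeking: {peeking_rate:.1%}")
print(f"false alarms with one final readout: {single_look_rate:.1%}")
```

The single readout should produce false alarms at roughly the 5 percent rate the threshold promises. Daily peeking can only do worse, because it counts every run that crossed the line at any point, including those that drifted back by the end.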

There is also a conceptual trap worth naming. Incrementality is not a single permanent number for a channel. It moves with your audience saturation, your creative, the season and how aggressively you already spend. A campaign that shows 40 percent lift today can show 15 percent once you scale it and exhaust the easy incremental customers. That is why the best teams treat incrementality as a recurring check, not a one time stamp of approval. Rerun the test after a major change in budget, audience or creative, and watch the lift trend over time rather than worshipping a single readout. The number is a snapshot of a moving target, and the goal is to keep scaling only while extra spend still buys real, additional sales.

Sources

Meta, Conversion Lift Testing for Incrementality Measurement, facebook.com/business/measurement/conversion-lift.
Meta Open Source, GeoLift repository and methodology documentation, github.com/facebookincubator/GeoLift and facebookincubator.github.io/GeoLift.
Meta Q1 2025 earnings call, April 2025, on the average 46 percent lift in incremental conversions.
Meta Performance Marketing Summit, 37 conversion lift studies across 30 advertisers and 8 verticals, July to October 2024.
Triple Whale, Meta Conversion Lift Tests and GeoLift 101 guides.
Tinuiti, How Do Ghost Ads Measure Ad Performance.
Remerge, Incrementality Tests 101: PSA, Ghost Ads and Ghost Bids.
Haus, Understanding Meta Incrementality Testing.
Case studies: BrandAlley Meta Conversion Lift Study with ROI 4.00 and a 90 percent confidence interval of 2.91 to 5.09, and Shinola with a 14.3 percent conversion increase where attribution underreported the channel by 413 percent.
