Hi there,
Thanks for reaching out! I saw your post about A/B testing on Facebook and thought I could offer some thoughts from my experience. It's a really common problem you've run into, and your suspicion about what's happening is spot on. It's something we see all the time when auditing new client accounts.
I'm happy to give you some initial guidance on how we'd approach this. Getting testing right is probably one of the most important things in paid advertising, but it's also where most people go wrong. Let's get into it.
You're right, we'll need to look at your testing methodology...
First off, your gut feeling is absolutely correct. When you run an A/B/C test on a very broad audience, you're not really running a controlled experiment. You think you're testing Creative A vs Creative B vs Creative C, but you're actually testing (Creative A + Audience Pocket A) vs (Creative B + Audience Pocket B) vs (Creative C + Audience Pocket C).
The wildly different CPMs you're seeing are the biggest clue. Meta's algorithm is incredibly powerful. When you give it a huge, broad audience, it immediately starts trying to find the cheapest, easiest-to-reach people who will engage with each specific ad. If one ad's messaging hook resonates slightly more with, say, women aged 25-34 in urban areas, the algorithm will quickly start funnelling more of the budget towards that specific group for that specific ad. Meanwhile, another ad might be doing better with men aged 45-54 in the suburbs. The algorithm finds these different pockets for each ad, and the cost to reach those pockets (the CPM) will be different.
So, what does this mean for your results? It means they're completely skewed and pretty much unreliable for telling you which creative is actually better. An ad might get a lower cost-per-result simply because the algorithm found a cheaper, but not necessarily better or more scalable, sub-audience for it. You might pick a "winner" that only works on a tiny slice of your potential market, and when you try to scale it, the performance completely falls apart. You've essentially polluted your own data pool from the get-go. To get a true read on which creative performs best, you need to force Meta to show all the ad variants to the same, or at least a very similar, group of people. This is non-negotiable for a proper, scientific test.
This is a fundamental flaw in the test setup. To be honest, it's a mistake we see constantly, so don't feel bad about it. The platform almost encourages you to go broad, but for testing, it's a trap.
I'd say you need a more structured approach to testing...
So, how do we fix this? We need to go back to basics and create a proper controlled environment. In any scientific test, you only change one variable at a time. In this case, the variable you want to test is the 'messaging hook'. Therefore, every other variable – particularly the audience – must remain as constant as possible.
The solution is exactly what you were thinking: redo the test with a narrower, clearly defined audience. By constraining the audience, you give the algorithm much less room to wander off and find different pockets of people for each ad. It forces the ads to compete for the same eyeballs, which is exactly what you want. This way, when one ad gets a better CTR, a lower CPA, or a higher ROAS, you can be much more confident that it's because the creative itself is more effective, not because the algorithm found a fluke audience for it.
Here's how I'd suggest setting it up practically:
1. Use Meta's A/B Test Feature: Don't just run three ads in one ad set and see what happens. Use the actual A/B Test tool when you create the campaign. Select 'A/B Test' and then choose 'Creative' as your variable. This tells Meta you're running a formal experiment, and it will try to ensure a fair delivery, preventing one ad from getting all the budget prematurely.
2. Choose a Defined Audience: Instead of going broad, pick one of your best-performing interest-based audiences or a specific lookalike audience. The key is that it's the *same* audience for all versions of the test. We'll get into which audience to pick in a minute, but for the test itself, just pick one and stick to it.
3. Structure the Test Correctly: The campaign setup should be simple: one campaign, one ad set (with your chosen narrow audience), and then your multiple ad creatives (your control and the two variants) inside that ad set. The A/B test tool will guide you through this.
4. Give it Time and Budget: A common mistake is calling a test too early. You need to let it run long enough to gather statistically significant data. How long depends on your budget and CPA, but you definitely need more than a couple of days. A good rule of thumb we use is to let an ad or ad set spend at least 3x your target CPA before making a hard decision on its performance (there's a quick sketch of this check just below). If you're not getting enough conversions, look at leading indicators like CTR and CPC, but the real winner is the one that drives your main objective most efficiently.
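To make that spend rule concrete, here's a minimal sketch in Python, using entirely hypothetical numbers and a generic two-proportion z-test (a standard statistical check, not anything built into Meta), of the kind of sanity check we'd run before calling a winner:

```python
# Minimal sketch: the "spend at least 3x target CPA" rule of thumb, plus a
# generic two-proportion z-test on conversion rate. All numbers are hypothetical.
from statistics import NormalDist

def enough_spend(spend, target_cpa, multiplier=3.0):
    """Has this ad spent enough to be judged at all?"""
    return spend >= multiplier * target_cpa

def same_rate_p_value(conv_a, clicks_a, conv_b, clicks_b):
    """Two-sided p-value for 'both variants convert at the same rate'."""
    p_pool = (conv_a + conv_b) / (clicks_a + clicks_b)
    se = (p_pool * (1 - p_pool) * (1 / clicks_a + 1 / clicks_b)) ** 0.5
    z = (conv_a / clicks_a - conv_b / clicks_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test: £25 target CPA, two variants after a week of delivery.
print(enough_spend(spend=90.0, target_cpa=25.0))   # True (at least £75 spent)
print(same_rate_p_value(18, 600, 9, 580))          # ~0.10 -> not conclusive yet, keep it running
```

The exact thresholds matter less than the habit: don't call winners off a handful of conversions and a day of spend.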
This disciplined approach might feel slower, and your CPMs in the test might even be higher than in your broad campaigns, but the quality and reliability of the data you get is worth its weight in gold. This is how you find genuinely winning creatives that you can then scale with confidence.
You probably should think more about your audience structure...
This naturally leads to the next question: which "narrow" audience should you use for testing, and how should you structure your audiences in general? This is where a lot of the magic happens. A disorganised audience strategy is just as damaging as a flawed testing methodology. I've audited plenty of accounts where people are testing random audiences that don't align with their business goals or the customer journey.
The most effective way to think about this is to break your audiences down by their position in the sales funnel: Top of Funnel (ToFu), Middle of Funnel (MoFu), and Bottom of Funnel (BoFu). Each stage has a different purpose and requires different audiences and messaging.
Here’s a breakdown based on the prioritisation we use for our clients, many of whom are in eCommerce. This structure is pretty universal though.
ToFu: Cold Audiences / Prospecting
This is about reaching people who have likely never heard of you before. The goal is to introduce your brand and generate initial interest.
-> Detailed Targeting (Interests, Behaviours): This is your bread and butter when starting out, and it's where you should be running your creative tests. The key here, and I can't stress this enough, is to be specific and logical. If you're selling, for example, handcrafted pottery (judging by your username 'funkyspots'), don't just target a broad interest like 'Shopping'; it's useless. Think about what your ideal customer is *actually* interested in. You'd be better off targeting interests like 'Etsy', 'Handmade', specific pottery magazines or famous potters, or competitor brands. I usually group related interests into themed ad sets (e.g., one ad set for competitor brands, one for related hobbies, one for publications). This keeps your targeting tight and tells you which themes resonate most.
-> Lookalike Audiences (LALs): Once you have enough data (you need at least 100 people in a source audience, but really you want 1,000+ for a good quality LAL), these become your most powerful prospecting tool. But not all lookalikes are created equal. You need to prioritise them based on the value of the source audience. The hierarchy should be as follows (there's a rough sketch of how we'd pick a source at the end of this ToFu section):
- Lookalike of your highest value previous customers
- Lookalike of all previous customers (e.g., Purchased 180 days)
- Lookalike of people who Initiated Checkout
- Lookalike of people who Added to Cart
- Lookalike of Website Visitors
- Lookalike of Video Viewers or Page Engagers
-> Broad Targeting: This is what you were using. Broad can work, but only *after* your pixel is very "seasoned" with thousands of conversion events. At that point, you can trust the algorithm to find the right people. It should not be your starting point and it is definitely not the place for controlled creative testing.
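Circling back to the lookalike hierarchy above, here's a rough Python sketch of how we'd pick which source to build a lookalike from. The audience names, sizes, and the 1,000-person threshold are just illustrative placeholders:

```python
# Pick the highest-value lookalike source that actually has enough people in it.
# Names and sizes below are hypothetical.
LAL_SOURCE_PRIORITY = [
    "highest_value_customers",
    "all_customers_180d",
    "initiated_checkout",
    "added_to_cart",
    "website_visitors",
    "video_viewers_or_page_engagers",
]

def pick_lal_source(source_sizes, minimum=1_000):
    """Return the best-value source that clears the size threshold, else None."""
    for name in LAL_SOURCE_PRIORITY:
        if source_sizes.get(name, 0) >= minimum:
            return name
    return None

# Hypothetical account: not enough high-value purchasers yet, so fall back a level.
sizes = {"highest_value_customers": 240, "all_customers_180d": 1_850, "website_visitors": 42_000}
print(pick_lal_source(sizes))  # -> "all_customers_180d"
```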
MoFu: Warm Audiences / Consideration
These are people who have engaged with you in some way but haven't made a purchase or taken that final step. They know who you are. The goal here is to remind them of your value and overcome any objections.
-> Website Visitors: People who've visited your site in the last 30-90 days. You should always exclude recent purchasers and people who've reached the checkout pages.
-> Video Viewers: People who watched a significant portion (e.g., 50% or more) of your video ads. This is a great, low-cost way to build a retargeting pool.
-> Social Engagers: People who've liked, commented on, or saved your posts on Facebook or Instagram.
The messaging for this group should be different. You don't need to introduce yourself again. You could show them testimonials, different product use cases, or answer common questions.
BoFu: Hot Audiences / Conversion
These are people who are on the verge of buying. They've shown strong intent. Your goal is to get them over the finish line.
-> Added to Cart (but didn't purchase): Typically within the last 7-14 days. These are your hottest leads.
-> Initiated Checkout (but didn't purchase): Same as above. The highest intent audience you have.
For this BoFu group, you can be very direct. Use ads that mention scarcity ("Limited stock!"), urgency ("Offer ends soon!"), or maybe offer a small incentive like free shipping to close the deal. One campaign we worked on for a women's apparel brand achieved a 691% return by having a really aggressive and effective BoFu strategy.
You'll need a solid campaign structure to make this work...
Knowing the audiences is one thing, but you need to put them into a coherent campaign structure that you can manage and analyse properly. Throwing them all into one campaign is a recipe for disaster. You need to separate them logically.
Here’s a simple but very effective structure we use as a baseline for many of our clients:
Campaign 1: PROSPECTING (ToFu)
- Objective: Conversions (e.g., Purchases)
- Budget: Campaign Budget Optimisation (CBO) enabled. This lets Meta allocate your budget to the best-performing ad set within the campaign.
- Ad Set 1: Interest Group A (e.g., Interests related to 'Etsy', 'Handmade', 'Not on the High Street')
- Ad Set 2: Interest Group B (e.g., Interests related to competitor brands in your niche)
- Ad Set 3: LAL 1% (Purchasers) (Once you have enough data for it)
Inside each of these ad sets, you'd place your 2-3 best-performing, "evergreen" creatives. When you want to test *new* creatives, you do it in a separate A/B test campaign as we discussed earlier. Once you find a new winner from your test, you turn off the worst performer in your main Prospecting campaign and add the new winner in. This creates a continuous cycle of testing and improvement.
Campaign 2: RETARGETING (MoFu & BoFu)
- Objective: Conversions (e.g., Purchases)
- Budget: CBO enabled, but with a much smaller budget than prospecting (maybe 10-20% of your total spend).
- Ad Set 1 (MoFu): Website Visitors (30 Days) + Social Engagers (30 Days)
- Exclusions: Exclude anyone who Added to Cart, Initiated Checkout, or Purchased in the last 30 days.
- Ad Set 2 (BoFu): Added to Cart (14 Days) + Initiated Checkout (14 Days)
- Exclusions: Exclude anyone who Purchased in the last 14 days.
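To make that outline concrete, here's the same structure written out as a rough config sketch in Python. It's not an export of any real account: the interest lists, retention windows, and budget split are illustrative placeholders that mirror the outline above.

```python
# Rough sketch of the two-campaign structure as plain data (illustrative only).
account_structure = {
    "PROSPECTING (ToFu)": {
        "objective": "Purchases",
        "budget": "CBO, roughly 80-90% of total spend (the remainder goes to retargeting)",
        "ad_sets": {
            "Interest Group A": ["Etsy", "Handmade", "Not on the High Street"],
            "Interest Group B": ["<competitor brand interests>"],
            "LAL 1% Purchasers": ["lookalike of purchasers, once the source is big enough"],
        },
    },
    "RETARGETING (MoFu & BoFu)": {
        "objective": "Purchases",
        "budget": "CBO, roughly 10-20% of total spend",
        "ad_sets": {
            "MoFu": {
                "include": ["Website Visitors 30d", "Social Engagers 30d"],
                "exclude": ["Added to Cart 30d", "Initiated Checkout 30d", "Purchased 30d"],
            },
            "BoFu": {
                "include": ["Added to Cart 14d", "Initiated Checkout 14d"],
                "exclude": ["Purchased 14d"],
            },
        },
    },
}

# Quick sanity check: every retargeting ad set must exclude recent purchasers.
for name, ad_set in account_structure["RETARGETING (MoFu & BoFu)"]["ad_sets"].items():
    assert any("Purchased" in aud for aud in ad_set["exclude"]), name
```

The format doesn't matter; the point is that every audience lives in exactly one place and the exclusions stop the same person being hit by both campaigns at once.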
This structure gives you total clarity. You can see exactly how much it costs to acquire a new customer (from your Prospecting campaign) and how much your retargeting is contributing to your overall revenue. It prevents audience overlap and ensures your messaging is always relevant to where the user is in their journey. I remember one software client where simply restructuring their account this way and refining their targeting helped us reduce their Cost Per Acquisition from over £100 down to just £7. Structure is everything.
We'll need to look at the right metrics...
Finally, once your tests and campaigns are running with this new structure, you need to look at the right things to make decisions. As we established, CPM is not a key performance indicator; it's a diagnostic metric. A high CPM isn't necessarily bad if that audience is converting at a high rate, and a low CPM is useless if nobody buys.
For your creative tests, the primary metric you should judge a winner by is the one that aligns with your campaign objective.
- If your objective is Sales: The winner is the ad with the highest **Return on Ad Spend (ROAS)** or the lowest **Cost Per Purchase (CPA)**. Full stop.
- If your objective is Leads: The winner is the ad with the lowest **Cost Per Lead (CPL)**.
You can use secondary metrics to diagnose *why* an ad is winning or losing.
- Click-Through Rate (CTR): A high CTR means your ad creative and copy are good at grabbing attention and getting people to click. It's a great measure of how engaging your ad is.
- Cost Per Click (CPC): This is influenced by both your CTR and your CPM. A good ad with a high CTR will generally lead to a lower CPC.
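For reference, here's a tiny Python sketch of how these metrics relate to each other, with hypothetical numbers (CPL works the same way: spend divided by leads):

```python
# Minimal sketch of the core metrics, computed from raw totals (hypothetical example).
def ad_metrics(spend, impressions, clicks, purchases, revenue):
    return {
        "CPM":  1000 * spend / impressions,   # cost per 1,000 impressions
        "CTR":  clicks / impressions,         # click-through rate
        "CPC":  spend / clicks,               # cost per click
        "CPA":  spend / purchases,            # cost per purchase
        "ROAS": revenue / spend,              # return on ad spend
    }

# Hypothetical ad: £300 spend, 40,000 impressions, 600 clicks, 12 purchases, £900 revenue.
print(ad_metrics(300, 40_000, 600, 12, 900))
# CPM £7.50, CTR 1.5%, CPC £0.50, CPA £25, ROAS 3.0
```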
You can also use these metrics to diagnose problems in your funnel, which is something we do constantly.
- Low CTR? -> Your ad creative or copy isn't working. It's not stopping the scroll. Test new images, videos, or headlines.
- High CTR but low conversion rate on your site? -> You have an issue on your landing page. The ad is doing its job getting the right people to click, but your page isn't convincing them. This could be pricing, product photos, descriptions, or a lack of trust. Looking at your store analytics is vital here. Where do people drop off? If you see lots of product page views but very few adds to cart, that points directly to an issue with the product page itself.
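As a crude illustration, that decision logic looks something like the sketch below. The thresholds are made-up placeholders, not universal benchmarks; the right numbers depend on your niche, price point, and traffic quality.

```python
# Rough sketch of the funnel diagnosis above as simple rules (illustrative thresholds).
def diagnose(ctr, site_conversion_rate, ctr_floor=0.01, cvr_floor=0.02):
    if ctr < ctr_floor:
        return "Creative problem: the ad isn't stopping the scroll, so test new hooks."
    if site_conversion_rate < cvr_floor:
        return "Landing page problem: clicks are fine, but the page isn't converting them."
    return "Ad and page are both doing their jobs; look at scaling."

print(diagnose(ctr=0.022, site_conversion_rate=0.004))
# -> Landing page problem: clicks are fine, but the page isn't converting them.
```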
Being this systematic about analysis is how you move from just 'spending money on ads' to building a predictable, scalable customer acquisition machine.
This is the main advice I have for you:
To put it all together, here is a summary of the approach I would recommend you take to fix your testing process and improve your results.
| Area of Focus | Recommended Action | Why It's Important |
|---|---|---|
| A/B Testing Methodology | Stop testing creatives on broad audiences. Re-run tests using Meta's A/B Test feature with a single, narrow, well-defined audience (e.g., a specific interest group). | Ensures you are testing only one variable (the creative). Provides reliable, clean data to identify true winning ads, not just ads that found a fluke sub-audience. |
| Audience & Campaign Structure | Restructure your account into separate Prospecting (ToFu) and Retargeting (MoFu/BoFu) campaigns. Prioritise your ToFu audiences logically (Interests > high-value Lookalikes). | Gives you clarity on performance at each stage of the funnel, prevents audience overlap, and allows you to tailor messaging correctly to warm vs. cold audiences. |
| Performance Analysis | Judge test winners based on primary business metrics (ROAS, CPA, CPL), not vanity metrics like CPM. Use CTR and CPC as diagnostic tools to identify creative or landing page issues. | Focuses you on what actually drives business growth. Stops you from making bad decisions based on misleading metrics. |
| Optimisation Cycle | Create a system: continuously test new creatives in a controlled test campaign. When a winner is found, roll it into your main 'evergreen' campaigns and turn off the worst performer. | Prevents ad fatigue and ensures your account performance consistently improves over time, rather than stagnating. This is how you scale. |
I know this is a lot to take in, but getting these foundations right is honestly the difference between an account that struggles to break even and one that generates significant returns. We've seen it time and again with clients who come to us with the exact same issue you're facing. Implementing a structured approach like this is fundamental to the results we get, whether it's achieving a 1000% ROAS for a subscription box or generating over $115k in course sales in just a few weeks.
While the principles are here, the execution requires constant attention, analysis, and experience to know which levers to pull and when. If you're feeling a bit overwhelmed and would like to have an expert pair of eyes look over your account and strategy in more detail, we offer a free initial consultation call. We can walk through your setup together and provide some more specific, actionable advice.
Either way, I hope this detailed breakdown has been helpful for you and gives you a clear path forward.
Regards,
Team @ Lukas Holschuh