Stop testing. Start shipping.

Big brands are often obsessed with SEO testing. And it’s rarely more than performative theatre.

They try to determine whether having alt text on images is worthwhile. They question whether using words their audience actually searches for has any benefit. They debate how much passing Core Web Vitals might help improve UX. And they spend weeks orchestrating tests, interpreting deltas, and presenting charts that promise confidence – but rarely deliver clarity.

Mostly, these tests are busywork chasing the obvious or banal, creating the illusion of control while delaying meaningful progress.

Why?

Because they want certainty. Because they need to justify decisions to risk-averse stakeholders who demand clarity, attribution, and defensibility. Because no one wants to be the person who made a call without a test to point to, or who made the wrong bet on resource prioritisation.

And in most other parts of the organisation, especially paid media, incrementality testing is the norm. There, it’s relatively easy and normal to isolate inputs and outputs, and to justify spend through clean, causal models.

In those channels, the smart way to scale is to turn every decision into data, to build a perfectly optimised incrementality measurement machine. That’s clever. That’s scalable. That’s elegant.

But that only works in systems where inputs and outputs are clean, controlled, and predictable. SEO doesn’t work like that. The same levers don’t exist. The variables aren’t stable. The outcomes aren’t linear.

So the model breaks. And trying to force it anyway only creates friction, waste, and false confidence.

It also massively underestimates the cost, and overstates the value.

Because SEO testing isn’t free. It’s not clean. And it’s rarely conclusive.

And too often, the pursuit of measurability leads to a skewed sense of priority. Teams focus on the things they can test, not the things they should improve. The strategic gives way to the testable. What’s measurable takes precedence over what’s meaningful. Worse, it’s often a distraction from progress. An expensive, well-intentioned form of procrastination.

Because while your test runs, while devs are tied up, while analysts chase significance, while stakeholders debate whether +0.4% is a win, your site is still broken. Your templates are still bloated. Your content is still buried.

You don’t need more proof. You need more conviction.

The future belongs to the brands that move fast, improve things, and ship the obvious improvements without needing a 40-slide test deck to back them up. The ones smart enough to recognise that being brave matters more.

Not the smartest brands. The bravest.

The mirage of measurability

The idea of SEO testing appeals because it feels scientific. Controlled. Safe. And increasingly, it feels like survival.

You tweak one thing, you measure the outcome, you learn, you scale. It works for paid media, so why not here?

Because SEO isn’t a closed system. It’s not a campaign – it’s infrastructure. It’s architecture, semantics, signals, and systems. And trying to test it like you would test a paid campaign misunderstands how the web – and Google – actually work.

Your site doesn’t exist in a vacuum. Search results are volatile. Crawl budgets fluctuate. Algorithms shift. Competitors move. Even the weather can influence click-through rates.

Trying to isolate the impact of a single change in that chaos isn’t scientific. It’s theatre.

And it’s no wonder the instinct to mechanise SEO has taken hold. Google rolls out algorithm updates that cause mass volatility. Rankings swing. Visibility drops. Budgets come under scrutiny. It’s scary – and that fear creates a powerful market for tools, frameworks, and testing harnesses that promise to bring clarity and control.

Over the last few years, SEO split-testing platforms have risen in popularity by leaning into that fear. What if the change you shipped hurt performance? What if it wasted budget? What if you never find out?

That framing is seductive – but it’s also a trap.

Worse, most tests aren’t testing just one thing. You “add relatable images” to improve engagement, but in the process:

  • You slow down the page on mobile devices
  • You alter the position of various internal links in the initial viewport
  • You alter the structure of the page’s HTML, and the content hierarchy
  • You change the average colour of the pixels in the top 30% of the page
  • You add different images for different audiences, on different locale-specific versions of your pages

So what exactly did you test? What did Google see (in which locales)? What changed? What stayed the same? How did that change their perception of your relevance, value, utility?

You don’t know. You can’t know.

And when performance changes – up or down – you’re left guessing whether it was the thing you meant to test, or something else entirely.

That’s not measurability. That’s an illusion.

And it’s only getting worse.

As Google continues to evolve, it’s increasingly focused on understanding, not just matching. It’s trying to evaluate the inherent value of a page: how helpful, trustworthy, and useful it is. Its relevance. Its originality. Its educational merit.

None of that is cleanly testable.

You can’t A/B test “being genuinely helpful” or meaningfully isolate “editorial integrity” as a metric across 100 variant URLs – at least, not easily. You can build frameworks, run surveys, and establish real human feedback loops to evaluate that kind of quality, but it’s hard. It’s expensive. It’s slow. And it doesn’t scale neatly, nor does it fit the dashboards most teams are built around.

That’s part of why most organisations – especially those who’ve historically succeeded through scale, structure, and brute force – have never had to develop that kind of quality muscle. It’s unfamiliar. It’s messy. It’s harder to consider and wrangle than simpler, more mechanical measures.

So people try to run SEO tests. Because it feels like control. Because it’s familiar. But it’s the wrong game now.

You almost certainly don’t need more SEO tests. You almost certainly need better content. Better pages. Better experiences. Better intent alignment.

And you don’t get there with split tests.

You get there by shipping better things.

Meanwhile, obvious improvements are sitting there, waiting. Unshipped. Untested. Unloved.

Because everyone’s still trying to figure out whether the blue button got 0.6% more impressions than the green one.

It’s nonsense. And it’s killing your momentum.

Why incrementality doesn’t work in SEO

A/B testing, as it’s traditionally understood, doesn’t even work cleanly in SEO.

In paid channels, you test against users – different cohorts seeing different creatives, with clean measurement of the results. But SEO isn’t a user-facing test environment. You have a single search engine (Google, Bing, ChatGPT; choose your flavour), and it’s the only ‘user’ that matters in your test. And none of these systems behaves predictably. Their algorithms, crawl behaviour, and indexing logic are opaque and ever-changing.

So instead of testing user responses, you’re forced to test on pages. That means segmenting comparable page types – product listings, blog posts, etc. – and testing structural changes across those segments. But this creates huge noise. One page ranks well, another doesn’t, but you have no way to know how Google’s internal scoring, crawling, or understanding shifted. You can’t meaningfully derive any insight into what the ‘user’ experienced, perceived, or came to believe.
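To make that concrete, here’s a minimal sketch of what a page-based SEO split test usually boils down to: bucket comparable URLs, ship the change to one bucket, and compare traffic trends. Everything below – the URLs, the click counts, the bucket assignments – is a hypothetical simplification, not any particular platform’s methodology.

    # Hypothetical click totals per URL over comparable windows, before and after
    # the change shipped. Real platforms typically assign buckets by hashing or
    # stratified sampling; here the assignment is hard-coded for brevity.
    pages = [
        {"url": "/category/red-shoes",  "bucket": "test",    "clicks_before": 120, "clicks_after": 131},
        {"url": "/category/blue-shoes", "bucket": "control", "clicks_before": 95,  "clicks_after": 92},
        {"url": "/category/boots",      "bucket": "test",    "clicks_before": 210, "clicks_after": 228},
        {"url": "/category/sandals",    "bucket": "control", "clicks_before": 80,  "clicks_after": 79},
        # ...hundreds more comparable pages
    ]

    def pct_change(group: str) -> float:
        """Aggregate percentage change in clicks for one bucket of pages."""
        before = sum(p["clicks_before"] for p in pages if p["bucket"] == group)
        after = sum(p["clicks_after"] for p in pages if p["bucket"] == group)
        return (after - before) / before * 100

    # "Lift" = how much better the test bucket trended than the control bucket.
    lift = pct_change("test") - pct_change("control")
    print(f"Estimated lift: {lift:+.1f} percentage points")  # ~ +11.1 pp on this toy data

Notice what the model never sees: what Google crawled, rendered, or understood. It only compares traffic trends between two buckets of pages and attributes the gap to the change.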

That’s why most SEO A/B testing isn’t remotely scientific. It’s just a best-effort simulation, riddled with assumptions and susceptible to confounding variables. Even the cleanest tests can only hint at causality – and only in narrowly defined environments.

Incrementality testing works brilliantly in paid media. You change a variable, control the spend, and measure the outcome. Clear in, clear out.
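For contrast, here’s the arithmetic behind a textbook paid-media incrementality read – a geo-holdout with made-up numbers, just to show how clean the inputs and outputs can be:

    # Hypothetical geo-holdout: campaign live in the test region, dark in the holdout.
    test_conversions = 1_840       # conversions in the exposed region
    holdout_conversions = 1_600    # conversions in a comparable unexposed region
    spend = 25_000.0               # campaign cost in the test region
    revenue_per_conversion = 120.0

    incremental = test_conversions - holdout_conversions
    lift = incremental / holdout_conversions                    # 15% lift
    iroas = incremental * revenue_per_conversion / spend        # incremental ROAS ~ 1.15

    print(f"Lift: {lift:.1%}, incremental ROAS: {iroas:.2f}")

One variable changed, one counterfactual held, one number out the other end.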

But in SEO, that model breaks. Here’s why:

1. SEO is interconnected, not isolated

Touch one part of the system and the rest moves. Update a template, and you affect crawl logic, layout, internal links, rendering time, and perceived relevance.

You’re not testing a change. You’re disturbing an ecosystem.

Take a simple headline tweak. Maybe it affects perceived relevance and CTR. But maybe it also reorders keywords on the page, shifts term frequency, or alters how Google understands your content.

Now, imagine you do that across a set of 200 category pages, and traffic goes up. Was it the wording? Or the new layout? Or the improved internal link prominence? You can’t know. You’re only seeing the soup after the ingredients have been blended and cooked.

2. There are no true control groups

Everything in SEO is interdependent. A “control group” of pages can’t be shielded from algorithmic shifts, site-wide changes, or competitive volatility. Google doesn’t respect your test boundaries.

You might split-test changes across 100 product pages and leave another 100 unchanged. But if a Google core update rolls out halfway through your test, or a competitor launches new content, or your site’s crawl budget is reassigned, the playing field tilts. User behaviour can skew results, too – if one page in your test group receives higher engagement, it might rise in rankings and indirectly influence how related pages are perceived. And if searcher intent shifts due to seasonal changes or emerging trends, the makeup of search results will shift with it, in ways your test boundaries can’t contain.

Your “control” group isn’t stable. It’s just less affected – maybe.

3. The test takes too long, and the world changes while you wait

You need weeks or months for significance. In that time, Google rolls out updates, competitors iterate, or the site changes elsewhere. The result is no longer meaningful.
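To put rough numbers on “weeks or months”: a back-of-the-envelope power calculation (standard two-proportion approximation; the baseline CTR, target lift, and traffic figures are all assumptions) shows how much data a CTR-style test needs before a small lift is even detectable.

    from statistics import NormalDist

    def impressions_needed(baseline_ctr: float, relative_lift: float,
                           alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate impressions per bucket to detect a relative CTR lift
        (two-proportion z-test, normal approximation)."""
        p1 = baseline_ctr
        p2 = baseline_ctr * (1 + relative_lift)
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_power = NormalDist().inv_cdf(power)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return int((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

    # Assumed figures: 2% baseline CTR, hoping to detect a 5% relative lift,
    # with roughly 10,000 impressions per day landing in each bucket.
    n = impressions_needed(0.02, 0.05)
    print(f"{n:,} impressions per bucket = about {n / 10_000:.0f} days")  # ~315,000, ~32 days

On those generous assumptions, that’s roughly a month of clean, undisturbed data per bucket – and a month is a long time in search.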

A test that started in Q1 may yield data in Q2. But now the seasonality is different, the algorithm has shifted, and your team has shipped unrelated changes that also affect performance. Maybe a competitor shipped a product or ran a sale.

Whatever result you see, it’s no longer answering the question you asked.

4. You can’t observe most of what matters

The most important effects in SEO happen invisibly – crawl prioritisation, canonical resolution, index state, and semantic understanding. You can’t test what you can’t measure.

Did your test change how your entities were interpreted in Google’s NLP pipeline? How would you know?

There’s no dashboard for that. You’re trying to understand a black box through a fogged-up window.

5. Testing often misleads more than it informs

A test concludes. Something changed. But was it your intervention? Or a side effect? Or something external? The illusion of certainty is more dangerous than ambiguity.

Take a hypothetical test on schema markup. You implement the relevant code on a set of product detail pages (PDPs). Traffic lifts 3%. Great! But in parallel:

  • You added 2% to the overall document weight.
  • Google rolled out new Rich Results eligibility rules.
  • A competitor lost visibility on a subset of pages due to a botched site migration.
  • The overall size of Wikipedia’s website shrank by 1%, but the average length of an article increased by 3.8 words. Oh, and they changed the HTML of their footer.
  • It was unseasonably sunny.

What caused the lift? You don’t know. But the test says “success” – and that’s enough to mislead decision-makers into prioritising rollouts that may do nothing in future iterations.

6. Most testing is a proxy for fear

Let’s be honest: a lot of testing isn’t about learning – it’s about deferring responsibility. It’s about having a robust story for upward reporting. About ensuring that, if results go south, there’s a paper trail that says you were being cautious and considered. It’s not about discovery – it’s about defensibility.

In that context, testing becomes theatre. A shield. A way to look responsible without actually moving forward.

And it’s corrosive. Because it shifts the culture from one of ownership to one of avoidance. From action to hesitation.

If you’re only allowed to ship something once a test proves it’s safe, and you only test things that feel risk-free, you’re no longer optimising. You’re stagnating.

And worse, you’re probably testing things that don’t even matter, just to justify the process.

If your team needs a test to prove that improving something broken won’t backfire, the issue isn’t uncertainty – it’s fear.

The buy-in trap

A question I hear a lot is: “What if I need demonstrable, testable results to get buy-in for the untestable stuff?” It’s a fair concern – and one that reveals a hidden cultural trap.

When testable wins become the gatekeepers for every investment, the essential but untestable aspects of SEO (like quality, trust, editorial integrity) end up relegated to second-class status. They’re concessions that have to be justified, negotiated, and smuggled through the organisation.

This creates a toxic loop:

  • Quality improvements aren’t seen as baseline, non-negotiable investments – they’re optional extras that compete for limited time and attention.
  • Teams spend more time lobbying, negotiating, and burning social capital for permission than actually doing the right thing.
  • Developers and creators get demotivated, knowing their work requires political finesse and goodwill rather than just good judgment.
  • Stakeholders stay stuck in risk-averse mindsets, demanding ever more proof before committing, which slows progress and rewards incremental, low-risk wins over foundational change.

The real problem? Treating quality as a concession rather than a core principle.

The fix isn’t to keep chasing testable wins to earn the right to work on quality. That only perpetuates the cycle.

Instead, leadership and teams need to shift the mindset:

  • Make quality, trust, and editorial standards strategic pillars that everyone owns.
  • Stop privileging only what’s measurable, and embrace qualitative decision-making alongside quantitative.
  • Recognise that some things can’t be tested but are obviously the right thing to do.
  • Empower teams to act decisively on quality improvements as a default, not an afterthought.

This cultural shift frees teams to focus on real progress rather than political games. It builds momentum and trust. It creates space for quality to become a non-negotiable foundation, which ultimately makes it easier to prove value across the board.

Because when quality is the baseline, you don’t have to fight for it. You just get on with making things better.

Culture, not capability

Part of the issue is that testing lends itself to the mechanical. You can measure impressions. You can test click-through rates. You can change a meta title and maybe see a clean lift.

But the things that matter more – clarity, credibility, helpfulness, trustworthiness – resist that kind of measurement. You can’t A/B test whether users believe you. You can’t split-test authority. At least, not easily.

So we over-invest in the testable and under-invest in the meaningful.

Because frankly, investing in ‘quality’ is scary. It’s ephemeral. It’s hard to define, and hard to measure. It doesn’t map neatly to a team or a KPI. It’s not that it’s unimportant – it’s just that it’s rarely prioritised. It sits somewhere between editorial, product, engineering, UX, and SEO – and yet belongs to no one.

So it falls through the cracks. Not because people don’t care, but because no one’s incentivised to catch it. And without ownership, it’s deprioritised. Not urgent. Not accountable.

No one gets fired for not investing in quality.

It’s not that things like trustworthiness or editorial integrity can’t be measured – but they’re harder. They require real human feedback, slower feedback loops, and more nuanced assessment frameworks. You can build those systems. But they’re costlier, less convenient, and don’t fit neatly into the A/B dashboards most teams are built around.

So we default to what’s easy, not what’s important.

We tweak the things we can measure, even when they’re marginal, instead of improving the things we can’t – even when they’re fundamental.

The result? A surface-level optimisation culture that neglects what drives long-term success.

Most organisations don’t default to testing because it’s effective. They do it because it’s safe.

Or more precisely, because it’s defensible.

If a test shows no impact, that’s fine. You were being cautious. If a test fails, that’s fine. You learned something. If you ship something without testing, and it goes wrong? That’s a career-limiting move.

So teams run tests. Not because they don’t know what to do, but because they’re not allowed to do it without cover.

The real blockers aren’t technical – they’re cultural:

  • A leadership culture that prizes risk-aversion over results.
  • Incentives that reward defensibility over decisiveness.
  • A lack of trust in SEO as a strategic driver, not just a reporting layer.

In that environment, testing becomes a security blanket.

You don’t test to validate your expertise – you test because nobody will sign off without a graph.

But if every improvement needs a test, and every test needs sign-off, and every sign-off needs consensus, you don’t have a strategy. You have inertia. That’s not caution. That’s a bottleneck.

But what about prioritisation?

Of course, resources are finite. That’s why testing can seem appealing – it offers a way to “prove” that an investment is worth it before spending the effort.

But in practice, that often backfires.

If something is so uncertain or marginal that it needs a multi-week SEO test to justify its existence… maybe it shouldn’t be a priority at all.

And if it’s a clear best practice – improving speed, crawlability, structure, or clarity – then you don’t need a test. You need to ship it.

Testing doesn’t validate good work. It delays it.

So what should you do instead? Use a more honest, practical decision model.

Here’s how to decide:

1. If the change is foundational and clearly aligned with best practice – things like improving site speed, fixing broken navigation, clarifying headings, or making pages more crawlable: → Just ship it. You already know it’s the right thing to do. Don’t waste time testing the obvious.

2. If the change is speculative, complex, or genuinely uncertain – like rolling out AI-generated content, removing large content sections, or redesigning core templates: → Test it, or pilot it. There’s legitimate risk and learning value. Controlled experimentation makes sense here.

3. If the change is minor, marginal, or only matters if it performs demonstrably better – like small content tweaks, cosmetic design changes, or headline experiments: → Deprioritise it. If it only matters under test conditions, it probably doesn’t matter enough to invest in at all.

This isn’t just about prioritising effort. It’s about prioritising momentum. And it’s worth noting that other parts of marketing, like brand or TV, have long operated with only partial measurability. These disciplines haven’t been rendered ineffective by the absence of perfect data. They’ve adapted by anchoring in strategy, principles, and conviction. SEO should be no different. 

Yes, sometimes even best-practice changes surprise us. But that’s not a reason to freeze. It’s a reason to improve your culture, your QA, and your confidence in making good decisions. Testing shouldn’t be your first defence – good fundamentals should.

If you’re spending more time building test harnesses than fixing obvious problems, you’re not optimising your roadmap – you’re defending it from progress.

If your organisation can’t ship obvious improvements because it’s addicted to permission structures and dashboards, testing isn’t your salvation. It’s your symptom.

And no amount of incrementality modelling will fix that.

The alternative

This isn’t just idealism – it’s a strategic necessity. In a world where other channels are becoming more expensive, more competitive, and less efficient, the brands that succeed will be the ones who stop dithering and start iterating. Bravery isn’t a rebellion against data – it’s a recognition that over-optimising for certainty can paralyse progress.

What’s the alternative?

Bravery.

Not recklessness. Not guesswork. But conviction – the confidence to act without demanding proof for every obvious improvement.

You don’t need another test. You need someone senior enough, trusted enough, and brave enough to say:

“We’re going to fix this because it’s clearly broken.”

That’s it. That’s the strategy.

A fast site is better than a slow one. A crawlable site is better than an impenetrable one. Clean structure beats chaos. Good content beats thin content. These aren’t radical bets. They’re fundamentals.

You don’t need to test whether good hygiene is worth doing. You need to do it consistently and at scale.

And the only thing standing between you and that outcome isn’t a lack of data. It’s a lack of permission.

Bravery creates permission. Bravery cuts through bureaucracy. Bravery aligns teams and unlocks velocity.

You don’t scale SEO by proving every meta tag and message. You scale by improving everything that needs to be improved, without apology.

The best brands of tomorrow won’t be the most optimised for certainty. They’ll be the ones who shipped. The ones who trusted their people. The ones who moved.

The brave ones.

The strategic fork

Many of the large brands that over-rely on testing do so because they’ve never had to be good at SEO. They’ve never needed to build genuinely useful content. Never had to care about page speed, accessibility, or clarity. They’ve succeeded through scale, spend, or brand equity.

But the landscape is changing. Google is changing. Users are changing.

And if those brands don’t adapt – if they keep waiting for tests to tell them how to be better – they’ll be left with one option: spend.

More money on ads. More dependency on paid visibility. More fragility in the face of competition.

And yes, that route is testable. It’s measurable. It’s incremental.

But it’s also a treadmill – one that gets faster, more expensive, and less effective over time.

Because if you don’t build your organic capability now, you won’t have one when you need it.

And you will need it.

Because the answer isn’t to build some omniscient framework to measure and score every nuance of quality. Sure, you could try – but doing so would be so complex, expensive, and burdensome that you’d spend 10x more time and resources managing the framework than actually fixing the issues it measures. You can’t checklist your way to trust. You can’t spreadsheet your way to impact. There is no 10,000-point rubric that captures what it means to be genuinely helpful, fast, clear, or useful – and even if there were, trying to implement it would be its own kind of failure.

At some point, you have to act. Not because a graph told you to. But because you believe in making things better.

That’s not guesswork. That’s faith. Faith in your team, your users, and your principles.

What happens next

You don’t need more data. You don’t need to test for certainty. You need conviction.

The problems are obvious and many. The opportunities are clear. The question isn’t what to do next – it’s whether you’ve built the confidence to do it without waiting for permission.

If you’re in a position to lead, lead. Say: “We’re going to fix this because it’s clearly broken.”

If you’re in a position to act, act. Don’t wait for a dashboard, or a test, or the illusion of certainty.

Because the brands that win won’t be the ones who proved every improvement was safe.
They’ll be the ones who made them anyway.

Just ship it. Be brave.
