A page is more than just a container for words

The latest SEO fad is the idea that websites need a machine-only version. Strip out the layout, remove the “noise”, and hand LLMs a simplified view of your content.

The pitch is always framed as pragmatic. Modern websites are bloated. LLMs don’t need the design. They just want the content. So let’s strip things back, give machines what they want, and meet them where they are.

I understand the instinct. Anyone who’s spent time looking at real websites in production knows how much accidental complexity we’ve normalised. When it takes megabytes of JavaScript and a small prayer to render a paragraph of text, it’s tempting to conclude that the page itself is the problem.

But that conclusion is backwards.

What’s being optimised for here isn’t understanding, it’s extraction. And those are not the same thing.

A page is not just a container for words. It’s an editorial artefact. It has hierarchy, emphasis, framing and intent baked into it. What comes first matters. What’s prominent matters. What’s tucked away in a sidebar or a footnote matters. These things are not decorative flourishes for humans. They are signals about meaning.

When you flatten a page into markdown, you don’t just remove clutter. You remove judgment, and you remove context.

The other problem, which people tend to skate past, is trust. The moment you publish a machine-only representation of a page, you’ve created a second candidate version of reality. It doesn’t matter if you promise it’s generated from the same source or swear that it’s “the same content”. From the outside, a system now sees two representations and has to decide which one actually reflects the page.

And once that choice exists, ‘optimisation’ inevitably follows. Not necessarily in the cartoonish sense of outright deception, but in the far more common sense of tidying, polishing and shaping. Awkward caveats get softened. Commercial messages get clearer. The version intended for machines becomes a little cleaner, a little more persuasive, a little more flattering than the one a human actually sees.

So the consuming system has a problem. It can trust the machine-facing version and accept that it will be gamed. It can verify it against the human-facing version, which means extra fetches, extra parsing and extra logic to reconcile differences. Or it can ignore it and just parse the page itself.

At any kind of scale, that third option wins. Individual systems may lean on a cheaper shortcut for convenience in the short term, but fetching one artefact and extracting what's needed from it is dramatically cheaper than arbitrating between multiple representations forever.

The moment that shortcut becomes something a system has to trust, validate, or keep in sync, it stops being cheap. This is why alternate realities on the web have such a poor track record. Not because they’re philosophically offensive, but because they’re economically unsustainable.

There's also a deeper misunderstanding running through all of this. People point out, correctly, that many LLM systems today don't render pages in anything like the way a browser does. They fetch the document, but they don't load all the CSS and they don't execute much JavaScript. From that, it's tempting to conclude that layout and presentation don't matter.

But that’s really a statement about today’s limitations, not about long-term relevance.

Google didn’t always render pages either. It learned to, because if you want to model relevance the way humans experience it, you can’t treat all text as equal just because it exists in the markup. Placement, prominence and context change how information is interpreted. They always have.

If these systems are trying to approximate human understanding, then rendering isn’t optional. It’s inevitable. And when that happens, all the signals you stripped away in the name of “LLM friendliness” suddenly matter again.

Which leads to the most uncomfortable part of this whole discussion.

A lot of the enthusiasm for markdown mirrors and machine-only feeds is really a proxy argument about bad websites. Sites where the DOM is incoherent. Sites where content only exists after several client-side miracles. Sites where design systems actively obscure hierarchy instead of clarifying it.

In those cases, stripping everything back does make the content easier to deal with. But that doesn’t mean the right solution is to maintain a shadow version of reality for machines. It means the page itself is failing at its job.

The boring fix still works. Semantic HTML. Clear structure. Sensible hierarchy. Progressive enhancement. Content that exists when the page loads. Layout that reflects editorial intent instead of fighting it.
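To make that concrete, here's a minimal sketch (the content is purely illustrative) of markup whose structure carries the editorial hierarchy described above:

```html
<!-- Illustrative only: the markup itself encodes what is primary,
     what is supporting, and what is deliberately secondary. -->
<article>
  <header>
    <h1>The main claim of the page</h1>
    <p>The framing a reader sees first.</p>
  </header>
  <section>
    <h2>Supporting argument</h2>
    <p>Content that exists in the HTML when the page loads,
       not after several client-side miracles.</p>
  </section>
  <aside>
    <p>A caveat that is secondary, and marked as such.</p>
  </aside>
  <footer>
    <p>A footnote: still present, still clearly tucked away.</p>
  </footer>
</article>
```

Nothing here needs rendering to recover the hierarchy: the same elements that shape the visual layout tell any non-rendering consumer what comes first, what is prominent, and what is an aside.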

Do that, and you don’t need a second version of your site. You get one version that works for users, for search engines, and for any system that wants to understand what the page is actually saying.

Ideas like llms.txt sit in the same category. Well-intentioned, occasionally useful as hints, but fundamentally unable to escape the same gravity once they’re relied upon. Separate surface area, separate incentives, same trust and verification problems. Treat them as convenience, not as a strategy.

There isn’t going to be one version of the web for humans and another for machines. The systems that matter will always converge on the human-facing reality, because it’s the cheapest thing to fetch, the easiest thing to validate, and the richest source of meaning.

If you want to be understood by whatever comes next, don’t build shadows and side channels.

A page is not just a container for words. Make your site the best version of your site for whoever, or whatever, is accessing it.
