LLMs aren’t playing by Google’s rules

For decades, search engines like Google and Bing have been refining how they crawl and index the web. Their crawlers render full page layouts, execute JavaScript, cache intelligently, and handle errors gracefully, all to extract and understand as much information as possible.

This sophistication has, in some circles, bred complacency around technical SEO. If Google can work around your slow scripts or broken HTML, why fix them? (Though I’d argue technical SEO is still far more important than most realise.)

However, as LLM-powered systems such as ChatGPT and Claude begin to play a bigger role in how content is discovered and recommended, we find ourselves at a turning point.

These systems do not crawl the web the way Google does. They do not render your pages or execute your scripts. In most cases, they simply fetch the raw HTML from your server and move on.

That has profound implications.

If an AI agent finds only an empty page, a URL returning the wrong HTTP status, or a tangled mess of markup, it will not see or understand your content. It may misinterpret you, ignore you, or recommend a competitor instead. If your site relies heavily on client-side rendering, the agent may find nothing at all.
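To make that concrete, here is a minimal sketch of the kind of fetch many of these agents perform: a single HTTP request, no JavaScript execution, and only whatever text already exists in the raw HTML response. The user agent string and parsing details are illustrative assumptions, not a description of any specific crawler.

```python
# Minimal sketch of an LLM-style fetcher: one GET request, no JavaScript.
# Anything not present in the raw HTML response is simply never seen.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def fetch_visible_text(url: str) -> str:
    # One request, a short timeout, and no rendering step at all.
    req = Request(url, headers={"User-Agent": "example-llm-agent/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, "replace")
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


if __name__ == "__main__":
    # A client-side rendered page often returns little more than an empty
    # <div id="root"></div> here, so the agent extracts almost nothing.
    print(fetch_visible_text("https://example.com/")[:500])
```

Run against a server-rendered page, this returns the article text; run against a typical single-page-app shell, it returns little more than a cookie banner, if that.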

It is not just about visibility either. Poor caching and inefficient delivery can cause heavy agent traffic to overload or slow down your servers. Sending huge HTML documents, multi-megabyte images, or bloated JavaScript bundles makes your site expensive, brittle, and a nightmare for lightweight systems to crawl.

And most LLM-connected systems are not close to solving these problems. They will not render JavaScript. They will not download, parse and execute the 300 individual files that your bloated framework needs in order to hydrate the page content. They will not fight their way past your megabytes of tracking scripts, just to find a paragraph of useful information.

If they cannot extract what they need quickly and cleanly, they will simply move on.

So it’s time to go back to basics.

Semantic, valid HTML. Server-side rendering. Correct HTTP headers. Efficient caching. Security best practices. These are no longer just nice-to-haves for SEO. They are critical foundations for your discoverability, your reputation, and your future.
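As a rough self-check, the sketch below requests a page the way a simple agent might and reports the signals just listed: HTTP status, content type, caching headers, and the size of the raw HTML. The user agent string and the ~1 MB size warning are arbitrary illustrations, not thresholds any particular system is known to use.

```python
# Rough diagnostic: does this URL serve the basics a lightweight agent relies on?
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def check_basics(url: str) -> None:
    req = Request(url, headers={"User-Agent": "example-llm-agent/0.1"})
    try:
        resp = urlopen(req, timeout=10)
    except HTTPError as err:
        # Wrong or missing pages surface here; an agent will usually just give up.
        print(f"Status: {err.code} ({err.reason})")
        return
    with resp:
        body = resp.read()
    print(f"Status:        {resp.status}")                       # expect 200 for real content
    print(f"Content-Type:  {resp.headers.get('Content-Type')}")  # expect text/html with a charset
    print(f"Cache-Control: {resp.headers.get('Cache-Control')}") # expect explicit caching hints
    print(f"Raw HTML size: {len(body) / 1024:.0f} KB")
    if len(body) > 1_000_000:  # ~1 MB: arbitrary illustrative threshold
        print("Warning: very large HTML payload; expensive for lightweight fetchers.")


if __name__ == "__main__":
    check_basics("https://example.com/")
```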

The web is no longer just a place for people to browse and explore. Increasingly, it is a feedstock for AI systems that summarise, recommend, and decide what information gets seen. The future of discoverability is not just about being indexed. It is about being understood, accurately and efficiently, by non-human agents that do not have the patience to wrestle with bad code or bloated sites.

It’s time to stop building websites that are merely ‘good enough’ for Google, and start building ones that are good enough for the next generation of machines.

4 Comments

aska

When Google visits my page, I know (or at least knew) to expect real human visits and ad clicks. When an LLM crawler visits my page, it steals my content, puts load on my server (at my expense), and brings zero revenue. I see no use in optimizing my website for LLM crawlers.

SEObot

It depends on what you are offering your audience. If it is only about ad revenue, then yes, it is probably not worth it. But if you have a product or service to offer, people coming from LLMs can convert at better rates, because the intent is already very high: the user is looking for a specific thing, and if your product or service helps them achieve it, then yes, you have your chances.

Stéphane

Maybe it’s a silly question, but I guess I don’t fully understand what you mean here:
> They will not download, parse and execute the 300 individual files that your bloated framework needs in order to hydrate the page content. They will not fight their way past your megabytes of tracking scripts, just to find a paragraph of useful information.

If they won’t download and parse all the JS etc., why would the tracking scripts (and others) cause problems?