Google have to improve their structured data reporting

I’ve talked previously about how the adoption of structured data – specifically schema.org markup – is strategically important to Google. That’s why they’re investing so much in supporting new formats, and why they’re continuing to add new reports to their Google Search Console product.

But the way in which those reports summarise and visualise the use of structured data on your website is problematic. Their reports are increasingly confusing and misleading, and they’re going to become worse over time as Google extends them to cover new features.

Their reports are confusing

Many webmasters will have issues like the one outlined in the image below. Here, the “Sitelinks searchbox” report in Google Search Console (which identifies the presence of structured data describing an internal site search feature) says that the yoast.com website features over two thousand internal search elements.

Yoast.com doesn’t have ~2k sitelinks search boxes, but it does reference the search box on every page (as part of the WebSite object)

Yoast.com obviously doesn’t have thousands of internal search tools, but we do describe the internal search feature of our site in the structured data of every page on our site. That’s because each WebPage element we’re describing on each page is part of a WebSite, and that WebSite has a SearchAction property. We’re not saying “This page has an internal search feature”, we’re saying “This WebPage is part of a WebSite, which has a search feature”.

That looks something like the image below, which is taken from the source code of one of our articles about meta descriptions.

An example of our SearchAction markup, attached to the yoast.com WebSite entity, as part of a connected graph describing an Article.
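For readers who can’t see the screenshot, the shape of that markup looks roughly like the simplified sketch below. This is illustrative, not Yoast SEO’s exact output – the URLs and `@id` values are placeholders – but it shows the key point: the SearchAction hangs off the WebSite node, and the WebPage merely references that WebSite via isPartOf.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebSite",
      "@id": "https://yoast.com/#website",
      "url": "https://yoast.com/",
      "potentialAction": {
        "@type": "SearchAction",
        "target": "https://yoast.com/?s={search_term_string}",
        "query-input": "required name=search_term_string"
      }
    },
    {
      "@type": "WebPage",
      "@id": "https://yoast.com/example-article/#webpage",
      "isPartOf": { "@id": "https://yoast.com/#website" }
    }
  ]
}
```

Nothing here claims that the page itself has a search box; the search capability belongs to the WebSite node, which every page references.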

Despite looking like a ‘good’ report, these green bar charts are completely meaningless. In fact, this is just one example of many where the reports simply don’t make sense. The SearchAction markup is a property of the WebSite, not the WebPage, so reporting on it based on the number of pages it’s found on is misleading. Having more pages with this markup is neither ‘good’ nor ‘bad’, and arguably isn’t relevant or useful information whatsoever.

This confusion in reporting between “things on this page” and “things we found references to on this page” is consistent throughout Google’s reporting, and renders most of those reports largely useless. At best, they’re simply a count of pages on a website. At worst, they’re confusing to users who don’t understand what they’re looking at, and the data doesn’t make sense.

As Google adds support for more types of structured data, the problem is going to get worse. But addressing this isn’t as simple as just improving the charts.

There’s a deeper problem

The poor reporting formats and visualisation are the tip of the iceberg of a bigger problem. Fixing the Sitelinks searchbox graph would be a trivial issue, but it’s unlikely to happen due to three deeper challenges with Google’s internal operations and communications. Specifically:

  1. Google’s ‘mental model’ for structured data is entirely page-centric*. That means that their reporting is also page-centric; their reports are all based on visualising the structured data discovered on a specific URL. For properties which don’t belong to the WebPage which a URL represents, page-centric reporting won’t make sense.
  2. Their reports are element-centric. They’re based on visualising the (numbers of) URLs which feature a specific type of structured data within a site. But rich descriptions of pages are about the connections between elements (e.g., the Author of an Article on a WebPage). Those connections are more useful to understand than, say, how many Article elements exist across a site.
  3. Their teams are feature-centric. Different groups of people within Google work on different areas of structured data, directly related to the features they want to develop/expose in their search results. This leads to conflicting opinions, implementations, and fragmentation.

*Specifically, they have no way to connect or understand the relationships between different entities (represented in structured data) when they’re spread across different pages. They rely on the rest of Google’s systems to determine relationships between pages/things. This aligns with their narrative around the deprecation of rel next/prev tags, the insistence that there’s no such thing as Domain Authority, and similar.

The problem with a page-centric model

Because Google won’t “join the dots” between entities spread across multiple pages, the only way in which we can describe such relationships (e.g., describing the properties of the WebSite within which a given WebPage exists, or the properties of the Organization who published that page) is to repeat all of those properties, in context, on every page on the site.

To describe an Article, for example, you must also describe the WebPage that it’s on, the Organization which published it, and the Author who wrote it. These entities might naturally live on different URLs, and have deep relationships and contexts of their own. Critically, describing the properties of those entities in situ is not the same as declaring that they’re ‘on’ the page upon which they’re described.

That’s why, in Yoast SEO, we construct a ‘graph’ of all the objects on (or referenced on) a given page and describe all of their properties. You can see a great example of what that looks like here (from this article). You can read more about the ‘graph approach’ which we used here.
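As a rough illustration of that graph approach (again, a hypothetical sketch rather than Yoast SEO’s exact output – the example.com URLs, names, and `@id` values are all invented), each node declares its own properties once, and every relationship is expressed as an `@id` reference to another node:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "https://example.com/post/#article",
      "isPartOf": { "@id": "https://example.com/post/#webpage" },
      "author": { "@id": "https://example.com/#/schema/person/author" },
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/post/#webpage",
      "isPartOf": { "@id": "https://example.com/#website" }
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Org"
    },
    {
      "@type": "Person",
      "@id": "https://example.com/#/schema/person/author",
      "name": "Example Author"
    }
  ]
}
```

A consumer walking this graph can answer questions like “who published the site this article lives on?” without any inference – the edges are explicit.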

This isn’t a problem of the graph approach defining things which “aren’t on the page”; it’s a problem of Google providing the wrong tools for understanding the presence and role of structured data on a site. They’re trying to visualise a graph in page-centric, feature-centric reporting. Their internal structures, teams and politics are forcing a square peg into a round hole.

Remember, we have to describe all of the properties of each related element, because Google won’t build those associations across pages/URLs.

…And their advice is consistently bad

To make things worse, Google’s own documentation provides bad and conflicting advice on the topic.

This page, for example, says that you should only add internal search markup on the site’s homepage. But the internal search markup is a property of the WebSite, not of the homepage’s WebPage (never mind that there’s no schema.org definition of a ‘homepage’). And if we have to describe that WebSite on every page – because it’s the natural parent of each WebPage, and has other important properties (like describing the Organization which publishes the WebSite, and therefore the WebPage) – then it doesn’t make sense to alter the properties of the WebSite entity based on which WebPage is being described. That’s a terrible approach.

Similarly, advice from Google’s John Mu suggests that Organization markup should only be placed on the homepage (or perhaps, depending on the site, the contact or about page), and “not on every page”. Without a graph-based approach, that advice would make sense – every page shouldn’t represent/contain the Organization’s markup. But in a connected graph, we need to understand that, for example, the Author of an Article on a WebPage works for the Organization which publishes the WebSite which the Article is on. Excluding the Organization from the graph makes the relationships between those entities fall apart.
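To make that concrete with a hedged, hypothetical fragment (invented example.com `@id` values, not anyone’s real markup): an article page might describe its author like this, with a `worksFor` edge pointing at the Organization node.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://example.com/#/schema/person/author",
      "name": "Example Author",
      "worksFor": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Org"
    }
  ]
}
```

If, per the homepage-only advice, the Organization node is omitted from every article page, that `worksFor` reference points at nothing a page-centric parser can resolve – the edge dangles, and the relationship is lost.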

They’re internally inconsistent, too

On top of this, their approaches are inconsistent between features. The way in which Logo markup (usually used in the context of describing an Organization) is reported on is completely different to the Sitelinks SearchBox scenario.

Specifically, they’ll only report on Logo markup if it’s on a ‘leading edge’ in the graph markup (i.e., if it’s attached to a ‘top-level’ node). This is never the case in our approach, as the Organization is always at least a few ‘layers’ deep when we’re primarily describing a WebPage, Article, Product or similar. That means that despite our Organization being referenced on every page (and having a Logo), our Logo reporting in Google Search Console is empty.

Google Search Console doesn’t detect or report on the Logo properties attached to our Organization markup.
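A simplified, hypothetical sketch of the shape of the problem (illustrative example.com URLs and `@id` values, not Yoast’s exact output): the logo below is perfectly valid markup, but it sits two hops away from the page’s primary entity, so a report which only inspects top-level nodes never sees it.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "WebPage",
      "@id": "https://example.com/post/#webpage",
      "isPartOf": { "@id": "https://example.com/#website" }
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Org",
      "logo": {
        "@type": "ImageObject",
        "url": "https://example.com/logo.png"
      }
    }
  ]
}
```

The logo is reachable by following WebPage → WebSite → Organization → logo, but only if the consumer actually walks the graph.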

This difference in approach is a deliberate, and seemingly arbitrary, design decision by the team responsible for this report. In fact, each feature that Google supports is detected, extracted, and reported on in different ways, according to the whims of the teams involved. Their problems with silos and internal inconsistency extend well beyond their reporting, into the heart of how they extract and process different types of schema.org entities – but that’s a whole other topic.

There’s consistency in their bad advice, at least. It all assumes that they’re talking to users who’re simply copying/pasting bits of isolated structured data into their webpages. It caters to users who don’t have connected graphs, and assumes that they’re just trying to add snippets of code to specific pages (to chase specific rewards from Google’s structured data features).

It also assumes that it’s fine to rely on Google to extract all of those isolated ‘bits’ of markup, and for them to construct their own context about how they’re related.

Their approach is bad for the open web

I’m deeply uncomfortable with both of those assumptions, because I think they could be harmful to the open web. Google isn’t the only consumer who theoretically stands to benefit from richer, deeper structured data on the web, if it’s implemented properly. With their current feature-centric approach, they’re only encouraging and rewarding ‘snippets’ of structured markup, which critically don’t describe the relationships between things. That worries me because:

  1. In graph data, it’s the relationships between things which matter. The value of structured data is in describing those relationships (the “linked” bit of JSON-LD is the whole point). If users are just copying/pasting fragmented bits of markup onto their pages in order to chase rewards from Google, they do so at the expense of the wider web’s future, in a way which Google is endorsing/rewarding. Even if Google (with all of their processing and understanding of the web) can reliably infer the relationships between things on pages across a site, other consumers may not be able to.
  2. It’s close to impossible to construct a nuanced graph by copying and pasting code. To produce (and maintain) markup like the example on this page, each node needs to know about the properties of each other node, and make complex decisions about how they’re related. I’ve described many of the challenges in achieving that here. Catering for copy/paste approaches, both in their guidelines and their reporting, limits the adoption of structured data to just the few bits-and-pieces which Google wants.
  3. Google’s preferences frequently diverge from schema.org’s definitions, in ways which are designed to simplify and enable this ‘copy/paste’ approach. That conflicting information causes confusion. Forums, blogs, and even reputable sites are full of conflicting, and often incorrect, advice. Google’s approach to structured data documentation and reporting is mystifying and confusing the topic, rather than clarifying it.

All of this is bad for the open web. Tying our structured data implementations specifically to whichever features Google is currently rewarding (never mind that they frequently deprecate them), rather than richly describing our pages and our content (in a way which Google can still consume!), stifles innovation, and relegates schema.org to little more than a reference sheet for copy-and-paste Google hacks.

In an ideal world, every webpage would fully describe itself and all of its relationships, and other platforms/tools/companies would be able to consume and utilise that information. In today’s world, none of that is possible, because all of Google’s tools and reporting force webmasters to look ‘inwards’, not outwards.

What’s the answer?

The structured data reports in Google Search Console urgently need to be overhauled. They need to do a better job of visualising the relationships between entities, and the distinctions between properties on the page and properties of the page. That’s a complex challenge, but one which they should rise to if they want to encourage broader and deeper adoption.

To do that, they’ll need to reassess their internal team structures and operating processes. They’ll need consistent internal positioning on how their processes for data extraction, evaluation and reporting work across different schema ‘features’.

They’ll also need to change the narrative in their support documentation, to shift the focus away from copyable snippets, and towards constructing graphs of connected data.

In many cases, that’ll mean shifting the processes away from facilitating blunt copying-and-pasting by individual website managers, to making stronger demands of technology vendors and underlying platforms to include rich structured data in their outputs. Individual webmasters and content editors shouldn’t – can’t reasonably – be responsible for maintaining complex structured data about their pages (never mind the markup for all of the other pieces on their website). And the issues with Google’s reporting suite only serve to widen the gap in webmasters’ understanding of what ‘good’ looks like.

Whilst Yoast has taken some great strides in standardising an approach for this kind of connected data, Google should be doing more to solve the problems of adoption and utilisation further upstream. We shouldn’t be leading this charge; Google should be. But at the moment, there’s no sign that they understand the scope of the opportunities at hand for getting this right, or the risks we face if they (continue to) get it wrong.


6 Comments
Evgeni Yordanov

Jono, I feel the exact same way, mate. And as an owner of one of the big eCommerce platforms out there, I too want to set the standard in entity generation and management through Schema. So let’s just keep our fingers crossed that Google picks this up and runs with it, as opposed to what they’ve been doing so far.

Mack

Hi Jono – I’ve been digging into schema and structured data for the last week and a bit, trying to learn as much as possible about the subject. It’s been difficult getting a comprehensive understanding of it and how to use it across different use cases, but little by little I’m piecing it together. This article might be a bit ahead of me in terms of my knowledge, but I appreciate what you had to say and will round back to it at some point (it also introduced me to the YOAST documentation on the subject which was a lot of help). This might not…

Mack

Thanks, Jono!

Tyson Roehrkasse

Excellent article! I’ve been learning more and more about structured data and SEO for the past several months, and Google’s documentation has been driving me bonkers. Conflicting information, lack of information, arbitrary requirements, apparent disregard for the entire point of knowledge graph and structured data in the first place… good grief!

It sucks that webmasters are faced with such a choice:

  • Embed structured data the right way: clear and connected
  • Embed structured data the “Google” way: a nightmare (but performs better in Google search)

Do you have any thoughts on how other search engines compare in this regard?
