Schema.org for answer engines: the structured data that wins AI citations in 2026

The largest controlled study found JSON-LD didn't lift AI Overview citations. Here's where structured data still earns its keep, and the markup that does the work.

June 15, 2026 · 11 min read
Tags: schema.org, structured data, JSON-LD
The biggest controlled study of the year on structured data and AI search delivered an uncomfortable result. When Ahrefs tracked 1,885 pages that added JSON-LD between August 2025 and March 2026, Google AI Overview presence didn't climb.

It dropped 4.6%, and the odds of that drop being random noise were roughly 1 in 2,500. AI Mode moved +2.4% and ChatGPT +2.2%, both inside the noise floor.

So the practitioner question for 2026 isn't "does schema.org help AI citations." It's "what is schema actually for, now that we have real numbers." The answer is narrower and more durable than the vendor blogosphere claims.

TL;DR: Schema.org and JSON-LD are citation infrastructure, not a citation booster. Structured data still earns its place by powering rich results, disambiguating your entities for LLM crawlers, and keeping you clear of Google's one schema-specific manual action. It does not reliably lift AI Overview citation rates, and the best-controlled evidence says the opposite. Implement it well because the cost is low and the rich-result upside is real. Don't over-invest in it for a citation boost Google denies in writing.

Does schema.org increase AI citations in 2026?

Short answer: not in any way the strongest evidence can detect, at least for Google AI Overviews. Structured data improves how machines understand your page, which is a different thing from how often answer engines cite it.

The cleanest definition to carry: schema.org JSON-LD is a machine-readable labeling layer that tells parsers what your visible content means. It governs rich-result eligibility and entity resolution. It is not a documented ranking input for AI answers.

That framing matters because GEO discourse keeps conflating the two. A page can be perfectly marked up and still get ignored by ChatGPT, and a page with zero schema can get cited constantly.

A 730-citation study from Capconvert/SearchVIU in October 2025 found pages with no schema were cited 59.8% of the time, while pages with sparse, generic schema sat at 41.6%. Generic markup underperformed having none.

Key takeaways

  • The largest controlled study (Ahrefs, May 2026) found JSON-LD did not lift AI citations; AI Overviews actually fell 4.6%.
  • Google states plainly there are "no additional requirements" or special schema needed to appear in AI Overviews or AI Mode.
  • Microsoft is the lone engine whose operator confirmed schema feeds its LLMs.
  • Attribute density beats schema presence: rich Product + Review markup was cited 61.7% vs 41.6% for generic Article/Org schema.
  • FAQPage rich results were fully retired May 7, 2026. The 2023 event was a restriction, not the deprecation.
  • The durable wins are entity disambiguation (sameAs, @id) and rich-result eligibility, not AI ranking.

What does each AI engine actually say about schema?

Each major engine has staked out a different posture, and most of the "schema for AI" advice online ignores that they disagree. Here is what's on the record, first-party only.

| Engine | Published position on schema/JSON-LD | Robots.txt note | Source |
|---|---|---|---|
| Google (AI Overviews, AI Mode) | "No special schema required" | Standard | Search Central, May 2025 |
| OpenAI (ChatGPT, SearchGPT) | No parser spec published | ChatGPT-User stopped honoring robots.txt Dec 9, 2025 | OpenAI crawler docs |
| Microsoft (Bing, Copilot) | Endorsed: schema "helps Microsoft's LLMs understand your content" | Standard | SMX Munich, March 2025 |
| Perplexity | No first-party schema statement | Published robots guidance | Publishers' Program |
| Anthropic Claude | No first-party schema statement | Provider-dependent | web_search tool (Messages API) |

Google's AI Features page is the most explicit denial: your content "needs to meet the same requirements as it would to be eligible to appear in Google Search with a snippet, there's nothing extra to do." Google's internal mechanism is query fan-out, breaking one query into parallel sub-queries and re-synthesizing, and no Google doc names JSON-LD as an input to it.

Microsoft is the outlier. Bing Principal PM Fabrice Canel, who owns the web-data pipeline, told SMX Munich that schema markup helps Microsoft's LLMs, and pointed publishers at IndexNow for fast push. That's the closest any major engine has come to an on-record "yes."

OpenAI documents its crawler taxonomy (GPTBot, OAI-SearchBot, ChatGPT-User, and the April 2026 OAI-AdsBot) but never specifies whether OAI-SearchBot consumes JSON-LD. The honest engineering stance: use robots.txt and IndexNow to control access, and use schema to control clarity. Don't assume schema is a ChatGPT ranking factor.
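As a rough illustration of the access side, the sketch below uses Python's standard `urllib.robotparser` to check what a robots.txt file permits for two of the crawler names listed above. The robots.txt rules, the `allowed` helper, and the example.com URLs are invented for illustration and are not a recommended policy. A passing check only shows what the file says, not whether a given crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the search
# crawler, and keep /private/ off-limits for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/guide"))
```

Running a check like this in a deploy pipeline catches accidental blanket disallows before a crawler does. None of it touches JSON-LD, which is the point: access and clarity are controlled by separate files.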

Which schema types still produce a Google rich result?

The 2026 landscape is a two-layer system. Layer 1 types produce a visible rich result. Layer 2 types feed entity understanding and the Knowledge Graph with no visual payoff.

The common 2026 mistake is reading a Layer 1 retirement as a full retirement. Google still parses deprecated types for understanding, and confirms "unused structured data doesn't cause problems for Search."

| Type | Rich result status (June 2026) | AI/entity utility |
|---|---|---|
| Article / NewsArticle / BlogPosting | Active | High |
| FAQPage | Retired May 7, 2026 | Low (still parsed) |
| HowTo | Retired Sept 13, 2023 | Low |
| Organization / Person | No visible result | High (entity layer) |
| Product | Active (Universal Cart needs a feed) | High |
| BreadcrumbList | Active | Medium |
| Review / AggregateRating | Active, often silently stripped | Low |
| Dataset | Dataset Search only | Medium |
| Course + 6 others | Deprecated June 12, 2025 | Low |

The dates worth knowing cold: HowTo left desktop on September 13, 2023; Google deprecated seven types in one June 2025 post (Book Actions, Course Info, Claim Review, Estimated Salary, Learning Video, Special Announcement, Vehicle Listing); practice problems were removed January 6, 2026.

And the one everyone misquotes: FAQPage was fully retired May 7, 2026. The August 2023 event only restricted FAQ rich results to government and health sites. If a source tells you "FAQ was deprecated in 2023," they're wrong about the year and the event.

Why your review stars vanish

Review and AggregateRating can validate cleanly and still show no stars. Google's rule against self-serving reviews, live since 2019, silently strips ratings a business applies to its own Organization on its own site. The verbatim rule: "Don't use Review or AggregateRating markup for your own business or your own products on your own site." Put AggregateRating on the Product or Recipe page that actually displays customer reviews. That's the only placement that survives.
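One way to catch this before deploy is a small check over your parsed JSON-LD nodes. The sketch below is a minimal version under stated assumptions: the `SELF_SERVING_TYPES` set and the `self_serving_ratings` helper are hypothetical names, the sample nodes are invented, and the check does not walk schema.org subtypes.

```python
# Types whose own-site ratings count as self-serving for this check.
# Illustrative assumption: subtypes (e.g. Restaurant) are not expanded here.
SELF_SERVING_TYPES = {"Organization", "LocalBusiness"}


def self_serving_ratings(nodes: list[dict]) -> list[str]:
    """Return the @id (or name) of nodes that attach ratings to the business itself."""
    flagged = []
    for node in nodes:
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        has_rating = "aggregateRating" in node or "review" in node
        if has_rating and SELF_SERVING_TYPES.intersection(types):
            flagged.append(node.get("@id") or node.get("name", "<unnamed>"))
    return flagged


RATING = {"@type": "AggregateRating", "ratingValue": "4.8", "reviewCount": "120"}

NODES = [
    # Rating on the business node: the placement that gets stripped.
    {"@type": "Organization", "@id": "https://example.com/#organization",
     "aggregateRating": RATING},
    # Rating on a product page that displays the reviews: the placement to keep.
    {"@type": "Product", "@id": "https://example.com/shoe#product",
     "name": "Trail Shoe X", "aggregateRating": RATING},
]

if __name__ == "__main__":
    print(self_serving_ratings(NODES))
```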

How do you use schema for AI entity recognition?

This is where structured data still pulls real weight. The single most important property for AI-crawler entity resolution is sameAs, defined by schema.org as "URL of a reference Web page that unambiguously indicates the item's identity", with Wikipedia and Wikidata named explicitly.

The high-value targets, in order: your official domain, the entity's Wikipedia article, its Wikidata Q-item, and verified profiles that themselves carry a rel="me" link back to you (LinkedIn, GitHub, ORCID, Bluesky).

One caveat the vendor blogs skip: sameAs does not "submit" you to the Knowledge Graph. Google's Knowledge Graph Search API is read-only for developers, and no Search Central page says on-page sameAs pushes an entity in. The defensible claim is alignment with canonical references, not submission.

The pattern that works:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/about#person",
  "name": "Jane Doe",
  "jobTitle": "Senior Research Analyst",
  "worksFor": { "@type": "Organization", "@id": "https://example.com/#organization" },
  "knowsAbout": ["Generative Engine Optimization", "Information Retrieval"],
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000",
    "https://www.linkedin.com/in/janedoe",
    "https://orcid.org/0000-0000-0000-0000"
  ]
}
```

Define the Person and Organization once, each with a stable @id you control, then reference them everywhere via that @id. Defining one real-world entity under multiple @id values is a top cause of broken entity resolution: the crawler reads it as several distinct entities.
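The define-once, reference-by-@id pattern can be sketched as a small graph plus a check for split identities. This is an illustrative sketch, not a standard tool: the `GRAPH` contents and the `split_identities` helper are hypothetical, and treating "same type and same name" as "same real-world entity" is a simplifying assumption that will misfire on genuinely distinct entities sharing a name.

```python
from collections import defaultdict

ORG_ID = "https://example.com/#organization"
PERSON_ID = "https://example.com/about#person"

# Each entity is defined once; other nodes point at it with a bare {"@id": ...}.
GRAPH = [
    {"@type": "Organization", "@id": ORG_ID, "name": "Example Co"},
    {"@type": "Person", "@id": PERSON_ID, "name": "Jane Doe",
     "worksFor": {"@id": ORG_ID}},
    {"@type": "Article", "@id": "https://example.com/post#article",
     "headline": "Example post",
     "author": {"@id": PERSON_ID}, "publisher": {"@id": ORG_ID}},
]


def split_identities(nodes):
    """Map (type, name) -> sorted @ids for entities defined under more than one @id."""
    ids = defaultdict(set)
    for node in nodes:
        if "name" in node and "@id" in node:
            ids[(str(node.get("@type")), node["name"])].add(node["@id"])
    return {key: sorted(vals) for key, vals in ids.items() if len(vals) > 1}


if __name__ == "__main__":
    # A second Person node for the same author under a different @id.
    duplicate = {"@type": "Person", "@id": "https://example.com/team#jane",
                 "name": "Jane Doe"}
    print(split_identities(GRAPH))                # clean graph
    print(split_identities(GRAPH + [duplicate]))  # split identity reported
```

Run across every page's JSON-LD at build time, a check like this surfaces the author who is `#person` on the about page and `#jane` on the team page.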

Attribute density is the real differentiator

The Capconvert study's sharpest finding: attribute-rich Product + Review schema hit a 61.7% AI-citation rate versus 41.6% for generic Article/Organization/BreadcrumbList markup. And 0 of 5 AI systems extracted product data from JSON-LD that had no visible HTML counterpart.

[Figure: AI citation rate by schema richness (Capconvert, Oct 2025). Rich Product + Review: 61.7%; no schema: 59.8%; generic/sparse schema: 41.6%.]

Visible HTML is the substrate. JSON-LD is confirmation. The 2026 enforcement reality, confirmed across all five engines, is that schema is read at retrieval time against what's actually rendered.
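A minimal way to test that relationship on your own pages is to compare JSON-LD values against the page's text. The sketch below uses only Python's standard `html.parser` and `json` modules. The `unbacked` helper, the default key list, and the sample page are hypothetical. It also assumes that any text outside `<script>` and `<style>` is visible, so content hidden with CSS would wrongly count as rendered.

```python
import json
from html.parser import HTMLParser


class _Page(HTMLParser):
    """Collect parsed JSON-LD blocks and text outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.jsonld = []    # parsed JSON-LD objects
        self.text = []      # text chunks outside script/style
        self._raw = None    # buffer while inside a JSON-LD <script>
        self._skip = False  # True inside any <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True
            if dict(attrs).get("type") == "application/ld+json":
                self._raw = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._raw is not None:
                self.jsonld.append(json.loads("".join(self._raw)))
                self._raw = None
            self._skip = False

    def handle_data(self, data):
        if self._raw is not None:
            self._raw.append(data)
        elif not self._skip:
            self.text.append(data)


def _leaves(node):
    """Yield (key, value-as-string) for every scalar property, skipping @-keywords."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                yield from _leaves(value)
            elif not key.startswith("@"):
                yield key, str(value)
    elif isinstance(node, list):
        for item in node:
            yield from _leaves(item)


def unbacked(html, keys=("name", "price", "ratingValue")):
    """Return (key, value) pairs from JSON-LD that never appear in the page text."""
    page = _Page()
    page.feed(html)
    page.close()
    visible = " ".join(page.text)
    return [(k, v) for block in page.jsonld for k, v in _leaves(block)
            if k in keys and v not in visible]


PAGE = """
<html><body>
<h1>Trail Shoe X</h1>
<p>Price: $49.00</p>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe X",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "USD"},
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6", "reviewCount": "212"}}
</script>
</body></html>
"""

if __name__ == "__main__":
    print(unbacked(PAGE))
```

In the sample page the name and price are backed by text, while the rating exists only in the markup, so the rating is what gets reported. Exact substring matching is deliberately crude; a real check would likely normalize number and currency formats first.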

What can schema do, and what can't it?

The "author markup → E-E-A-T → AI citation" chain is the most overclaimed idea in GEO. E-E-A-T is a quality-rater concept, not a ranking factor and not a schema property. There's no first-party statement that Person markup causes higher AI citation.

Google's own ceiling on schema is blunt: "The existence of structured data does not guarantee that your page will be ranked higher... Use of structured data enables a feature to be present, it does not guarantee that it will be present."

| Capability | Defensible | Overstated |
|---|---|---|
| Rich-result eligibility | Yes, documented | "Guaranteed inclusion" (Google denies) |
| Entity alignment | Yes, via sameAs | "Pushes entity into Knowledge Graph" |
| AI Overview citation boost | No (−4.6%, Ahrefs) | "2.5x" / "30-40%" / "3.2x FAQ" claims |
| Author/E-E-A-T signal | Documents the entity | "Causes higher AI citability" |
| Crawl access | No, robots.txt + IndexNow do | "Schema is a ranking factor" |

Treat the circulating "2.5x" (Stackmatix) and "30-40%" (greadme) figures as content marketing with no published methodology, directionally contradicted by the Ahrefs primary result. Cite them with "reported by" hedging or skip them.

The one genuinely intriguing counter-signal: a Search Engine Land experiment in September 2025 where a schema-only page (markup, no body copy) hit position 3 and was the only variant to surface in the AI Overview. It's n=3, suggestive not robust, but it hints schema can help a thin page qualify to be understood at all.

The one manual action schema can trigger

Schema has its own spam policy, and it's worth respecting. The Spammy Structured Markup policy flags four violations: marking up invisible content, irrelevant or misleading markup, markup meant to manipulate ranking, and programmatically generated markup not reviewed for accuracy.

A manual action here removes rich-result eligibility (your only documented schema benefit) without tanking organic rank, and it's reversible only via a Search Console reconsideration request. The most common trigger is the first one: marking up content hidden in display:none or aria-hidden blocks. Mark up only what a visitor can see.

What this means for you

Ship correct, content-matched schema, and stop expecting it to move AI citations on its own.

Concretely:

  • Keep the Layer-1 winners: Article/NewsArticle, Product, BreadcrumbList. The rich-result upside is real and unaffected by AI mechanics.
  • Invest in entity primitives: one Organization node, one Person node per author, stable @id values, and a real sameAs array pointing to Wikidata, Wikipedia, and verified profiles.
  • Go deep, not wide: attribute-rich markup on pages with matching visible HTML beats sparse markup everywhere. Sparse generic schema can underperform none.
  • Keep FAQPage only where a visible FAQ exists; the rich result is gone but the type is still parsed. Don't invent Q&A pairs.
  • Validate in CI: type-check with schema-dts at build time, run every emitted block through the Schema Markup Validator at deploy, and hard-code JSON-LD into the HTML rather than injecting via Tag Manager.
  • Use the right levers for access: robots.txt and IndexNow control crawling, not JSON-LD.
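For the CI step in the list above, a lightweight structural lint can run before the heavier tools. The sketch below is a stand-in written with Python's standard `json` module. It is not schema-dts and not the Schema Markup Validator, and the `lint_jsonld` helper and its rules are assumptions chosen for illustration. In particular, it flags bare `{"@id": ...}` references that are not defined in the same block, which is stricter than JSON-LD itself requires.

```python
import json


def lint_jsonld(raw: str) -> list[str]:
    """Return a list of problems found in one JSON-LD block (empty means clean)."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]

    problems = []
    if "@context" not in doc:
        problems.append("missing @context")

    # Treat a document without @graph as a single-node graph.
    nodes = [n for n in doc.get("@graph", [doc]) if isinstance(n, dict)]
    defined = {n["@id"] for n in nodes if "@id" in n and "@type" in n}

    for node in nodes:
        if "@type" not in node:
            problems.append(f"node without @type: {node.get('@id', '<no @id>')}")
        for key, value in node.items():
            # A dict holding only "@id" is a reference to another node.
            is_ref = isinstance(value, dict) and set(value) == {"@id"}
            if is_ref and value["@id"] not in defined:
                problems.append(f"{key} references undefined @id {value['@id']}")
    return problems
```

A build script could call `lint_jsonld` on every emitted block and fail when the returned list is non-empty, leaving vocabulary-level checks to the dedicated validators.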

The moat in 2026 is honesty plus hygiene. Implement schema well because it's cheap, it disambiguates your entities, and it powers rich results that haven't been deprecated. Just don't sell anyone, including yourself, on an AI citation boost the data doesn't show.

Frequently asked questions

Does schema.org markup increase AI citations in 2026?

Not measurably for Google AI Overviews. The largest controlled study, by Ahrefs in May 2026, tracked 1,885 pages that added JSON-LD and found a 4.6% decrease in AI Overview presence, with AI Mode and ChatGPT effects within statistical noise. Schema is infrastructure for entity clarity and rich results, not a citation lever.

Is FAQPage schema still worth adding after the May 2026 deprecation?

The FAQ rich result was fully retired on May 7, 2026, so it no longer produces visible accordions in Google Search. The FAQPage type itself remains valid schema.org vocabulary and is still parsed for understanding, so keep it only where a real, visible FAQ exists on the page. Do not add invented Q&A pairs.

What is the most important schema property for AI entity recognition?

sameAs. It links your Organization or Person node to canonical references like Wikipedia, Wikidata, and verified profiles, which is the strongest single signal for entity disambiguation by LLM crawlers. Pair it with a stable @id reused across your page graph.

Which AI engine has confirmed it uses schema markup?

Microsoft. At SMX Munich in March 2025, Bing Principal PM Fabrice Canel said schema markup helps Microsoft's LLMs understand content. Google states the opposite, that no special schema is required for AI Overviews. OpenAI, Perplexity, and Anthropic publish no parser-level position.

Can bad schema hurt my rankings?

It can remove rich-result eligibility, not organic rank. Marking up content that isn't visible, or generating unreviewed markup at scale, can trigger Google's Spammy Structured Markup manual action, which disables rich results and is reversible only through a reconsideration request.