Ahrefs found the number that explains why GEO exists: 76% of Google AI Overview citations come from pages outside the top 10 organic results.
That breaks the old operating model. A page can lose the classic SERP and still win the answer. A page can rank first and never appear in the paragraph the buyer actually reads.
Generative engine optimization is the discipline that sits in that gap. It is not SEO with new labels. It is the work of making a page crawlable by AI retrieval bots, readable without JavaScript, easy to chunk into answer passages, trusted enough to survive reranking, and measurable across ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude, and Copilot.
The citation is now the scarce unit. Treat it that way.
TL;DR
GEO is the operating layer for AI citations. The workflow is: allow retrieval crawlers, control training crawlers, ship static HTML, write answer-first passages, keep pages fresh, use schema for entity clarity, ignore unsupported llms.txt ranking claims, and measure AI share of voice weekly.
Do not optimize for “AI” as one surface. ChatGPT, Perplexity, AI Overviews, and Gemini use different retrieval systems, crawler rules, and citation formats.
Do not over-credit on-page tricks. The original GEO paper found strong lifts from citations, quotations, and statistics, but later controlled work shows domain authority, passage match, freshness, and page quality dominate.
Do not bury the answer. Engines retrieve chunks, not essays. Paragraph four rarely wins when paragraph one from a competitor gives the direct answer, the number, and the source.
Do measure. A GEO program without a prompt library is opinion management. A GEO program with 100 to 500 prompts, citation SOV, and discovery-gap reports becomes an operating system.
Key Takeaways
- AI answers are already mainstream. ChatGPT reached 900M weekly active users in February 2026. Google AI Overviews reached 2B monthly users. Gemini passed 750M monthly active users.
- Classic rankings are not enough. Ahrefs found 76% of AI Overview citations come from URLs outside the organic top 10.
- JavaScript is still a crawlability failure. Vercel and related testing found roughly 69% of AI crawlers cannot execute JavaScript, including GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.
- Crawler intent matters. GPTBot crawl volume fell 87% in 2025 while OAI-SearchBot rose 312%, per Etavrian, because publishers learned to block training and allow retrieval.
- Robots.txt is not enforcement. TollBit measured roughly 30% of AI bot requests ignoring robots.txt by Q4 2025.
- Perplexity is broad but shallow. arXiv 2604.25707 measured Perplexity at about 16 sources per prompt with 0.0646 absorption per citation; ChatGPT used 6.88 sources with 0.2713 absorption.
- Schema is infrastructure, not magic. Ahrefs’ 1,885-page study found adding JSON-LD did not increase AI Overview citations; presence fell 4.6%.
Why GEO Matters Now
The scale is no longer speculative. ChatGPT at 900M weekly active users is a distribution layer, not a chatbot curiosity. Google AI Overviews at 2B monthly users means generated answers sit directly in the world’s largest search product. Gemini at 750M monthly active users means Google has a second AI discovery surface outside the traditional SERP.
At the same time, the click supply is shrinking. SparkToro found only 374 of every 1,000 U.S. Google searches end in a click to the open web. Gartner predicted traditional search volume would drop 25% by 2026 as chatbots and agents absorb more queries.
That is the zero-click problem in operational terms: rank tracking can look stable while the actual decision surface moves upstream.
The economics are uneven. Perplexity referrals converted at 10.5% versus 1.76% for Google organic in Seer Interactive’s reported data. That does not mean every AI referral converts six times better. It does mean the clicks that survive citation filtering can be high-intent.
The right conclusion is not “traffic is dead.” It is narrower: traffic is being rationed by citation systems. If you are not cited, you are often not considered.
How AI Engines Decide What To Cite
Every major grounded answer engine runs a version of the same pipeline:
- Discover URLs.
- Fetch pages.
- Extract readable text.
- Chunk the page into passages.
- Embed or otherwise represent those chunks.
- Index them.
- Retrieve candidates for a user prompt.
- Rerank candidates for authority, match, freshness, and safety.
- Ground the generated answer in selected passages.
- Render citations.
GEO work maps to that pipeline. Robots.txt affects fetch. Static HTML affects extraction. Answer-first structure affects chunk quality. Freshness and authority affect reranking. Tables, statistics, and source links affect grounding. AI share of voice measures the citation layer.
The engines differ in the details.
| Engine | Retrieval pattern | Citation behavior | Operational note |
|---|---|---|---|
| ChatGPT | OAI-SearchBot for search index; ChatGPT-User for user-triggered fetches | Fewer sources, higher per-source absorption | Allow OAI-SearchBot if you want ChatGPT search visibility |
| Perplexity | PerplexityBot index plus Perplexity-User live fetches | Many sources, low per-source absorption | Freshness and passage match are unusually important |
| Google AI Overviews | Googlebot and Google Search index | Broad, SEO-shaped source pool | Google says no special schema or AI file is required |
| Gemini | Google ecosystem plus grounding APIs | Depends on product surface | Google-Extended controls training/grounding in some Google systems, not AI Overviews |
| Claude | Claude-SearchBot and Claude-User where available | Strong citation metadata in API contexts | Anthropic split training, search, and user fetchers in 2026 |
The central shift from SEO is the unit of value. SEO optimizes the page. GEO optimizes the passage. A 3,000-word article does not get cited. A 70-word chunk inside it does.
That is why the best GEO content reads almost blunt at the paragraph level: claim first, number second, source third, implication fourth.
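To read your own pages the way a passage-level retriever might, it helps to split them into heading-scoped chunks and check whether each chunk still makes sense on its own. The sketch below is a toy chunker for that self-audit, not the chunking logic of any specific engine. The heading convention (lines starting with `#`), the blank-line paragraph split, and the `max_words` cap are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    heading: str      # nearest preceding heading ("" if none)
    text: str
    word_count: int


def chunk_markdown(doc: str, max_words: int = 120) -> list[Passage]:
    """Split Markdown-ish text into heading-scoped passages (toy chunker).

    Illustrative assumptions: lines starting with '#' are headings,
    blank lines separate paragraphs, and any paragraph longer than
    max_words is cut into consecutive windows of at most max_words words.
    """
    passages: list[Passage] = []
    heading = ""
    for block in doc.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        lines = block.splitlines()
        if lines[0].startswith("#"):
            # Remember the heading; any text directly under it stays in this block.
            heading = lines[0].lstrip("#").strip()
            block = "\n".join(lines[1:]).strip()
            if not block:
                continue
        words = block.split()
        for start in range(0, len(words), max_words):
            window = words[start:start + max_words]
            passages.append(Passage(heading, " ".join(window), len(window)))
    return passages


if __name__ == "__main__":
    sample = (
        "## What is GEO?\n\n"
        "Generative engine optimization is the practice of making content "
        "retrievable, extractable, and citable by AI answer engines.\n\n"
        "## Why now?\n\n"
        "76% of AI Overview citations come from pages outside the organic top 10."
    )
    for p in chunk_markdown(sample):
        print(f"[{p.heading}] {p.word_count} words: {p.text}")
```

Any passage that only makes sense next to its neighbours, or whose answer lands past the first window, is a candidate for an answer-first rewrite.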
Crawler Access: The Robots.txt Split
The first production mistake is treating all AI bots the same.
Training crawlers ingest content for model training. Retrieval crawlers fetch or index content so a product can cite it. User-action fetchers retrieve a URL because a person asked an assistant to inspect it. Those jobs have different economics.
OpenAI documents the split clearly. OAI-SearchBot is used to surface websites in ChatGPT search results. GPTBot is used to crawl content that may be used in training foundation models. ChatGPT-User is triggered by user actions and is not used for automatic search indexing.
That lets you make the obvious operator choice: allow retrieval, block training.
When we run this split for publisher and B2B sites, the verification step is always server logs. After a deploy, we expect OAI-SearchBot and PerplexityBot to keep returning 200s. We expect GPTBot and ClaudeBot to fall to 403 at the edge or stop at robots.txt if the vendor honors it. If GPTBot traffic keeps coming from verified OpenAI IP ranges after Disallow, the robots update has not propagated. If it keeps coming from non-verified IPs, you have a spoofing problem, not a robots problem.
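The log check described above can be scripted. A minimal sketch, assuming access logs in the common Apache/Nginx "combined" format; the bot token list mirrors the crawlers named in this section, and the sample lines, IPs (RFC 5737 documentation ranges), and User-Agent strings are illustrative rather than exact vendor strings. It tallies status codes per AI user agent, so after a deploy you can confirm retrieval bots still get 200s and training bots get 403s. It matches on User-Agent only, so it cannot separate verified traffic from spoofed traffic; that needs an IP-range check as well.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# User-Agent tokens to look for. None is a substring of another, so order is not critical.
AI_BOTS = (
    "OAI-SearchBot", "ChatGPT-User", "GPTBot",
    "PerplexityBot", "Perplexity-User",
    "Claude-SearchBot", "Claude-User", "ClaudeBot", "CCBot",
)


def bot_name(user_agent: str) -> Optional[str]:
    """Return the first AI bot token found in the User-Agent, case-insensitively."""
    ua = user_agent.lower()
    for token in AI_BOTS:
        if token.lower() in ua:
            return token
    return None


def tally(lines: Iterable[str]) -> Counter:
    """Count (bot, status) pairs for AI crawler requests; other traffic is ignored."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue
        bot = bot_name(m["ua"])
        if bot is not None:
            counts[(bot, int(m["status"]))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [01/Jun/2026:10:00:00 +0000] "GET /guides/geo/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"',
        '198.51.100.7 - - [01/Jun/2026:10:00:05 +0000] "GET /guides/geo/ HTTP/1.1" 403 0 "-" "Mozilla/5.0; compatible; GPTBot/1.1"',
        '203.0.113.4 - - [01/Jun/2026:10:00:09 +0000] "GET /pricing/ HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for (bot, status), n in sorted(tally(sample).items()):
        print(bot, status, n)
```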
A current baseline file looks like this:
```text
# ===== TRAINING / INGESTION: BLOCK =====
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Google-Extended
Disallow: /

# ===== SEARCH / RETRIEVAL: ALLOW =====
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# ===== USER-ACTION FETCHERS: ALLOW =====
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

# ===== GOOGLE SEARCH: DO NOT BLOCK IF YOU WANT AIO VISIBILITY =====
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /
```
The Google line matters. Blocking Google-Extended does not remove you from AI Overviews. Google says AI features in Search are controlled through Googlebot and standard Search preview controls such as nosnippet, max-snippet, data-nosnippet, and noindex. Google-Extended is for limiting training and grounding in some other Google systems.
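Before deploying, you can dry-run the file with Python's standard-library `urllib.robotparser`. One detail worth checking: under RFC 9309 a crawler obeys only the group that names it and falls back to the `User-agent: *` group only when no group matches. In a file shaped like the baseline, where each named retrieval group carries only `Allow: /`, the `/admin/`, `/private/`, and `/api/` rules bind unnamed bots, not OAI-SearchBot, PerplexityBot, or Googlebot. If you want those paths closed to named bots too, repeat the `Disallow` lines inside their groups. The sketch uses an abbreviated copy of the file; `urllib.robotparser` is a simplified parser (first matching rule wins, no `*`/`$` path wildcards), adequate for this file but not a full RFC 9309 implementation.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the baseline above: two training blocks, two retrieval
# allows, Googlebot, and the catch-all group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /
"""


def audit(robots_txt: str, agents: list[str], paths: list[str]) -> dict[tuple[str, str], bool]:
    """Return {(agent, path): allowed} for every agent/path pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return {
        (agent, path): parser.can_fetch(agent, "https://www.example.com" + path)
        for agent in agents
        for path in paths
    }


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot", "Googlebot", "SomeOtherBot"]
    for (agent, path), ok in audit(ROBOTS_TXT, agents, ["/guides/geo/", "/admin/"]).items():
        print(f"{agent:15} {path:14} {'ALLOW' if ok else 'BLOCK'}")
```

Expect GPTBot and ClaudeBot blocked everywhere, the named retrieval bots and Googlebot allowed on `/admin/` as well, and an unnamed bot blocked only on the catch-all paths.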
Robots.txt is a declaration, not a fence. TollBit measured roughly 30% non-compliance by Q4 2025. Cloudflare responded at internet scale: on July 1, 2025, it changed the default to block AI crawlers unless publishers opt in, and it launched Pay Per Crawl, where crawlers can receive 402 Payment Required responses unless they present payment intent.
If you use Cloudflare, Fastly, Akamai, or your own WAF, the production rule is simple: match user-agent and verified IP range. User-agent alone is spoofable. IP alone is too broad. The conjunction is the control.
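A minimal sketch of that conjunction as a request classifier. The bot names are crawlers discussed above; the CIDR ranges are placeholders from the RFC 5737 documentation blocks and must be replaced with each vendor's published list (Perplexity, for example, publishes its ranges as JSON). The `classify` function and its three verdicts are hypothetical names for illustration, not any CDN's API.

```python
import ipaddress

# Placeholder ranges keyed by bot token. Real values come from each vendor's
# published IP list; these are RFC 5737 documentation prefixes.
VERIFIED_RANGES = {
    "OAI-SearchBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}


def classify(user_agent: str, ip: str) -> str:
    """Return 'verified', 'spoofed', or 'unclaimed' for one request.

    verified:  the UA claims a tracked bot AND the IP is inside that bot's ranges
    spoofed:   the UA claims a tracked bot but the IP is outside its ranges
    unclaimed: the UA does not claim any bot we track
    """
    addr = ipaddress.ip_address(ip)
    ua = user_agent.lower()
    for bot, cidrs in VERIFIED_RANGES.items():
        if bot.lower() in ua:
            if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
                return "verified"
            return "spoofed"
    return "unclaimed"


if __name__ == "__main__":
    print(classify("Mozilla/5.0; compatible; OAI-SearchBot/1.0", "192.0.2.44"))
    print(classify("Mozilla/5.0; compatible; OAI-SearchBot/1.0", "203.0.113.9"))
```

What the edge does with each verdict is a policy choice; a common one is to serve `verified` retrieval bots, block `spoofed` requests, and leave `unclaimed` traffic to your normal bot rules.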
There is a cost to blocking everything. Rutgers and Wharton researchers found top-30 news publishers that blocked AI bots saw roughly a 23% total traffic drop and 14% human traffic drop. Raptive found no measurable traffic effect for mid-sized sites. The difference is brand scale: household-name publishers are part of the answer ecosystem; smaller sites usually are not.
The default for most technical operators is therefore: block training, allow retrieval, enforce at the edge, and re-audit quarterly.
Content Extractability: What Crawlers Can Read
A page that renders beautifully in Chrome can be blank to an AI crawler.
Vercel and MERJ analyzed more than 500M GPTBot fetches and found no evidence of JavaScript execution. Capconvert reported GPTBot downloads .js files in 11.5% of requests and ClaudeBot in 23.84%, but downloading is not rendering. The crawler receives the script. It does not run the app.
The practical number to carry is 69%: roughly 69% of AI crawlers tested cannot execute JavaScript.
That creates a split reality:
| Page behavior | Googlebot | GPTBot / OAI-SearchBot / ClaudeBot / PerplexityBot |
|---|---|---|
| Server-rendered article HTML | Readable | Readable |
| Static site generation | Readable | Readable |
| Client-rendered React shell | Usually readable after Google's Web Rendering Service (WRS) runs | Often blank |
| JSON-LD injected by Tag Manager | Often visible to Google | Often invisible |
| Content behind JS-gated tabs | Risky | Usually invisible |
Test it with curl before you buy tools:
```bash
curl -s -A "GPTBot" https://www.example.com/page/ | less
curl -s -A "OAI-SearchBot" https://www.example.com/page/ | less
curl -s -A "ClaudeBot" https://www.example.com/page/ | less
curl -s -A "PerplexityBot" https://www.example.com/page/ | less
```
If the article body, H2s, tables, and JSON-LD are not in the raw response, you are not doing GEO yet. You are hoping non-rendering crawlers behave like Googlebot. Most do not.
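Once you have more than a handful of pages, the same check can be scripted. A minimal sketch using only the standard library: `fetch_raw` sends a chosen User-Agent and returns the HTML exactly as served, with no JavaScript executed, and `extractability_report` looks for H2s, tables, JSON-LD, and phrases you expect from the article body. Both function names and the phrase list are hypothetical. A page that passes is readable; it does not prove the real crawler was served the same response, since edge rules may treat genuine bot IPs differently.

```python
import re
import urllib.request


def fetch_raw(url: str, user_agent: str) -> str:
    """Fetch a URL without executing JavaScript, sending the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=15) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extractability_report(html: str, must_contain: list[str]) -> dict[str, bool]:
    """Check raw (pre-hydration) HTML for what a non-rendering crawler needs."""
    report = {
        "has_h2": bool(re.search(r"<h2[\s>]", html, re.I)),
        "has_table": bool(re.search(r"<table[\s>]", html, re.I)),
        "has_json_ld": "application/ld+json" in html.lower(),
    }
    for phrase in must_contain:
        # Use a sentence you know appears in the rendered article body.
        report[f"contains: {phrase}"] = phrase in html
    return report
```

Typical use: `extractability_report(fetch_raw(url, "GPTBot"), ["first sentence of your answer"])`, run over your top pages. A client-rendered shell fails every check; a server-rendered page passes them.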
The fix is boring and durable: SSR or SSG. Next.js Server Components, Astro, SvelteKit, Rails, Django, static Markdown pipelines, and server-rendered docs all work. Client islands are fine for calculators and filters. The canonical content must exist before hydration.
Then structure the content for extraction:
| Pattern | Implementation |
|---|---|
| BLUF answer | Put the direct answer in the first 100 to 150 words of a section |
| Question headers | Use H2/H3s that match real prompts |
| One claim per paragraph | Keep citation-worthy claims self-contained |
| Tables for comparisons | Use real Markdown/HTML tables, not images |
| Numbers with sources | Pair each statistic with named attribution |
| Visible dates | Show datePublished and honest dateModified |
| Visible author | Link byline to a real bio page |
Perplexity in particular rewards BLUF structure: it tends to lift passages from the top of the page, and paragraph four usually loses.
Structured Data: What Schema Actually Does
Schema.org is useful. It is not a magic citation lever.
The strongest controlled study is Ahrefs’ May 2026 analysis of 1,885 pages that added JSON-LD between August 2025 and March 2026. Google AI Overview presence did not rise. It fell 4.6%. AI Mode moved +2.4% and ChatGPT +2.2%, both inside statistical noise.
That result matches Google’s own guidance: there are no special schema.org requirements for AI Overviews or AI Mode. Google says the same SEO fundamentals apply and that structured data should match visible page text.
So why use schema?
Use it for entity clarity, rich-result eligibility, and internal consistency. The important schema types for GEO-adjacent work are Article, NewsArticle, BlogPosting, Person, Organization, Product, Review, BreadcrumbList, and sometimes Dataset.
The highest-value property is sameAs. It ties your Person or Organization node to canonical references such as Wikidata, Wikipedia, LinkedIn, GitHub, ORCID, or verified profiles. It does not “submit” an entity to Google’s Knowledge Graph. It reduces ambiguity.
A clean author node looks like this:
```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/authors/jane-doe#person",
  "name": "Jane Doe",
  "jobTitle": "Senior AI Search Analyst",
  "worksFor": {
    "@type": "Organization",
    "@id": "https://example.com/#organization"
  },
  "knowsAbout": [
    "Generative Engine Optimization",
    "Information Retrieval",
    "AI Search"
  ],
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000",
    "https://www.linkedin.com/in/janedoe",
    "https://orcid.org/0000-0000-0000-0000"
  ]
}
```
Attribute density beats schema presence. Capconvert reported a 61.7% citation rate for rich Product + Review schema versus 41.6% for generic schema. Pages with no schema were cited 59.8% of the time in that study, which is the warning: sparse boilerplate schema can underperform having none at all.
Also update your FAQ assumptions. FAQPage rich results were fully retired on May 7, 2026. FAQPage remains valid vocabulary and can still be parsed, but it no longer earns the old Google accordion treatment. Keep it only where the FAQ is visible and useful. Do not invent Q&A blocks for markup.
Freshness And Authority
Freshness is not a substitute for authority. It is the tiebreaker that often decides which authoritative page gets cited.
Perplexity exposes the signal most clearly. Its Sonar API has recency filters and separate publication-date and last-updated filters. SE Ranking’s 216,524-page study estimated freshness at roughly 44.2% of Perplexity ranking weight. Georion reported 76.4% of Perplexity citations going to pages updated within 30 days; that is a vendor figure without a fully public methodology, so treat the exact number as directional.
Still, the implementation is straightforward:
- Show a visible `datePublished`.
- Show a visible and honest `dateModified`.
- Emit matching dates in Article schema.
- Update pages when facts, product names, crawlers, APIs, or measurements change.
- Keep a changelog for pillar pages where readers need to trust the update history.
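One way to keep visible dates and schema dates from drifting apart is to render both from a single source of truth on the server, which also puts the JSON-LD in the raw HTML instead of relying on Tag Manager. A minimal sketch, assuming your CMS hands you `date` objects; `render_dates`, the markup, and the author `@id` are illustrative.

```python
import json
from datetime import date


def _human(d: date) -> str:
    return f"{d.day} {d:%B %Y}"  # e.g. "2 May 2026"


def render_dates(headline: str, author_id: str, published: date, modified: date) -> str:
    """Render Article JSON-LD and the visible date line from the same two dates."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    schema = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@id": author_id},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    # Escape "</" so the JSON cannot close the surrounding <script> element early.
    json_ld = json.dumps(schema, ensure_ascii=False).replace("</", "<\\/")
    visible = (
        f'<p>Published <time datetime="{published.isoformat()}">{_human(published)}</time>; '
        f'updated <time datetime="{modified.isoformat()}">{_human(modified)}</time></p>'
    )
    return f'<script type="application/ld+json">{json_ld}</script>\n{visible}'


if __name__ == "__main__":
    print(render_dates("GEO playbook", "https://example.com/authors/jane-doe#person",
                       date(2026, 1, 15), date(2026, 5, 2)))
```

Because `dateModified` only changes when the stored date changes, bumping it becomes an editorial decision tied to a real update rather than a template side effect.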
The cadence that works for competitive GEO pages is 30 days for volatile topics and 90 days for stable evergreen. Do not fake freshness. Engines and users can detect a page where only the timestamp changed.
Authority remains the heavier layer. Ahrefs found only 12% average overlap between AI citations and Google top-10 results across engines, but Perplexity was the most Google-aligned at 28.6%. That means conventional SEO equity transfers unevenly, not never.
The GEO16 framework names freshness, passage match, and page quality as top pillars. That maps to the field evidence: a fresh page with weak authority might get crawled, but a fresh page with strong entity signals, clear authorship, original data, and clean passage structure gets cited.
Llms.txt: The Honest Verdict
Publish llms.txt if you have documentation, an API, an SDK, or agent-facing content. Do not expect it to lift AI answer citations.
That is the cleanest reading of the evidence. Profound, iPullRank, Search Engine Land, Trakkr, SE Ranking, OtterlyAI, and Thomas Peham’s GEO experiments all converge on the same answer: no controlled test has found a reliable citation-rate lift from adding llms.txt.
The file is often misunderstood because it answers a different question.
| File | Question it answers | Standard status |
|---|---|---|
| `robots.txt` | May this crawler fetch this path? | RFC 9309 |
| `sitemap.xml` | What URLs exist, and when did they change? | Established search standard |
| `llms.txt` | What content matters most for an LLM or agent? | 2024 proposal |
| `llms-full.txt` | Can I load a whole curated corpus into context? | Community convention |
For answer engines, llms.txt is neutral. For coding agents and RAG pipelines, it is useful. Cloudflare, Anthropic, Stripe, Cursor, and many documentation platforms publish it because agents can fetch a clean Markdown map faster than crawling a whole docs site.
A sane llms.txt for a technical publication is short:
```markdown
# Example Research Library

> Independent technical analysis for senior AI engineers and operators.

## Core Guides

- Generative Engine Optimization: The 2026 Playbook
- AI Share of Voice: Measurement Framework
- AI Crawler Control: Robots.txt and Edge Enforcement

## References

- AI crawler inventory
- Schema.org implementation notes
- llms.txt policy note
```
The anti-pattern is treating llms.txt as a ranking spell. It is not. Ship it because it costs little and helps intentional agents. Keep it out of your citation forecast.
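If you do ship it, generating the file from the same page inventory that feeds your sitemap keeps it from drifting without anyone hand-curating it. A minimal sketch following the proposal's layout (H1 title, blockquote summary, H2 sections of Markdown links); `render_llms_txt` is a hypothetical helper and the URL below is a placeholder.

```python
def render_llms_txt(title: str, summary: str,
                    sections: dict[str, list[tuple[str, str]]]) -> str:
    """Build an llms.txt body: H1 title, blockquote summary, H2 sections of links."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for section, links in sections.items():
        out += [f"## {section}", ""]
        out += [f"- [{name}]({url})" for name, url in links]
        out.append("")
    return "\n".join(out).rstrip("\n") + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Research Library",
        "Independent technical analysis for senior AI engineers and operators.",
        {"Core Guides": [("Generative Engine Optimization: The 2026 Playbook",
                          "https://example.com/guides/geo/")]},
    ))
```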
Getting Cited By Perplexity
Perplexity is the easiest major engine to inspect because citations are central to the product. It is also the easiest to misread.
The arXiv 2604.25707 study measured Perplexity at 16.35 sources per prompt and 0.0646 absorption per citation. That means it distributes citations widely, but each citation has low influence on the final answer. ChatGPT showed the opposite pattern: 6.88 sources and 0.2713 absorption.
Perplexity’s four practical signals are:
| Signal | What to do |
|---|---|
| Freshness | Refresh competitive pages every 30 days; emit dateModified |
| Passage match | Put the direct answer under a question-style H2 |
| Authority | Earn mentions and links from trusted industry sources |
| Specificity | Use numbers, entities, comparisons, and tables |
Perplexity avoids stale, anonymous, thin, and hard-to-fetch content. JS-gated pages lose. Generic affiliate roundups lose. Undated evergreen pages lose when a recent competitor gives the same answer with a timestamp.
The Perplexity checklist is short:
- Allow `PerplexityBot` and `Perplexity-User`.
- Verify requests against Perplexity’s published IP JSON.
- Put an answer capsule above the fold.
- Use `dateModified`.
- Add author and `sameAs`.
- Include at least 10 specific entities or numbers in long-form pages.
- Re-query the target prompts in a clean session after the next crawl.
One caution: Georion’s reported 4.1x citation lift for definitive above-the-fold answers is vendor data without public methodology. The pattern is right. The multiplier is not yet a law.
Getting Cited By Google AI Overviews
Google’s public guidance is deliberately conservative: there are no special optimizations, machine-readable files, or schema requirements for AI Overviews or AI Mode. The best practices are the same Search fundamentals: allow crawling, make content findable, ensure important text is textual, match structured data to visible content, and use Search Console.
But the empirical layer adds nuance. Ahrefs found 76% of AI Overview citations come from outside the top 10 organic results. That means AI Overviews are not simply a decorated SERP. They use Google’s index, but retrieval and synthesis can surface different URLs.
AI Mode differs from AI Overviews in interface and query behavior. AI Overviews appear in standard Search results for selected queries. AI Mode is a more conversational, fan-out search experience. Both are Google Search AI features, and Google’s control layer remains Googlebot plus standard preview controls.
For Google, the playbook is:
- Do not block Googlebot.
- Do not assume Google-Extended controls AI Overviews.
- Keep critical content in text, not images or scripts.
- Use `nosnippet`, `max-snippet`, and `data-nosnippet` only when you understand the citation tradeoff.
- Keep schema accurate, but do not expect JSON-LD to lift AIO citations.
- Build pages that answer sub-queries, not just the head query.
The last point matters because Google’s AI systems often decompose a user query into multiple related searches. A page that answers one precise sub-question can be cited even when it does not rank for the broader head term.
Getting Cited By ChatGPT
ChatGPT search visibility runs through OpenAI’s crawler split. OpenAI says OAI-SearchBot is used to surface websites in ChatGPT search features. Sites opted out of OAI-SearchBot will not be shown in ChatGPT search answers, though they may still appear as navigational links. GPTBot is separate and may be used for training.
So the core ChatGPT control is:
```text
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
```
ChatGPT cites fewer sources than Perplexity, but each source carries more of the generated answer. That is the operational meaning of 6.88 sources and 0.2713 absorption in arXiv 2604.25707. For ChatGPT, being one of six sources matters more than being one of sixteen in Perplexity.
ChatGPT tends to reward pages that are:
- crawlable by OAI-SearchBot;
- available as raw HTML;
- written in clean explanatory prose;
- backed by source links and named data;
- strong on entity authority;
- not blocked by CDN bot rules.
When we debug ChatGPT absence, the failure is usually one of four things: OAI-SearchBot blocked, content rendered client-side, no page matching the prompt intent, or insufficient authority relative to sources already cited.
The fix is not adding a paragraph that says “ChatGPT should cite this.” The fix is a page that answers the exact prompt better than the cited competitor, in HTML the crawler can read, on a domain the retriever trusts.
Measuring AI Share Of Voice
AI share of voice is the weighted share of citations and mentions your brand receives across a representative prompt library.
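The formula we use is:
$$\mathrm{SOV}(\text{brand}) = \frac{1}{|P|}\sum_{p \in P}\left[\frac{1}{|E|}\sum_{e \in E}\;\sum_{k=1}^{K_{p,e}} \frac{c(k,\text{brand})}{r_k}\right]$$
Here $P$ is the prompt library, $E$ is the set of engines, and $K_{p,e}$ is the number of citations engine $e$ returns for prompt $p$. $c(k,\text{brand})$ is 1 when citation $k$ belongs to the brand and 0 otherwise, and $r_k$ is a position-decay term. Setting $r_k = \log(1+k)$, so citation $k$ carries weight $1/\log(1+k)$, is usually more stable than $r_k = k$ because it does not make every citation after the first feel worthless.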
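The formula translates directly into code. A minimal sketch, assuming results are stored as prompt → engine → ordered list of cited domains and that the decay uses a base-2 log, $r_k = \log_2(1+k)$ (the base is an assumption; it gives the first citation weight 1). The optional `normalize` flag divides each answer's score by that answer's total citation weight, a variant that bounds each answer between 0 and 1; it is not part of the formula as stated.

```python
import math
from statistics import fmean

# prompt -> engine -> cited domains, in the order the engine displayed them
Results = dict[str, dict[str, list[str]]]


def r(k: int) -> float:
    """Position-decay term r_k for 1-indexed position k (log2 base is an assumption)."""
    return math.log2(1 + k)


def answer_score(cited: list[str], brand: set[str], normalize: bool = False) -> float:
    """Inner sum over citations k of c(k, brand) / r_k for one engine's answer."""
    score = sum(1 / r(k) for k, domain in enumerate(cited, start=1) if domain in brand)
    if not normalize:
        return score
    # Variant, not in the stated formula: divide by the answer's total weight.
    total = sum(1 / r(k) for k in range(1, len(cited) + 1))
    return score / total if total else 0.0


def share_of_voice(results: Results, brand: set[str], normalize: bool = False) -> float:
    """Average over prompts of the average over engines of answer_score."""
    return fmean(
        fmean(answer_score(cited, brand, normalize) for cited in engines.values())
        for engines in results.values()
    )
```

Without normalization, a brand that takes every citation in a long answer scores above 1, which is why the normalized variant is easier to read as a share. Report whichever you choose consistently.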
Track two metrics:
| Metric | Meaning |
|---|---|
| Citation SOV | Linked citation presence only |
| Combined SOV | Citations plus unlinked brand mentions |
Citation SOV is stricter. Combined SOV captures model awareness. Do not blend them without labeling.
A defensible prompt library has 100 to 500 prompts across five strata:
| Stratum | Example |
|---|---|
| Branded | “Is Acme good for enterprise AI observability?” |
| Category | “Best AI observability tools for regulated teams” |
| Comparison | “Acme vs LangSmith vs Arize” |
| Problem-led | “How do I detect hallucinations in production agents?” |
| Long-tail | Real sales, support, and Search Console queries |
Run each prompt at least 3 times per engine. LLM answers are non-deterministic, and one run is a sample of one. Report medians and interquartile ranges. If you have 100 prompts and true SOV around 5%, expect error bands around ±5 to 7 points unless you expand the sample.
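A minimal sketch of that reporting step, assuming you already have one score per prompt per run (for example, the formula above applied to a single run). It collapses repeated runs to a median per prompt, computes an interquartile range, and uses a percentile bootstrap over prompts for a rough 95% band. The bootstrap is one reasonable choice, not the only one; resampling prompts rather than runs reflects that the prompt library is itself a sample, and it shows how wide the band is for your own library instead of relying on a rule of thumb.

```python
import random
import statistics


def per_prompt_medians(runs_by_prompt: dict[str, list[float]]) -> list[float]:
    """Collapse each prompt's repeated runs to a median score."""
    return [statistics.median(runs) for runs in runs_by_prompt.values()]


def iqr(values: list[float]) -> float:
    """Interquartile range using statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def bootstrap_interval(scores: list[float], n_resamples: int = 2000,
                       level: float = 0.95, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean score, resampling prompts."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lo = means[int(alpha * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return lo, hi
```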
Google AI Overviews is the measurement exception. As of June 2026, it has no public API. ChatGPT, Perplexity, Gemini, and Claude expose programmatic citation paths. AI Overviews requires browser-based sampling or a vendor that handles it.
The highest-value output is the discovery-gap audit: every prompt where a competitor is cited and you are absent. For each gap, save the prompt, engine, cited competitor URL, cited passage, your closest page, and the suspected cause.
Causes usually collapse into six buckets:
| Gap cause | Fix |
|---|---|
| No matching page | Create a focused answer page |
| Weak passage match | Rewrite the relevant section answer-first |
| Stale page | Update data and dateModified |
| JS invisibility | SSR or SSG the content |
| Low authority | Earn external mentions and links |
| Blocked crawler | Fix robots/WAF rules |
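The mechanical half of the gap audit, finding every answer that cites a competitor and not you, is easy to automate; the diagnosis columns still need a human. A minimal sketch, assuming archived results are stored as prompt → engine → cited URLs and that brand and competitor identity is decided by domain. `discovery_gaps` and the `Gap` fields are hypothetical names chosen to mirror the fields listed above.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# prompt -> engine -> cited URLs, as archived from each run
Results = dict[str, dict[str, list[str]]]


def domain(url: str) -> str:
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


@dataclass
class Gap:
    prompt: str
    engine: str
    competitor_urls: list[str]
    # Filled in by a reviewer from the archived answer and your own site:
    cited_passage: str = ""
    closest_page: str = ""
    suspected_cause: str = ""  # one of the six buckets in the table above


def discovery_gaps(results: Results, brand: set[str], competitors: set[str]) -> list[Gap]:
    """Every (prompt, engine) answer that cites a competitor and not the brand."""
    gaps: list[Gap] = []
    for prompt, engines in results.items():
        for engine, urls in engines.items():
            domains = [domain(u) for u in urls]
            if any(d in brand for d in domains):
                continue
            hits = [u for u, d in zip(urls, domains) if d in competitors]
            if hits:
                gaps.append(Gap(prompt, engine, hits))
    return gaps
```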
Tools can help: Profound, Otterly, Ahrefs Brand Radar, AthenaHQ, Scrunch, Peec, and open-source trackers all solve parts of this. The tool matters less than keeping the prompt set fixed and archiving raw outputs.
Content Patterns That Earn Citations
The reliable pattern is not “write for AI.” It is “make the evidence easy to lift.”
A good citation paragraph has four properties:
- It answers the question directly.
- It contains a specific number, named entity, or date.
- It links or attributes the source.
- It stands alone outside the rest of the article.
Weak:
AI crawlers are becoming more important, and many websites need to think about how their content appears to them.
Strong:
Vercel and MERJ found no evidence of JavaScript execution across more than 500M GPTBot fetches, which means client-rendered article bodies can be invisible to ChatGPT search even when Googlebot indexes them.
Use definitions as quotable primitives:
Generative engine optimization is the practice of making content retrievable, extractable, and citable by AI answer engines.
Use comparison tables whenever a user would naturally compare options. Engines cite structured comparisons because they reduce synthesis work.
Use original data where possible. The original GEO paper reported large gains from adding citations, quotations, and statistics, with maximum lifts around 40% on its PAWC metric. Later work such as C-SEO Bench makes the effect size less certain in production, but the direction remains useful: numbers and sources make passages easier to ground.
For long-form pages, aim for at least 10 specific entities or numbers. Past roughly 19, returns appear to diminish in vendor studies, but the broader point is simpler: generic prose loses to concrete evidence.
The GEO Audit Checklist
The audit covers six areas, each mapping to a section above:
- Crawler Access: robots.txt and edge rules
- Content Extractability: static HTML and structure
- Structured Data: schema.org and entities
- Freshness & Authority: dates, authors, sameAs
- Measurement: AI share of voice
- Content Patterns: quotable, citable writing
Use the checklist as an operating review, not a one-time launch task. GEO decays when crawlers change, source preferences shift, models update, and your “fresh” pages age out of the retrieval set.
What This Means For You
Run GEO as a 30-day operating rhythm.
This week, fix access and visibility. Deploy the robots split. Allow OAI-SearchBot, Claude-SearchBot, PerplexityBot, and relevant user fetchers. Block GPTBot, ClaudeBot, CCBot, Bytespider, and other training crawlers if your policy is training opt-out. Verify with logs, not assumptions. Run curl -A against your top 20 pages and confirm the article body is in raw HTML.
This month, fix the pages that already matter. Take your top 20 commercial or strategic pages and add answer-first sections, current dates, visible authors, source-backed statistics, comparison tables, and accurate Article/Person/Organization schema. Publish llms.txt if you have docs, but do not spend a sprint hand-curating it for citation lift.
Then measure. Build 100 to 300 prompts from Search Console, sales calls, support tickets, and known comparison queries. Run them across ChatGPT, Perplexity, Gemini, Claude, Copilot, and sampled AI Overviews. Create a discovery-gap report. Assign the top 10 gaps to content, engineering, or authority work.
The discipline is not complicated. It is just cross-functional. Engineering owns fetch and render. Editorial owns extractability and evidence. Growth owns SOV measurement. Leadership owns the policy call on training crawler access.
The old SEO dashboard told you where you ranked. The GEO dashboard tells you whether the answer engines know you enough to cite you.
Sources
- Ahrefs Brand Radar and AI visibility data
- Ahrefs: AI search overlap study
- Ahrefs: schema and AI citations study
- OpenAI: Overview of OpenAI crawlers
- Perplexity: Perplexity Crawlers
- Google Search Central: AI features and your website
- Google Search Central: structured data introduction
- Google Search Central: FAQPage structured data
- Cloudflare: Content Independence Day
- Cloudflare: What is Pay Per Crawl?
- Cloudflare: Perplexity stealth crawler report
- Vercel: The rise of the AI crawler
- SparkToro: 2024 zero-click search study
- Gartner: search volume will drop 25% by 2026
- GEO: Generative Engine Optimization, arXiv 2311.09735
- From Citation Selection to Citation Absorption, arXiv 2604.25707
- GEO16 framework, arXiv 2509.10762
- C-SEO Bench, arXiv 2506.11097
- llms.txt proposal
- Jeremy Howard: llms.txt proposal
- RFC 9309: Robots Exclusion Protocol
- Schema.org sameAs
- Perplexity Sonar filters
- Perplexity Sonar API docs
- Gemini API grounding with Google Search
- Anthropic citations documentation
- Profound: AI platform citation patterns
- OtterlyAI: llms.txt experiment
- Search Engine Journal: llms.txt for AI SEO
- Seer Interactive: ChatGPT conversion learnings
