What is generative engine optimization?

Generative engine optimization is the practice of making your content retrievable, extractable, and citable by AI answer engines such as ChatGPT, Perplexity, Gemini, and Google AI Overviews. It overlaps with SEO, but the optimized unit changes from a ranked page to a cited passage.

How is GEO different from SEO?

SEO optimizes pages for rankings and clicks. GEO optimizes passages for retrieval, grounding, and citation inside synthesized answers. The technical foundation still matters, but the final competition happens at the chunk and citation layer.

Should I block GPTBot?

Most publishers should block GPTBot for training while allowing OAI-SearchBot for ChatGPT search visibility. OpenAI documents these controls as independent, so you can opt out of model-training ingestion without opting out of ChatGPT search citations.

Does llms.txt improve AI citations?

No controlled test has shown a citation-rate lift from llms.txt as of June 2026. It is still useful for agent infrastructure, coding tools, and documentation retrieval, but it should not be sold as a ChatGPT, Perplexity, or AI Overview ranking lever.

Does schema.org help AI engines cite pages?

Schema helps machines understand entities and qualify pages for rich results, but the best controlled evidence does not show a citation lift. Ahrefs tracked 1,885 pages that added JSON-LD and found AI Overview presence dropped 4.6%, while ChatGPT and AI Mode changes stayed inside noise.

How do you measure AI share of voice?

Build a fixed prompt library, run each prompt across engines, capture citations and mentions, then weight appearances by citation position and prompt importance. Report citation SOV and combined SOV separately, and use discovery gaps to decide what content to fix next.

The GEO Playbook: Getting Cited by AI Engines

Ahrefs found the number that explains why GEO exists: 76% of Google AI Overview citations come from pages outside the top 10 organic results.

That breaks the old operating model. A page can lose the classic SERP and still win the answer. A page can rank first and never appear in the paragraph the buyer actually reads.

Generative engine optimization is the discipline that sits in that gap. It is not SEO with new labels. It is the work of making a page crawlable by AI retrieval bots, readable without JavaScript, easy to chunk into answer passages, trusted enough to rerank, and measurable across ChatGPT, Perplexity, Google AI Overviews, Gemini, Claude, and Copilot.

The citation is now the scarce unit. Treat it that way.

TL;DR

GEO is the operating layer for AI citations. The workflow is: allow retrieval crawlers, control training crawlers, ship static HTML, write answer-first passages, keep pages fresh, use schema for entity clarity, ignore unsupported llms.txt ranking claims, and measure AI share of voice weekly.

Do not optimize for “AI” as one surface. ChatGPT, Perplexity, AI Overviews, and Gemini use different retrieval systems, crawler rules, and citation formats.

Do not over-credit on-page tricks. The original GEO paper found strong lifts from citations, quotations, and statistics, but later controlled work shows domain authority, passage match, freshness, and page quality dominate.

Do not bury the answer. Engines retrieve chunks, not essays. Paragraph four rarely wins when paragraph one from a competitor gives the direct answer, the number, and the source.

Do measure. A GEO program without a prompt library is opinion management. A GEO program with 100 to 500 prompts, citation SOV, and discovery-gap reports becomes an operating system.

Key Takeaways

AI answers are already mainstream. ChatGPT reached 900M weekly active users in February 2026. Google AI Overviews reached 2B monthly users. Gemini passed 750M monthly active users.
Classic rankings are not enough. Ahrefs found 76% of AI Overview citations come from URLs outside the organic top 10.
JavaScript is still a crawlability failure. Vercel and related testing found roughly 69% of AI crawlers cannot execute JavaScript, including GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot.
Crawler intent matters. GPTBot crawl volume fell 87% in 2025 while OAI-SearchBot rose 312%, per Etavrian, because publishers learned to block training and allow retrieval.
Robots.txt is not enforcement. TollBit measured roughly 30% of AI bot requests ignoring robots.txt by Q4 2025.
Perplexity is broad but shallow. arXiv 2604.25707 measured Perplexity at about 16 sources per prompt with 0.0646 absorption per citation; ChatGPT used 6.88 sources with 0.2713 absorption.
Schema is infrastructure, not magic. Ahrefs’ 1,885-page study found adding JSON-LD did not increase AI Overview citations; presence fell 4.6%.

[Chart: AI Overview citations from outside Google top 10]

Why GEO Matters Now

The scale is no longer speculative. ChatGPT at 900M weekly active users is a distribution layer, not a chatbot curiosity. Google AI Overviews at 2B monthly users means generated answers sit directly in the world’s largest search product. Gemini at 750M monthly active users means Google has a second AI discovery surface outside the traditional SERP.

At the same time, the click supply is shrinking. SparkToro found only 360 of every 1,000 U.S. Google searches end in a click to the open web (the EU figure is 374). Gartner predicted traditional search volume would drop 25% by 2026 as chatbots and agents absorb more queries.

That is the zero-click problem in operational terms: rank tracking can look stable while the actual decision surface moves upstream.

The economics are uneven. Perplexity referrals converted at 10.5% versus 1.76% for Google organic in Seer Interactive’s reported data. That does not mean every AI referral converts six times better. It does mean the clicks that survive citation filtering can be high-intent.

The right conclusion is not “traffic is dead.” It is narrower: traffic is being rationed by citation systems. If you are not cited, you are often not considered.

How AI Engines Decide What To Cite

Every major grounded answer engine runs a version of the same pipeline:

1. Discover URLs.
2. Fetch pages.
3. Extract readable text.
4. Chunk the page into passages.
5. Embed or otherwise represent those chunks.
6. Index them.
7. Retrieve candidates for a user prompt.
8. Rerank candidates for authority, match, freshness, and safety.
9. Ground the generated answer in selected passages.
10. Render citations.

GEO work maps to that pipeline. Robots.txt affects fetch. Static HTML affects extraction. Answer-first structure affects chunk quality. Freshness and authority affect reranking. Tables, statistics, and source links affect grounding. AI share of voice measures the citation layer.

The engines differ in the details.

| Engine | Retrieval pattern | Citation behavior | Operational note |
| --- | --- | --- | --- |
| ChatGPT | OAI-SearchBot for search index; ChatGPT-User for user-triggered fetches | Fewer sources, higher per-source absorption | Allow OAI-SearchBot if you want ChatGPT search visibility |
| Perplexity | PerplexityBot index plus Perplexity-User live fetches | Many sources, low per-source absorption | Freshness and passage match are unusually important |
| Google AI Overviews | Googlebot and Google Search index | Broad, SEO-shaped source pool | Google says no special schema or AI file is required |
| Gemini | Google ecosystem plus grounding APIs | Depends on product surface | Google-Extended controls training/grounding in some Google systems, not AI Overviews |
| Claude | Claude-SearchBot and Claude-User where available | Strong citation metadata in API contexts | Anthropic split training, search, and user fetchers in 2026 |

The central shift from SEO is the unit of value. SEO optimizes the page. GEO optimizes the passage. A 3,000-word article does not get cited. A 70-word chunk inside it does.

That is why the best GEO content reads almost blunt at the paragraph level: claim first, number second, source third, implication fourth.
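To make the passage-level point concrete, here is a toy sketch of chunking and ranking. It is not how any engine works: real systems use embeddings and learned rerankers, while this stand-in splits on blank lines and scores by word overlap. All function names and the sample page are hypothetical.

```python
import re

def chunk_paragraphs(text):
    """Split on blank lines; each paragraph is one candidate passage."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def tokens(s):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def rank_passages(prompt, passages):
    """Order passages by the share of prompt terms they contain (toy score)."""
    q = tokens(prompt)
    if not q:
        return []
    scored = [(len(q & tokens(p)) / len(q), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical three-paragraph page.
page = """AI crawlers are becoming more important for many websites.

Vercel and MERJ found no evidence of JavaScript execution across more than 500M GPTBot fetches.

Contact us for a demo."""

prompt = "Does GPTBot execute JavaScript?"
ranked = rank_passages(prompt, chunk_paragraphs(page))
for score, passage in ranked:
    print(round(score, 2), passage[:60])
```

Even with this crude score, the paragraph that names the entity and the finding outranks the generic opener, and the page as a whole never enters the comparison.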

Crawler Access: The Robots.txt Split

The first production mistake is treating all AI bots the same.

Training crawlers ingest content for model training. Retrieval crawlers fetch or index content so a product can cite it. User-action fetchers retrieve a URL because a person asked an assistant to inspect it. Those jobs have different economics.

OpenAI documents the split clearly. OAI-SearchBot is used to surface websites in ChatGPT search results. GPTBot is used to crawl content that may be used in training foundation models. ChatGPT-User is triggered by user actions and is not used for automatic search indexing.

That lets you make the obvious operator choice: allow retrieval, block training.

When we run this split for publisher and B2B sites, the verification step is always server logs. After a deploy, we expect OAI-SearchBot and PerplexityBot to keep returning 200s. We expect GPTBot and ClaudeBot to fall to 403 at the edge or stop at robots.txt if the vendor honors it. If GPTBot traffic keeps coming from verified OpenAI IP ranges after Disallow, the robots update has not propagated. If it keeps coming from non-verified IPs, you have a spoofing problem, not a robots problem.
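A minimal sketch of that log check, assuming access logs in the common "combined" format. The regular expression, the bot list, and the sample lines are illustrative assumptions; adapt them to your own log layout. Note that this tallies by user-agent string only and does not verify IP ranges.

```python
import re
from collections import Counter, defaultdict

# Bot tokens taken from the crawler names discussed in this article.
AI_BOTS = ["OAI-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot"]

# Combined log format: ip - - [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def tally_bot_statuses(lines):
    """Return {bot_token: Counter({status_code: hits})} for known AI bots."""
    tallies = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined log format
        ua = match.group("ua").lower()
        for bot in AI_BOTS:
            if bot.lower() in ua:
                tallies[bot][int(match.group("status"))] += 1
                break
    return tallies

# Hypothetical log lines using documentation IP addresses.
sample = [
    '192.0.2.10 - - [01/Jun/2026:10:00:00 +0000] "GET /guide/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"',
    '192.0.2.11 - - [01/Jun/2026:10:00:05 +0000] "GET /guide/ HTTP/1.1" 403 162 "-" "Mozilla/5.0; compatible; GPTBot/1.2"',
    '192.0.2.12 - - [01/Jun/2026:10:00:09 +0000] "GET /guide/ HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]
report = tally_bot_statuses(sample)
for bot, counts in sorted(report.items()):
    print(bot, dict(counts))
```

After a deploy, the retrieval bots should show 200s and the training bots should show 403s or disappear.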

A current baseline file looks like this:

```text
# ===== TRAINING / INGESTION: BLOCK =====
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Google-Extended
Disallow: /

# ===== SEARCH / RETRIEVAL: ALLOW =====
# A crawler that matches a named group ignores the * group (RFC 9309),
# so the private-path rules are repeated in every allow group.
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /

# ===== USER-ACTION FETCHERS: ALLOW =====
User-agent: ChatGPT-User
User-agent: Claude-User
User-agent: Perplexity-User
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /

# ===== GOOGLE SEARCH: DO NOT BLOCK IF YOU WANT AIO VISIBILITY =====
User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /

# ===== EVERYONE ELSE =====
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Allow: /
```
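Before deploying a file like the baseline above, it can help to test the policy offline. The sketch below uses Python's standard `urllib.robotparser` on an abbreviated policy of the same shape. The `allowed` helper and the example URLs are assumptions for illustration, and Python's parser may differ from a given crawler's matching in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated policy in the same shape as the baseline file above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
Allow: /
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the policy text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

page = "https://www.example.com/guide/"
for bot in ["GPTBot", "OAI-SearchBot", "SomeOtherBot"]:
    print(bot, allowed(ROBOTS_TXT, bot, page))
```

Running the same check against the real file for every bot in your policy catches typos in user-agent tokens before crawlers do.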

The Google line matters. Blocking Google-Extended does not remove you from AI Overviews. Google says AI features in Search are controlled through Googlebot and standard Search preview controls such as nosnippet, max-snippet, data-nosnippet, and noindex. Google-Extended is for limiting training and grounding in some other Google systems.

Robots.txt is a declaration, not a fence. TollBit measured roughly 30% non-compliance by Q4 2025. Cloudflare responded at internet scale: on July 1, 2025, it changed the default to block AI crawlers unless publishers opt in, and it launched Pay Per Crawl, where crawlers can receive 402 Payment Required responses unless they present payment intent.

[Chart: AI bot requests ignoring robots.txt]

If you use Cloudflare, Fastly, Akamai, or your own WAF, the production rule is simple: match user-agent and verified IP range. User-agent alone is spoofable. IP alone is too broad. The conjunction is the control.
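The conjunction can be sketched in a few lines. The CIDR ranges below are hypothetical placeholders from the RFC 5737 documentation blocks, not real vendor ranges; in production you would load the ranges each vendor publishes. The function name and return labels are also assumptions.

```python
import ipaddress

# HYPOTHETICAL ranges (RFC 5737 documentation blocks). Replace them with
# the IP ranges each vendor actually publishes.
VERIFIED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}

def classify_request(user_agent, remote_ip):
    """Return 'verified', 'spoofed', or 'not-a-known-bot'."""
    ua = user_agent.lower()
    ip = ipaddress.ip_address(remote_ip)
    for bot, cidrs in VERIFIED_RANGES.items():
        if bot.lower() in ua:
            in_range = any(ip in ipaddress.ip_network(c) for c in cidrs)
            # The user-agent claims to be the bot; only the IP can confirm it.
            return "verified" if in_range else "spoofed"
    return "not-a-known-bot"

print(classify_request("Mozilla/5.0; compatible; GPTBot/1.2", "192.0.2.44"))
print(classify_request("Mozilla/5.0; compatible; GPTBot/1.2", "203.0.113.9"))
print(classify_request("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0", "203.0.113.9"))
```

A "spoofed" result is the case where robots.txt is irrelevant and the edge rule has to do the blocking.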

There is a cost to blocking everything. Rutgers and Wharton researchers found top-30 news publishers that blocked AI bots saw roughly a 23% total traffic drop and 14% human traffic drop. Raptive found no measurable traffic effect for mid-sized sites. The difference is brand scale: household-name publishers are part of the answer ecosystem; smaller sites usually are not.

The default for most technical operators is therefore: block training, allow retrieval, enforce at the edge, and re-audit quarterly.

Content Extractability: What Crawlers Can Read

A page that renders beautifully in Chrome can be blank to an AI crawler.

Vercel and MERJ analyzed more than 500M GPTBot fetches and found no evidence of JavaScript execution. Capconvert reported GPTBot downloads .js files in 11.5% of requests and ClaudeBot in 23.84%, but downloading is not rendering. The crawler receives the script. It does not run the app.

The practical number to carry is 69%: roughly 69% of AI crawlers tested cannot execute JavaScript.

[Chart: AI crawler JavaScript capability]

That creates a split reality:

| Page behavior | Googlebot | GPTBot / OAI-SearchBot / ClaudeBot / PerplexityBot |
| --- | --- | --- |
| Server-rendered article HTML | Readable | Readable |
| Static site generation | Readable | Readable |
| Client-rendered React shell | Usually readable after WRS | Often blank |
| JSON-LD injected by Tag Manager | Often visible to Google | Often invisible |
| Content behind JS-gated tabs | Risky | Usually invisible |

Test it with curl before you buy tools:

```bash
curl -s -A "GPTBot" https://www.example.com/page/ | less
curl -s -A "OAI-SearchBot" https://www.example.com/page/ | less
curl -s -A "ClaudeBot" https://www.example.com/page/ | less
curl -s -A "PerplexityBot" https://www.example.com/page/ | less
```

If the article body, H2s, tables, and JSON-LD are not in the raw response, you are not doing GEO yet. You are hoping non-rendering crawlers behave like Googlebot. Most do not.
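Once you have the raw response, the check itself can be automated. This is a minimal sketch using only the standard library: it ignores everything inside `<script>` and `<style>`, then looks for phrases you expect in the article body. The class name, the helper, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser

class RawPageAudit(HTMLParser):
    """Collect the text a non-rendering crawler could read from raw HTML."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0      # > 0 while inside <script> or <style>
        self.text_parts = []
        self.has_json_ld = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if dict(attrs).get("type") == "application/ld+json":
                self.has_json_ld = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

def audit_raw_html(html, required_phrases):
    """Report which expected phrases are missing and whether JSON-LD is present."""
    parser = RawPageAudit()
    parser.feed(html)
    text = " ".join(parser.text_parts).lower()
    missing = [p for p in required_phrases if p.lower() not in text]
    return {"missing": missing, "has_json_ld": parser.has_json_ld}

# Hypothetical responses: one server-rendered, one client-rendered shell.
ssr = ('<html><body><h2>What is GEO?</h2><p>GEO makes content citable.</p>'
       '<script type="application/ld+json">{"@type":"Article"}</script></body></html>')
csr = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

phrases = ["What is GEO?", "citable"]
print(audit_raw_html(ssr, phrases))
print(audit_raw_html(csr, phrases))
```

Feed it the output of the curl commands above for your top pages; any page with a non-empty `missing` list is a rendering problem, not a content problem.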

The fix is boring and durable: SSR or SSG. Next.js Server Components, Astro, SvelteKit, Rails, Django, static Markdown pipelines, and server-rendered docs all work. Client islands are fine for calculators and filters. The canonical content must exist before hydration.

Then structure the content for extraction:

| Pattern | Implementation |
| --- | --- |
| BLUF answer | Put the direct answer in the first 100 to 150 words of a section |
| Question headers | Use H2/H3s that match real prompts |
| One claim per paragraph | Keep citation-worthy claims self-contained |
| Tables for comparisons | Use real Markdown/HTML tables, not images |
| Numbers with sources | Pair each statistic with named attribution |
| Visible dates | Show `datePublished` and honest `dateModified` |
| Visible author | Link byline to a real bio page |

Perplexity in particular rewards the BLUF (bottom line up front) pattern. It tends to lift passages from the top of a page or section, so an answer buried in paragraph four usually loses.

Structured Data: What Schema Actually Does

Schema.org is useful. It is not a magic citation lever.

The strongest controlled study is Ahrefs’ May 2026 analysis of 1,885 pages that added JSON-LD between August 2025 and March 2026. Google AI Overview presence did not rise. It fell 4.6%. AI Mode moved +2.4% and ChatGPT +2.2%, both inside statistical noise.

That result matches Google’s own guidance: there are no special schema.org requirements for AI Overviews or AI Mode. Google says the same SEO fundamentals apply and that structured data should match visible page text.

So why use schema?

Use it for entity clarity, rich-result eligibility, and internal consistency. The important schema types for GEO-adjacent work are Article, NewsArticle, BlogPosting, Person, Organization, Product, Review, BreadcrumbList, and sometimes Dataset.

The highest-value property is sameAs. It ties your Person or Organization node to canonical references such as Wikidata, Wikipedia, LinkedIn, GitHub, ORCID, or verified profiles. It does not “submit” an entity to Google’s Knowledge Graph. It reduces ambiguity.

A clean author node looks like this:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://example.com/authors/jane-doe#person",
  "name": "Jane Doe",
  "jobTitle": "Senior AI Search Analyst",
  "worksFor": {
    "@type": "Organization",
    "@id": "https://example.com/#organization"
  },
  "knowsAbout": [
    "Generative Engine Optimization",
    "Information Retrieval",
    "AI Search"
  ],
  "sameAs": [
    "https://www.wikidata.org/wiki/Q000",
    "https://www.linkedin.com/in/janedoe",
    "https://orcid.org/0000-0000-0000-0000"
  ]
}
```

Attribute density beats schema presence. Capconvert reported 61.7% citation rate for rich Product + Review schema versus 41.6% for generic schema. Pages with no schema were cited 59.8% of the time in that study, which is the warning: sparse boilerplate can underperform doing nothing.

[Chart: AI citation rate by schema richness]

Also update your FAQ assumptions. FAQPage rich results were fully retired on May 7, 2026. FAQPage remains valid vocabulary and can still be parsed, but it no longer earns the old Google accordion treatment. Keep it only where the FAQ is visible and useful. Do not invent Q&A blocks for markup.

Freshness And Authority

Freshness is not a substitute for authority. It is the tiebreaker that often decides which authoritative page gets cited.

Perplexity exposes the signal most clearly. Its Sonar API has recency filters and separate publication-date and last-updated filters. SE Ranking’s 216,524-page study estimated freshness at roughly 44.2% of Perplexity ranking weight. Georion reported 76.4% of Perplexity citations going to pages updated within 30 days; that is a vendor figure without a fully public methodology, so treat the exact number as directional.

Still, the implementation is straightforward:

Show a visible datePublished.
Show a visible and honest dateModified.
Emit matching dates in Article schema.
Update pages when facts, product names, crawlers, APIs, or measurements change.
Keep a changelog for pillar pages where readers need to trust the update history.
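One way to keep the visible dates and the schema dates from drifting apart is to generate both from the same values. The sketch below is an illustration under that assumption; the function names, headline, and URL are hypothetical, and the JSON-LD contains only a minimal subset of Article properties.

```python
import json
from datetime import date

def article_json_ld(headline, url, published, modified):
    """Build Article JSON-LD whose dates mirror the visible byline dates."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "mainEntityOfPage": url,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }

def visible_byline(published, modified):
    """Render the human-readable dates from the same source values."""
    return f"Published {published.isoformat()}, updated {modified.isoformat()}"

published, modified = date(2026, 3, 1), date(2026, 6, 1)
node = article_json_ld("The GEO Playbook", "https://example.com/geo/", published, modified)
print(visible_byline(published, modified))
print(json.dumps(node, indent=2))
```

The guard against a `dateModified` earlier than `datePublished` is a cheap sanity check; it does not detect a timestamp bump with no real content change.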

The cadence that works for competitive GEO pages is 30 days for volatile topics and 90 days for stable evergreen. Do not fake freshness. Engines and users can detect a page where only the timestamp changed.

Authority remains the heavier layer. Ahrefs found only 12% average overlap between AI citations and Google top-10 results across engines, but Perplexity was the most Google-aligned at 28.6%. That means conventional SEO equity transfers unevenly, not never.

The GEO16 framework names freshness, passage match, and page quality as top pillars. That maps to the field evidence: a fresh page with weak authority might get crawled, but a fresh page with strong entity signals, clear authorship, original data, and clean passage structure gets cited.

Llms.txt: The Honest Verdict

Publish llms.txt if you have documentation, an API, an SDK, or agent-facing content. Do not expect it to lift AI answer citations.

That is the cleanest reading of the evidence. Profound, iPullRank, Search Engine Land, Trakkr, SE Ranking, OtterlyAI, and Thomas Peham’s GEO experiments all converge on the same answer: no controlled test has found a reliable citation-rate lift from adding llms.txt.

The file is often misunderstood because it answers a different question.

| File | Question it answers | Standard status |
| --- | --- | --- |
| `robots.txt` | May this crawler fetch this path? | RFC 9309 |
| `sitemap.xml` | What URLs exist, and when did they change? | Established search standard |
| `llms.txt` | What content matters most for an LLM or agent? | 2024 proposal |
| `llms-full.txt` | Can I load a whole curated corpus into context? | Community convention |

For answer engines, llms.txt is neutral. For coding agents and RAG pipelines, it is useful. Cloudflare, Anthropic, Stripe, Cursor, and many documentation platforms publish it because agents can fetch a clean Markdown map faster than crawling a whole docs site.

A sane llms.txt for a technical publication is short:

```markdown
# Example Research Library

> Independent technical analysis for senior AI engineers and operators.

## Core Guides
- Generative Engine Optimization: The 2026 Playbook
- AI Share of Voice: Measurement Framework
- AI Crawler Control: Robots.txt and Edge Enforcement

## References
- AI crawler inventory
- Schema.org implementation notes
- llms.txt policy note
```

The anti-pattern is treating llms.txt as a ranking spell. It is not. Ship it because it costs little and helps intentional agents. Keep it out of your citation forecast.

Getting Cited By Perplexity

Perplexity is the easiest major engine to inspect because citations are central to the product. It is also the easiest to misread.

The arXiv 2604.25707 study measured Perplexity at 16.35 sources per prompt and 0.0646 absorption per citation. That means it distributes citations widely, but each citation has low influence on the final answer. ChatGPT showed the opposite pattern: 6.88 sources and 0.2713 absorption.

[Chart: Citation breadth and absorption by engine]

Perplexity’s four practical signals are:

| Signal | What to do |
| --- | --- |
| Freshness | Refresh competitive pages every 30 days; emit `dateModified` |
| Passage match | Put the direct answer under a question-style H2 |
| Authority | Earn mentions and links from trusted industry sources |
| Specificity | Use numbers, entities, comparisons, and tables |

Perplexity avoids stale, anonymous, thin, and hard-to-fetch content. JS-gated pages lose. Generic affiliate roundups lose. Undated evergreen pages lose when a recent competitor gives the same answer with a timestamp.

The Perplexity checklist is short:

Allow PerplexityBot and Perplexity-User.
Verify requests against Perplexity’s published IP JSON.
Put an answer capsule above the fold.
Use dateModified.
Add author and sameAs.
Include at least 10 specific entities or numbers in long-form pages.
Re-query the target prompts in a clean session after the next crawl.

One caution: Georion’s reported 4.1x citation lift for definitive above-the-fold answers is vendor data without public methodology. The pattern is right. The multiplier is not yet a law.

Getting Cited By Google AI Overviews

Google’s public guidance is deliberately conservative: there are no special optimizations, machine-readable files, or schema requirements for AI Overviews or AI Mode. The best practices are the same Search fundamentals: allow crawling, make content findable, ensure important text is textual, match structured data to visible content, and use Search Console.

But the empirical layer adds nuance. Ahrefs found 76% of AI Overview citations come from outside the top 10 organic results. That means AI Overviews are not simply a decorated SERP. They use Google’s index, but retrieval and synthesis can surface different URLs.

AI Mode differs from AI Overviews in interface and query behavior. AI Overviews appear in standard Search results for selected queries. AI Mode is a more conversational, fan-out search experience. Both are Google Search AI features, and Google’s control layer remains Googlebot plus standard preview controls.

For Google, the playbook is:

Do not block Googlebot.
Do not assume Google-Extended controls AI Overviews.
Keep critical content in text, not images or scripts.
Use nosnippet, max-snippet, and data-nosnippet only when you understand the citation tradeoff.
Keep schema accurate, but do not expect JSON-LD to lift AIO citations.
Build pages that answer sub-queries, not just the head query.

The last point matters because Google’s AI systems often decompose a user query into multiple related searches. A page that answers one precise sub-question can be cited even when it does not rank for the broader head term.

Getting Cited By ChatGPT

ChatGPT search visibility runs through OpenAI’s crawler split. OpenAI says OAI-SearchBot is used to surface websites in ChatGPT search features. Sites opted out of OAI-SearchBot will not be shown in ChatGPT search answers, though they may still appear as navigational links. GPTBot is separate and may be used for training.

So the core ChatGPT control is:

```text
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
```

ChatGPT cites fewer sources than Perplexity, but each source carries more of the generated answer. That is the operational meaning of 6.88 sources and 0.2713 absorption in arXiv 2604.25707. For ChatGPT, being one of six sources matters more than being one of sixteen in Perplexity.

ChatGPT tends to reward pages that are:

crawlable by OAI-SearchBot;
available as raw HTML;
written in clean explanatory prose;
backed by source links and named data;
strong on entity authority;
not blocked by CDN bot rules.

When we debug ChatGPT absence, the failure is usually one of four things: OAI-SearchBot blocked, content rendered client-side, no page matching the prompt intent, or insufficient authority relative to sources already cited.

The fix is not adding a paragraph that says “ChatGPT should cite this.” The fix is a page that answers the exact prompt better than the cited competitor, in HTML the crawler can read, on a domain the retriever trusts.

Measuring AI Share Of Voice

AI share of voice is the weighted share of citations and mentions your brand receives across a representative prompt library.

The formula we use is:

```text
SOV(brand) = avg over prompts of [avg over engines of
             (sum over citations k of c(k,brand) / r_k)]
```

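Here c(k,brand) is 1 when citation k belongs to the brand and 0 otherwise, and r_k is the position-decay divisor for the citation at position k. Setting r_k = log(1+k), which gives a 1/log(1+k) weight, is usually more stable than r_k = k, which gives a 1/k weight, because it does not make every citation after the first feel worthless.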
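The formula translates almost directly into code. This sketch follows it as written, so the result is a position-weighted citation score per answer, averaged over engines and then prompts, with no normalisation by the total citations in each answer. Two things are assumptions, not from the formula: the base-2 logarithm (chosen so the first citation weighs exactly 1) and the nested-dict layout of `results`. The domains are hypothetical.

```python
import math
from statistics import mean

def position_weight(k):
    """Weight for the citation at 1-based position k: 1 / log2(1 + k)."""
    return 1.0 / math.log2(1 + k)

def answer_score(citations, brand):
    """Sum of decayed weights for the citations that belong to the brand."""
    return sum(
        position_weight(k)
        for k, domain in enumerate(citations, start=1)
        if domain == brand
    )

def share_of_voice(results, brand):
    """Average over prompts of the average over engines, as in the formula."""
    per_prompt = [
        mean([answer_score(cites, brand) for cites in engines.values()])
        for engines in results.values()
    ]
    return mean(per_prompt)

# results[prompt][engine] = cited domains in citation order (hypothetical data).
results = {
    "best ai observability tools": {
        "chatgpt": ["acme.com", "rival.com"],
        "perplexity": ["rival.com", "other.org", "acme.com"],
    },
    "acme vs rival": {
        "chatgpt": ["rival.com"],
        "perplexity": ["acme.com"],
    },
}
sov = share_of_voice(results, "acme.com")
print(round(sov, 3))
```

Averaging engines inside each prompt first keeps an engine that cites many sources from dominating the total.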

Track two metrics:

| Metric | Meaning |
| --- | --- |
| Citation SOV | Linked citation presence only |
| Combined SOV | Citations plus unlinked brand mentions |

Citation SOV is stricter. Combined SOV captures model awareness. Do not blend them without labeling.

A defensible prompt library has 100 to 500 prompts across five strata:

| Stratum | Example |
| --- | --- |
| Branded | “Is Acme good for enterprise AI observability?” |
| Category | “Best AI observability tools for regulated teams” |
| Comparison | “Acme vs LangSmith vs Arize” |
| Problem-led | “How do I detect hallucinations in production agents?” |
| Long-tail | Real sales, support, and Search Console queries |

Run each prompt at least 3 times per engine. LLM answers are non-deterministic, and one run is a sample of one. Report medians and interquartile ranges. If you have 100 prompts and true SOV around 5%, expect error bands around ±5 to 7 points unless you expand the sample.
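Summarising repeated runs takes only the standard library. A minimal sketch, assuming you already have one score per run for a given prompt and engine; the helper name and the sample scores are hypothetical, and the "inclusive" quantile method is one reasonable choice among several.

```python
from statistics import median, quantiles

def summarize_runs(scores):
    """Median and interquartile range for repeated runs of one prompt."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": median(scores), "q1": q1, "q3": q3}

# Three hypothetical runs of the same prompt on one engine.
runs = [0.0, 1.0, 0.5]
summary = summarize_runs(runs)
print(summary)
```

With only three runs the interquartile range is coarse; it becomes more informative as the number of runs per prompt grows.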

Google AI Overviews is the measurement exception. As of June 2026, it has no public API. ChatGPT, Perplexity, Gemini, and Claude expose programmatic citation paths. AI Overviews requires browser-based sampling or a vendor that handles it.

The highest-value output is the discovery-gap audit: every prompt where a competitor is cited and you are absent. For each gap, save the prompt, engine, cited competitor URL, cited passage, your closest page, and the suspected cause.
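The audit reduces to a filter over the same run data. The sketch below assumes results are stored as prompt, then engine, then an ordered list of cited domains; the function name, field names, and domains are hypothetical. It records only prompt, engine, and competitor, leaving the cited passage, your closest page, and the suspected cause to be filled in during review.

```python
def discovery_gaps(results, brand, competitors):
    """List every prompt/engine pair where a competitor is cited and the brand is not."""
    gaps = []
    for prompt, engines in results.items():
        for engine, cited_domains in engines.items():
            if brand in cited_domains:
                continue  # the brand is present, so this is not a gap
            for domain in cited_domains:
                if domain in competitors:
                    gaps.append({
                        "prompt": prompt,
                        "engine": engine,
                        "competitor": domain,
                        "suspected_cause": None,  # filled in by a human reviewer
                    })
    return gaps

# Hypothetical run data.
results = {
    "best ai observability tools": {
        "chatgpt": ["acme.com", "rival.com"],
        "perplexity": ["rival.com", "other.org"],
    },
    "acme vs rival": {
        "chatgpt": ["rival.com"],
        "perplexity": ["acme.com"],
    },
}
gaps = discovery_gaps(results, "acme.com", {"rival.com"})
for gap in gaps:
    print(gap["engine"], "|", gap["prompt"], "|", gap["competitor"])
```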

Causes usually collapse into six buckets:

| Gap cause | Fix |
| --- | --- |
| No matching page | Create a focused answer page |
| Weak passage match | Rewrite the relevant section answer-first |
| Stale page | Update data and `dateModified` |
| JS invisibility | SSR or SSG the content |
| Low authority | Earn external mentions and links |
| Blocked crawler | Fix robots/WAF rules |

Tools can help: Profound, Otterly, Ahrefs Brand Radar, AthenaHQ, Scrunch, Peec, and open-source trackers all solve parts of this. The tool matters less than keeping the prompt set fixed and archiving raw outputs.

Content Patterns That Earn Citations

The reliable pattern is not “write for AI.” It is “make the evidence easy to lift.”

A good citation paragraph has four properties:

It answers the question directly.
It contains a specific number, named entity, or date.
It links or attributes the source.
It stands alone outside the rest of the article.

Weak:

AI crawlers are becoming more important, and many websites need to think about how their content appears to them.

Strong:

Vercel and MERJ found no evidence of JavaScript execution across more than 500M GPTBot fetches, which means client-rendered article bodies can be invisible to ChatGPT search even when Googlebot indexes them.

Use definitions as quotable primitives:

Generative engine optimization is the practice of making content retrievable, extractable, and citable by AI answer engines.

Use comparison tables whenever a user would naturally compare options. Engines cite structured comparisons because they reduce synthesis work.

Use original data where possible. The original GEO paper reported large gains from adding citations, quotations, and statistics, with maximum lifts around 40% on its PAWC metric. Later work such as C-SEO Bench makes the effect size less certain in production, but the direction remains useful: numbers and sources make passages easier to ground.

For long-form pages, aim for at least 10 specific entities or numbers. Past roughly 19, returns appear to diminish in vendor studies, but the broader point is simpler: generic prose loses to concrete evidence.

The GEO Audit Checklist

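The checklist covers six areas: crawler access, content extractability, structured data, freshness and authority, measurement, and content patterns. Work through each area, score it, and rank the gaps into a short list of priorities for what to fix next.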

Use the checklist as an operating review, not a one-time launch task. GEO decays when crawlers change, source preferences shift, models update, and your “fresh” pages age out of the retrieval set.

What This Means For You

Run GEO as a 30-day operating rhythm.

This week, fix access and visibility. Deploy the robots split. Allow OAI-SearchBot, Claude-SearchBot, PerplexityBot, and relevant user fetchers. Block GPTBot, ClaudeBot, CCBot, Bytespider, and other training crawlers if your policy is training opt-out. Verify with logs, not assumptions. Run curl -A against your top 20 pages and confirm the article body is in raw HTML.

This month, fix the pages that already matter. Take your top 20 commercial or strategic pages and add answer-first sections, current dates, visible authors, source-backed statistics, comparison tables, and accurate Article/Person/Organization schema. Publish llms.txt if you have docs, but do not spend a sprint hand-curating it for citation lift.

Then measure. Build 100 to 300 prompts from Search Console, sales calls, support tickets, and known comparison queries. Run them across ChatGPT, Perplexity, Gemini, Claude, Copilot, and sampled AI Overviews. Create a discovery-gap report. Assign the top 10 gaps to content, engineering, or authority work.

The discipline is not complicated. It is just cross-functional. Engineering owns fetch and render. Editorial owns extractability and evidence. Growth owns SOV measurement. Leadership owns the policy call on training crawler access.

The old SEO dashboard told you where you ranked. The GEO dashboard tells you whether the answer engines know you enough to cite you.
