generative engine optimization guide

Llms.txt in mid-2026: does the AI-crawler manifest actually get you cited?

The evidence says no for answer engines, yes for coding agents. Here's how to tell which one you're optimizing for.

June 15, 2026 · 10 min read
Tags: llms.txt, llms-full.txt, AI crawler manifest

Adding llms.txt to your site will not get you cited more often by ChatGPT, Perplexity, or Google AI Overviews. Four independent experiments in 2025 found no measurable citation lift, and Google's Gary Illyes called the format "at best neutral, similar to the keywords meta tag."

The file went from a September 2024 proposal by Answer.AI's Jeremy Howard to roughly 10% of the top-million domains by early 2026, yet its claimed superpower as an AI crawler manifest never materialized.

That's the bad news for anyone who shipped llms.txt expecting GEO gains. The good news is more useful: the file is quietly winning a different job entirely.

TL;DR

In mid-2026, llms.txt has no measurable effect on AI answer engine citations or rankings. The "it's dead for SEO" camp is right. But it's a genuinely useful curated context format for coding agents, RAG pipelines, and MCP servers, which is why Anthropic, Cloudflare, Stripe, and Cursor all publish one.

Ship it if you have docs or an API. Don't expect AI Overview traffic from it.

What is llms.txt? It's a Markdown file at your site root that gives LLMs a curated, token-dense map of your most important pages, designed to be read at inference time when a full website won't fit in a context window.

Key takeaways

  • No major answer engine (OpenAI, Perplexity, Google, Anthropic) has confirmed using llms.txt as a ranking or retrieval signal.
  • Controlled experiments converge on a null result: a Trakkr study returned p=0.85; an SE Ranking 300K-domain study found a +0.4% lift inside the noise band.
  • The real value is agent context: coding tools like Cursor and Claude Code, plus RAG pipelines and MCP servers, consume it directly.
  • llms-full.txt is a separate community convention, not part of the original spec.
  • It does not replace robots.txt or sitemap.xml; the three answer different questions.

What is llms.txt and who created it?

llms.txt is a proposed Markdown file, placed at /llms.txt, that hands an LLM a curated index of a site's key content with human-written descriptions. Jeremy Howard, co-founder of Answer.AI and fast.ai and a former Kaggle president, published it on September 3, 2024.

The canonical spec at llmstxt.org frames the motivation plainly: context windows are too small to swallow whole websites, and converting HTML full of nav, ads, and JavaScript into clean text is "difficult and imprecise."

The format is deliberately tiny. A conforming file needs only an H1 title, ideally followed by a blockquote summary, then H2 sections listing links as - [Name](url): note. Howard's FastHTML framework was the reference implementation, and its docs became the template most published examples copy.
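Because the format is this small, a consumer can read it with a few lines of code. The following is a minimal sketch of a parser, not a reference implementation: the LlmsTxt class, the parse_llms_txt function, and the example.com URLs are all hypothetical names chosen for illustration, and the regex only handles the simple - [Name](url): note shape described above.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Name](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # H2 heading -> list of (name, url, note) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None       # H2 section we are inside, if any
    summary_lines = []   # blockquote lines seen before the first H2
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            summary_lines.append(line.lstrip(">").strip())
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                doc.sections[current].append((m["name"], m["url"], m["note"] or ""))
    doc.summary = " ".join(summary_lines)
    return doc


# Hypothetical sample file used only to exercise the parser.
SAMPLE = """# Acme API

> Payments platform docs.

## Docs
- [Quickstart](https://example.com/quickstart.md): first payment
- [Webhooks](https://example.com/webhooks.md)
"""

parsed = parse_llms_txt(SAMPLE)
```

Returning plain tuples keeps the sketch short; a real consumer would probably also want to keep any free-form prose between the blockquote and the first H2.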

It is still a proposal. There's no W3C, IETF, or schema.org recognition, no version numbers, and no governance body.

What is llms-full.txt, and how is it different?

llms-full.txt inlines the full body text of every listed page into a single file, sized to drop into a long-context model in one shot. llms.txt is the curated index; llms-full.txt is the whole corpus.

Here's the wrinkle practitioners get wrong: llms-full.txt is not in the original spec. It grew out of FastHTML's internal llms-ctx-full.txt pattern and was popularized when Mintlify rolled it out platform-wide on November 14, 2024 in collaboration with Anthropic. Treat it as a sibling convention layered on top of Howard's proposal, not part of it. Per-page .md mirrors (a clean Markdown version of each page at the same URL plus .md) are the part that genuinely is in the original spec.
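To make the relationship between the index, the per-page mirrors, and the full file concrete, here is a rough sketch of how a generator might assemble llms-full.txt. Everything in it is an assumption for illustration: md_mirror and build_full are hypothetical helpers, the output layout is one plausible choice (there is no standard one), and the fetch argument stands in for whatever HTTP client you use. The index.html.md rule for URLs without a file name follows the llmstxt.org proposal as I understand it.

```python
from typing import Callable, Iterable, Tuple


def md_mirror(url: str) -> str:
    """Map a page URL to its Markdown mirror: same URL plus .md."""
    if url.endswith(".md"):
        return url
    if url.endswith("/"):
        # URLs without a file name get index.html.md appended.
        return url + "index.html.md"
    return url + ".md"


def build_full(
    title: str,
    links: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
) -> str:
    """Inline the Markdown body of every listed page into one document."""
    parts = [f"# {title}", ""]
    for name, url in links:
        body = fetch(md_mirror(url)).strip()
        parts += [f"## {name}", f"Source: {url}", "", body, ""]
    return "\n".join(parts)


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {"https://example.com/docs/quickstart.md": "Make your first payment.\n"}
full_text = build_full(
    "Acme API",
    [("Quickstart", "https://example.com/docs/quickstart")],
    PAGES.__getitem__,
)
```

Passing fetch in as a parameter keeps the assembly logic testable offline; in production it would wrap a real HTTP request.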

Does ChatGPT, Perplexity, or Google actually read llms.txt?

No major answer engine has publicly committed to reading /llms.txt for retrieval or ranking, and experimental tests show no measurable citation effect. The engine-by-engine picture:

| Engine | Public statement | Verdict |
| --- | --- | --- |
| OpenAI (ChatGPT, SearchGPT) | No mention in GPTBot/OAI-SearchBot docs | No statement found |
| Perplexity | No mention in PerplexityBot help center | No statement found |
| Google AI Overviews / Gemini | Illyes: "at best neutral, similar to the keywords meta tag" | Explicitly declined |
| Anthropic (Claude) | Publishes llms.txt but hasn't confirmed ClaudeBot consumes it | No confirmed use |

Google's position is the cleanest evidence in the file. The Illyes comment compares llms.txt to the long-deprecated <meta name="keywords"> tag, which publishers once assumed mattered and engines now ignore. Google's Lighthouse audit even flags a missing llms.txt as a neutral recommendation, not a ranking factor.

For the other three, the silence is informative. OpenAI's bots page lists three crawlers and the metadata they fetch, with no mention of the file.

The experiments all point the same way

  • Trakkr (Q1 2025): 30 domains with and without llms.txt; p=0.85, no detectable citation effect.
  • SE Ranking (Q3 2025): 300,000 domains; a +0.4% lift well inside measurement noise.
  • OtterlyAI (Q4 2025): 100 domains over 90 days; sites that added the file saw a 0.1% absolute change in AI Overview citation rates, consistent with random variation.
  • GEO Experiments 2026 (Thomas Peham): "llms.txt presence did not move the needle" was an explicit finding.
Figure: Measured AI-citation lift from adding llms.txt. SE Ranking (300K domains): 0.4%; OtterlyAI (100 domains, 90 days): 0.1%.

Two honest caveats. Citation frequency is the easy thing to measure; whether a model uses your content correctly once retrieved is harder and not well covered by public data. And none of these tests rules out an engine reading the file for some narrow purpose, like seeding a documentation panel, without affecting answer ranking.
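For readers who want intuition about what a "null result" looks like numerically, here is a small sketch of a two-sided two-proportion z-test, one common way to compare citation rates between two groups of domains. It is not the method any of the studies above reported using, and the counts in the example are made up purely for illustration; they are not figures from Trakkr, SE Ranking, OtterlyAI, or Peham.

```python
import math


def two_proportion_p_value(cited_a: int, n_a: int, cited_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two citation rates."""
    p_a, p_b = cited_a / n_a, cited_b / n_b
    pooled = (cited_a + cited_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both groups all-cited or none-cited: no evidence of a gap
    z = (p_a - p_b) / se
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL counts: 15 domains with the file (6 cited), 15 without (5 cited).
p_small_sample = two_proportion_p_value(6, 15, 5, 15)

# HYPOTHETICAL counts with a large, obvious gap, for contrast.
p_large_gap = two_proportion_p_value(90, 100, 10, 100)
```

With groups this small, a one-domain difference is nowhere near significance, which is the same shape of result a high p-value such as 0.85 describes: the data cannot distinguish the observed gap from chance.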

So is llms.txt dead, or just misunderstood?

The argument has two camps and they're both right, because they're answering different questions.

For ranking and citation in answer engines, llms.txt is dead. No public commitment, four null experiments, and Google's explicit dismissal. The skeptics, including Search Engine Journal and the Tulabot "technical dead end" post, make a structural point too: the format has no priority metadata, no lastmod, no enforcement, and no validation, so the signal it carries is too weak to move retrieval.

For agent workflows, RAG, and coding-agent context, it's alive and growing. This is where the original SEO framing missed the actual use case.

The strongest evidence in the "alive" column:

  • Coding agents fetch it by default. A widely cited Sourcegraph builder-skill hierarchy ranks llms.txt as the top source: "hand-curated by the project, AI-optimized, token-dense, always current. This is the ideal source when it exists."
  • First-party infrastructure adoption. Cloudflare ships a curated /llms.txt plus per-product llms-full.txt, and its Windsurf agent-setup docs walk through loading docs into an agent.
  • The Mintlify-Anthropic collaboration. Anthropic's docs llms.txt is among the most carefully curated in the ecosystem. The company whose models power most agent workflows both publishes and (informally) consumes the format.
  • RAG and MCP. Practitioners increasingly treat llms.txt as the clean content layer for internal RAG and MCP-based context loading.

One contested claim to flag: whether llms.txt changes which pages an engine cites, even if it doesn't change whether your domain gets cited. Optimist case studies report page-level effects, but their methodology is correlational and lacks control groups. The rigorous studies found no page-level effect either. Treat the optimistic claims as directional, not proven.

How is llms.txt different from robots.txt and sitemap.xml?

The three files solve three different problems. robots.txt answers "may I crawl?", sitemap.xml answers "what exists?", and llms.txt answers "what matters?"

| Dimension | robots.txt | sitemap.xml | llms.txt |
| --- | --- | --- | --- |
| Standardized | 1994; RFC 9309 (2022) | 2005-2006 (sitemaps.org) | 2024 (proposal) |
| Format | Plain-text directives | XML schema | Markdown |
| Purpose | Allow/disallow crawlers | Enumerate every URL | Curate the key subset |
| Audience | Crawler bots | Search crawlers | LLMs and agents |
| Enforcement | Voluntary | Voluntary | None |
| Typical size | Dozens of lines | Thousands of URLs | 1-200 lines |

A few practical consequences. You can block an engine in robots.txt and still publish llms.txt for the ones you allow. The file does not help Google discover pages it would otherwise miss, and it has no effect on crawl budget or indexing rate. As llmtxt.info notes, they're complementary, not redundant.
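The "block one engine, publish for the rest" point can be checked with Python's standard-library robots.txt parser. This is a sketch under assumptions: the robots.txt content and the example.com URL are invented for illustration, and may_fetch is a hypothetical helper. It only shows that access control lives in robots.txt, independent of whether /llms.txt exists.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot entirely, allow everyone else.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())


def may_fetch(agent: str, url: str = "https://example.com/llms.txt") -> bool:
    """Answer the robots.txt question ("may I crawl?") for one user agent."""
    return parser.can_fetch(agent, url)


gptbot_allowed = may_fetch("GPTBot")        # blocked by the first group
claudebot_allowed = may_fetch("ClaudeBot")  # falls through to the * group
```

Nothing in llms.txt can override this answer; the file only describes what matters to agents that are already permitted to fetch it.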

How to implement llms.txt correctly

Here's a conforming file for an example API docs site:

```markdown
# Acme API

> Acme is a payments-as-a-service platform. These docs cover the REST API,
> SDKs (Node, Python, Go), webhooks, and common integration recipes.

## Docs
- Quickstart: first payment in under 10 minutes
- Authentication: API keys, OAuth, rotating credentials
- Webhooks: event types, signatures, replay protection

## SDKs
- Node SDK
- Python SDK

## Optional
- Changelog
- Blog
```

The H1 is the only required element. The blockquote is the first context an LLM grabs, so make it a real one-paragraph "what is this" frame. The ## Optional section has a special meaning in the spec itself: it marks genuinely secondary content that consumers can skip when context is tight.
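A consumer that is short on context can act on that by dropping everything under ## Optional before loading the file. The function below is a minimal sketch of that behavior, assuming sections are delimited by H2 lines; drop_optional is a hypothetical name, not part of any published tool.

```python
def drop_optional(text: str) -> str:
    """Return the file with the '## Optional' section (if any) removed."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Each new H2 decides whether we skip until the next H2.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


# Hypothetical input mirroring the structure of the example file.
SAMPLE = """# Acme API

## Docs
- Quickstart

## Optional
- Changelog
- Blog
"""

trimmed = drop_optional(SAMPLE)
```

Because the decision is made per H2, a section that follows ## Optional would be kept, which matches the idea that only that one section is skippable.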

If you'd rather not hand-author it, the toolchain is mature: Mintlify and ReadMe auto-generate it, Firecrawl's /llmstxt endpoint generates one for any URL, and there are plugins for Docusaurus, MkDocs, Nuxt Content, Vite, and WordPress.

Watch for the anti-patterns Ken Imoto catalogued across 30 live files: wrong H1 casing, malformed blockquotes, links pointing at HTML instead of .md siblings (which defeats the whole point), embedded crawler directives, and stuffing important pages into ## Optional. The biggest is the HTML-vs-.md mistake: if your links resolve to HTML, the agent gets the same nav-and-ads soup the format was meant to avoid.
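Several of these anti-patterns are mechanical enough to catch with a small linter. The sketch below checks only three of them (missing H1, crawler directives, and links that do not end in .md); lint_llms_txt is a hypothetical helper, the .md check is a heuristic that will also flag legitimate external links, and the sample inputs use invented example.com URLs.

```python
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
DIRECTIVE = re.compile(r"^(user-agent|disallow|allow)\s*:", re.IGNORECASE)


def lint_llms_txt(text: str) -> list:
    """Return a list of human-readable problems found in an llms.txt body."""
    problems = []
    lines = text.splitlines()
    non_blank = [l for l in lines if l.strip()]
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    for n, line in enumerate(lines, 1):
        if DIRECTIVE.match(line.strip()):
            problems.append(f"line {n}: crawler directive belongs in robots.txt")
        for _name, url in LINK.findall(line):
            # Ignore fragments and query strings when checking the suffix.
            path = url.split("#")[0].split("?")[0]
            if not path.endswith(".md"):
                problems.append(f"line {n}: link does not point at a .md page: {url}")
    return problems


# Hypothetical files: one with three problems, one clean.
BAD = "Acme API\n\n## Docs\n- [Quickstart](https://example.com/quickstart.html): x\nUser-agent: *\n"
GOOD = "# Acme API\n\n## Docs\n- [Quickstart](https://example.com/quickstart.md): x\n"

bad_report = lint_llms_txt(BAD)
good_report = lint_llms_txt(GOOD)
```

Judgment calls such as "is this page important enough to live outside ## Optional" still need a human; the linter only covers what can be checked from the text alone.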

What this means for you

Match the effort to the audience. If you're optimizing for answer-engine citations, this file is not your lever; spend that time on the content quality and authority signals the engines actually weight.

| If your site is… | Do this |
| --- | --- |
| Public API / SDK docs | Ship it. Hand-curate 30-80 lines plus llms-full.txt. Agents will use it. |
| OSS library / framework docs | Ship it via a generator. Most toolchains auto-produce one. |
| B2B SaaS with self-serve docs | Auto-generate, then hand-edit the blockquote and sections. |
| News / blog / marketing | De-prioritize. Generate if free; don't hand-curate. |
| Small marketing site, no docs | Skip it. Spend the time on robots.txt and sitemap.xml. |
| Enterprise / regulated content | Ship it carefully; you control the RAG consumer, so curation pays off. |

Concrete next steps: audit whether /llms.txt and /llms-full.txt return 200 and parse cleanly. Always use .md suffixes on linked pages. Reference the file from robots.txt with Allow: /llms.txt if you want to be explicit. And don't measure success by AI Overview traffic, because it won't move. Measure it by whether the coding agents and RAG pipelines that consume your docs get cleaner, more current context.
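The first of those steps is easy to script. Here is a rough audit sketch using only the Python standard library; audit and http_fetch are hypothetical names, "parses cleanly" is reduced to the single check that the first non-blank line is an H1, and the fetcher is injectable so the logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request


def http_fetch(url: str):
    """Return (status, body) for a GET; network failures become status 0."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:  # 4xx / 5xx responses
        return err.code, ""
    except urllib.error.URLError:          # DNS failure, refused connection, etc.
        return 0, ""


def audit(base_url: str, fetch=http_fetch) -> dict:
    """Check that both files return 200 and start with an H1."""
    report = {}
    for path in ("/llms.txt", "/llms-full.txt"):
        status, body = fetch(base_url.rstrip("/") + path)
        first = next((l for l in body.splitlines() if l.strip()), "")
        report[path] = {
            "status": status,
            "has_h1": status == 200 and first.startswith("# "),
        }
    return report


def fake_fetch(url: str):
    """Stand-in fetcher for an imaginary site that only serves /llms.txt."""
    if url.endswith("/llms.txt"):
        return 200, "# Acme API\n"
    return 404, ""


offline_report = audit("https://example.com/", fetch=fake_fetch)
```

Calling audit("https://your-domain.example") with the default fetcher would run the same checks against a live site; wiring it into CI is one way to notice when a docs redeploy silently drops the file.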

Ship llms.txt if you have documentation, an SDK, or an API. Expect it to help machines that read your docs on purpose, not engines that rank them.

Frequently asked questions

Does llms.txt improve AI citations or rankings?

No. As of mid-2026, no major answer engine has confirmed using llms.txt for ranking or retrieval, and four independent experiments (Trakkr, SE Ranking, OtterlyAI, Peham) found no measurable citation lift. Google's Gary Illyes called it 'at best neutral, similar to the keywords meta tag.'

What is the difference between llms.txt and llms-full.txt?

Llms.txt is a short curated Markdown index of your key pages with one-line descriptions. Llms-full.txt inlines the full body text of every listed page into one file for long-context models. Only llms.txt is in the original spec; llms-full.txt is a community convention popularized by Mintlify.

How is llms.txt different from robots.txt and sitemap.xml?

Robots.txt controls crawler access (may I crawl?), sitemap.xml enumerates every indexable URL (what exists?), and llms.txt curates the subset that matters with human-written descriptions (what matters?). They are complementary, not substitutes.

Should I add llms.txt to my site?

Ship it if you have public API docs, an SDK, or developer documentation, because coding agents and RAG pipelines genuinely use it. Skip the hand-curation effort for marketing and news sites, where the agent value is low and the SEO value is null.

Who created llms.txt and when?

Jeremy Howard, co-founder of Answer.AI and fast.ai, published the proposal on September 3, 2024 at llmstxt.org. FastHTML, his Python web framework, was the reference implementation. It remains a proposal with no W3C or IETF standing.