
What Is llms.txt? Everything You Need to Know

A practical guide to the llms.txt file: what it is, how it differs from robots.txt and sitemap.xml, and how to create, implement, and maintain one so AI engines can find and cite your site.

Sona Team
Editorial Team · Apr 21, 2026
 14 min read

Contents

01   What is an llms.txt file?
02   Why llms.txt matters for AEO
03   llms.txt vs. robots.txt and sitemap.xml
04   How to create an llms.txt file
05   How to implement llms.txt on your site
06   Best practices and current standards
07   Implement now or wait?
08   Frequently asked questions


An llms.txt file is a plain Markdown document placed at your website's root (`/llms.txt`) that gives AI language models (ChatGPT, Perplexity, Gemini, Claude) a clean, structured summary of your site's most important content. Unlike robots.txt or sitemap.xml, it operates at inference time, guiding AI engines toward the pages that matter most when they construct a response. First proposed by Jeremy Howard in September 2024, it has quickly become one of the fastest-growing conventions in Answer Engine Optimization (AEO).

What exactly is an llms.txt file?

An llms.txt file is a Markdown-formatted text document hosted at the root of your website (`yourdomain.com/llms.txt`) that gives large language models a curated, human- and machine-readable overview of your site's content, without the clutter of HTML.

The file was proposed by Jeremy Howard in September 2024 and is defined in the official spec at llmstxt.org. A valid llms.txt contains one H1 title, one blockquote summary, and zero or more H2-delimited sections containing links to key pages or Markdown versions of those pages. No XML tags, no directives, no metadata schemas.

Modern websites are layered with JavaScript rendering, cookie banners, navigation menus, ad scripts, and dynamic content that make it hard for LLMs to extract meaningful information at inference time. As Zeo Agency explains, llms.txt strips that complexity away and hands AI systems like ChatGPT, Gemini, and Claude a clean Markdown document they can process accurately.

If robots.txt is a "no trespassing" sign and sitemap.xml is a street map, llms.txt is a curated tour guide written specifically for AI.

Here is a minimal valid example:

```
# Acme SaaS

> Acme SaaS is a B2B workflow automation platform for operations teams at mid-market companies. This site includes product documentation, pricing, an API reference, and a blog covering automation best practices.

## Docs

- [Product documentation](https://acme.example/docs)

## Pricing

- [Plans and pricing](https://acme.example/pricing)

## Blog

- [Automation blog](https://acme.example/blog)
```

The blockquote is the most important element. It is the first thing an LLM reads and sets the interpretive frame for everything that follows.

Why does llms.txt matter for AI search and Answer Engine Optimization (AEO)?

As AI engines increasingly answer questions directly rather than returning blue-link results, llms.txt gives your website a structured signal that helps LLMs select, summarize, and cite your content over competitors who haven't optimized for AI reading.

Googlebot crawls for ranking signals: backlinks, page speed, keyword density. LLMs consume content at inference time for comprehension signals: structure, clarity, authority, and context. A page that ranks first on Google but is buried in JavaScript noise is effectively invisible to an AI constructing a response. And with roughly 60% of Google searches ending without a click, ranking first no longer guarantees traffic anyway.

As covered in this AEO explainer, llms.txt tells AI engines which of your pages are authoritative, which are documentation, and which are policies. That reduces the probability that an AI hallucinates or misrepresents your brand.

GitBook's analysis makes the B2B SaaS case directly: documentation-heavy sites gain the most from llms.txt because AI engines are queried about product functionality and API behavior constantly. When a developer asks ChatGPT how to authenticate with your API, the accuracy of that answer depends partly on whether your site has made its documentation readable to AI.

This is why llms.txt is one of 17 checks run by Sona AI Visibility, a free audit tool that tells B2B marketers whether AI engines can discover, read, and cite their site. Based on Sona's data, 3 in 4 websites fail at least one AI readability check. llms.txt absence is one of the fastest fixes available.

Zero-click search is at 60%. If your site isn't structured for AI citation, you're invisible to the channel that's replacing the click. Run a free AI visibility audit to see where you stand.

How does llms.txt compare to robots.txt and sitemap.xml?

robots.txt controls which pages crawlers can access, sitemap.xml tells search engines which URLs exist, and llms.txt explains what your content means and which parts matter most to an AI reading your site to answer a user's question.

The critical distinction is timing. robots.txt and sitemap.xml are crawler-time files. llms.txt is an inference-time file, operating when an LLM is actively constructing a response and needs to understand your site's content hierarchy in seconds.

As the official llmstxt.org spec clarifies, llms.txt complements rather than replaces the other two files: robots.txt handles access permissions, sitemap.xml handles URL discovery, and llms.txt provides curated Markdown overviews and structured content references. Hostinger's comparison puts it plainly: llms.txt explicitly prioritizes valuable content for AI parsing in a way neither robots.txt nor sitemap.xml does.

Some sites also publish a companion `llms-full.txt` containing the complete text of all pages, giving LLMs full context without fetching individual URLs. This is useful for documentation-heavy sites where completeness matters more than curation.

| Feature | robots.txt | sitemap.xml | llms.txt |
| --- | --- | --- | --- |
| Primary purpose | Control crawler access | Index all URLs | Guide AI content understanding |
| Format | Plain text directives | XML | Markdown |
| Target audience | All web crawlers | Search engine indexers | Large language models (LLMs) |
| Operating moment | Crawl time | Crawl/index time | Inference time |
| Content included | Allow/disallow rules | URL list + metadata | Curated page summaries + links |
| Explains content meaning? | No | No | Yes |
| Supports AI citation? | Indirectly (access) | Indirectly (discovery) | Directly (context + priority) |
| Formal standard? | Yes (RFC 9309) | Yes (sitemaps.org) | Proposed (llmstxt.org, 2024) |

How do I create an llms.txt file?

Creating an llms.txt file requires no special tools. All you need is a text editor, knowledge of your site's most important pages, and the Markdown format defined at llmstxt.org.

Step 1: Open a plain text editor. Notepad, VS Code, or any editor that saves plain text works. Do not use a word processor; its formatting artifacts will break the file.

Step 2: Write your H1. This is your site or brand name. Example: `# Acme SaaS`

Step 3: Write your blockquote summary. Write 2-4 sentences describing what your site does and who it serves. Assume the LLM has never heard of your company. Example:

```
> Acme SaaS is a B2B workflow automation platform for operations teams at mid-market manufacturing companies. This site includes product documentation, an API reference, pricing information, and a blog covering automation strategy.
```

Step 4: Create H2 sections for content categories. Common categories for B2B SaaS sites include Docs, Pricing, Blog, API Reference, and Legal. Under each heading, list links with brief descriptions.

Step 5: Link to Markdown versions of pages where available. As GitBook recommends, linking to `.md` files rather than HTML pages gives LLMs cleaner parsing and better comprehension. HTML links still work if Markdown versions aren't available.
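
For example, a single entry combining Steps 4 and 5 might look like this (the URL and description are illustrative):

```
- [REST API authentication guide](https://yourdomain.com/docs/authentication.md): How to obtain and use API keys
```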

Step 6: Save as `llms.txt`. The filename is fixed by the spec. Do not name it `llms.md` or `llms-index.txt`.

Step 7: Upload to your root directory. The file must be accessible at `yourdomain.com/llms.txt`.

Here is a realistic B2B SaaS example:

```
# Acme SaaS

> Acme SaaS automates approval workflows for operations and finance teams at mid-market B2B companies (100-1,000 employees). Products include a no-code workflow builder, a REST API, and native integrations with Salesforce, HubSpot, and NetSuite. Pricing starts at $299/month.

## Docs

- [Getting started](https://acme.example/docs/getting-started.md): Installation and first workflow
- [REST API reference](https://acme.example/docs/api.md): Endpoints, authentication, and rate limits

## Pricing

- [Plans and pricing](https://acme.example/pricing): Tier comparison and billing FAQ

## Blog

- [Automation strategy blog](https://acme.example/blog): Guides for operations teams

## Legal

- [Privacy policy](https://acme.example/legal/privacy)
- [Terms of service](https://acme.example/legal/terms)
```

As Hostinger's implementation guide notes, including page titles, content indexes, and optional site structure descriptions gives AI crawlers the context they need to accurately represent your content.

How do I implement llms.txt on my website?

Place the file at `yourdomain.com/llms.txt`, ensure it's publicly accessible, and verify your robots.txt does not block AI crawlers like GPTBot.

The official spec at llmstxt.org is explicit: `/llms.txt` is the only valid path. Subdirectory paths like `/docs/llms.txt` are not recognized. The file must be plain Markdown, simple enough to parse with classical tools (regex, standard parsers) and by LLMs alike.

Platform-specific notes:

  • WordPress: Upload via FTP/SFTP to your public root directory. Yoast SEO now includes built-in llms.txt generation, removing the manual step entirely.
  • Webflow: Add as a static file in Project Settings under the "Custom Code" or file upload section.
  • Next.js and custom builds: Place the file in the `/public` directory. It will be served at the root path automatically. (A dynamic alternative is sketched after this list.)
  • Mintlify: llms.txt generation is built in. Mintlify's own analysis recommends pairing the auto-generated file with `llms-full.txt` for complete site text coverage.
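
If you would rather generate the file in code than commit a static asset, a route handler can serve it. Below is a minimal sketch for the Next.js App Router; the route path follows Next.js conventions, but the content is a placeholder you would replace with your real summary:

```ts
// app/llms.txt/route.ts — serves GET /llms.txt from a Next.js App Router project.
export async function GET(): Promise<Response> {
  // Placeholder content; build this from your actual site summary and links.
  const body = [
    "# Acme SaaS",
    "",
    "> Acme SaaS is a B2B workflow automation platform for operations teams.",
    "",
    "## Docs",
    "",
    "- [Product documentation](https://acme.example/docs)",
  ].join("\n");

  // text/plain ensures browsers render the file rather than downloading it.
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```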

Verify your setup:

  • Visit `yourdomain.com/llms.txt` in a browser. It should render as plain text, not trigger a download or return a 404.
  • Confirm GPTBot is not listed under `Disallow` in your `robots.txt`. A blocked GPTBot makes your llms.txt unreachable regardless of how well it's written.
  • Check that all links in your llms.txt resolve to live pages. Broken links actively mislead AI engines.
  • Confirm the blockquote summary accurately describes your current product and audience.
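
The first three checks are easy to script. Here is a rough TypeScript sketch (assuming Node 18+ for the built-in `fetch`; the domain is a placeholder) that fetches the file, checks its shape, verifies each link, and flags an obvious GPTBot block:

```ts
// verify-llms.ts — a rough automation of the checklist above (run: npx tsx verify-llms.ts)
const SITE = "https://yourdomain.com"; // replace with your domain

async function main(): Promise<void> {
  // 1. The file must exist and be served as plain text, not a 404 or a download.
  const res = await fetch(`${SITE}/llms.txt`);
  if (!res.ok) throw new Error(`/llms.txt returned HTTP ${res.status}`);
  const text = await res.text();
  if (!text.trimStart().startsWith("# ")) {
    console.warn("Warning: file does not begin with an H1 title");
  }

  // 2. Every Markdown link in the file should resolve to a live page.
  //    HEAD keeps this cheap; some servers reject HEAD, so treat FAILs as hints.
  const links = [...text.matchAll(/\]\((https?:\/\/[^)\s]+)\)/g)].map((m) => m[1]);
  for (const url of links) {
    const r = await fetch(url, { method: "HEAD" });
    console.log(`${r.ok ? "ok  " : "FAIL"} ${r.status} ${url}`);
  }

  // 3. A crude robots.txt check: warn if a GPTBot block disallows the root.
  const robots = await (await fetch(`${SITE}/robots.txt`)).text();
  if (/User-agent:\s*GPTBot[\s\S]*?Disallow:\s*\/\s*$/im.test(robots)) {
    console.warn("Warning: robots.txt appears to disallow GPTBot");
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```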

Sona AI Visibility runs a live GPTBot probe as part of its 17-check audit, so you can confirm crawler access alongside your llms.txt validation in a single 30-second scan.

What are the best practices and current standards for llms.txt?

The llms.txt spec is still a proposal rather than a ratified standard, but a clear set of best practices has emerged from early adopters.

Do:

  • Write a blockquote summary that reads like a one-paragraph brief for an AI. Include what you do, who you serve, and what the site contains.
  • Group links into logical H2 sections (Docs, Pricing, Blog, Legal, API) rather than dumping all URLs in a single list. LLMs use section headings as content hierarchy signals.
  • Use descriptive link text. "REST API authentication guide" tells an LLM more than "click here."
  • Link to Markdown versions of pages where available. As a TDS Archive analysis of llms.txt explains, Markdown eliminates layout noise and lets LLMs focus on content structure.
  • Keep the file updated. Zeo Agency's guidance is direct: stale files with broken links or outdated descriptions actively mislead the AI engines you're trying to influence.

Don't:

  • Treat llms.txt as a substitute for well-structured content. It amplifies good content. It cannot rescue thin, unstructured pages.
  • Include every page on your site. A 200-link llms.txt is harder for an LLM to parse than a focused 20-link version.
  • Block GPTBot in robots.txt and expect llms.txt to function. The file is useless if the crawler cannot reach it. (A GPTBot-friendly robots.txt is sketched after this list.)
  • Set it and forget it. Product changes, pricing updates, and new documentation should trigger an llms.txt review.
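
For reference, a GPTBot-friendly robots.txt that coexists with llms.txt might look like this (the disallowed path and sitemap URL are illustrative):

```
# robots.txt — GPTBot allowed, so /llms.txt stays reachable
User-agent: GPTBot
Allow: /

# Default rules for everything else
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
```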

Mintlify's honest assessment is worth reading: llms.txt is genuinely useful for documentation-heavy sites but is not yet a universal standard, and its impact varies by use case. For B2B SaaS companies with API docs, pricing pages, and comparison content, the signal-to-effort ratio is high. The file takes under an hour to implement and costs $0.

Should B2B SaaS companies implement llms.txt now or wait?

Implement it now. Not because it's a proven ranking factor, but because it's a zero-cost, low-effort signal that positions your site for the AI search shift already underway. Early movers tend to capture a disproportionate share of AI citations before the convention becomes table stakes.

llms.txt is an emerging convention, not a universal standard. That's exactly why early adoption creates competitive advantage: when a convention becomes standard, the differentiation window closes. The practitioners already acting on this are visible in the r/SEO community discussion around llms.txt, where debate has shifted from "is this real?" to "how do we implement it well?"

SaaS companies with documentation, API references, pricing pages, and comparison content benefit most. These are the exact pages AI engines are queried about most often. When a buyer asks Perplexity "what's the best workflow automation tool for mid-market ops teams," the AI pulls from sites it can read accurately. A well-formed llms.txt increases the probability your site is one of them.

Implementation takes under an hour for most sites, costs $0, and requires no ongoing infrastructure beyond periodic updates. The risk of not doing it is the current default state for most websites: invisible to AI engines while competitors optimize.

llms.txt is one piece of a larger puzzle. AI engines also evaluate schema markup, content freshness, named authors, GPTBot access, and structured heading hierarchy when deciding whether to cite a site. All of these signals can be audited in under 30 seconds using Sona AI Visibility, the free tool that runs 17 checks across crawlability, schema markup, content structure, and freshness.

Frequently asked questions

Can you explain what an llms.txt file is and why it's useful?

An llms.txt file is a Markdown document at your website root that gives AI language models (ChatGPT, Perplexity, Gemini, Claude) a clean, structured overview of your site's most important content. It bypasses the HTML noise (navigation, ads, JavaScript) that makes it hard for LLMs to accurately read and cite your site at inference time. Proposed by Jeremy Howard in September 2024, it is the emerging standard for AI-readable website metadata.

How do I set up an llms.txt file for my website to improve AI indexing?

Create a plain Markdown file with three components: (1) an H1 containing your site name, (2) a blockquote summary of what your site does and who it serves, and (3) H2-delimited sections linking to your most important pages with brief descriptions. Save it as `llms.txt` and upload it to your website's root directory so it's accessible at `yourdomain.com/llms.txt`. Ensure your robots.txt does not block AI crawlers like GPTBot, or the file will be unreachable regardless of its quality.

What benefits does using an llms.txt file bring for SEO and AI models?

llms.txt doesn't directly affect traditional SEO rankings. Its primary benefit is Answer Engine Optimization (AEO): it helps AI engines like ChatGPT and Perplexity understand which of your pages are authoritative, what your site covers, and how to accurately summarize and cite your content in AI-generated responses. As zero-click search reaches 60% of all Google queries, being cited by AI engines is increasingly more valuable than ranking first in blue-link results.

Why do AI models like ChatGPT use llms.txt files when accessing website content?

AI models use llms.txt because it gives them a pre-processed, structured summary of a site's content in Markdown, a format they can parse cleanly without rendering JavaScript, stripping navigation elements, or interpreting complex HTML layouts. A well-formed llms.txt reduces ambiguity and increases the likelihood that the AI accurately represents the site's content and cites the right pages.

Could you guide me through the steps to create an effective llms.txt file?

Yes: (1) Open a text editor. (2) Write `# Your Brand Name` as the H1. (3) Add a `> blockquote` summary of 2-4 sentences describing what your site does and who it serves. (4) Create `## Section` headings for content categories like Docs, Pricing, Blog, and API. (5) Under each heading, list links with descriptive anchor text and short descriptions. (6) Optionally link to `.md` (Markdown) versions of your pages for cleaner AI parsing. (7) Save as `llms.txt` and upload to your root directory. Verify the file is live at `yourdomain.com/llms.txt` before considering the implementation complete.

Is llms.txt the same as robots.txt?

No. robots.txt controls which pages web crawlers are allowed to access. It is a permission file operating at crawl time. llms.txt is a content guidance file that tells AI language models what your site contains and which pages matter most, operating at inference time. robots.txt opens the door; llms.txt explains what's inside. If robots.txt blocks GPTBot, your llms.txt will never be read.

Does llms.txt guarantee my site will be cited by ChatGPT or Perplexity?

No. llms.txt is a signal, not a guarantee. AI engines make citation decisions based on multiple factors including content quality, authority, freshness, and schema markup. llms.txt improves your site's readability for AI engines and increases the probability of accurate citation, but it works best as part of a broader AI visibility strategy. Sona AI Visibility audits all 17 AI readability signals at once, showing you where the largest gaps are across crawlability, schema, content structure, and freshness.

How is llms.txt different from a sitemap.xml?

A sitemap.xml is a comprehensive index of all your website's URLs, designed to help search engine crawlers discover pages. An llms.txt is a curated selection of your most important content, written in natural Markdown with descriptions, designed to help AI language models understand your site, not just find it. sitemap.xml answers "what pages exist?"; llms.txt answers "what does this site mean and what should an AI know about it?" Both files serve distinct purposes and should coexist in a complete AI visibility setup.

Last updated: April 2026

Sona Team
Editorial Team

The team behind Sona's research, guides, and AI visibility insights.

#AI Search
#Data & Studies
#Publishing
#SEO
#llmstxt
#GenerativeAI
#B2BSaaS
#AEO
#LLMs
#ContentDiscoverability