AI search engines are now primary entry points for business discovery. Traditional SEO gets you found on Google. AI SEO — also called Generative Engine Optimization (GEO) — gets your business cited inside AI-generated answers. This guide covers the exact steps to make your website readable, discoverable, and citable by AI systems.
An HTML page costs an AI agent ~16,000 tokens to process. The same content in markdown costs ~3,150 — an 80% reduction. AI systems prefer structured, machine-readable content. This guide shows you how to deliver it.
Create llms.txt
llms.txt is the AI equivalent of robots.txt. Proposed by Jeremy Howard (Answer.AI), it tells LLMs what your site is about and where to find structured content. Place it at the root: https://yourdomain.com/llms.txt
# Company Name

> One-sentence description of who you are and what you do.

## Core Pages
- [Products](https://yourdomain.com/docs/products.md): what you sell
- [FAQs](https://yourdomain.com/docs/faqs.md): common questions

## Optional
- [Full Content](https://yourdomain.com/llms-full.txt): all content merged
Key rules
- Lead with a `>` blockquote — AI uses this to decide relevance before fetching anything else
- Link to markdown files, not HTML pages (80% more token-efficient)
- Include an `llms-full.txt` for AI agents that want everything in one fetch
- Keep it concise — AI agents parse this before deciding what to retrieve
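A quick script can sanity-check a draft against these rules before you publish it. This is a minimal sketch, not an official validator — the checks simply mirror the bullets above, and the `check_llms_txt` helper and sample draft are illustrative:

```python
def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in an llms.txt draft."""
    problems = []
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("should start with an H1 site name")
    if not any(l.startswith("> ") for l in lines[:3]):
        problems.append("should lead with a > blockquote summary")
    if ".md" not in text:
        problems.append("should link to markdown files, not HTML pages")
    if len(text) > 4000:
        problems.append("should stay concise (a few KB at most)")
    return problems

draft = """# Acme Corp
> Acme sells anvils online.

## Core Pages
- [Products](https://acme.example/docs/products.md): catalog
"""
print(check_llms_txt(draft))  # → []
```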
Update robots.txt for AI Crawlers
Many sites accidentally block AI crawlers with aggressive Disallow rules. Explicitly allow the bots that matter:
| Bot User-Agent | Platform |
|---|---|
| GPTBot | ChatGPT (training) |
| OAI-SearchBot | ChatGPT (search / RAG) |
| PerplexityBot | Perplexity |
| Claude-Web | Claude |
| anthropic-ai | Anthropic crawlers |
| Google-Extended | Gemini training |
| cohere-ai | Cohere |
| Meta-ExternalAgent | Meta AI |
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
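You can confirm the rules behave as intended with Python's standard-library robots.txt parser — a quick local check, not part of any deployment:

```python
from urllib.robotparser import RobotFileParser

robots = """\
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

# Both a generic crawler and GPTBot should be permitted.
print(parser.can_fetch("GPTBot", "https://yourdomain.com/docs/products.md"))  # True
print(parser.can_fetch("SomeOtherBot", "https://yourdomain.com/"))            # True
```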
Create Markdown Content Files
Create a docs/md/ directory with one file per topic. AI agents retrieve these instead of parsing your HTML.
Recommended files
- `README.md` — company overview, key stats, quick links
- `products.md` — what you sell, features, how it works
- `faqs.md` — Q&A format (AI loves direct FAQ structure)
- `pricing.md` — declarative pricing facts
- `integrations.md` — what connects with your platform
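For example, a `faqs.md` might look like this — an illustrative skeleton where the company name, URL, and figures are placeholders to replace with your own:

```markdown
<!-- canonical: https://yourdomain.com/faqs -->
# Frequently Asked Questions

## What does Your Company do?
Your Company provides X for Y customers in Z markets.

## How much does it cost?
Plans start at $N per month. See pricing.md for details.
```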
Write for AI extraction
- Use `##` headings for every major section — AI uses these as chunk boundaries
- Write declarative statements: "Bermuda processes 50,000 quotes per month" — not "we're industry-leading"
- Put the most important fact in the first sentence of each section
- Use bullet lists — AI extracts these as structured facts
- Include the canonical URL at the top of each file
Also create llms-full.txt at the root — all markdown files merged into one. AI agents that want a complete picture get it in a single HTTP request.
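Generating `llms-full.txt` can be a small build step. Here is a minimal sketch that concatenates a directory of markdown files; the `docs/md/` layout matches this guide, but adjust paths and separators to your repo:

```python
from pathlib import Path

def build_llms_full(src_dir: str, out_file: str) -> int:
    """Concatenate every markdown file in src_dir into one llms-full.txt.

    Returns the number of files merged. Files are joined with a
    horizontal rule so AI agents can still see document boundaries.
    """
    files = sorted(Path(src_dir).glob("*.md"))
    parts = [md.read_text(encoding="utf-8").strip() for md in files]
    Path(out_file).write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
    return len(files)
```

Run it as part of your deploy (`build_llms_full("docs/md", "llms-full.txt")`) so the merged file never drifts out of date.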
Create sitemap.xml
Without a sitemap, AI crawlers may miss your content entirely. Include your markdown files alongside HTML pages, and keep <lastmod> current — Perplexity heavily cites content less than one year old.
<url>
  <loc>https://yourdomain.com/docs/md/products.md</loc>
  <lastmod>2026-03-17</lastmod>
  <priority>0.7</priority>
</url>
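Entries like this can be generated from the files themselves so `<lastmod>` stays current automatically. A sketch, assuming the `docs/md/` layout from this guide and using each file's modification time:

```python
from datetime import datetime, timezone
from pathlib import Path

def sitemap_entries(src_dir: str, base_url: str) -> str:
    """Emit a <url> entry per markdown file, with file mtime as lastmod."""
    out = []
    for md in sorted(Path(src_dir).glob("*.md")):
        lastmod = datetime.fromtimestamp(md.stat().st_mtime, tz=timezone.utc)
        out.append(
            "  <url>\n"
            f"    <loc>{base_url}/docs/md/{md.name}</loc>\n"
            f"    <lastmod>{lastmod:%Y-%m-%d}</lastmod>\n"
            "    <priority>0.7</priority>\n"
            "  </url>"
        )
    return "\n".join(out)
```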
Set Content-Type: text/markdown Headers
AI agents like Claude Code and OpenCode send Accept: text/markdown request headers. Your server should respond with the correct MIME type so they get clean markdown.
Vercel (vercel.json)
{
"headers": [
{
"source": "/(.*\\.md)",
"headers": [
{ "key": "Content-Type", "value": "text/markdown; charset=utf-8" }
]
}
]
}
Cloudflare
Enable Markdown for Agents in the Cloudflare dashboard (Beta, free for Pro+ plans). Cloudflare automatically converts any HTML page to markdown when an AI agent requests it with Accept: text/markdown — no separate markdown files needed.
Nginx
location ~* \.md$ {
add_header Content-Type "text/markdown; charset=utf-8";
}
Add JSON-LD Structured Data
Structured data helps both traditional search engines and AI systems understand your content type, author, and organization. Add to every HTML page:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Organization",
"name": "Your Company",
"url": "https://yourdomain.com",
"description": "What your company does.",
"contactPoint": {
"@type": "ContactPoint",
"telephone": "+1-000-000-0000",
"email": "info@yourdomain.com"
}
}
</script>
For blog and guide pages, also add Article and FAQPage schemas. This page uses both.
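A minimal FAQPage block looks like this — the question and answer text are placeholders, while the `@type` names come from the schema.org vocabulary:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is GEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization makes content citable by AI systems."
      }
    }
  ]
}
```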
Content Strategy for AI Citation
AI systems cite content that is authoritative, specific, and easy to extract. Your website alone is not enough — AI pulls from multiple sources.
Write for RAG, not just readers
RAG (Retrieval-Augmented Generation) is how ChatGPT search and Perplexity work: they search the web live, pull chunks of content, and feed them into the AI model. Your content competes to be retrieved and selected.
- Gets cited: specific numbers, FAQ format, clear section headings, summary sentences, bullet lists
- Gets ignored: vague claims, long unbroken paragraphs, content buried under navigation
Build presence on sources AI cites most
- Wikipedia / Crunchbase / G2 — complete these profiles thoroughly
- LinkedIn — publish thought leadership articles with your target keywords
- Reddit — participate genuinely in relevant communities
- YouTube — create transcribed video content (AI reads transcripts)
- Industry publications — digital PR and editorial mentions carry more weight than backlinks
Use identical company descriptions on LinkedIn, Crunchbase, X, and GitHub. Brand consistency across platforms accelerates AI visibility — some companies see ChatGPT visibility improvements within days.
Quick Checklist
- `llms.txt` at root with company summary and markdown links
- `llms-full.txt` at root with all content merged
- `robots.txt` explicitly allowing GPTBot, PerplexityBot, Claude-Web
- `sitemap.xml` including markdown document URLs
- `docs/md/` directory with topic-specific markdown files
- `Content-Type: text/markdown` headers configured for `.md` files
- JSON-LD structured data on all key pages
- Consistent brand descriptions on Crunchbase, LinkedIn, G2
- FAQ content written in declarative Q&A format
- Google Analytics segment for AI traffic (ChatGPT, Perplexity, Claude)
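The AI-traffic segment in the checklist can be driven by referrer hostnames. A sketch of the matching logic — the hostname list reflects the platforms named in this guide and is an assumption that will need updating as products change:

```python
from urllib.parse import urlparse

# Referrer hostnames that indicate AI-driven visits (illustrative list).
AI_REFERRERS = {
    "chat.openai.com", "chatgpt.com",
    "perplexity.ai", "www.perplexity.ai",
    "claude.ai", "gemini.google.com",
}

def is_ai_traffic(referrer_url: str) -> bool:
    """True if a visit's referrer is a known AI platform."""
    host = urlparse(referrer_url).netloc.lower()
    return host in AI_REFERRERS

print(is_ai_traffic("https://chatgpt.com/"))     # True
print(is_ai_traffic("https://www.google.com/"))  # False
```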
Frequently Asked Questions
**How is GEO different from traditional SEO?**
SEO (Search Engine Optimization) targets Google's ranking algorithm to get blue links. GEO (Generative Engine Optimization) targets AI language models to get cited inside AI-generated answers. Both matter — but GEO requires structured, machine-readable content rather than keyword density.
**Does llms.txt replace robots.txt?**
No. robots.txt controls crawler access (which bots can crawl which pages). llms.txt provides AI-readable content about your site's structure. You need both.
**How do I get cited in ChatGPT answers?**
ChatGPT uses RAG (web search) for current information. Allowing GPTBot and OAI-SearchBot in robots.txt, having a sitemap, and providing structured content increases the probability of being retrieved and cited in ChatGPT answers.
**How quickly does GEO show results?**
RAG-based systems (Perplexity, ChatGPT search) can reflect changes within days to weeks. Foundation model training data has cutoffs — influence there is longer-term and requires external citation building.
**Are markdown files required, or is HTML enough?**
AI systems can process HTML, but it costs 5–10x more tokens. Clean HTML with good semantic structure performs better than average, but dedicated markdown files give AI agents the most efficient path to your content.
See This in Action
We applied every step in this guide to thebermuda.us. Explore our implementation:
- llms.txt
- llms-full.txt
- robots.txt
- sitemap.xml