What is Generative Engine Optimization (GEO) and how is it different from SEO?

GEO is the practice of structuring your site so AI assistants like ChatGPT, Claude, and Perplexity can find, parse, quote, and recommend it. It is adjacent to SEO but optimizes for citation slots inside an AI-generated answer rather than ranking against a list of blue links. SEO ranks pages; GEO ranks sentences. The work overlaps but the editorial habits differ — GEO rewards crisp factual claims, structured data, and quotable summaries far more than keyword density.

What is llms.txt and where does it go on my website?

llms.txt is a markdown file at the root of your site (https://yourdomain.com/llms.txt, never under a subdirectory) that tells AI agents what your most important pages are, in priority order, with one-sentence descriptions. It was proposed by Jeremy Howard in 2024 and is checked first by every major AI assistant when crawling a domain in 2026. A typical file lists product, blog, and about pages with a one-line summary of each.

Which AI crawlers should I allow in robots.txt?

If you want AI assistants to cite you, allow GPTBot (OpenAI/ChatGPT), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot, Google-Extended (Google Gemini), Applebot-Extended, CCBot (Common Crawl), and cohere-ai. Most CDN security presets block these out of the box. Verify with curl -A "GPTBot/1.0" against your llms.txt; you should get a 200 response, not a 403.

Does AI search optimization require structured data (JSON-LD)?

Strongly recommended in 2026. AI assistants read JSON-LD tags to extract author, date, headline, FAQ pairs, and breadcrumb hierarchy without parsing the rendered HTML. Ship Article (or BlogPosting), FAQPage, and BreadcrumbList schemas on every meaningful page. The biggest mistake is structured data that contradicts the visible content — assistants cross-check, and mismatch is a direct de-rank signal.

How long until my site starts appearing in ChatGPT and Perplexity results?

For a small to mid-size site that ships the five core GEO moves (llms.txt, AI-crawler robots.txt, JSON-LD structured data, quotable sentences, citation-friendly headings), references typically start appearing within 30–60 days and become consistent for top topics by day 90. The lag is mostly crawl frequency rather than ranking lag — once an assistant has indexed you, citation depends on whether your sentences are quotable.

What are the most common GEO mistakes that prevent AI assistants from citing my site?

Three patterns we see repeatedly: (1) accidentally blocking AI crawlers via CDN security defaults — check that GPTBot, ClaudeBot, and PerplexityBot can actually fetch your llms.txt; (2) marketing-heavy copy that does not survive summarization — vague claims like "the most powerful platform" get filtered out, while concrete claims like "answers in under 2 seconds with citations" get quoted; (3) missing or contradictory structured data, which assistants cross-check against visible content.

How to make your website AI-search-ready: a 2026 GEO checklist

By Saurav · saavos

[!TLDR] Generative Engine Optimization (GEO) is the new layer of work that makes your site cite-worthy for ChatGPT browsing, Claude research, Perplexity, and Google AI Overviews. The five things that matter most in 2026: a clean llms.txt at your site root, an AI-crawler-friendly robots.txt, structured data on every meaningful page (Article + FAQPage + BreadcrumbList JSON-LD), short factual sentences agents can quote without rewriting, and explicit citation-friendly headings. Do those five and AI assistants will start referencing you within weeks. Skip them and you stay invisible to the fastest-growing search surface of the decade.

Update 2026-05-18 (post-publish correction): Google's AI Optimization Guide (published 2026-05-15) explicitly states: "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search" and "Structured data isn't required for generative AI search." That directly contradicts items #1 (llms.txt) and #3 (JSON-LD) in the table below, where I called them the highest-leverage GEO moves.

The honest correction: schema and llms.txt still help Googlebot understand your pages, and non-Google LLMs (Perplexity, Claude, ChatGPT) do use llms.txt. Keep them. But per Google's own guide, the actual lever for AI Overview citations is non-commodity content with a unique point of view — something no machine-readable file can manufacture for you. The "must-do" framing in the table has been moderated: items #1 and #3 are useful for page-understanding and non-Google crawlers, not AI Overview entry tickets.

What GEO actually is (and what it isn't)

Generative Engine Optimization is the practice of structuring your site so that AI assistants can find you, parse you, quote you accurately, and recommend you to the user who asked the question. It's adjacent to SEO but the optimization targets are different. SEO ranks against a list of blue links. GEO competes for citation slots inside an AI-generated answer — typically 1 to 5 references, with the top one driving the lion's share of click-through.

Three concrete differences:

The result format is one paragraph, not ten links. AI assistants synthesize. If your content can't be summarized in 30 words, it won't be quoted in 30 words.
The ranking signal is quotability, not just authority. Pages that read like marketing copy get filtered out; pages with crisp factual claims and clear definitions get quoted directly.
The visit pattern is referral-style, not search-style. A user reads "according to saavos.com, RAG is the right choice for X" and sometimes clicks through. The visit volume per query is lower; the intent is much higher.

GEO is not about gaming AI assistants with prompt injections, hidden text, or schema markup that doesn't match content. Anthropic, OpenAI, and Perplexity all explicitly de-rank pages that try this. The work is honest: be quotable, be structured, be cite-worthy.

The 5 highest-leverage GEO moves in 2026

Ranked by how much they move the needle for a small or mid-size site, based on what the GEO research community has converged on as of mid-2026.

| Rank | Move | Effort | Effect | Where it lives |
|---|---|---|---|---|
| 1 | Publish `llms.txt` at site root | 30 min | Largest single signal: directly tells AI agents what to index | `/llms.txt` |
| 2 | Allowlist AI crawlers in `robots.txt` | 5 min | Prevents accidental opt-out via default deny patterns | `/robots.txt` |
| 3 | Article + FAQPage + BreadcrumbList JSON-LD | 2–4 hrs | Helps assistants extract Q&A and attribute correctly | Each post |
| 4 | Crisp factual sentences with concrete data | Ongoing | Quotability: the actual ranking signal | Body content |
| 5 | Citation-friendly H2s and TL;DR boxes | 1 hr per post | Agents prefer extracting from labeled sections | Page structure |

The first two are one-time and high-leverage. The last three are an editorial habit you build over months.

1. Publish `llms.txt` (the largest single signal)

llms.txt is a markdown file at the root of your site that tells AI agents what your most important pages are, in priority order, with descriptions. It was proposed by Jeremy Howard in 2024 and adoption hit critical mass in 2025 — by 2026, every major AI assistant checks it first when crawling a domain.

Put it at https://example-domain.com/llms.txt (NOT under /static/ or in a subdirectory). Format:

# Your Site Name

> One-line description of what your site does.

## Product

- [Homepage](https://example-domain.com/): Product overview
- [Pricing](https://example-domain.com/pricing): What it costs

## Blog

- [Most important post](https://example-domain.com/blog/post): One-sentence description
- [Second post](https://example-domain.com/blog/another): One-sentence description

## About

- [About the team](https://example-domain.com/about): Who built this

## Optional

- [RSS Feed](https://example-domain.com/blog/rss.xml): Latest updates

Three rules: descriptions should be a single declarative sentence, the order matters (most important first), and update it whenever you ship a new top-level page. AI assistants do not retry hourly — a stale llms.txt will keep recommending stale pages for weeks.
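Because the file should be regenerated whenever a top-level page ships, it can help to build it from a page list rather than edit it by hand. The following is a minimal sketch, not a reference implementation: the `Page` class and `render_llms_txt` function are hypothetical names introduced here, and the output simply mirrors the format shown above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str  # a single declarative sentence


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body. Sections and pages keep the order given,
    so list the most important ones first."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Your Site Name",
    "One-line description of what your site does.",
    {
        "Product": [Page("Homepage", "https://example-domain.com/", "Product overview")],
        "Blog": [Page("Most important post", "https://example-domain.com/blog/post",
                      "One-sentence description")],
    },
)
print(example)
```

Writing the returned string to `/llms.txt` as part of your deploy step keeps the file from going stale.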

2. Allowlist AI crawlers in `robots.txt`

Most sites running CDN security defaults block AI crawlers without realizing it. The default Cloudflare bot-management ruleset, for example, denies GPTBot, ClaudeBot, PerplexityBot, and Google-Extended unless you explicitly allow them.

Add this block to /robots.txt:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: anthropic-ai
Allow: /

Whether to allow each is a content-strategy call. If you want AI assistants to cite you, you have to let them in. Disallowing AI crawlers and then wondering why you don't appear in ChatGPT browsing results is the #1 self-inflicted GEO mistake we see.

If your CDN has a separate "AI bot" toggle, flip it to "allow." Verify with curl -A "GPTBot/1.0" https://example-domain.com/llms.txt — you should get a 200, not a 403.
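The curl check catches CDN-level blocks, but it does not tell you what your robots.txt rules themselves say. As a complementary sketch, Python's standard-library `urllib.robotparser` can evaluate a robots.txt body against each crawler name. The `blocked_crawlers` helper and the sample rules are illustrative assumptions, and the standard-library parser's matching may differ in edge cases from how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the robots.txt block above.
AI_CRAWLERS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "CCBot", "Applebot-Extended", "cohere-ai", "anthropic-ai",
]


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that opts GPTBot in but shuts ClaudeBot out.
sample = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(blocked_crawlers(sample, "https://example-domain.com/llms.txt"))
```

An empty list means no crawler in the list is excluded by robots.txt; a 403 from the curl check would still point to a CDN or firewall rule.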

3. Article + FAQPage + BreadcrumbList JSON-LD

Every meaningful page should emit structured data via JSON-LD <script> tags. AI assistants read these to extract author, date, headline, FAQ pairs, and breadcrumb hierarchy without having to parse the rendered HTML.

Three schemas you must ship:

Article (or BlogPosting) — every blog post and content page:

<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "Post title here",
    "datePublished": "2026-05-05",
    "dateModified": "2026-05-05",
    "author": {
      "@type": "Person",
      "name": "Author Name",
      "url": "https://example-domain.com/about/author"
    },
    "publisher": {
      "@type": "Organization",
      "name": "Your Brand",
      "url": "https://example-domain.com"
    },
    "mainEntityOfPage": "https://example-domain.com/blog/post-slug"
  }
</script>

FAQPage — any post with FAQ-style content:

<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
      {
        "@type": "Question",
        "name": "How do I do X?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "You do X by..."
        }
      }
    ]
  }
</script>
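If your FAQ content already lives in a CMS or a list of question/answer pairs, the tag above can be generated rather than hand-written, which helps keep the markup in step with the visible text. This is a sketch under that assumption; `faq_jsonld` is a hypothetical helper name, not part of any library.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a FAQPage JSON-LD <script> tag from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the <script> element early;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(faq_jsonld([("How do I do X?", "You do X by...")]))
```

Feed it the same strings you render on the page so the structured data and the visible FAQ cannot drift apart.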

BreadcrumbList — every page that's deeper than the homepage:

<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": "https://example-domain.com/"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "Blog",
        "item": "https://example-domain.com/blog"
      },
      { "@type": "ListItem", "position": 3, "name": "Post Title" }
    ]
  }
</script>

The most common mistake: structured data that contradicts the visible content. AI assistants cross-check. If your JSON-LD headline claims one thing and the visible <h1> says another, the page gets de-prioritized.
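One way to catch a headline mismatch before an assistant does is to compare the two values in a build or CI step. The sketch below uses only Python's standard library. `headline_mismatches` is a hypothetical helper; it assumes each JSON-LD block is a single valid JSON object (arrays and `@graph` containers are skipped) and it only inspects the first <h1> on the page.

```python
import json
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Collects JSON-LD script bodies and the text of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.h1_text = None
        self._in_jsonld = False
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._buffer = True, []
        elif tag == "h1" and self.h1_text is None:
            self._in_h1, self._buffer = True, []

    def handle_data(self, data):
        if self._in_jsonld or self._in_h1:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False
        elif tag == "h1" and self._in_h1:
            # Normalize whitespace so line breaks inside the <h1> do not matter.
            self.h1_text = " ".join("".join(self._buffer).split())
            self._in_h1 = False


def headline_mismatches(html: str) -> list[str]:
    """Return every JSON-LD headline that differs from the page's first <h1>."""
    scan = _PageScan()
    scan.feed(html)
    scan.close()
    mismatches = []
    for block in scan.jsonld_blocks:
        data = json.loads(block)
        headline = data.get("headline") if isinstance(data, dict) else None
        if headline is not None and headline.strip() != (scan.h1_text or ""):
            mismatches.append(headline)
    return mismatches
```

An empty result means the markup and the visible heading agree; anything else is worth fixing before you publish.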

4. Write quotable sentences

This is the editorial habit, and it's where most sites fail.

A quotable sentence is:

One claim, one number. "Setup takes 5 minutes" beats "setup is fast and easy and you can be up and running in just a few minutes once you've gathered your sources and..."
Subject-verb-object, not passive voice. "Claude indexes your site" beats "your site is indexed by Claude."
Concrete and dated. "In 2026, RAG platforms charge $19–$199/month" beats "pricing varies by platform."
Self-contained. A sentence that requires three paragraphs of context above it to make sense will not get extracted.
Free of marketing puff. "The most powerful AI chatbot ever" gets stripped; "answers with citations from your own docs" gets quoted.

The test: open ChatGPT, paste your H2 as a question, see if any of your sentences survive when summarized. If they all collapse into "this site offers chatbots," you have marketing copy, not quotable content.
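Before running the ChatGPT test, a rough automated pass can flag the obvious offenders. The checker below is a heuristic sketch only: the 30-word ceiling, the puff-word list, and the passive-voice pattern are assumptions chosen for illustration, not rules any assistant is known to apply.

```python
import re

# Illustrative assumptions; tune both for your own style guide.
PUFF_WORDS = {"powerful", "revolutionary", "best-in-class", "cutting-edge",
              "world-class", "seamless"}
MAX_WORDS = 30


def quotability_issues(sentence: str) -> list[str]:
    """Return heuristic reasons a sentence may not survive summarization."""
    words = re.findall(r"[A-Za-z0-9$%'-]+", sentence)
    issues = []
    if len(words) > MAX_WORDS:
        issues.append("too long")
    if not re.search(r"\d", sentence):
        issues.append("no concrete number")
    if any(word.lower().strip("'") in PUFF_WORDS for word in words):
        issues.append("marketing puff")
    # Crude pattern: a form of "to be", a word ending in -ed, then "by".
    if re.search(r"\b(is|are|was|were|been|being)\s+\w+ed\s+by\b", sentence, re.IGNORECASE):
        issues.append("possible passive voice")
    return issues


for candidate in ["Setup takes 5 minutes.",
                  "The most powerful AI chatbot ever",
                  "Your site is indexed by Claude."]:
    print(candidate, "->", quotability_issues(candidate))
```

Not every quotable sentence needs a number, so treat the output as a prompt for an editor rather than a verdict.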

5. Citation-friendly headings and TL;DR boxes

AI assistants prefer extracting from labeled sections. Two patterns that compound:

TL;DR / summary at the top. A 50-to-100-word summary at the start of every long post. Use a callout box (> [!TLDR] in markdown is a common convention) so the structure is unmistakable. Many AI extraction pipelines preferentially quote from the first labeled summary on a page.

Question-style H2s and H3s. "How does RAG handle updates?" gets cited more often than "Updates and refresh cadence." When users ask AI assistants questions, the agents look for headings that match the user's phrasing. Mirror likely user queries in your headings.

Bonus pattern: a comparison table near the top with concrete numbers (price, speed, accuracy) is the single most-quoted element across the assistant ecosystem. Tables get extracted verbatim and cited as authoritative reference data.
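Both patterns are easy to lint once headings and the summary have been extracted from a post. A small sketch follows; the function names are hypothetical, "question-style" is approximated as "ends with a question mark", and the 50-to-100-word range is the one suggested above.

```python
def non_question_headings(headings: list[str]) -> list[str]:
    """Return the headings that are not phrased as questions."""
    return [h for h in headings if not h.strip().endswith("?")]


def tldr_length_ok(tldr: str, low: int = 50, high: int = 100) -> bool:
    """Check that a TL;DR falls inside the suggested word range."""
    return low <= len(tldr.split()) <= high


headings = ["How does RAG handle updates?", "Updates and refresh cadence"]
print(non_question_headings(headings))
print(tldr_length_ok("Too short to count as a summary."))
```

Headings flagged here are candidates for rewording, not automatic failures; some sections, such as checklists, read better with a plain label.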

What does NOT move the GEO needle

A short anti-pattern list, by frequency of wasted effort:

Keyword density. AI assistants don't care how many times "best AI chatbot" appears in your page. They care whether your sentence about it is quotable.
Long-form for length's sake. A 4,000-word post that says less than a 1,500-word post will be cited less. Length is not a signal; density is.
Hidden text or alt-text stuffing. Detected and de-prioritized by every major assistant.
Backlinks alone. Backlinks still matter for SEO, but their GEO weight is a fraction of what they used to be. Quotable content wins more citations than well-linked content.
Schema markup that doesn't match content. As above — assistants cross-check. Mismatch is a direct de-rank signal.

A 30-day GEO rollout for a typical site

Week 1: Audit. Pull your top 20 traffic pages. For each, check (a) is there structured data? (b) does the H1 match a likely user question? (c) is the TL;DR quotable? (d) does robots.txt allow AI crawlers? Most teams find 60–80% of pages need work.
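To track the audit, the four checks can be recorded per page and rolled up into the share of pages that need work. This is a bookkeeping sketch only: `PageAudit` and `share_needing_work` are hypothetical names, and the boolean answers still come from a manual review or from your own tooling.

```python
from dataclasses import dataclass


@dataclass
class PageAudit:
    url: str
    has_structured_data: bool   # check (a)
    h1_matches_question: bool   # check (b)
    tldr_quotable: bool         # check (c)
    crawlers_allowed: bool      # check (d)

    def needs_work(self) -> bool:
        """A page needs work if any of the four checks fails."""
        return not (self.has_structured_data and self.h1_matches_question
                    and self.tldr_quotable and self.crawlers_allowed)


def share_needing_work(audits: list[PageAudit]) -> float:
    """Fraction of audited pages that fail at least one check."""
    if not audits:
        return 0.0
    return sum(audit.needs_work() for audit in audits) / len(audits)


audits = [
    PageAudit("/", True, True, True, True),
    PageAudit("/pricing", False, True, True, True),
    PageAudit("/blog/post", True, False, False, True),
    PageAudit("/about", True, True, False, True),
]
print(f"{share_needing_work(audits):.0%} of pages need work")
```

Sorting the failing pages by traffic gives you the work order for weeks 2 and 3.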

Week 2: Foundations. Ship llms.txt, fix robots.txt, add BreadcrumbList JSON-LD site-wide. These are one-time and unlock everything else.

Week 3: Per-page upgrades. Add Article + FAQPage JSON-LD to every post and high-intent landing page. Rewrite TL;DRs to be 50–100 words with concrete claims.

Week 4: Editorial. Update your style guide so future content ships with quotable sentences, question-style headings, and tables of concrete data. This is the habit that compounds.

After 30 days, monitor whether you start showing up in ChatGPT answers: turn on web search and ask a question your site should answer. Repeat in Claude (with web search), Perplexity, and Google AI Overviews. By day 60 you should see references; by day 90 they should be consistent for your top topics.

What this looks like done well

saavos is built around exactly this pattern: every page emits Article + FAQPage + BreadcrumbList structured data, our llms.txt advertises every blog post with a one-sentence description, our robots.txt explicitly opts AI crawlers in, and every post ships with a quotable TL;DR. If you want to see how the pieces fit together, the source is openly inspectable via your browser's view-source.

If you want a structured list to work through before deploying, the AI chatbot evaluation checklist covers the configuration questions that affect how well a bot performs under GEO-optimized content. And if you're still clarifying what kind of tool saavos actually is before deciding whether it fits your stack, what saavos is not draws the lines clearly.

Preview saavos — paste your URL, get a chatbot that's already optimized for AI citation, no GEO consultant required. Or see our pricing for what each paid tier unlocks.
