By Saurav | Founder of saavos | Building in public toward $10k MRR
[!TLDR] Semrush studied 11,882 prompts across ChatGPT Search, Google AI Mode, and Perplexity and found three content qualities that correlate most with AI citation: clarity (+32.8%), EEAT signals (+30.6%), and Q&A format (+25.5%). Schema markup, llms.txt, and content chunking are not what gets you cited: Google's AI Optimization Guide, published May 15, 2026, says none of them is required. This is what I changed about how I write for saavos after reading those findings.
I build saavos — an AI chatbot that trains on your website and answers visitor questions. Part of building this in public is figuring out how to show up when someone asks ChatGPT "what's a good AI chatbot for my SaaS." That question drives real buyer intent, and the answer comes from AI, not a search result page.
So I checked the research.
Semrush published a content study (semrush.com/blog/content-optimization-ai-search-study) covering 11,882 prompts across ChatGPT Search, Google AI Mode, and Perplexity. The study ran July 15–August 6, 2025. They compared 304,805 URLs that got cited by AI systems against 921,614 URLs that ranked on Google but didn't get cited.
That's the kind of sample size that produces directional signal, not just noise.
Five content qualities correlated with citation. The top three, in descending order: clarity (+32.8%), EEAT signals (+30.6%), and Q&A format (+25.5%).
The verbatim finding: "Content that leads with clear answers, demonstrates expertise, and uses structured formatting gets cited more often."
That sounds obvious until you look at what they found does NOT move the needle.
Google's AI Optimization Guide (developers.google.com/search/docs/fundamentals/ai-optimization-guide) dropped on May 15, 2026. Three sentences I had to read twice:
"You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search."
"There's no requirement to break your content into tiny pieces for AI to better understand it."
"Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add."
I spent real time on my llms.txt file, my schema markup, and my JSON-LD. That time wasn't wasted — those elements still help Google understand the site. But the idea that a well-formed llms.txt is what gets you cited? Google just said no.
The guide's actual direction: "Create the content yourself based on what you know about the topic, and consider what in-depth experience you can bring to your content." The contrast they draw is between commodity content — generic tips based on common knowledge — and non-commodity content: unique expert or experienced takes.
A few months ago I read every "GEO checklist" I could find. Most of them led with llms.txt and schema.org. I spent a Saturday adding Organization schema and cleaning up my JSON-LD. Fine. Good hygiene.
But those posts buried the thing that actually matters: whether you have anything specific to say.
The Semrush clarity finding (+32.8%) is the signal I keep coming back to. "Clarity" in their framework means leading with a direct answer, not burying the point under context. Every blog post I've written where the answer to the actual buyer question is in paragraph seven is a post that loses on this dimension.
EEAT (+30.6%) is harder to game because it's not about formatting. It's about whether there's a real person behind the content, with verifiable credentials, who has actually done the thing they're writing about. On saavos, my EEAT signal is: I'm the founder, I've tested every competitor (with real invoices), and I write from first-person experience. An AI couldn't replicate a post built on that. That's the point.
Q&A format (+25.5%) is the most actionable lever if you're starting from scratch. Every piece of content that has a genuine question-and-answer block becomes quotable. AI systems retrieve content partly by matching a query to an existing answer. If your page literally contains the question the user asked, in the format they'd search, followed by a specific 40-80 word answer — you're a better citation candidate than a page that answers the same question in running prose buried under six subheadings.
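To make the 40-80 word target concrete, here is a minimal sketch of a check I could run over a post's FAQ block before publishing. It assumes the questions and answers are already available as a plain dictionary; the function names and the "short"/"ok"/"long" labels are hypothetical, not something from the Semrush study.

```python
import re

# Target range taken from the 40-80 word guideline above (an editorial
# rule of thumb, not a threshold enforced by any AI system).
MIN_WORDS, MAX_WORDS = 40, 80


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain a letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def check_faq(faq: dict[str, str]) -> dict[str, tuple[int, str]]:
    """Map each question to (word_count, status) for its answer.

    status is "short", "ok", or "long" relative to MIN_WORDS/MAX_WORDS.
    """
    report = {}
    for question, answer in faq.items():
        n = count_words(answer)
        if n < MIN_WORDS:
            status = "short"
        elif n > MAX_WORDS:
            status = "long"
        else:
            status = "ok"
        report[question] = (n, status)
    return report
```

Running check_faq over a draft and fixing anything flagged "short" or "long" is a cheap way to keep answers quotable. The word count is deliberately crude: it only tells you about length, not whether the answer is specific.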
After reading the Semrush study and the Google guide, I made three changes to how I write here.
First: every post now has an explicit FAQ section with real buyer questions. Not "what is a chatbot" (no one Googles that for product decisions) but "does Chatbase delete your bot if you don't log in" and "what's the cheapest chatbot that trains on Notion." Those are the questions buyers actually ask.
Second: I killed the vague, passive openers in my first paragraphs. "Content clarity correlated +32.8% with AI citation in a 304,805-URL study" is a sentence that leads with the finding and the number behind it. That kind of specific opening is what AI systems pull. "In today's AI landscape, many platforms offer various solutions" is what gets ignored.
Third: I started hyperlinking every named external source. The Semrush EEAT data shows that citing trusted sources with proper attribution correlates with a significant visibility improvement. Citing Zendesk without a link is weaker than citing Zendesk's benchmark reports with one. I've been retrofitting this on older posts.
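Retrofitting links by hand is tedious, so here is a rough sketch of how the audit could be scripted. It assumes posts are stored as Markdown and that I maintain my own list of source names to look for; the function name, the regex, and the paragraph-level heuristic are all hypothetical choices, and the example.com URL in the usage note is a placeholder.

```python
import re

# Matches a Markdown inline link such as [text](https://...).
LINK_RE = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def unlinked_mentions(markdown: str, sources: list[str]) -> list[tuple[int, str]]:
    """Return (paragraph_index, source) pairs for paragraphs that name a
    source but contain no Markdown link at all.

    Heuristic only: a paragraph with any link is treated as attributed,
    even if the link points somewhere unrelated.
    """
    hits = []
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        if LINK_RE.search(paragraph):
            continue  # paragraph already carries at least one link
        for name in sources:
            if re.search(rf"\b{re.escape(name)}\b", paragraph, flags=re.IGNORECASE):
                hits.append((index, name))
    return hits
```

For example, calling unlinked_mentions on a post whose first paragraph says "Zendesk reports faster replies." with no link, and whose second paragraph contains "[Semrush study](https://example.com/study)", flags only the Zendesk paragraph. Anything it flags still needs a human to pick the right URL.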
Domain authority is the uncomfortable variable nobody wants to talk about.
Semrush's January 2026 technical study (semrush.com/blog/technical-seo-impact-on-ai-search-study) looked at 5 million cited URLs and found: "Authority Score is the #1 predictor of whether AI search engines will cite content." Sites in the low-authority tier received 0–4 citations. Sites in the top authority tier received 79+.
saavos has a low referring-domain count. I know this. The content quality signals are necessary but not sufficient — they're the foundation, not the whole building. The building also requires backlinks, which come from being cited by real humans on real platforms (forums, newsletters, other blogs).
The path I'm on has three steps: write posts that are genuinely specific and useful (EEAT, clarity, Q&A), get those posts cited in places where the target reader lives (indie hacker forums, SaaS newsletters, X threads about chatbot tools), and accumulate referring domains over time. There's no faster version of this that doesn't involve fabricating signal.
According to Google's own AI Optimization Guide (published May 15, 2026), structured data is not required for generative AI search. Semrush's technical study found schema present on ~40% of AI-cited pages, but that is correlation: cited pages tend to be well optimized in general, and nothing indicates they were cited because of their schema. Adding schema helps with classic Google rich results and page understanding, but it is not a citation lever for AI systems. Source: developers.google.com/search/docs/fundamentals/ai-optimization-guide, confirmed 2026-05-15.
Semrush's study of 11,882 prompts (July–August 2025) found content clarity and summarization correlated +32.8% with AI citation — the highest of any quality measured. In practice: lead with the direct answer, use specific numbers, don't bury the point under context paragraphs. The study covered ChatGPT Search, Google AI Mode, and Perplexity against a 304,805-URL positive sample. Source: semrush.com/blog/content-optimization-ai-search-study/.
Google's AI Optimization Guide states: 'You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search.' llms.txt helps AI systems navigate your content — useful for documentation-heavy sites. But it is not a citation lever per Google's own guide. Source: developers.google.com/search/docs/fundamentals/ai-optimization-guide, published 2026-05-15.
Semrush found EEAT signals correlated +30.6% with AI citation rate. In practice: content by a named author with verifiable credentials who writes from documented first-hand experience gets cited more than anonymous or generic content. The clearest EEAT signal for a solo-founder SaaS is writing from actual experience — real numbers, real timelines, honest conflict-of-interest disclosures. Source: semrush.com/blog/content-optimization-ai-search-study/.
Q&A format does help: Semrush's study found it correlated +25.5% with AI citation across ChatGPT Search, Google AI Mode, and Perplexity. If a page contains the exact question a user asked, plus a specific 40–80-word answer, it is a higher-quality retrieval candidate than prose covering the same ground indirectly. Notably, FAQ rich results as a SERP feature were deprecated by Google on May 7, 2026. The citation benefit is about AI retrieval, not SERP accordion display.
Domain authority is the strongest predictor of AI citation, per Semrush's 2026 technical study of 5 million cited URLs. Sites with fewer than 200 referring domains received 0–4 AI citations; high-authority sites received 79+. Content quality signals (clarity, EEAT, Q&A format) are necessary but not sufficient: they work within the authority tier your domain is already in. The practical path: write useful, specific content, get cited in communities your buyers read, and accumulate domain authority over time. Source: semrush.com/blog/technical-seo-impact-on-ai-search-study/, January 2026.