By Saurav | Founder of saavos | Building in public toward $10k MRR
[!TLDR] If your AI chatbot is saying "I don't know" more than 20% of the time, the model is almost never the problem. The three real causes: source gaps (the answer is on a page you never trained the bot on), chunk size (the page is trained but the relevant paragraph got split away from its context), and fallback wording (the bot is actually answering wrong, not refusing). All three are fixable without upgrading your plan or switching platforms. Here is how to diagnose which one you have.
I built saavos on a site that used to get the same 12 questions every week. After I trained the bot, about 8 of those 12 got handled correctly. Four kept coming back to me. Three of those four were not model failures. Two were source gaps. One was a chunking problem. The model was fine.
This is the most common pattern I see in chatbot logs. The bot is not dumb. It just doesn't have the right information, or the right information got mangled on the way in.
Here's how to figure out which.
Most platforms export conversation logs as CSV or JSON; saavos does this from the dashboard. Pull every response where the bot fell back to a refusal such as "I don't have that information" or "I don't know."
Group those by question topic. You want 20–30 examples. If you have fewer than 20, you don't have enough data to diagnose anything useful yet. Give it another two weeks of traffic.
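If your platform exports logs as CSV, the filtering and grouping can be scripted. A minimal sketch, assuming a hypothetical export with `question`, `answer`, and `topic` columns (check your platform's actual column names) and a hand-picked list of fallback phrases you would adjust to match your bot's wording. It also computes the overall fallback rate so you can compare it with the 20% mark from the top of the post.

```python
import csv
import io
from collections import Counter

# Hypothetical fallback phrases; replace with the exact wording your bot uses.
FALLBACK_PHRASES = (
    "i don't have that information",
    "i don't know",
    "i'm not sure",
)


def is_fallback(answer: str) -> bool:
    """True if the answer contains a known fallback phrase."""
    # Normalize curly apostrophes so "don’t" and "don't" both match.
    text = answer.lower().replace("\u2019", "'")
    return any(phrase in text for phrase in FALLBACK_PHRASES)


def audit_export(csv_text: str) -> tuple[float, Counter]:
    """Return (fallback rate, fallback count per topic) for a CSV export.

    Assumes hypothetical columns named question, answer, and topic.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fallbacks = [row for row in rows if is_fallback(row["answer"])]
    rate = len(fallbacks) / len(rows) if rows else 0.0
    by_topic = Counter(row["topic"] for row in fallbacks)
    return rate, by_topic


# Placeholder rows for illustration, not real log data.
sample = """question,answer,topic
Do you offer annual billing?,I don't have that information.,billing
What does Starter cost?,Starter is listed on the pricing page.,pricing
Can I remove the branding?,I'm not sure about that.,branding
Do you integrate with Slack?,I don't have that information.,integrations
How do I embed the bot?,Add the script tag to your site.,setup
"""

rate, by_topic = audit_export(sample)
print(f"fallback rate: {rate:.0%}")  # 3 of the 5 sample rows are fallbacks: 60%
print(by_topic.most_common())
```

Sorting `by_topic.most_common()` gives you the topic grouping to work through, largest first, once you have real traffic.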
Once you have your list, the diagnosis is straightforward.
The bot can only answer questions from content it was trained on. If a visitor asks "Do you offer annual billing?" and the answer is on your pricing page but you only trained the bot on your homepage and FAQ — it cannot answer. It doesn't know what it doesn't know, so it falls back to "I don't have that information."
How to check: take one "I don't know" question from your logs and copy the exact phrase the visitor used. Note which pages are in your bot's training sources, then search your site for the answer yourself. If you find it in under 30 seconds on a page that is NOT in your training sources, that's a source gap, every time.
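If you'd rather not search page by page, the same check can be roughed out in code. A sketch, assuming you've already fetched your site's pages into a dict of URL to page text (the URLs and page contents below are hypothetical): it lists every page containing a phrase from the answer and keeps only the ones missing from your training sources.

```python
def find_source_gaps(answer_phrase: str,
                     site_pages: dict[str, str],
                     trained_urls: set[str]) -> list[str]:
    """Return URLs that contain the answer phrase but are NOT trained."""
    phrase = answer_phrase.lower()
    matches = [url for url, text in site_pages.items() if phrase in text.lower()]
    return [url for url in matches if url not in trained_urls]


# Hypothetical site content and training set.
site_pages = {
    "https://example.com/": "AI chatbot for your website.",
    "https://example.com/faq": "Answers to common setup questions.",
    "https://example.com/pricing": "Annual billing is available on every plan.",
}
trained_urls = {"https://example.com/", "https://example.com/faq"}

gaps = find_source_gaps("annual billing", site_pages, trained_urls)
print(gaps)  # ['https://example.com/pricing']
```

An empty result while the phrase does appear on a trained page points away from a source gap and toward chunking or phrasing, covered next.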
The fix is unglamorous. Add the missing pages to your training set. This takes about 5 minutes per page, and it resolves the majority of "I don't know" complaints without touching the model, the prompt, or the plan tier.
Common pages that founders forget to add: pricing, changelog/release notes, integration docs, onboarding checklist, FAQ subpages (not just the main FAQ), team/about page (for questions about company background).
For a deeper guide on source selection, the post on training your chatbot on a PDF knowledge base covers format-specific quirks.
This one is trickier. Your answer exists in the training set, but the bot still can't find it.
Here's why this happens: when a chatbot ingests a web page or document, it breaks the content into chunks, short segments that are each converted into an embedding vector and stored. When a visitor asks a question, the bot retrieves the chunks whose vectors are most similar to the question and generates a response from them. If the chunk that contains the answer was split in a bad place, the context goes missing.
A practical example: you have a pricing page with a section that says "Our Business plan includes everything in Starter plus white-labeling." If the ingestion split that sentence across two chunks, one chunk says "Our Business plan includes everything" — which is not a complete answer — and the other chunk starts with "plus white-labeling" without the plan name attached. Neither chunk retrieves cleanly for a question like "does the Business plan include white-labeling?"
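To see how a split like this happens, here's a toy fixed-size chunker that cuts every N words regardless of sentence boundaries. Real platforms typically chunk by tokens or characters, often with some overlap, and the 5-word size here is deliberately tiny so the split is visible; treat it as an illustration, not as how any particular platform works.

```python
def fixed_size_chunks(text: str, words_per_chunk: int) -> list[str]:
    """Split text into chunks of N words, ignoring sentence boundaries."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def mentions_all(chunk: str, terms: list[str]) -> bool:
    """Crude relevance check: does the chunk contain every query term?"""
    lowered = chunk.lower()
    return all(term in lowered for term in terms)


pricing = "Our Business plan includes everything in Starter plus white-labeling."
chunks = fixed_size_chunks(pricing, words_per_chunk=5)
print(chunks)
# ['Our Business plan includes everything', 'in Starter plus white-labeling.']

query_terms = ["business", "white-labeling"]
print([mentions_all(c, query_terms) for c in chunks])  # [False, False]
```

The full sentence passes the check, but neither chunk does on its own: one has the plan name, the other has the feature.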
Pinecone's chunking strategy guide goes into the technical depth on this if you want it. For most hosted chatbot platforms, you can't control chunk size directly. But you can restructure your source content to work better with whatever chunking the platform uses.
The practical fix: rewrite the relevant section so the key fact is self-contained in one paragraph. One question, one answer, one paragraph. Don't rely on context from adjacent paragraphs. If "Business plan includes white-labeling" is important, make a sentence that says exactly that, not one that assumes the reader remembers the previous sentence.
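Self-contained paragraphs help because many chunkers prefer to break at paragraph boundaries when they can. The sketch below assumes a simple paragraph-first chunker (blank lines separate paragraphs, and whole paragraphs are packed into chunks up to a word limit); your platform's strategy may differ, but a one-paragraph answer survives this kind of splitting intact. The page text is hypothetical.

```python
def paragraph_chunks(text: str, max_words: int) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words.

    A paragraph longer than max_words becomes its own oversized chunk
    rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and current_words + n > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Hypothetical pricing page with one fact per paragraph.
page = (
    "Starter is our entry plan for small sites.\n\n"
    "The Business plan includes everything in Starter, plus white-labeling.\n\n"
    "Annual billing is available on every plan."
)
for chunk in paragraph_chunks(page, max_words=12):
    print(repr(chunk))
```

Whatever the limit, the Business plan sentence lands whole in one chunk because it never depends on a neighbouring paragraph.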
For pages where multiple features are listed, use a table or a clear heading-per-feature structure. Chunked content that has strong headings retrieves better because the heading usually stays attached to the section it belongs to.
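A heading-per-feature layout helps for a related reason. One common approach, sketched here as an assumption about how a chunker might work rather than a description of any specific platform, splits on headings and keeps each heading attached to its section, so a retrieved chunk always says which plan or feature it's about. The sketch assumes markdown-style `#` headings and hypothetical page text.

```python
def heading_chunks(markdown: str) -> list[str]:
    """Split on '#' heading lines; each chunk is a heading plus its section body.

    Simplification: any line starting with '#' is treated as a heading.
    """
    # Start with an untitled section to hold any text before the first heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in markdown.splitlines():
        if line.lstrip().startswith("#"):
            sections.append((line.strip(), []))
        elif line.strip():
            sections[-1][1].append(line.strip())
    return [
        "\n".join([heading, *body]).strip()
        for heading, body in sections
        if heading or body
    ]


# Hypothetical pricing page with one heading per plan.
page = """## Business plan
The Business plan includes everything in Starter.
The Business plan includes white-labeling.

## Starter plan
The Starter plan covers one website.
"""
for chunk in heading_chunks(page):
    print(chunk)
    print("---")
```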
This one sounds obvious but it's easy to miss in a bulk log review: sometimes the bot IS answering, just badly enough that the response might as well be a "don't know."
Check your logs for answers that technically mention the topic but give wrong details, cite the wrong plan, or give a hedged non-answer. "I believe we offer some integrations, but you might want to verify with the team" is technically a response. It is not helpful. A visitor reading it will write to you anyway.
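Hedged non-answers are harder to filter than outright refusals because they don't share one fixed phrase. A rough first pass is to flag answers containing common hedging markers and read those by hand. The marker list below is a hypothetical starting point, not an exhaustive classifier; expect some false positives.

```python
import re

# Hypothetical hedging markers; extend with phrases you see in your own logs.
HEDGE_PATTERNS = [
    r"\bi believe\b",
    r"\bi think\b",
    r"\bmight want to (?:verify|check|confirm)\b",
    r"\bcheck with the team\b",
    r"\bnot (?:entirely )?sure\b",
]
HEDGE_RE = re.compile("|".join(HEDGE_PATTERNS), re.IGNORECASE)


def flag_hedged(answers: list[str]) -> list[str]:
    """Return the answers that contain at least one hedging marker."""
    return [a for a in answers if HEDGE_RE.search(a)]


answers = [
    "I believe we offer some integrations, but you might want to verify with the team.",
    "Yes, the Business plan includes white-labeling.",
    "I'm not entirely sure which plans include the API.",
]
for answer in flag_hedged(answers):
    print(answer)
```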
This cause is less about source gaps and more about how questions are being matched to source content. If the visitor's phrasing doesn't match your trained content's phrasing, retrieval quality drops. Your pricing page might say "white-label embed" and a visitor asks "can I remove the saavos branding?" Those mean the same thing. The bot might not connect them.
The fix here is two-pronged. First, look at the actual questions in your logs and identify the natural-language patterns visitors use. Then go back to your source content and add those phrases. Not keyword stuffing — just make sure the answer to "can I remove the branding?" appears somewhere near where "white-label" is explained.
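A crude keyword-overlap score makes the white-label phrasing gap visible. This is not how embedding-based retrieval works (embeddings can match synonyms that share no words), so read it as an illustration of why the gap exists and why adding visitors' own wording near the answer helps, not as a model of your platform's retriever. The stopword list and the added sentence are hypothetical.

```python
import re

# Small hypothetical stopword list so filler words don't count as overlap.
STOPWORDS = {"a", "an", "the", "i", "can", "do", "you", "your", "to", "is", "and", "on"}


def terms(text: str) -> set[str]:
    """Lowercased word set, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9'-]+", text.lower()) if w not in STOPWORDS}


def overlap(question: str, passage: str) -> float:
    """Jaccard overlap between question terms and passage terms."""
    q, p = terms(question), terms(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


question = "Can I remove the saavos branding?"
before = "The Business plan includes a white-label embed."
after = before + " That means you can remove the saavos branding from the widget."

print(round(overlap(question, before), 2))  # 0.0
print(round(overlap(question, after), 2))   # 0.25
```

The original passage shares no words with the question; one added sentence in the visitor's own wording gives retrieval something to match.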
Second, improve your fallback message itself. The difference between a bad fallback and a good one is specificity. "I don't have that information — you can reach the team at support@example.com" is better than "I don't have that information." The post on what to do when your chatbot can't answer covers this in more detail.
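If your platform lets you customize the fallback, the idea of routing instead of just refusing can be sketched as a small lookup. Everything here is hypothetical: the keyword-to-page map, the URLs, and the assumption that you can run custom code at all. Many hosted platforms only let you edit the fallback text, in which case write the routes into that text directly.

```python
# Hypothetical keyword-to-destination map; swap in your own pages.
ROUTES = {
    "billing": "the pricing page (https://example.com/pricing)",
    "price": "the pricing page (https://example.com/pricing)",
    "integration": "the integration docs (https://example.com/docs/integrations)",
}
SUPPORT_EMAIL = "support@example.com"


def build_fallback(question: str) -> str:
    """Specific fallback: point to a relevant page if one matches, else to support."""
    q = question.lower()
    for keyword, destination in ROUTES.items():
        if keyword in q:
            return (f"I don't have that information, but {destination} "
                    f"should cover it. You can also reach the team at {SUPPORT_EMAIL}.")
    return f"I don't have that information. You can reach the team at {SUPPORT_EMAIL}."


print(build_fallback("Do you offer annual billing?"))
print(build_fallback("Who founded the company?"))
```

Even the no-match branch names a real next step, which is the difference between a bad fallback and a good one.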
If you've checked source gaps, fixed chunking issues, reviewed your fallback messages, and you're still getting consistent "I don't know" responses on questions where the trained content is clear and well-structured — then yes, model quality may be a factor.
Specifically: complex multi-hop reasoning ("If I'm on the Business plan and I have 3 team members, what does the API rate limit per seat look like?") requires the bot to synthesize across multiple pieces of content. Smaller, cheaper models struggle with this more than larger ones.
But this is genuinely uncommon below 500 monthly conversations. Most "I don't know" problems are infrastructure, not intelligence.
The AI chatbot evaluation checklist has a section on model quality indicators if you want to know what to look for.
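If you have 30 minutes today and want to know your current "I don't know" distribution across these three causes, here's what to do: export your fallback responses, and for each one check whether the answer sits on an untrained page (source gap), on a trained page where the key fact got split from its context (chunking), or whether the bot did answer, just vaguely or wrongly (weak answer). Then fix by category.
Source gaps: add the pages. Chunking issues: restructure those sections with explicit, self-contained sentences or a table. Weak answers: add your visitors' phrasing to the source content and make the fallback message point somewhere specific. Review again in two weeks.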
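Once each fallback from your audit is labeled with its cause, the distribution is a simple count. A sketch, assuming you've tagged each logged question by hand with one of three hypothetical labels; the sample labels are placeholders, not real data.

```python
from collections import Counter

# Hypothetical labels for the three causes.
CAUSES = ("source_gap", "chunking", "weak_answer")


def cause_distribution(labels: list[str]) -> dict[str, float]:
    """Share of audited fallbacks attributed to each cause."""
    unknown = set(labels) - set(CAUSES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not labels:
        return {}
    counts = Counter(labels)
    return {cause: counts[cause] / len(labels) for cause in CAUSES}


# Placeholder labels for 20 audited fallbacks.
labels = ["source_gap"] * 12 + ["chunking"] * 5 + ["weak_answer"] * 3
for cause, share in cause_distribution(labels).items():
    print(f"{cause}: {share:.0%}")
```

Fix the largest share first; rerun the same count on fresh logs after your two-week review.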
Most people find that fixing source gaps alone cuts their fallback rate by more than half.
One distinction worth making: "I don't know" is a refusal. Hallucination is the opposite: the bot answers confidently with wrong information. They have different root causes and different fixes. This post is about refusals. The post on preventing AI chatbot hallucinations covers the other direction.
Both are worth fixing. But if you have a high fallback rate AND a hallucination problem at the same time, fix the source gaps first. Adding more accurate sources to the training set reduces both problems simultaneously.
After fixing the source gaps and restructuring two pages on my own site, the four questions that kept coming back to my inbox dropped to one. That last one was a legitimate multi-hop reasoning question that I just handle manually now. The bot handles the other three. That's a 75% improvement from two source additions and 20 minutes of content restructuring.
No plan upgrade. No platform switch.
If your chatbot is falling back more than you'd like, the post on 12 questions to ask before you switch chatbot vendors is a useful pre-switch checklist. In most cases, the vendor switch isn't what fixes it.
The most common cause is a source gap: the answer exists on your site but on a page that was never added to the bot's training set. The second most common cause is chunking — the page is trained but the relevant paragraph got split from its context during ingestion, so retrieval fails. Model quality is rarely the root cause below 500 monthly conversations. Start by exporting your fallback responses and checking whether the answer exists on a page that is NOT in your training sources.
Export your last 30 days of conversation logs and filter for fallback responses. For each failed question, find the answer yourself on your site. If you find it in under 30 seconds on a page that is NOT in your training sources, that is a source gap: add the page. If the page is already trained but the content is dense without clear headings, restructure that section so the key fact is self-contained in one sentence or table row. Most founders who do this audit find that source gaps account for 60% or more of their fallbacks, and fixing them requires no plan upgrade.
A source gap is when the answer to a visitor's question exists on your website but was never added to the chatbot's training set. The bot has no knowledge of that content, so it falls back to a refusal response. Common missing sources: pricing page, changelog, integration docs, onboarding checklist, FAQ subpages, and the about page. Adding these pages to your training set typically resolves the majority of "I don't know" responses within a few minutes.
Chunking is the process by which a chatbot platform breaks ingested pages into small text segments for vector storage. When a page gets split at the wrong place, a key fact can end up separated from its context. The fix is to restructure the source content so each key fact is self-contained in a single paragraph or table row, rather than relying on sentence-to-sentence context.
Pull 20 fallback responses from your conversation logs. For each, check whether the answer exists on a trained page. If 12 or more of those 20 cases involve topics where the relevant page was never added to training, you have a source problem, not a model problem. Below 500 monthly conversations, source gaps cause the majority of failures.
A good fallback is specific and routes the visitor to a real next step. "I don't have that information. You can reach the team at support@yoursite.com" is better than a bare refusal. Even better: "I don't have that information right now, but the pricing page covers billing questions and the docs cover integrations." The goal is to give the visitor a route, not just a refusal.
Saurav builds tools for solopreneurs and small SaaS teams who don't have an afternoon to spare.