By Saurav | Founder of saavos | Building in public toward $10k MRR
[!TLDR] A chatbot that takes 2–3 seconds to start responding feels like it's thinking. One that takes 8–15 seconds before the first word appears feels broken — most users close it before it finishes. The difference is almost always streaming vs batch delivery, not model quality. Platforms that show text token-by-token as it generates have a structural UX advantage over ones that wait for a full response before displaying anything. Here's how to check which one you have, what it costs in engagement, and what you can actually do about it.
I launched saavos's first live instance on a personal project site. After a day of real traffic I pulled the conversation logs to see how it was going.
There were about 40 conversations. Twelve of them showed a user typing a question, then nothing — no follow-up, no bot response, just a session end. I was confused. The bot was returning answers. I tested it myself and it worked fine.
Then I realized: I was testing it on my laptop, sitting in front of the server. The real users were mobile visitors hitting a cold API with extra network hops. Some of those conversations had a 12-second gap between question submission and first response character. The users left before the answer came.
That was not a model problem. It was a latency problem.
Here's the rough model for how users experience chatbot response time:
Under 3 seconds: The user sees "typing" indicators and waits. This feels like a human on the other end thinking. Acceptable.
3–8 seconds: The wait starts to feel wrong. Users hover over the close button. Some start re-reading their question, wondering if it was a bad question. Engagement drops but doesn't collapse.
Over 8 seconds with no visible progress: The user assumes the bot is broken, the connection dropped, or the request failed silently. Most close the chat. You don't get a second chance.
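To make those three tiers concrete, here is a minimal sketch that buckets a measured time-to-first-text. The function name and the bucket labels are mine, purely for illustration, and the cutoffs are the rough ones from my logs, not an industry standard.

```python
def classify_first_response(seconds_to_first_text: float) -> str:
    """Bucket a time-to-first-visible-text measurement into the rough tiers above."""
    if seconds_to_first_text < 0:
        raise ValueError("latency cannot be negative")
    if seconds_to_first_text < 3:
        return "acceptable"        # reads as "typing", users wait
    if seconds_to_first_text <= 8:
        return "at risk"           # engagement drops but doesn't collapse
    return "likely abandoned"      # most users assume it's broken


# The 12-second gap from my logs lands in the last bucket.
worst_case = classify_first_response(12.0)
```

Running something like this over real conversation logs is a quick way to see how many sessions sit in each tier.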
These numbers come from the pattern I see in my own conversation logs, not a controlled study. But they match what general web performance research shows for form submissions: the same psychological threshold where "this is loading" becomes "this is broken." Google's Web Vitals documentation draws the same kind of line at a much shorter time scale for interaction to next paint, where 200 ms or less is rated good and anything over 500 ms is rated poor.
The chatbot-specific wrinkle: users forgive slow page loads more easily than slow conversational responses. A form that takes 4 seconds is annoying. A chatbot that takes 4 seconds to reply creates a qualitatively different experience — it breaks the conversational frame. Talking to something that pauses 4 seconds between sentences does not feel like conversation.
Most AI chatbots are calling a large language model API behind the scenes. Those APIs support two delivery modes:
Batch: The model generates the full response, then sends the whole thing at once. You see nothing until the model is done. If the response is 200 words and the model takes 8 seconds to generate it, you stare at a blank chat bubble for 8 seconds, then 200 words appear simultaneously.
Streaming: The model sends tokens as it generates them. You see the first word almost immediately — usually in under a second — and the rest streams in. The total time to a complete response is similar, but the perceived responsiveness is completely different. At 1 second you already see "Based on your pricing page, annual billing..." and the user is reading while the answer finishes.
Streaming is the architectural choice that separates 2-second-feel from 30-second-feel. The model isn't faster. The delivery is.
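The arithmetic behind that claim can be sketched with a toy model. Everything here is an assumption for illustration: the generation speed (30 tokens per second), the fixed overhead (0.5 seconds for network plus prompt processing), and the answer length (240 tokens, picked so generation takes 8 seconds like the batch example above).

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    first_visible_s: float  # when the user first sees any text
    complete_s: float       # when the full answer is on screen


def simulate(n_tokens: int, tokens_per_s: float, overhead_s: float, streaming: bool) -> Delivery:
    """Toy model: fixed overhead, then tokens generated at a constant rate."""
    if n_tokens <= 0 or tokens_per_s <= 0:
        raise ValueError("n_tokens and tokens_per_s must be positive")
    complete = overhead_s + n_tokens / tokens_per_s
    # Streaming shows the first token as soon as it exists;
    # batch shows nothing until the last token is done.
    first = overhead_s + 1 / tokens_per_s if streaming else complete
    return Delivery(first_visible_s=first, complete_s=complete)


batch = simulate(n_tokens=240, tokens_per_s=30, overhead_s=0.5, streaming=False)
stream = simulate(n_tokens=240, tokens_per_s=30, overhead_s=0.5, streaming=True)
# Both finish at 8.5 s. Batch shows nothing until then;
# streaming shows its first token a little after 0.5 s.
```

Same model, same finish time. The only thing that changed is when the user first has something to read.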
saavos uses streaming on all paid tiers. The free tier (Claude Haiku) also streams, but Haiku generates tokens faster than Sonnet so the difference is less visible. On Starter and above (Claude Sonnet 4.6), streaming is what makes the response feel immediate even on complex questions where the model takes 6–8 seconds to finish generating a full answer.
A few reasons a platform might batch instead of stream:
Caching complexity. If you cache bot responses for identical questions, batch is simpler to cache. Streaming requires you to decide whether to stream the cached copy token-by-token (unnecessary) or detect the cache hit and switch to a fast-path batch response. Some platforms take the path of least resistance and batch everything.
Markdown rendering timing. Streaming raw markdown that includes tables or code blocks causes visual glitches if the renderer tries to parse partial markdown mid-stream. Platforms that render rich markdown sometimes delay until they have a complete block. This is solvable but it requires extra engineering. Some platforms skip it.
Cost accounting. A few platforms meter usage by API call, not by token. With streaming, the call starts early and ends later. With batch, the call is synchronous and easier to count. This is a billing-system problem, not a product decision, but it can influence what ships.
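The caching case is the easiest of the three to sketch. This is a minimal illustration of the "detect the cache hit and switch to a fast path" option, not how any particular platform does it. `answer`, `normalize`, and `fake_model` are hypothetical names, and the in-memory dict stands in for whatever cache a real system would use.

```python
from typing import Callable, Dict, Iterable, Iterator


def normalize(question: str) -> str:
    """Treat questions that differ only in case or spacing as identical."""
    return " ".join(question.lower().split())


def answer(
    question: str,
    cache: Dict[str, str],
    generate_tokens: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Yield one chunk on a cache hit, otherwise stream tokens and cache the result."""
    key = normalize(question)
    cached = cache.get(key)
    if cached is not None:
        yield cached  # fast path: the whole answer at once, no fake streaming
        return
    parts = []
    for token in generate_tokens(question):
        parts.append(token)
        yield token   # slow path: forward each token as it arrives
    # Only reached if the stream was consumed to the end, so partial answers are never cached.
    cache[key] = "".join(parts)


# Stand-in for a real model call, for demonstration only.
model_calls = []

def fake_model(question: str) -> Iterator[str]:
    model_calls.append(question)
    yield from ["Annual ", "billing ", "is available."]


cache: Dict[str, str] = {}
first = list(answer("Do you offer annual billing?", cache, fake_model))
second = list(answer("  do you offer ANNUAL billing? ", cache, fake_model))
```

The first call streams three chunks from the model. The second is served from the cache as a single chunk, and the model is never called again.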
If you're evaluating a chatbot platform and wondering whether it streams: open the chat widget, ask a medium-complexity question, and watch the response area. If the first word appears within 2 seconds and the text builds character by character, it's streaming. If nothing happens for 4+ seconds and then a full paragraph appears — it's batching.
Mobile connections introduce latency that desktop tests hide. A test from your laptop on Wi-Fi might show 800ms to first token. The same request from a 4G phone mid-download shows 3–4 seconds. If you're optimizing for mobile visitors (and most SaaS marketing sites are majority mobile), you need to test from a mobile device — or at least throttle your browser's network in devtools to simulate a 3G or slow-4G connection.
Lazy-loading the chat widget matters here too. If your chatbot widget blocks or delays the page load, visitors with slower connections are penalized twice: once on the page load, once on the first response. saavos's embed script loads asynchronously and only initializes after the main page content is interactive. If your platform already handles this, there is nothing for you to configure, but if your Lighthouse score drops after adding a chat widget, it's worth checking.
If you're on a platform that streams: not much to tune. Keep your system prompt concise. The model has to process the entire prompt before it can emit its first token, so a longer prompt delays every response, adding roughly 200–500ms. Past about 500 words, the delay from the system prompt becomes measurable.
If you're on a platform that batches: check whether there's a streaming option in the settings. Some platforms ship streaming as a beta toggle. If there's no option and response times are consistently above 8 seconds in real usage — not localhost testing — it's worth switching platforms. The UX impact of 8+ seconds to first response outweighs most other chatbot features.
Test your own setup right now: open your chatbot on a mobile device (or simulate one in browser devtools, throttled to Slow 4G). Ask a 20-word question. Time from send to first visible character. Under 3 seconds: fine. Over 8 seconds: you're losing users who never tell you why.
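If you have programmatic access to the response stream, the same stopwatch test can be scripted. This is a sketch under assumptions: `time_to_first_text` is a hypothetical helper, and `chunks` can be any iterable of text pieces from whatever client you use. The demo at the bottom uses a scripted clock so the numbers are deterministic. In real use you would leave the default `time.monotonic`.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_text(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], float]:
    """Return (seconds until first non-blank chunk, seconds until the stream ends)."""
    start = clock()
    first: Optional[float] = None
    for chunk in chunks:
        # Ignore empty or whitespace-only chunks: the user can't see those.
        if first is None and chunk.strip():
            first = clock() - start
    return first, clock() - start


# Demo with a scripted clock: start at 0.0 s, first text at 1.2 s, done at 6.0 s.
_ticks = iter([0.0, 1.2, 6.0])
first_s, total_s = time_to_first_text(
    ["", "Based", " on your pricing page"],
    clock=lambda: next(_ticks),
)
```

The first number is the one to watch. A batch response arrives as a single chunk, so its first and total times come out nearly identical.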
Response time doesn't get the same blog-post energy as "AI accuracy" or "chatbot pricing." But in terms of whether visitors actually get value from your bot before they close it, latency is the variable that governs everything else. A 90% accurate bot that answers in 12 seconds serves fewer users than an 85% accurate bot that starts streaming in 1.5 seconds. The visitor who leaves at second 8 never sees the accuracy.
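Here is that comparison as back-of-envelope arithmetic. The stay rates are invented for illustration (50% of visitors waiting out a 12-second batch response, 95% staying for a 1.5-second stream). They are not measurements, and the conclusion depends entirely on how many people actually leave.

```python
def useful_answers(visitors: int, stay_rate: float, accuracy: float) -> float:
    """Visitors who both wait for the answer and get a correct one."""
    return visitors * stay_rate * accuracy


# Hypothetical stay rates, chosen only to illustrate the trade-off.
slow_accurate = useful_answers(1000, stay_rate=0.50, accuracy=0.90)  # about 450
fast_rougher = useful_answers(1000, stay_rate=0.95, accuracy=0.85)   # about 807

# Stay rate the slow bot would need just to tie the fast one (about 0.90).
break_even_stay_rate = 0.95 * 0.85 / 0.90
```

Under these assumed numbers the slower, more accurate bot would need to keep roughly 90% of its visitors waiting through 12 seconds just to break even.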
If you want to check where saavos falls on the latency curve: the free tier is live, no card required, and you can run a real test on your own site in about five minutes.
An AI chatbot should show its first visible text within 2–3 seconds of submission. Under 3 seconds feels conversational and responsive. Between 3 and 8 seconds, users start to disengage. Over 8 seconds with no visible activity, most users assume the chatbot is broken and close it. The most important metric is time to first token — the gap before anything appears — not total response time.
Streaming means the chatbot sends words to the user as the AI model generates them, rather than waiting for the full response to complete before displaying anything. A streaming chatbot usually shows its first word in under a second. A batch-mode chatbot can show nothing for 6–10 seconds, then delivers the entire answer at once. Both take roughly the same time to finish, but streaming feels dramatically faster because users see progress immediately instead of staring at a blank chat bubble.
The most common causes of a slow chatbot: (1) the platform uses batch delivery instead of streaming, so you see nothing until the model finishes generating; (2) cold API starts, where the server spins up fresh for each request and adds 2–4 seconds before the model even starts; (3) a very long system prompt that the model has to process before generation begins; (4) mobile network latency adding 2–3 seconds to what looks fine on Wi-Fi. Test from a mobile device on a real network to see what your visitors experience, not just from your laptop on broadband.
Yes, slow responses cost engagement. Users who see no response within 8 seconds typically close the chat and move on, so they never see the answer, regardless of its quality. Visitors on mobile connections are disproportionately affected because mobile networks add 1–3 seconds of latency on top of model generation time. A chatbot with 85% accuracy that streams answers in 1.5 seconds serves more visitors than a 90% accurate chatbot that batches answers in 12 seconds, because fewer users wait long enough to read the better answer.
Open your chatbot widget on a real mobile device (not desktop) and submit a typical question. Time from when you tap Send to when the first text appears. Under 3 seconds: good. 3–8 seconds: borderline — check whether your platform supports streaming. Over 8 seconds: investigate immediately. You can also simulate a mobile connection in Chrome devtools: open devtools, Network tab, set throttle to Slow 4G, then test. This catches the gap between your fast localhost testing and what real visitors see.