By Saurav | Founder of saavos | Building in public toward $10k MRR
[!TLDR] "Training ChatGPT on your website" almost always means RAG (retrieval-augmented generation), not fine-tuning. RAG indexes your site, retrieves the relevant chunks at query time, and feeds them to a frontier model — so the bot answers from your content with citations and updates whenever your site does. Custom GPTs are easier but live inside ChatGPT.com. Fine-tuning teaches style, not facts, and is almost never the right tool for product knowledge. For 95% of teams, RAG over a crawl of your site is the answer.
Five different things, depending on who's asking:
When somebody says "I want to train ChatGPT on our website," they almost always mean #3: they want a chatbot that knows their content and lives on their site. The confusion is largely a product of OpenAI's branding, which doesn't separate the model (GPT-4o) from the product (ChatGPT.com) from the deployment pattern (API call, Custom GPT, embedded chatbot).
| Approach | What it does | Setup time | Cost (small site) | Updates when site changes | Lives on your site |
|---|---|---|---|---|---|
| Custom GPT (ChatGPT.com) | System prompt + 20 files | 30 min | $20/mo per ChatGPT Plus user | Manual re-upload | No — only inside ChatGPT |
| Fine-tuning a model | Adjusts model weights | 1–7 days | $50–$500 + per-token | Re-train per change | No — needs hosting |
| RAG chatbot on your site | Retrieves at query time | 5 min – 2 wks | $0–$199/mo | Automatic re-index | Yes — embed widget |
These solve different problems. Custom GPTs are good for personal or internal-team use. Fine-tuning teaches a model how to write. RAG teaches a model what your business knows. They're not really alternatives in the way "Postgres or MySQL" are alternatives — and you can run more than one.
Three reasons, in order of importance.
Updates are free. When your pricing page changes, a RAG chatbot reflects it as soon as the page is re-indexed. Fine-tuned models don't; you re-train. Custom GPTs need a manual file re-upload. For a website that ships changes weekly, the operational overhead of anything other than RAG becomes painful immediately.
Citations are possible. Because RAG retrieves specific chunks, the bot can attach a "source: /pricing" link to every claim. Fine-tuning blends content into the model's weights — there's no longer a per-claim source you can cite. For a public-facing chatbot that earns trust by being verifiable, this matters more than almost anything else.
Hallucinations are reducible, not just hopeable. A well-tuned RAG pipeline with a fallback message ("I don't know — email support@yourbusiness.com") fails visibly when retrieval finds nothing. Fine-tuned models confidently invent things outside their training set, and you'd never know unless you tested every possible question.
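Citations mostly come down to how the prompt is assembled. Here's a minimal Python sketch, assuming retrieval has already returned chunks tagged with their page URL: number each chunk, ask the model to cite by number, then map the numbers in its reply back to URLs. The `Chunk` type, the prompt wording, and the `[n]` citation format are illustrative choices, not how any particular platform does it.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str   # page the chunk came from, e.g. "/pricing"
    text: str  # the retrieved passage


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk and ask the model to answer only from
    those sources, citing them as [n]."""
    sources = "\n\n".join(
        f"[{i}] (source: {c.url})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite every claim with its source number, like [1]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


def cited_urls(answer: str, chunks: list[Chunk]) -> list[str]:
    """Map [n] markers in the model's reply back to page URLs,
    ignoring numbers that don't correspond to a retrieved chunk."""
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [chunks[n - 1].url for n in numbers if 1 <= n <= len(chunks)]
```

The model call itself is left out; `cited_urls` runs on whatever text comes back, and a number that doesn't match a retrieved chunk is dropped rather than turned into a fake link.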
RAG alone does fall short sometimes; it's rare, but real. If your bot needs to imitate a specific writing voice (a customer-service tone unique to your brand) more than it needs current facts, you might add fine-tuning on top of RAG. If it needs multi-step reasoning that your existing pages don't capture (e.g., "given a budget of X, recommend plan Y"), you'll need application logic on top of retrieval. Neither replaces RAG; both layer on top of it.
The mechanics are simpler than the marketing makes them sound. Five steps:
Any platform that ingests a URL works: saavos, Chatbase, Wonderchat, Botsonic. The DIY route (LangChain or LlamaIndex plus a vector DB) takes 2–8 weeks of engineering, and unless retrieval quality is your competitive moat, it's almost never worth the time. Managed platforms in 2026 are production-grade.
The platform crawls your public pages. For sites under 100 pages this finishes in under a minute. The crawler typically respects robots.txt and follows internal links up to a configurable depth.
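For the curious, here's a minimal sketch of that kind of crawl using only Python's standard library: breadth-first over same-host links, skipping anything robots.txt disallows, stopping at a depth limit. The injected `fetch` callable, the `example-bot` user agent, and the default depth of 2 are assumptions for illustration; a production crawler would also handle rate limiting, sitemaps, and HTML-to-text extraction.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], Optional[str]],
          robots_txt: str = "", max_depth: int = 2,
          user_agent: str = "example-bot") -> dict[str, str]:
    """Breadth-first crawl of same-host pages that honors robots.txt.
    `fetch` returns a page's HTML or None; it is passed in so the sketch
    needs no network access. Returns {url: html}."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(start_url).netloc
    pages: dict[str, str] = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        html = fetch(url)
        if html is None:
            continue  # page missing or failed to load
        pages[url] = html
        if depth == max_depth:
            continue  # don't follow links past the depth limit
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```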
The crawled text is split into chunks of roughly 500 tokens, and each chunk is converted into a 1,536-dimensional vector (typically via OpenAI's text-embedding-3-small). For a 50-page site this takes 1–2 minutes. The embeddings are stored in the platform's vector database; you don't see them and you don't manage them.
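Chunking decides what retrieval can find later. Here's a minimal sketch of fixed-size chunking with overlap, assuming whitespace-separated words as a stand-in for tokens (a real pipeline would count with the embedding model's tokenizer) and a 50-word overlap, which is an illustrative value rather than a standard. Each string it returns is what would be sent to the embedding model; the embedding call itself isn't shown.

```python
def chunk_text(text: str, max_tokens: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most `max_tokens` words, repeating
    `overlap` words between neighbouring windows so text near a boundary
    appears in both chunks. Words approximate tokens in this sketch."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```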
This is the one most people skip. If retrieval finds nothing, what should the bot say? "I'm not sure about that — please email us at support@yourbusiness.com" is infinitely better than letting the model improvise. A visible fail beats an invented answer every time, and visitors respect the honesty.
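Here's what that fallback can look like in code: a minimal sketch that scores stored chunk vectors against the query vector with cosine similarity and returns the fallback message when nothing clears a threshold. The toy vectors, `k=3`, and the 0.3 cut-off are assumptions for illustration; score ranges differ between embedding models, so you'd tune the threshold against real visitor questions. `generate` is a hypothetical stand-in for whatever call sends the retrieved text to the model.

```python
import math
from typing import Callable

FALLBACK = ("I'm not sure about that. Please email us at "
            "support@yourbusiness.com.")

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, index: list[tuple[Vector, str]],
             k: int = 3, min_score: float = 0.3) -> list[tuple[float, str]]:
    """Return up to k (score, text) pairs scoring at least min_score,
    best first. An empty list means nothing relevant was found."""
    scored = [(cosine(query, vec), text) for vec, text in index]
    hits = [pair for pair in scored if pair[0] >= min_score]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:k]


def answer_or_fallback(query: Vector, index: list[tuple[Vector, str]],
                       generate: Callable[[list[str]], str]) -> str:
    hits = retrieve(query, index)
    if not hits:
        return FALLBACK  # fail visibly instead of letting the model improvise
    return generate([text for _, text in hits])
```

The key design choice is that the fallback is decided before the model is called, so a question with no relevant content never reaches the model at all.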
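Add a single `<script>` tag before `</body>`. Because it uses `defer`, the browser downloads it without blocking rendering and runs it after the page is parsed, so the widget should have minimal impact on Largest Contentful Paint or Time to Interactive. On saavos the snippet looks like this: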
```html
<script src="https://saavos.com/embed.js" data-bot="your-slug" defer></script>
```
That's the entire training process for 95% of teams. The complexity that used to live in retrieval pipelines now lives inside the platform.
In rough order of how often we see them:
For RAG on a small-to-medium site:
text-embedding-3-small.
For fine-tuning, the math looks different, and worse for most teams: $50–$500 to train, then per-token inference forever, plus your own hosting if you're not on OpenAI's hosted fine-tuning. And every site change means re-training.
For a Custom GPT, you pay $20/mo per user for ChatGPT Plus — but only the people you share the GPT with can use it, and they all need their own ChatGPT account. Useless for public customer support.
For paid API access: no, by default. As of 2026, both OpenAI and Anthropic explicitly do not use API-submitted data to train shared models; this is part of their commercial terms and applies to all paid API usage. Consumer ChatGPT.com conversations (Free and Plus) are different: those can be used for training unless you opt out in settings.
For platforms in between (saavos, Chatbase, Wonderchat, etc.), you're trusting the platform to pass your data through to the underlying API without retaining it for cross-tenant training. Always check the data processing addendum. saavos stores conversation history in your own dedicated Postgres tables, never feeds it back into model training, and keeps each customer's index isolated from every other tenant.
If you want to "train ChatGPT on your website" for actual customer use — a public support bot, a sales assistant, anything visitors will see — go with RAG via a managed platform. Test saavos, Chatbase, and Wonderchat with your real content; pick the one with the best fallback handling and citation UX for your specific audience. The full evaluation usually takes a Saturday afternoon.
Start free on saavos — paste your URL, get a working chatbot in 5 minutes, no credit card required for the forever-free tier. Or see our pricing for paid-tier specifics when you outgrow the free 50/month.
Yes, but the word "train" is misleading. What most teams want is retrieval-augmented generation (RAG): the model itself does not change, but at query time it looks up the relevant chunks of your site and uses them to answer. The reply is grounded in your content with citations. True training (fine-tuning) is a different process that adjusts the model's weights and is rarely the right tool for product knowledge.
Custom GPTs live inside ChatGPT.com: visitors need their own ChatGPT account to use them, and the GPT only updates when you re-upload your files. A RAG chatbot lives on your own site as an embedded widget, anyone can use it without an account, and it re-indexes automatically when your source content changes. For public customer-facing use, the embedded chatbot wins on every dimension.
No. Fine-tuning changes the model's underlying weights to teach it a specific style or pattern (legal writing, customer service tone). It is poor for teaching facts because the model still hallucinates and you cannot cite a source. For factual knowledge about your products, pricing, or policies, RAG is the right tool. Fine-tuning sometimes makes sense as a complement to RAG, never as a replacement.
For a managed RAG platform like saavos, Chatbase, or Wonderchat, $0 to $49 per month covers most small sites — including the model inference cost. Embedding your initial 100-page crawl costs around $0.02 in OpenAI fees, usually bundled into the subscription. Fine-tuning is more expensive: $50–$500 to train plus per-token inference. Custom GPTs are $20/month per user for ChatGPT Plus and only work inside ChatGPT.com.
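The embedding figure is simple multiplication. A quick sketch, assuming text-embedding-3-small's list price of $0.02 per million tokens (check current pricing before relying on it) and a per-page token count you'd estimate from your own site; `embedding_cost_usd` is a hypothetical helper, not part of any SDK.

```python
def embedding_cost_usd(pages: int, tokens_per_page: int,
                       usd_per_million_tokens: float = 0.02) -> float:
    """One-off cost to embed a crawl: total tokens times the per-token price.
    The default price is an assumption; pass the current rate explicitly."""
    total_tokens = pages * tokens_per_page
    return total_tokens * usd_per_million_tokens / 1_000_000
```

At an assumed 2,000 tokens per page, 100 pages come to about $0.004; reaching the ~$0.02 figure takes roughly 10,000 tokens per page, so treat $0.02 as a ceiling for a small site rather than a typical bill.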
For RAG platforms with auto re-crawl, typically within 24 hours of you publishing the change. Some platforms re-index instantly when you click a refresh button. Fine-tuned models do not update at all without re-training, which makes them poor for fast-changing content like pricing or product specs. Always pick a platform that lets you trigger a manual re-index after a major site update.
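Re-indexing doesn't have to mean re-embedding everything. One common approach, sketched here as an assumption rather than a description of how saavos or any other platform does it, is to store a hash of each page's text, re-embed only pages whose hash changed, and delete chunks for pages that vanished from the crawl. `content_hash` and `plan_reindex` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(previous: dict[str, str],
                 current_pages: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare stored hashes with a fresh crawl.
    previous:      url -> hash recorded at the last index run
    current_pages: url -> page text from the new crawl
    Returns (urls to re-chunk and re-embed, urls to drop from the index)."""
    changed = sorted(
        url for url, text in current_pages.items()
        if previous.get(url) != content_hash(text)  # new or edited page
    )
    removed = sorted(set(previous) - set(current_pages))
    return changed, removed
```

A manual "refresh" button would simply run the same comparison on demand instead of on a schedule.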
For paid API access, no: both OpenAI and Anthropic explicitly do not use API-submitted data to train shared models as of 2026. Consumer ChatGPT.com conversations (Free and Plus) are different and can be used for training unless you opt out in settings. Managed platforms like saavos pass your data through the API without retaining it for cross-tenant training; always check each provider's data processing addendum before launch.
Builds tools for solopreneurs and small SaaS teams who don't have an afternoon to spare.
Paste your URL. Train your bot. Drop one script tag. No credit card.