RAG vs fine-tuning: how we pick for client builds
The decision tree we use when a product needs an LLM: when retrieval wins, when the weights need to change, and what each path costs in time and money.
Most products don’t need a custom model on day one. They need correct answers from their data with an honest latency and cost envelope.
When RAG is enough
- Your knowledge changes often (docs, policies, tickets, Notion).
- You can tolerate occasional retrieval misses that you fix with better chunking, metadata, and evals; see the chunking sketch after this list.
- You want to ship in weeks, not retrain in a loop.
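When a miss traces back to bad chunk boundaries, the fix is usually in the splitter, not the model. Here's a minimal sketch of heading-aware chunking that carries metadata along with each chunk; the names (`Chunk`, `chunk_doc`) and fields are ours for illustration, not from any library:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str   # e.g. "handbook/pto-policy.md"
    section: str  # nearest heading, usable as a metadata filter at query time
    updated: str  # ISO date, lets retrieval prefer fresher policy text

def chunk_doc(text: str, source: str, updated: str, max_chars: int = 800) -> list[Chunk]:
    """Split on markdown headings first, then cap chunk size, keeping section metadata."""
    chunks: list[Chunk] = []
    section = "intro"
    buf: list[str] = []

    def flush():
        body = "\n".join(buf).strip()
        if body:
            # Hard-cap length so one chunk never dominates the context window.
            for i in range(0, len(body), max_chars):
                chunks.append(Chunk(body[i:i + max_chars], source, section, updated))
        buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):  # markdown heading = new section
            flush()
            section = line.lstrip("# ").strip()
        else:
            buf.append(line)
    flush()
    return chunks
```

Keeping `section` and `updated` on every chunk is what makes the later fixes cheap: you can filter retrieval by section or prefer fresher text without re-embedding everything.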
RAG keeps the model generic; the “memory” lives in a store you control. For many B2B tools, that’s the right default.
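To make “a store you control” concrete, here's a toy in-memory index. `embed` is a hash-based stand-in for a real embedding endpoint (deterministic within one process, not semantically meaningful), and `Store` is deliberately naive rather than a production vector DB:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: pseudo-embedding so the sketch runs offline.
    # Swap in your real embedding endpoint here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

class Store:
    def __init__(self):
        self.rows: list[tuple[str, dict, np.ndarray]] = []

    def add(self, text: str, meta: dict):
        self.rows.append((text, meta, embed(text)))

    def search(self, query: str, k: int = 3, **filters) -> list[str]:
        q = embed(query)
        hits = [
            (float(v @ q), text)  # cosine similarity (vectors are unit-norm)
            for text, meta, v in self.rows
            if all(meta.get(key) == val for key, val in filters.items())
        ]
        return [text for _, text in sorted(hits, reverse=True)[:k]]

store = Store()
store.add("PTO accrues at 1.5 days/month.", {"source": "handbook"})
context = store.search("How fast does PTO accrue?", k=1, source="handbook")
# The prompt becomes: system rules + retrieved context + the user question.
```

The point is operational: rows are plain data you can inspect, re-chunk, or delete when a policy changes, with no retraining step in the loop.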
When fine-tuning (or domain adaptation) enters the picture
- The model must consistently follow a strict style or schema (e.g. legal phrasing, internal classification taxonomies) that few-shot prompts can’t stabilize; see the dataset sketch after this list.
- You have a stable, reviewable dataset and a process to refresh it.
- You’ve already tightened the evals, and RAG is still the wrong shape for the product.
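When we do go this route, the “stable, reviewable dataset” is usually just labeled examples in chat-style JSONL, one per line, so domain experts can read and diff it. A sketch using OpenAI-style fine-tuning records; the taxonomy and tickets here are invented for illustration:

```python
import json

SYSTEM = "Classify the ticket into exactly one of: billing, access, outage, other."

# Invented examples; in practice these come from reviewed production traffic.
examples = [
    ("I was charged twice this month.", "billing"),
    ("Can't log in after the SSO change.", "access"),
]

with open("train.jsonl", "w") as f:
    for ticket, label in examples:
        f.write(json.dumps({
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": ticket},
                # The assistant turn is the exact output you want, every time.
                {"role": "assistant", "content": label},
            ]
        }) + "\n")
```

Because each line is self-contained, refreshing the dataset is a review process, not an engineering project.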
How we actually decide
We prototype with RAG plus a strong eval set built from the client’s real questions, then measure. If the failure mode is “doesn’t know our edge cases,” we improve data and retrieval. If the failure mode is “can’t behave like this every time,” we talk fine-tuning or small specialist models.
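A minimal version of that harness, assuming a hypothetical `answer()` wrapper around the RAG pipeline; grading by required substrings is crude but cheap, and the failure *category* is what drives the decision:

```python
from collections import Counter

# (question, substrings the answer must contain, failure category)
# Questions and expected strings are invented for illustration.
EVAL_SET = [
    ("What is the refund window for annual plans?", ["30 days"], "edge-case"),
    ('Reply with JSON only: {"intent": ...}', ["{"], "format"),
]

def answer(question: str) -> str:
    # Stub: replace with a call into your actual RAG pipeline.
    return ""

def run_evals() -> Counter:
    failures: Counter = Counter()
    for question, must_contain, category in EVAL_SET:
        out = answer(question)
        if not all(s in out for s in must_contain):
            failures[category] += 1
    return failures

print(run_evals())  # e.g. Counter({'edge-case': 1, 'format': 1})
```

If “edge-case” failures dominate, we fix data and retrieval; if “format” failures persist after prompt work, that’s when the fine-tuning conversation starts.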
No dogma - just evidence from your traffic and your risk tolerance.