Three letters show up everywhere in AI conversations (in product demos, vendor pitches, and the sentence "oh, we'll just use RAG for that"), and almost nobody stops to say what they mean. If you've been nodding along, this one's for you.
RAG matters to working professionals more than most jargon, because it's the answer to a frustration you've probably already had: why doesn't this AI tool know anything about my company, my documents, my actual work? RAG is how you fix that — and understanding it changes what you think these tools can do for you.
This piece is part of our Terminology Tamer series, alongside our guides to large language models and AI agents. By the end, you'll be able to define RAG in one sentence, picture how it works, and recognise it in the tools you already use.
The one-sentence answer
RAG, short for retrieval-augmented generation, is a technique that lets an AI tool look things up in a specific set of documents before it answers, so its response is grounded in that source material rather than only in its training data.
Read that again with the frustration in mind. A plain language model only knows what it was trained on — a general snapshot of the internet, frozen at some point, with nothing about your business in it. RAG bolts on a step: first go and find the relevant material, then write the answer using it.
The three words even spell out the recipe. Retrieval — find the relevant documents. Augmented — add them to the question. Generation — write the answer from that combined material.
The open-book exam
Here's the mental model that makes it stick.
A plain language model answering from training alone is sitting a closed-book exam. It's relying entirely on what it happened to memorise. For general knowledge that's often fine — but ask about your company's refund policy and it has nothing to go on, so it either admits that or, worse, makes something up.
RAG turns it into an open-book exam. Before answering, the tool is handed the exact pages it needs — your policy document, your product manual, last quarter's report — and told "answer using these." It's still the same capable writer. It just isn't working from memory any more. It's working from the source in front of it.
That's the whole idea. And it's why RAG has become one of the most common ways companies make AI genuinely useful on their own information.
🧠 Quick Challenge: Your team wants an AI assistant that can answer staff questions about your 200-page internal HR handbook, accurately and with references. Based on what you've read, what's the best fit?
- A) A plain chatbot, asked the questions directly
- B) A RAG setup that retrieves the relevant handbook sections before answering
- C) Asking staff to read the handbook themselves
Answer: B) A RAG setup. A plain chatbot has never seen your handbook and would guess or hallucinate. RAG retrieves the specific sections that match each question and grounds the answer in them — which is exactly the "open-book exam" we just described, and why it can cite real references.
How RAG actually works
You don't need the engineering, but the shape is worth seeing — it's three steps.
- Retrieve. When you ask a question, the system searches a collection of documents — your files, a knowledge base, a website — and pulls out the passages most relevant to your question.
- Augment. Those passages get added to your question behind the scenes, so the model receives both "here's what was asked" and "here's the relevant source material."
- Generate. The model writes its answer using that supplied material, ideally pointing back to which source each part came from.
The payoff is in step three: because the answer is built from retrieved sources rather than memory, it can be more current, more specific to you, and far easier to trust — you can check it against the source it cited.
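If you're curious what those three steps look like in code, here's a deliberately tiny sketch. Everything in it is an assumption made for illustration: the three-document "knowledge base", the function names, and the word-overlap search are all invented, and the "model" is a stand-in you pass in yourself. Real products usually use much smarter search (often based on embeddings) and a real language model, but the shape is the same.

```python
import re

# Hypothetical mini knowledge base; names and text are invented for illustration.
DOCUMENTS = {
    "refund-policy": "Customers can request a refund within 30 days of purchase.",
    "shipping-guide": "Standard shipping takes five business days.",
    "hr-handbook": "Staff accrue annual leave monthly and may carry over five days.",
}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Step 1 (Retrieve): rank documents by words shared with the question."""
    question_words = tokenize(question)
    scored = []
    for name, text in documents.items():
        overlap = len(question_words & tokenize(text))
        if overlap > 0:  # skip documents with nothing in common
            scored.append((overlap, name, text))
    # Highest overlap first; ties broken alphabetically by document name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, text) for _, name, text in scored[:top_k]]


def augment(question, passages):
    """Step 2 (Augment): bundle the question and the passages into one prompt."""
    sources = "\n".join(f"[{name}] {text}" for name, text in passages)
    return (
        "Answer using only the sources below and cite the source name.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


def generate(prompt, model):
    """Step 3 (Generate): hand the augmented prompt to a model.

    `model` is any function that takes a prompt string and returns text.
    A real system would call a language model here.
    """
    return model(prompt)


if __name__ == "__main__":
    question = "How do I get a refund after a purchase?"
    passages = retrieve(question, DOCUMENTS)
    prompt = augment(question, passages)
    # Stand-in model so the script runs without any external service.
    print(generate(prompt, lambda p: "(a real model's answer would appear here)"))
    print()
    print(prompt)
```

Notice that the model itself never changes. The only thing RAG alters is what lands in the prompt, which is why the quality of the retrieval step matters so much: if the wrong passage is pulled out, the model is working from the wrong page of the open book.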