Generating Grounded Answers with RAG
The final step in RAG: inject retrieved context into the prompt and instruct Claude to answer only from that context.
Basic RAG Answer Generation
import anthropic
client = anthropic.Anthropic()
def rag_answer(query: str, retrieved_chunks: list[dict]) -> str:
"""Generate an answer grounded in retrieved context."""
# Format context
context_parts = []
for i, chunk in enumerate(retrieved_chunks, 1):
context_parts.append(
f"[Source {i}: {chunk['title']}]\n{chunk['content']}"
)
context = "\n\n---\n\n".join(context_parts)
system_prompt = """You are a precise question-answering assistant.
Answer the user's question using ONLY the provided context.
If the answer is not in the context, say "I don't have enough information to answer that."
Always cite which source(s) you used in your answer."""
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
system=system_prompt,
messages=[{
"role": "user",
"content": f"""Context:
{context}
Question: {query}"""
}],
)
return response.content[0].text
# Complete RAG pipeline
def ask(question: str) -> str:
chunks = retrieve(question, n_results=4)
return rag_answer(question, chunks)
answer = ask("What is the difference between claude-haiku and claude-sonnet?")
print(answer)
Streaming RAG Responses
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
async function* streamRagAnswer(query: string, chunks: Array<{content: string; title: string}>) {
const context = chunks
.map((c, i) => `[Source ${i + 1}: ${c.title}]\n${c.content}`)
.join("\n\n---\n\n");
const stream = await client.messages.stream({
model: "claude-sonnet-4-5",
max_tokens: 1024,
system:
"Answer using only the provided context. Cite sources. If unknown, say so.",
messages: [
{
role: "user",
content: `Context:\n${context}\n\nQuestion: ${query}`,
},
],
});
for await (const chunk of stream) {
if (
chunk.type === "content_block_delta" &&
chunk.delta.type === "text_delta"
) {
yield chunk.delta.text;
}
}
}
// Usage
for await (const token of streamRagAnswer("What is tool use?", retrievedChunks)) {
process.stdout.write(token);
}
Preventing Hallucination
The most important instruction in a RAG system is telling Claude what to do when context is insufficient:
ANTI_HALLUCINATION_PROMPT = """
Rules for answering:
1. Answer ONLY from the provided context. Do not use prior knowledge.
2. If the context does not contain the answer, respond with:
"The provided documents do not contain enough information to answer this question."
3. Quote directly from sources when possible.
4. Cite source numbers at the end of every factual claim: [Source 1], [Source 2].
"""
Explicit grounding instructions are what separate a trustworthy RAG system from one that confidently makes things up.