Embeddings and Vector Databases
Retrieval-Augmented Generation (RAG) allows Claude to answer questions grounded in your own documents — without fine-tuning. The core components are embeddings (numerical representations of text) and a vector database (optimized for similarity search).
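At a high level the pipeline has four steps; the rest of this section implements each one (the outline below is only a roadmap, with names matching the code that follows):

# 1. embed(...)               - turn each document chunk into a vector
# 2. collection.add(...)      - store chunks and their vectors in a vector database
# 3. collection.query(...)    - embed the question and retrieve the closest chunks
# 4. client.messages.create() - ask Claude to answer using only the retrieved chunks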
What Are Embeddings?
An embedding converts a piece of text into a vector of floating-point numbers that encodes semantic meaning. Texts with similar meanings produce vectors that are close together in high-dimensional space.
import numpy as np
import voyageai

# Anthropic does not provide its own embeddings endpoint; it recommends Voyage AI.
# Requires VOYAGE_API_KEY in the environment.
vo = voyageai.Client()

def embed(text: str) -> list[float]:
    """Generate an embedding with a Voyage model (voyage-3 or voyage-3-lite)."""
    result = vo.embed([text], model="voyage-3")
    return result.embeddings[0]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embeddings: 1.0 means identical direction, near 0.0 means unrelated."""
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))
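A quick sanity check (the sample sentences are illustrative): semantically related texts should score noticeably higher than unrelated ones.

# Related texts should land close together; an unrelated one should score lower.
v_tools = embed("Claude can call external tools during a conversation.")
v_funcs = embed("Tool use lets a model invoke functions and APIs.")
v_soup  = embed("Add the onions and simmer for ten minutes.")

print(cosine_similarity(v_tools, v_funcs))  # expect a relatively high score
print(cosine_similarity(v_tools, v_soup))   # expect a noticeably lower score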
ChromaDB: Local Vector Store
ChromaDB is a developer-friendly vector database that runs locally — ideal for development and small-to-medium datasets.
import chromadb
from chromadb.utils import embedding_functions

# Use a real embedding function (runs locally via sentence-transformers)
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

chroma_client = chromadb.Client()
collection = chroma_client.create_collection(
    name="knowledge_base",
    embedding_function=embedding_fn,
)

# Add documents
documents = [
    "Claude is Anthropic's AI assistant, known for being helpful, harmless, and honest.",
    "The Anthropic API uses the Messages format with roles: user and assistant.",
    "Tool use allows Claude to call external functions and APIs during a conversation.",
    "RAG stands for Retrieval-Augmented Generation — it grounds LLM answers in documents.",
]
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Query
results = collection.query(
    query_texts=["How does Claude use external tools?"],
    n_results=2,
)
print(results["documents"][0])
# ['Tool use allows Claude to call external functions...', 'Claude is Anthropic\'s AI assistant...']
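To complete the RAG loop, pass the retrieved chunks to Claude as context. A minimal sketch: the prompt wording is illustrative, and you can substitute whichever Claude model you use.

import anthropic

client = anthropic.Anthropic()

question = "How does Claude use external tools?"
retrieved = collection.query(query_texts=[question], n_results=2)
context = "\n\n".join(retrieved["documents"][0])

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.content[0].text)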
Pinecone: Production Vector Store
For production workloads with millions of vectors, a managed service such as Pinecone handles indexing, scaling, and persistence:
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
# The index must already exist, with a dimension matching the embedding model (1024 for voyage-3).
index = pc.Index("knowledge-base")

# Upsert vectors (embeddings must be pre-generated)
index.upsert(vectors=[
    {"id": "doc_0", "values": embed("Claude is Anthropic's AI..."), "metadata": {"text": "Claude is..."}},
    {"id": "doc_1", "values": embed("Tool use allows Claude..."), "metadata": {"text": "Tool use..."}},
])

# Query
query_vector = embed("How do tools work in Claude?")
results = index.query(vector=query_vector, top_k=3, include_metadata=True)

for match in results["matches"]:
    print(f"Score: {match['score']:.3f} | {match['metadata']['text'][:80]}")
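Pinecone can also filter on metadata at query time, which keeps retrieval scoped to a subset of documents. A small sketch, assuming each vector was upserted with a hypothetical "source" metadata field:

# Restrict the search to vectors tagged with a particular source (field name is illustrative).
filtered = index.query(
    vector=query_vector,
    top_k=3,
    filter={"source": {"$eq": "product-docs"}},
    include_metadata=True,
)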
Vector databases are optimized for approximate nearest neighbor search — finding the most semantically relevant documents in milliseconds even at scale.