When building search tools, intelligent assistants, or AI-driven Q&A systems, one of the most foundational decisions you'll make is how to retrieve relevant content. Historically, most systems have used keyword-based search, which is great for basic use cases but easily confused by natural language and synonyms.
That’s where embedding-based retrieval comes in.
In this guide, I’ll break down:
- The difference between keyword and embedding-based retrieval
- Real-world pros and cons
- A step-by-step implementation using OpenAI and Pinecone
- An alternative local setup using Chroma
Keyword Search vs. Embedding Search
Keyword-Based Retrieval
How it works:
Searches for exact matches between your query and stored content. Works best when both use the same words.
Example:
Query: "What is vector search?"
Returns documents containing the exact phrase "vector search".
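To make that concrete, here's a toy sketch of exact-phrase matching in Python (real keyword engines like Elasticsearch add tokenization, stemming, and relevance scoring such as BM25; this is just the core idea):
docs = [
    "An introduction to vector search and embeddings.",
    "Meaning-based search compares semantic similarity.",
]

def keyword_search(query, docs):
    # Return only documents containing the query as an exact (case-insensitive) phrase
    q = query.lower()
    return [d for d in docs if q in d.lower()]

print(keyword_search("vector search", docs))       # matches the first doc
print(keyword_search("semantic retrieval", docs))  # matches nothing, despite related meaning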
Pros:
- Very fast and low-resource
- Easy to explain why a match was returned
- Great for structured and exact-match data
Cons:
- Doesn’t understand synonyms or phrasing differences
- Fails if the words aren’t an exact match
Embedding-Based Retrieval (Semantic Search)
How it works:
Both queries and documents are converted into dense vectors using machine learning models (like OpenAI's text-embedding-ada-002). The system compares their semantic similarity, not just their words.
Example:
Query: "How does semantic search work?"
Returns docs about “meaning-based search” even if the words are different.
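The "semantic similarity" here is usually cosine similarity between the query vector and each document vector. A minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):
import math

def cosine_similarity(a, b):
    # Cosine of the angle between vectors: 1.0 = same direction, 0.0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

query_vec = [0.1, 0.9, 0.2]    # toy embedding of the query
doc_vec = [0.15, 0.85, 0.25]   # toy embedding of a document with similar meaning
print(cosine_similarity(query_vec, doc_vec))  # close to 1.0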
Pros:
- Understands intent, not just keywords
- Great for unstructured content and natural queries
- Can surface more relevant results even if phrasing is varied
Cons:
- More computationally intensive
- Results are harder to explain (based on vector math)
- Requires pre-trained models and a vector database
Feature Comparison Table
| Feature | Keyword-Based Retrieval | Embedding-Based Retrieval |
|---|---|---|
| Search Logic | Matches words exactly | Matches by meaning |
| Flexibility | Low | High |
| Speed | Fast | Slower |
| Resource Use | Low | Higher |
| Explainability | High | Low |
| Best For | Structured search | Chatbots, recommendation, unstructured data |
| Common Tools | Elasticsearch, Solr | Pinecone, Chroma, FAISS |
Setting Up Embedding-Based Retrieval
Let’s build a basic semantic search system using:
- OpenAI (text-embedding-ada-002)
- Pinecone (hosted vector DB)
- Chroma (optional local alternative)
1. Choose Your Tools
Embedding model:
OpenAI's text-embedding-ada-002 or a local Hugging Face model (see the sketch at the end of this step).
Vector database:
Cloud: Pinecone (scalable, managed)
Local: Chroma (open-source, lightweight)
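If you'd rather run the embedding model locally instead of calling a hosted API, sentence-transformers is a common choice. A minimal sketch (all-MiniLM-L6-v2 is just an illustrative pick; note that it produces 384-dimensional vectors, so your index dimension would need to match):
from sentence_transformers import SentenceTransformer

# Downloads the model on first run, then embeds entirely on your machine
model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("How does semantic search work?")
print(len(embedding))  # 384, not 1536 like ada-002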
2. Install Required Libraries
pip install openai pinecone chromadb
Note: the Pinecone SDK is now published as pinecone; older tutorials reference the deprecated pinecone-client package.
3. Set API Keys
export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
In Python (the openai v1+ client reads OPENAI_API_KEY from the environment automatically):
from openai import OpenAI

client = OpenAI()  # or OpenAI(api_key="your-openai-key")
4. Generate Embeddings
def get_embedding(text):
    # ada-002 embeddings have 1536 dimensions
    response = client.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding
documents = [
    {"id": "1", "text": "This is an introduction to embedding-based search."},
    {"id": "2", "text": "Embedding-based retrieval finds similar meanings."},
]

for doc in documents:
    doc["embedding"] = get_embedding(doc["text"])
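For more than a handful of documents, it's more efficient to embed in batches; the embeddings endpoint accepts a list of strings in a single call:
# One API call for the whole batch; results come back in input order
response = client.embeddings.create(
    input=[doc["text"] for doc in documents],
    model="text-embedding-ada-002",
)
for doc, item in zip(documents, response.data):
    doc["embedding"] = item.embedding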
5. Store in Pinecone
This uses the current Pinecone SDK (older tutorials use the retired pinecone.init pattern):
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-pinecone-key")
index_name = "embeddings-index"
# Dimension must match the embedding model (1536 for text-embedding-ada-002)
pc.create_index(
    name=index_name,
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index(index_name)
# Each vector is (id, values, metadata); the metadata lets us show the text in results
to_upsert = [(doc["id"], doc["embedding"], {"text": doc["text"]}) for doc in documents]
index.upsert(vectors=to_upsert)
6. Perform a Semantic Search
query = "How does semantic search work?"
query_embedding = get_embedding(query)
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)
for match in results["matches"]:
    print(f"ID: {match['id']} | Score: {match['score']}")
    print(f"Text: {match['metadata']['text']}\n")
Optional: Use Chroma for Local Embedding Search
import chromadb

chroma_client = chromadb.Client()  # in-memory; use chromadb.PersistentClient for on-disk storage
collection = chroma_client.create_collection("documents")

# Reuse the OpenAI embeddings computed earlier instead of Chroma's default embedder
for doc in documents:
    collection.add(
        documents=[doc["text"]],
        embeddings=[doc["embedding"]],
        ids=[doc["id"]]
    )

# Query with an OpenAI embedding; query_texts would fall back to Chroma's default
# embedding model, whose vectors don't match the ones stored above
query_result = collection.query(
    query_embeddings=[get_embedding("How does embedding retrieval work?")],
    n_results=2,
)
print(query_result)
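The raw result is a dict of parallel lists (one inner list per query), and Chroma reports distances, where lower means more similar, rather than Pinecone-style similarity scores. To print it in a similar shape:
for doc_id, text, dist in zip(
    query_result["ids"][0],
    query_result["documents"][0],
    query_result["distances"][0],
):
    print(f"ID: {doc_id} | Distance: {dist:.4f} | Text: {text}")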
Evaluate the Results
Once you’re set up:
- Check result relevance
- Tune your top_k or switch models if needed
- Add keyword filtering for hybrid search (see the sketch below)
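One simple hybrid approach is a weighted blend of the two scores. A minimal sketch, assuming you've already normalized a keyword score (e.g. BM25) and a vector similarity score to [0, 1]; the alpha weight is a tuning knob, not a standard value:
def hybrid_score(keyword_score, vector_score, alpha=0.5):
    # alpha=1.0 means pure keyword ranking; alpha=0.0 means pure semantic ranking
    return alpha * keyword_score + (1 - alpha) * vector_score

# Hypothetical candidates with pre-computed, normalized scores
candidates = [
    {"id": "1", "keyword_score": 0.9, "vector_score": 0.4},
    {"id": "2", "keyword_score": 0.2, "vector_score": 0.95},
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["keyword_score"], c["vector_score"]),
    reverse=True,
)
print([c["id"] for c in ranked])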
You now have a foundation for building:
- Intelligent assistants
- Internal knowledge base search
- Chatbots that retrieve based on meaning
What’s Next?
You can scale this up to thousands or millions of documents. Consider:
- Crawling blogs, docs, or Notion pages
- Combining embeddings with metadata filters (see the sketch below)
- Using hybrid keyword + embedding pipelines for speed and precision
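As a taste of metadata filtering, Pinecone's query call accepts a filter alongside the vector; here's a sketch assuming you had stored a hypothetical source field in each vector's metadata:
# Semantic search restricted to vectors whose metadata matches the filter
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"source": {"$eq": "blog"}},  # "source" is a hypothetical metadata field
    include_metadata=True,
)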