Stage 3 – Going Beyond Keyword Search

When building search tools, intelligent assistants, or AI-driven Q&A systems, one of the most foundational decisions you’ll make is how to retrieve relevant content. Historically, most systems have used keyword-based search, which is great for basic use cases but easily confused by natural phrasing and synonyms.

That’s where embedding-based retrieval comes in.

In this guide, I’ll break down:

  • The difference between keyword and embedding-based retrieval
  • Real-world pros and cons
  • A step-by-step implementation using OpenAI and Pinecone
  • An alternative local setup using Chroma

Keyword Search vs. Embedding Search

Keyword-Based Retrieval

How it works:
Searches for exact matches between your query and stored content. Works best when both use the same words.

Example:
Query: "What is vector search?"
Returns docs with the exact phrase "vector search".
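
Under the hood, this can be as simple as token or substring matching. Here’s a toy sketch of the idea (illustrative only; real engines like Elasticsearch build inverted indexes rather than scanning strings):

documents = [
    "An introduction to vector search.",
    "Meaning-based retrieval compares embeddings.",
]

query = "vector search"

# Naive keyword match: keep documents containing the exact phrase.
matches = [doc for doc in documents if query.lower() in doc.lower()]
print(matches)  # only the first document matches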

Pros:

  • Very fast and low-resource
  • Easy to explain why a match was returned
  • Great for structured and exact-match data

Cons:

  • Doesn’t understand synonyms or phrasing differences
  • Fails if the words aren’t an exact match

Embedding-Based Retrieval (Semantic Search)

How it works:
Both queries and documents are converted into dense vectors using machine learning models (like OpenAI’s text-embedding-ada-002). The system compares their semantic similarity, not just their words.

Example:
Query: "How does semantic search work?"
Returns docs about “meaning-based search” even if the words are different.
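
Under the hood, “semantic similarity” is usually cosine similarity between vectors. A minimal sketch with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of a query and two documents.
query_vec = [0.9, 0.1, 0.3]
doc_a = [0.8, 0.2, 0.4]  # similar direction -> high similarity
doc_b = [0.1, 0.9, 0.1]  # different direction -> low similarity

print(cosine_similarity(query_vec, doc_a))  # ~0.98
print(cosine_similarity(query_vec, doc_b))  # ~0.24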

Pros:

  • Understands intent, not just keywords
  • Great for unstructured content and natural queries
  • Can surface more relevant results even if phrasing is varied

Cons:

  • More computationally intensive
  • Results are harder to explain (based on vector math)
  • Requires pre-trained models and a vector database

Feature Comparison Table

Feature        | Keyword-Based Retrieval | Embedding-Based Retrieval
---------------|-------------------------|--------------------------------------------
Search Logic   | Matches words exactly   | Matches by meaning
Flexibility    | Low                     | High
Speed          | Fast                    | Slower
Resource Use   | Low                     | Higher
Explainability | High                    | Low
Best For       | Structured search       | Chatbots, recommendation, unstructured data
Common Tools   | Elasticsearch, Solr     | Pinecone, Chroma, FAISS

Setting Up Embedding-Based Retrieval

Let’s build a basic semantic search system using:

  • OpenAI (text-embedding-ada-002)
  • Pinecone (hosted vector DB)
  • Chroma (optional local alternative)

1. Choose Your Tools

Embedding model:
OpenAI’s text-embedding-ada-002 or a local Hugging Face model.
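
If you go the local route, the sentence-transformers library is a common way to run a Hugging Face model. A quick sketch (note that this particular model produces 384-dimensional vectors, not the 1536 of text-embedding-ada-002, so your index dimension must match):

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small model that runs fine on CPU

# encode() returns one vector per input string
embeddings = model.encode(["Embedding-based retrieval finds similar meanings."])
print(embeddings.shape)  # (1, 384)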

Vector database:
Cloud: Pinecone (scalable, managed)
Local: Chroma (open-source, lightweight)

2. Install Required Libraries

pip install openai pinecone chromadb

(The pinecone package on PyPI was formerly published as pinecone-client; the snippets below use the current openai and pinecone client APIs.)

3. Set API Keys

export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"

In Python, instantiate the OpenAI client (it reads OPENAI_API_KEY from the environment automatically):

import os
from openai import OpenAI

client = OpenAI()

4. Generate Embeddings

def get_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response.data[0].embedding

documents = [
    {"id": "1", "text": "This is an introduction to embedding-based search."},
    {"id": "2", "text": "Embedding-based retrieval finds similar meanings."},
]

for doc in documents:
    doc['embedding'] = get_embedding(doc["text"])

5. Store in Pinecone

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "embeddings-index"

# Create the index only if it doesn't already exist.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # text-embedding-ada-002 vectors have 1536 dimensions
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)

# Upsert as (id, vector, metadata) tuples; the text goes in metadata for display.
to_upsert = [(doc["id"], doc["embedding"], {"text": doc["text"]}) for doc in documents]
index.upsert(vectors=to_upsert)

6. Perform a Semantic Search

query = "How does semantic search work?"
query_embedding = get_embedding(query)

results = index.query(vector=query_embedding, top_k=5, include_metadata=True)

for match in results.matches:
    print(f"ID: {match.id} | Score: {match.score}")
    print(f"Text: {match.metadata['text']}\n")

Optional: Use Chroma for Local Embedding Search

If you’d rather skip the hosted service, Chroma gives you the same flow in-process:

import chromadb

# Renamed to chroma_client so it doesn't clobber the OpenAI client above.
chroma_client = chromadb.Client()  # in-memory; use chromadb.PersistentClient() to save to disk
collection = chroma_client.create_collection("documents")

# Add everything in one batch call, reusing the OpenAI embeddings from above.
collection.add(
    documents=[doc["text"] for doc in documents],
    embeddings=[doc["embedding"] for doc in documents],
    ids=[doc["id"] for doc in documents],
)

# Embed the query with the same model. (Passing query_texts instead would use
# Chroma's default embedder, whose vectors don't match our stored 1536-dim ones.)
query_result = collection.query(
    query_embeddings=[get_embedding("How does embedding retrieval work?")],
    n_results=2,  # we only stored two documents
)
print(query_result)

Evaluate the Results

Once you’re set up:

  • Check result relevance
  • Tune your top_k or switch models if needed
  • Add keyword or metadata filtering for hybrid search (see the sketch below)
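
On that last point: Pinecone queries accept a metadata filter alongside the query vector. A sketch, assuming you had stored a hypothetical category field in each vector’s metadata:

results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "docs"}},  # "category" is a hypothetical metadata field
)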

You now have a foundation for building:

  • Intelligent assistants
  • Internal knowledge base search
  • Chatbots that retrieve based on meaning

What’s Next?

You can scale this up to thousands or millions of documents. Consider:

  • Crawling blogs, docs, or Notion pages
  • Combining embeddings with filters or metadata
  • Using hybrid keyword + embedding pipelines for speed and precision (a small fusion sketch follows)
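
One common way to merge the two retrieval modes is reciprocal rank fusion (RRF), which combines ranked result lists without having to normalize their scores. A minimal sketch, where keyword_hits and semantic_hits are hypothetical ranked lists of document IDs from the two pipelines:

def rrf_merge(keyword_hits, semantic_hits, k=60):
    # Each document scores 1 / (k + rank) per list it appears in;
    # documents ranked highly by both pipelines float to the top.
    scores = {}
    for ranking in (keyword_hits, semantic_hits):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf_merge(["2", "3"], ["1", "2"]))  # -> ['2', '1', '3']; "2" appears in both lists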
