When to Use Vector Databases
Not every application needs a vector database. Use this decision framework to determine if semantic search adds value to your use case.
Use a Vector Database When
1. Keywords fail to capture intent
If users search for "how to reduce customer churn" but your documents say "strategies for improving retention rates," traditional search fails. Vector databases understand these mean the same thing.
Signs you have this problem:
- Users complain search "doesn't find anything"
- Support teams know answers exist but can't find them
- Same information stored with different terminology across departments
2. You're building RAG (Retrieval-Augmented Generation)
RAG systems need to find relevant context to feed into LLMs. Vector databases excel at retrieving semantically relevant chunks from large document collections.
Common RAG applications:
- Chatbots that answer questions from company documents
- AI assistants that reference internal knowledge bases
- Document Q&A systems
3. You need similarity-based recommendations
Finding "similar" items based on content rather than user behavior patterns.
Examples:
- "Find documents similar to this one"
- "Show products related to what the customer described"
- "Suggest articles on topics this user has read about"
4. You have unstructured data at scale
When you have thousands of documents, images, or other unstructured content that needs to be searchable by meaning.
Do NOT Use a Vector Database When
1. Exact matching is sufficient
If users search for order numbers, product SKUs, or specific names, traditional databases work better and are simpler.
2. Your data is primarily structured
Tabular data with clear columns and relationships belongs in SQL databases. Vector databases add complexity without benefit.
3. You have fewer than 1,000 documents
For small collections, simpler solutions work fine. Consider:
- Full-text search (Elasticsearch, PostgreSQL FTS)
- In-memory embedding comparison (see the sketch below)
- Simple keyword search
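For the in-memory option, here is a rough sketch of what that can look like using the sentence-transformers library and NumPy (the model choice and documents are illustrative):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["How to reduce customer churn", "Strategies for improving retention"]
doc_vectors = model.encode(docs, normalize_embeddings=True)

# Embed the query and rank documents by cosine similarity
query_vector = model.encode(["keeping customers from leaving"], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vector  # dot product of unit vectors = cosine similarity
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.3f}: {docs[i]}")
```

For a few hundred or a few thousand documents, this brute-force comparison is usually fast enough that a dedicated index adds little value.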
4. Real-time writes are critical
Vector databases optimize for read-heavy workloads. If you need immediate write consistency across nodes, traditional databases may be more appropriate.
Decision Checklist
- Do users struggle to find information using keyword search?
- Is your content described differently across sources?
- Are you building an AI/LLM-powered application?
- Do you need "find similar" functionality?
- Do you have 1,000+ documents or items?
If you checked 3 or more boxes, a vector database likely adds value.
How Vector Databases Work
Vector databases convert your content into embeddings (arrays of numbers) that capture semantic meaning. Similar content produces similar embeddings, enabling search by concept.
The Embedding Process
1. Your content: "reduce churn", "improve retention"
2. Embedding model: converts each text into a vector, e.g. [0.23, -0.45, ...] and [0.21, -0.47, ...]
3. Vector database: stores and indexes the vectors; similar vectors cluster together
When you search, your query becomes an embedding, and the database finds the closest stored vectors.
Key Concepts
Embeddings: Arrays of numbers (commonly 384 to 3072 of them) representing semantic meaning. Created by models like OpenAI's text-embedding-3-small or open-source alternatives like all-MiniLM-L6-v2.
Similarity metrics: How "closeness" is calculated:
- Cosine similarity: Angle between vectors (most common)
- Euclidean distance: Straight-line distance
- Dot product: Equivalent to cosine similarity when vectors are normalized
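As a concrete illustration, the three metrics computed with NumPy for two small example vectors:

```python
import numpy as np

a = np.array([0.23, -0.45, 0.11])
b = np.array([0.21, -0.47, 0.09])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between vectors
euclidean = np.linalg.norm(a - b)                          # straight-line distance
dot = a @ b                                                # equals cosine when both vectors are unit-length
```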
Indexing algorithms: Enable fast search at scale:
- HNSW: Fast, memory-intensive, most accurate
- IVF: Good balance of speed and memory
- PQ: Compressed vectors, lower memory, less accurate
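To make the indexing step concrete, here is a minimal HNSW sketch using the hnswlib library (an illustrative choice; the databases below expose the same knobs, such as M and ef, through their own configuration):

```python
import numpy as np
import hnswlib

dim, n = 384, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)  # stand-ins for real embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)  # graph fan-out / build quality
index.add_items(vectors, np.arange(n))

index.set_ef(64)  # search-time accuracy vs. speed trade-off
labels, distances = index.knn_query(vectors[0], k=5)
```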
Connecting to Vector Databases
Here are connection examples for the major vector database platforms.
Pinecone (Managed)
Pinecone is a fully managed service. No infrastructure to maintain.
Installation:
```bash
pip install pinecone-client openai
```

Connection and basic operations:

```python
from pinecone import Pinecone, ServerlessSpec
import openai

# Initialize clients
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
openai.api_key = "YOUR_OPENAI_API_KEY"

# Create index (run once)
pc.create_index(
    name="my-index",
    dimension=1536,  # OpenAI embedding dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Generate embedding
def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert documents
documents = [
    {"id": "doc1", "text": "How to reduce customer churn"},
    {"id": "doc2", "text": "Strategies for improving retention"},
]
vectors = [
    {
        "id": doc["id"],
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"]}
    }
    for doc in documents
]
index.upsert(vectors=vectors)

# Query
query = "keeping customers from leaving"
query_embedding = get_embedding(query)
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)
for match in results.matches:
    print(f"{match.score:.3f}: {match.metadata['text']}")
```

Weaviate (Self-hosted or Cloud)
Weaviate offers both managed cloud and self-hosted options with built-in vectorization.
Installation:
```bash
pip install weaviate-client
```

Connection and basic operations:

```python
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.config import Configure, Property, DataType

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=Auth.api_key("YOUR_API_KEY"),
    headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_KEY"}
)
# Or connect to local instance
# client = weaviate.connect_to_local()

# Create collection with vectorizer
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ]
)

# Add documents (auto-vectorized)
documents = client.collections.get("Document")
documents.data.insert_many([
    {"text": "How to reduce customer churn", "source": "support"},
    {"text": "Strategies for improving retention", "source": "marketing"},
])

# Query
response = documents.query.near_text(
    query="keeping customers from leaving",
    limit=5
)
for obj in response.objects:
    print(f"{obj.properties['text']}")

client.close()
```

Qdrant (Self-hosted or Cloud)
Qdrant is a high-performance option written in Rust, with advanced filtering capabilities.
Installation:
```bash
pip install qdrant-client openai
```

Connection and basic operations:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import openai

# Connect to Qdrant Cloud
client = QdrantClient(
    url="YOUR_CLUSTER_URL",
    api_key="YOUR_API_KEY"
)
# Or local
# client = QdrantClient("localhost", port=6333)

openai.api_key = "YOUR_OPENAI_API_KEY"

def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upsert documents
documents = [
    {"id": 1, "text": "How to reduce customer churn"},
    {"id": 2, "text": "Strategies for improving retention"},
]
points = [
    PointStruct(
        id=doc["id"],
        vector=get_embedding(doc["text"]),
        payload={"text": doc["text"]}
    )
    for doc in documents
]
client.upsert(collection_name="documents", points=points)

# Query
query_vector = get_embedding("keeping customers from leaving")
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5
)
for result in results:
    print(f"{result.score:.3f}: {result.payload['text']}")
```

Milvus (Self-hosted)
Milvus is designed for large-scale deployments with billions of vectors.
Installation:
```bash
pip install pymilvus openai
```

Connection and basic operations:

```python
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
import openai

# Connect to Milvus
connections.connect("default", host="localhost", port="19530")

openai.api_key = "YOUR_OPENAI_API_KEY"

def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536)
]
schema = CollectionSchema(fields, description="Document collection")

# Create collection
collection = Collection("documents", schema)

# Create index
index_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)

# Insert documents
documents = [
    "How to reduce customer churn",
    "Strategies for improving retention",
]
embeddings = [get_embedding(doc) for doc in documents]
collection.insert([documents, embeddings])
collection.load()

# Query
query_embedding = get_embedding("keeping customers from leaving")
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"]
)
for hits in results:
    for hit in hits:
        print(f"{hit.distance:.3f}: {hit.entity.get('text')}")
```

Use Cases with Examples
Use Case 1: Semantic Document Search
Replace keyword search with meaning-based search across company documents.
Problem: Employees search for "vacation policy" but documents say "PTO guidelines" or "time off procedures."
Solution:
```python
# Index all HR documents
for doc in hr_documents:
    vector = get_embedding(doc.content)
    index.upsert(vectors=[{
        "id": doc.id,
        "values": vector,
        "metadata": {
            "title": doc.title,
            "department": "HR",
            "content": doc.content[:500]  # Preview
        }
    }])

# Search finds semantically similar content
results = index.query(
    vector=get_embedding("vacation policy"),
    top_k=5,
    filter={"department": "HR"}
)
```

Outcome: 70-90% improvement in search relevance.
Use Case 2: RAG for Customer Support
Build a chatbot that answers questions using your knowledge base.
Problem: Support agents spend time searching for answers that exist in documentation.
Solution:
```python
def answer_question(user_question):
    # 1. Find relevant context
    query_embedding = get_embedding(user_question)
    results = index.query(
        vector=query_embedding,
        top_k=3,
        include_metadata=True
    )

    # 2. Build context from results
    context = "\n\n".join([
        r.metadata["content"] for r in results.matches
    ])

    # 3. Generate answer with LLM
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": user_question}
        ]
    )
    return response.choices[0].message.content, results.matches
```

Outcome: 40-60% reduction in average handle time.
Use Case 3: Similar Content Recommendations
Recommend related articles, products, or documents.
Problem: Users want to find "more like this" without knowing exact keywords.
Solution:
```python
def find_similar(document_id, exclude_same_category=False):
    # Get the document's embedding
    doc = index.fetch([document_id])
    doc_vector = doc.vectors[document_id].values
    doc_category = doc.vectors[document_id].metadata.get("category")

    # Find similar documents
    filter_condition = None
    if exclude_same_category:
        filter_condition = {"category": {"$ne": doc_category}}

    results = index.query(
        vector=doc_vector,
        top_k=6,  # Extra to exclude self
        filter=filter_condition,
        include_metadata=True
    )

    # Exclude the source document
    similar = [r for r in results.matches if r.id != document_id][:5]
    return similar
```

Outcome: 25-40% increase in content engagement.
Use Case 4: Duplicate Detection
Find semantically similar content that might be duplicates or near-duplicates.
Problem: Different teams create similar documents without knowing others exist.
Solution:
```python
def find_duplicates(threshold=0.95):
    duplicates = []
    # index.list() yields pages (lists) of vector IDs
    for id_page in index.list():
        for doc_id in id_page:
            doc = index.fetch([doc_id])
            vector = doc.vectors[doc_id].values

            # Find very similar documents
            results = index.query(
                vector=vector,
                top_k=5,
                include_metadata=True
            )
            for match in results.matches:
                if match.id != doc_id and match.score >= threshold:
                    duplicates.append({
                        "doc1": doc_id,
                        "doc2": match.id,
                        "similarity": match.score
                    })
    # Note: each pair appears twice (A->B and B->A); deduplicate if needed
    return duplicates
```

Outcome: Identify redundant content, reduce maintenance burden.
Use Case 5: Multi-modal Search
Search images using text descriptions (requires multi-modal embeddings).
Problem: Users want to find product images by describing what they're looking for.
Solution:
```python
from transformers import CLIPProcessor, CLIPModel

# CLIP ViT-B/32 produces 512-dimensional vectors, so the index for this
# use case must be created with dimension=512
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def get_image_embedding(image):
    inputs = processor(images=image, return_tensors="pt")
    return model.get_image_features(**inputs)[0].tolist()

def get_text_embedding(text):
    inputs = processor(text=text, return_tensors="pt")
    return model.get_text_features(**inputs)[0].tolist()

# Index images
for product in products:
    image_vector = get_image_embedding(product.image)
    index.upsert(vectors=[{
        "id": product.id,
        "values": image_vector,
        "metadata": {"name": product.name, "category": product.category}
    }])

# Search with text
results = index.query(
    vector=get_text_embedding("red summer dress with floral pattern"),
    top_k=10
)
```

Choosing the Right Database
| Database | Best For | Deployment | Pricing Model |
|---|---|---|---|
| Pinecone | Production apps, minimal ops | Managed only | Per-vector storage + queries |
| Weaviate | Hybrid search, multi-modal | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Qdrant | Advanced filtering, performance | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Milvus | Billion-scale, enterprise | Self-hosted or Zilliz Cloud | Free self-hosted, Zilliz plans |
| Chroma | Local development, prototyping | Embedded or self-hosted | Free and open-source |
| pgvector | Existing PostgreSQL, simple needs | PostgreSQL extension | Part of PostgreSQL costs |
Decision Guide
Choose Pinecone if:
- You want zero infrastructure management
- You're building a production application
- You prioritize reliability over cost optimization
Choose Weaviate if:
- You need hybrid search (vector + keyword)
- You want built-in vectorization
- You need multi-tenancy
Choose Qdrant if:
- You need complex filtering with vectors
- Performance is critical
- You want self-hosted with low resource usage
Choose Milvus if:
- You have billions of vectors
- You need enterprise features
- You have infrastructure team capacity
Choose Chroma if:
- You're prototyping or learning
- You want embedded (in-process) database
- You're building local-first applications
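A minimal Chroma sketch (it runs in-process and, by default, embeds documents with a built-in model):

```python
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="./db") to persist
collection = client.create_collection("documents")

collection.add(
    ids=["doc1", "doc2"],
    documents=["How to reduce customer churn",
               "Strategies for improving retention"],
)
results = collection.query(query_texts=["keeping customers from leaving"], n_results=2)
print(results["documents"])
```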
Choose pgvector if:
- You already use PostgreSQL
- Your scale is under 1M vectors
- You want to minimize new infrastructure
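A minimal pgvector sketch, assuming the extension is available on your PostgreSQL server and reusing the get_embedding helper from the examples above (the connection string is a placeholder):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector  # pip install psycopg pgvector

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute(
    "CREATE TABLE IF NOT EXISTS documents (id bigserial PRIMARY KEY, text text, embedding vector(1536))"
)
conn.execute(
    "INSERT INTO documents (text, embedding) VALUES (%s, %s)",
    ("How to reduce customer churn", np.array(get_embedding("How to reduce customer churn"))),
)

# <=> is pgvector's cosine-distance operator (smaller means more similar)
rows = conn.execute(
    "SELECT text FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (np.array(get_embedding("keeping customers from leaving")),),
).fetchall()
```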
Quick Reference
Embedding Model Comparison
| Model | Dimensions | Best For | Cost |
|---|---|---|---|
| text-embedding-3-small | 1536 | General purpose, cost-effective | $0.02/1M tokens |
| text-embedding-3-large | 3072 | Higher accuracy needs | $0.13/1M tokens |
| all-MiniLM-L6-v2 | 384 | Self-hosted, fast | Free (open-source) |
| BAAI/bge-large-en | 1024 | High accuracy, self-hosted | Free (open-source) |
Implementation Checklist
Setup:
- Choose vector database based on requirements
- Select embedding model (cost vs. accuracy)
- Set up development environment
- Create collection/index with correct dimensions
Data Pipeline:
- Implement document chunking strategy (see the sketch after this checklist)
- Build embedding generation pipeline
- Handle incremental updates
- Plan for re-embedding when models change
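For the chunking step, a fixed-size splitter with overlap is a reasonable starting point (a word-based sketch; splitting on tokens with a tokenizer such as tiktoken is more precise):

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```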
Search Implementation:
- Implement basic similarity search
- Add metadata filtering
- Tune top-k and similarity thresholds
- Handle no-results gracefully
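For the last two items, a small sketch of applying a score threshold with a graceful fallback (the threshold value is illustrative and should be tuned against an evaluation set; index and get_embedding are as defined in the Pinecone example):

```python
MIN_SCORE = 0.3  # illustrative; tune on real queries

def search_with_fallback(query, top_k=5):
    results = index.query(vector=get_embedding(query), top_k=top_k, include_metadata=True)
    matches = [m for m in results.matches if m.score >= MIN_SCORE]
    if not matches:
        return {"matches": [], "message": "No sufficiently relevant documents found."}
    return {"matches": matches, "message": None}
```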
Production:
- Monitor query latency and accuracy
- Set up index backups
- Plan scaling strategy
- Implement rate limiting if needed
Common Pitfalls
- Wrong chunk size: Too large loses precision, too small loses context. Start with 500-1000 tokens.
- Ignoring metadata: Always store source info for citations and debugging.
- Not testing relevance: Build evaluation sets to measure search quality.
- Over-engineering early: Start simple, add complexity when needed.
- Forgetting hybrid search: Sometimes keywords + vectors beat vectors alone (see the sketch below).
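A common way to combine keyword and vector results is reciprocal rank fusion; a sketch (the two ranked ID lists could come from BM25/full-text search and the vector index respectively):

```python
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    """Merge two ranked lists of document IDs; items ranked high in either list score well."""
    scores = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: "doc1" and "doc2" rank highly in both lists, so they come out on top
merged = reciprocal_rank_fusion(["doc2", "doc7", "doc1"], ["doc1", "doc2", "doc9"])
```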
Further Reading:
- RAG Business Guide - Implementing retrieval-augmented generation
- AI Search Optimization - Optimizing for AI search engines