When to Use Vector Databases
Not every application needs a vector database. Use this decision framework to determine if semantic search adds value to your use case.
Use a Vector Database When
1. Keywords fail to capture intent
If users search for "how to reduce customer churn" but your documents say "strategies for improving retention rates," traditional search fails. Vector databases understand these mean the same thing.
Signs you have this problem:
- Users complain search "doesn't find anything"
- Support teams know answers exist but can't find them
- Same information stored with different terminology across departments
2. You're building RAG (Retrieval-Augmented Generation)
RAG systems need to find relevant context to feed into LLMs. Vector databases excel at retrieving semantically relevant chunks from large document collections.
Common RAG applications:
- Chatbots that answer questions from company documents
- AI assistants that reference internal knowledge bases
- Document Q&A systems
3. You need similarity-based recommendations
Finding "similar" items based on content rather than user behavior patterns.
Examples:
- "Find documents similar to this one"
- "Show products related to what the customer described"
- "Suggest articles on topics this user has read about"
4. You have unstructured data at scale
When you have thousands of documents, images, or other unstructured content that needs to be searchable by meaning.
Do NOT Use a Vector Database When
1. Exact matching is sufficient
If users search for order numbers, product SKUs, or specific names, traditional databases work better and are simpler.
2. Your data is primarily structured
Tabular data with clear columns and relationships belongs in SQL databases. Vector databases add complexity without benefit.
3. You have fewer than 1,000 documents
For small collections, simpler solutions work fine. Consider:
- Full-text search (Elasticsearch, PostgreSQL FTS)
- In-memory embedding comparison (see the sketch below)
- Simple keyword search
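For the in-memory option, here is a rough sketch of what that can look like using the sentence-transformers library and NumPy (the model choice and documents are illustrative):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["How to reduce customer churn", "Strategies for improving retention"]
doc_vectors = model.encode(docs, normalize_embeddings=True)

# Embed the query and rank documents by cosine similarity
query_vector = model.encode(["keeping customers from leaving"], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vector  # dot product of unit vectors = cosine similarity
for i in np.argsort(scores)[::-1]:
    print(f"{scores[i]:.3f}: {docs[i]}")
```

For a few hundred or a few thousand documents, this brute-force comparison is usually fast enough that a dedicated index adds little value.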
4. Real-time writes are critical
Vector databases optimize for read-heavy workloads. If you need immediate write consistency across nodes, traditional databases may be more appropriate.
Decision Checklist
- Do users struggle to find information using keyword search?
- Is your content described differently across sources?
- Are you building an AI/LLM-powered application?
- Do you need "find similar" functionality?
- Do you have 1,000+ documents or items?
If you checked 3 or more boxes, a vector database likely adds value.
How Vector Databases Work
Vector databases convert your content into embeddings (arrays of numbers) that capture semantic meaning. Similar content produces similar embeddings, enabling search by concept.
The Embedding Process
1. Your content: "reduce churn", "improve retention"
2. Embedding model: converts each text into a vector, e.g. [0.23, -0.45, ...] and [0.21, -0.47, ...]
3. Vector database: stores and indexes the vectors; similar vectors cluster together
When you search, your query becomes an embedding, and the database finds the closest stored vectors.
Key Concepts
Embeddings: Arrays of numbers (commonly 384 to 3072 of them) representing semantic meaning. Created by models like OpenAI's text-embedding-3-small or open-source alternatives like all-MiniLM-L6-v2.
Similarity metrics: How "closeness" is calculated:
- Cosine similarity: Angle between vectors (most common)
- Euclidean distance: Straight-line distance
- Dot product: Equivalent to cosine similarity when vectors are normalized
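As a concrete illustration, the three metrics computed with NumPy for two small example vectors:

```python
import numpy as np

a = np.array([0.23, -0.45, 0.11])
b = np.array([0.21, -0.47, 0.09])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between vectors
euclidean = np.linalg.norm(a - b)                          # straight-line distance
dot = a @ b                                                # equals cosine when both vectors are unit-length
```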
Indexing algorithms: Enable fast search at scale:
- HNSW: Fast, memory-intensive, most accurate
- IVF: Good balance of speed and memory
- PQ: Compressed vectors, lower memory, less accurate
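To make the indexing step concrete, here is a minimal HNSW sketch using the hnswlib library (an illustrative choice; the databases below expose the same knobs, such as M and ef, through their own configuration):

```python
import numpy as np
import hnswlib

dim, n = 384, 10_000
vectors = np.random.rand(n, dim).astype(np.float32)  # stand-ins for real embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)  # graph fan-out / build quality
index.add_items(vectors, np.arange(n))

index.set_ef(64)  # search-time accuracy vs. speed trade-off
labels, distances = index.knn_query(vectors[0], k=5)
```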
Connecting to Vector Databases
Here are connection examples for the major vector database platforms.
Pinecone (Managed)
Pinecone is a fully managed service. No infrastructure to maintain.
Installation:
```bash
pip install pinecone-client openai
```

Connection and basic operations:

```python
from pinecone import Pinecone, ServerlessSpec
import openai

# Initialize clients
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
openai.api_key = "YOUR_OPENAI_API_KEY"

# Create index (run once)
pc.create_index(
    name="my-index",
    dimension=1536,  # OpenAI embedding dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Generate embedding
def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Upsert documents
documents = [
    {"id": "doc1", "text": "How to reduce customer churn"},
    {"id": "doc2", "text": "Strategies for improving retention"},
]
vectors = [
    {
        "id": doc["id"],
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"]}
    }
    for doc in documents
]
index.upsert(vectors=vectors)

# Query
query = "keeping customers from leaving"
query_embedding = get_embedding(query)
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)
for match in results.matches:
    print(f"{match.score:.3f}: {match.metadata['text']}")
```

Weaviate (Self-hosted or Cloud)
Weaviate offers both managed cloud and self-hosted options with built-in vectorization.
Installation:
```bash
pip install weaviate-client
```

Connection and basic operations:

```python
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.config import Configure, Property, DataType

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=Auth.api_key("YOUR_API_KEY"),
    headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_KEY"}
)
# Or connect to local instance
# client = weaviate.connect_to_local()

# Create collection with vectorizer
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ]
)

# Add documents (auto-vectorized)
documents = client.collections.get("Document")
documents.data.insert_many([
    {"text": "How to reduce customer churn", "source": "support"},
    {"text": "Strategies for improving retention", "source": "marketing"},
])

# Query
response = documents.query.near_text(
    query="keeping customers from leaving",
    limit=5
)
for obj in response.objects:
    print(f"{obj.properties['text']}")

client.close()
```

Qdrant (Self-hosted or Cloud)
Qdrant is a high-performance option written in Rust, with advanced filtering capabilities.
Installation:
```bash
pip install qdrant-client openai
```

Connection and basic operations:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import openai

# Connect to Qdrant Cloud
client = QdrantClient(
    url="YOUR_CLUSTER_URL",
    api_key="YOUR_API_KEY"
)
# Or local
# client = QdrantClient("localhost", port=6333)

openai.api_key = "YOUR_OPENAI_API_KEY"

def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

# Upsert documents
documents = [
    {"id": 1, "text": "How to reduce customer churn"},
    {"id": 2, "text": "Strategies for improving retention"},
]
points = [
    PointStruct(
        id=doc["id"],
        vector=get_embedding(doc["text"]),
        payload={"text": doc["text"]}
    )
    for doc in documents
]
client.upsert(collection_name="documents", points=points)

# Query
query_vector = get_embedding("keeping customers from leaving")
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5
)
for result in results:
    print(f"{result.score:.3f}: {result.payload['text']}")
```

Milvus (Self-hosted)
Milvus is designed for large-scale deployments with billions of vectors.
Installation:
```bash
pip install pymilvus openai
```

Connection and basic operations:

```python
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
import openai

# Connect to Milvus
connections.connect("default", host="localhost", port="19530")

openai.api_key = "YOUR_OPENAI_API_KEY"

def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536)
]
schema = CollectionSchema(fields, description="Document collection")

# Create collection
collection = Collection("documents", schema)

# Create index
index_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)

# Insert documents
documents = [
    "How to reduce customer churn",
    "Strategies for improving retention",
]
embeddings = [get_embedding(doc) for doc in documents]
collection.insert([documents, embeddings])
collection.load()

# Query
query_embedding = get_embedding("keeping customers from leaving")
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"]
)
for hits in results:
    for hit in hits:
        print(f"{hit.distance:.3f}: {hit.entity.get('text')}")
```

Use Cases with Examples
Use Case 1: Semantic Document Search
Replace keyword search with meaning-based search across company documents.
Problem: Employees search for "vacation policy" but documents say "PTO guidelines" or "time off procedures."
Solution:
```python
# Index all HR documents
for doc in hr_documents:
    vector = get_embedding(doc.content)
    index.upsert(vectors=[{
        "id": doc.id,
        "values": vector,
        "metadata": {
            "title": doc.title,
            "department": "HR",
            "content": doc.content[:500]  # Preview
        }
    }])

# Search finds semantically similar content
results = index.query(
    vector=get_embedding("vacation policy"),
    top_k=5,
    filter={"department": "HR"}
)
```

Outcome: 70-90% improvement in search relevance.
Use Case 2: RAG for Customer Support
Build a chatbot that answers questions using your knowledge base.
Problem: Support agents spend time searching for answers that exist in documentation.
Solution:
```python
def answer_question(user_question):
    # 1. Find relevant context
    query_embedding = get_embedding(user_question)
    results = index.query(
        vector=query_embedding,
        top_k=3,
        include_metadata=True
    )

    # 2. Build context from results
    context = "\n\n".join([
        r.metadata["content"] for r in results.matches
    ])

    # 3. Generate answer with LLM
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": user_question}
        ]
    )
    return response.choices[0].message.content, results.matches
```

Outcome: 40-60% reduction in average handle time.
Use Case 3: Similar Content Recommendations
Recommend related articles, products, or documents.
Problem: Users want to find "more like this" without knowing exact keywords.
Solution:
```python
def find_similar(document_id, exclude_same_category=False):
    # Get the document's embedding
    doc = index.fetch([document_id])
    doc_vector = doc.vectors[document_id].values
    doc_category = doc.vectors[document_id].metadata.get("category")

    # Find similar documents
    filter_condition = None
    if exclude_same_category:
        filter_condition = {"category": {"$ne": doc_category}}

    results = index.query(
        vector=doc_vector,
        top_k=6,  # Extra to exclude self
        filter=filter_condition,
        include_metadata=True
    )

    # Exclude the source document
    similar = [r for r in results.matches if r.id != document_id][:5]
    return similar
```

Outcome: 25-40% increase in content engagement.
Use Case 4: Duplicate Detection
Find semantically similar content that might be duplicates or near-duplicates.
Problem: Different teams create similar documents without knowing others exist.
Solution:
```python
def find_duplicates(threshold=0.95):
    duplicates = []
    # index.list() yields pages (lists) of vector IDs
    for id_page in index.list():
        for doc_id in id_page:
            doc = index.fetch([doc_id])
            vector = doc.vectors[doc_id].values

            # Find very similar documents
            results = index.query(
                vector=vector,
                top_k=5,
                include_metadata=True
            )
            for match in results.matches:
                if match.id != doc_id and match.score >= threshold:
                    duplicates.append({
                        "doc1": doc_id,
                        "doc2": match.id,
                        "similarity": match.score
                    })
    # Note: each pair appears twice (A->B and B->A); deduplicate if needed
    return duplicates
```

Outcome: Identify redundant content, reduce maintenance burden.
Use Case 5: Multi-modal Search
Search images using text descriptions (requires multi-modal embeddings).
Problem: Users want to find product images by describing what they're looking for.
Solution:
```python
from transformers import CLIPProcessor, CLIPModel

# CLIP ViT-B/32 produces 512-dimensional vectors, so the index for this
# use case must be created with dimension=512
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def get_image_embedding(image):
    inputs = processor(images=image, return_tensors="pt")
    return model.get_image_features(**inputs)[0].tolist()

def get_text_embedding(text):
    inputs = processor(text=text, return_tensors="pt")
    return model.get_text_features(**inputs)[0].tolist()

# Index images
for product in products:
    image_vector = get_image_embedding(product.image)
    index.upsert(vectors=[{
        "id": product.id,
        "values": image_vector,
        "metadata": {"name": product.name, "category": product.category}
    }])

# Search with text
results = index.query(
    vector=get_text_embedding("red summer dress with floral pattern"),
    top_k=10
)
```

Choosing the Right Database
| Database | Best For | Deployment | Pricing Model |
|---|---|---|---|
| Pinecone | Production apps, minimal ops | Managed only | Per-vector storage + queries |
| Weaviate | Hybrid search, multi-modal | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Qdrant | Advanced filtering, performance | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Milvus | Billion-scale, enterprise | Self-hosted or Zilliz Cloud | Free self-hosted, Zilliz plans |
| Chroma | Local development, prototyping | Embedded or self-hosted | Free and open-source |
| pgvector | Existing PostgreSQL, simple needs | PostgreSQL extension | Part of PostgreSQL costs |
Decision Guide
Choose Pinecone if:
- You want zero infrastructure management
- You're building a production application
- You prioritize reliability over cost optimization
Choose Weaviate if:
- You need hybrid search (vector + keyword)
- You want built-in vectorization
- You need multi-tenancy
Choose Qdrant if:
- You need complex filtering with vectors
- Performance is critical
- You want self-hosted with low resource usage
Choose Milvus if:
- You have billions of vectors
- You need enterprise features
- You have infrastructure team capacity
Choose Chroma if:
- You're prototyping or learning
- You want embedded (in-process) database
- You're building local-first applications
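A minimal Chroma sketch (it runs in-process and, by default, embeds documents with a built-in model):

```python
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="./db") to persist
collection = client.create_collection("documents")

collection.add(
    ids=["doc1", "doc2"],
    documents=["How to reduce customer churn",
               "Strategies for improving retention"],
)
results = collection.query(query_texts=["keeping customers from leaving"], n_results=2)
print(results["documents"])
```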
Choose pgvector if:
- You already use PostgreSQL
- Your scale is under 1M vectors
- You want to minimize new infrastructure
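A minimal pgvector sketch, assuming the extension is available on your PostgreSQL server and reusing the get_embedding helper from the examples above (the connection string is a placeholder):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector  # pip install psycopg pgvector

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute(
    "CREATE TABLE IF NOT EXISTS documents (id bigserial PRIMARY KEY, text text, embedding vector(1536))"
)
conn.execute(
    "INSERT INTO documents (text, embedding) VALUES (%s, %s)",
    ("How to reduce customer churn", np.array(get_embedding("How to reduce customer churn"))),
)

# <=> is pgvector's cosine-distance operator (smaller means more similar)
rows = conn.execute(
    "SELECT text FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (np.array(get_embedding("keeping customers from leaving")),),
).fetchall()
```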
Quick Reference
Embedding Model Comparison
| Model | Dimensions | Best For | Cost |
|---|---|---|---|
| text-embedding-3-small | 1536 | General purpose, cost-effective | $0.02/1M tokens |
| text-embedding-3-large | 3072 | Higher accuracy needs | $0.13/1M tokens |
| all-MiniLM-L6-v2 | 384 | Self-hosted, fast | Free (open-source) |
| BAAI/bge-large-en | 1024 | High accuracy, self-hosted | Free (open-source) |
Implementation Checklist
Setup:
- Choose vector database based on requirements
- Select embedding model (cost vs. accuracy)
- Set up development environment
- Create collection/index with correct dimensions
Data Pipeline:
- Implement document chunking strategy (see the sketch after this checklist)
- Build embedding generation pipeline
- Handle incremental updates
- Plan for re-embedding when models change
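For the chunking step, a fixed-size splitter with overlap is a reasonable starting point (a word-based sketch; splitting on tokens with a tokenizer such as tiktoken is more precise):

```python
def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks
```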
Search Implementation:
- Implement basic similarity search
- Add metadata filtering
- Tune top-k and similarity thresholds
- Handle no-results gracefully
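For the last two items, a small sketch of applying a score threshold with a graceful fallback (the threshold value is illustrative and should be tuned against an evaluation set; index and get_embedding are as defined in the Pinecone example):

```python
MIN_SCORE = 0.3  # illustrative; tune on real queries

def search_with_fallback(query, top_k=5):
    results = index.query(vector=get_embedding(query), top_k=top_k, include_metadata=True)
    matches = [m for m in results.matches if m.score >= MIN_SCORE]
    if not matches:
        return {"matches": [], "message": "No sufficiently relevant documents found."}
    return {"matches": matches, "message": None}
```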
Production:
- Monitor query latency and accuracy
- Set up index backups
- Plan scaling strategy
- Implement rate limiting if needed
Common Pitfalls
- Wrong chunk size: Too large loses precision, too small loses context. Start with 500-1000 tokens.
- Ignoring metadata: Always store source info for citations and debugging.
- Not testing relevance: Build evaluation sets to measure search quality.
- Over-engineering early: Start simple, add complexity when needed.
- Forgetting hybrid search: Sometimes keywords + vectors beat vectors alone (see the sketch below).
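A common way to combine keyword and vector results is reciprocal rank fusion; a sketch (the two ranked ID lists could come from BM25/full-text search and the vector index respectively):

```python
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    """Merge two ranked lists of document IDs; items ranked high in either list score well."""
    scores = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: "doc1" and "doc2" rank highly in both lists, so they come out on top
merged = reciprocal_rank_fusion(["doc2", "doc7", "doc1"], ["doc1", "doc2", "doc9"])
```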
Further Reading:
- RAG Business Guide - Implementing retrieval-augmented generation
- AI Search Optimization - Optimizing for AI search engines