
Vector Databases: When to Use Them, How to Connect, and Best Use Cases

Complete guide to vector databases for business applications. Learn when vector databases make sense, how to connect to Pinecone, Weaviate, Qdrant, and others, plus real-world use cases with implementation examples.

Overview

Vector databases store data as mathematical embeddings that capture semantic meaning, enabling search by concept rather than exact keywords. This guide covers three essential questions:

1. **When should you use a vector database?** Decision framework to determine if your use case benefits from semantic search
2. **How do you connect to one?** Code examples for Pinecone, Weaviate, Qdrant, and Milvus
3. **What are the best use cases?** Real-world applications with implementation patterns

**What you will accomplish:**

  • Evaluate whether your application needs a vector database
  • Set up connections to major vector database platforms
  • Implement common patterns: semantic search, RAG, recommendations
  • Choose the right database for your requirements

When to Use Vector Databases

Not every application needs a vector database. Use this decision framework to determine if semantic search adds value to your use case.

Use a Vector Database When

1. Keywords fail to capture intent

If users search for "how to reduce customer churn" but your documents say "strategies for improving retention rates," traditional search fails. Vector databases understand these mean the same thing.

Signs you have this problem:

  • Users complain search "doesn't find anything"
  • Support teams know answers exist but can't find them
  • Same information stored with different terminology across departments

2. You're building RAG (Retrieval-Augmented Generation)

RAG systems need to find relevant context to feed into LLMs. Vector databases excel at retrieving semantically relevant chunks from large document collections.

Common RAG applications:

  • Chatbots that answer questions from company documents
  • AI assistants that reference internal knowledge bases
  • Document Q&A systems

3. You need similarity-based recommendations

Finding "similar" items based on content rather than user behavior patterns.

Examples:

  • "Find documents similar to this one"
  • "Show products related to what the customer described"
  • "Suggest articles on topics this user has read about"

4. You have unstructured data at scale

When you have thousands of documents, images, or other unstructured content that needs to be searchable by meaning.

Do NOT Use a Vector Database When

1. Exact matching is sufficient

If users search for order numbers, product SKUs, or specific names, traditional databases work better and are simpler.

2. Your data is primarily structured

Tabular data with clear columns and relationships belongs in SQL databases. Vector databases add complexity without benefit.

3. You have fewer than 1,000 documents

For small collections, simpler solutions work fine. Consider:

  • Full-text search (Elasticsearch, PostgreSQL FTS)
  • In-memory embedding comparison
  • Simple keyword search

4. Real-time writes are critical

Vector databases optimize for read-heavy workloads. If you need immediate write consistency across nodes, traditional databases may be more appropriate.

Decision Checklist

  • Do users struggle to find information using keyword search?
  • Is your content described differently across sources?
  • Are you building an AI/LLM-powered application?
  • Do you need "find similar" functionality?
  • Do you have 1,000+ documents or items?

If you checked 3 or more boxes, a vector database likely adds value.

How Vector Databases Work

Vector databases convert your content into embeddings (arrays of numbers) that capture semantic meaning. Similar content produces similar embeddings, enabling search by concept.

The Embedding Process

1. Your Content          2. Embedding Model        3. Vector Database
   "reduce churn"    →   [0.23, -0.45, ...]    →   Stored & Indexed
   "improve retention" → [0.21, -0.47, ...]    →   Similar vectors cluster

When you search, your query becomes an embedding, and the database finds the closest stored vectors.

Key Concepts

Embeddings: Arrays of numbers (typically 384 to 3,072 values, depending on the model) representing semantic meaning. Created by models like OpenAI's text-embedding-3-small or open-source alternatives like all-MiniLM-L6-v2.
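
As a quick illustration, here is a minimal sketch that generates embeddings locally with the open-source all-MiniLM-L6-v2 model mentioned above (assumes the sentence-transformers package is installed; the input sentences are just examples):

from sentence_transformers import SentenceTransformer

# Load the open-source model (downloads weights on first run)
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode two sentences into 384-dimensional vectors
embeddings = model.encode([
    "how to reduce customer churn",
    "strategies for improving retention",
])

print(embeddings.shape)  # (2, 384) -- one vector per sentence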

Similarity metrics: How "closeness" is calculated:

  • Cosine similarity: Angle between vectors (most common)
  • Euclidean distance: Straight-line distance
  • Dot product: For normalized vectors
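
To make the three metrics concrete, here is a small sketch using NumPy; the vectors are toy values chosen only to show the calculations:

import numpy as np

a = np.array([0.23, -0.45, 0.10])   # toy vector for "reduce churn"
b = np.array([0.21, -0.47, 0.12])   # toy vector for "improve retention"

# Cosine similarity: 1.0 = same direction, 0 = unrelated
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: smaller = more similar
euclidean = np.linalg.norm(a - b)

# Dot product: equals cosine similarity once the vectors are normalized
dot = np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b))

print(f"cosine={cosine:.3f}  euclidean={euclidean:.3f}  dot={dot:.3f}")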

Indexing algorithms: Enable fast search at scale:

  • HNSW: Fast, memory-intensive, most accurate
  • IVF: Good balance of speed and memory
  • PQ: Compressed vectors, lower memory, less accurate
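
The databases below build and tune these indexes for you, but the trade-offs are easy to see with the open-source faiss library. A rough sketch using random vectors (assumes faiss-cpu is installed; this is illustrative, not part of any database client):

import faiss
import numpy as np

dim = 64
vectors = np.random.random((10_000, dim)).astype("float32")
query = np.random.random((1, dim)).astype("float32")

# HNSW: graph-based, fast and accurate, higher memory use
hnsw = faiss.IndexHNSWFlat(dim, 32)            # 32 graph neighbors per node
hnsw.add(vectors)

# IVF: clusters vectors into lists, balances speed and memory (needs training)
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 100)  # 100 clusters
ivf.train(vectors)
ivf.add(vectors)

# PQ: compresses vectors, lowest memory, lower accuracy (needs training)
pq = faiss.IndexPQ(dim, 8, 8)                  # 8 sub-quantizers, 8 bits each
pq.train(vectors)
pq.add(vectors)

# Same search call for all three index types
distances, ids = hnsw.search(query, 5)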

Connecting to Vector Databases

Here are connection examples for the major vector database platforms.

Pinecone (Managed)

Pinecone is a fully managed service. No infrastructure to maintain.

Installation:

pip install pinecone-client openai

Connection and basic operations:

from pinecone import Pinecone, ServerlessSpec
import openai
 
# Initialize clients
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
openai.api_key = "YOUR_OPENAI_API_KEY"
 
# Create index (run once)
pc.create_index(
    name="my-index",
    dimension=1536,  # OpenAI embedding dimension
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
 
# Connect to index
index = pc.Index("my-index")
 
# Generate embedding
def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
 
# Upsert documents
documents = [
    {"id": "doc1", "text": "How to reduce customer churn"},
    {"id": "doc2", "text": "Strategies for improving retention"},
]
 
vectors = [
    {
        "id": doc["id"],
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"]}
    }
    for doc in documents
]
 
index.upsert(vectors=vectors)
 
# Query
query = "keeping customers from leaving"
query_embedding = get_embedding(query)
 
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)
 
for match in results.matches:
    print(f"{match.score:.3f}: {match.metadata['text']}")

Weaviate (Self-hosted or Cloud)

Weaviate offers both managed cloud and self-hosted options with built-in vectorization.

Installation:

pip install weaviate-client

Connection and basic operations:

import weaviate
from weaviate.classes.init import Auth
 
# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="YOUR_CLUSTER_URL",
    auth_credentials=Auth.api_key("YOUR_API_KEY"),
    headers={"X-OpenAI-Api-Key": "YOUR_OPENAI_KEY"}
)
 
# Or connect to local instance
# client = weaviate.connect_to_local()
 
# Create collection with vectorizer
from weaviate.classes.config import Configure, Property, DataType
 
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    properties=[
        Property(name="text", data_type=DataType.TEXT),
        Property(name="source", data_type=DataType.TEXT),
    ]
)
 
# Add documents (auto-vectorized)
documents = client.collections.get("Document")
 
documents.data.insert_many([
    {"text": "How to reduce customer churn", "source": "support"},
    {"text": "Strategies for improving retention", "source": "marketing"},
])
 
# Query
response = documents.query.near_text(
    query="keeping customers from leaving",
    limit=5
)
 
for obj in response.objects:
    print(f"{obj.properties['text']}")
 
client.close()

Qdrant (Self-hosted or Cloud)

Qdrant is a high-performance option written in Rust, with advanced filtering capabilities.

Installation:

pip install qdrant-client openai

Connection and basic operations:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import openai
 
# Connect to Qdrant Cloud
client = QdrantClient(
    url="YOUR_CLUSTER_URL",
    api_key="YOUR_API_KEY"
)
 
# Or local
# client = QdrantClient("localhost", port=6333)
 
openai.api_key = "YOUR_OPENAI_API_KEY"
 
def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
 
# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)
 
# Upsert documents
documents = [
    {"id": 1, "text": "How to reduce customer churn"},
    {"id": 2, "text": "Strategies for improving retention"},
]
 
points = [
    PointStruct(
        id=doc["id"],
        vector=get_embedding(doc["text"]),
        payload={"text": doc["text"]}
    )
    for doc in documents
]
 
client.upsert(collection_name="documents", points=points)
 
# Query
query_vector = get_embedding("keeping customers from leaving")
 
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5
)
 
for result in results:
    print(f"{result.score:.3f}: {result.payload['text']}")

Milvus (Self-hosted)

Milvus is designed for large-scale deployments with billions of vectors.

Installation:

pip install pymilvus openai

Connection and basic operations:

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
import openai
 
# Connect to Milvus
connections.connect("default", host="localhost", port="19530")
 
openai.api_key = "YOUR_OPENAI_API_KEY"
 
def get_embedding(text):
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
 
# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536)
]
schema = CollectionSchema(fields, description="Document collection")
 
# Create collection
collection = Collection("documents", schema)
 
# Create index
index_params = {
    "metric_type": "COSINE",
    "index_type": "HNSW",
    "params": {"M": 16, "efConstruction": 256}
}
collection.create_index("embedding", index_params)
 
# Insert documents
documents = [
    "How to reduce customer churn",
    "Strategies for improving retention",
]
 
embeddings = [get_embedding(doc) for doc in documents]
 
collection.insert([documents, embeddings])
collection.load()
 
# Query
query_embedding = get_embedding("keeping customers from leaving")
 
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    output_fields=["text"]
)
 
for hits in results:
    for hit in hits:
        print(f"{hit.distance:.3f}: {hit.entity.get('text')}")

Use Cases with Examples

Use Case 1: Semantic Search Over Company Documents

Replace keyword search with meaning-based search across company documents.

Problem: Employees search for "vacation policy" but documents say "PTO guidelines" or "time off procedures."

Solution:

# Index all HR documents
for doc in hr_documents:
    vector = get_embedding(doc.content)
    index.upsert(vectors=[{
        "id": doc.id,
        "values": vector,
        "metadata": {
            "title": doc.title,
            "department": "HR",
            "content": doc.content[:500]  # Preview
        }
    }])
 
# Search finds semantically similar content
results = index.query(
    vector=get_embedding("vacation policy"),
    top_k=5,
    filter={"department": "HR"}
)

Outcome: 70-90% improvement in search relevance.

Use Case 2: RAG for Customer Support

Build a chatbot that answers questions using your knowledge base.

Problem: Support agents spend time searching for answers that exist in documentation.

Solution:

def answer_question(user_question):
    # 1. Find relevant context
    query_embedding = get_embedding(user_question)
    results = index.query(
        vector=query_embedding,
        top_k=3,
        include_metadata=True
    )
 
    # 2. Build context from results
    context = "\n\n".join([
        r.metadata["content"] for r in results.matches
    ])
 
    # 3. Generate answer with LLM
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": user_question}
        ]
    )
 
    return response.choices[0].message.content, results.matches

Outcome: 40-60% reduction in average handle time.

Use Case 3: Similar Content Recommendations

Recommend related articles, products, or documents.

Problem: Users want to find "more like this" without knowing exact keywords.

Solution:

def find_similar(document_id, exclude_same_category=False):
    # Get the document's embedding
    doc = index.fetch([document_id])
    doc_vector = doc.vectors[document_id].values
    doc_category = doc.vectors[document_id].metadata.get("category")
 
    # Find similar documents
    filter_condition = None
    if exclude_same_category:
        filter_condition = {"category": {"$ne": doc_category}}
 
    results = index.query(
        vector=doc_vector,
        top_k=6,  # Extra to exclude self
        filter=filter_condition,
        include_metadata=True
    )
 
    # Exclude the source document
    similar = [r for r in results.matches if r.id != document_id][:5]
    return similar

Outcome: 25-40% increase in content engagement.

Use Case 4: Duplicate Detection

Find semantically similar content that might be duplicates or near-duplicates.

Problem: Different teams create similar documents without knowing others exist.

Solution:

def find_duplicates(threshold=0.95):
    duplicates = []
 
    # Collect all vector IDs (Pinecone's list() yields pages of IDs)
    all_ids = [vec_id for page in index.list() for vec_id in page]
 
    for doc_id in all_ids:
        doc = index.fetch([doc_id])
        vector = doc.vectors[doc_id].values
 
        # Find very similar documents
        results = index.query(
            vector=vector,
            top_k=5,
            include_metadata=True
        )
 
        for match in results.matches:
            if match.id != doc_id and match.score >= threshold:
                duplicates.append({
                    "doc1": doc_id,
                    "doc2": match.id,
                    "similarity": match.score
                })
 
    return duplicates

Outcome: Identify redundant content, reduce maintenance burden.

Use Case 5: Text-to-Image Search

Search images using text descriptions (requires multi-modal embeddings).

Problem: Users want to find product images by describing what they're looking for.

Solution:

from transformers import CLIPProcessor, CLIPModel
 
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
 
def get_image_embedding(image):
    inputs = processor(images=image, return_tensors="pt")
    return model.get_image_features(**inputs)[0].tolist()
 
def get_text_embedding(text):
    inputs = processor(text=text, return_tensors="pt")
    return model.get_text_features(**inputs)[0].tolist()
 
# Index images (use an index created with dimension=512 to match CLIP's output)
for product in products:
    image_vector = get_image_embedding(product.image)
    index.upsert({
        "id": product.id,
        "values": image_vector,
        "metadata": {"name": product.name, "category": product.category}
    })
 
# Search with text
results = index.query(
    vector=get_text_embedding("red summer dress with floral pattern"),
    top_k=10
)

Choosing the Right Database

| Database | Best For | Deployment | Pricing Model |
|----------|----------|------------|---------------|
| Pinecone | Production apps, minimal ops | Managed only | Per-vector storage + queries |
| Weaviate | Hybrid search, multi-modal | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Qdrant | Advanced filtering, performance | Cloud or self-hosted | Cloud tiers or free self-hosted |
| Milvus | Billion-scale, enterprise | Self-hosted (Zilliz Cloud) | Free self-hosted, Zilliz plans |
| Chroma | Local development, prototyping | Embedded or self-hosted | Free and open-source |
| pgvector | Existing PostgreSQL, simple needs | PostgreSQL extension | Part of PostgreSQL costs |

Decision Guide

Choose Pinecone if:

  • You want zero infrastructure management
  • You're building a production application
  • You prioritize reliability over cost optimization

Choose Weaviate if:

  • You need hybrid search (vector + keyword)
  • You want built-in vectorization
  • You need multi-tenancy

Choose Qdrant if:

  • You need complex filtering with vectors
  • Performance is critical
  • You want self-hosted with low resource usage

Choose Milvus if:

  • You have billions of vectors
  • You need enterprise features
  • You have infrastructure team capacity

Choose Chroma if:

  • You're prototyping or learning
  • You want embedded (in-process) database
  • You're building local-first applications
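
For prototyping, a minimal Chroma sketch looks like this (Chroma embeds the text with its default embedding function; assumes chromadb is installed and runs entirely in-process):

import chromadb

client = chromadb.Client()  # in-process, no server required

collection = client.create_collection(name="documents")

collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "How to reduce customer churn",
        "Strategies for improving retention",
    ],
)

results = collection.query(
    query_texts=["keeping customers from leaving"],
    n_results=2,
)
print(results["documents"])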

Choose pgvector if:

  • You already use PostgreSQL
  • Your scale is under 1M vectors
  • You want to minimize new infrastructure
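
If you already run PostgreSQL, pgvector keeps vectors alongside your relational data. A minimal sketch with psycopg2, assuming the vector extension is available and get_embedding is defined as in the earlier examples (the connection string is hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=app user=postgres")  # hypothetical connection details
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        text text,
        embedding vector(1536)
    )
""")

def to_vector_literal(values):
    # pgvector accepts vectors as '[v1,v2,...]' text literals
    return "[" + ",".join(str(v) for v in values) + "]"

text = "How to reduce customer churn"
cur.execute(
    "INSERT INTO documents (text, embedding) VALUES (%s, %s::vector)",
    (text, to_vector_literal(get_embedding(text))),
)

# <=> is pgvector's cosine distance operator (smaller = more similar)
cur.execute(
    "SELECT text FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (to_vector_literal(get_embedding("keeping customers from leaving")),),
)
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()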

Quick Reference

Embedding Model Comparison

| Model | Dimensions | Best For | Cost |
|-------|------------|----------|------|
| text-embedding-3-small | 1536 | General purpose, cost-effective | $0.02/1M tokens |
| text-embedding-3-large | 3072 | Higher accuracy needs | $0.13/1M tokens |
| all-MiniLM-L6-v2 | 384 | Self-hosted, fast | Free (open-source) |
| BAAI/bge-large-en | 1024 | High accuracy, self-hosted | Free (open-source) |

Implementation Checklist

Setup:

  • Choose vector database based on requirements
  • Select embedding model (cost vs. accuracy)
  • Set up development environment
  • Create collection/index with correct dimensions

Data Pipeline:

  • Implement document chunking strategy (see the sketch after this list)
  • Build embedding generation pipeline
  • Handle incremental updates
  • Plan for re-embedding when models change
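
A simple starting point for chunking: fixed-size chunks with overlap. The sketch below counts words for simplicity (swap in a tokenizer for token-accurate limits); long_document and the "doc42" IDs are hypothetical:

def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
    return chunks

# Each chunk is embedded and stored separately, with metadata
# pointing back to its source document.
for i, chunk in enumerate(chunk_text(long_document)):  # long_document is hypothetical
    index.upsert(vectors=[{
        "id": f"doc42-chunk-{i}",
        "values": get_embedding(chunk),
        "metadata": {"source": "doc42", "chunk": i, "text": chunk},
    }])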

Search Implementation:

  • Implement basic similarity search
  • Add metadata filtering
  • Tune top-k and similarity thresholds
  • Handle no-results gracefully
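
The last two items can be combined in a small wrapper. A minimal sketch (the 0.75 threshold and the fallback behavior are illustrative; tune them against your own evaluation set):

def search_with_threshold(query, top_k=5, min_score=0.75):
    """Return matches above a similarity threshold, or an empty list."""
    results = index.query(
        vector=get_embedding(query),
        top_k=top_k,
        include_metadata=True,
    )
    return [m for m in results.matches if m.score >= min_score]

matches = search_with_threshold("keeping customers from leaving")
if not matches:
    print("No sufficiently relevant documents found - falling back to keyword search.")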

Production:

  • Monitor query latency and accuracy
  • Set up index backups
  • Plan scaling strategy
  • Implement rate limiting if needed

Common Pitfalls

  1. Wrong chunk size: Too large loses precision, too small loses context. Start with 500-1000 tokens.

  2. Ignoring metadata: Always store source info for citations and debugging.

  3. Not testing relevance: Build evaluation sets to measure search quality.

  4. Over-engineering early: Start simple, add complexity when needed.

  5. Forgetting hybrid search: Sometimes keywords + vectors beats vectors alone.
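
On the last point, hybrid search is often a one-line change. For example, with the Weaviate collection created earlier (assuming that client is still connected; alpha=0.5 weights BM25 keyword scores and vector scores equally and is only a starting point):

# Hybrid query against the Document collection from the Weaviate example
documents = client.collections.get("Document")

response = documents.query.hybrid(
    query="keeping customers from leaving",
    alpha=0.5,
    limit=5,
)

for obj in response.objects:
    print(obj.properties["text"])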

