ajinkya.ai An experiment in learning with AI.
10 May 2026 2 min read

Hybrid search — keywords plus vectors

LLM RAG Search Tutorial Interactive

Hybrid search

Vector search is great at meaning but bad at exact matches. Keyword search is the opposite. Each fails on cases the other handles. Many modern RAG pipelines run both in parallel and merge the results. Pick a query and watch where each strategy succeeds or trips.

Try three retrieval modes

Use the buttons to swap queries. Each column shows a different retriever on the same ten-document mini corpus — watch keyword dominate on order IDs and error codes, vector dominate on paraphrases, and hybrid (RRF) recover when either branch alone would miss.

Keyword (BM25)

Counts exact word matches, weighted by rarity. Fast, exact, but misses synonyms.
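To make "exact word matches, weighted by rarity" concrete, here is a minimal sketch of BM25 scoring in plain Python. It is not the code behind the demo: the three documents, the error code E4012, the whitespace tokenizer, and the k1 and b values are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no literal match, no credit
            # Rare terms (low df) get a larger weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini corpus, invented for this sketch.
docs = [
    "error E4012 payment gateway timeout",
    "how to end membership and disable auto-renewal",
    "shipping times for international orders",
]
code_scores = bm25_scores("E4012", docs)
paraphrase_scores = bm25_scores("cancel subscription", docs)
print(code_scores)
print(paraphrase_scores)
```

The exact code scores above zero only for the document that contains it, while "cancel subscription" scores zero everywhere because no document shares a literal word with the query.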

Vector (semantic)

Compares meaning via embeddings. Catches synonyms, misses exact codes and rare terms.
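A rough sketch of the comparison step, assuming cosine similarity as the measure. The three-dimensional vectors are hand-written stand-ins, not output from any real embedding model; a real model produces vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration (assumption, not real embeddings).
toy_embeddings = {
    "end membership and disable auto-renewal": [0.9, 0.1, 0.0],
    "shipping times for international orders": [0.1, 0.9, 0.2],
}
# Pretend this is the embedding of the query "cancel subscription".
query_vec = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    toy_embeddings,
    key=lambda doc: cosine_similarity(query_vec, toy_embeddings[doc]),
    reverse=True,
)
print(ranked[0])
```

The query shares no words with the membership document, yet it ranks first because the vectors point in nearly the same direction.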

Hybrid (RRF)

Merges both rankings using Reciprocal Rank Fusion. Best of each, fewer blind spots.

How the merge works

Reciprocal Rank Fusion

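For each result in each list, compute 1 / (k + rank), where rank is the 1-based position in that list and k is a constant, usually 60. Sum the scores across both lists and sort by the total. A doc that ranks #1 in keyword and #3 in vector scores 1/61 + 1/63 ≈ 0.0323, which beats a doc that ranks #2 in only one list (1/62 ≈ 0.0161).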
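The merge described above fits in a few lines. This is a minimal sketch; the function name and the document IDs are made up for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one scored list."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest total first; a doc missing from a list simply adds nothing.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rankings: doc_a is #1 for keyword and #3 for vector.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_d", "doc_e", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Here doc_a ends up on top because it collects a score from both lists, while every other document appears in only one.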

Why it's robust

RRF doesn't need the two scores to be on the same scale. Keyword scores and vector distances aren't directly comparable, so it uses only ranks. Swap embedding models or BM25 implementations and you usually don't have to retune.

The third stage: in production you'll often see a reranker on top, a smaller model (typically a cross-encoder) that re-scores the top 20 hybrid results by reading the query and document together. Slower, but it can fix the cases where both keyword and vector got close but neither put the right answer at #1. The full pipeline is usually: hybrid retrieve → top 20 → rerank → top 5 → into prompt.
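The pipeline shape can be sketched as one function with pluggable stages. Everything named here is hypothetical: the toy retrievers, the three-document corpus, and especially `toy_rerank_score`, which uses word overlap only as a placeholder for a real reranker model.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over lists of documents, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve_for_prompt(query, keyword_search, vector_search, rerank_score,
                        candidates=20, final=5):
    # Stage 1: run both retrievers and fuse their rankings.
    fused = rrf([keyword_search(query), vector_search(query)])
    # Stage 2: keep a short candidate list for the slower reranker.
    shortlist = fused[:candidates]
    # Stage 3: re-score each candidate against the query, then cut again.
    reranked = sorted(shortlist, key=lambda doc: rerank_score(query, doc),
                      reverse=True)
    return reranked[:final]

# Toy stand-ins so the sketch runs end to end (all assumptions).
corpus = [
    "refund policy for damaged items",
    "how to end membership and disable auto-renewal",
    "shipping times for international orders",
]

def toy_keyword_search(query):
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.split())]

def toy_vector_search(query):
    # Hard-coded order standing in for an embedding-based ranking.
    return [corpus[1], corpus[0], corpus[2]]

def toy_rerank_score(query, doc):
    # Placeholder: a real reranker would be a model reading both texts.
    return len(set(query.lower().split()) & set(doc.split()))

top = retrieve_for_prompt("how do I end my membership", toy_keyword_search,
                          toy_vector_search, toy_rerank_score,
                          candidates=20, final=2)
print(top)
```

Keeping the stages as arguments means either retriever or the reranker can be swapped without touching the fusion step, which only ever sees ranked lists.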
Vector search misses order IDs and error codes because rare alphanumeric identifiers are poorly represented in embedding space: the model maps the query to semantically similar "shipping / orders" chunks rather than the document that contains the exact token.

Keyword search misses paraphrases. If the user says "cancel subscription" but the policy doc uses phrases like "end membership" and "disable auto-renewal", the meaning is the same but the words differ. Keyword needs literal overlap; embeddings connect the paraphrases.