Embeddings for search ranking: replacing keyword search
Replaced the MySQL FULLTEXT search on our documentation site with vector similarity search using embeddings. Results improved significantly for natural language queries.
Architecture: user query gets embedded, similarity search in Qdrant returns top 20 doc chunks, those are re-ranked by BM25 score before display.
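The re-ranking step above can be sketched in pure Python. This is a minimal illustration, not the poster's actual code: it assumes the top-20 chunks have already come back from Qdrant as plain strings, tokenizes naively on whitespace, and implements BM25 inline with the common defaults k1=1.5, b=0.75 (computing document frequencies over just the retrieved set, not the whole corpus).

```python
import math
from collections import Counter

def bm25_rerank(query, chunks, k1=1.5, b=0.75):
    """Re-rank retrieved chunks by BM25 score against the query.

    chunks: list of chunk strings (e.g. the top-20 vector hits).
    Returns the chunks sorted by descending BM25 score.
    """
    q_terms = query.lower().split()
    docs = [c.lower().split() for c in chunks]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency per query term, over the retrieved set only
    df = {t: sum(1 for d in docs if t in d) for t in q_terms}

    def score(doc):
        tf = Counter(doc)
        s = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        return s

    ranked = sorted(zip(chunks, docs), key=lambda p: score(p[1]), reverse=True)
    return [c for c, _ in ranked]
```

In production you would more likely use a library such as rank_bm25 (or Qdrant's own sparse-vector support) rather than hand-rolling the scoring, but the shape of the step is the same: vector search narrows the corpus, BM25 orders the survivors.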
Hybrid search (vector + keyword) outperforms either alone for most search tasks. The BM25 re-ranking step is exactly the right approach. Vespa and Weaviate do this natively, but your manual approach works too.
For the embedding model: how does text-embedding-3-small compare to larger models for your domain? We found that for technical documentation the larger model improved precision meaningfully.
Tested 3-small vs 3-large on 200 sample queries: 3-large gave better results for ambiguous queries but was 6x slower and 5x more expensive per query. For our volume (100k searches/day) 3-small was the right tradeoff.
Consider caching query embeddings. Users often search for similar things. Normalize the query (lowercase, trim) and cache the embedding with a short TTL. Eliminates redundant API calls for popular queries.
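A minimal sketch of that caching idea, with assumed names (`QueryEmbeddingCache`, `embed_fn`) since the thread doesn't show code: normalize the query, check a dict keyed on the normalized string, and only call the embedding API on a miss or an expired entry.

```python
import time

class QueryEmbeddingCache:
    """TTL cache for query embeddings, keyed on a normalized query string.

    embed_fn is whatever callable produces an embedding for a string
    (e.g. a wrapper around the embeddings API) -- an assumption here.
    """

    def __init__(self, embed_fn, ttl_seconds=300):
        self.embed_fn = embed_fn
        self.ttl = ttl_seconds
        self._store = {}  # normalized query -> (expires_at, vector)

    @staticmethod
    def _normalize(query):
        # lowercase + trim as described above; also collapse inner whitespace
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._normalize(query)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # cache hit: no API call
        vec = self.embed_fn(key)
        self._store[key] = (now + self.ttl, vec)
        return vec
```

With this, "  How To Install  " and "how to install" share one cached embedding, so popular queries cost one API call per TTL window. An in-process dict is fine for a single server; behind a load balancer you'd want the same scheme in Redis or similar.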
How do you handle queries in multiple languages? Does a single embedding model work across Russian and English without degradation?
text-embedding-3-small handles multilingual text reasonably well: Russian documents are matched by Russian queries, and the same holds for English. Cross-language retrieval (Russian query against an English doc) is hit or miss. If you need true multilingual search, BAAI/bge-m3 is the current standard.