artem_ml · 17 May 2025 12:42

Replaced the MySQL FULLTEXT search on our documentation site with vector similarity search using embeddings. Results improved significantly for natural language queries.

Architecture: the user query is embedded, a similarity search in Qdrant returns the top 20 doc chunks, and those are re-ranked by BM25 score before display.
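A minimal sketch of the re-ranking step, assuming the chunks come back from Qdrant as plain strings (the function name and the inline BM25 implementation are illustrative, not the actual code):

```python
import math
from collections import Counter

def bm25_rerank(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Re-rank retrieved chunks by Okapi BM25 score against the query."""
    docs = [c.lower().split() for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    q_terms = query.lower().split()
    # document frequency of each query term across the retrieved set
    df = {t: sum(1 for d in docs if t in d) for t in q_terms}

    def score(doc: list[str]) -> float:
        tf = Counter(doc)
        s = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            f = tf[t]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        return s

    ranked = sorted(zip(chunks, docs), key=lambda p: score(p[1]), reverse=True)
    return [c for c, _ in ranked]
```

Note that IDF here is computed only over the 20 retrieved chunks, not the whole corpus, which is usually fine for re-ranking but shifts scores compared to a corpus-wide BM25.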

Replies (6)
alex_petrov · 17 May 2025 12:51

Hybrid search (vector + keyword) outperforms either alone for most search tasks. The BM25 re-ranking step is exactly the right approach. Vespa and Weaviate do this natively, but your manual approach works too.
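Besides score-based re-ranking, a common way to fuse the vector ranking with a keyword ranking is reciprocal rank fusion (RRF), which only looks at ranks and so sidesteps the problem of combining incomparable score scales. A sketch (function name is illustrative):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each document gets sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the commonly used constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Vespa and Weaviate expose fusion like this natively; with a manual pipeline you would pass in the Qdrant ranking and the BM25 ranking.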

0
dmitry_kv · 17 May 2025 13:10

For the embedding model: how does text-embedding-3-small compare to larger models for your domain? We found that for technical documentation the larger model improved precision meaningfully.

0
artem_ml · 17 May 2025 14:42

Tested 3-small vs 3-large on 200 sample queries: 3-large gave better results for ambiguous queries but was 6x slower and 5x more expensive per query. For our volume (100k searches/day) 3-small was the right tradeoff.

0
vova · 17 May 2025 15:46

Consider caching query embeddings. Users often search for similar things. Normalize the query (lowercase, trim) and cache the embedding with a short TTL. Eliminates redundant API calls for popular queries.
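The normalize-then-cache idea could look roughly like this (class name, `embed_fn` hook, and the 5-minute TTL are assumptions for illustration):

```python
import time

class EmbeddingCache:
    """Cache query embeddings keyed on the normalized query, with a TTL."""

    def __init__(self, embed_fn, ttl_seconds: float = 300.0):
        self.embed_fn = embed_fn  # e.g. a call to the embeddings API
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, list[float]]] = {}

    @staticmethod
    def _normalize(query: str) -> str:
        # lowercase, trim, and collapse internal whitespace
        return " ".join(query.lower().split())

    def get(self, query: str) -> list[float]:
        key = self._normalize(query)
        now = time.time()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no API call
        vec = self.embed_fn(key)
        self._store[key] = (now, vec)
        return vec
```

In production you would likely back this with Redis rather than an in-process dict so the cache is shared across workers, but the normalization and TTL logic is the same.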

0
katedev · 17 May 2025 15:58

How do you handle queries in multiple languages? Does a single embedding model work across Russian and English without degradation?

0
artem_ml · 17 May 2025 17:58

text-embedding-3-small handles multilingual queries reasonably well. Russian documents matched Russian queries, and likewise for English. Cross-language retrieval (Russian query, English doc) is hit or miss. If you need true multilingual search, BAAI/bge-m3 is the current standard.

0