Comment by kgeist
13 hours ago
Are there vector DBs with 100B vectors in production which work well? There was a paper showing a 12% loss in accuracy at just 1 million vectors. Maybe some kind of logical sharding is another option, to improve both accuracy and speed.
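Roughly the kind of "logical sharding" I mean, as a toy sketch (the tenant key, sizes, and brute-force cosine search are all illustrative, not how any particular DB does it):

```python
# Minimal sketch of logical sharding: one small index per logical key
# (e.g. tenant or namespace) instead of a single global index, so each
# query only searches the shard it belongs to.
import numpy as np

class ShardedIndex:
    def __init__(self):
        self.shards = {}  # key -> (ids, normalized vector matrix)

    def add(self, key, ids, vectors):
        vecs = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        self.shards[key] = (np.asarray(ids), vecs)

    def search(self, key, query, k=10):
        ids, vecs = self.shards[key]
        q = query / np.linalg.norm(query)
        scores = vecs @ q                      # cosine similarity
        top = np.argsort(-scores)[:k]
        return ids[top], scores[top]

# Each tenant's vectors live in their own shard, so recall/latency depend on
# the shard size (say, a few million vectors) rather than the full corpus.
index = ShardedIndex()
index.add("tenant_42", ids=[1, 2, 3], vectors=np.random.randn(3, 768).astype("float32"))
print(index.search("tenant_42", np.random.randn(768).astype("float32"), k=2))
```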
I don't know at these scales, but in the 1M-100M range we found that switching from out-of-the-box embeddings to fine-tuned embeddings made the compression/recall trade-off sting a lot less. We saw a 10-100X win here: comparable recall with better compression.
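For context, the trade-off I mean is measured roughly like this (a minimal sketch: binary quantization as the compression, random data, recall@10 against exact float search; none of this is anyone's actual pipeline):

```python
# Take exact float32 nearest neighbors as ground truth, then see how many of
# them the compressed (here: 1-bit-per-dimension) representation still
# retrieves. Fine-tuning is about keeping this number high at small footprints.
import numpy as np

def recall_at_k(full, queries, k=10):
    # Ground truth: brute-force cosine search over the float vectors.
    fn = full / np.linalg.norm(full, axis=1, keepdims=True)
    qn = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    truth = np.argsort(-(qn @ fn.T), axis=1)[:, :k]

    # Compressed: binarized vectors, ranked by Hamming distance.
    fb = (full > 0).astype(np.int8)
    qb = (queries > 0).astype(np.int8)
    hamming = (qb[:, None, :] != fb[None, :, :]).sum(axis=2)
    approx = np.argsort(hamming, axis=1)[:, :k]

    hits = [len(set(t) & set(a)) for t, a in zip(truth, approx)]
    return sum(hits) / (len(queries) * k)

vectors = np.random.randn(5000, 256).astype("float32")
queries = np.random.randn(50, 256).astype("float32")
print("recall@10 after binary quantization:", recall_at_k(vectors, queries))
```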
I'm not sure how that'd work with the binary quantization phase, though. For example, we use Matryoshka embeddings, and some of the bits matter way more than others, so that might be super painful.
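A toy illustration of why, assuming the Matryoshka property that leading dimensions carry most of the signal (the importance decay below is simulated, not taken from any real model):

```python
# With Matryoshka-style embeddings, prefixes of the vector are trained to
# stand alone, so uniform 1-bit-per-dimension quantization treats very
# unequal dimensions as equally important bits. Compare recall when keeping
# only the first 256 sign bits vs only the last 256.
import numpy as np

rng = np.random.default_rng(0)
dim, n = 1024, 2000

# Simulated "importance decays with dimension index".
importance = 1.0 / np.sqrt(1 + np.arange(dim))
vectors = rng.standard_normal((n, dim)) * importance
query = rng.standard_normal(dim) * importance

def binary_scores(vecs, q, dims):
    vb = vecs[:, dims] > 0
    qb = q[dims] > 0
    return (vb == qb).sum(axis=1)               # sign-agreement (inverse Hamming)

exact = np.argsort(-(vectors @ query))[:10]      # float ground truth
head = np.argsort(-binary_scores(vectors, query, np.arange(0, 256)))[:10]
tail = np.argsort(-binary_scores(vectors, query, np.arange(768, 1024)))[:10]

print("recall@10 using first 256 bits:", len(set(exact) & set(head)) / 10)
print("recall@10 using last 256 bits: ", len(set(exact) & set(tail)) / 10)
```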
So many missing details...
Different vector indexes have very different recall, and even the parameters for each one dramatically impact it.
HNSW can have very good recall even at high vector counts.
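For example, with hnswlib the usual knobs look like this (values are illustrative, not recommendations):

```python
# HNSW recall is largely a function of these parameters, not just vector count.
import hnswlib
import numpy as np

dim, n = 128, 50_000
data = np.random.randn(n, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
# M = graph degree, ef_construction = build-time beam width.
# Larger values -> better recall, more memory, slower build.
index.init_index(max_elements=n, M=32, ef_construction=200)
index.add_items(data, np.arange(n))

# ef (search-time beam width) is the usual recall/latency knob:
# small ef can be noticeably lossy, large ef gets close to exact search.
index.set_ef(200)
labels, distances = index.knn_query(data[:5], k=10)
```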
There's also the embedding model, whether you're quantizing, whether it's pure RAG vs hybrid BM25 / static word embeddings vs graph connections, whether you're reranking, etc.
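For the hybrid part, a minimal reciprocal-rank-fusion sketch, fusing a BM25 ranking with a vector ranking before any reranker (the k=60 constant and the names are just the common convention, nothing from the post):

```python
from collections import defaultdict

def rrf(ranked_lists, k=60, top_n=10):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

bm25_hits = ["doc_3", "doc_1", "doc_7", "doc_2"]     # lexical ranking
vector_hits = ["doc_1", "doc_9", "doc_3", "doc_5"]   # embedding ranking
candidates = rrf([bm25_hits, vector_hits])
# A cross-encoder reranker would then rescore `candidates` against the query.
print(candidates)
```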
The solution described in the blog post is currently in production at 100B vectors.
For what/who?
Unfortunately I'm not able to share the customer or use case :( but the metrics you see in the first charts in the post are from a production cluster.
https://turbopuffer.com/customers/cursor