Comment by kgeist

10 months ago

900 ms sounds like a lot for just 10,000 documents. How many chunks are there per document? Maybe Pinecone's 820 ms includes network latency, plus they have to serve other users?

In Go, I once implemented a naive brute-force cosine search (linear scan in memory), and for 1 million 350-dimensional vectors, I got results in under 1 second too IIRC.

I ended up just setting up OpenSearch, which gives you hybrid semantic + full-text search out of the box (BM25 + kNN). In my tests, it gave better results than semantic search alone, something like 15% better retrieval.
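For intuition on what hybrid scoring does, here is a rough sketch of one common way to fuse the two result lists: min-max normalize each score set, then take a weighted mean. This is an illustration only, not OpenSearch's actual code. `Scored`, `minMax`, `Fuse`, and the `bm25Weight` parameter are hypothetical names, and treating a document that is missing from one list as scoring 0 there is an assumption of the sketch.

```go
package main

import "sort"

// Scored is a document ID with its fused score.
// (Hypothetical type, introduced only for this sketch.)
type Scored struct {
	ID    string
	Score float64
}

// minMax rescales raw scores to [0, 1].
// If all scores are equal, every document maps to 1.
func minMax(scores map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(scores))
	if len(scores) == 0 {
		return out
	}
	first := true
	var lo, hi float64
	for _, s := range scores {
		if first {
			lo, hi = s, s
			first = false
			continue
		}
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
	}
	for id, s := range scores {
		if hi == lo {
			out[id] = 1
		} else {
			out[id] = (s - lo) / (hi - lo)
		}
	}
	return out
}

// Fuse combines BM25 and kNN scores into one ranking, best first.
// bm25Weight is the weight of the lexical side; the semantic side gets 1 - bm25Weight.
// Assumption: a document absent from one list contributes 0 for that side.
func Fuse(bm25, knn map[string]float64, bm25Weight float64) []Scored {
	fused := make(map[string]float64)
	for id, s := range minMax(bm25) {
		fused[id] += bm25Weight * s
	}
	for id, s := range minMax(knn) {
		fused[id] += (1 - bm25Weight) * s
	}
	out := make([]Scored, 0, len(fused))
	for id, s := range fused {
		out = append(out, Scored{ID: id, Score: s})
	}
	// Sort by descending score, breaking ties by ID for a deterministic order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID
	})
	return out
}
```

The normalization step matters because BM25 scores are unbounded while cosine similarities sit in a narrow range, so adding the raw numbers would let one side dominate.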