Comment by n_u

1 year ago

Ah interesting. Is your keyword-document map (aka term dict) too big to keep in memory permanently? My understanding is that at Google they just keep it in memory on every replica.

Edit: I should specify they shard the corpus by document so there isn't a replica with the entire term dict on it.

4 comments

marginalia_nu 1 year ago

It could plausibly fit in RAM; it's only ~100 GB in total. We'll see. I'll probably keep it mmap'ed at first to see what happens. It isn't the target of very many queries (relatively speaking) at any rate, so either way is probably fine.
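
For readers unfamiliar with the mmap approach mentioned here, a minimal sketch of what keeping a term dict memory-mapped can look like. The file layout (sorted fixed-size records of `term_id` and `postings_offset`) and the helper names `write_term_dict`, `lookup`, and `demo` are assumptions made up for illustration, not the actual on-disk format discussed in this thread.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: sorted fixed-size records of
# (term_id: uint64, postings_offset: uint64), little-endian.
RECORD = struct.Struct("<QQ")


def write_term_dict(path, entries):
    """Write (term_id, offset) pairs to disk, sorted by term_id."""
    with open(path, "wb") as f:
        for term_id, offset in sorted(entries):
            f.write(RECORD.pack(term_id, offset))


def lookup(mm, term_id):
    """Binary search the mapped file; return the offset or None."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, offset = RECORD.unpack_from(mm, mid * RECORD.size)
        if key == term_id:
            return offset
        if key < term_id:
            lo = mid + 1
        else:
            hi = mid
    return None


def demo():
    """Build a tiny term dict, map it read-only, and look up two ids."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_term_dict(path, [(42, 1000), (7, 0), (99, 2048)])
        with open(path, "rb") as f:
            # The OS pages records in on demand; nothing is read eagerly.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                return lookup(mm, 42), lookup(mm, 8)
    finally:
        os.remove(path)
```

With a mapping like this, the operating system's page cache decides which parts of the file stay resident, so a rarely queried structure costs little RAM while a hot one ends up effectively in memory anyway.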

n_u 1 year ago
>It isn't the target of very many queries (relatively speaking)
Wow, why is that? Do you use a vector index primarily?
- marginalia_nu 1 year ago
  
  No, I mean that every query maps its keywords to trees of documents, and then there are dozens if not hundreds of lookups in those trees in order to intersect the document lists.
  
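
To illustrate the ratio described in the last reply, here is a rough sketch that counts term dict lookups versus probes into the document lists for a single query. It is only an assumption-laden toy: plain sorted Python lists with `bisect` stand in for the "trees of documents", and the names `intersect`, `search`, and the counter keys are hypothetical.

```python
from bisect import bisect_left


def intersect(postings, counter):
    """Intersect sorted doc-id lists by probing the longer lists
    with each candidate taken from the shortest one."""
    if not postings:
        return []
    lists = sorted(postings, key=len)
    result = []
    for doc_id in lists[0]:
        for other in lists[1:]:
            counter["doc_probes"] += 1
            i = bisect_left(other, doc_id)
            if i == len(other) or other[i] != doc_id:
                break  # candidate missing from this list, drop it
        else:
            result.append(doc_id)  # present in every list
    return result


def search(term_dict, keywords):
    """One term dict lookup per keyword, then many document-list probes."""
    counter = {"dict_lookups": 0, "doc_probes": 0}
    postings = []
    for kw in keywords:
        counter["dict_lookups"] += 1
        postings.append(term_dict.get(kw, []))
    return intersect(postings, counter), counter
```

Even on toy lists of four to six documents, a three-keyword query makes more document-list probes than term dict lookups, and the gap grows with the length of the lists, which is consistent with the point that the term dict is not where most of the lookups go.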