
Comment by 1999-03-31, 7 hours ago

1B vectors is nothing. You don’t need to index them. You can hold them in VRAM on a single node and run queries with perfect accuracy in milliseconds.

I guess for 2D vectors that would work?

For 1024 dimensions, even with 8-bit quantization, you are looking at a terabyte of data. Let's make it binary vectors; it is still 128 GB of VRAM.
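
Quick back-of-the-envelope in Python (a sketch; the 1B count, 1024 dimensions, and precisions are just the assumptions above):

    # Memory footprint of 1B embedding vectors at the precisions discussed above.
    N = 1_000_000_000       # 1B vectors (assumed above)
    D = 1024                # 1024 dimensions (assumed above)

    int8_bytes   = N * D          # 1 byte per scalar with 8-bit quantization
    binary_bytes = N * D // 8     # 1 bit per scalar with binary vectors

    print(f"int8:   {int8_bytes / 1e12:.2f} TB")    # ~1.02 TB
    print(f"binary: {binary_bytes / 1e9:.0f} GB")   # 128 GB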

WAT?

1B x 4096 = 4T scalars.

That doesn't fit in anyone's video ram.

  • Well, we have AI GPUs now, so you could do it.

    Each MI325X has 256 GB of HBM, so you would need ~32 of them at 2 bytes per scalar.
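
    Spelling that estimate out in Python (a sketch; 2 bytes per scalar and 256 GB of HBM per GPU are the assumptions above):

        # GPU count needed to hold 1B x 4096 scalars entirely in HBM.
        scalars     = 1_000_000_000 * 4096   # 4T scalars
        bytes_total = scalars * 2            # 2 bytes per scalar (e.g. fp16)
        hbm_per_gpu = 256e9                  # assumed 256 GB of HBM per MI325X

        print(f"{bytes_total / 1e12:.1f} TB total")              # ~8.2 TB
        print(f"~{bytes_total / hbm_per_gpu:.0f} GPUs needed")   # ~32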

Show your math lol

  • I assume by "node" OP meant something like a DGX node. Which, yeah, that would work, but not everyone (no one?) wants to buy a $500k system to do vector search.

    B200 spec:

    * 8 TB/s HBM bandwidth

    * 10 PetaOPs assuming int8

    * 186 GB of VRAM

    If we work with 512-dimensional int8 embeddings, then we need 512 GB of VRAM to hold them, so assuming we have an 8x B200 node (~$500k++), we can easily fit them (125M vectors per GPU).

    It takes about 1,000 OPs to do the dot product between two 512-dim vectors, so a full scan needs 1000 * 1B = 1 TeraOP; spread over 8 GPUs, that's 125 GigaOPs per GPU, i.e. a fraction of a ms at 10 PetaOPs.

    Now the bottleneck will be data movement from HBM to the chip: since we have 125M vectors per GPU, aka 64 GB, we can stream them in ~8 ms at 8 TB/s (sketched below).

    Here you go, the most expensive vector search in history, giving you the same performance as a regular CPU-based vectorDB for only 1000x the price.
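
    For completeness, the arithmetic above as a Python sketch (the B200 figures and 512-dim int8 embeddings are the stated assumptions, not measurements):

        # Brute-force scan of 1B vectors on an 8x B200 node, using the figures assumed above.
        N, D      = 1_000_000_000, 512   # 1B vectors, 512-dim int8 (1 byte per scalar)
        gpus      = 8
        ops_per_s = 10e15                # ~10 PetaOPs int8 per GPU (assumed)
        hbm_bw    = 8e12                 # ~8 TB/s HBM bandwidth per GPU (assumed)

        vecs_per_gpu  = N // gpus              # 125M vectors per GPU
        bytes_per_gpu = vecs_per_gpu * D       # 64 GB of embeddings per GPU
        ops_per_gpu   = vecs_per_gpu * 2 * D   # ~1000 OPs per 512-dim dot product

        print(f"compute:   {ops_per_gpu / ops_per_s * 1e3:.3f} ms")   # ~0.013 ms per GPU
        print(f"bandwidth: {bytes_per_gpu / hbm_bw * 1e3:.1f} ms")    # ~8 ms, the real bottleneck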