I assume by "node" OP meant something like a DGX node.
Which, yeah, that would work, but not everyone (no one?) wants to buy a $500k system to do vector search.
B200 spec:
* 8TB/sec HBM bandwidth
* 10 PetaOPs assuming int8.
* 186GB of VRAM.
Assume 1B vectors. With 512-dimensional int8 embeddings (512 bytes each), we need 512GB of VRAM to hold them, so an 8xB200 node (~$500k+) can easily hold them (125M vectors per GPU).
It takes about 1000 OPs to do the dot product between two vectors (512 multiplies plus roughly as many adds), so one query against all vectors is 1000 * 1B = 1 TeraOP. Spread over 8 GPUs, that's 125 GigaOPs per GPU, or about 12.5 µs at 10 PetaOPs/sec, so a small fraction of a ms.
Now the bottleneck will be data movement from HBM to the compute units: we have 125M vectors per GPU, aka 64GB, and at 8TB/sec we can stream them in ~8 ms.
Here you go, the most expensive vector search in history, giving you the same performance as a regular CPU-based vectorDB for only 1000x the price.
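For anyone who wants to re-run the arithmetic with their own numbers, here is a rough back-of-envelope sketch. Every constant is an assumption taken from the comment above (1B vectors, the quoted B200 figures, ~1000 OPs per dot product), not a measured value, and the function name `estimate` is made up for illustration.

```python
# Back-of-envelope estimate for brute-force search over int8 embeddings.
# All inputs are the thread's assumptions, not benchmarks.
N_VECTORS = 1_000_000_000   # assumed corpus size (1B vectors)
DIM = 512                   # int8 dimensions, 1 byte each
N_GPUS = 8                  # one 8xB200 node
HBM_BYTES_PER_S = 8e12      # 8 TB/sec per GPU, as quoted
INT8_OPS_PER_S = 10e15      # 10 PetaOPs/sec per GPU, as quoted
OPS_PER_DOT = 1000          # ~512 multiplies + ~512 adds, rounded


def estimate(n_vectors=N_VECTORS, dim=DIM, n_gpus=N_GPUS):
    """Return per-query compute and memory-streaming time for a full scan."""
    vectors_per_gpu = n_vectors / n_gpus
    bytes_per_gpu = vectors_per_gpu * dim  # 1 byte per int8 component
    compute_ms = vectors_per_gpu * OPS_PER_DOT / INT8_OPS_PER_S * 1e3
    memory_ms = bytes_per_gpu / HBM_BYTES_PER_S * 1e3
    return {
        "total_gb": n_vectors * dim / 1e9,
        "gb_per_gpu": bytes_per_gpu / 1e9,
        "compute_ms": compute_ms,
        "memory_ms": memory_ms,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:g}")
```

The point it makes is the same as above: the scan is memory-bound, with streaming time several hundred times larger than compute time under these assumptions.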
Thanks for doing the math! I suppose if we're being charitable, in practice we would of course index and only partially offload to VRAM (FAISS does that with IVF/PQ and similar).
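To make the IVF idea concrete, here is a toy sketch in plain Python. It is not FAISS and does not use its API: the helper names `build_ivf` and `search_ivf` are hypothetical, centroids are just randomly sampled vectors rather than trained with k-means, and product quantization and the GPU offload are left out entirely. It only shows why an inverted-file index lets a query scan a fraction of the vectors.

```python
import random


def dot(a, b):
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_ivf(vectors, n_lists, seed=0):
    """Toy inverted file: sample centroids, bucket each vector under
    the centroid it has the highest dot product with."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        best = max(range(n_lists), key=lambda c: dot(v, centroids[c]))
        lists[best].append(idx)
    return centroids, lists


def search_ivf(query, vectors, centroids, lists, n_probe, k):
    """Scan only the n_probe buckets whose centroids best match the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: dot(query, centroids[c]),
                    reverse=True)
    candidates = [i for c in ranked[:n_probe] for i in lists[c]]
    candidates.sort(key=lambda i: dot(query, vectors[i]), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    centroids, lists = build_ivf(data, n_lists=20)
    q = [rng.gauss(0, 1) for _ in range(16)]
    print(search_ivf(q, data, centroids, lists, n_probe=3, k=5))
```

With `n_probe` equal to the number of lists this degenerates to the exact brute-force scan; smaller values trade recall for fewer bytes read per query, which is the knob that matters when the scan is bandwidth-bound.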