Comment by urig

11 hours ago

Lots and lots of CPUs pooled, plus faster, more power-efficient RAM accessible to both the GPU and the CPU, IIUC.

But at what stage are we asking for that RAM? If it's the inference stage, doesn't that traffic belong to the GPU<>memory path, which has nothing to do with the CPU?

I did see they have unified CPU/GPU memory, which may reduce the cost of host/device transfers, especially now that we're probably moving more and more memory around with longer-context tasks.
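
To put rough numbers on the longer-context point, here is a back-of-envelope sketch of how a transformer KV cache grows with sequence length, and how long one full copy would take over a separate host/device link. Everything in it is an assumption for illustration, not a figure for any real chip or model: the model shape (32 layers, 8 KV heads, head dimension 128, fp16) and the 32 GB/s link bandwidth are hypothetical.

```python
def kv_cache_bytes(seq_len, layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV cache size for one sequence.

    The leading factor of 2 covers keys plus values. All default
    dimensions are hypothetical, chosen only for illustration.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len


def copy_seconds(num_bytes, link_bytes_per_s=32e9):
    """Time to move num_bytes across a link of the given (assumed) bandwidth."""
    return num_bytes / link_bytes_per_s


for seq_len in (8_192, 131_072):
    size = kv_cache_bytes(seq_len)
    print(f"{seq_len:>7} tokens: {size / 2**30:.1f} GiB, "
          f"~{copy_seconds(size):.2f} s per full copy")
```

Under these assumptions the cache grows linearly with context, from 1 GiB at 8k tokens to 16 GiB at 128k. That is the kind of data that gets expensive to shuttle between separate memory pools and that a unified pool could, in principle, leave in place.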