But at what stage are we asking for that RAM? If it's the inference stage, doesn't that memory belong to the GPU (device memory), which has nothing to do with the CPU?
I did see they have unified CPU/GPU memory, which may reduce the cost of host/device transfers, especially now that we're probably moving more and more memory around with longer-context tasks.
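To put a rough number on the "more memory with longer context" part, here is a back-of-the-envelope sketch of how the KV cache grows with context length at inference time. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) is a made-up example, not the spec of any particular model or machine, and `kv_cache_bytes` is just a helper name I picked for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for a decoder-only transformer.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for seq_len in (4096, 32768, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, and that is on top of the model weights. All of it has to sit in whatever memory the GPU can address, which is where a unified pool might matter.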