Comment by adastra22
12 days ago
Again, memory bandwidth is pretty much all that matters here. During inference or training, the CUDA cores of retail GPUs sit at something like 15% utilization.
Not for prompt processing. Current Macs are really not great at long contexts.
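The distinction the two comments are drawing can be sketched with a back-of-envelope roofline estimate: decode streams every weight once per generated token (memory-bound), while prefill processes prompt tokens in parallel and is limited by FLOPs (compute-bound). All hardware numbers below are illustrative assumptions, not measured specs of any real machine.

```python
# Rough roofline sketch: why decode is bandwidth-bound and prefill
# (prompt processing) is compute-bound. All figures are assumptions.

def decode_tokens_per_s(model_bytes: float, mem_bw_gbs: float) -> float:
    """Decode reads every weight once per token, so throughput is
    roughly memory bandwidth / model size."""
    return mem_bw_gbs * 1e9 / model_bytes

def prefill_tokens_per_s(n_params: float, flops: float) -> float:
    """Prefill costs ~2 * params FLOPs per prompt token; tokens are
    batched, so compute, not bandwidth, is the limit."""
    return flops / (2 * n_params)

# Hypothetical 8B-parameter model quantized to 8-bit (~8 GB of weights).
model_bytes = 8e9
params = 8e9

# Assumed specs: a Mac with high unified-memory bandwidth but modest
# compute, vs a retail GPU with more of both, especially compute.
mac_bw, mac_flops = 400.0, 15e12    # ~400 GB/s, ~15 TFLOP/s (assumed)
gpu_bw, gpu_flops = 1000.0, 80e12   # ~1000 GB/s, ~80 TFLOP/s (assumed)

print(f"Mac decode:  ~{decode_tokens_per_s(model_bytes, mac_bw):.0f} tok/s")
print(f"GPU decode:  ~{decode_tokens_per_s(model_bytes, gpu_bw):.0f} tok/s")
print(f"Mac prefill: ~{prefill_tokens_per_s(params, mac_flops):.0f} tok/s")
print(f"GPU prefill: ~{prefill_tokens_per_s(params, gpu_flops):.0f} tok/s")
```

With these assumed numbers the decode gap is only ~2.5x (bandwidth ratio), but the prefill gap is over 5x (compute ratio), which is why long prompts hurt Macs far more than token generation does.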