Comment by oreoftw
2 months ago
Most likely he was referring to the fact that you need plenty of fast GPU memory to hold the model, and GPU cards have it.
There is nothing magical about GPU memory, though. It's just faster. But people have been doing CPU inference since the first llama code came out.