Comment by oreoftw
2 months ago
Most likely he was referring to the fact that you need plenty of fast GPU memory to hold the model, and GPU cards have it.
There is nothing magical about GPU memory, though. It's just faster. But people have been doing CPU inference since the first llama code came out.