Comment by zettabomb
1 day ago
llama.cpp has built-in support for doing this, and it works quite well. Lots of people running LLMs on limited local hardware use it.
llama.cpp has support for running some or all of the layers on the CPU. It does not swap them into the GPU as needed.
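For anyone wanting to try it: the split is controlled with the -ngl / --n-gpu-layers flag on the llama.cpp CLI, which offloads that many layers to the GPU and leaves the rest on the CPU. The model path and layer count below are just placeholders, not a recommendation:

    # offload 20 layers to the GPU, keep the remaining layers on the CPU
    ./llama-cli -m ./models/your-model.gguf -ngl 20 -p "Hello"

Setting -ngl 0 keeps everything on the CPU; setting it higher than the model's layer count offloads the whole model, memory permitting.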