Comment by littlestymaar

10 months ago

You can still offload most of the model to RAM and use the GPU for compute, but it's obviously much slower than it would be if everything were in GPU memory.

See ktransformers for an example of this approach: https://www.reddit.com/r/LocalLLaMA/comments/1jpi0n9/ktransf...
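
As a rough illustration, llama-cpp-python exposes this kind of partial offloading via n_gpu_layers (a minimal sketch; the model path and layer count below are placeholders, and it assumes a GPU-enabled build of llama.cpp):

    from llama_cpp import Llama

    # Load a GGUF model, keeping only the first 20 transformer
    # layers in VRAM; the remaining layers stay in system RAM
    # and run on the CPU.
    llm = Llama(
        model_path="./models/example-70b.Q4_K_M.gguf",  # hypothetical file
        n_gpu_layers=20,
    )

    out = llm("Q: What is partial offloading? A:", max_tokens=64)
    print(out["choices"][0]["text"])

Setting n_gpu_layers higher moves more of the model into VRAM (faster) until you run out; setting it to 0 runs everything on the CPU.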