Comment by meatmanek
2 days ago
This model is pretty cool if you don't have a GPU: I was able to get somewhere around 20-30 tokens per second on CPU (DDR4 RAM) alone. (I don't remember whether that was with the q4 or q8 quant.)
Otherwise, if you have a GPU with more than ~4GB of VRAM, there are better models: Gemma4 and Qwen3.6 (or Qwen3.5 if you need the smaller dense models, which haven't yet been released for 3.6) are a good place to start.
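If you want to reproduce CPU-only numbers like the ones above, llama.cpp's bundled `llama-bench` tool is the easiest way I know of. A rough sketch (the model path and `-t 6` thread count are placeholders for your own setup, and the build steps may differ slightly by version):

```sh
# Build llama.cpp with the default CPU backend
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release -j

# Report prompt-processing and generation tokens/sec for a given quant.
# -t should roughly match your physical core count.
./build/bin/llama-bench -m /path/to/LFM2-24B-A2B-Q4_K_M.gguf -t 6 -p 512 -n 128
```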
> I was able to get I think 20 or 30 tokens per second on CPU (DDR4 ram) alone
What are you using for inference? I have a recent Intel laptop with 32GB of DDR5, and I'm getting at most 25 tps with the llama.cpp Vulkan backend (that's the fastest for me; I also tried SYCL, but it's a bit slower).
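In case it matters, this is roughly how I built the backends (the flag names are from recent llama.cpp trees, which renamed the options from `LLAMA_*` to `GGML_*`; older checkouts differ):

```sh
# Vulkan backend (needs the Vulkan SDK/headers installed)
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# SYCL backend instead, using Intel's oneAPI compilers:
# cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
```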
OK, I double-checked: I get 21-22 tps with lmstudio-community/LFM2-24B-A2B-Q4_K_M.gguf running under LM Studio on my i5-12400 with 2x32GB sticks of DDR4-3200. This is with a small context (just "Write me a poem about a language model named Liquid" in `lms chat`).
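For anyone who wants to replicate this, the rough flow was something like the following; the `lms` subcommand syntax here is from memory and may not match your LM Studio version exactly (I actually grabbed the model through the GUI):

```sh
# Load the quant into LM Studio's runtime, then chat from the terminal;
# tokens/sec is reported after each response.
lms load lmstudio-community/LFM2-24B-A2B-Q4_K_M
lms chat
```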