Comment by aziis98

1 day ago

> I was able to get I think 20 or 30 tokens per second on CPU (DDR4 ram) alone

What are you using for inference? I have a recent Intel laptop with 32 GB of DDR5, and I am getting at most 25 tps with the llama.cpp Vulkan backend (that is the fastest; I also tried SYCL, but it is a bit slower).

OK, I double-checked: I get 21-22 tps with lmstudio-community/LFM2-24B-A2B-Q4_K_M.gguf running under LM Studio on my i5-12400 with 2x32 GB sticks of DDR4-3200. This is with a small context (just "Write me a poem about a language model named Liquid" in `lms chat`):

    Prediction Stats:
      Stop Reason: eosFound
      Tokens/Second: 21.10
      Time to First Token: 1.827s
      Prompt Tokens: 42
      Predicted Tokens: 187
      Total Tokens: 229
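As a back-of-envelope sanity check on those stats, here is a small sketch of how the reported numbers relate. It assumes (hedged, not confirmed by LM Studio's docs) that "Tokens/Second" measures only the generation of the predicted tokens, excluding the time to first token:

```python
# Rough check of the LM Studio prediction stats above.
# Assumption: Tokens/Second covers generation only, not the TTFT phase.
ttft = 1.827        # Time to First Token, seconds
tps = 21.10         # reported generation speed, tokens/second
predicted = 187     # predicted (generated) tokens

gen_time = predicted / tps       # time spent generating the reply
total_time = ttft + gen_time     # approximate wall-clock time for the turn

print(f"generation time ~ {gen_time:.2f}s")   # about 8.86s
print(f"total wall time ~ {total_time:.2f}s") # about 10.69s
```

So the whole short reply took roughly 10-11 seconds of wall-clock time, most of it generation rather than prompt processing.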