Comment by whackernews
25 days ago
Oh does llama.cpp use MLX or whatever? I had this question, wonder if you know? A search suggests it doesn’t but I don’t really understand.
>Oh does llama.cpp use MLX or whatever?
No. It runs on macOS, but it uses Metal rather than MLX.
ANE-powered inference (at least for prefill, which is a key bottleneck on pre-M5 platforms) is also in the works, per https://github.com/ggml-org/llama.cpp/issues/10453#issuecomm...
Is that better or worse?
Depends.
MLX is often faster because it integrates more tightly with Apple hardware. On the other hand, GGUF is a far more popular format, so there's a wider variety of programs and models available for it.
So it's kinda like having a very specific diet that you swear is better for you, but you can only order food from a few restaurants.
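As an aside, if you're ever unsure whether a model file is actually GGUF, the format is easy to identify: GGUF files start with the 4-byte magic `b"GGUF"`. A minimal sketch in Python (the helper name `is_gguf` is just illustrative):

```python
def is_gguf(path: str) -> bool:
    """Check whether a file starts with the GGUF magic bytes.

    GGUF files begin with the 4 ASCII bytes "GGUF"; anything else
    (e.g. an old GGML file or a safetensors file) will not match.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

This only checks the magic, not the version or metadata, but it's enough to tell GGUF apart from other checkpoint formats at a glance.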
llama.cpp uses GGML, which uses Metal directly.