Comment by turnsout

9 months ago

I believe mlx will allow you to run the models marginally faster (per a recent blog post by @simonw)

Yeah, you don't necessarily need it, but it's optimized for Apple Silicon, and in my experience it feels like it gives slightly better performance than GGUFs. I really need to formally measure that so I'm not just running on vibes!
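
For what it's worth, a formal measurement could be as simple as the rough sketch below. It assumes you write one small wrapper per backend (one around MLX, one around whatever GGUF runtime you use) that takes a prompt and returns the number of tokens it generated. The `generate` callable, `tokens_per_second`, and `summarize` are hypothetical names for illustration, not part of any library.

```python
import statistics
import time
from typing import Callable, List, Tuple


def tokens_per_second(generate: Callable[[str], int], prompt: str,
                      runs: int = 5, warmup: int = 1) -> List[float]:
    """Time a backend-agnostic generate() callable.

    `generate` is a hypothetical wrapper you supply per backend; it must
    return the number of tokens it produced for `prompt`.
    """
    for _ in range(warmup):
        generate(prompt)  # discard warm-up runs (model load, cache fill)

    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return rates


def summarize(rates: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the measured rates."""
    spread = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return statistics.mean(rates), spread
```

Run it with the same model, quantization level, prompt, and max token count on both backends, then compare the means and the spread. If the difference is smaller than the run-to-run variation, it's probably still vibes.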