Comment by ionwake
1 month ago
I have an M4 MacBook Air with 32 GB.
These are my current results for my models:
┌──────────────────────┬───────────┬─────────────┐
│ Model                │ Size      │ Tokens/sec  │
├──────────────────────┼───────────┼─────────────┤
│ gemma-4-e4b-it-mlx   │ ~4B (MLX) │ ~10.5 tok/s │
├──────────────────────┼───────────┼─────────────┤
│ qwen3-8b-uncensor-v2 │ 8B        │ ~6.3 tok/s  │
├──────────────────────┼───────────┼─────────────┤
│ qwen3-14b-uncensored │ 14B       │ ~3.5 tok/s  │
└──────────────────────┴───────────┴─────────────┘
I seem to be doing ok with the Gemma model for file parsing / handling.
10 tok/s or less is unusable. You'd be faster writing the code yourself.
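For a rough sense of what those rates mean in practice, here is a quick back-of-the-envelope sketch. It uses the approximate tok/s figures from the table above; the 500-token response length is purely an assumption for illustration, and the calculation ignores prompt processing time.

```python
# Approximate decode rates (tokens per second) taken from the table above.
RATES_TOK_PER_SEC = {
    "gemma-4-e4b-it-mlx": 10.5,
    "qwen3-8b-uncensor-v2": 6.3,
    "qwen3-14b-uncensored": 3.5,
}

# Assumption: 500 output tokens, a hypothetical size for a modest code reply.
RESPONSE_TOKENS = 500


def seconds_to_generate(tokens: int, tok_per_sec: float) -> float:
    """Time to emit `tokens` output tokens at a steady decode rate."""
    return tokens / tok_per_sec


for model, rate in RATES_TOK_PER_SEC.items():
    print(f"{model}: {seconds_to_generate(RESPONSE_TOKENS, rate):.0f} s")
```

Under that assumption the wait works out to roughly 48 s, 79 s and 143 s respectively, so whether it counts as "unusable" depends a lot on how long your typical responses are.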