Comment by ionwake

1 month ago

I have an M4 MacBook Air with 32 GB.

These are my current results for my models:

  ┌──────────────────────┬───────────┬─────────────┐
  │        Model         │   Size    │ Tokens/sec  │
  ├──────────────────────┼───────────┼─────────────┤
  │ gemma-4-e4b-it-mlx   │ ~4B (MLX) │ ~10.5 tok/s │
  ├──────────────────────┼───────────┼─────────────┤
  │ qwen3-8b-uncensor-v2 │ 8B        │ ~6.3 tok/s  │
  ├──────────────────────┼───────────┼─────────────┤
  │ qwen3-14b-uncensored │ 14B       │ ~3.5 tok/s  │
  └──────────────────────┴───────────┴─────────────┘

I seem to be doing OK with the Gemma model for file parsing / handling.


ActorNightly 1 month ago

<=10 tok/s is unusable. You'd be faster writing the code yourself.
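
For a rough sense of what the rates in the table mean in practice, here is a small sketch that converts each tokens/sec figure into wait time for one reply. The 500-token reply length is an assumption picked for illustration, not something measured in this thread, and `seconds_to_generate` is a hypothetical helper name.

```python
# Approximate generation rates (tokens/sec) copied from the table above.
RATES = {
    "gemma-4-e4b-it-mlx": 10.5,
    "qwen3-8b-uncensor-v2": 6.3,
    "qwen3-14b-uncensored": 3.5,
}

# Assumed reply length for illustration only; real replies vary widely.
REPLY_TOKENS = 500


def seconds_to_generate(num_tokens: int, tokens_per_sec: float) -> float:
    """Return the time in seconds to emit num_tokens at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


for model, rate in RATES.items():
    secs = seconds_to_generate(REPLY_TOKENS, rate)
    print(f"{model:<22} {secs:6.1f} s for {REPLY_TOKENS} tokens")
```

Under that assumption the three models come out at roughly 48 s, 79 s, and 143 s per reply. This counts generation only; time spent processing the prompt before the first token is not included, so actual waits would likely be longer.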