Comment by bigyabai
2 months ago
For larger contexts, the bottleneck is probably token prefill instead of memory bandwidth. Supposedly prefill is faster on the M5+ GPUs, but still a big hurdle for pre-M5 chips.
2 months ago
For larger contexts, the bottleneck is probably token prefill instead of memory bandwidth. Supposedly prefill is faster on the M5+ GPUs, but still a big hurdle for pre-M5 chips.
No comments yet
Contribute on Hacker News ↗