Comment by zargon
5 hours ago
Yes, prefill speed is definitely the bottleneck for most use cases besides "chatting". It's the reason I have never bought a Mac for LLM purposes.
It's frustrating when trying to find benchmarks because almost everyone gives decode speed without mentioning prefill speed.
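To put rough numbers on why decode-only benchmarks mislead, here is a minimal sketch that splits response time into a prefill phase and a decode phase. The function name `response_time_s` and every figure below are made-up illustrations, not measurements from any real machine, and the model assumes each phase runs at a constant tokens-per-second rate.

```python
def response_time_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (prefill_seconds, decode_seconds) for one request.

    Assumes constant throughput in each phase, which is a simplification:
    real prefill speed usually drops as the context grows.
    """
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s, decode_s


# Hypothetical long-context workload (illustrative values only):
# a 20,000-token prompt, a 500-token answer,
# 200 tok/s prefill and 25 tok/s decode.
prefill_s, decode_s = response_time_s(20_000, 500, prefill_tps=200.0, decode_tps=25.0)
print(f"prefill: {prefill_s:.0f} s, decode: {decode_s:.0f} s")
# prints "prefill: 100 s, decode: 20 s"
```

With these assumed numbers the wait before the first output token is five times longer than the whole generation, so a benchmark that reports only the 25 tok/s decode figure hides most of the latency.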