Comment by woadwarrior01
9 hours ago
> Are they doubling down on local LLMs then?
Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. At a very high level, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix multiplies (GEMM) and the latter is matrix-vector multiplies (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
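To make the GEMM vs GEMV distinction concrete, here is a rough back-of-the-envelope sketch in plain Python. The helper names, the 4096 x 4096 layer size, the 512-token prompt chunk, and the 2-byte element size are all illustrative assumptions on my part, not measurements from any Apple hardware; the only number taken from above is the tile size of 32.

```python
# Rough sketch: one linear layer computes (tokens x d_in) @ (d_in x d_out).
# All names and sizes here are hypothetical and chosen for illustration only.

MIN_TILE = 32  # tile-size threshold quoted in the comment above


def arithmetic_intensity(tokens, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte moved, assuming each operand is read or written once."""
    flops = 2 * tokens * d_in * d_out  # one multiply and one add per term
    elems = tokens * d_in + d_in * d_out + tokens * d_out
    return flops / (bytes_per_elem * elems)


def fills_tile(tokens, min_tile=MIN_TILE):
    """True when the activation matrix has enough rows to fill one tile."""
    return tokens >= min_tile


# Prefill: many prompt tokens at once, so the activations form a matrix (GEMM).
prefill = arithmetic_intensity(tokens=512, d_in=4096, d_out=4096)

# Decode: one new token per step, so the activations are a single row (GEMV).
decode = arithmetic_intensity(tokens=1, d_in=4096, d_out=4096)

print(f"prefill: {prefill:.1f} FLOPs/byte, fills a tile: {fills_tile(512)}")
print(f"decode:  {decode:.1f} FLOPs/byte, fills a tile: {fills_tile(1)}")
```

Under these assumptions prefill does hundreds of FLOPs per byte of traffic while decode does roughly one, and a single-row decode step cannot fill a 32-wide tile. That is consistent with matrix units helping prefill and leaving decode unchanged, though batching or speculative decoding would change the row count.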