Comment by woadwarrior01

3 months ago

> Neural Engine / ANE is powerful for fixed-shape inference (vision, classification) but autoregressive LLM decode, where you're generating one token at a time with dynamic KV cache, doesn't map as cleanly to ANE today.

What does the ANE have to do with this?

The Neural Engine (ANE) and the M5 Neural Accelerator (NAX) are not the same thing. NAX can accelerate LLM prefill quite dramatically, although autoregressive decoding remains memory-bandwidth-bound.
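To make the prefill vs. decode distinction concrete, here is a rough back-of-envelope sketch of arithmetic intensity (FLOPs per byte moved) for a single square weight matmul. Everything in it is an assumption for illustration: the function names, the fp16 weights, the 4096 hidden size, and especially the peak-FLOPs and bandwidth figures, which are made-up round numbers and not M5 or NAX specs.

```python
def arithmetic_intensity(n_tokens, d_model, bytes_per_elem=2):
    """FLOPs per byte for multiplying n_tokens activations by a d_model x d_model weight matrix."""
    flops = 2 * n_tokens * d_model * d_model             # one multiply + one add per weight per token
    weight_bytes = d_model * d_model * bytes_per_elem    # weights are read once per pass
    act_bytes = 2 * n_tokens * d_model * bytes_per_elem  # read inputs, write outputs
    return flops / (weight_bytes + act_bytes)


def is_bandwidth_bound(n_tokens, d_model, peak_flops, bandwidth, bytes_per_elem=2):
    """Roofline check: below the machine balance point, memory bandwidth is the limit."""
    machine_balance = peak_flops / bandwidth  # FLOPs the chip can do per byte it can fetch
    return arithmetic_intensity(n_tokens, d_model, bytes_per_elem) < machine_balance


# Hypothetical hardware numbers, chosen only to give a balance point of 100 FLOPs/byte.
PEAK_FLOPS = 10e12   # assumed, not a real spec
BANDWIDTH = 100e9    # assumed, not a real spec

decode_bound = is_bandwidth_bound(1, 4096, PEAK_FLOPS, BANDWIDTH)     # one token per step
prefill_bound = is_bandwidth_bound(512, 4096, PEAK_FLOPS, BANDWIDTH)  # 512-token prompt
```

With one token per step, each weight is fetched for roughly one FLOP per byte, so faster matmul hardware has little to work with. With a few hundred prompt tokens the same weights are reused across all of them, which is where a matmul accelerator can help.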

I suspect the biggest blocker for Metal 4 adoption is the macOS Tahoe 26 requirement.

Good correction, thanks. You're right that NAX and ANE are distinct; I shouldn't have conflated them. NAX's ability to accelerate LLM prefill is exactly the kind of capability that could complement MetalRT's decode-focused pipeline. Appreciate the clarification on the Metal 4 / Tahoe requirement too.