Comment by zozbot234
25 days ago
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.
25 days ago
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.
From a debugging point of view, the author's conclusion was still completely reasonable given the evidence they had
No it wasn't. A hardware defect so disastrous that it affects floating point computation on the neural engine, yet so minor that it does not affect any of the software on the device utilizing that hardware is exceedingly improbable.
The conclusion, that it was not the fault of the developer was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.
> yet so minor that it does not affect any of the software on the device utilizing that hardware
You're being unfair here. The showpiece software that uses that hardware wouldn't install, and almost all software ignores it.
1 reply →
> The conclusion, that it was not the fault of the developer was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.
Aah, the old "you're holding it wrong" defense.
1 reply →
Nah.
All neural accelerator hardware models and all neural accelerator software stacks output slightly different results. That is a truth of the world.
The same is true for GPUs and 3d rendering stacks too.
We don't usually notice that, because the tasks themselves tolerate those minor errors. You can't easily tell the difference between an LLM that had 0.00001% of its least significant bits perturbed one way and one that had them perturbed the other.
But you could absolutely construct a degenerate edge case that causes those tiny perturbances to fuck with everything fiercely. And very rarely, this kind of thing might happen naturally.
2 replies →
Apple's documentation is utter garbage, but this code almost seems like a separate issue (and notably the MLX library uses loads of undocumented properties in metal which isn't cool). It looks like the change used to allow the NAX kernel to be used on the iPhone 17 or upcoming 18 if you're on 26.2 or later, to instead only allow it on the iPhone 17 Pro or upcoming 18. I'm fairly sure the GPU arch on the A19 is 17. They changed it so it will only use that kernel on the 17 Pro or upcoming 18, which is notable as the A19 Pro in the 17 Pro has a significantly changed GPU, including GPU tensor cores. The only real change here is that it would limit to the pro variants for the "17" model.
> The neural accelerator exists in iPhones going back many years.
What has existed before is the Apple Neural Engine (ANE) which is very different from the newer Neural Accelerator support within the GPU blocks. In fact MLX does not even support ANE yet since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.
Sure, I directly and explicitly talked about Apple's version of tensor cores in the GPU. But the ANE is by every definition a neural accelerator. Yes, I'm aware of Apple's weird branding for their tensor cores.
"In fact MLX does not even support ANE yet"
I didn't say otherwise. The ANE is a fantastic unit for small, power-efficient models, like extracting text from images, doing depth modelling, etc. It's not made for LLMs, or the other sorts of experimental stuff MLX is intended for. Though note that MLX's author's reason for not supporting the ANE is that it has a "closed-source" API (https://github.com/ml-explore/mlx/issues/18#issuecomment-184...), making it unsuitable for an open-source project, and given that MLX didn't want to just lean on CoreML. But anyways, the ANE is fantastically fast at what it does, while sipping juice.
In any case, the code change shown should have zero impact on the running of MLX on an iPhone 16 Pro. MLX tries to really leverage platform optimizations so maybe another bifucation is making the wrong choice.
1 reply →
It used to be great, but those days are long gone, see the archived docs.