Comment by zcbenz

20 days ago

It is a bug in MLX that was fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083

So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.

  • From a debugging point of view, the author's conclusion was still completely reasonable given the evidence they had

    • No, it wasn't. A hardware defect so disastrous that it affects floating-point computation on the neural engine, yet so minor that it does not affect any of the software on the device utilizing that hardware, is exceedingly improbable.

      The conclusion that it was not the developer's fault was correct, but assuming anything other than a problem at some point in the software stack is unreasonable.

  • Apple's documentation is utter garbage, but this code almost seems like a separate issue (and notably the MLX library uses loads of undocumented properties in Metal, which isn't cool). It looks like the code used to allow the NAX kernel on the iPhone 17 or upcoming 18 if you're on 26.2 or later, and the change instead only allows it on the iPhone 17 Pro or upcoming 18. I'm fairly sure the GPU arch on the A19 is 17. That is notable because the A19 Pro in the 17 Pro has a significantly changed GPU, including GPU tensor cores. The only real change here is that it limits the kernel to the Pro variants for the "17" model.

    • > The neural accelerator exists in iPhones going back many years.

      What has existed before is the Apple Neural Engine (ANE), which is very different from the newer Neural Accelerator support within the GPU blocks. In fact, MLX does not even support the ANE yet, since at least in previous versions it was hardware-limited to computing FP16 and INT8 MADDs, and not even that fast.
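
For readers trying to follow the gating change described in the comment above, here is a minimal sketch of the two predicates as that commenter reads them. It is not MLX's actual code: `Device`, `gen`, `is_pro`, `os`, `nax_allowed_before` and `nax_allowed_after` are hypothetical names, and the generation numbers and the "26.2 or later" threshold are taken from the comment, not from the PR.

```python
from dataclasses import dataclass

# "26.2 or later", as stated in the comment (assumption, not read from the PR).
MIN_OS = (26, 2)


@dataclass(frozen=True)
class Device:
    gen: int       # hypothetical generation number ("17", "18" in the comment)
    is_pro: bool   # hypothetical flag for the Pro variant
    os: tuple      # OS version as (major, minor)


def nax_allowed_before(d: Device) -> bool:
    """Old rule as described: generation 17 or newer, on OS 26.2 or later."""
    return d.gen >= 17 and d.os >= MIN_OS


def nax_allowed_after(d: Device) -> bool:
    """New rule as described: generation 17 only if Pro, generation 18+ always."""
    if d.os < MIN_OS:
        return False
    return d.gen >= 18 or (d.gen == 17 and d.is_pro)


if __name__ == "__main__":
    for d in (Device(17, False, (26, 2)), Device(17, True, (26, 2)), Device(18, False, (26, 3))):
        print(d, nax_allowed_before(d), nax_allowed_after(d))
```

Under this reading, the only device whose answer changes is the non-Pro generation "17" model, which matches the comment's claim that the change just narrows the kernel to the Pro variants.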

Blog post dated 28 Jan 2026, the bug fix posted 29 Jan 2026, so I guess this story had a happy ending :)

Still, it's a sad state of affairs that Apple seems to still be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to figuring out priorities on their own.

  • I think you overestimate the power of a blogpost and the speed of bugfixing at Apple for something like this.

    I can almost guarantee there is no way they can read this blog post, escalate it internally, get the appropriate approval for the work item, actually work on the fix, get it through QA and get it live in production in 3 days. That would only happen for really critical issues, and this is definitely not critical enough for that.

    • Three days is, agreed, too short. A week is just about possible, though...

      I've seen a blog-post, authored a bug in Radar, assigned it to myself, and fixed it the same day. Whether it goes out in the next release is more a decision for the bug-review-board, but since the engineering manager (that would have been me) sits on that too, it's just a matter of timing and seeing if I can argue the case.

      To be fair, the closer we are to a release, the less likely a change is to be accepted unless you can really sweet-talk the rest of the BRB, and there's usually a week of baking before the actual release goes out, but that has sometimes been shrunk for developer-preview releases...

    • It would have to be a very serious security bug. Even then, unless they've totally upended their software development workflows in the past couple of years, the Apple I knew extremely well from the inside couldn't turn around a software fix this quickly, from PR to OS release, even if its existence depended on it. There's simply too much bureaucracy and process around submitting anything, no matter how vital.

    • Or one of the developers of the library saw it and decided to fix it in their spare time (does that exist at Apple?) before it became a bigger thing.

      If not, talk about a coincidence: someone reported an issue, and everything you mentioned had already been done before that happened, with the only thing missing being merging the code into the repository, which was done after the issue was reported. Not unheard of, but it feels less likely than "Engineer decided to fix it".

  • MLX is a fairly esoteric library seeing very little usage, mostly to try to foment a broader NN space on Apple devices. This isn't something that is widely affecting people, and most people simply aren't trying to run general LLMs on their iPhone.

    I don't think that fix is specific to this, but it's absolutely true that MLX is trying to lever every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.

  • How do you know that it wasn’t merely that the blog post prompted multiple people to file duplicates of the same bug in Apple’s Radar system, which is how they ostensibly prioritize fixes?

    • I don't, but the effect is the same: "something might land in the news, let's fix it before it does, since multiple people are reporting the same issue based on this public post someone made".

Why doesn't MLX just detect apple10 support (for Metal)? That excludes all the devices without NA.