Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that it adds almost nothing).
Calculator.app does have history now FWIW, it goes back to 2025 on my device. And you can make the default vertical be a scientific calculator now too.
Also it does some level of symbolic evaluation: sin^-1(cos^-1(tan^-1(tan(cos(sin(9))))))== 9, which is a better result than many standalone calculators.
Also it has a library of built-in unit conversions, including live-updating currency conversions. You won’t see that on a TI-89!
And I just discovered it actually has built-in 2D/3D graphing. Now the question is whether it allows parametric graphing like the macOS one…
All that said, obviously the TI-8X family hold a special place in my heart as TI-BASIC was my first language. I just don’t see a reason to use one any more day to day.
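The symbolic-evaluation point above is easy to contrast with plain numeric evaluation: in ordinary floating-point math the inverse functions only cancel within their principal ranges, so a purely numeric calculator does not get back to 9. A quick stdlib-Python sketch of what a non-symbolic calculator does:

```python
import math

# sin^-1(cos^-1(tan^-1(tan(cos(sin(9)))))), evaluated numerically.
x = math.sin(9.0)   # ~0.41212 (9 rad wraps past 2*pi)
x = math.cos(x)     # ~0.91628
x = math.tan(x)     # ~1.30321
x = math.atan(x)    # recovers ~0.91628 (input was inside atan's principal range)
x = math.acos(x)    # recovers ~0.41212
x = math.asin(x)    # ~0.42478, the principal angle whose sine is 0.41212 -- NOT 9

print(round(x, 4))  # 0.4248
```

A calculator that answers 9 here is doing symbolic cancellation rather than evaluating each function numerically.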
I run a TI 83+ emulator on my Android phone when I don't have my physical calculator at hand. Same concept, just learned a different brand of calculators.
built-in calculator apps are surprisingly underbaked... I'm surprised neither of the big two operating systems has elected to ship something comparable to a real calculator built in. It would be nice if we could preview the whole expression as we type it.
Does it bother anyone else that the author drops "MiniMax" there in the article without bothering to explain or footnote what that is? (I could look it up, but I think article authors should call out these things).
There are tons of terms that aren't explained that some people (like me) might not understand. I think it's fine that some articles have a particular audience in mind and write specifically for those, in this case, it seems it's for "Apple mobile developers who make LLM inference engines" so not so unexpected there are terms I (and others) don't understand.
Yes, maybe. But it would be nice if there would be footnotes or tooltips. Putting the explanation in the text itself breaks the flow of the text so that would make it worse indeed.
No because it was obvious from context clues that it was an LLM model. Not every word needs to be defined. Also if you were unsure and decided to search “MiniMax M2.1”, every result would be about the LLM.
Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks, or it just gives up and stops correcting anything at all.
It’s not just you, and it got bad on my work iPhone at the same time so I know it’s not failing hardware or some customization since I keep that quite vanilla.
Interesting post, but the last bit of logic pointing to the Neural Engine for MLX doesn’t hold up. MLX supports running on CPU, Apple GPU via Metal, and NVIDIA GPU via CUDA: https://github.com/ml-explore/mlx/tree/main/mlx/backend
Good article. Would have liked to see them create a minimal test case, to conclusively show that the results of math operations are actually incorrect.
I'd think other neural-engine-using apps would also have weird behavior. It would've been interesting to try a few App Store apps and see whether they misbehave too.
> Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.
They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.
Normally, expecting floating-point MAC operations to produce deterministic results on modern hardware is a fool's errand; they usually operate asynchronously, so the non-associative nature of floating-point addition rears its head and you get some divergence.
But an order of magnitude difference plus Apple's own LLM not working on this device suggests strongly to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well reasoned bug in my book.
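To make that order-sensitivity concrete, here is a minimal sketch in plain Python (nothing Apple-specific, just IEEE-754 doubles): the same four terms reduced in two different orders give answers that differ by the entire true sum.

```python
# Same four terms, two reduction orders. True sum: 2.0.
xs = [1e16, 1.0, -1e16, 1.0]

left_to_right = 0.0
for x in xs:
    left_to_right += x  # 1e16 + 1.0 rounds back to 1e16, so the first 1.0 is lost

# Cancel the big terms first, as a parallel reduction might happen to do:
reordered = (1e16 + -1e16) + (1.0 + 1.0)

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```

A GPU reduction that schedules accumulations asynchronously is effectively choosing between orderings like these at runtime, which is why small run-to-run drift is expected; an order-of-magnitude divergence is a different matter.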
I should think I'll probably see someone posting this on the front page of HN tomorrow, no doubt. I first read it when it was already enormously old, possibly nearly 30 years old, in the mid 1980s when I was about 11 or 12 and starting high school, and voraciously reading all the Golden Age Sci-Fi I could lay my grubby wee hands on. I still think about it, often.
I found the article hard to read. I turned on reader mode. I still found it hard to read. Each sentence is very short. My organic CPU spins trying to figure out how each sentence connects to the next. Each sentence feels more like a paragraph, or a tweet, instead of having a flow. I think that's my issue with it.
My TL;DR is that they tried to run an on-device model to classify expenses, it didn't work even for simple cases ("Kasai Kitchin" -> "unknown"), they went deeeeeep down the rabbit hole to figure out why and concluded that inference on their particular model/phone is borked at the hardware level.
Whether you should do this on device is another story entirely.
I would really not want to upload my expense data to some random cloud server, nope. On-device is really a benefit even if it's not quite as comprehensive, and it's in line with Apple's privacy focus, so it's very imaginable that many of their customers agree.
The author is debugging the tensor operations of the on-device model with a simple prompt. They confirmed the discrepancy with other iPhone models.
It’s no different than someone testing a calculator with 2+2. If it gets that wrong, there’s a hardware issue. That doesn’t mean the only purpose of the calculator is to calculate 2+2. It is for debugging.
You could just as uncharitably complain that “these days no one does arithmetic anymore, they use a calculator for 2+2”.
I mean, Apple's LLM also doesn't work on this device, plus the author compared the outputs from each iterative calculation on this device vs. others, and they diverge from every other Apple device. That's a pretty big sign both that something is different about that device and that this same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible"? Who knows; there's no smoking gun. But it does seem like something is genuinely wrong.
I don't get the snark about LLMs overall in this context; this author uses LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and performed an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?
Somewhere along the line, the tensor math that runs an LLM became divergent from every other Apple device. My guess is that there's some kind of accumulation issue here (remembering that floating-point accumulation is not associative, so evaluation order matters), but it seems genuinely broken in an unexpected way given that Apple's own LLM also doesn't seem to work on this device.
If you'd read the whole thing, you'd have followed a debugging journey that both bypassed the LLM and was very much appropriate for HN (rather than grounds for dismissing the article), so you might want to do that.
I severely doubt your thesis around iPhones being Veblen goods.
You are claiming that if the price of the iPhone went down, apple would sell fewer phones?
Correspondingly, you are arguing that if they increased prices they could increase sales?
You are claiming that 100s of millions of people have all made the decision that the price of an iPhone is more than it is worth to them as a device, but is made up for by being seen with one in your hand?
Not all goods that signify status are Veblen goods.
>Correspondingly, you are arguing that if they increased prices they could increase sales?
Veblen goods aren't like this. If they were, everything would be priced at infinity. Veblen goods have to take into account the amount of spending money their target customers have, and how much they're willing to spend. Apple products are priced this way. They're not targeted just at people who can afford Rolls-Royce Silver Shadows, they're targeted at regular people who are willing to spend too much money on a phone when they can get an equivalent Android phone for half the price. Those people have limited money, but they're willing to overpay, but only so much.
>You are claiming that if the price of the iPhone went down, apple would sell fewer phones?
Quite likely, yes. If they adopted razor-thin profit margins on iPhones, their phones would be seen as "cheap" and wouldn't have the cachet they have now. More people would start looking at alternatives, and start buying Samsung Galaxies and other flagship Android phones.
I mean, I think it's cultural. In the US it seems like everyone has an iPhone; it's almost kinda quirky not to have one. But in some other places, an iPhone costs more than your monthly salary, and having one is definitely a symbol of status. Less so than it used to be, but it still has that.
Admittedly, I hate companies that live off their marketing. Nintendo, Disney, Apple. I hate that these companies can weaponize psychology against humans.
Can you prove that is still the case with the iPhone SE, by showing comparable hardware with similarly long software update support at a lower price?
> Its a demonstration of wealth. This is called Veblen good
Just the other day I was reminded of the poor little "I am rich" iOS app (a thousand dollar ruby icon that performed diddly squat by design), which Apple deep-sixed from the app store PDQ.
They asked MiniMax on their computer to make an iPhone app that didn't work.
It didn't work using the Apple Intelligence API. So then:
* They asked Minimax to use MLX instead. It didn't work.
* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.
* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.
> Better to dig in a bit more.
The author already did 100% of the digging and then some.
Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC results to converge across platforms, but the fact that Apple's own LLM doesn't work on their own hardware, and the divergence is an order of magnitude, makes this a reasonable bug report in my book.
Fascinating: the claim is that Apple Intelligence doesn't work altogether. Quite a scandal.
EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.
Neural nets are very bad at math; they can only produce what's in the training data. So if you train one on 1+1 through 8+8, it can't do 9+9. It's not like a child's brain that can draw logical conclusions.
> Update on Feb. 1st:
> Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective.
It is a bug in MLX that has been fixed a few days ago: https://github.com/ml-explore/mlx/pull/3083
So the underlying issue is that the iPhone 16 Pro SKU was misdetected as having Neural Accelerator (nax) support and this caused silently wrong results. Not a problem with the actual hardware.
From a debugging point of view, the author's conclusion was still completely reasonable given the evidence they had.
8 replies →
Apple's documentation is utter garbage, but this code almost seems like a separate issue (and notably the MLX library uses loads of undocumented Metal properties, which isn't cool). The change takes a check that used to allow the NAX kernel on the iPhone 17 or upcoming 18 on OS 26.2 or later, and restricts it to the iPhone 17 Pro or upcoming 18. I'm fairly sure the GPU arch on the A19 is 17. That restriction is notable because the A19 Pro in the 17 Pro has a significantly changed GPU, including GPU tensor cores. The only real change here is limiting the "17" generation to the Pro variants.
4 replies →
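For illustration only, the gating change described above could be sketched like this; the function and architecture names are hypothetical stand-ins, not MLX's actual identifiers:

```python
# Hypothetical sketch of the device gate being described; names and
# version tuples are illustrative, not MLX's real code.

def nax_allowed_before(gpu_arch: str, os_version: tuple) -> bool:
    # Old check: any "17"-generation GPU (every A19 SKU) or newer was
    # assumed to have the neural accelerator on OS 26.2+.
    return os_version >= (26, 2) and gpu_arch in {"apple17", "apple18"}

def nax_allowed_after(gpu_arch: str, os_version: tuple) -> bool:
    # New check: only the Pro variant of the "17" generation (A19 Pro,
    # which actually has GPU tensor cores) or newer qualifies.
    return os_version >= (26, 2) and gpu_arch in {"apple17_pro", "apple18"}

# A base A19 (non-Pro) device no longer selects the NAX kernel:
print(nax_allowed_before("apple17", (26, 2)))  # True  (wrongly allowed)
print(nax_allowed_after("apple17", (26, 2)))   # False (now excluded)
```

The practical consequence is that devices whose GPUs lack the accelerator stop being routed to a kernel that silently computes wrong results on them.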
Blog post dated 28 Jan 2026, the bug fix posted 29 Jan 2026, so I guess this story had a happy ending :)
Still, it's a sad state of affairs that Apple seems to still be fixing bugs based on which blog posts get the most attention on the internet, but I guess once they started that approach, it's hard to stop and go back to figuring out priorities on their own.
I think you overestimate the power of a blogpost and the speed of bugfixing at Apple for something like this.
I almost guarantee there is no way they can read this blog post, escalate it internally, get the appropriate approval for the work item, actually work on the fix, get it through QA, and get it live in production in 3 days. That would only happen for really critical issues, and this is definitely not critical enough for that.
6 replies →
Just goes to show that attention is all you need.
1 reply →
MLX is a fairly esoteric library seeing very little usage, mostly an attempt to foster a broader NN ecosystem on Apple devices. This isn't something that widely affects people, and most people simply aren't trying to run general LLMs on their iPhone.
I don't think that fix is specific to this, but it's absolutely true that MLX tries to leverage every advantage it can find on specific hardware, so it's possible it made a bad choice on a particular device.
Extremely bad timing on my end then, should've waited for a few more days
I don’t think so. You can see the issue ticket linked in the PR. Whether that issue ticket is related to the blog post is unknown https://github.com/ml-explore/mlx-swift-examples/issues/462
How do you know that it wasn’t merely that the blog post elicited multiple people to file the same duplicate bug in Apple’s radar system, which is how they ostensibly prioritize fixes?
1 reply →
Why doesn't MLX just detect apple10 support (for Metal)? That would exclude all the devices without NA.
Kinda sucks how it seems like there’s no CI that runs on hardware.
Methodology is one thing; I can't really agree that deploying an LLM to do sums is great. Almost as hilarious as asking "What's moon plus sun?"
But phenomenon is another thing. Apple's numerical APIs are producing inconsistent results on a minority of devices. This is something worth Apple's attention.
(This is a total digression, so apologies)
My mind instantly answered that with "bright", which is what you get when you combine the sun and moon radicals to make 明(https://en.wiktionary.org/wiki/%E6%98%8E)
Anyway, that question is not without reasonable answers. "Full Moon" might make sense too. No obvious deterministic answer, though, naturally.
FTR the Full Moon was exactly 5 hours ago (It's not without humour that this conversation occurs on the day of the full moon :)
You could play Infinite Craft and find out what the game thinks it is: https://neal.fun/infinite-craft/
Edit: Spoiler -
It's 'Eclipse'
In the game Clair Obscur sun plus moon equals twilight.
> What's moon plus sun?
Eclipse, obviously.
That’s sun minus moon. Moon plus sun is a wildly more massive, nuclear furnace of a moon that also engulfs the earth.
12 replies →
Not obvious. Astronomers are actively looking for signatures of exomoons around exoplanets. So "sun plus moon" could mean that too.
4 replies →
Moon plus sun would be sun because the sun would be an absorbing element.
3 replies →
The set of celestial objects visible to the naked eye during the day.
INSUFFICIENT DATA FOR MEANINGFUL ANSWER.
As an aside, one of my very nice family members likes tarot card reading, and I think you'd get an extremely different answer to "What's moon plus sun?" from that direction. My guess, since they're opposites: "Mixed signals or insecurity get resolved by openness and real communication." It's kind of fascinating, the range of answers to that question; as a couple of other people have mentioned, it could mean loads of things, so I thought I'd add one more.
I'll just add that if you think this advice applies to you, that's the Barnum effect: https://en.wikipedia.org/wiki/Barnum_effect
> What's moon plus sun?
"Monsoon," says ChatGPT.
"moonsun" says JavaScript, 1-0 to JS I'd say.
> Almost as hilarious as asking "What's moon plus sun?"
It’s a reasonable Tarot question.
The scary part isn't "LLMs doing sums." It's that the same deterministic model, same weights, same prompt, same OS, produces different floating-point tensors on different devices
Low-level numerical operation optimizations are often not reproducible. For example: https://www.intel.com/content/dam/develop/external/us/en/doc... (2013)
But it's still surprising that the LLM doesn't work on the iPhone 16 at all. After all, LLMs are known for their tolerance to quantization.
Yes, "floating point accumulation doesn't commute" is a mantra everyone should have in their head, and when I first read this article, I was jumping at the bit to dismiss it out of hand for that reason.
But, what got me about this is that:
* every other Apple device delivered the same results
* Apple's own LLM silently failed on this device
to me that behavior suggests an unexpected failure rather than a fundamental issue; it seems Bad (TM) that Apple would ship devices where their own LLM didn't work.
> floating point accumulation doesn't commute
It is commutative (except for NaN). It isn't associative though.
14 replies →
I would go even further and state that "you should never assume that floating point functions will evaluate the same on two different computers, or even on two different versions of the same application", as the results of floating point evaluations can differ depending on platform, compiler optimizations, compilation-flags, run-time FPU environment (rounding mode, &c.), and even memory alignment of run-time data.
There's a C++26 paper about compile time math optimizations with a good overview and discussion about some of these issues [P1383]. The paper explicitly states:
1. It is acceptable for evaluation of mathematical functions to differ between translation time and runtime.
2. It is acceptable for constant evaluation of mathematical functions to differ between platforms.
So C++ has very much accepted the fact that floating point functions should not be presumed to give identical results in all circumstances.
Now, it is of course possible to ensure that floating point-related functions give identical results on all your target machines, but it's usually not worth the hassle.
[P1383]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p13...
1 reply →
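A tiny illustration of why identical results can't be presumed: the order in which a compiler or runtime happens to accumulate terms already changes the answer, before any cross-platform differences even enter. Plain Python doubles suffice:

```python
# The same ten tiny terms plus 1.0, accumulated in two different orders.
tiny = [1e-16] * 10

# Tiny terms first: they build up to ~1e-15 before meeting 1.0,
# and that running total is large enough to survive the final addition.
small_first = sum(tiny + [1.0])

# 1.0 first: each individual 1e-16 is below half an ulp of 1.0,
# so every one of them is rounded away.
large_first = sum([1.0] + tiny)

print(small_first)  # 1.000000000000001
print(large_first)  # 1.0
```

Two compilers (or a compile-time versus runtime evaluation of the same expression) are free to pick different association orders, and either answer is a conforming result.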
FYI, the saying is "champing at the bit", it comes from horses being restrained.
8 replies →
As a sister comment said, floating point computations are commutative, but not associative.
a * b = b * a for all "normal" floating point numbers.
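The distinction is easy to demonstrate with ordinary Python floats: swapping two operands never changes an IEEE-754 sum, but regrouping three of them can.

```python
a, b, c = 0.1, 0.2, 0.3

# Commutativity holds for ordinary (non-NaN) doubles:
print(a + b == b + a)              # True

# Associativity does not -- regrouping changes the rounding steps:
print((a + b) + c)                 # 0.6000000000000001
print(a + (b + c))                 # 0.6
print((a + b) + c == a + (b + c))  # False
```

This is exactly why "accumulation order matters" and "the operation commutes" are both true at once.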
I wish he would have tried on a different iPhone 16 Pro Max to see if the defect was specific to that individual device.
So true! And as any sane Apple user or the standard template Apple Support person would have suggested (and as they actually suggest) - did they try reinstalling the OS from scratch after having reset the data (of course before backing it up; preferably with a hefty iCloud+ plan)? Because that's the thing to do in such issues and it's very easy.
Reinstalling the OS sucks. I need to pull all my bank cards out of my safe and re-add their CVV's to the wallet, and sometimes authenticate over the phone. And re-register my face. And log back in to all my apps. It can take an hour or so, except it's spread out over weeks as I open an app and realize I need to log in a dozen times.
14 replies →
Latest update at the bottom of the page.
"Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective."
That logic is somewhat [1] correct, but it doesn't say anything about whether all, some, or only this particular iPhone 16 Pro Max is hardware-defective.
[1] as the author knows (“MLX uses Metal to compile tensor operations for this accelerator. Somewhere in that stack, the computations are going very wrong”) there’s lots of soft- and firmware in-between the code being run and the hardware of the neural engine. The issue might well be somewhere in those.
Yeah, that would've been the cleanest experiment
The author is assuming Metal is compiled to ANE in MLX. MLX is by-and-large GPU-based and not utilizing ANE, barring some community hacks.
ANE is probably the biggest scam "feature" Apple has ever sold.
>ANE is probably the biggest scam "feature" Apple has ever sold.
It is astonishing how often ANE is smeared on here, largely by people who seem to have literally zero idea what they're talking about. It's often pushed by either/or people who bizarrely need to wave a flag.
MLX doesn't use ANE for one reason only: Apple hid the ANE behind CoreML, exposing zero public APIs to utilize it directly, and MLX, being basically an experimental ground, wanted to hand-roll their implementation around the GPU/CPU. They literally, directly state this as the reason. People inventing technical reasons for why MLX doesn't use ANE are basically manufacturing fan fiction. This isn't to say that ANE would be suitable for a lot of MLX tasks; it is highly optimized, power-efficient inference hardware that doesn't work for a lot of purposes. But its exclusion is not due to technical unsuitability.
Further, the ANE on both my Mac and my iPhone is constantly augmenting and improving my experience. Little stuff like extracting contents from images. Ever browse in Safari and notice that you can highlight text in an image almost instantly after loading a page? Every image, its contents and features detected effortlessly. Zero fans cycling up. Power usage at a trickle. It just works. It's the same way that when I take a photo I can search "Maine Coon" and get pictures of my cats; the ANE is used for subject and feature extraction. Computational photography massively leverages the ANE.
At a trickle of power.
Scam? Yeah, I like my battery lasting for more than a couple of minutes.
Apple intended ANE to bring their own NN augmentations to the OS and thus the user experience, and even the availability in CoreML as a runtime engine is more limited than what Apple's own software can do. Apple basically limits the runtime usage to ensure that no third party apps inhibit or restrict Apple's own use of this hardware.
What community hacks?
What I meant is, if you can somehow get it working, it is not currently a supported first-party thing, not that I am aware of such a thing existing.
I clicked hoping this would be about how old graphing calculators are generally better math companions than a phone.
The best way to do math on my phone I know of is the HP Prime emulator.
PCalc -- because it runs on every Apple platform since the Mac Classic:
https://pcalc.com/mac/thirty.html
My other favorite calculator is free42, or its larger display version plus42
https://thomasokken.com/plus42/
For a CAS tool on a pocket mobile device, I haven't found anything better than MathStudio (formerly SpaceTime):
https://mathstud.io
You can run that in your web browser, but they maintain a mobile app version. It's like a self-hosted Wolfram Alpha.
The last one was interesting but both apps haven't been updated in 4 years. Hard to pay for something like that.
They do have some new AI math app that's regularly updated
My personal favorite is iHP48 (previously I used m48+ before it died) running an HP 48GX with MetaKernel installed, just as I used it through college. Still just so intuitive and fast to me.
I still have mine. Never use it though as I'm not handy with RPN anymore. :'(
I was pretty delighted to realize I could now delete the lame Calculator.app from my iPhone and replace it with something of my choice. For now I've settled on NumWorks, which is apparently an emulator of a modern upstart physical graphing calc that has made some inroads into schools. And of course, you can make a Control Center button to launch an app, so that's what I did.
Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed. I don't want an exact replica of a 1990s 4-function calculator like the default is (ok, it has more digits and the ability to paste, but besides that, adds almost nothing).
Calculator.app does have history now FWIW, it goes back to 2025 on my device. And you can make the default vertical be a scientific calculator now too.
Also it does some level of symbolic evaluation: sin^-1(cos^-1(tan^-1(tan(cos(sin(9)))))) == 9, which is a better result than many standalone calculators.
Also it has a library of built-in unit conversions, including live-updating currency conversions. You won’t see that on a TI-89!
And I just discovered it actually has a built-in 2D/3D graphing ability. Now the question is whether it allows parametric graphing like the macOS one…
All that said, obviously the TI-8X family hold a special place in my heart as TI-BASIC was my first language. I just don’t see a reason to use one any more day to day.
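For what it's worth, the sin^-1(cos^-1(tan^-1(...))) == 9 claim above really does require symbolic simplification: evaluated naively in floating point, the inverse functions return principal values and the chain never makes it back to 9. A quick hypothetical check in Python (not Calculator.app's actual code):

```python
import math

# Evaluate sin^-1(cos^-1(tan^-1(tan(cos(sin(9)))))) numerically.
# atan(tan(x)) and acos(cos(x)) happen to undo each other here because the
# intermediate values land in the principal ranges, but asin only returns
# values in [-pi/2, pi/2], so the original 9 radians is lost.
x = math.sin(9.0)
x = math.cos(x)
x = math.tan(x)
x = math.atan(x)
x = math.acos(x)
x = math.asin(x)

print(x)  # ~0.4248, nowhere near 9
```

So a calculator that prints 9 is cancelling f⁻¹(f(x)) → x symbolically rather than crunching floats.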
I looked at that calculator. But HP Prime and TI-89 have CAS systems that can do symbolic math, so I prefer to emulate them.
I run a TI 83+ emulator on my Android phone when I don't have my physical calculator at hand. Same concept, just learned a different brand of calculators.
Built-in calculator apps are surprisingly underbaked... I'm surprised neither of the big two operating systems has elected to ship something comparable to a real calculator. It would be nice if we could preview the whole expression as we type it.
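A live whole-expression preview is not hard in principle: re-parse on every keystroke, evaluate only arithmetic nodes, and show nothing while the input is incomplete. A rough sketch of the idea in Python (hypothetical, not any shipping app's code):

```python
import ast
import operator

# Operators the preview is willing to evaluate -- anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def preview(expr: str):
    """Return the value of an arithmetic expression, or None while incomplete."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return None  # user is still typing; show nothing yet

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return ev(tree.body)
```

Parsing to an AST instead of calling `eval` keeps the preview safe: only numeric literals and whitelisted operators ever execute.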
I use the NumWorks emulator app whenever I need something more advanced. It's pretty good https://www.numworks.com/simulator/
GraphNCalc83 is awesome [0].
[0] https://apps.apple.com/us/app/graphncalc83/id744882019
Anytime I have to do some serious amount of math, I have to go dig around and find my TI-84, everything is just burned into muscle memory
I use the "RealCalc" app on my phone. It's pretty similar to my old HP48.
HP Prime emulator still wins for actually solving equations
Does it bother anyone else that the author drops "MiniMax" there in the article without bothering to explain or footnote what that is? (I could look it up, but I think article authors should call out these things).
There are tons of terms that aren't explained that some people (like me) might not understand. I think it's fine that some articles have a particular audience in mind and write specifically for them. In this case, the audience seems to be "Apple mobile developers who make LLM inference engines", so it's not unexpected that there are terms I (and others) don't understand.
I think articles are worse when they have to explain everything someone off the street might not know.
Yes, maybe. But it would be nice if there were footnotes or tooltips. Putting the explanation in the text itself breaks the flow, so that would make it worse indeed.
No because it was obvious from context clues that it was an LLM model. Not every word needs to be defined. Also if you were unsure and decided to search “MiniMax M2.1”, every result would be about the LLM.
MiniMax is a company. It isn’t a term of art or something. It would be like defining Anthropic.
minimax is an algorithm for choosing the next move in a two-player zero-sum game, rooted in the minimax theorem John von Neumann proved in 1928
Posting some code that reproduces the bug could help not only Apple but you and others.
Maybe this is why my damn keyboard predictive text is so gloriously broken
Oh it's not just me?
Typing on my iPhone in the last few months (~6 months?) has been absolutely atrocious. I've tried disabling/enabling every combination of keyboard settings I can think of, but the predictive text just randomly breaks or it just gives up and stops correcting anything at all.
I haven't watched the video, but clearly there's a broad problem with the iOS keyboard recently.
https://news.ycombinator.com/item?id=46232528 ("iPhone Typos? It's Not Just You - The iOS Keyboard is Broken")
It’s not just you, and it got bad on my work iPhone at the same time so I know it’s not failing hardware or some customization since I keep that quite vanilla.
It’s gotten so bad that I’m half convinced it’s either (a) deliberately trolling, or (b) ‘optimising’ for speech to text adoption.
Interesting post, but the last bit of logic pointing to the Neural Engine for MLX doesn’t hold up. MLX supports running on CPU, Apple GPU via Metal, and NVIDIA GPU via CUDA: https://github.com/ml-explore/mlx/tree/main/mlx/backend
>"What is 2+2?" apparently "Applied.....*_dAK[...]" according to my iPhone
At least the machine didn't say it was seven!
Maybe Trurl and Klapaucius were put in charge of Q&A.
Good article. Would have liked to see them create a minimal test case, to conclusively show that the results of math operations are actually incorrect.
I love to see real debugging instead of conspiracy theories!
Did you file a radar? (silently laughing while writing this, but maybe there's someone left at Apple who reads those)
IKR - this is very typical
I'd think other neural-engine-using apps would also show weird behavior. It would've been interesting to try a few App Store apps and see whether they misbehave the same way.
What expense app are you building? I really want an app that helps me categorize transactions for budgeting purposes. Any recommendations?
The real lesson here isn't even about Apple. It's about debugging culture.
I'd also like to see whether the same error happens on another phone of exactly the same model.
So the LLM is working as intended?
[flagged]
[flagged]
> Or, rather, MiniMax is! The good thing about offloading your work to an LLM is that you can blame it for your shortcomings. Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.
They noticed a discrepancy, then went back and wrote code to perform the same operations by hand, without the use of an LLM at all in the code production step. The results still diverged unpredictably from the baseline.
Normally, expecting floating-point MAC operations to produce deterministic results on modern hardware is a fool's errand; they usually operate asynchronously, so the non-associativity of floating-point addition rears its head and you get some divergence.
But an order of magnitude difference plus Apple's own LLM not working on this device suggests strongly to me that there is something wrong. Whether it's the silicon or the software would demand more investigation, but this is a well reasoned bug in my book.
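That non-associativity is visible even in plain double precision: regrouping or reordering the very same operands changes the rounded result, which is why parallel reductions on different hardware rarely agree bit-for-bit. A minimal Python illustration (unrelated to Apple's hardware):

```python
import math

# Floating-point addition is commutative but NOT associative:
# the same three operands, grouped differently, round differently.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False

# The rounding error also accumulates step by step: a naive running
# sum of ten 0.1s drifts, while math.fsum tracks the error exactly.
naive = sum([0.1] * 10)        # 0.9999999999999999
exact = math.fsum([0.1] * 10)  # 1.0
```

An order-of-magnitude discrepancy, though, is far beyond what reordering alone can explain, which is what makes the bug report credible.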
> Time to get my hands dirty and do it myself, typing code on my keyboard, like the ancient Mayan and Aztec programmers probably did.
https://ia800806.us.archive.org/20/items/TheFeelingOfPower/T...
I should think I'll probably see someone posting this on the front page of HN tomorrow, no doubt. I first read it when it was already enormously old, possibly nearly 30 years old, in the mid 1980s when I was about 11 or 12 and starting high school, and voraciously reading all the Golden Age Sci-Fi I could lay my grubby wee hands on. I still think about it, often.
I found the article hard to read. I turned on reader mode. I still found it hard to read. Each sentence is very short. My organic CPU spins trying to figure out how each sentence connects to the next. Each sentence feels more like a paragraph, or a tweet, instead of having a flow. I think that's my issue with it.
If it was written in turgid prose people would be frantically waggling their AI accusatory fingers.
My TL;DR is that they tried to run an on-device model to classify expenses, it didn't work even for simple cases ("Kasai Kitchin" -> "unknown"), they went deeeeeep down the rabbit hole to figure out why and concluded that inference on their particular model/phone is borked at the hardware level.
Whether you should do this on device is another story entirely.
Why shouldn't you? It's your device, it has hardware made specifically for inference.
What's to be gained, other than battery life, by offloading inference to someone else? To be lost, at least, is your data ownership and perhaps money.
I would really not want to upload my expense data to some random cloud server, nope. On device is really a benefit even if it's not quite as comprehensive. And really in line with apple's privacy focus so it's very imaginable that many of their customers agree.
[flagged]
Well it seems that, these days, instead of SUM(expense1,expense2) you ask an LLM to "make an app that will compute the total of multiple expenses".
If I read most of the news on this very website, this is "way more efficient" and "it saves time" (and those who don’t do it will lose their job)
Then, when it produces wrong output AND it is obvious enough for you to notice, you blame the hardware.
The author is debugging the tensor operations of the on-device model with a simple prompt. They confirmed the discrepancy with other iPhone models.
It’s no different than someone testing a calculator with 2+2. If it gets that wrong, there’s a hardware issue. That doesn’t mean the only purpose of the calculator is to calculate 2+2. It is for debugging.
You could just as uncharitably complain that “these days no one does arithmetic anymore, they use a calculator for 2+2”.
The app-making wasn't being done on the phone.
The LLM that malfunctioned was there to slap categories on things. And something was going wrong in either the hardware or the compiler.
I mean, Apple's LLM also doesn't work on this device, plus the author compared the outputs from each iterative calculation on this device vs. others, and they diverge from every other Apple device. That's a pretty big sign both that something is different about that device and that the same broken behavior carried across multiple OS versions. Is the hardware or the software "responsible"? Who knows, there's no smoking gun there, but it does seem like something is genuinely wrong.
I don't get the snark about LLMs overall in this context; this author uses LLM to help write their code, but is also clearly competent enough to dig in and determine why things don't work when the LLM fails, and performed an LLM-out-of-the-loop debugging session once they decided it wasn't trustworthy. What else could you do in this situation?
Somewhere along the line, the tensor math that runs an LLM became divergent from every other Apple device. My guess is that there's some kind of accumulation issue here (remembering that floating-point addition is not associative, so accumulation order matters), but it seems genuinely broken in an unexpected way given that Apple's own LLM also doesn't seem to work on this device.
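One way to make "divergent from every other Apple device" concrete is to dump intermediate activations per step and score them against a reference device: healthy floating-point noise sits within a few ulps, while a real fault is orders of magnitude larger. A hypothetical helper (not the author's actual harness):

```python
def max_rel_divergence(ref, test, eps=1e-12):
    """Largest element-wise relative error between two activation dumps.

    ref and test are flat lists of floats captured at the same step on
    two devices; eps guards against division by values near zero.
    """
    return max(abs(r - t) / (abs(r) + eps) for r, t in zip(ref, test))

# Reordered-accumulation noise: tiny relative error, expected.
healthy = max_rel_divergence([1.0, -2.5, 0.75],
                             [1.0000001, -2.4999999, 0.7500002])

# A genuinely broken device: errors many orders of magnitude larger.
broken = max_rel_divergence([1.0, -2.5, 0.75],
                            [1.1, -2.0, 0.3])
```

Running a sweep of this metric layer by layer is how you localize the first operation where a device goes off the rails.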
LLMs are applied math, so… both?
[flagged]
[flagged]
If you’d read the whole thing, you would go on a debugging journey that both involved bypassing the LLM and was appropriate for HN (vs not dismissing the article), so you might want to do that.
It's not about LLMs doing math.
Uhh, that's not the article, the article is running a ml model, on phone and floating point opps for tensor multiplication seems to be off.
[flagged]
[flagged]
[flagged]
[flagged]
I severely doubt your thesis around iPhones being Veblen goods.
You are claiming that if the price of the iPhone went down, apple would sell fewer phones?
Correspondingly, you are arguing that if they increased prices they could increase sales?
You are claiming that 100s of millions of people have all made the decision that the price of an iPhone is more than it is worth to them as a device, but is made up for by being seen with one in your hand?
Not all goods that signify status are Veblen goods.
>Correspondingly, you are arguing that if they increased prices they could increase sales?
Veblen goods aren't like this. If they were, everything would be priced at infinity. Veblen goods have to take into account the amount of spending money their target customers have, and how much they're willing to spend. Apple products are priced this way. They're not targeted just at people who can afford Rolls-Royce Silver Shadows, they're targeted at regular people who are willing to spend too much money on a phone when they can get an equivalent Android phone for half the price. Those people have limited money, but they're willing to overpay, but only so much.
>You are claiming that if the price of the iPhone went down, apple would sell fewer phones?
Quite likely, yes. If they adopted razor-thin profit margins on iPhones, their phones would be seen as "cheap" and wouldn't have the cachet they have now. More people would start looking at alternatives, and start buying Samsung Galaxies and other flagship Android phones.
This is a conclusion that comes with some personal baggage you should identify and consider addressing.
I mean, I think it's cultural. In US it seems like everyone has an iphone, it's almost kinda quirky not to have one. But in some other places, an iPhone is more than your monthly salary - having one is definitely a symbol of status. Less so than it used to be, but it still has that.
Admittedly, I hate companies that live off their marketing. Nintendo, Disney, Apple. I hate that these companies can weaponize psychology against humans.
Function > Form.
I think it's a Hero Complex, if Jung is correct.
Can you prove that is still the case with the iPhone SE by showing comparable hardware with similarly long software-update support at a lower price?
> Its a demonstration of wealth. This is called Veblen good
Just the other day I was reminded of the poor little "I am rich" iOS app (a thousand dollar ruby icon that performed diddly squat by design), which Apple deep-sixed from the app store PDQ.
If misery loves company, Veblen goods sure don't.
.
Can you read the article a little more closely?
> - MiniMax can't fit on an iPhone.
They asked MiniMax on their computer to make an iPhone app that didn't work.
It didn't work using the Apple Intelligence API. So then:
* They asked Minimax to use MLX instead. It didn't work.
* They Googled and found a thread where Apple Intelligence also didn't work for other people, but only sometimes.
* They HAND WROTE the MLX code. It didn't work. They isolated the step where the results diverged.
> Better to dig in a bit more.
The author already did 100% of the digging and then some.
Look, I am usually an AI rage-enthusiast. But in this case the author did every single bit of homework I would expect and more, and still found a bug. They rewrote the test harness code without an LLM. I don't find the results surprising insofar as I wouldn't expect MAC operations to converge bit-for-bit across platforms, but the fact that Apple's own LLM doesn't work on their hardware and the author's numbers are an order of magnitude off is a reasonable bug report, in my book.
Emptied out post, thanks for the insight!
Fascinating the claim is Apple Intelligence doesn't work altogether. Quite a scandal.
EDIT: If you wouldn't mind, could you edit out "AI rage enthusiast" you edited in? I understand it was in good humor, as you describe yourself that way as well. However, I don't want to eat downvotes on an empty comment that I immediately edited when you explained it wasn't minimax! People will assume I said something naughty :) I'm not sure it was possible to read rage into my comment.
Neural nets, or AI, are very bad at math; they can only produce what's in the training data. So if you have trained one on 1+1 through 8+8, it can't do 9+9. It's not like a child's brain that can make logical conclusions.
My thousand dollar iPhone can't even add a contact from a business card.
> Update on Feb. 1st:
> Well, now it's Feb. 1st and I have an iPhone 17 Pro Max to test with and... everything works as expected. So it's pretty safe to say that THAT specific instance of iPhone 16 Pro Max was hardware-defective.
nothing to see here.