Comment by Tangokat
12 hours ago
"Scaling up performance from M5 and offering the same breakthrough GPU architecture with a Neural Accelerator in each core, M5 Pro and M5 Max deliver up to 4x faster LLM prompt processing than M4 Pro and M4 Max, and up to 8x AI image generation than M1 Pro and M1 Max."
Are they doubling down on local LLMs then?
I still think Apple has a huge opportunity in privacy-first LLMs, but so far I'm not seeing much execution. Wondering if that will change with the overhaul of Siri this spring.
I think it's just marketing, and the marketing is working. Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily.)
I don't mind it, I own Apple stock. But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.
> Look how many people bought Minis and ended up just paying for API calls anyway. (Saw it IRL 2x, see it on reddit openclaw daily)
Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS, the only platform which can programmatically interface with iMessage and other Apple ecosystem stuff? It has nothing to do with the hardware really.
Still, buying a brand new Mac Mini for that purpose seems kind of pointless when a used M1 model would achieve the same thing.
It's exactly that. They are buying the base model just for that. You are not going to do much local AI with those 16 GB of RAM anyway; it could be useful for small things, but the main purpose of the Mini is being able to interact with Apple apps and services.
There are so few used Mac Minis around; they're all gone, and what's left is to buy new.
Can't they simply run macOS in a VM on existing Mac hardware?
> Aren't the OpenClaw enjoyers buying Mac Minis because it's the cheapest thing which runs macOS
That's likely only part of the reason. The Mac Mini is now "cheap" because everything else has exploded in price. RAM, SSDs, etc. have all gone up massively. Not to mention the Mac Mini is an easy out-of-the-box experience.
Bro. The used M1 Minis and Studios are all gone. I was thinking of buying one for local AI before OpenClaw came out, and when I went back to look, the order book was near empty. Swappa is cleared out. eBay is to the point that the M1 Studio is selling for at least a thousand more.
This arb you're talking about doesn't exist. An M1 Studio with 64 GB was $1,300 prior to OpenClaw. You're not getting that today.
I would have preferred that too, since I could Asahi it later. It's just not cheap any more. The M4 is a flat $500 at Micro Center.
Yes, and it's funny that all these critical people don't know this.
Why not? The integrated GPUs are quite powerful, and having access to 32+ GB of GPU memory is amazing. There's a reason people buy Macs for local LLM work. Nothing else on the market really beats it right now.
My M4 MacBook Pro for work just came a few weeks ago with 128 GB of RAM. Some simple voice customization started using 90 GB. The unified memory value is there.
Jeff Geerling had a video of using 4 Mac Studios each with 512GB RAM connected by Thunderbolt. Each machine is around $10K so this isn't cheap but the performance is impressive.
https://www.youtube.com/watch?v=x4_RsUxRjKU
If $40K is the barrier to entry for "impressive," that doesn't really sell the use case of local LLMs very well.
For the same price in API calls, you could fund AI driven development across a small team for quite a long while.
Whether that remains the case once those models are no longer subsidized, TBD. But as of today the comparison isn't even close.
I'm not really into AI and LLMs; I personally don't like anything they output. But the people I know who are into it, and into running their own local setups, are buying Studios and Minis for their at-home local LLM setups. Really, everyone I personally know doing build-your-own local LLMs is doing this. I don't know anyone buying other computers and NVIDIA graphics cards for it anymore.
The biggest problem with personal ML workflows on Mac right now is the software.
I'm curious to know what software you're referring to.
I think people buying those don't realize the requirements of running something as big as Opus. They think those gigabytes of memory on a Mac Studio/Mini are a lot, only to find out it's "meh" in the context of LLMs. Plus, most buy it as a gateway into the Apple ecosystem for their Claws: iMessage, for example.
> But I'm def not buying into their rebranding of integrated GPU under the guise of Unified Memory.
But it is unified memory? Thanks to Intel iGPUs, the term has been tainted for a long time.
I've tried to use a local LLM on an M4 Pro machine and it's quite painful. Not surprised that people into LLMs would pay for tokens instead of trying to force their poor MacBooks to do it.
Local LLM inference is all about memory bandwidth, and an M4 Pro only has about the same as a Strix Halo or DGX Spark. That's why the older Ultras are popular with the local LLM crowd.
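Back-of-envelope, assuming a dense model that has to stream all of its weights from memory once per generated token (the bandwidth and model-size numbers are ballpark, not measured):

```python
# Decode speed is roughly bounded by bandwidth / model size, because
# every generated token streams the full set of active weights once.
# All figures below are rough, for illustration only.

def max_decode_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode tokens/sec for a dense model."""
    return bandwidth_gb_s / weights_gb

# A ~40 GB quantized model on an M4 Pro-class chip (~273 GB/s)
# vs. an Ultra-class chip (~800 GB/s):
for name, bw in [("M4 Pro", 273.0), ("M2 Ultra", 800.0)]:
    print(f"{name}: ceiling of ~{max_decode_tps(bw, 40.0):.0f} tok/s")
```

MoE models soften this somewhat (only the active experts stream per token), but the bandwidth ceiling is why the older Ultras stay attractive for decode.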
Qwen 3.5 35B-A3B and 27B have changed the game for me. I expect we'll see something comparable to Sonnet 4.6 running locally sometime this year.
I’m super happy with it for embedding, image recog, and semantic video segmentation tasks.
What are the other specs, and how does your setup look? You need a minimum of 24 GB of RAM to run models of 16 GB or less.
Local LLMs are useful for stuff like tool calling
We had a workshop 6 months ago, and while I've always been sceptical of OpenAI et al.'s silly AGI/ASI claims, the investments have shown the way to a lot of new technology and have let a genie out that won't be put back into the bottle.
Now, extrapolating in line with how Sun servers around the year 2000 cost a fortune and can be emulated by a $5 VPS today, Apple is seeing that they can maybe grab the local LLM workloads if they act now with their integrated chip development.
But to grab that, they need developers to rely less on CUDA via Python, or to have other proper hardware support for those environments, and that won't happen without the hardware being there first and the machines being buildable with enough memory (refreshing to see Apple support 128 GB, even if it'll probably bleed you dry).
I feel like the push by devs towards Metal compatibility has been 10x that towards AMD. I assume that's because the majority of us run MacBooks.
The only "push" towards Metal compatibility there's been is complaints on GitHub issues. Not only has none of the work been done, absolutely nobody in their right mind wants to work on Metal compatibility. Replacing proprietary with proprietary is absolutely nobody's weekend project, or paid project.
I think that might be partly because on regular PCs you can just go and buy an NVIDIA card instead of futzing around with software issues, and those on laptops probably hope that something like ZLUDA will solve it via software shims or MS-backed ML APIs.
Basically, too many choices to "focus on" makes none a winner except the incumbent.
Which majority?
I certainly only use Macs when a project assigns them, and there are plenty of developers out there whose job has nothing to do with what Apple offers.
Also, while Metal is a very cool API, I'd rather play with Vulkan, CUDA, and DirectX, as do the large majority of game developers.
Who is "us" in this case? Majority of devs that took the stack overflow survey use Windows:
https://survey.stackoverflow.co/2025/technology/#1-computer-...
Torch MPS support on my local MacBook outperforms a CUDA T4 on Colab.
Except CUDA feels really cozy, because like Microsoft, NVidia understands the Developers, Developers, Developers mantra.
People always overlook that CUDA is a polyglot ecosystem, the IDE and graphical debugging experience where one can even single step on GPU code, the libraries ecosystem.
And as of last year, NVidia has started to take Python seriously, and now with cuTile-based JIT it is possible to write CUDA kernels in pure Python, not having Python generate C++ code that other tools then ingest.
They are getting ahead of Modular, with Python.
> Are they doubling down on local LLMs then?
Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high-level perspective, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix multiplies (GEMM) and the latter is matrix-vector multiplies (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
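A toy NumPy sketch of that split (shapes only; not a real transformer, and the sizes are made up):

```python
import numpy as np

d = 64                                 # toy hidden size
W = np.random.randn(d, d)              # one weight matrix

# Prefill: all T prompt tokens hit the weights at once -> GEMM,
# big tiles, exactly the shape a matmul accelerator speeds up.
T = 32
prefill_out = np.random.randn(T, d) @ W   # (T, d) @ (d, d)

# Decode: one new token per step -> GEMV; there are no 32-wide
# tiles to feed, so throughput is set by memory bandwidth instead.
decode_out = np.random.randn(1, d) @ W    # (1, d) @ (d, d)

assert prefill_out.shape == (T, d)
assert decode_out.shape == (1, d)
```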
There already are a bunch of task-specific models running on their devices, it makes sense to maintain and build capacity in that area.
I assume they have a moderate bet on on-device SLMs in addition to other ML models, but not much planned for LLMs, which at that scale might be good as generalists but very poor at guaranteeing success for each specific minute task you want done.
In short: 8 GB storing tens of very small, fast, purpose-specific models is much better than a single 8 GB LLM trying to do everything.
Probably possible for pure coding models. I see on-device models becoming viable and usable in like 2-3 years.
> Are they doubling down on local LLMs then?
Apple is in the hardware business.
They want you to buy their hardware.
People using Cloud for compute is essentially competitive to their core business.
"Doubling down on already being the best hardware for local inference"
"Apple Intelligence is even more capable while protecting users’ privacy at every step."
Remains to be seen how capable it actually is. But they're certainly trying to sell the privacy aspect.
> Remains to be seen how capable it actually is.
It's the best. We all turned it off. 100% privacy.
Given all the supply issues w/ Nvidia, I think Apple's AI strategy should be - local AI everything (not just LLMs), but also make Metal competitive w/ CUDA. Their ace in the hole is the unified memory model.
Neural Accelerators were already present in the iPhone 17 and the M5 chip. This is not new for the M5 Pro/Max.
Apple's stated AI strategy is local where it can and cloud where it needs. So "doubling down"? Probably not. But it fits in their strategy.
The hardware capabilities that make local LLMs fast are useful for a lot of different AI workloads. Local LLMs are a hot topic right now so that’s what the marketing team is using as an example to make it relatable.
But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614 GB/s for the M5 Max vs. 546 GB/s for the M4 Max. Where is this 4x improvement coming from?
I think I'll pass on upgrading.
It's prompt processing, so prefill. That's compute-bound, not memory-bound.
The 4x is on time to first token; it's on the graph.
> Are they doubling down on local LLMs then?
Honestly, I think that's the move for Apple. They do not seem to have any interest in creating a frontier lab/model — and why would they, given the capex and how far behind they are.
But open-source models (Kimi, DeepSeek, Qwen) are getting better and better, and Apple makes excellent hardware for local LLMs. How appealing would it be to have your own LLM that knows all your secrets and doesn't serve you ads/slop, versus OpenAI and SCam Altman having all your secrets? I would seriously consider it even if the performance was not quite there. And no need for a subscription + CLI tool.
I think Apple is in the best position to have native AI, versus the competition, which ends up being edge nodes for the big 4 frontier labs.
RE Frontier models/hardware: I'm interested to see what happens with their "private cloud compute" marketing concept now that they're moving from running Siri AI experiences on Apple servers to Google servers instead.
> Are they doubling down on local LLMs then?
I love the push to local LLMs. But it's hilarious how Apple a few years ago was so reluctant to even mention "AI" in its keynotes, and fast forward a couple years, they've fully embraced it. I mean, I like that they embraced it rather than being "different" (stubborn) and staying behind the tech industry. It's the smart choice. I just think it's funny.
Apple's AI strategy really kind of threads the needle cleverly.
"AI" (LLMs) may or may not have a bubble-pop moment, but until it does, Apple gets to ride it in these press releases and claims. And if the big pop occurs, Apple winds up with really fantastic hardware that just happens to be good at AI workloads (as well as general computing).
For example, image classification (e.g. face recognition/photo tagging), ASR+vocoders, image enhancement, OCR, et al, were popular before the current boom, and will likely remain popular after. Even if LLM usage dries up/falls out of vogue, this hardware still offers a significant user benefit.
LLM usage is not very likely to "dry up".
What is more likely to happen though is that it doesn't take multiple $10B of datacenter and capital to build out models--and the performance against LLM benchmarks starts to max out to the point where throwing more capital at it doesn't make enough of a difference to matter.
Once the costs shrink below $1B then Apple could start building their own models with the $139B in cash and marketable securities that they have--while everyone else has burned through $100B trying to be first.
Of course the problem with this strategy right now is that Siri really, really sucks. They do need to come up with some product improvements now so that they don't get completely lapped.
Those things could likely just run fine on the GPU, though.
They could run fine on the CPU too. But these are mobile devices, therefore battery usage is another significant metric. Dedicated hardware is more energy efficient than general hardware, and GPU in particular is a power-hog.
Not if GPU RAM is a limiter. Which it is for most models.
Unified memory is a serious architectural improvement.
How many GPUs does it take to match the RAM, and make up for the additional communication overhead, of a RAM-maxed Mac? Whatever the answer, it won’t fit in a MacBook Pro’s physical and energy envelopes. Or that of an all-in-one like the Studio.
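As a rough count, assuming 32 GB cards (the current consumer ceiling) and ignoring interconnect overhead entirely:

```python
import math

# How many 32 GB consumer GPUs just to match a RAM-maxed Mac Studio's
# 512 GB of unified memory? Ignoring interconnect and communication
# costs, which only make the multi-GPU side look worse.
mac_unified_gb = 512
gpu_vram_gb = 32
gpus_needed = math.ceil(mac_unified_gb / gpu_vram_gb)
print(gpus_needed)  # 16
```

Sixteen cards, plus the motherboards, PSUs, and interconnects to host them — nowhere near a laptop or all-in-one envelope.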
Have you seen that GitHub repo where they unlock the true power of the Neural Engine?
Have a link?
Honestly, they can keep waiting another year or two for on-device models at the size they're looking for to be powerful enough.
Didn't they announce a partnership with Google Gemini?
Looks like this will be their angle for the whole agentic-AI topic.
It is simply marketing nonsense. What they really mean (I think) is that they support matrix multiplication (matmul) at the hardware level, and since AI is mostly matrix multiplications, you'll get much faster inference (and some increase in training too) on this new hardware. I'm looking forward to seeing how fast a local 96 GB+ LLM is on the M5 Max with 128 GB of RAM.
We've already established in this thread that memory bandwidth isn't that much greater than the M4 Max's: ~12%. However, I wonder if batched inference will benefit greatly from the vastly improved compute. My guess is that parallel usage of the same model will be a couple of times faster. So single-"threaded" use won't be much better, but if you want to run a lot of batch jobs, it'd be way faster?
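A toy model of why that could be, assuming weights are streamed from memory once per decode step regardless of batch size (all numbers invented for illustration):

```python
# Per decode step: one sweep of the weights from memory, plus
# per-sequence compute that grows with batch size. Until compute
# catches up with the memory sweep, extra batch entries are ~free.

def step_time_s(weights_gb, bandwidth_gb_s, batch, flops_per_tok, flops_s):
    mem_s = weights_gb / bandwidth_gb_s           # weight sweep (fixed)
    compute_s = batch * flops_per_tok / flops_s   # grows with batch
    return max(mem_s, compute_s)                  # assumes full overlap

# ~40 GB model, ~546 GB/s bandwidth, ~140 GFLOPs per token,
# ~30 TFLOPS of usable matmul throughput (all hypothetical):
t1 = step_time_s(40, 546, 1, 140e9, 30e12)
t8 = step_time_s(40, 546, 8, 140e9, 30e12)
print(f"batch 1: {t1*1e3:.1f} ms, batch 8: {t8*1e3:.1f} ms per step")
```

Under these made-up numbers, a batch of 8 finishes a step in the same wall time as a batch of 1, i.e. ~8x the aggregate tokens/sec; more matmul throughput mainly raises the batch size at which compute becomes the new bottleneck.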
Is this a reply to a different comment?
It’s not necessarily doubling down on local. The reality is your LLM should be inferencing every tick … the same way your brain thinks every. Fucking. Nano. Second.
So yes, the LLM should be inferencing on your prompt, but it should also be inferencing on 25,000 other things … in parallel.
Those are the compute needs.
We just need compute everywhere as fast as possible.
I've been so disappointed in Apple's lack of execution on this. There is so much potential for fantastic local models to run and intelligently connect to cloud models.
I just don't get why they're dropping the ball so much on this.
Because it won’t sell enough hardware to matter to them.
They aren’t dropping the ball, they are being smart and prudent.
Downvote all you want. Point blank, they are dropping the ball.
> doubling down on local LLMs
I do think it'll be common to see pros purchasing expensive PCs approaching £25k or more if they could run SotA multimodal LLMs faster and locally.
A useful LLM that needs 64 GB of RAM and mid-double-digit core counts is not useful for 99% of their customers. The LLMs they have on the iPhone 17 certainly cannot do anything useful other than summarization and the like. It's a hardware constraint that they have.
Apple absolutely has a massive opportunity here because they used a shared memory architecture.
So, as most people in or adjacent to the AI space know, NVIDIA gatekeeps its best GPUs with the most memory by making them eye-wateringly expensive. It's a form of market segmentation. So consumer GPUs top out at 16GB (5090 currently) while the best AI GPU (H200?) is 141GB (I just had to search)? I think the previous gen was 80GB.
But these GPUs are north of $30k.
Now, the Mac Studio currently tops out at 512GB of SHARED memory. That means you can potentially run a much larger model locally without distributing it across machines. Currently that retails at $9,500, but that's relatively cheap in comparison.
But, as it stands now, the best Apple chips have significantly lower memory bandwidth than NVidia GPUs and that really impacts tokens/second.
So I've been waiting to see if Apple will realize this and address it in the next generation of Mac Studios (and, to a lesser extent, MacBook Pros). The H200 seems to be 4.8TB/s. IIRC the 5090 is ~1.8TB/s. The best Apple has is (IIRC) 819GB/s on the M3 Ultra.
Apple could really make a dent in NVidia's monopoly here if they address some of these technical limitations.
So I just checked the memory bandwidth of these new chips, and it seems the M5 is 153GB/s, the M5 Pro ~300, and the M5 Max ~600. This isn't a big jump from the M4 generation. I suspect the new Studios will probably barely break 1TB/s. I had been hoping for higher.
>So consumer GPUs top out at 16GB (5090 currently)
5090 has 32GB, and the 4090 and 3090 both have 24GB.
It will be interesting to see the specs on an M5 Ultra. We'll probably have to wait until WWDC at the earliest to see it, though.
Hard to get the bandwidth of 6,000+-bit-bus HBM out of a 512- or 1024-bit memory bus tied to DDR. I think it's also just tough to physically tie 512 gigs in close enough to the GPU to run at those speeds. But yeah, I wish there were a very competitive local option too, short of spending $50k+.
The topic is the MacBook, so my criticism is a little off. However, I really don't believe in this "local LLM" promise from Apple. My phone already gets noticeably warm if I answer 5 WhatsApp messages, and loses 5% of battery in the process. I highly doubt Apple will have a usable local LLM that doesn't drain my battery in minutes before 2030.
Something is not right if WhatsApp is seriously draining your phone like that. Admittedly I'm not a big WhatsApp user, but my iPhone hasn't had any trouble like that with it.
Yeah is OP using an iPhone X?
What % of users actually care that much about local LLMs? They still appear to be an inferior (though maybe decent) service compared to ChatGPT etc., and require very top-end hardware. Is privacy _that_ important to people when their Google search history has been a gateway to the soul for years? I wonder if these machines would cost significantly less (or put the cost toward other things, e.g. more CPU cores) without this emphasis on LLMs.
Privacy is definitely not a concern for the layman, but it is for lots of people, especially pro users. I also haven't made a Google search in years.
I also haven’t seen any improvements in the frontier models in years, and I’m anxiously awaiting local models to catch up.
> I still think Apple has a huge opportunity in privacy first LLMs
This association of Apple with privacy needs to be put to rest. They have consistently proven otherwise, despite heavily marketing themselves as "privacy-first."
https://www.theguardian.com/technology/2019/jul/26/apple-con...
I think it's a little telling that the best you can do is a seven-year-old article.
So, somehow now they are the beacons of privacy and we should just ignore their history of spying on their users?
No other company makes you tell them every application you install on your device. No other company makes you tell them every location you read from your GPS sensor.
I think it's all about relativity. Are they private compared to an open-source, privacy-focused OS like GrapheneOS and the fantastic folks running that project? No. Are they more private than a company like Meta or Google, which have much worse incentives around privacy than Apple? Probably.
Do I wish Apple was way more transparent and gave users more control over gatekeeper and other controversial features that erode privacy? Absolutely.
Not for everything. Apple has initially focused on edge AI that runs locally per device. It didn’t work out well the first try, but I would still bet on them trying again once compute catches up. Besides, they still have a better track record than the other tech giants.