Comment by chatmasta

17 hours ago

I bet there’s gonna be a banger of a Mac Studio announced in June.

Apple really stumbled into making the perfect hardware for home inference machines. Does any hardware company come close to Apple in terms of unified memory and single machines for high throughput inference workloads? Or even any DIY build?

When it comes to the previous “pro workloads,” like video rendering or software compilation, you’ve always been able to build a PC that outperforms any Apple machine at the same price point. But inference is unique because its performance scales with memory throughput, and you can’t assemble that by wiring together off-the-shelf parts in a consumer form factor.
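A rough sanity check on why that is: in single-stream decoding, every generated token has to read roughly all of the (active) weights once, so memory bandwidth divided by model size puts a hard ceiling on tokens/second. A back-of-envelope sketch, using commonly quoted peak bandwidth figures (real throughput lands well below these ceilings):

  # Decode-speed ceiling: each token streams ~all active weights once.
  def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
      return bandwidth_gb_s / model_gb

  # A 70B dense model at 4-bit quantization is roughly 40 GB of weights.
  for name, bw in [("M3 Ultra (~819 GB/s)", 819),
                   ("M4 Max (~546 GB/s)", 546),
                   ("RTX 5090 (~1792 GB/s)", 1792)]:
      print(f"{name}: ~{max_tokens_per_sec(bw, 40):.0f} tok/s ceiling for a 40 GB model")

(The 5090 line is the catch: the discrete card is far faster when the model fits, but 40 GB doesn't fit in 32 GB of VRAM, which is the unified-memory argument in a nutshell.)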

It’s simply not possible to DIY a homelab inference server better than the M3+ for inference workloads, at anywhere close to its price point.

They are perfectly positioned to capitalize on the next few years of model architecture developments. No wonder they haven’t bothered working on their own foundation models… they can let the rest of the industry do their work for them, and by the time their Gemini licensing deal expires, they’ll have their pick of the best models to embed with their hardware.

> But inference is unique because its performance scales with memory throughput, and you can’t assemble that by wiring together off-the-shelf parts in a consumer form factor.

Nvidia outperforms Mac significantly on diffusion inference and many other kinds of models. It’s not as simple as the current Mac chips being entirely better for this.

  • But where are you going to find an Nvidia GPU with 128+ GB of memory at an enthusiast-compatible price?

    • You don’t need it if you use llamacpp on Windows, or if you compile it on Linux with CUDA 13 and the correct kernel HMM support, and you’re only using MoE models (which, tbh, you should be doing anyways).

      1 reply →

    • You can still buy used 3090 cards on eBay. 5 of them will give you 120GB of memory and will blow away any Mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1100 each, but at one point they were $700-800 each.

      6 replies →

    • Where are you gonna find Apple hardware with 128GB of memory at an enthusiast-compatible price?

      The cheapest Apple desktop with 128GB of memory shows up as costing $3499 for me, which isn't very "enthusiast-compatible"; it's about 3x the minimum salary in my country!

      35 replies →

  • Tell me what PC with an Nvidia GPU you can buy with the same memory and performance.

    I never liked Apple hardware, but they are now untouchable since their shift to their own silicon for home hardware.

    • Untouchable my ass. You get a PC that has an SSD glued to the motherboard, so if you run write-intensive workloads and that thing wears out, replacing it will have a significant cost. Then there’s no PCIe slot to add a decent network card if you want to run more than one of them in unison; you’re stuck with that stupid Thunderbolt 5 while InfiniBand gives 10x the network speed. As for memory bandwidth, it’s fast compared to CPUs, but any enterprise GPU dwarfs it significantly. The unified RAM is the only interesting angle.

      Apple could have taken a chunk of the enterprise market with the current AI craze if they had made an upgradable and expandable server edition based on their silicon. But no, everything has to be bolted down and restricted.

    • This has changed since Sam Altman started buying up all the chip supply, raising prices on memory, storage, and GPUs for everyone, but it used to be the case that you could build a PC that was both cheaper and faster than a Mac for LLM inference, with roughly equal performance per watt.

      You would use multiple *90-series GPUs, power-throttled. Depending on the GPU, the sweet spot is between 225 and 350W, where for LLM workloads you lose only 5-10% of performance for a ~50% drop in power consumption.
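      Setting that cap is scriptable per GPU; a minimal sketch via NVML using the pynvml bindings (assumes the nvidia-ml-py package and root privileges, and that 300 W is the right sweet spot for your particular cards):

        import pynvml  # pip install nvidia-ml-py

        pynvml.nvmlInit()
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            # NVML takes milliwatts; cap each card at 300 W
            pynvml.nvmlDeviceSetPowerManagementLimit(handle, 300_000)
        pynvml.nvmlShutdown()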

      Combined with a workstation (Xeon/Epyc) CPU with lots of PCIe lanes, you can support 6-7 such GPUs (or more, depending on available power). This will blow away the fastest Mac Studio, at comparable performance per watt.

      Again, a lot of this has changed, since GPUs and memory are so much more expensive now.

      Macs are great for a simpler all-in-one box with high memory bandwidth and middling-to-decent GPU performance, but they are (or were) absolutely not "untouchable."

      2 replies →

  • But they're pretty fast and can have loads of RAM, which would be prohibitively expensive with Nvidia.

    • A 128GB 2TB Dell Pro Max with Nvidia GB10 is about $4200, a Mac Studio with 128GB RAM and 2TB storage is $4100. So pretty comparable. I think Dell's pricing has been rocked more by the RAM shortage too.

      17 replies →

  • Do NVIDIA solutions also outperform the Apple M-series in performance per Watt?

    • No, and that's why Apple uses performance per watt, rather than the actual performance ceiling, as its metric. In actual workloads where you'd need this much power, raw performance is what matters, not PPW.

    • Probably comparable, but only against business-grade products, which is why Apple's current silicon is so remarkable at the consumer level.

      1 reply →

Jeff Geerling doing that 1.5TB cluster using 4 Mac Studios was pretty much all the proof needed to demonstrate how the Mac Pro is struggling to find any place any more.

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-stu...

  • That is proof that what's left is a workaround, just like piling Minis on racks because Apple left the server space.

    It's also why Swift nowadays has to have good Linux support, if app developers want to share code with the server.

  • But those Thunderbolt links are slower than modern PCIe. If there's actually an M5-based Mac Studio with the same Thunderbolt support, you'll be better off for LLM inference streaming read-only model weights from storage, as we've seen with recent experiments, than pushing the same amount of data over Thunderbolt. It's only when you want to go beyond local memory constraints (e.g. larger contexts) that the Thunderbolt link becomes useful.
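    For a sense of scale, the nominal peak rates involved (approximate figures; protocol overhead shaves all of these down in practice):

      # Nominal peak transfer rates in GB/s (approximate, before overhead)
      links = {
          "Thunderbolt 5 (80 Gb/s)":    80 / 8,  # ~10 GB/s
          "PCIe 4.0 x16":               32.0,
          "PCIe 5.0 x16":               64.0,
          "fast PCIe 5.0 NVMe (reads)": 14.0,    # streaming weights from disk
      }
      for name, gbs in links.items():
          print(f"{name:28} ~{gbs:.0f} GB/s")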

    • Why everyone wants to live in dongle/external cabling/dock hell is beyond me. PCIe cards are powered internally with no extra cables. They are secure. They do not move or fall off of shit. They do not require cable management or external power supplies. They do not have to talk to the CPU through a stupid USB hub or a Thunderbolt dock. Crappy USB HDMI capture on my Mac led me to running a fucking PC with slots to capture video off of a 50 foot HDMI cable, that then streamed the feed to my Mac from NDI, because it was more reliable than the elgarbo capture dongle I was using. This shit is bad. It sucks. It's twice the price and half the quality of a Blackmagic Design capture card. But, no slots, so I guess I can go get fucked.

      1 reply →

    • Wasn't streaming models from storage into limited memory a case where it was impressive that you could make the elephant dance at all?

      If you want to get usable speeds from very large models that haven't been quantized to death on local machines, RDMA over Thunderbolt enables that use case.

      Consumer PC GPUs don't have enough RAM, enterprise GPUs that can handle the load very well are obscenely expensive, Strix Halo tops out at 128 Gigs of RAM and is limited on Thunderbolt ports.

      3 replies →

  • The proposition of a Mac Pro in the Apple Silicon world wasn't necessarily about performance; it was about the existence of the PCIe slots. I don't think AI becoming a workload for pro Macs means the Mac Pro doesn't have a place. People who were using Mac Pros for audio or video capture didn't stop doing that media work and switch to AI as a profession. That market just wasn't big enough to sustain the Mac Pro in the first place, and Apple has finally acknowledged that fact.

    • I had a U-Audio PCI card in a Mac Pro during the Intel era of Macs. It was a chip to run their software plugins and the plugins are top of the line. I have a U-Audio box that runs over Thunderbolt now. I know there are people who need device slots, but it's vanishingly few. I'm disappointed that this category of machine is going away, but it stopped being for me in the Apple Silicon era.

    • So many peripherals now come in external boxes that communicate _incredibly quickly_ over Thunderbolt 4/5 that the need for PCIe is marginal, while the cost to support it is significant.

  • Wow, spend $40k to get the same tokens/second in Qwen as you would on a 3090.

    I have a feeling that Mac fans obsess more about being able to run large models at unusably slow speeds instead of actually using said models for anything.

> Apple really stumbled into making the perfect hardware for home inference machines

For LLMs. For inference with other kinds of models, where the amount of compute needed relative to the amount of data transfer is higher, Apple is less ideal, and systems with lower memory bandwidth but more FLOPS shine. And if things like Google’s TurboQuant work out for efficient kv-cache quantization, Apple could lose a lot of that edge for LLM inference too, since that would reduce the amount of data shuffling relative to compute for LLM inference.

  • Or just mean that you could run a 5x bigger model on Apple than before.

    • Well, since it's the kv-cache that TurboQuant optimizes, it means a five times bigger context fits into RAM, all other things being equal, not a five times bigger model. But, sure, with any given context size and the same RAM available, you can instead fit a bigger model, which also takes more compute to get the same performance.

      Anything that increases the compute necessary to fully utilize RAM bandwidth in optimal LLM serving weakens Apple's advantage there.
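      For scale: kv-cache per token is roughly 2 x layers x kv_heads x head_dim x bytes per value (the 2 covering K and V), so going from fp16 to ~4-bit cuts it about 4x. A sketch with illustrative 70B-class dimensions, not any specific model's exact config:

        def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_val):
            return 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V

        # Illustrative dims: 80 layers, 8 kv heads (GQA), head_dim 128
        fp16 = kv_bytes_per_token(80, 8, 128, 2)
        q4 = kv_bytes_per_token(80, 8, 128, 0.5)
        ctx = 128_000
        print(f"fp16 kv-cache at 128k context: ~{fp16 * ctx / 1e9:.0f} GB")  # ~42 GB
        print(f"4-bit kv-cache at 128k context: ~{q4 * ctx / 1e9:.0f} GB")   # ~10 GB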

DGX workstations: expensive, but they allow PCIe cards as well.

https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...

  • It's hilarious that not a single one of these has pricing listed anywhere public.

    I don't think they expect anyone to actually buy these.

    Most companies looking to buy these for developers would ideally have multiple people share one machine, and that sort of arrangement works much more naturally with a managed cloud machine than with the tower format presented here.

    Confirming my hypothesis, this category of devices is more or less absent from the used market. The only DGX workstation on eBay has a GPU from 2017, several generations old.

    • Nvidia doesn’t list prices because they don’t sell the machines themselves. If you click through each of those links, the prices are listed on the distributor’s website. For example the Dell Pro Max with GB10 is $4,194.34 and you can even click “Add to Cart.”

      3 replies →

    • 'Important' people in organizations get them. They either ask for them, or the team that manages the shared GPU resources gets tired of their shit and they just give them one.

      1 reply →

> ...making the perfect hardware for home inference machines.

I really don't get why anybody would want that. What's the use case there?

If someone doesn't care about privacy, they can use the for-profit services, because those are basically losing money trying to corner the market.

If they care about privacy, they can rent cloud instances to set up, run, and shut down, and it will be both cheaper and faster (if they can afford it), with no upfront cost per project. This can be done with a lot of scaffolding (e.g. Mistral, HuggingFace) or without (e.g. AWS/Azure/GoogleCloud, etc.). The point being that you do NOT purchase the GPU or even dedicated hardware (e.g. Google TPU), but rather rent what you actually need; when the next gen is out, you're not stuck with the "old" gen.

So... what use case is left? Somebody who is technical, very privacy-conscious, AND wants to do it offline despite having 5G or satellite connectivity pretty much anywhere?

I honestly don't get who that's for (and I did try dozens of local models, so I'm actually curious).

PS: FWIW https://pricepertoken.com might help, but I'm not sure it shows the infrastructure each provider relies on, for comparison's sake. If you have a better link, please share it back.

  • > If they care about privacy, they can rent cloud instances to set up, run, and shut down, and it will be both cheaper and faster (if they can afford it), with no upfront cost per project. This can be done with a lot of scaffolding (e.g. Mistral, HuggingFace) or without (e.g. AWS/Azure/GoogleCloud, etc.).

    I'm a somewhat tech-heavy guy (I compile my own kernel, use online hosting, etc.).

    Reading your comment doesn't sound appealing at all. I do almost no cloud stuff. I don't know which provider to choose. I have to compare costs. How can I trust they won't peek at my data (no, a Privacy Policy is not enough - I'd need encryption with only me having the key). What do I do if they suddenly jack up the rates or go out of business? I suddenly need a backup strategy as well. And repeat the whole painful loop.

    I'll lose a lot more time figuring this out than with a Mac Studio. I'll probably lose money too. I'll rent from one provider, get stuck, and having a busy life, sit on it a month or two before I find a fix (paying money for nothing). At least if I use the Mac Studio as my primary machine, I don't have to worry about money going to waste because I'm actually utilizing it.

    And chances are, a lot of the data I'll use it with (e.g. mail) is sitting on the same machine anyway. Getting something on the cloud to work with it is yet-another-pain.

    • To your second issue/question: all the big cloud providers have offered CMEK services/features for many years now.

    • > suddenly jack up the rates or go out of business?

      There is basically no lock-in. You don't even "move" your image; your data is basically some "context" or a history of prompts, which probably fits on a floppy disk (not even being sarcastic). So if you know the basics of containerization (Docker, Podman, etc.), which most likely the cloud provider even takes care of, then it takes literally minutes to switch from one to another. It's really not more complex than setting up a PHP server; the only difference is the hardware you run on, and that's basically a dropdown button on a Web interface (if you don't want scripts for that too), then selecting the right image (basically NVIDIA support).

      Consequently, even if that were to happen (which I have NEVER seen! At worst it's like a 15% increase after years), it would actually not matter to you. It's also very unlikely to happen based on the investment poured into the "industry". Basically everybody is trying to get "you" as a customer to rely on their stack.

      ... but OK, let's imagine that's not appealing to you. Have you not done the comparison of what the price of a Mac Studio (or whatever hardware) could actually buy otherwise?

      4 replies →

  • I think the main use case is home automation. You don't want details of your home setup leaking out.

  • Genuine question: If I were to fine-tune a model with 10 years of business data in a competitive space, would you feel safe with cloud training?

    • If you already have those 10 years of business data on Microsoft or Google services or their respective clouds, are you feeling safe?

    • I'm not a lawyer, but technically most if not all cloud providers, specific to AI ("neo-cloud") or not, provide customer-managed encryption keys (CMEK), as someone else pointed out.

      That being said, if I were in such a situation, and if somehow the guarantees weren't enough, then I'd definitely expect to have the budget to build my own data center with GB300s or TPUs. I can't imagine running that on a Mac Studio.

    • People store that data in databases in the same data centre, so it's really the same level of trust: you need your provider to adhere to its promise not to train on your data. Trust and lawyers.

I'm not a big fan of reducing computing as a whole to just inference. Apple has done quite a bit besides that, and it deserves credit. Mac Pro disappearing from the product line is a testament to it: their compact solutions can cover all needs, not just local inference, to a degree that an expandable tower is not required at all.

  • Their compact solution doesn't cover all needs; they just decided that they didn't care about some of those needs. The Intel Mac Pro was the last Apple offering with high-end GPU capabilities. That's now a market segment they just aren't supporting at all. They didn't figure out how to do it compactly, they just abandoned it wholesale.

    Similarly, if your use case depends on a whole lot of fast storage (e.g. the 4x NVMe to PCIe x16 bifurcation boards), well, that's also now something Apple just doesn't support. They didn't figure out something else. They didn't do super innovative engineering for it. They just walked away from those markets completely, which they're allowed to do of course. It's just not exactly inspiring or "deserves credit" worthy.

  • > Mac Pro disappearing from the product line is a testament to it

    Apple removing or adding something from their product line means nothing; for all we know, they have a new version ready to launch next month, or whatever. Unless you work at Apple and/or have internal knowledge, this is all just guessing, not a "testament" to anything.

CUDA 13 on Linux solves the unified memory problem via HMM and llamacpp. It’s an absolute pain to get running without disabling Secure Boot, but that should be remedied literally next month with the release of Ubuntu 26.04 LTS. Canonical is incorporating signed versions of both the new Nvidia open driver and CUDA into its own repo system, so look out for that. Signed Nvidia modules do already exist right now for RHEL and AlmaLinux, but those aren’t exactly the best desktop OSes.

But yeah, right now Apple actually has price <-> performance captured a lot if you’re buying a new computer just in general.

To me there is a fundamental difference. Even if PC hardware costs slightly more (right now because of the RAM situation; Apple, producing its chips in house, can of course get better deals), it's something that is more worth investing in.

Maybe you spend $1000 more for a PC of comparable performance; well, tomorrow you need more power, you change or add another GPU, add more RAM, add another SSD. You can keep upgrading a workstation for years, paying a small cost for each bump in performance.

An Apple machine is basically throwaway: no component inside can be upgraded. You need more RAM? Throw it away and buy a new one. You want a new GPU technology? You have to change the whole thing. And if something inside breaks? You of course throw away the whole computer, since everything is soldered on the mainboard.

There is then the software issue: with Apple devices you are forced to use macOS, which kind of sucks, especially for server usage. True, nowadays you can install Linux on it, but the GPU is not that well supported, thus you lose all the benefits. You're stuck with an OS that sucks, while in the PC market you have plenty of OS choices: Windows, a million Linux distributions, etc. If I need a workstation to train LLMs, why do I care about an OS with a GUI? It's only a waste of resources; I just need a thing that runs Linux that I can SSH into. I also don't get the benefit of using containers, Docker, etc.

Macs suck even on the hardware side from a server point of view: for example, it's not possible to rack-mount them, it's not possible to have redundant PSUs, they don't offer remote KVM capability, etc.

  • "Upgrades" havent been a thing for nearly a decade. By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade every thing, and its cheap to do so.

    It isnt 2005 any more where RAM/CPU/etc. progress benefits from upgrading every 6mo. It's closer to 6yr to really notice

      > By the time you want to upgrade a machine part (c. 5yr+ for modern machines), you'd want to upgrade everything,

      That's only the case for CPU/MB/RAM, because the interfaces are tightly coupled (you want to upgrade your CPU, but the new one uses an AM5 socket so you need to upgrade the motherboard, which only works with DDR5 so you need to upgrade your RAM). For other parts, a "Ship of Theseus" approach is often worth it: you don't need to replace your 2TB NVMe M.2 storage just because you wanted a faster CPU, you can keep the same GPU since it's all PCIe, and the SATA DVD drive you've carried over since the early 2000s still works the same.

      1 reply →

    • That's news to me. I see Mac Minis with external drives plugged in constantly; I bet those people would appreciate user-serviceable storage. I doubt they bought an external drive because they wanted to throw away the whole computer.

      2 replies →

  • > You need more RAM? Throw it away and buy a new one.

    Or sell it, which is much easier to do with Macs because they're known quantities and not "Acer Onyx X321 Q-series Ultra".

    > There is then the software issue: with Apple devices you are forced to use macOS, which kind of sucks, especially for server usage

    That's a fair point. Apple would get a ton of goodwill if they released enough documentation to let Asahi keep up with new hardware. I can't imagine it would harm their ecosystem; the people who would actually run Linux are either not using Macs at all, or users like me who treat them as Unix workstations and ignore their lock-in attempts.

  • I think most of that is really opinion and personal experience. No doubt it's not designed or built truly for racks, but folks have been making rack mounts for Mac Minis since they first came out.

    On the upgrade path, I don't think upgrades are truly a thing these days. Aside from storage, for most components, by the time you get to whatever your next cycle is it's usually best/easiest to refresh the whole system, unless you underbought the first time around.

  • > Macs suck even on the hardware side from a server point of view: for example, it's not possible to rack-mount them, it's not possible to have redundant PSUs, they don't offer remote KVM capability, etc.

    https://atp.fm/683

  • As others have said, that's just not the reality of a modern work machine. If I need a new GPU or more RAM, I'm positive I need everything else upgraded too

    • > with Apple devices you are forced to use macOS, which kind of sucks, especially for server usage

      You can just install Linux?

  • > You're stuck with an OS that sucks, while in the PC market you have plenty of OS choices: Windows, a million Linux distributions

    Windows is 10x more enshittified than OSX

    > An Apple machine is basically throwaway: no component inside can be upgraded. You need more RAM? Throw it away and buy a new one.

    Tell that to all the people rocking 5-10 year old MacBooks that still run great.

Agreed. I’m planning on selling my 512GB M3 Ultra Studio in the next week or so (I just wrenched my back so I’m on bed-rest for the next few days) with an eye to funding the M5 Ultra Studio when it’s announced at WWDC.

I can live without the RAM for a couple of months to get a good price for it, especially since Apple don’t sell that model (with the RAM) any more.

  • Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?

    Wish you a speedy recovery for your back!

    • > Just out of curiosity, where do you think is the best place to sell a machine like that with the lowest risk of being scammed, while still getting the best possible price?

      There are none currently on eBay.co.uk, so I'm going to try there. I'll also try some of the reddit UK-specific groups.

      As far as not being scammed - it's a really high value one-off sale, so it'll either be local pickup (and cash / bank-transfer at the time, which happens in seconds in the UK) or escrow.com (for non-eBay) with the buyer paying all the fees etc.

      I'd prefer local pickup because then I have the money, the buyer can see it working, verify everything to their satisfaction etc. etc.

      > Wish you a speedy recovery for your back!

      Thank you :) It is a little better today. Sitting down is now tolerable for short periods... :)

      2 replies →

As to better or cheaper homelab: depends on the build. AMD AI Max builds do exist, and they also use unified memory. I could argue the competition was, for a long time, selling much more affordable RAM, so you could get a better build outside Apple Silicon.

The typical inference workloads have moved quite a bit in the last six months or so.

Your point would have been largely correct in the first half of 2025.

Now, you're going to have a much better experience with a couple of Nvidia GPUs.

This is for two reasons: reasoning models require a pretty high number of tokens per second to do anything useful, and we are seeing small quantized and distilled reasoning models working almost as well as the ones needing terabytes of memory.

The interesting question is whether they'll lean into it intentionally (better tooling, more ML-focused APIs) or just keep treating it as a side effect of their silicon design

  • I think we’ll see a much more robust ecosystem develop around MLX now that agentic coding has lowered the barrier to porting and maintaining libraries for it.

Apple abandoned the pro market long before ever releasing the current iteration of the Mac Pro. I doubt they care about getting it back, considering it's a smaller niche of consumers and probably significantly more investment on the software side.

At best we'll probably get a chassis to awkwardly daisy-chain a bunch of Mac Studios together.

For LLMs and other purely memory-bound workloads, yes; but for e.g. diffusion models, their FP/SIMD performance is lacking.

The new M chips beat basically any PC on video editing. Their ProRes accelerator hardware is so good the competition can't even keep up.

Just a reminder that the old Intel Mac Pro could handle 1.5TB of RAM ... today's Mac Studio can only handle 0.25TB.

Seems odd that a computer from a decade ago could take over 1TB more RAM than anything we can buy today from Apple.

> home inference machines.

The market for this use case is tiny

  • For now. In a few years it will be part of everyday life, because people will see Apple users enjoying it without thinking about it. You won’t consider it a “home inference machine,” just a laptop with more capabilities than any other vendor offers without a cloud subscription.

    • The average person self-hosts literally nothing; why would it be different for inference, which benefits severely from economies of scale and efficient 24/7 utilization?

I do love the Mac Studio. I had a 2019 Mac Pro, the Intel cheesegrater, but my home office upstairs became unpleasant with it pushing out 300W+. I replaced it with the M2 Ultra Studio for a fraction of the heat output (though I did have to buy an OWC 4xNVMe bay).

> I bet there’s gonna be a banger of a Mac Studio announced in June. Apple really stumbled into making the perfect hardware for home inference machines.

This I'm not actually as sure about. The current Studio offerings have done away with the 512GB memory option. I understand the RAM situation, but they didn't change pricing, they just discontinued it. So I'm curious to see what the next Studio is like. I'd almost love to see a Studio with even one PCIe slot: make it a bit taller, add a slide-out cover...

Framework offers the Ryzen AI Max with ̶1̶9̶6̶G̶B̶ 128GB of unified RAM for $2,699.

That's a pretty good deal I would think

https://frame.work/de/de/products/desktop-diy-amd-aimax300/c...

  • The Framework desktop is quite cool, but those Ryzen Max CPUs are still a pretty poor competitor to Apple's chips if what you care about is running an LLM. Ryzen Max tops out at 256 GB/s of memory bandwidth, whereas an M4 Max can hit 560 GB/s of bandwidth.

    So even if the model fits in the memory buffer on the Ryzen Max, you're still going to hit something like half the tokens/second just because the GPU will be sitting around waiting for data.

    Personally, I'd rather have the Framework machine, but if running local LLMs is your main goal, the offerings from Apple are very compelling, even when you adjust for the higher price on the Apple machine.

  • 128GB is the max RAM that the current Strix Halo supports, with ~250GB/s of bandwidth. The Mac Studio is 256GB max with ~800GB/s of memory bandwidth. They are in different categories of performance, and even bandwidth per dollar is worse. (~$2700 for the Framework Desktop vs $7500 for a Mac Studio M3 Ultra)

Still, running 2 to 4 5090s will beat anything Apple has to offer for both inference and training.

  • That won’t work for the home hobbyist: 2.4kW of GPUs alone, plus a 350W Threadripper Pro with enough PCIe lanes to feed them. You’re looking at close to twice the capacity of a typical US household circuit just to run the machine under load.

    A cluster of 4 of Apple’s M3 Ultra Mac Studios, by comparison, will consume nearly 1100W under load.

> Apple really stumbled into making the perfect hardware for home inference machines

Apple are winning a small battle in a market that they aren’t very good at. If you compare the performance of a 3090 and above vs. any Apple hardware, you would be insane to go with the Apple hardware.

When I hear someone say this it’s akin to hearing someone say Macs are good for gaming. It’s such a whiplash from what I know to be reality.

Or another jarring statement - Sam Altman saying Mario has an amazing story in that interview with Elon Musk. Mario has basically the minimum possible story to get you to move the analogue sticks. Few games have less story than Mario. Yet Sam called it amazing.

It’s a statement from someone who just doesn’t even understand the first thing about what they are talking about.

Sorry for the mini rant. I just keep hearing this apple thing over and over and it’s nonsense.

I don't think Apple just stumbled into it, and while I totally agree that Apple is killing it with their unified memory, I think we're going to see a pivot from NVidia and AMD. The biggest reason, I think, is: OpenAI has committed to an enormous amount of capex it simply cannot afford. It does not have the lead it once did, and most end users simply do not care. There are no network effects. Anthropic at this point has completely consumed, as far as I can tell, the developer market, the one market that is actually passionate about AI. That's largely due to a huge advantage of the developer space: end users cannot tell if an "AI" coded something or a human did. That's not true for almost every other application of AI at this point.

If the OpenAI domino falls, and I'd be happy to admit if I'm wrong, we're going to see a near-catastrophic drop in RAM prices and in the hyperscalers' demand to, well... scale. That massive drop will be completely and utterly OpenAI's fault for attempting to bite off more than it can chew. In order to shore up demand, we'll see NVidia and AMD start selling directly to consumers. We, developers, are consumers and drive demand at the enterprises we work for based on what keeps us both engaged and productive... the end result being: the ol' profit flywheel spinning.

Both NVidia and AMD are capable of building GPUs that absolutely wreck Apple's best. A huge reason for this is that Apple needs unified memory to keep their money maker (laptops) profitable and performant; and while it helps their profitability, it also forces them into less performant solutions. If NVidia dropped a 128GB GPU with GDDR7 at $4k, absolutely no one would be looking for a Mac for inference. My 5090 is unbelievably fast at inference even if it can't load gigantic models, and quite frankly the 6-bit quantized versions of Qwen 3.5 are fantastic, but if it could load larger open-weight models I wouldn't even bother checking Apple's pricing page.

tldr; competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.

  • > A huge reason for this is that Apple needs unified memory to keep their money maker (laptops) profitable and performant

    None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.

    The "weird" thing Apple is doing is using normal DDR5 with a wider-than-normal memory bus to feed their GPUs instead of using GDDR or HBM. The disadvantage of this is that it has less memory bandwidth than GDDR for the same width of the memory bus. The advantage is that normal RAM costs less than GDDR. Combined with the discrete GPU market using "amount of VRAM" as the big feature for market segmentation, a Mac with >32GB of "VRAM" ended up being interesting even if it only had half as much memory bandwidth, because it still had more than a typical PC iGPU.

    The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.
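    The arithmetic behind the wide-bus approach, for reference (bus widths and data rates are the commonly cited figures, so treat the results as approximate):

      def bandwidth_gb_s(bus_bits: int, mega_transfers_s: int) -> float:
          return bus_bits / 8 * mega_transfers_s / 1000  # bytes per transfer x MT/s

      print(bandwidth_gb_s(128, 5600))   # dual-channel DDR5-5600 PC:  ~90 GB/s
      print(bandwidth_gb_s(512, 8533))   # M4 Max, LPDDR5X-8533:      ~546 GB/s
      print(bandwidth_gb_s(1024, 6400))  # M3 Ultra, LPDDR5-6400:     ~819 GB/s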

    • > None of the things people care about really get much out of "unified memory". GPUs need a lot of memory bandwidth, but CPUs generally don't and it's rare to find something which is memory bandwidth bound on a CPU that doesn't run better on a GPU to begin with. Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.

      the bottleneck in lots of database workloads is memory bandwidth. for example, hash join performance with a build side table that doesn't fit in L2 cache. if you analyze this workload with perf, assuming you have a well written hash join implementation, you will see something like 0.1 instructions per cycle, and the memory bandwidth will be completely maxed out.

      similarly, while there have been some attempts at GPU accelerated databases, they have mostly failed exactly because the cost of moving data from the CPU to the GPU is too high to be worth it.

      i wish aws and the other cloud providers would offer arm servers with apple m-series levels of memory bandwidth per core, it would be a game changer for analytical databases. i also wish they would offer local NVMe drives with reasonable bandwidth - the current offerings are terrible (https://databasearchitects.blogspot.com/2024/02/ssds-have-be...)
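      a toy stand-in for the probe side of that pattern, if you want to see it on your own machine (a numpy gather over a table far larger than cache; each 8-byte probe drags in a full 64-byte cache line):

        import time
        import numpy as np

        n = 1 << 27                                  # ~1 GiB of int64, well past L3
        table = np.arange(n, dtype=np.int64)
        idx = np.random.randint(0, n, size=1 << 24)  # 16M random probes

        t0 = time.perf_counter()
        total = int(table[idx].sum())                # gather ~ hash-probe access
        dt = time.perf_counter() - t0
        print(f"~{idx.size * 64 / dt / 1e9:.1f} GB/s of cache-line traffic")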

      1 reply →

    • > The sad part is that DDR5 is the thing that doesn't need to be soldered, unlike GDDR. But then Apple solders it anyway.

      Apple needs to solder it because they are attaching it directly to the SoC to minimize lead length, and that is part of how they are able to get that bandwidth.

      1 reply →

    • Except they don't use DDR5. LPDDR5 is always soldered. LPDDR5 requires short point-to-point connections to give you good SI at high speeds and low voltages. To get the same with DDR5 DIMMs, you'd have something physically much bigger, with way worse SI, with higher power, and with higher latency. That would be a much worse solution. GDDR is much higher power, the solution would end up bigger. Plus it's useless for system memory so now you need two memory types. LPDDR5 is the only sensible choice.

      3 replies →

    • > Not having to copy data between the CPU and GPU is nice on paper but again there isn't much in the way of workloads where that was a significant bottleneck.

      Isn't that also because that's the world we have optimized workloads for?

      If the common hardware had unified memory, software would have exploited that, I imagine. Hardware and software are in a co-evolutionary loop.

      1 reply →

  • > tldr; competition is as stiff as it is vicious. Apple's "lead" in inference exists only because NVidia and AMD are raking in cash selling to hyperscalers. If that cash cow goes tits up, there's no reason to assume NVidia and AMD won't definitively pull the rug out from under Apple.

    These companies always try to preserve price segmentation, so I don’t have high hopes they’d actually do that. Consumer machines still get artificially held back on basic things like ECC memory, after all...

  • No one cares about Metal in that space, plus CUDA has had unified memory for a while.

    https://docs.nvidia.com/cuda/cuda-programming-guide/04-speci...

    Can we also stop giving Apple some prize for unified memory?

    It was the way of doing graphics programming on home computers, consoles and arcades, before dedicated 3D cards became a thing on PC and UNIX workstations.

    • Can we please stop treating this like some 2000s Mac vs PC flame war where you feel the need to go full whataboutism whenever anyone acknowledges any positive attribute of an Apple product? If you actually read back over the comments you’re replying to, you’ll see that you’re not actually correcting anything that anyone actually said. This shit is so tiring.

      1 reply →