
Comment by whywhywhywhy

10 hours ago

> But inference is unique because its performance scales with high memory throughput, and you can’t assemble that by wiring together off the shelf parts in a consumer form factor.

Nvidia significantly outperforms Macs on diffusion inference and many other workloads. It's not as simple as the current Mac chips being entirely better for this.

But where are you going to find an Nvidia GPU with 128+ GB of memory at an enthusiast-compatible price?

  • You don’t need it if you use llamacpp on Windows, or if you compile it on Linux with CUDA 13 and the correct kernel HMM support, and you’re only using MoE models (which, tbh, you should be doing anyways).

    • What does MoE have to do with it? Aside from Flash-MoE, which supports exactly one model and only on macOS, you still need to load the entire model into memory. You also don't know which experts are going to be activated, so it's not like you can predict which ones need to be loaded.
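The unpredictability of expert routing mentioned above can be sketched with a toy top-k gate (all names and sizes here are hypothetical, not any real model's):

```python
import random

# Toy mixture-of-experts gate: each token selects its own top-k experts,
# so which weights get touched is only known at inference time.
NUM_EXPERTS = 64
TOP_K = 4

def route(token_id: int) -> list[int]:
    # Stand-in for a learned gating network: scores depend on the token.
    rng = random.Random(token_id)
    scores = [rng.random() for _ in range(NUM_EXPERTS)]
    return sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]

# Each token only uses TOP_K experts, but across a stream of tokens
# nearly every expert gets hit, so all of them must stay resident in memory.
used: set[int] = set()
for tok in range(100):
    used.update(route(tok))
print(f"experts touched by 100 tokens: {len(used)} of {NUM_EXPERTS}")
```

This is why MoE lowers compute per token but not the memory footprint: you can't page experts in on demand without stalling decode.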

  • The Nvidia DGX Spark is exactly this and in the same price and performance bracket.

    • Sadly, its memory bandwidth is abysmal compared to Apple's chips: 273 GB/s vs 614 GB/s on the M5 Max for a similar price. Even though its fp4 compute is faster, that doesn't help with all the decode-heavy agentic workflows.
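The bandwidth point follows from a back-of-envelope model: autoregressive decode is memory-bound, so tokens/s is capped by bandwidth divided by the bytes read per token (the model size below is illustrative, and real throughput lands below this ceiling):

```python
# Rough ceiling for memory-bound decode:
#   tokens/sec <= memory_bandwidth / bytes_read_per_token
# For a dense model, bytes per token is roughly the size of the weights.

def max_decode_tps(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 70.0  # e.g. a ~70B-parameter model at 8-bit, purely illustrative
print(f"273 GB/s (DGX Spark): {max_decode_tps(273, MODEL_GB):.1f} tok/s ceiling")
print(f"614 GB/s (M5 Max):    {max_decode_tps(614, MODEL_GB):.1f} tok/s ceiling")
```

Prefill and batched workloads are compute-bound, which is where faster fp4 math would show up; single-stream agentic decode is not.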

  • You can still buy used 3090 cards on eBay. Five of them will give you 120GB of memory and will blow away any Mac in terms of performance on LLM workloads. They have gone up in price lately and are now about $1100 each, but at one point they were $700-800 each.

    • I don't see how 5x 3090s is a better option than an M3 Ultra Mac Studio.

      The Mac will just work for models as large as 100B, and can go higher with quantized models. And its power draw will be a fifth of the 3090 setup's.

      You can certainly daisy-chain several 3090s together, but it doesn't work seamlessly.


  • Where are you gonna find Apple hardware with 128GB of memory at an enthusiast-compatible price?

    The cheapest Apple desktop with 128GB of memory shows up as $3499 for me, which isn't very "enthusiast-compatible"; it's about 3x the minimum salary in my country!

    • Apple is not catering to minimum salaries in poor countries. Does this really need to be explained?

      $3499 is definitely enthusiast compatible. That's beefy gaming PC tier, which is possibly the canonical example of an enthusiast market.

      This isn't tens of thousands of dollars for top tier Nvidia chips we're talking about.


    • I spent around that on my current personal desktop: 9950X, 2x48GB DDR5-6000, RX 9070 XT, 4TB Gen 5 NVMe + 4TB Gen 4 NVMe. I could have cut the CPU to a 9800X3D and the RAM to 32GB with a different GPU if my needs/usage were different. I'm running Linux and don't game much.

      That said, a higher-end gaming setup is going to cost that much and is absolutely in the enthusiast realm. "Enthusiast" doesn't mean compatible with minimum wage.

    • > it's about 3x the minimum salary in my country!

      Enthusiast compute hardware doesn't cater to the people on the minimum salary in any country, let alone developing nations. When Ferrari makes a car they don't ask themselves if people on minimum salary will be able to afford them.

      I'm in one of the two poorest EU member states, and Apple and Microsoft (Xbox) don't even bother to have a direct-to-customer store presence here; you buy their products from third-party retailers.

      Why? Probably because their metrics show people here are too poor to afford their products en masse, so it isn't worth operating a dedicated sales entity. Plenty of people here do own top-of-the-line MacBooks, but that's the wealthy enthusiast niche, and it's still a niche at the volumes they (wish to) operate at. Why do you think Apple launched the Mac Neo?


Tell me what PC with an Nvidia GPU you can buy with the same memory and performance.

I never liked Apple hardware, but they are now untouchable since their shift to their own silicon for home hardware.

  • Untouchable, my ass. You get a PC with an SSD soldered to the motherboard, so if you run write-intensive workloads and it wears out, replacing it carries a significant cost. Then there's no PCIe slot for a decent network card if you want to run more than one of them in unison; you're stuck with that stupid Thunderbolt 5, while InfiniBand gives 10x the network speed. As for memory bandwidth, it's fast compared to CPUs, but any enterprise GPU dwarfs it significantly. The unified RAM is the only interesting angle.

    Apple could have taken a chunk of the enterprise market with this AI craze if they had made an upgradable, expandable server edition based on their silicon. But no, everything has to be bolted down and restricted.

  • > Tell me what PC with an Nvidia GPU you can buy with the same memory and performance.

    And power consumption!

    The performance per watt of Apple is unmatched.

    • This needs to be sold as the big-ticket item for low-level devs. Their chips are some of the most power-efficient on the market right now.

      Hoping they release a blade server version somehow.


    • I've owned some beefy computers in the past, and this tiny little M4 Mac mini on my desk easily blows them all out of the water. It's crazy.

  • This has changed since Sam Altman started buying up all the chip supply, raising prices on memory, storage, and GPUs for everyone, but it used to be the case that you could build a PC that was both cheaper and faster than a Mac for LLM inference, with roughly equal performance per watt.

    You would use multiple *90-series GPUs, throttled down in power. Depending on the GPU, the sweet spot is 225-350W, where for LLM workloads you lose only 5-10% of performance for a ~50% drop in power consumption.

    Combined with a workstation (Xeon/Epyc) CPU with lots of PCIe lanes, you can support 6-7 such GPUs (or more, depending on available power). This will blow away the fastest Mac Studio at comparable performance per watt.

    Again, a lot of this has changed, since GPUs and memory are so much more expensive now.

    Macs are great for a simpler all in one box with high memory bandwidth and middling-to-decent GPU performance, but they are (or were) absolutely not "untouchable."
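The power throttling described above is a one-liner per card with `nvidia-smi`; a sketch, assuming a 5-GPU box and a 300 W target (exact sweet spots vary by card, and this needs root plus NVIDIA hardware):

```shell
# Enable persistence mode so the limit survives between processes,
# then cap each card's board power. LLM decode is bandwidth-bound,
# so the cap usually costs only a few percent of throughput.
sudo nvidia-smi -pm 1
for i in 0 1 2 3 4; do
  sudo nvidia-smi -i "$i" -pl 300   # power limit in watts for GPU index $i
done
```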

But they're pretty fast and can have loads of RAM, which would be prohibitively expensive with Nvidia.

  • A 128GB/2TB Dell Pro Max with the Nvidia GB10 is about $4200; a Mac Studio with 128GB RAM and 2TB storage is $4100. So pretty comparable. I think Dell's pricing has been rocked more by the RAM shortage, too.

Do NVIDIA solutions also outperform the Apple M-series in performance per Watt?

  • No, which is why Apple uses performance per watt rather than the actual performance ceiling as its metric. In actual workloads where you'd need this power, raw performance is what matters, not PPW.

Nvidia isn't selling one-off home computers, AFAIK. But yes, in terms of datacenter cloud usage, Nvidia delivers.