Comment by jmyeet

2 hours ago

I didn't see this in the article but elsewhere I've seen the memory bandwidth quoted as 600GB/s [1]. For comparison:

- 5090/6000 Pro: 1792GB/s

- 5080: 960GB/s

- 5070Ti: 896GB/s

- M3 Ultra: 819GB/s

- DGX Spark: 273GB/s (less than an M5 Pro at 307GB/s)

Memory bandwidth isn't everything, but it will cap inference rate pretty heavily. Also, the M3 Ultra is for an almost 2-year-old Mac Studio. It's widely expected that it'll be refreshed in Q3 with a likely M5 or M4 Ultra with >1000GB/s. I really hope Apple realizes what a market opportunity it has here.
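To put rough numbers on that cap, here's a back-of-the-envelope sketch. It assumes decoding is purely bandwidth-bound and that every generated token reads all the weights once, so the ceiling is bandwidth divided by model size. The bandwidth figures are the ones quoted above, the 27B 4-bit model is just an illustrative choice, and the function names are made up for this example. Real throughput will be lower once KV cache reads, compute, and overhead are counted.

```python
# Bandwidths in GB/s as quoted in the list above (not independently verified).
BANDWIDTH_GBPS = {
    "5090 / 6000 Pro": 1792,
    "5080": 960,
    "M3 Ultra": 819,
    "DGX Spark": 273,
}


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameter count times bytes per parameter."""
    return params_billion * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on tokens/s, assuming each token reads all weights once."""
    return bandwidth_gbps / model_gb


if __name__ == "__main__":
    # Hypothetical workload: a 27B dense model quantized to 4 bits per weight.
    model_gb = weights_gb(27, 4)
    for name, bandwidth in BANDWIDTH_GBPS.items():
        ceiling = decode_ceiling_tok_s(bandwidth, model_gb)
        print(f"{name}: at most ~{ceiling:.0f} tok/s")
```

Treat the output as a ceiling for comparing hardware, not a benchmark prediction.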

The above shows just how good a value the 5090 really is. It's basically an RTX 6000 Pro (a ~$10k card) with less RAM and ~12% fewer CUDA units, for 20-30% of the price. This also demonstrates how NVidia uses VRAM for market segmentation. As an aside, the true data center cards (e.g. B100, H100) use HBM memory at ~3.2TB/s.

[1]: https://wccftech.com/nvidia-enters-pc-space-with-rtx-spark/

wmf 1 hour ago

Spark memory bandwidth is ~300 GB/s. Internal bandwidth is 600 GB/s but that doesn't matter.

dist-epoch 30 minutes ago

128 GB at 600 GB/s for this versus 32 GB at 1800 GB/s for 5090.

This is much better value than 5090, you can run much bigger models.
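A quick way to sanity-check the "bigger models" point is to see what fits. This is only a sketch: it counts weight size (parameters times bits per weight) plus a flat overhead for KV cache and runtime. The 4 GB overhead is an arbitrary placeholder, the model sizes are illustrative, and the function names are made up for this example.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameter count times bytes per parameter."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus an assumed flat overhead fit in memory_gb.

    overhead_gb is a placeholder for KV cache and runtime buffers; the real
    figure depends on context length and the inference stack.
    """
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= memory_gb


if __name__ == "__main__":
    # Memory sizes from the comment above: 32 GB (5090) versus 128 GB.
    for params, bits in [(27, 4), (27, 16), (70, 4), (70, 8)]:
        size = weights_gb(params, bits)
        print(f"{params}B @ {bits}-bit (~{size:.1f} GB): "
              f"32 GB={fits(params, bits, 32)}, 128 GB={fits(params, bits, 128)}")
```

Under these assumptions, a 27B model at 4-bit fits in either, while a 70B model at 4-bit only fits in the 128 GB machine.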

jmyeet 5 minutes ago

Here's a pretty detailed breakdown of this [1]:

> tl;dr - For software development, Qwen3.6 27B, 5090 gives you ~3x speed over M5 Max, letting you plow through code, while M5 Max gives you ~4x memory, letting you use higher quantization and bigger context. Which would you choose and why?

I've read a number of things, and the consensus seems to be that yes, you can run a larger model and/or have more context with a 128GB+ Mac, but the performance gap is still massive, and with current hardware we're still talking about inference rates where the difference matters. By this I mean there's a big difference between 10 tok/s and 30. Once we get to a point where it's 100 vs 300, it won't be as big of a deal, a bit like FPS in games.

[1]: https://www.reddit.com/r/LocalLLaMA/comments/1t5v2gr/need_ad...

MrBuddyCasino 2 hours ago

Yeah, and also the quoted 1 PF is only with sparsity (only half that for dense, if that), and the DGX had serious hardware issues: https://x.com/ID_AA_Carmack/status/1982831774850748825