Comment by coder543

2 days ago

$10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

The point in this thread is that it would likely be too slow due to prompt processing. (M5 Ultra might fix this with the GPU's new neural accelerators.)
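For a rough sanity check on both claims, here is a back-of-envelope sketch. All inputs are assumptions, not measurements: the parameter count is borrowed from GLM-4.6 (~355B) as a stand-in for GLM-4.7, the quant width is a typical Q4-class figure, and the prefill rate is an illustrative guess for an M-series Mac Studio:

```python
# Back-of-envelope sketch. All inputs are assumptions, not measurements.
PARAMS = 355e9          # assumed total parameters (GLM-4.6's size, standing in for GLM-4.7)
BITS_PER_WEIGHT = 4.5   # assumed Q4-class quant, incl. scales/zeros overhead
RAM_GB = 512

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB (fits in {RAM_GB} GB: {weights_gb < RAM_GB})")
# -> weights: ~200 GB (fits in 512 GB: True)

# Why prompt processing is the bottleneck: time-to-first-token scales
# linearly with prompt length at a fixed prefill rate.
PREFILL_TPS = 100       # assumed prefill tokens/sec (illustrative only)
for prompt_tokens in (4_000, 32_000):
    print(f"{prompt_tokens} tok prompt -> ~{prompt_tokens / PREFILL_TPS:.0f} s before first token")
```

So the weights plausibly fit with room to spare, but an agentic-coding-sized prompt could mean minutes of waiting before the first output token, which is the complaint above.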

> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speeds. Unfortunately, I suspect what I wrote earlier will still apply:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it

I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.

  • > Please do give that a try and report back the prefill and decode speed.

    M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.

    https://pastebin.com/2wJvWDEH

I weighed getting a DGX Spark but thought the M4 would be competitive at equal RAM. Not so much.

  • > I'd rather put that $10k toward an RTX Pro 6000 if I were choosing between them.

    One RTX Pro 6000 is not going to be able to run GLM-4.7, so it's not really a choice if that is the goal.

    • No, but the models you will be able to run will run fast, and many of them are Good Enough(tm) for quite a lot of tasks already. I mostly use GPT-OSS-120B and GLM-4.5-Air currently; both easily fit and run incredibly fast, and the runners haven't been fully optimized for Blackwell yet, so time will tell how fast it can get.

    • You definitely could; the RTX Pro 6000 has 96 (!!!) gigs of memory. You could load 2 experts at once at an MXFP4 quant, or one expert at FP8.

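For scale, a weights-only fit check against that 96 GB card. The parameter counts for GPT-OSS-120B (~117B) and GLM-4.5-Air (~106B) are published figures; the 355B row is an assumption, borrowing GLM-4.6's size to stand in for GLM-4.7. KV cache and activations are ignored, so this is optimistic:

```python
# Weights-only fit check for a 96 GB card. The 355B row is an assumed
# stand-in for GLM-4.7 (using GLM-4.6's published size); KV cache and
# activation memory are ignored, so real headroom is smaller.
VRAM_GB = 96
models = {
    "gpt-oss-120b": 117e9,   # published: ~117B total params
    "glm-4.5-air":  106e9,   # published: ~106B total params
    "355B-class":   355e9,   # assumption: GLM-4.6/4.7-class size
}
for name, params in models.items():
    for quant, bits in (("MXFP4", 4.25), ("FP8", 8)):
        gb = params * bits / 8 / 1e9
        verdict = "fits" if gb < VRAM_GB else "does not fit"
        print(f"{name:13s} @ {quant}: ~{gb:5.0f} GB -> {verdict}")
```

This matches both sides of the argument above: the ~120B-class MoE models fit comfortably at a 4-bit quant, while a 355B-class model does not fit on one card at any of these widths.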

> $10k gets you a Mac Studio with 512GB of RAM

Only because Apple has not yet adjusted its pricing to the new RAM pricing reality. The moment it does, it's not going to be a $10k system anymore but a $15k+ one...

The number of wafers going to AI is insane and will influence more than just memory prices. Do not forget, the only reason Apple is currently immune to this is that it tends to sign long-term contracts, but the moment those expire ... they will push the costs down to consumers.