
Comment by nl

3 days ago

> You could run it on a cluster of nodes

Not sure this is an MBP either.

bigyabai 2 days ago

To my knowledge, not even a cluster of Mac Pros linked with RDMA could run a dense 5T-parameter model.

zozbot234 2 days ago

SOTA models are reportedly MoE, not dense.
- bigyabai 2 days ago
  
  A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
  
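
For a rough sense of the SSD-streaming bottleneck mentioned above, here is a back-of-envelope sketch. Only the 5T parameter count comes from the thread; the active fraction, the quantization width, and the SSD read speed are made-up illustrative values, and the helper `decode_tokens_per_second` is hypothetical. It also assumes the worst case where every active weight is read from SSD for each token, with nothing cached in RAM.

```python
# Back-of-envelope bound; every number except total_params is an illustrative assumption.

def decode_tokens_per_second(active_params, bytes_per_param, ssd_bytes_per_second):
    """Upper bound on decode speed when each token's active weights are read from SSD."""
    bytes_per_token = active_params * bytes_per_param
    return ssd_bytes_per_second / bytes_per_token

total_params = 5e12          # 5T parameters, as in the thread
active_fraction = 0.02       # hypothetical share of weights routed per token (MoE)
bytes_per_param = 0.5        # assumes 4-bit quantization
ssd_bytes_per_second = 6e9   # hypothetical 6 GB/s sequential read

model_bytes = total_params * bytes_per_param
active_params = total_params * active_fraction
tokens_per_second = decode_tokens_per_second(
    active_params, bytes_per_param, ssd_bytes_per_second
)

print(f"weights on disk: {model_bytes / 1e12:.1f} TB")
print(f"decode upper bound: {tokens_per_second:.2f} tokens/s")
```

Under these assumed numbers the bound is well below one token per second. Keeping frequently used experts in RAM would raise it, and setting `active_fraction = 1.0` gives the dense case.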