Comment by bigyabai

12 hours ago

Not even a cluster of Mac Pros linked over RDMA could run a dense 5T-parameter model, to my knowledge.

zozbot234 11 hours ago

SOTA models are reportedly MoE, not dense.

bigyabai 2 hours ago

A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
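
For a rough sense of scale, here is a back-of-the-envelope sketch of the SSD streaming bound mentioned above. Only the 5T parameter count comes from the thread; the quantization width, the active-expert fraction, and the SSD bandwidth are hypothetical placeholders, not measurements of any real model or machine. It also assumes the worst case where none of the active weights are already cached in RAM.

```python
# Back-of-the-envelope estimate: how long it takes to stream the active
# weights of an MoE model from SSD for one decoded token.
# All constants except TOTAL_PARAMS are hypothetical assumptions.

def weight_bytes(params: float, bits_per_param: float) -> float:
    """Size of `params` weights in bytes at the given precision."""
    return params * bits_per_param / 8


def stream_seconds_per_token(active_params: float,
                             bits_per_param: float,
                             ssd_bytes_per_s: float,
                             cached_fraction: float = 0.0) -> float:
    """Time to read the uncached active weights from SSD for one token."""
    needed = weight_bytes(active_params, bits_per_param) * (1.0 - cached_fraction)
    return needed / ssd_bytes_per_s


TOTAL_PARAMS = 5e12      # from the thread: a 5T-parameter model
ACTIVE_FRACTION = 0.05   # assumption: share of parameters active per token
BITS_PER_PARAM = 4       # assumption: 4-bit quantized weights
SSD_BYTES_PER_S = 6e9    # assumption: about 6 GB/s sequential read

total_tb = weight_bytes(TOTAL_PARAMS, BITS_PER_PARAM) / 1e12
secs = stream_seconds_per_token(TOTAL_PARAMS * ACTIVE_FRACTION,
                                BITS_PER_PARAM, SSD_BYTES_PER_S)

print(f"total weights: {total_tb:.1f} TB")
print(f"SSD streaming per token (nothing cached): {secs:.1f} s")
```

With these made-up inputs the weights alone come to about 2.5 TB, and streaming the active experts for a single token takes on the order of 20 seconds. Raising `cached_fraction` shows how much of the hot expert set would have to stay in memory before the SSD stops dominating.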