Comment by nl 3 days ago
> You could run it on a cluster of nodes

Not sure this is a MBP either.
bigyabai 2 days ago
Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.
zozbot234 2 days ago
SOTA models are reportedly MoE, not dense.
bigyabai 2 days ago
A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
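For a rough sense of the SSD-streaming point, here is a back-of-envelope sketch. Only the 5T total parameter count comes from the thread. The 4-bit quantization, the 250B active parameters per token, and the 6 GB/s SSD read rate are made-up illustrative numbers, not figures for any real model or machine. It also assumes the worst case, where every active weight is read from SSD for every token with no caching in RAM.

```python
def weight_bytes(params: int, bits_per_param: int) -> float:
    """Storage needed for `params` weights at the given precision, in bytes."""
    return params * bits_per_param / 8


def decode_tokens_per_sec_upper_bound(
    active_params: int, bits_per_param: int, read_bytes_per_sec: float
) -> float:
    """Upper bound on decode speed if all active weights are streamed per token."""
    return read_bytes_per_sec / weight_bytes(active_params, bits_per_param)


# From the thread: a 5T-parameter model.
TOTAL_PARAMS = 5_000_000_000_000
# Hypothetical values, chosen only for illustration.
BITS_PER_PARAM = 4                # assumed quantization
ACTIVE_PARAMS = 250_000_000_000   # assumed MoE parameters touched per token
SSD_READ_BYTES_PER_SEC = 6e9      # assumed sustained SSD read rate

total_tb = weight_bytes(TOTAL_PARAMS, BITS_PER_PARAM) / 1e12
bound = decode_tokens_per_sec_upper_bound(
    ACTIVE_PARAMS, BITS_PER_PARAM, SSD_READ_BYTES_PER_SEC
)
print(f"total weights: {total_tb:.1f} TB")
print(f"streaming-bound decode rate: {bound:.3f} tokens/s")
```

Under these assumptions the bound depends only on the active parameter count and the read rate, so an MoE design helps by shrinking the per-token read, but whatever does not fit in memory still has to come off disk.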