bigyabai 12 hours ago
Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.

zozbot234 11 hours ago
SOTA models are reportedly MoE, not dense.

bigyabai 2 hours ago
A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
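
To put rough numbers on the streaming argument, here is a back-of-envelope sketch. Everything except the 5T parameter count is an assumption made up for illustration: the 4-bit quantization, the 5% active fraction per token, and the 6 GB/s SSD read speed are placeholders, not figures from the thread or from any real model or machine. It only models the case where every active weight has to be read from SSD once per decoded token, and it ignores compute entirely.

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits_per_param / 8


def decode_tokens_per_sec(active_params: float, bits_per_param: float,
                          bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on decode speed if every active weight is read once
    per token over a link with the given bandwidth (no caching)."""
    return bandwidth_bytes_per_sec / weight_bytes(active_params, bits_per_param)


# Illustrative assumptions only, not measurements.
TOTAL_PARAMS = 5e12       # the "5T" figure from the thread
ACTIVE_FRACTION = 0.05    # assumed share of weights used per token (MoE)
BITS_PER_PARAM = 4        # assumed 4-bit quantization
SSD_BANDWIDTH = 6e9       # assumed 6 GB/s sequential read, in bytes/s

total_tb = weight_bytes(TOTAL_PARAMS, BITS_PER_PARAM) / 1e12
dense_tps = decode_tokens_per_sec(TOTAL_PARAMS, BITS_PER_PARAM, SSD_BANDWIDTH)
moe_tps = decode_tokens_per_sec(TOTAL_PARAMS * ACTIVE_FRACTION,
                                BITS_PER_PARAM, SSD_BANDWIDTH)

print(f"weights on disk:     {total_tb:.2f} TB")
print(f"dense, SSD-streamed: {dense_tps:.4f} tokens/s")
print(f"MoE, SSD-streamed:   {moe_tps:.4f} tokens/s")
```

With these placeholder values the weights take about 2.5 TB, and the streaming-bound ceiling is roughly 0.0024 tokens/s dense versus 0.048 tokens/s for the MoE case. Keeping frequently used experts in RAM would raise the MoE figure, so treat this as a pessimistic bound for the uncached case, not a prediction.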