Comment by robotswantdata
5 hours ago
Granite Rapids and Sapphire Rapids are very underrated for MoE inference workloads. But you still need a GPU for the KV cache.
Plus many boards also support CXL for RAM expansion over PCIe 5.0!
Source: building a hybrid inference business for regulated industry workloads.
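
For anyone wondering why MoE in particular suits CPUs, here is a rough back-of-envelope sketch. It assumes decode is purely memory-bandwidth bound and that each active weight is read once per generated token. The function name and every number below are hypothetical placeholders, not measurements of any real chip or model.

```python
def decode_tokens_per_sec(mem_bandwidth_gb_s: float,
                          active_params_billion: float,
                          bytes_per_param: float) -> float:
    """Rough upper bound on decode tokens/s for a bandwidth-bound model.

    Assumes every active parameter is streamed from memory once per token
    and ignores compute, KV cache traffic, and cache reuse.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical inputs: 400 GB/s of usable memory bandwidth, 8-bit weights.
# A MoE model only touches its active experts for each token ...
moe = decode_tokens_per_sec(400, active_params_billion=30, bytes_per_param=1.0)
# ... while a dense model of the same total size reads all of its weights.
dense = decode_tokens_per_sec(400, active_params_billion=300, bytes_per_param=1.0)

print(f"MoE:   {moe:.1f} tokens/s upper bound")
print(f"Dense: {dense:.1f} tokens/s upper bound")
```

The point is only the ratio: with the same bandwidth, the bound scales inversely with the active parameter count, while the full set of experts still has to fit in RAM, which is where large DIMM capacity and CXL expansion come in.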