Comment by mike_hearn
8 hours ago
You can disaggregate, though. Draft models can run on cheaper hardware with less RAM, which frees up time on the more expensive machines with more RAM.
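To make the split concrete, here is a rough sketch of greedy speculative decoding in which the draft phase and the verify phase are separate steps that could, in principle, run on different machines. Everything in it is an assumption for illustration: `target_model` and `draft_model` are toy stand-ins rather than real networks, the function names are made up, and the "one target pass per round" count stands in for a single batched forward pass on the large-memory machine. It also ignores the cost of shipping draft tokens between machines.

```python
from typing import List, Tuple


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: a toy deterministic next-token rule."""
    return (sum(prefix) + 1) % 7


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: matches the target except at every 4th position."""
    guess = (sum(prefix) + 1) % 7
    return guess if len(prefix) % 4 else (guess + 1) % 7


def speculative_decode(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], int]:
    """Generate n_new tokens, counting verification rounds on the target."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < n_new:
        # Draft phase: would run on the cheaper, low-RAM machine.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_model(tokens + draft))

        # Verify phase: would run on the expensive machine. A real system
        # scores all k drafted positions in one batched pass; the loop
        # below simulates that and is counted as a single pass.
        target_passes += 1
        accepted: List[int] = []
        for proposed in draft:
            expected = target_model(tokens + accepted)
            if proposed == expected:
                accepted.append(proposed)
            else:
                accepted.append(expected)  # target's correction ends the round
                break
        else:
            # Every drafted token matched, so the target adds one more token.
            accepted.append(target_model(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], target_passes


def baseline_decode(prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    """Plain autoregressive decoding: one target pass per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_model(tokens))
    return tokens, n_new


if __name__ == "__main__":
    spec_tokens, spec_passes = speculative_decode([1, 2, 3], n_new=12, k=4)
    base_tokens, base_passes = baseline_decode([1, 2, 3], n_new=12)
    print("same output:", spec_tokens == base_tokens)
    print("target passes:", spec_passes, "vs baseline:", base_passes)
```

The output is identical to plain decoding because every kept token is one the target itself would have produced. The only thing that changes is how many rounds the expensive machine has to serve, which is the time saving the comment is pointing at.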