Comment by search_facility
2 days ago
MOE basically work that way already, QWEN/etc with low active params (A-number in name) allows to inference big models locally (only active params have to fit into memory)
2 days ago
MOE basically work that way already, QWEN/etc with low active params (A-number in name) allows to inference big models locally (only active params have to fit into memory)
No comments yet
Contribute on Hacker News ↗