Comment by bick_nyers
1 year ago
A dense 70B model will be slower, since DeepSeek is an MoE that only activates 37B parameters per token. That's what makes CPU inference remotely feasible here.
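Rough back-of-the-envelope version of that argument, assuming decode is memory-bandwidth bound (every active weight gets read from RAM once per generated token). The bandwidth figure and the 4-bit quantization below are made-up illustrative values, not measurements, and the function name is just something I picked for this sketch:

```python
def decode_tokens_per_second(active_params_billion, bytes_per_param, mem_bandwidth_gb_s):
    """Upper-bound estimate of decode speed for a bandwidth-bound model.

    Assumes each active parameter is read from memory exactly once per token
    and ignores KV cache traffic, compute time, and routing overhead.
    """
    gb_read_per_token = active_params_billion * bytes_per_param
    return mem_bandwidth_gb_s / gb_read_per_token


# Assumptions for illustration only (not from the original comment):
MEM_BANDWIDTH_GB_S = 200.0  # hypothetical CPU memory bandwidth
BYTES_PER_PARAM = 0.5       # hypothetical 4-bit quantization

dense_70b = decode_tokens_per_second(70, BYTES_PER_PARAM, MEM_BANDWIDTH_GB_S)
moe_37b_active = decode_tokens_per_second(37, BYTES_PER_PARAM, MEM_BANDWIDTH_GB_S)

print(f"dense 70B:      {dense_70b:.1f} tok/s (upper bound)")
print(f"MoE 37B active: {moe_37b_active:.1f} tok/s (upper bound)")
print(f"speedup:        {moe_37b_active / dense_70b:.2f}x")
```

Whatever bandwidth and quantization you plug in, the ratio between the two stays at 70/37, so the MoE comes out a bit under 2x faster per token in this model. It says nothing about capacity: the full MoE weights still have to fit in RAM, not just the active 37B.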