Comment by cat_plus_plus
1 month ago
At least for transformers, it can be kind of fixed with MoE + NVFP4, which keeps the working set small despite a large resident size.
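To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `moe_footprint`, the model shape (100B total parameters, 4B shared, 64 experts with 4 routed per token), and the figure of about 4.5 bits per weight for NVFP4 (4-bit values plus one 8-bit scale per 16-weight block). It only counts weights and ignores KV cache and activations.

```python
def moe_footprint(total_params, shared_params, num_experts, active_experts,
                  bits_per_weight):
    """Return (resident_gb, per_token_gb) for a hypothetical MoE model.

    resident_gb:  all weights that must sit in memory.
    per_token_gb: weights actually read for one token (shared layers plus
                  the experts the router picks for that token).
    """
    # Split the non-shared parameters evenly across experts (an assumption).
    params_per_expert = (total_params - shared_params) / num_experts
    active_params = shared_params + active_experts * params_per_expert

    def to_gb(params):
        return params * bits_per_weight / 8 / 1e9

    return to_gb(total_params), to_gb(active_params)


# Illustrative numbers only, not taken from any real model.
NVFP4_BITS = 4 + 8 / 16  # assumed: 4-bit value + 8-bit scale shared by 16 weights

resident, per_token = moe_footprint(
    total_params=100e9,
    shared_params=4e9,
    num_experts=64,
    active_experts=4,
    bits_per_weight=NVFP4_BITS,
)
print(f"resident: {resident:.2f} GB, touched per token: {per_token:.2f} GB")
```

Under these assumptions only about a tenth of the resident weights are read for any single token. The router can pick different experts for each token, so the whole model still has to stay resident.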