Comment by Herring
3 days ago
If this stuff was so revolutionary, don't you guys think Qwen/DeepSeek would have snapped it up already? Both of those teams are highly innovative, picking up and inventing new techniques all the time. Hell, DeepSeek-V3 was one of the first to do large-scale FP8 training.