Slacker News

Comment by reissbaker  6 months ago

It was natively trained in FP4, probably both to reduce VRAM usage at inference time (it fits on a single H100) and to allow better utilization of B200s (which are especially fast for FP4).
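
(As a rough illustration of the VRAM point, here is a back-of-the-envelope Python sketch of weight memory at different precisions. The 120B parameter count is a made-up figure for illustration, not the actual model size, and the math ignores KV cache, activations, and runtime overhead.)

    # Weight memory required at different precisions (weights only).
    # NOTE: the 120e9 parameter count below is a hypothetical example.
    GIB = 1024**3

    def weight_memory_gib(num_params, bits_per_param):
        return num_params * bits_per_param / 8 / GIB

    params = 120e9  # hypothetical model size
    for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
        gib = weight_memory_gib(params, bits)
        fits = "fits" if gib <= 80 else "does not fit"
        print(f"{name}: {gib:.0f} GiB of weights -> {fits} on one 80 GB H100")

FP4 halves the weight footprint relative to FP8 and quarters it relative to FP16, which is what makes the single-H100 claim plausible for models in this size range.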

2 comments

irthomasthomas  6 months ago

Interesting, thanks. I didn't know you could even train at FP4 on H100s.

  • reissbaker  6 months ago

    It's impressive they got it to work; the lowest I'd heard of thus far was native FP8 training.
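
(For context on how FP4 training can work on hardware without native FP4 tensor cores: a common approach is "fake" quantization, where values are snapped to the FP4 (E2M1) grid while the actual arithmetic stays in higher precision. Below is a minimal Python sketch of that idea; it is illustrative only and not a claim about how this particular model was trained.)

    # Snap a value to the nearest FP4 (E2M1) representable magnitude at a given scale.
    # Real pipelines apply this per-block to tensors, with higher-precision
    # master weights and accumulation; this just shows the rounding step.
    E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

    def fake_quantize_fp4(x, scale):
        magnitude = abs(x) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - magnitude))
        return (nearest if x >= 0 else -nearest) * scale

    for w in [0.037, -0.21, 0.58]:
        print(w, "->", fake_quantize_fp4(w, scale=0.1))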
