Comment by reissbaker  8 days ago

It was natively trained in FP4, probably both to reduce VRAM usage at inference time (it fits on a single H100) and to allow better utilization of B200s (which are especially fast for FP4).
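As a rough back-of-the-envelope on the VRAM point: 4-bit weights take half a byte per parameter, so the weights of even a fairly large model fit under an H100's 80 GB. The parameter count in the sketch below is a hypothetical example (the thread doesn't name the model), and it ignores the small per-block scale-factor overhead that block-scaled FP4 formats such as MXFP4 add.

```python
# Back-of-the-envelope VRAM needed for model weights at different precisions.
# The 120B parameter count is an assumed example, not taken from the thread.
# Block-scaled FP4 formats also store per-block scale factors; that small
# overhead is ignored here.

def weight_vram_gb(num_params: float, bits_per_param: float) -> float:
    """GB needed to hold the weights alone (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 120e9  # hypothetical ~120B-parameter model
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: ~{weight_vram_gb(params, bits):.0f} GB of weights")

# FP16: ~240 GB  -> needs multiple GPUs
# FP8:  ~120 GB  -> still more than one 80 GB H100
# FP4:  ~60 GB   -> fits on a single 80 GB H100, with headroom for KV cache
```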

irthomasthomas  8 days ago

Interesting, thanks. I didn't know you could even train at FP4 on H100s.

  • reissbaker  6 days ago

    It's impressive they got it to work; the lowest I'd heard of thus far was native FP8 training.
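To give a sense of why native FP4 training is surprising: the E2M1 element format used by block-scaled FP4 schemes such as MXFP4 and NVFP4 (1 sign bit, 2 exponent bits, 1 mantissa bit) can represent only a handful of distinct values, which is why those schemes lean on per-block scale factors and higher-precision accumulation. A minimal sketch enumerating the E2M1 value grid, as an illustration only:

```python
# Enumerate every value representable by the E2M1 FP4 element format
# (1 sign, 2 exponent, 1 mantissa bit; exponent bias 1). Illustration only;
# real FP4 training pairs these elements with per-block scales and
# higher-precision accumulation.

def e2m1_values() -> list[float]:
    vals = set()
    for sign in (1.0, -1.0):
        for exp_field in range(4):      # 2-bit exponent field
            for man_bit in range(2):    # 1-bit mantissa field
                if exp_field == 0:
                    mag = man_bit * 0.5          # subnormal: (M/2) * 2^(1-bias)
                else:
                    mag = (1 + man_bit * 0.5) * 2 ** (exp_field - 1)
                vals.add(sign * mag)
    return sorted(vals)

print(e2m1_values())
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```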
