Comment by irthomasthomas 7 days ago

Oh, I didn't know that. Weird!

3 comments

reissbaker 7 days ago
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (it fits on a single H100) and to allow better utilization of B200s (which are especially fast for FP4).

irthomasthomas 7 days ago
Interesting, thanks. I didn't know you could even train at FP4 on H100s.

reissbaker 5 days ago
It's impressive they got it to work — the lowest I'd heard of thus far was native FP8 training.
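A rough sketch of the VRAM arithmetic behind the "fits on a single H100" point: weight memory scales with bits per parameter. The thread doesn't name the model or its size, so the 120B parameter count below is a made-up illustration, and the estimate ignores KV cache and activation memory.

```python
# Back-of-the-envelope weight memory at different precisions.
# The 120B parameter count is an assumption for illustration only;
# the thread does not name the model or its size.
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """GB needed just for the weights (ignores KV cache and activations)."""
    return n_params * BITS_PER_PARAM[precision] / 8 / 1e9

n_params = 120e9  # hypothetical model size
for precision in BITS_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gb(n_params, precision):.0f} GB")

# FP16: ~240 GB  (multiple GPUs)
# FP8:  ~120 GB  (still over an 80 GB H100)
# FP4:  ~60 GB   (fits on a single 80 GB H100)
```

Under these assumed numbers, only the FP4 weights come in under the 80 GB of a single H100, which is consistent with the motivation reissbaker describes.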