Comment by zargon
9 hours ago
Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
9 hours ago
Flash is less than 160 GB. No need to quantize to fit in 2x 96 GB. Not sure how much context fits in 30 GB, but it should be a good amount.
It seems to be 160GB at mixed FP4+FP8 precision, FYI. Full FP8 is 250GB+. (B)F16 at around double I would assume.
There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.