Some of the most famous models were distributed as F32, e.g. GPT-2. As things have shifted more towards mass consumption of model weights it's become less and less common to see.
> As things have shifted more towards mass consumption of model weights it's become less and less common to see.
Not the real reason. The real reason is that training has moved to FP16/BF16 over the years as NVIDIA made those formats more efficient in its hardware; it's the same reason you're starting to see some models released in 8-bit formats (e.g. DeepSeek).
Of course people can always quantize the weights to smaller sizes, but the master version of the weights is usually 16-bit.
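To make that concrete, here's a minimal PyTorch sketch of casting fp32 master weights down to bf16. It uses a single toy layer as a stand-in for a real checkpoint, so the sizes are only illustrative:

```python
import torch
import torch.nn as nn

# Toy stand-in for a "master" fp32 checkpoint (one layer, not a real model).
layer = nn.Linear(4096, 4096)  # parameters default to float32

def storage_mb(module: nn.Module) -> float:
    """Total parameter storage in megabytes."""
    return sum(p.numel() * p.element_size() for p in module.parameters()) / 1e6

print(f"fp32 master weights: {storage_mb(layer):.1f} MB")

# Down-cast to bf16, the format most modern LLM weights are released in.
layer.to(dtype=torch.bfloat16)  # Module.to() converts the parameters in place
print(f"bf16 weights:        {storage_mb(layer):.1f} MB")  # roughly half the size
```

Same idea for int8 and below, except those usually need a proper quantization scheme (scales, zero points) rather than a plain dtype cast.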
And on the topic of image generation models, I think all the Stable Diffusion 1.x models were distributed in f32.
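If anyone wants to check this kind of claim themselves, counting tensor dtypes in a checkpoint with the safetensors library is a quick way to see what precision it was shipped in. The path below is just a placeholder (the original SD 1.x releases were .ckpt files, I believe, but converted .safetensors versions exist):

```python
from collections import Counter
from safetensors import safe_open

# Placeholder path: point this at any downloaded .safetensors checkpoint.
path = "model.safetensors"

dtype_counts = Counter()
with safe_open(path, framework="pt", device="cpu") as f:
    for name in f.keys():
        dtype_counts[str(f.get_tensor(name).dtype)] += 1

# An f32 release would show mostly torch.float32 here; bf16 releases show torch.bfloat16.
print(dtype_counts)
```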