Comment by sigmar

5 hours ago

That's a different model, not one in the chart. They're not going to include hundreds of fine-tunes in a chart like this.

It's also worth pointing out that comparing a fine-tune to a base model is not apples-to-apples. For example, I have to imagine that the Codex fine-tune of 5.1 is measurably worse at non-coding tasks than the 5.1 base model.

This chart (comparing base models to base models) probably gives a better idea of the total strength of each model.

It's not just one of many fine-tunes; it's the default model used by OpenAI's official tools.