Comment by Taek
5 hours ago
It's also worth pointing out that comparing a fine-tune to a base model is not apples-to-apples. For example, I have to imagine that the Codex fine-tune of 5.1 is measurably worse at non-coding tasks than the 5.1 base model.
This chart (comparing base models to base models) probably gives a better idea of the total strength of each model.