Comment by saithound
6 hours ago
Oh yeah, that reminds me. The company did research on how to train a model that games the metrics, which lets them tick the open-source box with a seemingly good score while releasing something that serves no real purpose. [1] [2]
GPT-OSS is not a near-state-of-the-art model. It was deliberately trained to look great in evaluations, but it is unusable in practice and far underperforms actual open source models, such as the ones you can run locally with Ollama. That's scammy.
[1] https://www.lesswrong.com/posts/pLC3bx77AckafHdkq/gpt-oss-is...
[2] https://huggingface.co/openai/gpt-oss-20b/discussions/14
That explains why gpt-oss didn't work anywhere near as well for me as other models of similar or smaller size. gemma3 27b, gemma3 12b, and phi4 (14b?) all significantly outperformed it at transforming unstructured data into structured data.