Comment by greyskull

1 day ago

> task focused small models

This is tangential, and forgive my ignorance here, but is there an inherent reason why there aren't smaller, focused models from the frontier model providers?

I'm thinking something like a software-specific subset of Opus that is the default for use in Claude Code. Smaller, cheaper to deploy and consume, maybe faster.

OpenAI used to make Codex-specific models, but they stopped. What I've gathered from interviews and similar is that training two models isn't worth the (small) lift from having a coding-specific model. You're pre-training on everything anyway, and coding RL is reasonably useful for general-purpose models too.

  • Interesting. I'd have guessed there would be meaningful opex benefits to serving smaller models.

    • What I've heard is that much of the model "intelligence" is a commingled bucket: although you can specialize specific knowledge somewhat, it's hard to specialize advanced reasoning to specific domains because so much of reasoning is a generalized capability that is not unique to, say, coding.

      It turns out coding draws on a lot of the same reasoning needed in math or in legal analysis, even if the grammatical expression is different.

      This is less true of lower-intelligence tasks. Classification requires a lot less reasoning capacity, so models for it can be much smaller and more specialized.