Comment by StarlaAtNight

3 days ago

We should be able to pin to a version of training-data history the way we can pin to software package versions. Release new updates with SemVer and let people decide whether it's worth upgrading.

I’m sure it will get there as this space matures, but it feels like model updates are very force-fed to users

If you talk to people who deal with inference over large, fungible datasets, this is an extremely difficult governance problem. SemVer is wholly insufficient here: there's no well-defined meaning of "upgrade", let alone of "major", "minor", or "patch".

It does the problem a major disservice to act as if it were already solved, or even solvable, using the language of code revisions.

I think the models are so big that providers can't keep many old versions around: serving them would take GPUs away from the latest models and reduce overall throughput, so older models get phased out over time. That said, the major providers usually publish a dated snapshot of each model and keep the latest two or three available.
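For illustration, the "time snapshot" pinning mentioned above looks like this in the OpenAI Python SDK: you request a dated model name instead of a floating alias. The snapshot name below is just an example; check the provider's current model list.

```python
from openai import OpenAI

client = OpenAI()

# Pin to a dated snapshot rather than a floating alias like "gpt-4o".
# The alias silently moves to newer revisions; the dated name stays fixed
# until the provider retires it, typically after a deprecation window.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # example snapshot; not guaranteed to stay available
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```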

Every model update would be a breaking change, so an honest application of SemVer has no place in AI model versioning.

I'm not saying that using major.minor keyed to architecture is a bad thing, but it wouldn't be SemVer, and it still wouldn't cover all the fine-tunes / flavors derived from those models, which generally have no way to be ordered.

  • There's figurative and literal, though. Figurative SemVer ("this is a system prompt update" vs. "this is a model retrain") would actually work OK... at least as build numbers.

    I think you could actually map SemVer pretty cleanly onto more structured prompt systems, à la modern agent harnesses.
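    A minimal sketch of what that "figurative SemVer" could look like; the fields and bump rules here are assumptions for illustration, not any real framework's API:

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class HarnessVersion:
        """Hypothetical convention: major = model snapshot swapped,
        minor = tool set or harness logic changed,
        patch = system prompt text tweaked."""
        major: int
        minor: int
        patch: int
        model_snapshot: str  # e.g. a provider's dated model name
        prompt_sha: str      # hash of the system prompt text

        def __str__(self) -> str:
            return f"{self.major}.{self.minor}.{self.patch}"

    v = HarnessVersion(2, 4, 1, model_snapshot="gpt-4o-2024-08-06", prompt_sha="ab12cd3")
    print(v)  # -> 2.4.1
    ```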

That's not enough: the tool definitions change, the agent harness changes; you need to pin a lot of stuff.
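Concretely, this amounts to something like a lockfile that pins every moving part, not just the model name. A hypothetical sketch; the file layout and field names are made up for illustration:

```python
import hashlib
import json

SYSTEM_PROMPT = "You are a helpful assistant."
TOOL_DEFINITIONS = [
    {"name": "search", "description": "Web search", "parameters": {"type": "object"}},
]

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

# Pin the model snapshot, the exact prompt text, and the exact tool schema,
# so a run can be reproduced (to the extent the provider allows) later on.
lockfile = {
    "model": "gpt-4o-2024-08-06",  # dated snapshot, not a floating alias
    "system_prompt_sha256": sha256(SYSTEM_PROMPT),
    "tool_definitions_sha256": sha256(json.dumps(TOOL_DEFINITIONS, sort_keys=True)),
    "harness_version": "1.3.0",
}

with open("agent.lock.json", "w") as f:
    json.dump(lockfile, f, indent=2)
```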