Comment by PeterStuer

2 hours ago

For real production I find the switching cost is not as trivial as you portray. Even going to a new model version in the same model family, say GPT-4o to GPT-5.2, a transition I just finished on a not too complicated application, requires extensive retesting and tweaking of prompts, guardrails and parameters.

I second this; even switching between minor versions of a model, you need to adjust prompts: the new model is better by implying a bunch of things that, when included in the prompt, will overdo that thing.

Assessing quality of output is often not trivial, either. Typically, problems that are solved by offloading something to an LLM are super subjective, and customers “feel” something is different is vulnerable.

We try to quantify output differences by many different similarity metrics. But a lot of energy goes into subjectively evaluating if something still works.

Maybe OP meant switching in a coding harness way? Not an application using AI? I had similar issues like you in the latter case, but in the former it's trivial.

Vendor diversity is a longstanding risk management principle. For it to work you need to invest in it as you build, not when the rug is pulled.