Comment by prats226
1 day ago
This is super interesting to think about in the LLM world, where a lot of software is getting replaced with LLM calls.
For an LLM's output there is no clear promise in the contract, only observable behaviour. And that observable behaviour is subject to change with every model update, so all the downstream systems have to run evals to guard against this.
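A minimal sketch of what such a downstream eval gate might look like. Everything here is hypothetical (the `call_model` stub, the cases, the threshold); the point is only that the "contract" is a suite of behavioural checks re-run after every model update, not a type signature:

```python
# Hypothetical eval gate: since observable behaviour is the only contract,
# re-run a fixed suite after each model update and block rollout on regression.

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns canned answers here.
    canned = {
        "Extract the year from: 'Released in 1997.'": "1997",
        "Is 17 prime? Answer yes or no.": "yes",
    }
    return canned.get(prompt, "")

EVAL_CASES = [
    # (prompt, predicate over the model's raw text output)
    ("Extract the year from: 'Released in 1997.'",
     lambda out: out.strip() == "1997"),
    ("Is 17 prime? Answer yes or no.",
     lambda out: out.strip().lower() == "yes"),
]

def run_evals(model, cases, threshold=1.0):
    # Returns (gate_passed, pass_rate); threshold is a deployment policy choice.
    passed = sum(1 for prompt, check in cases if check(model(prompt)))
    rate = passed / len(cases)
    return rate >= threshold, rate

ok, rate = run_evals(call_model, EVAL_CASES)
print(ok, rate)  # -> True 1.0 with the canned stub
```

With a real model behind `call_model`, the same suite run against a new model version is what tells you whether the behaviour you depended on survived the update.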
One good example is Claude Code, where people have started complaining that switching models affects their downstream coding workflows.
Yes.
This is the unfortunate thing about wrapping LLMs in API calls to provide services.
Unless you control the model absolutely (and even then?), you can prompt the model with a well-manicured prompt on Tuesday and get an answer - a block of text - and on Thursday, using the exact same prompt, get a different answer.
This is very hard to build good APIs around. If you build them anyway, expect rare corner-case errors that cannot be fixed.
Or reproduced.
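One partial mitigation for the "cannot be reproduced" part is to record every prompt/response pair as it happens, so a rare corner case can at least be replayed offline. A hypothetical sketch (the `RecordingModel` wrapper and the in-memory log are made up for illustration):

```python
import hashlib

# Hypothetical record/replay wrapper: capture each prompt and the exact
# response the model gave, so a one-off failure can be inspected later
# even if the live model never produces that output again.

class RecordingModel:
    def __init__(self, model, log):
        self.model = model
        self.log = log  # a list standing in for an append-only store

    def __call__(self, prompt: str) -> str:
        response = self.model(prompt)
        self.log.append({
            "prompt_sha": hashlib.sha256(prompt.encode()).hexdigest()[:12],
            "prompt": prompt,
            "response": response,
        })
        return response

def replay(log, index: int) -> str:
    # Re-inspect exactly what the model said when the error occurred.
    return log[index]["response"]

log = []
model = RecordingModel(lambda p: "a block of text", log)
model("the exact same prompt")
print(replay(log, 0))  # -> a block of text
```

This does not make the behaviour deterministic, but it turns "cannot be reproduced" into "can at least be replayed and debugged from the recording".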