Comment by simonw

10 months ago

Anthropic have stated on the record several times that they do not update the model weights once they have been deployed without also changing the model ID.


jjani 10 months ago

No, they do change deployed models.

How can I be so sure? Evals. There was a point where Sonnet 3.5 v2 would happily output 40k+ tokens in a single message if asked. Then one day it started outputting "Would you like me to continue?" after far fewer tokens than that, with 99% consistency. We'd been running the same set of evals throughout, so we could definitively confirm the change. Googling will also turn up many reports of this.
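A regression eval of the kind described above can be sketched in Python. This is a minimal sketch, not the poster's actual harness: it assumes each run's output text and output token count have already been collected for one fixed prompt, and the names `RunResult`, `truncation_rate`, `has_regressed`, the 20,000-token threshold, the 0.05 tolerance, and the 3,000-token early-stop length are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical marker for the early-stop behavior described in the comment.
CONTINUE_MARKER = "Would you like me to continue?"


@dataclass
class RunResult:
    text: str    # model output for one run of the fixed prompt
    tokens: int  # output token count reported for that run


def truncation_rate(runs: list[RunResult], min_tokens: int) -> float:
    """Fraction of runs that stopped early: fewer than min_tokens output
    tokens and containing the continuation prompt."""
    if not runs:
        raise ValueError("need at least one run")
    stopped = sum(
        1 for r in runs if r.tokens < min_tokens and CONTINUE_MARKER in r.text
    )
    return stopped / len(runs)


def has_regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the early-stop rate rises by more than tolerance."""
    return current - baseline > tolerance


# Hypothetical data: 100 full-length runs before, 99 early stops out of 100 after.
before = [RunResult("<long output>", 40_000) for _ in range(100)]
after = [RunResult("Partial answer... " + CONTINUE_MARKER, 3_000) for _ in range(99)]
after.append(RunResult("<long output>", 40_000))

print(has_regressed(truncation_rate(before, 20_000), truncation_rate(after, 20_000)))  # prints True
```

Because the prompt and scoring stay fixed across days, any jump in the rate points at a change on the serving side rather than in the test.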

Whatever they did, in practice they lied: API behavior of a deployed model changed.

Another one: differing performance (not latency, but output on the same prompt over 100+ runs, with a difference statistically significant enough to rule out random chance) between Sonnet hosted on AWS Bedrock and Sonnet via the direct Anthropic API, same model version.
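Checking that a Bedrock-vs-direct-API difference over 100+ runs is "impossible by random chance" amounts to a two-sample test. One standard choice, assumed here rather than taken from the comment, is a two-proportion z-test on pass/fail scores from some grader; the function name `two_proportion_p_value` and the 90/100 vs 60/100 counts are hypothetical, not the poster's data.

```python
import math


def two_proportion_p_value(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both endpoints have the same pass rate.

    Uses the normal approximation with a pooled proportion, which is
    reasonable when n_a and n_b are large (e.g. 100+ runs each).
    """
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-pass or all-fail: no observable difference.
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 90/100 passes on one endpoint vs 60/100 on the other.
p = two_proportion_p_value(90, 100, 60, 100)
print(f"p = {p:.2e}")  # far below 0.05, so chance is an unlikely explanation
```

For small samples or rates near 0 or 1, an exact test (e.g. Fisher's) would be the safer choice than this normal approximation.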

Don't take at face value what model providers claim.

simonw 10 months ago
If they are lying about changing model weights while keeping the date-stamped model ID the same, it would be a monumental lie.
Anthropic make most of their revenue from paid API usage. Their paying customers need to be able to trust them when they make clear statements about their model deprecation policy.
I'm going to choose to continue to believe them until someone shows me incontrovertible evidence that this isn't true.
- saurik 10 months ago
  
  Maybe they are not changing the model weights but they are making constant tweaks to the system prompt (which isn't in any way better, to be extremely clear).
  
- jjani 10 months ago
  
  That's a very roundabout way to phrase "you're completely making all of this up", which is quite disappointing tbh. Are you familiar with evals? As in automated testing using multiple runs? It's simple regression testing, just like for deterministic code. Doing multiple runs smooths out any stochastic differences, and the change I described can't be explained by stochasticity anyway.
  Then no evidence would satisfy you, since any evidence would look exactly like what I showed. You'd need a time machine.
  https://www.reddit.com/r/ClaudeAI/comments/1gxa76p/claude_ap...
  Here's just one thread.
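The point that multiple runs smooth out stochastic differences can be made quantitative. Assuming (as a simplification) that each of $n$ independent runs shows a given behavior with probability $p$, and $k$ runs show it, the observed rate and its standard error are:

```latex
\hat{p} = \frac{k}{n}, \qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \le \sqrt{\frac{0.25}{n}}
```

Since $p(1-p) \le 0.25$, with $n = 100$ runs the standard error is at most $0.05$. An illustrative shift of 30 percentage points in that rate would therefore be at least 6 standard errors, well outside ordinary run-to-run noise.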