Comment by matheusmoreira

16 hours ago

There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.

30 comments

inigyou 15 hours ago

What if the new model is exactly as good as the last model on launch day but better than the last model was on the new model's launch day because it was degraded? Every single time?

foo42 12 hours ago
Makes me think of [Shepard tones](Shepard tone - Wikipedia https://share.google/xooRbF7wIIhcsTt2J), which sound like they're rising in pitch indefinitely
- inigyou 1 hour ago
  
  why are you linking to Wikipedia in invalid markdown format, which wouldn't work on HN even if it was valid, to a site called share dot google?
no-name-here 11 hours ago

There are lots of benchmarks to compare the absolute values of different models on the same scale (as opposed to vibes (my apologies for the shorthand), etc.).
matheusmoreira 13 hours ago

The thought has definitely crossed my mind. I don't think it's true because there's definitely an improvement when new models are released.
Maybe the truth is the newest models aren't actually as impressive as we thought. Maybe our perception of progress is being manipulated via months of gradual, silent and unverifiable degradation.

LPisGood 14 hours ago

People talk about this a lot. What I have never seen is a discussion of methods they might employ to degrade the models.

Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?

nessex 13 hours ago
They would quantize the model. That'd make it cheaper to run, and have slightly worse output but it would still generate outputs with a similar feel, derived from a compressed version of the same knowledge base etc.
They wouldn't even need to do this uniformly: only a subset of requests could be routed to quantized versions of the model. They could do this to nerf the old model or, more likely, just to free up hardware for the new one by handling more requests on less hardware. Or to handle increased request volume as traffic ramps up faster than hardware can be provisioned.
Playing with local models at various quants, the degradation can be hard to spot. Sometimes it's only noticeable in aggregate. And even then, you never really know if you just got unlucky with a bad response due to RNG.
I've had Opus 4.6 fall into some weirdly incoherent loops that I rarely see from even Sonnet, that felt like the kind of thing I got frequently with Qwen3.5 9B on local. And the above applies... Was that just bad RNG? Or was my request to Opus routed to some lower quality variant? There's no great way for me to tell for any given request, nor any way to guarantee Anthropic _didn't_ do that.
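One way to separate "bad RNG" from a real shift, at least in aggregate, is to rerun a fixed set of prompts on different dates and compare pass rates. The sketch below is a plain two-proportion z-test; the function name and all the counts are hypothetical, made up only to show how sample size changes the picture, and it assumes each prompt can be graded pass/fail independently.

```python
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Return (z, two-sided p-value) for the difference between two pass rates."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled pass rate under the null hypothesis that nothing changed.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: the same 90% -> 75% drop at two sample sizes.
z_small, p_small = two_proportion_z(18, 20, 15, 20)
z_large, p_large = two_proportion_z(1800, 2000, 1500, 2000)
print(f"20 prompts per run:   z={z_small:.2f}, p={p_small:.3f}")
print(f"2000 prompts per run: z={z_large:.2f}, p={p_large:.3g}")
```

With only 20 prompts per run the drop is not distinguishable from noise at the usual 0.05 level, while the same drop over 2000 prompts is. This says nothing about any single request, and it cannot tell you why the rate moved.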
- OccamsMirror 12 hours ago
  
  I have had the same experiences you've had with 4.6, and it's been that way ever since they brought out 4.7. It's fairly obvious they're doing something like you've said here.
  
- tsss 7 hours ago
  
  And guess what all the providers of open models do: They quantize, badly.
  
maybe_pablo 13 hours ago
Weight quantization, n-expert capping, routing to smaller model, context window truncation, aggressive sampling constraints, lossy speculative decoding and probably more.
- trollbridge 8 hours ago
  
  I can't prove any of it, but it sure feels like that happens sometimes on Anthropic's platform.
  I don't seem to get any of this with GPT-5.5 or GPT-5.5-Pro (not that I use 5.5-Pro enough to know for sure, but when I do use it, it never seems nerfed).
- alfiedotwtf 11 hours ago
  
  I'm pretty sure you could do n-expert capping on any MoE model with only a handful of lines of changes to ik_llama.cpp, but yeah... my bet is they have various quantisations and run the lower ones at peak (along with different system prompts, i.e. "we're GPU-bound right now. Get to the point with less chatter").
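For readers unfamiliar with the term, here is a toy sketch of what capping the number of active experts could look like in a mixture-of-experts layer. This is not ik_llama.cpp code. The function names, the scalar "experts", and the router logits are all hypothetical, and renormalising the gate weights after top-k selection is an assumption (some models do, some don't).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}

def moe_output(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; only k experts are run."""
    gates = route(router_logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())

# Hypothetical toy experts operating on a scalar instead of a hidden vector.
experts = [lambda x: x, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
router_logits = [2.0, 1.0, 0.5, -1.0]

full = moe_output(3.0, experts, router_logits, k=2)    # as-trained setting
capped = moe_output(3.0, experts, router_logits, k=1)  # capped: half the expert compute
print(full, capped)
```

Lowering `k` skips expert computation per token, so it is cheaper, but the output drifts from what the model produced with the `k` it was trained with.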
Tepix 13 hours ago

Use quantisation.
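A minimal sketch of the idea, assuming simple symmetric absmax rounding over one flat list of weights. Real schemes (per-block scales, k-quants and so on) are more elaborate, and the random "weights" here are made up purely for illustration.

```python
import random

def quantize(weights, bits):
    """Symmetric absmax quantisation: map floats to signed ints of `bits` width."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

random.seed(0)
# Hypothetical weights: small Gaussian values standing in for a real tensor.
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

for bits in (8, 4, 2):
    q, scale = quantize(weights, bits)
    err = mean_abs_error(weights, dequantize(q, scale))
    print(f"{bits}-bit: mean abs error {err:.6f}")
```

Fewer bits means less memory and bandwidth per weight at the cost of a larger reconstruction error, which is the trade-off being described in this thread.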

manyatoms 15 hours ago

Unless what you're getting is really explicitly spelled out in a contract, you should flatly assume that they're doing whatever they like whenever they like.

OtomotO 13 hours ago

Even if it's in the contract, but can't be verified.

taytus 16 hours ago

At current prices, and considering these OS Models' performance, investing in local inference sounds like a bad idea.

matheusmoreira 16 hours ago
Current prices are insane but at this point I'm starting to feel like it's an existential issue. I'm not a US citizen. At any point the USA could come up with some arbitrary export controls. Not having a computer capable of running at least Qwen is starting to actually seem risky to me.
At least it's going to be usable as a very high end gaming PC.
- awakeasleep 15 hours ago
  
  Why would you buy and build everything before the low probability catastrophe strikes, though? You don’t get any benefit from switching early and you pay a big opportunity cost.
  
- alfiedotwtf 11 hours ago
  
  > At any point the USA could come up with some arbitrary export controls
  lol this already happened with Fable!
jrm4 15 hours ago

At current "proprietary inference company behavior," investing in local inference sounds like the far more rational option.
Long term predictability ought to far outweigh a few more cycles of performance.