Comment by dajonker

20 hours ago

Wouldn't be surprised if they slowly start quantizing their models over time. Makes it easier to scale and reduce operational cost. Also makes a new release have more impact as it will be more notably "better" than what you've been using the past couple of days/weeks.

It sure feels like they do this. They claim they don't, but when you use it every day for 5-10 hours, you notice when something changes.

This last week it seems way dumber than before.

I don't think so. There are other knobs they can tweak to reduce load that affect quality less than quantizing, like trimming the conversation length without telling you, reducing reasoning effort, etc.
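
To be clear about what I mean, here's a purely hypothetical sketch (Python, made-up names, not anyone's actual serving code) of how trimming context and cutting the reasoning budget under load could work without touching the weights:

    # Hypothetical load-shedding: under heavy load, keep the system prompt
    # plus only the most recent turns, and halve the reasoning-token budget,
    # instead of swapping in quantized weights.
    def prepare_request(messages, reasoning_budget, load_factor):
        if load_factor > 0.8:
            messages = [messages[0]] + messages[-10:]   # drop middle history
            reasoning_budget = max(1024, reasoning_budget // 2)
        return messages, reasoning_budget

Both knobs degrade long conversations without any change visible in pricing or model names.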

Open-weights models such as GPT-OSS and Kimi K2.x are trained with 4-bit layers. So it wouldn't come as a surprise if the closed models do similar things. If I compare Kimi K2.5 and Opus 4.5 on OpenRouter, output tokens are about 8x more expensive for Opus, which might indicate Opus is much larger and doesn't quantize, but the Claude subscription plans muddy the waters on price comparison a lot.
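
Back-of-envelope (illustrative numbers only, not any vendor's real figures): weight memory scales linearly with bit width, which is most of why 4-bit serving is attractive.

    # Rough weight-memory footprint: bytes = params * bits / 8
    params = 1e12  # hypothetical 1T-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{params * bits / 8 / 1e9:.0f} GB")
    # 16-bit: ~2000 GB, 8-bit: ~1000 GB, 4-bit: ~500 GB

So a 4-bit model needs roughly a quarter of the weight memory of the same model at FP16, which translates fairly directly into fewer GPUs per replica.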

I would be surprised tbh.

Anthropic does not exactly act like they're constrained by infra costs in other areas, and noticeably degrading a product when you're in tight competition with 1 or 2 other players with similar products seems like a bad place to start.

I think people just notice the flaws in these models more the longer they use them: the "honeymoon-hangover effect," a real pattern that has been shown in a variety of real-world situations.

Oooff yes I think that is exactly the kind of shenanigans they might pull.

Ultimately I can understand it: if a new model comes in without as much optimization, that adds pressure to serve the older models more cheaply while achieving the same results.

Nice plausible deniability for a convenient double effect.

I haven't noticed much difference in Claude, but I swear Gemini 3 Pro Preview was better in the first week or two and later started feeling like they quantized it down to hell.

Benchmarks like ARC-AGI are strongly correlated with price and cheap to run. I think it would be very easy to prove whether the models are degrading.
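
It wouldn't even take much code. A minimal sketch, assuming a fixed task set and a hypothetical query_model() function for whatever API you're testing:

    # Re-run the same fixed task set against the same model ID on a schedule
    # and log accuracy; a sustained drop over weeks would be real evidence.
    import json, time

    def run_snapshot(tasks, query_model):
        correct = sum(query_model(t["prompt"]).strip() == t["answer"] for t in tasks)
        return {"timestamp": time.time(), "accuracy": correct / len(tasks)}

    def append_snapshot(tasks, query_model, log_path="degradation_log.jsonl"):
        with open(log_path, "a") as f:
            f.write(json.dumps(run_snapshot(tasks, query_model)) + "\n")

Flat accuracy over time would also be the cleanest way to rule the quantization theory out.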