Comment by exitb
19 hours ago
An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.
This is intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.
Per Anthropic’s RCA for the September 2025 issues, linked in OP's post:
“… To state it plainly: We never reduce model quality due to demand, time of day, or server load. …”
So according to Anthropic, they are not tweaking quality settings due to demand.
And according to Google, they always delete data if requested.
And according to Meta, they always give you ALL the data they have on you when requested.
That's about model quality. Nothing about output quality.
I guess I just don't know how to square that with my actual experiences then.
I've seen sporadic drops in reasoning skills that made me feel like it was January 2025, not 2026 ... inconsistent.
That's what is called an "overly specific denial". It sounds more palatable if you say "we deployed a newly quantized model of Opus, and here are cherry-picked benchmarks to show it's the same", and even that they don't announce publicly.
Personally, I'd rather get queued up with a long wait time. I mean, not ridiculously long, but I am OK waiting five minutes to get correct, or at least more correct, responses.
Sure, I'll take a cup of coffee while I wait (:
i’d wait any amount of time lol.
at least i would KNOW it’s overloaded and i should use a different model, try again later, or just skip AI assistance for the task altogether.
They don't advertise a certain quality. You take what they have or leave it.
If you aren't defrauding your customers you will be left behind in 2026
That number is a sliding window, isn't it?
> I think delivering lower quality than what was advertised and benchmarked is borderline fraud
Welcome to Silicon Valley, I guess. Everything from Google Search to Uber is fraud. Uber is a classic example of this playbook, even.
If there's no way to check, then how can you claim it's fraud? :)
There is no level of quality advertised, as far as I can see.
What is "level of quality"? Doesn't this apply to any product?
I'd wager that lower tok/s vs lower quality of output would be two very different knobs to turn.