Comment by cmrdporcupine

1 month ago

I've personally witnessed large variability in behaviour even within a given session -- which makes sense, as there's nothing stopping Anthropic from shuttling your context/session around, load-balanced across many different servers, some of which might be running heavily quantized models to manage load and others not quantized at all.

I don't know if they do this or not, but the nature of the API is such that you could absolutely load balance this way. The context sent with each request is not, I believe, "sticky" to any server.

TL;DR: you could get a "stupid" response and then a "smart" response within a single session because of heterogeneous quantization / model behaviour in the cluster.
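
To make the mechanism concrete, here's a toy sketch in Python of what non-sticky routing would look like. Everything in it is hypothetical: the `Backend` pool, the node names, the quantization labels, and the random `route` policy are made up for illustration and say nothing about how Anthropic (or anyone else) actually serves requests. It only shows that if each turn is routed independently, one session can land on differently configured backends.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    quantization: str  # hypothetical label, e.g. "fp16" or "int4"


# Hypothetical pool; not a description of any real provider's deployment.
POOL = [
    Backend("node-a", "fp16"),
    Backend("node-b", "int8"),
    Backend("node-c", "int4"),
]


def route(rng: random.Random) -> Backend:
    """Pick a backend per request, ignoring which session it belongs to."""
    return rng.choice(POOL)


def run_session(turns: int, seed: int = 0) -> list[Backend]:
    """Simulate one session; return the backend that served each turn."""
    rng = random.Random(seed)
    history: list[str] = []
    served_by: list[Backend] = []
    for turn in range(turns):
        history.append(f"user message {turn}")
        # The whole history travels with the request, so any backend can serve it.
        backend = route(rng)
        served_by.append(backend)
        history.append(f"reply from {backend.name} ({backend.quantization})")
    return served_by


if __name__ == "__main__":
    for turn, backend in enumerate(run_session(6)):
        print(f"turn {turn}: {backend.name} ({backend.quantization})")
```

A sticky setup would instead pick the backend once per session (for example by hashing a session id) and reuse it for every turn, which is exactly the property I don't think the API guarantees.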

2 comments

epolanski 1 month ago

I've defended Opus over the last few weeks, but the degradation is tangible. It feels like it degraded by a generation, tbh.

cmrdporcupine 1 month ago

it's just extremely variable