Comment by KronisLV
19 hours ago
> There are a multitude of reasons that a system can go down.
Start doing post mortems then!
At the very least, if they're using any off-the-shelf service that's shitting the bed, saying so would tell others to stay away from it - like an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture for a given scale.
Right now it's completely like a black box that sometimes goes down, and we don't get much information about why it's so much less stable than other options (hey, if they just came out and said "We're growing 10x faster than we anticipated and systems X, Y and Z are not architected for that", that'd also be a useful signal).
Or, who knows, maybe it's just bad deploys - seems like it's back for me and the claude.ai UI looks a bit different, hmmm.
I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.