Comment by storystarling

24 days ago

I see it less as SRE and more as defensive backend architecture. When you are dealing with non-deterministic outputs, you can't just monitor for uptime; you have to architect for containment. I've been relying heavily on LangGraph and Celery to manage state, basically treating the LLM as a fuzzy component that needs a rigid wrapper. It feels like we are building state machines where the transitions are probabilistic, so the infrastructure (Redis, queues) has to be much more robust than the code generating the content.
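
Rough sketch of what I mean (the node names, validation helpers, and retry cap are just illustrative, not my actual code): the LLM sits in a single node, everything around it is deterministic, and the conditional edge is where the probabilistic transition gets a hard ceiling so a bad run is contained instead of looping forever.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


class DraftState(TypedDict):
    prompt: str
    draft: str
    attempts: int
    ok: bool


def call_llm(prompt: str) -> str:
    # Stand-in for the real, non-deterministic model call.
    return f"draft for: {prompt}"


def looks_valid(draft: str) -> bool:
    # Stand-in for schema / length / content checks.
    return len(draft) < 500


def generate(state: DraftState) -> DraftState:
    # The fuzzy component: output varies run to run.
    return {**state, "draft": call_llm(state["prompt"]), "attempts": state["attempts"] + 1}


def validate(state: DraftState) -> DraftState:
    # The rigid wrapper: deterministic checks on whatever came back.
    return {**state, "ok": looks_valid(state["draft"])}


def route(state: DraftState) -> str:
    # Probabilistic transition, but with a hard cap so failures are contained.
    if state["ok"]:
        return "accept"
    if state["attempts"] >= 3:
        return "give_up"
    return "retry"


graph = StateGraph(DraftState)
graph.add_node("generate", generate)
graph.add_node("validate", validate)
graph.set_entry_point("generate")
graph.add_edge("generate", "validate")
graph.add_conditional_edges(
    "validate",
    route,
    {"accept": END, "give_up": END, "retry": "generate"},
)
app = graph.compile()

result = app.invoke({"prompt": "...", "draft": "", "attempts": 0, "ok": False})
```

In practice I invoke the compiled graph from inside a Celery task, so the Redis-backed queue absorbs retries and backpressure rather than the content-generating code itself.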