Comment by stared
24 days ago
Still, AI is not there yet. Even the top models struggle with the simplest SRE tasks.
We just created a benchmark for adding distributed logging (OpenTelemetry instrumentation) to small services of around 300 lines of code each.
Claude Opus 4.5 succeeds 29% of the time, GPT 5.2 26%, and Gemini 3 Pro 16%.