Comment by tpmoney
6 days ago
> You've had an issue for months and the logs were somehow so arcane, so dense, so unparseable, no one spotted these "metadata corruption issues?" I'm not going to accuse anyone of blatant fabrication, but this is very hard to swallow.
Eh, I've worked on projects where, because of un-revisited logging decisions made in the past, 1-10k error logs PER MINUTE were normal things to see. Finding the root cause of an issue often boiled down to multiple attempts at cleaning up logs to remove noise, cleaning up tangentially related issues, and waiting for it to happen again. More than one root cause was discovered by the sheer happenstance of looking at the right subset of the logs at the right moment in time. I can absolutely buy that a system built for parsing large amounts of text and teasing patterns out of it found in minutes what humans could not track down over months.