Comment by rainforest

1 month ago

I'm quite surprised to see the need to debug a live server here. I'm of the belief that having to repro a problem locally and use a debugger leads to better understanding. SSHing into boxen feels like cowboy behaviour on a modern stack; it shouldn't be necessary with competent observability and unit tests.

Not all environments are equal. Some vendor systems have basically non-existent debugging capabilities that end up dumping you into the wild west when things go wrong.

I have worked with more than one fintech that provides no test systems or debugging capabilities, and have spent time on calls with their developers walking through production logs together. Not fun.

Sometimes you need to debug the observability stuff a little.

As a general rule, ssh'ing into prod is a terrible idea. Getting into a pre-prod box to figure out why metrics aren't getting pushed, and trying something quickly before you go back to make the changes you need to push into the repo, is less so.

I see that regularly at startups or other environments where the developers aren't necessarily professional software people (especially ML people!). The update-run-debug cycle is how they think and operate at every level, including on prod servers. Moving beyond that tends to require quite a bit of knowledge and infrastructure, and knowing which infrastructure you need takes knowledge too.