Comment by GuB-42

6 days ago

Rule 0: Don't panic

Really, that's important. You need to think clearly; deadlines and angry customers are a distraction. That's also when having a good manager who trusts you is important: his job is to shield you from all that so that you can devote all of your attention to solving the problem.

100% agree. I remember one on-call shift when our PagerDuty started going off for a SEV-2, and naturally a lot of managers from affected teams were in there sweating bullets because their products/features/metrics were impacted. It can get pretty frustrating having so many people try to be cooks in the kitchen. We had a great manager who literally just moved us to a different call/meeting and told us, "Ignore everything those people are saying; just stay focused and I'll handle them." Everyone's respect for our manager really went up from there.

There's a story in the book: on nuclear submarines there's a brass bar in front of all the dials and knobs, and the engineers are trained to "grab the bar" when something goes wrong rather than jumping right to twiddling knobs to see what happens.

  • I read this book and took this advice to heart. I don't have a brass bar in the office, but when I'm about to push a button that could cause destructive changes, especially in prod, my hands reflexively fly up into the air while I double-check everything.

    • A weird yet effective recommendation from someone at my last job: if it's a destructive or dangerous action in prod, touch both your elbows first. This forces you to take your hands away from the keyboard, stop any possible autopilot, and look at what you're doing.

  • Thank you for explaining that phrase! I couldn't find it with a quick Google.
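The "grab the bar" and "touch both elbows" habits above can also be approximated in tooling. Here is a minimal sketch in Python, not anything from the book: the function names, the `env`/`table` parameters, and the "retype the target" rule are all hypothetical, chosen only to show a forced pause before a destructive action in prod.

```python
def confirm_destructive(action: str, target: str, read_input=input) -> bool:
    """Return True only if the operator retypes the exact target name.

    Retyping (rather than pressing 'y') is the software version of taking
    your hands off the keyboard: it is hard to do on autopilot.
    """
    print(f"About to run '{action}' on '{target}'.")
    typed = read_input("Type the target name to confirm: ")
    return typed.strip() == target


def drop_table(table: str, env: str, read_input=input) -> str:
    """Hypothetical destructive operation guarded by the confirmation."""
    target = f"{env}/{table}"
    # Only prod requires the pause; other environments proceed directly.
    if env == "prod" and not confirm_destructive("drop table", target, read_input):
        return "aborted"
    return f"dropped {target}"  # placeholder for the real action
```

`read_input` is injectable only so the guard can be exercised without a terminal; in normal use it defaults to the built-in `input`.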

I had a boss who used to say that her job was to be a crap umbrella, so that the engineers under her could focus on their actual jobs.

  • I once worked with a company that provided IM services to hyper-competitive, testosterone-poisoned options traders. On the first fine trading day of a new year, in January, our IM provider rolled out an incompatible "upgrade" to some DLL that we (our software, hence our customers) relied on, and it broke our service. Our customers, ahem, let their displeasure be known.

    Another developer and I were tasked with fixing it. The Customer Service manager (although one of the most conniving, politically destructive assholes I have ever not-quite worked with) actually carried a crap umbrella. Instead of constantly flaming us with how many millions of dollars our outage was costing every minute, he held up that umbrella and diverted the crap. His forbearance let us focus. He discreetly approached every 20 minutes, toes not quite entering the office, calmly inquiring how it was going. In just over an hour (between his third and fourth visits), Nate and I had the diagnosis and the fix, and had rolled it out to production, to the relief of pension funds worldwide.

    As much as I dislike the memory of that manager to this day, I praise his wisdom every chance I get.

  • At first I thought you meant an umbrella that doesn't work very well.

    • Ah, the unintentional ambiguity of language, the reason there are so many lawyers in the world and why they are so expensive. The GP's phrasing is not incorrect but your comment made me realize: I only parsed it correctly the first time because I've heard managers use similar phrases so I recognized the metaphor immediately. But for the sake of reducing miscommunication, which sadly tends to trigger so many conflicts, I could offer a couple of disambiguatory alternatives:

      - "her job was to be a crap-umbrella": hyphenate into a compound noun, implies "an umbrella of/for crap" to clarify the intended meaning

      - "her job was to be a crappy umbrella": make the adjective explicit if the intention was instead to describe an umbrella that doesn't work well

  • I always say this too. But the real trick is knowing what to let through. You can’t just shield your team from everything going on in the organisation. You’re all a part of the organisation and should know enough to have an opinion.

    A better analogy is you’re there to turn down the noise. The team hears what they need to hear and no more.

    Equally, the job of a good manager is to help escalate team concerns. But just as there’s a filter stopping the shit flowing down, you have to know what to flow up too.

  • Ideally it's crap umbrellas all the way down. Everyone should be shielding everyone below them from the crap slithering its way down.

    • Agreed. Even as a relatively junior engineer at my first job, I realized that a certain amount of the job was not worth exposing interns to (like excessive amounts of Jira ticket wrangling) because it would take parts of their already limited time away from doing things that would benefit them far more. Unless there's quite literally no one "below" you, there's probably _something_ you can do to stop shit from flowing down past you.

A corollary to this: always have a good rollback plan. It's much nicer to be able to roll back to a working version and then debug without the crisis-level pressure.

  • Rollback ability is a must; it can be the most-used mitigation if done right.

    Not all issues can be fixed with a rollback though.

I once worked on a team where, when a serious visible incident occurred, a company VP would pace the floor, occasionally yelling, describing how much money we were losing per second (or how much customer trust, if that number was too low) or otherwise communicating that we were in a battlefield situation and things were Very Critical.

Later I worked for a company with a much bigger and more critical website, and the difference in tone during urgent incidents was amazing. The management made itself available for escalations and took a role in externally communicating what was going on, but besides that they just trusted us to do our jobs. We could even go get a glass of water during the incident without a VP yelling at us. I hadn't realized until that point that being calm adults was an option.

Also, a pager/phone going off incessantly isn't useful. Manage your alarms or you'll be throwing your phone at a wall.
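One common way to "manage your alarms" is a per-alert cooldown, so a flapping check pages once rather than every few seconds. A minimal sketch, assuming alerts can be identified by a string key; the `AlertThrottle` class and the 300-second default are hypothetical, not a feature of any particular paging product.

```python
from dataclasses import dataclass, field


@dataclass
class AlertThrottle:
    """Suppress repeat pages for the same alert within a cooldown window."""

    cooldown_s: float = 300.0  # assumed default; tune per alert severity
    _last_sent: dict = field(default_factory=dict)

    def should_page(self, alert_key: str, now: float) -> bool:
        """Return True if this alert should page at time `now` (seconds)."""
        last = self._last_sent.get(alert_key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside the cooldown: stay quiet
        # First occurrence, or cooldown elapsed: page and restart the window.
        self._last_sent[alert_key] = now
        return True
```

Suppressed alerts do not reset the window, so an ongoing problem still re-pages once per cooldown instead of going silent forever.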

This is very underrated. An extension to this: don’t be afraid to break things further to probe. I often see a lot of devs, mid-level included, panicking, which keeps them from even knowing where to start. I’ve come to believe that some people just have an inherent intuition for this and others need to learn it.

  • Yes, sometimes instinct takes over when you're on the spot in a pinch, but there are institutional things you can do in advance that expand your set of options in the moment, much like a pre-prepared fire-drill playbook you can pull from. There are also training courses like Kepner-Tregoe. But you are right: some people just do better than others when it's hitting the fan.

Uff, yeah. I used to work with a guy who would immediately turn the panic up to 11 at the first thought of a bug in prod. We would end up with worse architecture after his "fix" or he would end up breaking something else.

Agreed. It’s practically a prerequisite for everything else in the book. Staying calm and thinking clearly is foundational.