
Comment by Trasmatta

4 years ago

I’m having this struggle with projects at work. I’ve had to push back on all sorts of requests from our project manager and designer so that our team has the bandwidth to focus on some very important stability and security concerns for our next release. They want all sorts of additional fancy bells and whistles (that don’t add much user value or functionality) that we just can’t focus on right now, because it would come at the expense of making sure the feature is actually stable and secure.

I’m almost positive there will be some amount of blowback when the thing releases and there are no stability or security problems…which was only because I made sure we spent the necessary time on them.

Yep, the curse of doing things right. You can't prove it was needed.

You need to demonstrate that you are fixing genuine problems, or you will eventually be replaced by someone who delivers faster, even if there are subsequent bugs.

One way to do this is to negotiate with the business over what needs doing, framed in terms of risk. If you think there is a risk of a security or stability issue, then you should be able to assess that risk. The business can then choose to accept the risk and add some features, or fix the risk. It is essential that the owner of the system officially accepts the risks presented. You cannot own the risks.

This lets the business prioritise the work according to its risk appetite. And if the shit hits the fan, you are not only covered but your reputation will increase.
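To make the risk negotiation above concrete, here is a minimal sketch of a risk register, assuming the common "expected loss = likelihood x impact" scoring. The `Risk` class, its field names, and the `rank_by_expected_loss` and `decision_needed` helpers are all hypothetical, invented for illustration rather than taken from any real tool, and any figures you feed it are your own estimates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Risk:
    """One entry in a hypothetical risk register (all fields are estimates)."""
    name: str
    likelihood: float                  # estimated probability of the issue occurring, 0.0 to 1.0
    impact_cost: float                 # estimated cost to the business if it does occur
    fix_cost: float                    # estimated cost of fixing it up front
    accepted_by: Optional[str] = None  # system owner who formally accepted the risk, if any

    @property
    def expected_loss(self) -> float:
        # Assumption: simple likelihood x impact scoring.
        return self.likelihood * self.impact_cost


def rank_by_expected_loss(risks: List[Risk]) -> List[Risk]:
    """Order risks so the business sees the most expensive exposure first."""
    return sorted(risks, key=lambda r: r.expected_loss, reverse=True)


def decision_needed(risks: List[Risk]) -> List[Risk]:
    """Risks that nobody on the business side has formally accepted yet."""
    return [r for r in risks if r.accepted_by is None]
```

The point of the `accepted_by` field is the one made above: the engineer records and ranks the risks, but a named owner on the business side either funds the fix or signs off on the exposure.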

  • While this works with rational actors, my experience in the industry has often been the opposite. In fact, the company I work for now is probably the only one I've worked for in the last decade that actually evaluates risk correctly. The average corporate drone overseeing the engineering org is typically the least rational actor in the entire org.

    Given the opportunity, most start-ups and mid-tier businesses will prioritize speed over safety. Despite my many attempts to explain this trade-off in engineer-speak, business-speak, or some combination of the two, the need for money and the need to constantly impress investors trump all. I have quite literally told people that the total cost of a half-fix will be more than double the cost, in engineering hours, of implementing a correct fix, and by and large the half-fix still gets chosen because it "gets the feature out to users quicker". It's the most asinine thing I've heard, and I say that as someone who fully understands the need to deliver on time and on budget.

    In the end your ass is never covered. It will be your fault whether you suggested the fix and they said no, or they said yes. Your team will end up working the long hours to implement the obvious security and safety changes. The math for the other side is simple: if the cost of taking on the risk is less than the cost of implementing the fix, the fix will never get done. Companies use pager duty for free labor for a reason. It's the industry's most effective enabler of poor practices.

    Sure, something as simple as "we should really hash our passwords" might be so glaringly obvious that even the most dense business person would understand. But it's when you wander into the land of ambiguity that you really get burned. When the company is spending $XX,XXX/mo. on cloud storage because the ticket specifically said not to worry about lifecycle, it's going to be you in the office explaining why this wasn't fixed. Rarely will any business person take "it's your fault" as the answer. They'll happily assign you as many 60-hour weeks as you need to fix the problem, and in a large enough corporate-tier screw-up you may be the sacrificial lamb so the investors can feel like "the problem was solved".

    Call me cynical but this is an unwinnable battle. Unfortunately, until software bugs start literally killing people, the desire to actually allow engineers to do their job will be low.

    • And that is as it should be (except the blaming for issues you raised).

      The business should be able to choose what the priorities are. The business does not exist to produce beautiful code.

      If a business wilfully disregards security or stability risks that it was informed of, and it gets bitten, then it will almost certainly end up paying more in resources and engineering time to fix them. But that's the trade-off it chose to make.

      If a business plays the blame game here, it's simply time to find another job. They are not going to be a good place to work at all.


Yet if you acquiesced to those bells and whistles and there were stability or security problems, it would be much worse for you personally.

It sounds like you guys don't trust each other: they don't trust your ability to assess risk, and you don't trust their ability to properly value features against those risks. As a tech lead in those environments, sometimes it helps to act like a lawyer: communicate the risks and let them make the decisions, but cover your ass and get everything in writing.

Heroes get rewarded: allow something critical to break in a non-catastrophic manner and then get recognition for fixing it promptly (ideally with some high-profile drama).

There is usually far less reward for preventative measures that avoided breakage in the first place.

Ideally, the fact that you're working on internal maintenance shouldn't even be known to external customers.

My rule of thumb is that 1/3 of engineering time needs to be spent on maintenance.

  • You should add a further rule: another 1/3 of the time will go to maintaining whatever was built in the 1/3 spent on new features.

    1/3 = adding new features.

    1/3 = maintaining the security posture of the application and keeping it up to date.

    1/3 = spent figuring out why the new features blew up in the field after passing all testing.