Comment by BirAdam

4 days ago

I study and write quite a bit of tech history. IMHO from what I've learned over the last few years of this hobby, the primary issue is quite simple. While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning. Typically, software folks build new and every generation of software developers must relearn the same problems.

I work at $FANG, and every one of our org's big projects goes off the rails, with a mad rush at the end to push developers to solve all the failures of project management in their off hours before the arbitrary deadline arrives.

After every single project, the org comes together to do a retrospective and ask "What can devs do differently next time to keep this from happening again?" The people leading the project take no action items, management doesn't hold itself accountable at all, nor does product for late-changing requirements. And so the cycle repeats next time.

I led an effort one time, after a big bug made it to production following one of those crunches, that painted the picture of the root cause: a huge, complicated project handed off to offshore junior devs with no supervision, with the junior devs managing it completely switched out twice over the 8-month project with no handover, nor any introspection by leadership. My manager's manager killed the document and wouldn't allow publication until I removed any action items that would constrain management.

And thus, the cycle continues to repeat, balanced on the backs of developers.

  • Of course the reason it works this way is that it works. As much as we'd like accountability to happen on the basis of principle, it actually happens on the basis of practicality. Either the engineers organize their power and demand a new relationship with management, or projects start going so poorly that necessity demands a better working relationship, or nothing changes. There is no 'things get better out from wisdom alone' option; the people who benefit from improvements have to force the hand of the people who can implement them. I don't know if this looks like a union or something else but my guess is that in large part it's something else, for instance a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.

    I think the reasons this hasn't happened are (a) tech has moved too fast for anyone to be able to credibly say how things should be done for longer than a year or two, and (b) attempts at professional organizations borrowed too much from slower-moving physical engineering and so didn't adapt to (a). But I do think it can be done and would benefit the industry greatly (at the cost of slowing things down in the short term). It requires a very 'agile' sense of standards, though. If standards mean imposing big constraints on development, nobody will pay attention to them.

    • You forgot c) we're in a culture where people jump ship every 3-5 years. There's no incentive to learn from mistakes that you don't talk about at the next company, nor any care for the long term health of the current company.

      >a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.

      We have that in the form of the IEEE, but it really doesn't come up much if you're not already neck deep in the organization.

      5 replies →

    • It "works" only on a certain timescale. We don't have sufficient incentives and penalties to make things fail quickly. A relevant example in the tech world is data breaches. If data breaches resulted in a thorough public audit and financial/criminal penalties for the managers who pushed for speed over safety, they would no longer "work".

      > If standards mean imposing big constraints on development, nobody will pay attention to them.

      Unless there are penalties for not doing so.

      > tech has moved too fast for anyone to actually be able to credibly say how things should be done for longer than a year or two

      But that's just it. If things are moving so fast that you can't say how things should be done, then that tells you that the first thing that should be done is to slow everything way down.

    • I agree wholeheartedly that collective action is how we stop balancing poor management on the backs of engineers, but good luck getting other engineers to see it that way. There's heaps of propaganda out there telling engineers that if they join a union their high salary will go away, even though unions have never been shown to reduce wages.

      53 replies →

    • Has union labor resulted in measurable improvement in production outcomes in any industry they’re found in? I don’t think going from “managers are unaccountable for failure” to “nobody is accountable for failure” is a good thing.

      I think introducing more competition at higher levels may be better than eliminating it below. This should be happening because I’m pretty sure most PMs could be replaced by an LLM.

      14 replies →

  • For one project I got so far as to include in the project proposal some outcomes that showed whether or not it was a success: quote from the PM “if it doesn’t do that then we should not have bothered building this”. They objected to even including something so obviously required in the plan.

    Waste of my bloody time. Project completed, taking twice as many devs for twice as long, great success, PM promoted. Doesn’t do that basic thing that was the entire point of it. Nobody has ever cared.

    Edit to explain why I care: there was a very nice third party utility/helper for our users. We built our own version because “only we can do amazing direct integration with the actual service, which will make it far more useful”. Now we have to support our worse in-house tool, but we never did any amazing direct integration and I guarantee we never will.

  • Glad to hear that $FANG has similar incompetency as every other mid-tier software shop I've ever worked in. Your project progression sounds like any of them. Here I was thinking that $FANG's highly-paid developers and project management processes were actually better than average.

    • Same, I think this one post may have cured me of a life long (unrealized) obsession with working at FANG.

    • Those processes take longer, and waste more money. At no point will I believe they don’t waste it in the first place.

  • Reminds me of the military. Senior leaders often have no real idea of what is happening on the ground because the information funneled upward doesn't fit into painting a rosy report. The middle officer ranks don't want to know the truth because it impacts their careers. How can executives even hope to lead their organizations this way?

    • Well the US has lost every military conflict it's entered for the past 70 years. Since there's been no internal pressure to change methodology, maybe the US military doesn't view winning as necessary.

      7 replies →

  • For how much power they have over team organization and processes, software middle management has nearly no accountability for outcomes.

    • Is it middle management that has no accountability, or executive? Middle and line managers are nearly as targeted by layoff culling as ICs these days in FAANG. The broad processes they're passing down to ICs generally start with someone at director level or higher.

      3 replies →

    • > For how much power they have over team organization and processes, software middle management has nearly no accountability for outcomes.

      Can we also address the fact that “software spend” is distributed disproportionately to management at all levels and people who actually write the software are nickel and dimed. You’d save billions in spend and boost productivity massively if the management is bare bones and is held accountable like the rest of the folks.

      3 replies →

    • The real question is why would smart competent people continue working under management with blatant ulterior motives that negatively affect them?

      Why let their own credibility get dragged down for a second time, third time, fourth time, etc…?

      The first time is understandable but not afterwards.

      17 replies →

    • If you think our ability to measure developer productivity is bad, look into what we can do to measure manager productivity.

      TL;DR your realistic options are snake oil that doesn’t work, or nothing.

      Keep that in mind next time anyone's talking about managing through metrics & data or whatever bullshit. All that stuff's kayfabe; companies mostly run on vibes outside of a very few things.

  • ^ This. Not at FAANG, but I am too familiar with this.

    This is why software projects fail. We lowly developers always take the blame and management skates. The lack of accountability among decision makers is why things like the UK Post Office scandals happen.

    Heads need to be put on pikes. Start with John Roberts, Adam Crozier, Moya Greene, and Paula Vennells.

  • So much of the world, especially the world we see today around corporate leadership and national politics makes much more sense once you realize this fundamental law:

    People who desire infinite power only want it because it gives them the power to avoid consequences, not because they want both the power and the consequences.

    The people who believe that with great power comes great consequences are exactly the people who don't want great power because they don't want the weight of those consequences. The only people who see that bargain and think "sign me up!" are the ones who intend to drop the consequences on the floor.

  • I was a developer for a bioinformatics software startup in which the very essential 'data import' workflow wasn't defined until the release was in the 'testing' phase.

  • > wouldn't allow publication until I removed any action items that would constrain management.

    That's what we call blameless culture lol

  • “I love deadlines. I love the whooshing noise they make as they go by.” ― Douglas Adams

  • Did they go off the rails at the end, or did deadlines force acknowledging that the project is not where folks want it to be?

    That said, I think I would agree with your main concern there. If the question is "why did the devs make it so that project management didn't work?", it seems silly not to acknowledge why/how project management should have seen the evidence earlier.

  • Where I now work, in the government, all the devs are required to be part project managers. It’s a huge breath of fresh air. The devs are in all the customer meetings, assist in requirements gathering, and directly coach the customers as necessary to keep pushing the work towards completion.

    This came about because our work isn’t too diverse but the requirements are wildly diverse and many of the customers have no idea how to achieve the proper level of readiness. I do management in an enterprise API project for a large organization.

  • Both happy and sad to know that the sh*t show is pretty much the same in FAANG as any regular corporate environment.

  • There are many pressures and this is all about a lack of transparent honesty about what the real priorities are. Getting the project done properly may be #1 priority but there's priority 0 and 0.1 and others which are unspoken because they don't sound good.

  • Obviously you work at AMZN. This is the most Amazonish HN comment I’ve ever seen.

    • FAANG cargo cults are all around. Too many CEOs and VCs imagine they can succeed like Bezos by the bold strategy of hiring ex-Amazon executives and imitating Amazon's culture and HR practices.

I've also considered a side-effect of this. Each generation of software engineers learns to operate on top of the stack of tech that came before them. This becomes their new operating floor. The generations before, when faced with a problem, would have generally achieved a solution "lower" down in the stack (or at their present baseline). But the generations today and in the future will seek to solve the problems they face on top of that base floor because they simply don't understand it.

This leads to higher and higher towers of abstraction that eat up resources while providing little more functionality than if it was solved lower down. This has been further enabled by a long history of rapidly increasing compute capability and vastly increasing memory and storage sizes. Because they are only interacting with these older parts of their systems at the interface level they often don't know that problems were solved years prior, or are capable of being solved efficiently.

I'm starting to see ideas that will probably form into entire pieces of software "written" on top of AI models as the new floor, where the model basically handles all of the mainline computation, control flow, and business logic. What would have required a dozen MHz and 4MB of RAM to run now requires teraflops and gigabytes -- and, being built from a fresh start again, it will fail to learn from any of the lessons learned when it was done 30 years ago and 30 layers down.

  • Yeah, people tend to add rather than improve. It's possible to add into lower levels without breaking things, but it's hard. Growing up as a programmer, I was taught UNIX philosophy as a golden rule, but there are sharp corners on this one:

    To do a new job, build afresh rather than complicate old programs by adding new "features".

  • It's the "Lava Flow" antipattern [1][2] identified by the Gang of Five [3], "characterized by the lava-like 'flows' of previous developmental versions strewn about the code landscape, but now hardened into a basalt-like, immovable, generally useless mass of code which no one can remember much if anything about.... these flows are often so complicated looking and spaghetti-like that they seem important but no one can really explain what they do or why they exist."

    [1] http://antipatterns.com/lavaflow.htm

    [2] https://en.wikipedia.org/wiki/Lava_flow_(programming)

    [3] http://antipatterns.com/

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not

I've been managing, designing, building and implementing ERP-type software for a long time, and in my opinion the issue is typically not the software or tools.

The primary issue I see is lack of qualified people managing large/complex projects because it's a rare skill. To be successful requires lots of experience and the right personality (i.e. low ego, not a person that just enjoys being in charge but rather a problem solver that is constantly seeking a better understanding).

People without the proper experience won't see the landscape in front of them. They will see a nice little walking trail over some hilly terrain that extends for a few miles.

In reality, it's more like the Fellowship of the Ring trying to make it to Mt. Doom, but that realization happens slowly.

  • > In reality, it's more like the Fellowship of the Ring trying to make it to Mt. Doom, but that realization happens slowly.

    And boy do the people making the decisions NOT want to hear that. You'll be dismissed as a naysayer being overly conservative. If you're in a position where your words have credibility in the org, then you'll constantly be asked "what can we do to make this NOT a quest to the top of Mt. Doom?" when the answer is almost always "very little".

      Impossible projects with impossible deadlines seem to be the norm, and even when people miraculously pull them off, the lesson learned is not "ok, it worked this time for some reason, but we should not do this again"; instead the next people come in and go "it was done in the past, why can't we do it again?"

      1 reply →

      > And boy do the people making the decisions NOT want to hear that.

      You are 100% correct. The way I've tried to manage that is to provide info while not appearing to be the naysayer, by giving some options. It makes it seem like I'm on board with the crazy-ass plan and just trying to find a way to make it successful, like this:

      "Ok, there are a few ways we could handle this:

      Option 1 is to do ABC first which will take X amount of time and you get some value soon, then come back and do DEF later

      Option 2 is to do ABC+DEF at the same time but it's much tougher and slower"

    • My favorite fact is that every single time an organization manages to build a functional development team that can repeatedly and successfully navigate all the problems and deliver working software that adds real value, the higher-up decision makers always decide to scale the team next.

      Working teams are good for a project only, then they are destroyed.

    • Jesus I just had flashbacks from my last jobs. Non-technical founder always telling me I was being pessimistic (there were no technical founders). It's just not that simple Karen!

I think part of it is that reading code isn't a skill that most people are taught.

When I was in grad school ages ago, my advisor told me to spend a week reading the source code of the system we were working with (TinyOS), and come back to him when I thought I understood enough to make changes and improvements. I also had a copy of the Linux Core Kernel Commentary that I perused from time to time.

Being able to dive into an unknown codebase and make sense of where the pieces are put together is a very useful skill that too many people just don't have.

  • Being good at reading code isn't a skill that helps large software projects stay on rails.

    It's more about being good at juggling 1000 balls at the same time. It's 99.9% of the time a management problem, not a software problem.

    • On the successful projects I've worked on, the technical staff were given autonomy, responsibility, and full insight into the problem space. This requires managers putting a lot of trust in the engineers, but it works.

      Large projects I've worked on failed simply because nobody wanted the solution in the first place.

      In government I've seen many millions spent on projects that were either forgotten about or the politician that requested it lost office.

  • Reading (someone else's) code is a whole lot harder than writing it. Which is unfortunate because I do an awful lot of it at work.

  • I'm curious, what does "read code" mean to you? What does that skill look like and how is it taught?

    • You’ll notice that more senior engineers are often much better at giving useful review comments, and they will do it faster than you. That’s just a skill that seems to come with experience reading other people’s code (or your own code you wrote two years prior). It can’t be taught, only practiced; the same goes for reading other types of technical/academic works.

    • Not GP, but the general idea is the skill to take a piece of code and understand what it does by reading the code itself (probably in an IDE that can help navigate it meaningfully), not relying on docs or explanations or anything else. Surprisingly few people are comfortable in doing this, and yet it's very common in any large software project that lots of parts of the code are undocumented and no one remembers the details of how they were written.

      1 reply →

This is one part of the issue. The other major piece, which I've seen over more than two decades in industry, is that most large projects are started by and run by (though not necessarily the same person) non-technical people who are exercising political power, rather than by technical people who can achieve the desired outcomes. When you put the nexus of power into the hands of non-technical people in a technical endeavor, you end up with outcomes that don't match expectations. Larger-scale projects suffer deeply from "not knowing what we don't know" at the top.

  • If this were true all of the time then the fix would be simple - only have technical people in charge. My experience has shown that this (only technical people in charge) doesn't solve the problem.

      Success pretty much requires putting technical people in charge, but that doesn't mean putting technical people in charge is sufficient for success. We have plenty of data over the last 40 years to prove my case. Furthermore, what it means to be a "technical person" is not so simple to define, unfortunately, as the easy ways to codify it often exclude the very people you want involved.

      Suffice to say, projects are significantly more likely to succeed when the power in the project is held by people who are competent /and/ understand the systems they are working with /and/ understand the problem domain you are developing a solution in. Whether or not they have a title like "engineer" or have a technical degree, or whatever other hallmark you might choose is largely irrelevant. What matters is competency and understanding, and then ultimately accountability.

      Most large projects I've been a part of or near lacked all three of these things, and thus were fundamentally doomed to failure before they ever began. The people in power lacked competency and understanding, the entire project team and the people in power lacked accountability, and competency was unevenly distributed amongst the project team.

      It may feel pithy, but it really is true that in many large projects the fundamental issue that leads to failure is that the decision makers don't know what they're doing and most of the implementers are incompetent. We can always root cause further to identify the incentive structures in society, and particularly in public/government projects that lead to this being true, but the fact remains at the project level this is the largest problem in my observation.

      2 replies →

    • It does solve the problem! That's the entire reason we have such a thing as a tech industry. Tech industry startups are often just regular industries where everyone from the board down is technical. Uber is a "tech company" and not a taxi company because of this distinction. The results speak for themselves.

  • Sometimes giving people what they want can be bad for them; management wants cheap compliant workers, management gets cheap compliant workers, and then the projects fall apart in easily predictable and preventable ways.

    Because such failures are so common, management typically isn't punished for them, so it's hard to keep interests aligned. And because many producers are run on a cost-plus basis, there can be a perverse incentive to do a bad job, or at least to avoid doing a good one.

  • I'm not entirely sure what you mean with "technical people" but it seems that you may not appreciate the problems that "non-technical people" try to tackle.

    Do your two decades of experience cover both sides?

    • > Do your two decades of experience cover both sides?

      Yes.

      I appreciate both sides and have a wealth of experience in both. The challenge is that all the non-technical problems cannot be solved successfully while lacking a technical understanding. Projects generally don't fail for technical reasons, they fail because they were not set up for success, and that starts with having a clear understanding of requirements, feasibility, and a solid understanding of both the current state and the path to reach your desired outcomes, both politically/financially and technically.

      I was an engineer for more than a decade, I've been in Product for nearly a decade, and I'm now a senior manager in Product. I can honestly say that I have the necessary experience to hold strong opinions here and to be correct in those opinions.

      You need technical people who can also handle some of the non-technical aspects of a project with the reins of power if you want the project to succeed, otherwise it is doomed by the lack of understanding and competency of those in charge.

    • "you may not appreciate the problems that 'non-technical people' try to tackle."

      Do you mean the problem of wanting to build something without knowing how, or having the skills, to build it?

I have a theory that the churn in technology is by design. If a new paradigm, a new language, or a new framework comes out every so many years, it allows the tech sector to keep hiring new graduates at lower salaries. It gives a thin veneer of "we always want to hire the person who has X" when really they just do not want to hire someone with 10 years of experience in tech who may not have picked up X yet.

I do not think it is the only reason. The world is complex, but I do think it factors into why software is not treated like other engineering fields.

  • Constantly rewriting the same stuff in endless cycles of new frameworks and languages gives an artificial sense of productivity and justifies its own existence.

    If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.

    • We do take down a lot of old buildings (or renovate them thoroughly) because the old buildings contain asbestos, are not properly insulated, ...

    • > If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.

      This is almost exactly what happens in some countries.

      4 replies →

    • I agree. But I think the execs just say, "How can we get the most bang for our buck? If we use X, Y, Z technologies, the new hotness, then we will attract all the new hordes of hires out there, which will make them happy and has the added benefit of letting us pay them less."

  • The problem with that is that it would require a huge amount of coordination for it to be by design. I think it's better to look on it as systemic. Which isn't to say there aren't malign forces contributing.

    • I agree. Perhaps "by design" is not the correct phrasing. Many decisions and effects pass through a complex, multi-weighted graph (sort of like machine learning).

    • Indeed. How does that saying go? Don’t attribute to malice what can be explained by stupidity?

      On the other hand, Microsoft and Facebook did collude to keep salaries low. So who knows.

      1 reply →

There are rational explanations for this. When software fails catastrophically, people almost never die (considering how much software crashes every day). When hardware fails catastrophically, people tend to die, or lose a lot of money.

There's also the complexity gap. I don't think giving someone access to the Internet Explorer codebase is necessarily going to help them build a better browser. With millions of moving parts it's impossible to tell what is essential, superfluous, high quality, low quality. Fully understanding that prior art would be a years-long endeavor, with many insights no doubt, but of dubious value.

I would boil this down to something else, but possibly related: project requirements are hard. That's it.

> While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning.

For most IT projects, software folks generally can NOT "pull apart" old systems, even if they wanted to.

> Typically, software folks build new and every generation of software developers must relearn the same problems.

Project management has gotten way better today than it was 20 years ago, so some lessons have definitely been passed on.

  • A CIO once told me that with Agile we didn’t need requirements. He thought my suggestion to document the current system before modifying it was a complete waste of time. Instead he made all the developers go through a customer service workshop on how to handle and communicate with customers. Cough cough… most developers do not talk with customers. Where we worked, developers took orders from product and project people whose titles changed every year but who operated with the mindset of a drill sergeant: my way or the highway.

    • Business developers need to occasionally talk directly to customers. It's fine to filter most requirements through Product Managers / Product Owners. But developers who never communicate directly with customers and end users get disconnected from reality and end up acting based on mythology rather than ground truth.

      1 reply →

"While hardware folks study and learn from the successes and failures of past hardware, software folks do not." Couldn't be further from the truth. Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult (see microservices for example).

Once you've worked in both hardware and software engineering you quickly realize that they are only superficially similar. Software is fundamentally philosophy, not physics.

Hardware is constrained by real-world limitations. Software isn't, except in the most extreme cases. The result is that there is no 'right' way to do any one thing that everyone can converge on. The first airplane wing looks a whole lot like a wing made today, not because the people who designed it were "real engineers" or any such BS, but because that's what nature allows you to do.

  • Software doesn't operate in some magical realm outside of the physical world. It very much is constrained by real world limitations. It runs on the hardware that itself is limited. I wonder if some failures are a result of thinking it doesn't have these limitations?

    • As the great Joe Armstrong used to say, “a lot of systems actually break the laws of physics”[1] — don’t program against the laws of physics.

      > In distributed systems there is no real shared state (imagine one machine in the USA another in Sweden) where is the shared state? In the middle of the Atlantic? - shared state breaks laws of physics. State changes are propagated at the speed of light - we always know how things were at a remote site not how they are now. What we know is what they last told us. If you make a software abstraction that ignores this fact you’ll be in trouble.[2]

      [1]: “The Mess We’re In”, 2014 https://news.ycombinator.com/item?id=19708900
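Armstrong's point can be sketched in a few lines (a toy illustration; all names here are hypothetical): any "shared state" abstraction over a network is really a local copy of the last update that happened to arrive, so a node always knows how a remote peer *was*, never how it *is*.

```python
import time

class RemoteView:
    """What one node knows about another node's state: only the last
    value that was propagated to it, plus when that value arrived."""
    def __init__(self):
        self.last_seen = None
        self.seen_at = None

    def receive(self, value):
        # An update arrives over the (simulated) network.
        self.last_seen = value
        self.seen_at = time.monotonic()

# Node A changes its state; the update takes time to reach node B.
state_at_a = 1
view_at_b = RemoteView()
view_at_b.receive(state_at_a)   # B learns A's state as of this moment

state_at_a = 2                  # A changes again; that update is "in flight"

# B's knowledge of A is now stale: it reflects the last message, not reality.
print(view_at_b.last_seen)      # 1, not 2
```

Any abstraction that pretends `view_at_b.last_seen` and `state_at_a` are the same variable is the "shared state breaks the laws of physics" trap Armstrong describes.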

    • > It very much is constrained by real world limitations. It runs on the hardware that itself is limited

      And yet we scale the shit out of it, shifting limitations further and further. At that scale, different problems emerge, and there is no single person or even a single team that could comprehend this complexity in isolation. You start to encounter problems that have never been solved before.

      1 reply →

    • Except that it kind of does. I can horizontally scale a distributed storage system until we run out of silicon. I cannot do the same with a cargo airplane.

  • > Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult

    Seems more accurate to say they are obsessed with copying "what sounds good". The software industry doesn't copy what works so much as what sounds like it would work, or what sounds cool.

    If they copied what works software would just be faster by default, because very often big established tools are replaced by something that offers similar featurage, but offers it at a higher FPS.

  • I disagree. At least at the RTL level they're very similar. You don't really deal with physics there, except for timing (which is fairly analogous with software performance things like hard real-time constraints).

    > Result is that there is not a 'right' way to do any one thing that everyone can converge on.

    Are you trying to say there is in hardware? That must be why we have exactly one branch predictor design, lol

    > The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.

    "The first function call looks a whole lot like a function call today..."

    • > That must be why we have exactly one branch predictor design, lol

      I'll be that 'well akshually' guy. IIRC the AMD and Intel implementations are different enough that the Spectre/Meltdown exploits had to differ slightly for each manufacturer.

      Source: wrote exploits.


    • > "The first function call looks a whole lot like a function call today..."

      Only superficially. What's actually happening varies radically by language. See for instance tail call optimization.
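A minimal sketch of how radically "the same call" can vary by language. CPython keeps a stack frame per call and never eliminates tail calls, so a deeply tail-recursive function blows the stack; in Scheme or Erlang, which guarantee tail-call elimination, the same shape compiles to a jump and runs in constant stack space. The function here is purely illustrative:

```python
import sys

def countdown(n):
    # A tail call: the recursive call is the last thing the function does.
    if n == 0:
        return "done"
    return countdown(n - 1)

# CPython allocates a frame per call (no tail-call optimization), so a
# deep enough countdown raises RecursionError. Scheme/Erlang would
# reuse the frame and loop forever in constant space.
sys.setrecursionlimit(1000)
try:
    countdown(10_000)
except RecursionError:
    print("blew the stack: CPython does not eliminate tail calls")
```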

  • What you and the GP said are not mutually exclusive. Software engineers are quick to drink every new Kool-Aid out there, which is exactly why we’re so damned blind to history and lessons learned before.

In my experience, a lot of the time the people who COULD be solving these issues are people who used to code, or never have. The actual engineers who might do something about it aren't given the authority or scope, and you have MBAs or scrum masters standing in the way of actually solving problems.

I think this is too simple. First of all, hardware people have a high incentive to fully replace components and systems, for many reasons; replacement is also the only way they can fix major design mistakes. Software people constantly do fix bugs and design mistakes. There is certainly no strong culture of documenting or digging up past mistakes, but it's not that they don't learn from mistakes; it's just a constant process. In contrast to hardware, there is usually no point in time at which to retrospect.

The incentives to rejuvenate systems are low, and when it is considered, it often seems expensive. Software engineers' motivations here are often ill-founded: new devs feel uncomfortable with the existing system and call for something "modern". But if the time does come to replace the "legacy" system, then you are right: no one looks back at the former mistakes, and the devs who knew them are probably long gone.

The question is whether we should ever wholesale-replace a software system, or focus more on gradual, active modernization. But the latter can be very hard: in hardware everything is defined, most of the time backed by standards; in software we usually don't have that, so complex interconnected systems rarely have sane upgrade paths.

Agree 100%.

I know a lot of people on here will disagree with me saying this, but this is exactly how you get an ecosystem like JavaScript's being as fragmented, insecure, and "trend prone" as the old-school WordPress days. It's the same problems over and over, and every new "generation" of programmers has to relearn the lessons of old.

  • The difficulty lies in the fact that in software it is quite cheap to generate very complex designs, compared to hardware. Where software design is treated like hardware design (such as in medical devices or at NASA), you do gain back those benefits, at great expense.

Most of the time, there's no need to study anything. Any experienced software engineer can tell you about a project they worked on with no real requirements, management constantly changing their mind, etc.

How do you study software history? Most of the lessons seem forever locked away behind corporate walls; any honest assessment made public would either end careers or start lawsuits.

IME, "Why systems fail" almost always boils down to a principal-agent problem. This is another way of expressing the Mungerism "show me the incentive, I'll show you the outcome".

Systems that "work" tend to have some way of correcting for or mitigating the principal-agent problem by aligning incentives.

I'd also point out that hardware is a much older discipline, in terms of how long it's been operating at scale. It's had more time to formalize and crystallize. Intel is 56 years old. Google is 27.

Some consequences of NOT learning from prior successes and failures: (a) no more training for the next generation of developers/engineers; (b) fighting over the best developers, which manifests in leetcode grinding; (c) decreased cooperation among teammates; etc.

This is an interesting distinction, but it ignores the reasons software engineers do that.

First, hardware engineers are dealing with the same laws of physics every time. Materials have known properties etc.

Software: there are few laws of physics (mostly performance and asymptotic complexity). Most software isn't anywhere near those boundaries, so you get to pretend they don't exist. If you get to invent your own physics each time, yeah, the process is going to look very different.

  • For most generations of hardware you're correct, but not all. For example, high-k dielectrics were introduced to mitigate gate leakage from tunneling. Sometimes, as geometries shrink, the physics involved does change.

I think there is a ton more nuance, but can still be explained by a simple observation, which TFA hints at: "It's the economics, stupid."

Engineering is the intersection of applied science, economics, and business. The economics aspect is almost never recognized, and it explains many things. Projects in other disciplines have significantly higher costs and risks, which is why they require a lot more rigor. Taking hardware as an example, one bad design decision can sink the entire company.

On the other hand, software has economics that span a much more diverse range than any other field. Consider:

- The capital costs are extremely low.

- Development can be extremely fast at the task level.

- Software, once produced, can be scaled almost limitlessly for very cheap almost instantly.

- The technology moves extremely fast. Most other engineering disciplines have not fundamentally changed in decades.

- The technology is infinitely flexible. Software for one thing can very easily be extended for an adjacent business need.

- The risks are often very low, but can be very high at the upper end. The rigor applied scales accordingly. Your LoB CRUD app going down might bother a handful of people, so who cares about tests? But your flight control software better be (and is) tested to hell and back.

- Projects vary drastically in stacks, scopes and risk profiles, but the talent pool is more or less common. This makes engineering culture absolutely critical because hiring is such a crapshoot.

- Extreme flexibility also masks the fact that complexity compounds very quickly. Abstractions enable elegant higher-level designs, but they mask internal details that almost always leak and introduce minor issues that cause compounding complexity.

- The business rules that software automates are extremely messy to begin with (80K payroll rules!). However, the combination of a) flexibility, b) speed, and c) scalability engenders a false sense of confidence. Often no attempt is made at all to simplify the business requirements, which is probably where the biggest wins hide. This is also what enables requirements to shift all the time, a prime cause of failures.

Worse, technical and business complexity can compound. E.g. it's very easy to think "80K payroll rules linearly means O(80K) software modules" and not "wait, maybe those 80K payroll rules interact with each other, so it's probably super-linear growth in complexity." Your architecture is then oriented toward the simplistic view, and needs hacks when business reality inevitably hits, which then start compounding complexity in the codebase.
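To put rough numbers on the super-linear point (the 80K figure is the thread's; the pair-counting is just an illustration, assuming rules can interact pairwise):

```python
# If business rules can interact pairwise, the interactions to reason
# about grow roughly O(n^2), not O(n).
n_rules = 80_000
modules_if_independent = n_rules                  # the optimistic O(n) view
possible_pairs = n_rules * (n_rules - 1) // 2     # n-choose-2 interactions
print(f"{modules_if_independent:,} rules, "
      f"{possible_pairs:,} possible pairwise interactions")
# prints "80,000 rules, 3,199,960,000 possible pairwise interactions"
```

Even if only a tiny fraction of those pairs actually interact, the interacting pairs, not the rule count, dominate the design.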

And of course, if that's a contract up for bidding, your bid is going to be unsustainably low, which will be further depressed by the competitive bidding process.

If the true costs of a project -- which include human costs to the end users -- are not correctly evaluated, the design and rigor applied will be correspondingly out of whack.

As such I think most failures, in addition to regular old human issues like corruption, can be attributed to an insufficient appreciation of the economics involved, driven primarily by overindexing on the powers of software without an appreciation of the pitfalls.

As someone who's learning programming right now, do you have any suggestions on how one would go about finding and studying these successes and failures?

  • First, failures aren’t always obvious, and second, studying them isn’t either. This would likely need to be a formalized course. Still…

    If people want to know why Microsoft hated DOS and wanted to kill it with Xenix, then OS/2, then Windows, and then NT, it would be vital to know that DOS only came about as a result of IBM wanting a 16-bit source-compatible CP/M, which didn't yet exist. Then you would likely want to read Dissecting DOS to see what limits were imposed by DOS.

    For other stuff, you would start backwards. Take the finished product and ask what the requirements were, then ask what the pain points are, then start digging through the source and flowcharting/mapping it. This part is a must because programs are often too difficult to really grok without some kind of map/chart.

    There is likely an entire discipline to be created in this…

  • The things people are talking about in this thread are less to do with the practice of programming, and more to do with the difficulties of managing (and being managed, within) an engineering organization.

    You'll learn all of this for yourself, on the job, just via experience.

  • To be cynical, what's the point? You'll get employed and forced to be a part of them by circumstances.

    Your company's root priorities are probably at odds with writing good software.

    One Japanese company, not going to name names, kept trying to treat software as a depreciating asset. I didn't really understand it well, but the long and short of it was that fixing things that were supposed to be "done" was bad for the accounting. New things, however, were good.

    How can you run a software company like that? But they did and got the kind of outcome you'd expect. Japan made the laws this way and gets software to match.

    • I'm hoping to do a startup in molecular simulation. Granted, a startup has its own set of headaches, but it does dodge that particular issue :)

Indeed.

That's why, every now and then, we see "new" programming paradigms that were once written off as obsolete.

I think this is a downstream effect of there being no real regulation or professional designations in software. Every company and team ends up wildly different, so no standards emerge; and with no barriers protecting your time, there is never time for anything but the crunch, so nobody spends time on much besides shipping constantly.

I was so annoyed when I found the OTP library and realized we've been reinventing those things for 20+ years.

I've read one tech history book and I really enjoyed it. Any you recommend?

Software just feels so much more ephemeral than hardware. I haven't yet met a single 'old software enthusiast' in my life, yet there are so many enthusiasts for older hardware.

  • I am both a hardware and a software enthusiast. Tons of DOS, Windows, and OS/2 software hanging around. While I don't use them every day, I do use them. From pre-Microsoft Visio to WordStar and MS Works for DOS, the applications are simple, powerful, and pleasing to use. While I don't recommend anyone pull out a Zenith 8-bit and fire up COBOL-80 or LISP-80, they are interesting. Testing yourself in 64K is quite a challenge.

    The retro community is huge and varied. If it exists, someone is really into it.

  • I have a pet passion for an old simulation language called Dynamo. I think you will find people passionate about LISP and people that care about COBOL, and C is already multiple decades old.

Yes, and it's because there aren't very many textbook ways to do software engineering, because it's evolving too fast to reach consensus.

... are you saying that hardware projects fail less than software ones? Just building bridges is something that fails regularly all over the world. Every chip comes with a list of errata longer than my arm.

Software folks treat their output as if it's their baby or their art projects.

Hardware folks just follow best practices and physics.

They're different problem spaces though, and having done both I think HW is much simpler and easier to get right. SW is often similar if you're working on a driver or some low-level piece of code. I tried to stay in systems software throughout my career for this reason. I like doing things 'right' and don't have much need to prove to anyone how clever I am.

I've met many SW folks who insist on thinking of themselves as rock stars. I don't think I've ever met a HW engineer with that attitude.