It's a great article, until the end where they say what the solution would be. I'm afraid that the solution is: build something small, and use it in production before you add more features. If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level. There is no software development process which reliably produces software that works at scale without doing it small and medium-sized first, and fixing what goes wrong before you go big.
> If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level.
At a big-box retail chain (15 states, ~300 stores), I worked on a project to replace the POS system.
The original plan had us getting everything working (Ha!), then deploying it out to the regular stores, and ending with the two oddball "stores". The company cafeteria and surplus store were technically stores, in that they had all the same setup and processes, but they were odd.
When the team I was on was brought into this project, we flipped that around and deployed to those two first, several months ahead of the schedule for the regular stores.
In particular, the surplus store had a few dozen transactions a day. If anything broke, you could do reconciliation by hand. The cafeteria, meanwhile, had single-register transaction volume that surpassed the entire surplus store on most any day. Furthermore, all of its transactions were payroll deductions (swipe your badge rather than a credit card or cash). This meant that if anything went wrong there, we weren't in trouble with PCI and could fix it by debiting and crediting accounts.
Ultimately, we made our deadline to get things out to stores. We did have one nasty bug that showed up in late October (or was it early November?) with repackaging counts: if a box of 6 was $24 and a single item was $4.50, then buying 6 single items got "repackaged" to cost $24 rather than $27. That logic interacted badly with a BOGO sale, and the result was absurd receipts: the receipt showed you spent $10,000 but were discounted $9,976, and then the GMs got alerts that the store would not be able to make payroll because of a $9,976 discount. One of the devs pulled an all-nighter to fix that one and it got pushed to the stores.
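To make the shape of that bug concrete, here's a hypothetical Python sketch of how a repackaging rule and a BOGO promotion, each fine on its own, can compose into nonsense receipt lines. All names, prices, and the order of operations are invented; the real system surely worked differently.

    # Hypothetical reconstruction -- illustrative only.
    SINGLE_PRICE = 4.50   # one item bought alone
    PACK_SIZE = 6
    PACK_PRICE = 24.00    # 6 singles get "repackaged" to the box price

    def repackage_total(qty: int) -> float:
        """Price qty singles, converting each full group of 6 to the pack price."""
        packs, rest = divmod(qty, PACK_SIZE)
        return packs * PACK_PRICE + rest * SINGLE_PRICE

    def buggy_receipt_lines(qty: int) -> tuple[float, float]:
        """Buggy composition: BOGO doubles the quantity first, the 'you spent'
        line is priced at the single-item rate for the doubled quantity, and
        the 'discount' line absorbs the difference from the amount due."""
        effective_qty = qty * 2                  # BOGO: every item adds a free one
        subtotal = effective_qty * SINGLE_PRICE  # printed as "you spent"
        due = repackage_total(qty)               # what is actually charged
        return subtotal, subtotal - due          # "you spent", "you saved"

    print(buggy_receipt_lines(6))   # (54.0, 30.0) for a $24 pack; grows absurd with qty

Each rule passes its own unit tests; only the composition, at real transaction volumes, produces receipts like the ones described above.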
I shudder to think about what would have happened if we had tried to push the POS system out to customer-facing stores before the performance issues we found in the cafeteria were worked out, or if we had had to reconcile real store transactions to hunt down incorrect tax calculations.
You could have, in principle, implemented the new system to run in "dummy mode" alongside the existing system at regular stores, so that you could see that it produces the 'same' results, at least in terms of what the existing system is able to provide.
Which is to say, there is more than one approach to gradual deployment.
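For anyone who hasn't seen it done, a minimal sketch of that parallel-run idea in Python: the legacy system stays the source of truth, the new system processes a shadow copy, and disagreements get logged instead of reaching the customer. The `process`/`log_mismatch` names are placeholders, not a real POS API.

    def shadow_run(txn, legacy_system, new_system, log_mismatch):
        """Serve the sale from the legacy system; replay it on the new
        system and record any disagreement."""
        expected = legacy_system.process(txn)    # what the store actually uses
        try:
            actual = new_system.process(txn)     # shadow copy, result discarded
        except Exception as exc:                 # new code must never break a sale
            log_mismatch(txn, expected, error=exc)
            return expected
        if actual != expected:
            log_mismatch(txn, expected, got=actual)
        return expected                          # customers only ever see legacy output

The operational cost is that you run two systems at once and need the legacy system's outputs in a comparable form, which is exactly what can make this harder than it sounds on old POS hardware.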
>> We did have one nasty bug that showed up in late October (or was it early November?)
Having worked in e-commerce & payment processing, where this weekend is treated like the Super Bowl, the birth of your first child, and your wedding day all rolled into one, a nasty POS bug at this time of year would be incredibly stressful!
There is no technical solution, because these projects are not failing for technical reasons.
They are failing because of political scheming and a bunch of people wanting to have a finger in the pie - "trillions spent" - I guess no one would mind earning a couple million.
Then you have "important people" who want to be important and want to have an opinion on font size, or insist that some button should be 12px to the right, because they are "important". It doesn't matter for the project, but they have to assert their dominance.
You have 2 or 3 companies working on a project? Great! Now they will be throwing stuff over the fence to limit their own costs and blame the others, while trying to get away with the least work done for the most money possible.
That is how the sausage is made. Coming up with a "reasonable approach" is not the solution, because as soon as you get different suppliers and different departments you end up with a power/money struggle.
> They are failing because of political scheming and a bunch of people wanting to have a finger in the pie - "trillions spent" - I guess no one would mind earning a couple million.
Not (necessarily) wrong, but if you start small, Important People may not want to bother with something that is Unimportant and may leave things alone so something useful and working can get going. If you start with an Important project, then Important People will start circling it right away.
Political corruption is like environmental radiation: a viable fix is never 'just get rid of political corruption'*. It's an environmental constant that needs to be handled by an effective approach.
That said, parent's size- and scope-iterative approach also helps with corruption, because corruption metastasizes in the time between {specification} and {deliverable}.
Shrink that, by tying incremental payments to working systems at smaller scales, and you shrink the blast radius for failure.
That said, there are myriad other problems the approach creates (encouraging architectures that won't scale to the final system, promoting duct-taped features on top of an existing system, vendor-to-vendor transitions if the system builder changes, etc.).
But on the whole, the pros outweigh the cons... for projects controlled by a political process (either public or private).
That's why military procurement has essentially landed on spiral development (i.e. iterative demonstrated risk burn-down) as a meta-framework.
* Limit political corruption, to the extent possible in a cost-efficient manner, sure.
That's what works for products, not software systems. Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts. Eventually you want to rewrite to deal with all the technical debt, but nobody has enough confidence to say what is in the codebase that's important to Product and what isn't, so everybody is afraid and frozen.
Scale is separately a Product and Engineering question. You are correct that you cannot scale a Product to delight many users without it first delighting a small group of users. But there are plenty of scaled Engineering systems that were designed from the beginning to reach massive scale. WhatsApp is probably the canonical example of something that was a rather simple Product with very highly scaled Engineering and it's how they were able to grow so much with such a small team.
> Gradual growth inevitably results in loads of technical debt.
Why is this stated as though it's some de facto software law? The argument is not whether it's possible to waterfall a massive software system. It clearly is possible, but the failure ratios have historically been sufficiently uncomfortable to give rise to entirely different (and evidently more successful) project development philosophies, especially when promoters were more sensitive to the massive sums involved (which in my opinion also helps explain why there are so many wasteful government examples). The lean startup did not appear in a vacuum. "Do things that don't scale" did not become a motto in these parts without reason. In case some are still confused about the historical purpose of this benign-sounding advice: no, it wasn't originally addressed at entrepreneurs aiming to run "lifestyle" businesses.
Software is a component of a product, if not the product itself. Treating software like a product, besides being the underlying truth, also means it makes sense to manage it like one.
Technical debt isn’t usually the problem people think it is. When it does become a problem, it’s best to think of it in product-like terms. Does it make the product less useful for its intended purpose? Does it make maintenance or repair inconvenient or costly? Or does it make it more difficult or even impossible to add competitive features or improvements? Taking a product evaluation approach to the question can help you figure out what the right response is. Sometimes it’s no response at all.
Designing or intending a system to be used at massive scale is not the same as building and deploying it so that it only initially runs at that massive scale.
That's just a recipe for disaster, "We don't even know if we can handle 100 users, let's now force 1 million people to use the system simultaneously." Even WhatsApp couldn't handle hundreds of millions of users on the day it was first released, nor did it attempt to. You build out slowly and make sure things work, at least if you're competent and sane.
> Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts.
This isn't technical debt, necessarily. Technical debt is a specific thing. You probably mean "an underlying design that doesn't perfectly map to what ended up being the requirements". But then the world moves on (what if a regulation is added that ruins your perfect structure anyway?) and you can't just wish for perfect requirements. Or not in software that interacts directly with the real world, anyway.
There's nothing wrong with technical debt per se. As with all debt, the problem is incurring it without a plan or means to pay it off. Debt based financing is the engine of modern capitalism.
Gradual growth to large scale implies an ongoing refactoring cost--that's the price of paying off the technical debt that got you started and built initial success in small scale rollouts. As long as you keep "servicing" your debt (which can include throwing away an earlier chunk and building a more scalable replacement with the lessons learned), you're doing fine.
The magic words here to management/product owners is "we built it that way the first time because it got us running quickly and taught us what we need to know to build the scalable version. If we'd tried to go for the scalable version first, we wouldn't have known foo, bar and baz, and we'd have failed and wouldn't have learned anything."
Gradual growth =/= many tacked on features. Many tacked on features =/= technical debt. Technical debt =/= "everybody is afraid and frozen." Those are merely often correlated, but not required.
WhatsApp is a terrible example because it's barely a product; WhatsApp is mostly a free offering of goodwill riding on the back of actual products like Facebook Ads. A great example would be a product like Salesforce, SAP, or Microsoft Dynamics. Those products are forced to grow and change and adapt and scale, to massive numbers doing tons of work, all while being actual products and being software systems. I think such products act as stark rebukes of what you've described.
The dominant factor is: there is a human who understands the entire system.
That is vastly easier to achieve by making a small, successful system, which gets buy in from both users and builders to the extent that the former pay sufficient money for the latter to be invested in understanding the entire system and then growing it and keeping up with the changes.
Occasionally a moon-shot program can overcome all of that inertia, but "90% of all projects fail" is definitely overrepresented among large projects. And the Precautionary Principle says you shouldn't try, because the consequences of failure are so high.
This works for Clojure, git and even Linux. It seems there's a human who understands the entire system and decides what's allowed to be added to it.
But these things are meant to be used by technical people.
The non-technical people I know might want to use Linux but stay on Windows or choose Mac OS because it's more straightforward. I use Windows+WSL at work even though I would like to use a native Linux distribution.
I know someone who created a MUD game (text online game) and said to him I wanted to make one with a browser client. He said something we could translate as "Good, you can have all the newbies." Not only was he right that a MUD should be played with a MUD client like tintin++, but making a good browser client is harder than it seems and that's time not spent making content for the game or improving the engine.
My point is that he was an uncompromising person who refused to add layers to a project, because they would come at a cost which isn't only time or dollars but also things like motivation and focus.
You will never get to the moon by making a faster and faster bus.
I see a lot of software with that initial small scale "baked into it" at every level of its design, from the database engine choice, schema, concurrency handling, internal architecture, and even the form design and layout.
The best-engineered software I've seen (and written) always started at the maximum scale, with at least a plan for handling future feature extensions.
As a random example, the CommVault backup software was developed at AT&T to deal with their enormous distributed scale, and it was the only decently scalable backup software I have ever used. With its competitors, it was a serious challenge just to run a report of last night's backup job status!
I also see a lot of "started small, grew too big" software make hundreds of silly little mistakes throughout, such as using drop-down controls for selecting users or groups. Works great for that mom & pop corner store customer with half a dozen accounts, fails miserably at orgs with half a million. Ripping that out and fixing it can be a decidedly non-trivial piece of work.
Similarly, cardinality in the database schema has really irritating exceptions that only turn up at the million or billion row scale and can be obscenely difficult to fix later. An example I'm familiar with is that the ISBN codes used to "uniquely" identify books are almost, but not quite unique. There are a handful of duplicates, and yes, they turn up in real libraries. This means that if you used these as a primary key somewhere... bzzt... start over from the beginning with something else!
There is no way to prepare for this if you start with indexing the book on your own bookshelf. Whatever you cook up will fail at scale and will need a rethink.
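A minimal sketch of the defensive schema choice, using Python's sqlite3 (the values are illustrative): treat ISBN as an indexed attribute rather than the primary key, so a duplicate is just data instead of a constraint violation.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE book (
            book_id INTEGER PRIMARY KEY,   -- surrogate key owned by us
            isbn    TEXT,                  -- indexed, but deliberately NOT unique
            title   TEXT NOT NULL
        )
    """)
    conn.execute("CREATE INDEX idx_book_isbn ON book (isbn)")

    # Two distinct books sharing one ISBN is now representable:
    conn.executemany(
        "INSERT INTO book (isbn, title) VALUES (?, ?)",
        [("0345339703", "Book A"), ("0345339703", "Book B, reused ISBN")],
    )
    print(conn.execute(
        "SELECT book_id, title FROM book WHERE isbn = ?", ("0345339703",)
    ).fetchall())   # [(1, 'Book A'), (2, 'Book B, reused ISBN')]

Whether lookups should then disambiguate by edition, publisher, or shelf location is exactly the kind of rule that only turns up at scale.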
Counterpoint: the idea that your project will be the one to scale up to millions of users/requests/etc. is hubris. Odds are, your project won't scale past 10,000 to 100,000. Designing every project to scale to the millions from the beginning often leads to overengineering, adding needless complexity when a simpler solution would have worked better.
Naturally, that advice doesn't hold if you know ahead of time that the project is going to be deployed at massive scale. In which case, go ahead and implement your database replication, load balancing, and failover from the start. But if you're designing an app for internal use at your company of 500, well, feel free to just use SQLite as your database. You won't ever run into the problems of scale in this app, and single-file databases have unique advantages when your scale is small.
Basically: know when huge scale is likely, and when it's immensely UNlikely. Design accordingly.
While I think this is good advice in general, I don’t think your statement that “there is no process to create scalable software” holds true.
The UK gov development service reliably implements huge systems over and over again, and those systems go out to tens of millions from day 1. As a rule of thumb, the parts of the UK govt digital suite that suck are the parts the development service hasn't been assigned to yet.
The SWIFT banking org launches reliable features to hundreds of millions of users.
There’s honestly loads of instances of organisations reliably implementing robust and scalable software without starting with tens of users.
The UK government development service, as you call it, is not a service. It's more of a declaration of process that is adhered to across the diverse departments and organisations that make up the government. It's usually small teams that are responsible for exploring what a service is or needs and then implementing it. They are able to deliver decent services because they start small, design and user-test iteratively, and only scale out when there is a really good understanding of what's being delivered.
The technology is the easy bit.
UK GDS is great, but the point there is that they're a crack team of project managers.
People complain about junior developers who pass a hiring screen and then can't write a single line of code. The equivalent exists for both project management and management in general, except it's much harder to spot in advance. Plus there's simply a lot of bad doctrine and "vibes management" going on.
("Vibes management": you give a prompt to your employees vaguely describing a desired outcome and then keep trying to correct it into what you actually wanted)
> and those systems go out to tens of millions from day 1
I like GDS (I even interviewed with them once and saw their dev process etc) but this isn't a great example. Technically GDS services have millions of users across decades, but people e.g. aren't constantly applying for new passports every day.
A much better example I think is Facebook's rollout of Messenger, which scaled to billions of actual users on day 1 with no issues. They did it by shipping the code early in the Facebook app, and getting it to send test messages to other apps until the infra held, and then they released Messenger after that. Great test strategy.
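A rough Python sketch of that kind of dark launch, with the usual caveats (all names are invented; this is the general technique, not Facebook's actual mechanism beyond what's described above): the new pipeline is exercised by hidden traffic inside the existing app, its results are discarded, and only its load and failure metrics are real.

    import random

    DARK_LAUNCH_FRACTION = 0.05   # ramp up as the new infra holds

    def send_message(msg, old_pipeline, new_pipeline, metrics):
        result = old_pipeline.send(msg)          # users are still served by the old path
        if random.random() < DARK_LAUNCH_FRACTION:
            try:
                new_pipeline.send(msg)           # same payload, replayed as hidden test traffic
                metrics.incr("dark_launch.ok")
            except Exception:
                metrics.incr("dark_launch.fail") # failures visible only to the team
        return result

The ramp knob is the whole trick: by the time the flag flips to 100% and results are actually shown to users, the backend has already survived production-shaped load.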
GDS's budget is about £90 million a year or something. There are many other contracts still being spent on digital, for example PA Consulting at £60 million (over a few years), who do a lot of the gov.uk Home Office stuff, and the fresh grads they hire cost the government more than GDS's most senior staff...
SWIFT? Hold my beer. SWIFT has not launched anything substantial since its startup days in the early '70s.
Moreover, their core tech has not evolved that far from that era, and the '70s tech bros are still there through their progeny.
Here's an anecdote: The first messaging system built by SWIFT was text-based, somewhat similar to ASN.1.
The next one used XML, as it was the fad of the day. Unfortunately, neither SWIFT nor the banks could handle a 2-3 order of magnitude increase in payload size in their ancient systems. Yes, as engineers, you would think compressing the XML would solve the problem, and you would be right. Moreover, XML Infoset already existed, and it defined compression as a function of the XML Schema, so it was somewhat more deterministic, even though not more efficient than LZMA.
But the suits decided differently. At one of the SIBOS conferences they abbreviated the XML tags, and did it literally on paper, without thinking about back-and-forth translation, dupes, etc.
And this is how we landed with the ISO 20022 abbreviations that we all know and love: Ccy for Currency, Pmt for Payment, Dt for Date, etc.
> I'm afraid that the solution is: build something small, and use it in production before you add more features.
Gall's Law:
> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
Came here to say this. I still think that Linus Torvalds has the most profound advice to building a large, highly successful software system:
"Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don't think about some big picture and fancy design. If it doesn't solve some fairly immediate need, it's almost certainly over-designed. And don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project."
I don't think this applies in any way to companies contracted to build a massive system for a government with a clear need. Linus is talking about growing a greenfield open-source project, which may or may not ever be used by anyone.
In contrast, if your purpose is "we need to manage our country's accounting without pen and paper", that's a clear need for a massive system. Starting work on this by designing a system that can solve accounting for a small firm is not the right way to go. Instead, you have to design with the end-goal in mind, since that's what you were paid for. But, you don't launch your system to the entire country at once: you first use this system designed for a country in a small shop, to make sure it actually handles the small scale well, before gradually rolling out to more and more people.
No, Linus Torvalds would not stand for the people in the projects from the article; he would slam the door and quit.
Those projects the author pointed out are basically political horror stories. I can imagine how dozens of people wanted a cut of the money in those projects, or wanted to push things through because "they are important people".
There is nothing you can do technically to save such projects and it is NOT an IT failure.
A bad API can constrain your implementation and often can't be changed once it's in use by loads of users. APIs should be right from day one if possible.
While I like the "start small and expand" strategy better than the "big project upfront", this trades project size for project length and often that is no better:
- It gives outside leadership types many more opportunities to add requirements later. This is nice if they are things missed in the original design, but it can also lead to massive scope creep.
- A big enough project that gets done the "start small and expand" way can easily grow into a decade-plus project. For an extreme example, see the multi-decade project by Indian Railways to gradually convert all its railways to a single broad gauge. It works fine if you have the organisational backing for a long duration, but the constant knowledge leaks from people leaving, retiring, getting promoted, etc. can be a real problem for a project like that. Especially in fields where the knowledge is the product, like in software.
> If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level.
You could also try to buy some off-the-shelf solutions? Making payroll, even for very large organisations, isn't exactly a new problem.
As a corollary I would also suggest: subsidiarity.
> Subsidiarity is a principle of social organization that holds that social and political issues should be dealt with at the most immediate or local level that is consistent with their resolution.
I think you'll find that is exactly what people do. However, payroll solutions are highly customized for every individual company and even business unit. You don't buy payroll software in a box, deploy it, and now you have payroll. Instead, you pay a payroll software company; they come in and gather information about your payroll systems, then they roll out their software on some of your systems and work with you to make sure their customizations worked, etc. There's rarely any truly "off-the-shelf" software in B2B transactions, especially for the type of end-user solutions that also interact with legal systems.
Also, governments are typically at least an order of magnitude larger than the largest companies operating in their countries, in terms of employees. So sure, the government of Liechtenstein has fewer employees than Google overall, but the US government certainly does not, and even Liechtenstein probably has way more government employees than Google employees in their country.
I work at a small shop. I'm a big advocate of giving customers the 0.1 version and then talking out what they want. It's often not exactly what they asked for at the start... but it is often better in the end.
Yes. Also the same applies to companies. There should not be companies that are growing to $100 million revenue while losing money on a gamble that they will eventually get big enough to succeed. Good first, big later.
$100M maybe. But pretty much all tech needs an initial investment before you can start making profit. It takes a lot of development before you can get a product that anyone would want to pay for.
>It's a great article, until the end where they say what the solution would be. I'm afraid that the solution is: build something small, and use it in production before you add more features.
I think that is true for a lot of projects. But I'm not sure it is realistic to incrementally develop a control system for a nuclear reactor or an air traffic control system.
Not saying you're wrong, but I wonder what the differentiating factor is for software. We can build huge things like airliners, massive bridges and buildings without starting small.
Incremental makes less sense to me when you want to go to mars. Would you propose to write the software for such a mission in an incremental fashion too?
Yet for software systems it is sometimes proposed as the best way.
> We can build huge things like airliners, massive bridges and buildings without starting small.
We did start small with all of those things. We developed rigorous disciplines around engineering, architecture, material sciences. And people died along the way in the thousands[0][1]
People are still dying from those failures; the Boeing 737 MAX 9 door plug blowout was only two years ago.
> Incremental makes less sense to me when you want to go to mars.
This is yet another reason why a manned Mars mission will be exceedingly dangerous, NOT a strike against incremental development and deployment.
All of the things you mentioned are designed and tested incrementally. Furthermore, software has been used on Mars missions in the past, and that software was also developed incrementally. It's proposed as the best way because it's a way that has a track record.
That sounds like the way nature handles growth and complexity: slowly and over long time scales. Assume there will be failures, don't die and keep trying.
When you bite off too much complexity at once you end up not shipping anything or building something brittle.
You just need: Plan -> Implement -> Test -> Repeat
Whether you are creating software, games or whatever, these iterations are foundational. What these steps look like in detail of course depends on the project itself.
That's the ideal, but a lot of these big problems can't start small because the problem they have is already big. A lot of government IT programs are set up to replace existing software and processes, often combining a lot of legacy software's jobs and the manual labor involved.
If you have something like a tax office or payroll, they need to integrate decades of legislation and rules. It's doable, but you need to understand the problem (which at those scales is almost impossible to fit in one person's head) and more importantly have diligent processes and architecture to slowly build up and deploy the software.
tl;dr it's hard. I have no experience in anything that scale, I've been at the edges of large organizations (e.g. consumer facing front-ends) for most of my career.
The accounting, legal and business process requirements are vastly different at different scales, different jurisdictions, different countries, etc.
There's a crazy amount of complexity and customizability in systems like ERPs for multinational corporations (SAP, Oracle).
When you start with a small town, you'll have to throw most of everything away when moving to a different scale.
That's true for software systems in general.
If major requirements are bolted on after the fact, instead of designed into the system from the beginning, you usually end up with an unmaintainable mess.
Knowing that the rules for your first small deployment are not the same as the rules for everywhere, is valuable for designing well. Trying to implement all of those sets of rules in your initial deployment, is not a good idea. There is a general principle that you shouldn't code the abstraction until you've coded for the concrete example 2 or 3 times, because otherwise you won't make the right abstraction. Looking ahead is not the same as starting with the whole enchilada for your initial deployment.
I do get concerned when the solution is to be more strict on the waterfall process.
I used to believe there were some worlds in which waterfalls are better: where requirements are well known in advance and set in stone. I’ve since come to realize neither of those assumptions is ever true.
Imagine if the only way to build a skyscraper was to start with a dollhouse and keep tacking extensions and pieces onto it. Imagine if the only way to build a bridge across San Francisco Bay was to start with popsicle sticks.
The very specific example you chose, payroll, shows how difficult it can be to incrementally step from small to huge. As you grow from town to national, you will run into all the disadvantages without really hitting the advantages. I feel that incremental does help you move from one level to one just a few above, but only if there are enough customers at exactly those starting levels.
When developing for towns, you will have all small random subsets of the variations imposed by year after year of legal changes BUT small sales. You will have to implement niche variations in arbitrary aspects for all the towns you have to support AND you will not have the customer size on which to amortize this work. Each new customer will bring a new arbitrary set of legal aspects to be met. Each new customer may be arbitrarily difficult to support.
By the time you reach national, you will have already covered most of the historical legal quirks - but that will have been done in one kludgy manner after another - and then you will hit one more set of legal quirks at the level of national organizations (some of them will have their very own laws). You will now have a very large budget to finalize things but you will be burdened by an illogical software base.
So I agree that you will need experience and subject matter experts who have worked at the various levels. BUT, now that you have this experience, you know the degree of flexibility that is required (you know where and what needs to be variable and quirk-friendly and how far the quirks can go = "any size") as well as the size-related issues (mailing, transaction, and user support volume), and you can now plan for all this AS YOU restart a new development from scratch. Because at this new "master" level you need both systematic flexibility AND resilience at size.
Payroll is exactly the kind of topic where "adding features" will be "fun" - I mean bewildering - while you learn, but probably economically difficult to manage, until it kills you "as you climb up"?
You will be killed by a large software project that can afford to hire out a bunch of your subject matter specialists (or hires new ones) and uses them in a "from scratch" project. If you are lucky, this large project will be from the same company but only if you are lucky.
Now. AFTER you have done the one top level project - for one country -, you will probably be in a good situation to sell service to all kinds of organizations. Because you now have a system in which you can implement ridiculous quirks without breaking everything. And if you have done the job just right, you can onboard smaller customers (towns) economically enough that they can afford your solution.
That's different from where you deploy your solution first. Sure, deploy a national-design solution first at a subset of the target employees - although that does impose more requirements still: now you need to coexist with the legacy solutions. Which would be another hard to meet handicap when developing for towns first.
I study and write quite a bit of tech history. IMHO from what I've learned over the last few years of this hobby, the primary issue is quite simple. While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning. Typically, software folks build new and every generation of software developers must relearn the same problems.
I work at $FANG, every one of our org's big projects go off the rails at the end of the project and there's always a mad rush at the end to push developers to solve all the failures of project management in their off hours before the arbitrary deadline arrives.
After every single project, the org comes together to do a retrospective and ask "What can devs do differently next time to keep this from happening again". People leading the project take no action items, management doesn't hold themselves accountable at all, nor product for late changing requirements. And so, the cycle repeats next time.
I led an effort one time, after a big bug made it to production after one of those crunches, that painted the picture of the root cause: a huge complicated project handed off to offshore junior devs with no supervision, and then the junior devs managing it being completely switched out twice in the 8-month project with no handover, and no introspection by leadership. My manager's manager killed the document and wouldn't allow publication until I removed any action items that would constrain management.
And thus, the cycle continues to repeat, balanced on the backs of developers.
Of course the reason it works this way is that it works. As much as we'd like accountability to happen on the basis of principle, it actually happens on the basis of practicality. Either the engineers organize their power and demand a new relationship with management, or projects start going so poorly that necessity demands a better working relationship, or nothing changes. There is no 'things get better from wisdom alone' option; the people who benefit from improvements have to force the hand of the people who can implement them. I don't know if this looks like a union or something else but my guess is that in large part it's something else, for instance a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.
I think the reasons this hasn't happened are (a) tech has moved too fast for anyone to credibly say how things should be done for longer than a year or two, and (b) attempts at professional organizations borrowed too much from slower-moving physical engineering and so didn't adapt to (a). But I do think it can be done and would benefit the industry greatly (at the cost of slowing things down in the short term). It requires a very 'agile' sense of standards, though. If standards mean imposing big constraints on development, nobody will pay attention to them.
For one project I got so far as to include in the project proposal some outcomes that showed whether or not it was a success: quote from the PM “if it doesn’t do that then we should not have bothered building this”. They objected to even including something so obviously required in the plan.
Waste of my bloody time. Project completed, taking twice as many devs for twice as long, great success, PM promoted. Doesn’t do that basic thing that was the entire point of it. Nobody has ever cared.
Edit to explain why I care: there was a very nice third party utility/helper for our users. We built our own version because “only we can do amazing direct integration with the actual service, which will make it far more useful”. Now we have to support our worse in-house tool, but we never did any amazing direct integration and I guarantee we never will.
Glad to hear that $FANG has similar incompetency as every other mid-tier software shop I've ever worked in. Your project progression sounds like any of them. Here I was thinking that $FANG's highly-paid developers and project management processes were actually better than average.
Reminds me of the military. Senior leaders often have no real idea of what is happening on the ground because the information funneled upward doesn't fit into painting a rosy report. The middle officer ranks don't want to know the truth because it impacts their careers. How can executives even hope to lead their organizations this way?
^ This. Not at FAANG, but I am too familiar with this.
This is why software projects fail. We lowly developers always take the blame and management skates. The lack of accountability among decision makers is why things like the UK Post Office scandals happen.
Heads need to be put on pikes. Start with John Roberts, Adam Crozier, Moya Greene, and Paula Vennells.
So much of the world, especially the world we see today around corporate leadership and national politics makes much more sense once you realize this fundamental law:
People who desire infinite power only want it because it gives them the power to avoid consequences, not because they want both the power and the consequences.
The people who believe that with great power comes great consequences are exactly the people who don't want great power because they don't want the weight of those consequences. The only people who see that bargain and think "sign me up!" are the ones who intend to drop the consequences on the floor.
I was a developer for a bioinformatics software startup in which the very essential 'data import' workflow wasn't defined until the release was in the 'testing' phase.
Did they go off the rails at the end, or did deadlines force acknowledging that the project was not where folks wanted it to be?
That said, I think I would agree with your main concern there. If the question is "why did the devs make it so that project management didn't work?", it seems silly not to acknowledge why/how project management should have seen the evidence earlier.
Where I now work, in the government, all the devs are required to be part project managers. It’s a huge breath of fresh air. The devs are in all the customer meetings, assist in requirements gathering, and directly coach the customers as necessary to keep pushing the work towards completion.
This came about because our work isn’t too diverse but the requirements are wildly diverse and many of the customers have no idea how to achieve the proper level of readiness. I do management in an enterprise API project for a large organization.
There are many pressures and this is all about a lack of transparent honesty about what the real priorities are. Getting the project done properly may be #1 priority but there's priority 0 and 0.1 and others which are unspoken because they don't sound good.
I've also considered a side-effect of this. Each generation of software engineers learns to operate on top of the stack of tech that came before them. This becomes their new operating floor. The generations before, when faced with a problem, would have generally achieved a solution "lower" down in the stack (or at their present baseline). But the generations today and in the future will seek to solve the problems they face on top of that base floor because they simply don't understand it.
This leads to higher and higher towers of abstraction that eat up resources while providing little more functionality than if it was solved lower down. This has been further enabled by a long history of rapidly increasing compute capability and vastly increasing memory and storage sizes. Because they are only interacting with these older parts of their systems at the interface level they often don't know that problems were solved years prior, or are capable of being solved efficiently.
I'm starting to see ideas that will probably form into entire pieces of software "written" on top of AI models as the new floor, where the model basically handles all of the mainline computation, control flow, and business logic. What would have required a dozen MHz and 4MB of RAM now requires TFLOPS and gigabytes -- and, being built from a fresh start again, it will fail to learn from any of the lessons learned when it was done 30 years ago and 30 layers down.
Yeah, people tend to add rather than improve. It's possible to add into lower levels without breaking things, but it's hard. Growing up as a programmer, I was taught the UNIX philosophy as a golden rule, but there are sharp corners on this one:
To do a new job, build afresh rather than complicate old programs by adding new "features".
It's the "Lava Flow" antipattern [1][2] identified by the Gang of Five [3], "characterized by the lava-like 'flows' of previous developmental versions strewn about the code landscape, but now hardened into a basalt-like, immovable, generally useless mass of code which no one can remember much if anything about.... these flows are often so complicated looking and spaghetti-like that they seem important but no one can really explain what they do or why they exist."
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not
I've been managing, designing, building and implementing ERP type software for a long time and in my opinion the issue is typically not the software or tools.
The primary issue I see is lack of qualified people managing large/complex projects because it's a rare skill. To be successful requires lots of experience and the right personality (i.e. low ego, not a person that just enjoys being in charge but rather a problem solver that is constantly seeking a better understanding).
People without the proper experience won't see the landscape in front of them. They will see a nice little walking trail over some hilly terrain that extends for a few miles.
In reality, it's more like the Fellowship of the Ring trying to make it to Mount Doom, but that realization happens slowly.
> In reality, it's more like the Fellowship of the Ring trying to make it to Mount Doom, but that realization happens slowly.
And boy do the people making the decisions NOT want to hear that. You'll be dismissed as a naysayer being overly conservative. If you're in a position where your words have credibility in the org, then you'll constantly be asked "what can we do to make this NOT a quest to the top of Mount Doom?" when the answer is almost always "very little".
I think part of it is that reading code isn't a skill that most people are taught.
When I was in grad school ages ago, my advisor told me to spend a week reading the source code of the system we were working with (TinyOS), and come back to him when I thought I understood enough to make changes and improvements. I also had a copy of Linux Core Kernel Commentary that I perused from time to time.
Being able to dive into an unknown codebase and make sense of how the pieces are put together is a very useful skill that too many people just don't have.
This is one part of the issue. The other major piece of this that I've seen over more than two decades in industry is that most large projects are started by and run by (but not necessarily the same person) non-technical people who are exercising political power, rather than by technical people who can achieve the desired outcomes. When you put the nexus of power into the hands of non-technical people in a technical endeavor, you end up with outcomes that don't match expectations. Large-scale projects deeply suffer from "not knowing what we don't know" at the top.
If this were true all of the time then the fix would be simple - only have technical people in charge. My experience has shown that this (only technical people in charge) doesn't solve the problem.
Sometimes giving people what they want can be bad for them; management wants cheap compliant workers, management gets cheap compliant workers, and then the projects fall apart in easily predictable and preventable ways.
Because such failures are so common, management typically isn't punished for them, so it's hard to keep interests aligned. And because many projects are run on a cost-plus basis, there can be a perverse incentive to do a bad job, or at least to avoid doing a good one.
I'm not entirely sure what you mean with "technical people" but it seems that you may not appreciate the problems that "non-technical people" try to tackle.
Do your two decades of experience cover both sides?
I have a theory that the churn in technology is by design. If a new paradigm, new language, new framework comes out every so many years, it allows the tech sector to always hire new graduates at lower salaries. It gives a thin veneer of "we always want to hire the person who has X" when really they just do not want to hire someone with 10 years of experience in tech who may not have picked up X yet.
I do not think it is the only reason. The world is complex, but I do think it factors into why software is not treated like other engineering fields.
Constantly rewriting the same stuff in endless cycles of new frameworks and languages gives an artificial sense of productivity and justifies its own existence.
If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.
The problem with that is that it would require a huge amount of coordination for it to be by design. I think it's better to look on it as systemic. Which isn't to say there aren't malign forces contributing.
There are rational explanations for this. When software fails catastrophically, people almost never die (considering how much software crashes every day). When hardware fails catastrophically, people tend to die, or lose a lot of money.
There's also the complexity gap. I don't think giving someone access to the Internet Explorer codebase is necessarily going to help them build a better browser. With millions of moving parts it's impossible to tell what is essential, superfluous, high quality, low quality. Fully understanding that prior art would be a years-long endeavor, no doubt with many insights, but of dubious value.
I would boil this down to something else, but possibly related: project requirements are hard. That's it.
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning.
For most IT projects, software folks generally can NOT "pull apart" old systems, even if they wanted to.
> Typically, software folks build new and every generation of software developers must relearn the same problems.
Project management has gotten way better than it was 20 years ago, so there are definitely some learnings that have been passed on.
A CIO once told me that with Agile we didn’t need requirements. He thought my suggestion to document the current system before modifying it was a complete waste of time. Instead, he made all the developers go through a customer service workshop on how to handle and communicate with customers. Cough cough… most developers do not talk with customers. Where we worked, developers took orders from product and project people whose titles changed every year but who operated with the mindset of a drill sergeant: my way or the highway.
"While hardware folks study and learn from the successes and failures of past hardware, software folks do not." Couldn't be further from the truth. Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult (see microservices for example).
Once you've worked in both hardware and software engineering you quickly realize that they are only superficially similar. Software is fundamentally philosophy, not physics.
Hardware is constrained by real world limitations. Software isn't except in the most extreme cases. Result is that there is not a 'right' way to do any one thing that everyone can converge on. The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.
Software doesn't operate in some magical realm outside of the physical world. It very much is constrained by real world limitations. It runs on the hardware that itself is limited. I wonder if some failures are a result of thinking it doesn't have these limitations?
> Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult
Seems more accurate to say they are obsessed with copying "what sounds good". The software industry doesn't seem to copy what works, rather what sounds like it'd work, or what sounds cool.
If they copied what works software would just be faster by default, because very often big established tools are replaced by something that offers similar featurage, but offers it at a higher FPS.
I disagree. At least at the RTL level they're very similar. You don't really deal with physics there, except for timing (which is fairly analogous with software performance things like hard real-time constraints).
> Result is that there is not a 'right' way to do any one thing that everyone can converge on.
Are you trying to say there is in hardware? That must be why we have exactly one branch predictor design, lol
> The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.
"The first function call looks a whole lot like a function call today..."
What you and the GP said are not mutually exclusive. Software engineers are quick to drink every new Kool-Aid out there, which is exactly why we’re so damned blind to history and lessons learned before.
In my experience, a lot of the time the people who COULD be solving these issues are people who used to code or never have. The actual engineers who might do something like this aren't given authority or scope and you have MBAs or scrum masters in the way of actually solving problems.
I think this is too simple. First of all, hardware people have high incentive to fully replace components and systems for many reasons. Replacement is also the only way they can fix major design mistakes.
Software people constantly do fix bugs and design mistakes. There is certainly no strong culture of documenting or digging up past mistakes, but it's not like they don't learn from mistakes; it's just a constant process. In contrast to hardware, there is usually no point in time at which to retrospect. The incentives to rejuvenate systems are low, and doing so often seems expensive when considered. Software engineers' own motivation is often ill-founded: new devs feeling uncomfortable with the existing system and calling for something "modern". But when the time comes to replace the "legacy" system, then you are right, no one looks back at the former mistakes, and the devs that know them are probably long gone.
The question is whether we should ever replace a software system, or focus more on gradual and active modernization. But the latter can be very hard: in hardware everything is defined, most of the time backed by standards; in software we usually don't have that, so complex interconnected systems rarely have sane upgrade paths.
I know a lot of people on here will disagree with me saying this, but this is exactly how you get an ecosystem like JavaScript being as fragmented, insecure, and "trend prone" as the old-school WordPress days. It's the same problems over and over, and every new "generation" of programmers has to relearn the lessons of old.
The difficulty lies in the fact that in software it is quite cheap to generate very complex designs, compared to hardware. For software designs treated similarly to hardware (such as in medical devices or at NASA), you do gain back those benefits, at great expense.
Most of the time, there's no need to study anything. Any experienced software engineer can tell you about a project they worked on with no real requirements, management constantly changing their mind, etc.
How do you study software history? Most of the lessons seem forever locked away behind corporate walls - any honest assessments made public will either end careers or start lawsuits
IME, "Why systems fail" almost always boils down to a principal-agent problem. This is another way of expressing the Mungerism "show me the incentive, I'll show you the outcome".
Systems that "work" tend to have some way of correcting for or mitigating the principal agent problem by aligning incentives.
I'd also point out that hardware is a much older discipline, in terms of how long it's been operating at scale. It's had more time to formalize and crystallize. Intel is 56 years old. Google is 27.
Some consequences of NOT learning from prior successes and failures: (a) no more training for the next generation of developers/engineers; (b) fighting for the best developers, which manifests in leetcode grinding; (c) decreased cooperation among teammates; etc.
This is an interesting distinction, but it ignores the reasons software engineers do that.
First, hardware engineers are dealing with the same laws of physics every time. Materials have known properties etc.
Software: there are few laws of physics (mostly performance and asymptotic complexity). Most software isn't anywhere near those boundaries, so you get to pretend they don't exist. If you get to invent your own physics each time, yeah, the process is going to look very different.
For most generations of hardware, you’re correct, but not all. For example, high-k was invented to mitigate tunneling. Sometimes, as geometries shrink, the physics involved does change.
I think there is a ton more nuance, but it can still be explained by a simple observation, which TFA hints at: "It's the economics, stupid."
Engineering is the intersection of applied sciences, economics and business. The economics aspect is almost never recognized and explains many things. Projects of other disciplines have significantly higher costs and risks, which is why they require a lot more rigor. Taking hardware as an example, one bad design decision can sink the entire company.
On the other hand, software has economics that span a much more diverse range than any other field. Consider:
- The capital costs are extremely low.
- Development can be extremely fast at the task level.
- Software, once produced, can be scaled almost limitlessly for very cheap almost instantly.
- The technology moves extremely fast. Most other engineering disciplines have not fundamentally changed in decades.
- The technology is infinitely flexible. Software for one thing can very easily be extended for an adjacent business need.
- The risks are often very low, but can be very high at the upper end. The rigor applied scales accordingly. Your LoB CRUD app going down might bother a handful of people, so who cares about tests? But your flight control software better be (and is) tested to hell and back.
- Projects vary drastically in stacks, scopes and risk profiles, but the talent pool is more or less common. This makes engineering culture absolutely critical because hiring is such a crapshoot.
- Extreme flexibility also masks the fact that complexity compounds very quickly. Abstractions enable elegant higher-level designs, but they mask internal details that almost always leak and introduce minor issues that cause compounding complexity.
- The business rules that software automates are extremely messy to begin with (80K payroll rules!) However, the combination of a) flexibility, b) speed, and c) scalability engender a false sense of confidence. Often no attempt is made at all to simplify business requirements, which is probably where the biggest wins hide. This is also what enables requirements to shift all the time, a prime cause for failures.
Worse, technical and business complexity can compound. E.g. it's very easy to think "80K payroll rules linearly means O(80K) software modules" and not "wait, maybe those 80K payroll rules interact with each other, so it's probably super-linear growth in complexity." Your architecture is then oriented towards the simplistic view, and needs hacks when business reality inevitably hits, which then start compounding complexity in the codebase.
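To put a toy number on "super-linear" (a back-of-the-envelope sketch, not an analysis; Python only for illustration):

    # Toy illustration: if the 80K rules were independent, verification effort
    # would grow linearly; if rules can interact pairwise, the space you have
    # to reason about grows quadratically. Numbers purely illustrative.
    n_rules = 80_000

    independent_cases = n_rules                    # one case per isolated rule
    pairwise_cases = n_rules * (n_rules - 1) // 2  # every pair that might interact

    print(f"{independent_cases:,} isolated cases")      # 80,000
    print(f"{pairwise_cases:,} pairwise interactions")  # 3,199,960,000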
And of course, if that's a contract up for bidding, your bid is going to be unsustainably low, which will be further depressed by the competitive bidding process.
If the true costs of a project -- which include human costs to the end users -- are not correctly evaluated, the design and rigor applied will be correspondingly out of whack.
As such I think most failures, in addition to regular old human issues like corruption, can be attributed to an insufficient appreciation of the economics involved, driven primarily by overindexing on the powers of software without an appreciation of the pitfalls.
As someone who's learning programming right now, do you have any suggestions on how one would go about finding and studying these successes and failures?
First, failures aren’t always obvious, and second, studying them isn’t either. This would likely need to be a formalized course. Still…
If people want to know why Microsoft hated DOS and wanted to kill it with Xenix, then OS/2, then Windows, and then NT, it would be vital to know that DOS only came about as a result of IBM wanting a 16-bit source-compatible CP/M, which didn't yet exist. Then, you would likely want to read Dissecting DOS to see what limits were imposed by DOS.
For other stuff, you would start backwards. Take the finished product and ask what the requirements were, then ask what the pain points are, then start digging through the source and flowcharting/mapping it. This part is a must because programs are often too difficult to really grok without some kind of map/chart.
There is likely an entire discipline to be created in this…
The things people are talking about in this thread are less to do with the practice of programming, and more to do with the difficulties of managing (and being managed, within) an engineering organization.
You'll learn all of this for yourself, on the job, just via experience.
To be cynical, what's the point? You'll get employed and forced to be a part of them by circumstances.
Your company's root priorities are probably at odds with writing good software.
One Japanese company, not going to name names, kept trying to treat software as a depreciating asset. I never fully understood the details, but the long and short of it was that fixing things that were supposed to be "done" was bad for the accounting. New things, however, were good.
How can you run a software company like that? But they did and got the kind of outcome you'd expect. Japan made the laws this way and gets software to match.
I think this is a downstream effect of there being no real regulation or professional designations in software. Every company and team is wildly different, so there are no standards and no barriers restricting demands on your time, which leaves no time for anything but crunching; nobody spends time doing much besides shipping constantly.
Software just feels so much more ephemeral than hardware. I haven't yet met a single 'old software enthusiast' in my life, yet there are so many enthusiasts for older hardware.
I am both a hardware and software enthusiast. Tons of DOS, Windows, and OS/2 software hanging around. While I don't use them every day, I do use them. From pre-Microsoft Visio to WordStar and MS Works for DOS, the applications are simple, powerful, and pleasing to use. While I don't recommend anyone pull out an 8-bit Zenith and fire up COBOL-80 or LISP-80, they are interesting. Testing yourself in 64K is quite a challenge.
The retro community is huge and varied. If it exists, someone is really into it.
I have a pet passion for an old simulation language called Dynamo.
I think you will find people passionate about LISP and people that care about COBOL, and C is already multiple decades old.
... are you saying that hardware projects fail less than software ones? Bridge construction alone fails with regularity all over the world. Every chip ships with a list of errata longer than my arm.
Software folks treat their output as if it's their baby or their art projects.
Hardware folks just follow best practices and physics.
They're different problem spaces though, and having done both I think HW is much simpler and easier to get right. SW is often similar if you're working on a driver or some low-level piece of code. I tried to stay in systems software throughout my career for this reason. I like doing things 'right' and don't have much need to prove to anyone how clever I am.
I've met many SW folks who insist on thinking of themselves as rock stars. I don't think I've ever met a HW engineer with that attitude.
Having consulted on government projects - especially huge projects spanning dozens of government departments, what I have learnt is that the project is doomed right from the start. The specifications are written in such a way that it is impossible to get a working software which can address all of the millions (yes, literally) of specifications.
For instance, I had the opportunity to review an RFP put out by a state government for software to run the entire state government. The specifications stated that a SINGLE piece of software should be used for running the full administration of all of the departments of the government - including completely disparate things such as HR, CCTV management, AI-enabled monitoring of rodents and other animals near/at warehouses, all healthcare facilities, recruitment, emergency response services, etc...
ONE SOFTWARE for ALL of these!
There isn't a single company in the world that can write software to monitor rodents, book hospital appointments, run general payroll, etc. And since the integration required was so deep, it would be impossible to use existing best-of-breed software... everything would have to be written from scratch.
How is such a software project ever going to succeed?
This touches on the absolutely vital issue of domain knowledge. Everybody understands that you're not supposed to have the same people handle sewer maintenance and preschool teaching because these are two entirely separate skillsets. To an extent you can also treat kindergartens and treatment plants as black boxes that consume money and produce desired services.
For people who don't know much about programs it's sort of natural to assume that software engineering works the same way. Put in money and specs, get back programs. But of course it doesn't work like that, because software dev is not a single skillset. To write useful programs, you have to know how to code and understand the environment in which the program will be used.
But can this software monitor patients via CCTV and see if any of them are about to faint and call ER proactively for them? No? then your bid for the project will be discarded! :)
What about the CCTV monitoring software needing to verify whether there are women in a particular room and trigger an alarm when too many men enter the area? I am not kidding, this was really in the spec!
To be fair, that’s a rare exception. Most government tenders are quite narrow in scope.
What I have found is that they’re written by people with zero knowledge of either the solution requirements or the technology! Combine that with zero profit motive and zero personal consequences, and you can end up with total nonsense even on projects with billion dollar budgets.
A state school department here put out a tender for wiring over two thousand schools with fibre, but the way the contract was stipulated, only a single applicant could win it and must handle every single location across a thousand miles of territory. Hence, only the largest incumbent telco could possibly win… which they did… at 15x the cost of a bunch of local contractors doing the work. This cost something like a billion dollars to taxpayers.
The excuse of the guy writing the tender was “it’s easier for me to get one contract signed than fifty.”
He’s a public servant getting paid $50K. He’s got nothing else on, no other pressing needs or distractions, but he’s too busy, you see? So much easier to waste a billion dollars to save himself a few months of effort.
On some of the infamous large public IT project failures, you just have to look at who gets the contract, how they work, and what their incentives are. (For example, don't hire management consulting partner smooth talkers, and their fleet of low-skilled seat-warmers, to do performative hours billing.)
It's also hard when the team actually cares, but there are skills you can learn. Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).
But once you're a bit comfortable with the art and science of those, big new challenges are more about political and environment reality. It comes down to alignment and competence of: workers, internal team leadership, partners/vendors, customers, and investors/execs.
Discussing this is a little awkward, but maybe start with alignment, since most of the competence challenges are rooted in mis-alignments: never developing nor selecting for the skills that alignment would require.
> Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).
Was there any literature or other findings that you came across that ended up clicking and working for you that you can recommend to us?
I could blather for hours around this space. A few random highlights:
* The very first thing I read about requirements was Weinberg, and it's still worth reading. (Even if you are a contracting house, with a hopeless client, and you want to go full reactive scrum participatory design, to unblock you for sprints with big blocks of billable hours, not caring how much unnecessary work you do... at least you will know what you're not doing.)
* When interviewing people about business or technical, learn to use a Data Flow Diagram. You can make it accessible to almost everyone, as you talk through it, and answer all sorts of questions, at a variety of levels. There are a bunch of other system modeling tools you can use, at times, but do not underestimate the usefulness and accessibility of a good DFD.
* If you can (or have to) plan at all, find and learn to use a serious Gantt-chart-centric planning tool (work breakdown, dependencies, resource allocations, milestones), and keep it up to date (which probably includes having it linked with whatever task-tracking tool you use, but you'll usually also be changing it for bigger-picture reasons too). Even if you are a hardware company, with some hard external-dependency milestones, you will be changing things around those unmoveables. And have everyone work from the same source of truth (everyone can see the same Gantt chart and the tasks).
* Also learn some kind of Kanban-ish board for tasking, and have it be an alternative view on the same data that's behind the Gantt view and the tasks/issues that people can/should/are working on at the moment, and anything immediately getting blocked.
* In a rare disruptive startup emergency, know when to put aside Gantt, and fall back to an ad hoc text file or spreadsheet of chaos-handling prioritization that's changing multiple times per day. (But don't say that your startup is always in emergency mode and you can never plan anything, because usually there is time for a Kanban board, and usually you should all share an understanding of how those tasks fit into a larger plan, and trace back to your goals, even if it's exploratory or reactive.)
* Culture of communicating and documenting, in low-friction, high-value, accessible ways. Respect it as team-oriented and professional.
* Avoid routine meetings; make it easy to get timely answers and discussion, as soon as possible. This includes reconsidering how accessible upper leadership should be: can you get closer to being responsive to the needs of the work on the project (e.g., if anyone needs a decision from the director/VP/etc., then quickly prep and ask, maybe with an async message, but don't wait for weekly status meeting or to schedule time on their calendar).
* Avoid unnecessary process. Avoid performances.
* People need blocks of time when they can get flow. Sometimes for plowing through a big chunk of stuff that only requires basic competence, and sometimes when harder thinking is required.
* Be very careful with individual performance metrics. Ideally you can incentivize everyone to be aligned towards team success, through monetary incentives (e.g., real equity whose value they can actually affect) and through culture (everyone around you seems to work as a team, and you like that, and that inspires you). I would even start by asking whether we can compensate everyone equally, shun titles, etc., and how close we can practically get to that.
* Be honest about resume-driven development. It doesn't have to be a secret misalignment. Don't let it be motivated solely as a secret goal of job-hoppers that is then lied about, or it will probably be to the detriment of your company (and also, that person will job-hop, fleeing the mess they made). If you're going to use a new resume-keyword framework for a project, the whole team should be honest that, say, there are elements of wanting to potentially get some win from it, wanting to trial it for possible greater use and to build up organizational expertise in it, and also that it's a very conscious and honest perk for the workers to get to use the new toy.
* Infosec is an unholy dumpster fire, throughout almost the entire field. Decide if you want to do better, and if so, then back it up with real changes, not CYA theatre and what someone is trying to sell you.
* LeetCode frat pledging interviews select for so much misaligned thinking, and also signals that you are probably just more of the same as the low bar of our field, and people shouldn't take you seriously when you try to tell them you want to do things better.
* Nothing will work well if people aren't aligned and honest.
Most of the examples here are big government IT projects. But it's unfair to single out software projects here. There are a lot of big government projects that fail or face long and expensive delays. A lot of public sector spending is like that. In fact, you'd be hard pressed to find examples where everything worked on time and on budget.
Mostly the issues are non-technical and grounded in a lack of accountability and being too big to fail. A lot of these failures happen top down: unrealistic expectations, hand-wavy leadership, and then that gets translated into action. Once these big projects get going and are burning big budgets and it's obvious that they aren't working, people get very creative at finding ways to tap into those budgets.
Here in Germany, the airport in Berlin was opened only a few years ago after being stuck in limbo a decade after it was supposed to open and the opening was cancelled only 2 weeks before it was supposed to happen. It was hilarious, they had signs all over town announcing how they were going to shut down the highway so the interior of the old airport could be transported to the new one. I kid you not. They were going to move all the check-in counters and other stuff over and then bang on it for a day or two and then open the airport. Politicians, project leadership, etc. kept insisting it was all fine right up until the moment they could not possibly ignore the fact that there was lots wrong with the airport and that it wasn't going to open. It then took a decade to fix all that. There's a railway station in Stuttgart that is at this point very late in opening. Nuclear plant projects tend to be very late and over budget too.
Government IT projects aren't that different than these. It's a very similar dynamic. Big budgets, decision making is highly political, a lack of accountability, lots of top down pretending it's going to be fine, big budgets and companies looking to tap into those, and a lot of wishful thinking. These are all common ingredients in big project failures.
The software methodology is the least of the challenges these projects face.
There's an obvious inherent selection bias with government projects because they're by nature subject to public scrutiny, plus the stakeholders are literally everyone. Private companies can waste billions internally and it'll never make it into the news.
In my first big job in a big legacy company, 30% of ongoing effort was "how to implement this feature which needs a database without a database".
We also paid some security company to use it as a proxy in front of our server to implement some server redirects because it was simpler than configuring our own servers. Simple one-liner conf changes were a week of emails with support staff.
I think if we look at the lack of accountability it's obvious that one major problem is that many of these projects do heavily rely on contract work.
No company or government in the world can supply the perfect brain- and manpower necessary on day one (on a huge and complex project that requires expert knowledge). So there is a prevalent delusion that talent just spawns at project kickoff and that those people even care about what they do.
Maybe this is some artifact we carried over from the industrial era. We expect that complex machinery is built by expert companies overnight and just works, with a bit of maintenance and knowledge transfer. But software doesn't work like that.
Fundamentally this is not a statement about programming or software. It is a statement that management at almost all companies is abysmally inept and hardly ever held to account.
Most sizeable software projects require understanding, in detail, what is needed by the business, what is essential and what is not, and whether any of that is changing over the lifetime of the project. I don't think I've ever been on a project where any of that was known, it was all guess work.
Management is always a huge problem, but software engineers left to their own devices can be just as bad.
I very rarely hear actual technical reasons for why a decision was made. They're almost always invented after the fact to retroactively justify some tool or design pattern the developer wanted to use. Capabilities and features get tacked on just because it's something someone wanted to do, not because they solve an actual problem or can be traced back to requirements in any meaningful way.
Frankly as an industry we could learn a lot from other engineering fields, aerospace and electrical engineering in particular. They aren't perfect, but in general they're much better at keeping technical decisions tied to requirements. Their processes tend to be too slow for our industry of course, but that doesn't mean there aren't lessons to be learned.
Exactly this. It's not just large software projects that tend to fail often; large architectural and infrastructure projects do too.
There are loads of examples, one famous one for instance is the Berlin Airport.
Management is bad at managing large projects. Whatever those projects are. In particular when third parties are involved that have a financial interest.
This is precisely the point of the article. I mean, it's right there at the top in that weird arrow-shaped infographic. It's _almost_ always a management issue.
Software projects fail because humans fail. Humans are the drivers of everything in our world. All government, business, culture, etc... it's all just humans. You can have a perfect "process" or "tool" to do a thing, but if the human using it sucks, the result will suck. This means that the people involved are what determines if the thing will succeed or fail. So you have to have the best people, with the best motivations, to have a chance for success.
The only thing that seems to change this is consequences. Take a random person and just ask them to do something, and whether they do it or not is just based on what they personally want. But when there's a law that tells them to do it, and enforcement of consequences if they don't, suddenly that random person is doing what they're supposed to. A motivation to do the right thing. It's still not a guarantee, but more often than not they'll work to avoid the consequences.
Therefore if you want software projects to stop failing, create laws that enforce doing the things in the project that ensure it succeeds. Create consequences big enough that people will actually do what's necessary. Like a law that says how to build a thing to ensure it works, and how to test it, and then an independent inspection to ensure it was done right. Do that throughout the process, and impose some kind of consequence if those things aren't done. (The more responsibility, the bigger the consequence, so there's motivation commensurate with impact.)
That's how we manage other large-scale physical projects. Of course those aren't guaranteed to work; large-scale public works projects often go over-budget and over-time. But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process to encourage humans to do the right thing.
> Software projects fail because humans fail. Humans are the drivers of everything in our world.
Ah finally - I've had to scroll halfway down to find a key reason big software projects fail.
<rant>
I started programming in 1990 with PL/1 on IBM mainframes and for 35 years have dipped in and out of the software world. Every project I've seen fail was mainly down to people - egos, clashes, laziness, disinterest, inability to interact with end users, rudeness, lack of motivation, toxic team culture etc etc. It was rarely (never?) a major technical hurdle that scuppered a project. It was people and personalities, clashes and confusion.
</rant>
Of course the converse is also true - big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect. Of course, most of these projects were bland corporate business data ones... so not technically very challenging. But still big enough software projects.
Geez... don't know why I'm getting so emotional (!) But the hard-core software engineering world is all about people at the end of the day.
> big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect.
I completely agree. I would just like to add that this only works where the inspired leaders are properly incentivized!
> But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process
If there was sufficient consequence for this stuff, no one would ever take on any risk. No large works would ever even be started because it would be either impossible or incredibly difficult to be completely sure everything will go to plan.
So instead we take a medium amount of caution and take on projects knowing it's possible for them to not work out or to go over budget.
If software engineers want to be referred to as "engineers" then they should actually learn about engineering failures. The industry and educational pipeline (formal and informal) as a whole is far more invested in butterfly chasing. It's immature in the sense that many people with decades of experience are unwilling to adopt many proven practices in large scale engineering projects because they "get in the way" and because they hold them accountable.
Surely you mean managers, right? Most developers I interact with would love to do things the right way, but there's just no time, we have to chase this week's priority!
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not.
I guess that’s the real problem I have with SV’s endemic ageism.
I was personally offended, when I encountered it, myself, but that’s long past.
I just find it offensive, that experience is ignored, or even shunned.
I started in hardware, and we all had a reverence for our legacy. It did not prevent us from pursuing new/shiny, but we never ignored the lessons of the past.
Why do you find it offensive? It’s not personal. Someone who thought webvan was a great lesson in hubris could not have built an Instacart, right? Even evolution shuns experience, all but throwing most of it out each generation, with a scant few species as exceptions.
> Someone who thought webvan was a great lesson in hubris could not have built an Instacart, right?
Not at all. The mistake to learn from in Webvan's case was expanding too quickly and investing in expensive infrastructure all before achieving product-market fit. Not that they delivered groceries.
I think you're mistaking the funding and starting of companies with the execution of their vision through software engineering -- the entire point of the article, and the OP.
This is a classic straw man argument, which depends on the assumption that all people of a certain age would think a certain way.
Also, your understanding of evolution is incorrect. All species on Earth are the results of an enormous amount of accumulated "experience", over periods of up to billions of years. Even the bacteria we have today took hundreds of millions of years to reach anything similar to their current form.
I have never seen an industry that works so hard to self-immolate.
right now the industry is spending billions / trillions of dollars to train AI on badly written open source code.
in universities we teach kids DSA, but never how to really think about scoping work, nor even the Unix principle of how software should compose, how to prevent errors, etc. hell, how many working professionals know about the 10 NASA principles and actually use them in practice?
we encourage the bright kids to go work at places which are the cathedrals of complexity, never to seek simple solutions. and again, the merchants of complexity get paid more, find it easier to find jobs, etc.
the tragedy is that the failures are documented, and so are the fixes.
soon enough we're gonna raise a whole generation who doesn't know how to make reliable, robust software from scratch because of 'vibecoding'. then ultimately civilization collapses.
I think GP means NASA's 10 rules, but these are specifically for C development in mission-critical systems. There's been the odd software issue at NASA, but supposedly their 10 rules should make C projects much safer.
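For the curious, here's a rough transliteration of the spirit of two of those rules (fixed loop bounds, liberal assertions) into Python. The originals target C in mission-critical systems, so treat this purely as an illustration:

    MAX_ITERS = 1_000  # every loop gets a statically known upper bound

    def bisect_root(f, lo, hi, tol=1e-9):
        """Bisection with a hard iteration cap and defensive assertions."""
        assert lo < hi, "invalid bracket"                 # assert preconditions
        assert f(lo) * f(hi) <= 0, "no sign change in bracket"
        for _ in range(MAX_ITERS):                        # provably terminates
            mid = (lo + hi) / 2.0
            if hi - lo < tol:
                return mid
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        raise RuntimeError("iteration bound exceeded")    # fail loudly, never hang

    # e.g. bisect_root(lambda x: x * x - 2, 0.0, 2.0) -> ~1.414213562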
So, I'm not a dev nor a project manager, but I found this article very enlightening. At the risk of asking a stupid question and getting an RTFM or an LMGTFY: can anyone provide any simple and practical examples of software successes at a big scale? I work at a hospital, so healthcare-specific would be ideal, but I'll take anything.
FWIW I have read The Phoenix Project and it did help me get a better understanding of "Agile" and the DevOps mindset but since it's not something I apply in my work routinely it's hard to keep it fresh.
My goal is to try and plant seeds of success in the small projects I work on, and eventually ask questions to get people to think from a similar perspective.
Even though there were some benefits to the modularity of Multics (apparently you could unload and replace hardware in Multics servers without reboot, which was unheard of at the time), it was also its downfall. Multics was eventually deemed over-engineered and too difficult to work with. It couldn't evolve fast enough with the changing technological landscape. Bell Labs' conclusion after the project was shelved was that OSs were too costly and too difficult to design. They told engineers that no one should work on OSs.
Ken Thompson wanted a modern OS so he disregarded these instructions. He used some of the expertise he gained while working on Multics and wrote Unix for himself (in three weeks, in assembly). People started looking over Thompson's shoulder being like "Hey what OS are you using there, can I get a copy?" and the rest is history.
Brian Kernighan described Unix as "one of" whatever Multics was "multiple of". Linux eventually adopted a similar architecture.
This is a noble and ambitious goal. I feel qualified to provide some pointers, not because I have been instrumental in delivering hugely successful projects, but because I have been involved, in various ways, in many, many failed projects. Take what you will from that :-)
- Define "success" early on. This usually doesn't mean meeting a deadline on time and budget. That is actually the start of the real goal. The real success should be determined months or years later, once the software and processes have been used in a production business environment.
- Pay attention to Conway's Law. Fight it at your peril.
- Beware of the risk of key people. This means if there is a single person who knows everything, you have a risk if they leave or get sick. Redundancy needs to be built into the team, not just the hardware/architecture.
- No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.
- Be prepared to say "no", a lot. (This is probably the most important one, and the hardest.)
- Define ownership early. If no one is clearly responsible for the key deliverables, you are doomed.
- Consider the human aspect as much as the technical. People don't like change. You will be introducing a lot of change. Balancing this needs to be built into the project at every stage.
- Plan for the worst, hope for the best. Don't assume things will work the way you want them to. Test _everything_, always.
>No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.
As a Californian, I hate this mentality so much. Why can't we just have a smooth release with minimal drama because we planned well? Maybe we could properly fix some tech debt or even polish up some features if we're not spending the last 2 months crunching on some showstopper that was pointed out a year ago.
I find it kind of hard to define success or failure. Google search and Facebook are a success right? And they were able to scale up as needed, which can be hard. But the way they started is very different from a government agency or massive corporation trying to orchestrate it from scratch. I don't know if you'd be familiar with this, but maybe healthcare.gov is a good example... it was notoriously buggy, but after some time and a lot of intense pressure it was dealt with.
The untold story is of landing software projects at Google. Google has landed countless software projects internally in order for Google.com to continue working, and the story of those will never reach the light of day, except in back-room conversations never to be shared publicly. How did they go from internal platform product version one to version two? It's an amazing feat of engineering that can't be shown to the public, which is a loss for humanity, honestly, but capitalism isn't going to have it any other way.
I don't think you should focus on successful large projects. Generally you should consider that all big successes are outliers from a myriad of attempts. They have been lucky and you can't reproduce luck.
I'd like to try to correct your course a bit.
DevOps is a trash concept that had good intentions. But today it's just an industry cheatcode to fill three dev positions with a single one that is on pager duty. The good takeaway from it: make people care that things work end to end. If Joe doesn't care about Bob's problems, something is off. Either with the process, or with the people.
Agile is a very loose term nowadays. Broadly speaking, it's the opposite of making big up-front plans and implementing them in one big swipe. Agile wants to start small and improve iteratively as needed. This tends to work in the industry, but the fixed time buckets have issues: some teams can move fast in 2-week cycles, others can't.

The original agile movement also wanted to give control and autonomy back to those who actually do stuff (devs and lower management). This is very well intended and highly functional, but is often buried or ignored in corporate environments. Autonomy is extremely valuable: it motivates people and fosters personal growth, while being backed by skilled peers also creates psychological safety.

One of the major complaints I hear about agile practices is that there are too many routines, meetings, and other in-person tasks with low value that keep you from working. This is really bad and in my perception was never intended, but companies love that shit. This part is about communication: make it easy for people to share and engage, while also keeping their focus hours high. Problems have to bubble up quickly, and everyone should be motivated and able to help solve them.

If you listen to agile hardliners, they will also tell you that software can't be reliably planned and you won't make deadlines, none of them, ever. That is very true, but companies are unable to deal with it.
I don't like this as a metric of success, because who came up with the budget in the first place?
If they did a good job and you're still 97% over then sure, not successful.
But if the initial budget was a dream with no basis in reality then 97% over budget may simply have been "the cost of doing business".
It's easier to say what the budget could be when you're doing something that has already been done a dozen times (as skyscraper construction used to be for New York City). It's harder when the effort is novel, as is often the case for software projects since even "do an ERP project for this organization" can be wildly different in terms of requirements and constraints.
That's why the other comment about big projects ideally being evolutions of small projects is so important. It's nearly impossible to accurately forecast a budget for something where even the basic user needs aren't yet understood, so the best way to bound the amount of budget/cost mismatch is to bound the size of the initial effort.
Also, anything that T. Capers Jones wrote. The most comprehensive one of these books is this:
Estimating Software Costs: Bringing Realism to Estimating (hardcover, ISBN-13: 978-0071483001)
Many believe the official recognition of the crisis in developing software was the two NATO conferences in 1968 and 1969.
See the Wikipedia article on the History of Software Engineering.
There have been two small-scale experimental comparisons of the formal waterfall model (requirements, design, code, test) and the more prototyping-oriented, agile method. They seem to have the same productivity in terms of lines per programmer-month, but the formal method tends to produce larger software.
I've started calling it EDD - Executive Driven Development.
Senior stakeholders get into a room, decide they need xyz for the project to really succeed, and push this down to managers, who in turn try to perform miracles with what little development resource they have. Very often they also have an offshore team which is only concerned with prolonging the contract as much as possible, rather than delivering. 2 weeks later, senior stakeholders get back into the room...
Oh they TRY to ... it's just that the "non-technical unqualified people" get brought to heel (usually) by regulations. I've been in the room where people have tried to force a decision and a PEng, lawyer, or CA/CPA had to say "absolutely not". It happens all the time, which is why you NEED regulations.
"Worse" won't even start to describe the economical crisis we will be in once the bubble bursts.
And although that, in itself, should be scary enough, it is nothing compared to the political tsunami and unrest it will bring in its wake.
Most of the Western world is already on shaky political ground, flirting with the extreme-right. The US is even worse, with a pathologically incompetent administration of sociopaths, fully incapable of coming up with the measures necessary to slow down the train of doom careening out of control towards the proverbial cliff of societal collapse.
If the societal tensions are already close to breaking point now, in a period of relative economic prosperity, I cannot begin to imagine what they will be like once the next financial crash hits. Especially one in the multiple trillions of dollars.
They say that humanity progresses through episodes of turmoil and crisis. Now that we literally have all the knowledge of the world at our fingertips, maybe it is time to progress past this inadequate primeval advancement mechanism, and to truly enter an enlightened age where progress is made from understanding, instead of crises.
Unfortunately, it looks like it's going to take monumental changes to stop the parasites and the sociopaths from making at quick buck at the expense of humanity.
Not really, by most indications AI seems to be an amplifier more than anything else. If you have strong discipline and quality control processes it amplifies your throughput, but if you don't, it amplifies your problems. (E.g. see the DORA 2025 report.)
So basically things will still go where they were always going to go, just a lot faster. That's not necessarily a bad thing.
The failed UK NHS IT project, known as the National Programme for IT (NPfIT), cost the UK government over £10 billion and produced almost nothing of value. I'm surprised that didn't get a mention.
Again those bastards, Fujitsu, were involved. They even sued the UK government and won a £465 million settlement when their contract was cancelled. But, despite this and their complicity in covering up the failures of the Horizon Post Office system, the UK government is still giving them fat contracts.
If senior managers can preside over a massive failure and walk away with a huge pension, there isn't much incentive for them to do better, is there?
The GOV.UK project is a rare success story for IT in the UK government. They need to take that team, give them big incentives to stay, and give them more projects. Why are we outsourcing to international companies that don’t give a shit when we have a wealth of talent at home? Why aren’t we investing in our own people?
The people who did the UK COVID app also did a good job, as far as I am aware. The lessons seems to be that it is better to employ a small, experienced and talented team and get out of their way, than outsource to a huge government contractor.
Despite its overall failure, some parts of the infrastructure and national applications, such as the Summary Care Record and the Electronic Prescriptions Service, are considered to have survived and continue to be used.
>Global IT spending has more than tripled in constant 2025 dollars since 2005, from US $1.7 trillion to $5.6 trillion, and continues to rise. Despite additional spending, software success rates have not markedly improved in the past two decades.
Okay, but how much more software is in use? If IT spending has tripled since 2005 but we use 10x more software, I'd say the trend is good.
Success rates imply a ratio. Constant dollars are adjusted.
Yes, there is a lot more spending overall. But nothing has improved quality-wise, even though everyone in software somehow claims to "make software better". (A claim usually made by people who don't build the software, but own it.)
The point is not that the growth of IT spending is bad. That was just to show the scale of spending.
The point of the article is that a billion spent on software could well lead to a loss of a hundred billion.
The difference between success and failure of large projects comes down to technical leadership. I've seen it time and time again. Projects that are managed by external consulting companies (name brand or otherwise) have a very poor track record of delivering. An in-house technical lead that is committed to the success of the project will always do better. And yes, this technical lead must have the authority to limit the scope of the system rewrite. Endless scope creep is a recipe for failure. Outside consulting firms will never say "No" to any new request - it means more business for them - their goals are not aligned with the client.
Do non-software projects succeed at a higher rate in any industry? I get the impression that projects everywhere go over time, over budget, and frequently get canceled.
How many bridges have you used that have collapsed? How much software have you used that has been broken or otherwise not served your interests? If we built the rest of society like we build software, millions of people would be dead.
The reason bridges don't fail often is because they over-build them. There's no obvious equivalent with software. One mistake in a large code base can make it fail.
Bridges do go ridiculously over budget and schedule all the time, however.
The UK Post Office scandal would be the equivalent of the Morandi bridge collapsing - the big, catastrophic failure you hope to see only a few times in your lifetime.
But bridges collapsing is not the only failure mode for non-software projects. I know plenty of newly built houses that had serious issues with insulation, wiring, painting, heating infrastructure, etc.
Systematic decimation of test teams, elimination of test managers, and contemptuous treatment of the role of tester over the past 40 years has not yet led to a more responsible software industry. But maybe if we started burning testers at the stake all these problems will go away?
Many specialties were eliminated / absorbed over the past few decades. I started working almost 30 years ago. Today, I rarely see dedicated testers, just like I rarely see dedicated DBAs. Sysadmins went away with the "DevOps" movement. Now they are cloud engineers who are more likely to understand a vendor-specific implementation than networking fundamentals.
Except testers are needed. Testing is not merely a technical role. It's a social role. It's a commitment to think differently from the rest of the team. We do this because that provides insurance against complacency.
But, by the nature of testing, we testers are outsiders. No one is fully comfortable with a tester in the room, unless they are more afraid of failure than irritation.
> “A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
Oh, it's much more interesting than that. Phoenix started as an attempt to create a gun registry. Ottawa had a bunch of civil servants that'd be reasonably competent at overseeing such a thing, but the government decided that it wanted to build it in Miramichi, New Brunswick. The relevant people refused to move to Miramichi, so the project was built using IBM contractors and newbies. The resulting fiasco was highly predictable.
Then when Harper came in he killed the registry mostly for ideological reasons.
But then he didn't want to destroy a bunch of jobs in Miramichi, so he gave them another project to turn into a fiasco.
New Zealand tried to do a new payroll system for teachers called Novopay which imploded spectacularly and is still creating grief. The system is now called EdPay (the government decided to take over the privately created system). The total cost of the debacle was north of $200M NZD. Somehow they managed to fail to replace a working system!
Software development, like most other things, is part of the same make-believe market that we run our societies in, in most countries around the world. Let's face it: most of the big money in software is belief money, not the actual proven value of a thing. The word "valuation" sort of already tells us this. It's not fact-checking ("How much did they sell?" or "How many users bought access or a license?"); it is "How much do we believe in the future of this thing?" and risky investment ("How much could we make if this thing takes off?").
For software, I am not sure this is helpful. Maybe we would develop way less trash software, if it was different. But then again we would probably still develop engagement farming software, because people would still use or buy that.
As someone who has seen technological solutions applied where they make no sense, I think the next revolution in business processes will be de-computerization. The trend has probably already started, thanks to one of the major cloud outages.
> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions.
Somehow I come away skeptical of the inevitable conclusion that Phoenix was doomed to fail and instead that perhaps they were hamstrung by architecture constraints dictated by assholes.
Arbitrary payroll is absurdly complicated. The trick is to not make it arbitrary: have a limited set of things you do, and always have backdoors for manually pushing data through payroll.
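A minimal sketch of that shape (all names hypothetical, Python only for illustration): a small, closed set of rule types the engine handles, plus an audited manual-adjustment line for everything the rules can't express.

    from dataclasses import dataclass

    # The engine only knows a small, closed set of pay components.
    RULE_TYPES = {"base_salary", "overtime", "manual_adjustment"}

    @dataclass
    class PayLine:
        employee_id: str
        rule_type: str
        amount_cents: int
        note: str = ""

    def compute_pay(employee_id, hours, rate_cents):
        """Handle only the cases the system explicitly supports."""
        lines = [PayLine(employee_id, "base_salary", min(hours, 40) * rate_cents)]
        if hours > 40:  # time-and-a-half past 40 hours
            lines.append(PayLine(employee_id, "overtime",
                                 (hours - 40) * rate_cents * 3 // 2))
        assert all(l.rule_type in RULE_TYPES for l in lines)
        return lines

    def manual_adjustment(employee_id, amount_cents, note):
        """The backdoor: anything the rules can't express goes through as an
        explained, audited manual line instead of a new special case."""
        assert note, "manual adjustments must be explained"
        return PayLine(employee_id, "manual_adjustment", amount_cents, note)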
Sounds like they put zero effort into simplifying those rules the first time around.
Now, in the new project, they've put together a committee to attempt it.
> The main objective of this committee also includes simplifying the pay rules for public servants, in order to reduce the complexity of the development of Phoenix's replacement. This complexity of the current pay rules is a result of "negotiated rules for pay and benefits over 60 years that are specific to each of over 80 occupational groups in the public service." making it difficult to develop a single solution which can handle each occupational groups specific needs.
Because you don’t just rewrite all your payroll systems with hundreds of variations in one go. That will never work. But they keep trying it.
You update the system for one small piece, while reconciling with the larger system. Then replace other pieces over time, broadening your scope until you have improved the entire system. There is no other way to succeed without massive pain.
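One concrete form of "reconciling with the larger system" is a shadow run: feed identical inputs to the old and new implementations and diff the outputs before the new piece is ever trusted. A rough sketch, with hypothetical interfaces and numeric (cents) outputs assumed:

    def shadow_run(records, legacy_fn, new_fn, tolerance_cents=0):
        """Run the replacement alongside the incumbent; the legacy output stays
        the system of record until the mismatch list is empty."""
        mismatches = []
        for record in records:
            expected = legacy_fn(record)  # still authoritative
            actual = new_fn(record)       # candidate replacement
            if abs(expected - actual) > tolerance_cents:
                mismatches.append((record, expected, actual))
        return mismatches

    # Cut over one slice at a time: only when shadow_run() comes back empty
    # for a pay group does new_fn become the system of record for that group.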
> Finally, project cost-benefit justifications of software developments rarely consider the financial and emotional distress placed on end users of IT systems when something goes wrong.
Most users and most management of software projects live in denial that the norm is dystopia.
Offhand, I can't think of any required and useful feature that has been added to everyday computer usage since the early days.
Easier to swallow is that the user interface of desktop operating systems hasn’t changed fundamentally in many years, yet hardware requirements continue to grow.
But even the invention of the mouse requires excessive movement: moving a pointer to click on something that a key combination could have done much more quickly. The original intention was for the mouse to be just another input device, not necessarily the primary device directing the majority of the workflow.
From a dark storage area I may someday again get out an early Sceptre gaming monitor from the DOS days.
I held on to it throughout the 1990s precisely because it was not a plug & play monitor, and it was really good to install Windows with, so nothing would interfere with the higher-resolution alternative graphics you were going to install later.
By the 21st century it was seldom seen, but these were well-made and it still worked. However, the most obsolete feature got the most interest: the sleek aftermarket plastic accessory unit attached to the side of the monitor with those sticky 3M tacky pads that are so tenacious.
Yes, you've all seen it and remember it fondly, the mouse holder.
Kind of like a custom cup holder that fits the standard mouse perfectly, it's obviously where you keep your mouse most of the time, except for those rare occasions when you dabble in a bit of software which actually supports a mouse.
You want to keep it out of the way of your everyday desktop activities :)
In the book "How Big Things Get Done" [1], Bent Flyvbjerg, among other things, identifies one common feature of the projects that do not have large outliers to go over-budget and under-deliver: modularity. Ideally, fractal modularity. His favorite examples: solar power, electric transmission, pipelines, roads. Ironically, IT/software is only slightly better than nuclear power and Olympic games [2].
>Frustratingly, the IT community stubbornly fails to learn from prior failures.
So far I believe that there has been too much emphasis in education on coding and algorithms (small scale/tactical stuff) and not enough emphasis on the engineering side of things like version control, QA, system design, management etc. I think the situation has changed (40 years ago most professional programmers didn't even know what version control was, let alone use vc systems) but the scope of the projects has increased faster than our skills and tools.
Please forgive me if I get something wrong; I'm not a native English speaker. The article boils it down to: it's all a management failure. That is also my feeling after 35 years in software development. There is no such thing as competent middle or upper management in software development. Sometimes I see devs promoted who in an instant forget how software is made; on the other hand, I see the most clueless devs promoted. All this leads to massive mismanagement and to problems being hidden from the upper managers. Even worse, sometimes I see the best devs promoted only to watch them break, because the toxicity they get from their managers kills them.
Frederick Brooks, in his essay "No Silver Bullet" (included in the collection The Mythical Man-Month), talked about the conventions of software development and, as I recall, called for taking an iterative approach similar to what I followed for the Automunge project. I went into a little more detail in my 2019 essay of the same name:
https://medium.com/automunge/no-silver-bullet-95c77bc4bde1
This is what I've been thinking about when I talk to other people in software development who can't stop talking about how efficient they are with AI... yet they didn't ship anything in their startup or side project; or, in a corporate setting, the project is still bug-riddled, the performance is poor, and now the code quality suffers too, as people barely read what Cursor (etc.) is spitting out.
I have “magical moments” with these tools, sometimes they solve bugs and implement features in 5 minutes that I couldn’t do in a day… at the same time, quite often they are completely useless and cause you to waste time explaining things that you could probably just code yourself much faster.
I'm pretty sure that we can remove the word "software" from the article headline and it remains just as true. I don't believe that software projects are unique in this regard: big, complex projects are big and complex, and prone to unexpected issues, scope creep, etc. Throw in multiple stakeholders, ineffective management, the sunk cost fallacy etc. and it's a wonder that any large projects get finished at all.
Yup, and with an equal amount of mindblowing-units-of-money spent, infrastructure projects all around me are still failing as well, or at least being modified (read: downsized), delayed and/or budget-inflated beyond recognition.
So, what's the point here, exactly? "Only licensed engineers as codified by (local!) law are allowed to do projects?" Nah, can't be it, their track record still has too many failures, sometimes even spectacularly explosive and/or implosive ones.
"Any public project should only follow Best Practices"? Sure... "And only make The People feel good"... Incoherent!
Ehhm, so, yeah, maybe things are just complicated, and we should focus more on the amount of effort we're prepared to put in, the competency (read: pay grade) of the staff we're willing to assign, and exactly how long we're willing to wait before conceding defeat?
Large-scale systems tend to fail: large, centralised, centrally managed systems with big budgets, large numbers of people who need to coordinate, and lots of people with an interest in the project pushing and lobbying for different things.
Multiple smaller systems is usually a better approach, where possible. Not possible for things like transport infrastructure, but often possible for software.
> Not possible for things like transport infrastructure
It depends what you define as a system. Arguably a lot of transport infrastructure is a bunch of small systems linked with well-understood interfaces (e.g. everyone agrees on the gauge of rail that's going to be installed and the voltage in the wires).
Consider how construction works in practice. There are hundreds or thousands of workers working on different parts of the overall project and each of them makes small decisions as part of their work to achieve the goal. For example, the electrical wiring of a single train station is its own self-contained system. It's necessary for the station to work, but it doesn't really depend on how the electrical system is installed in the next station in the line. The electricians installing the wiring make a bunch of tiny decisions about how and where the wires are run that are beyond the ability of someone to specify centrally - but thanks to well known best practices and standards, everything works when hooked up together.
In manufacturing there are economies of scale and adding more people increases workforce, in mindfacturing there are diseconomies of scale and adding more people increases confusion, yet many managers view software with a manufacturing mindset.
Nailed it, but I fear this wisdom will be easily passed over by someone who doesn't already intuit it from years of experience. Like the Isla de Muerta: wisdom that can only be found if you already know where it is.
>For the foreseeable future, there are hard limits on what AI can bring to the table in controlling and managing the myriad intersections and trade-offs
?? Even if things stopped advancing in terms of significant model improvements, I don't think actual utility would be saturated for a while. We have barely begun to consolidate the potential into the tooling, use cases, knowledge sharing, and depth of that knowledge throughout the workforce on how to make best use of it.
If someone is looking at AI as a monolithic thing and thinking "oh, silver bullet for the problems of enterprise software etc.", then I really don't know what to say except that's on them, not on any of the genuinely big claims being pushed - unless you're breaking out the long ladders to pick those cherries, or listening to people whose background and placement within things clearly makes them a bad messenger.
Looking at other domains where companies develop complex products in highly regulated industries, there's one thing they all share in common: they invest a lot of capital in infrastructure for testing their designs. I spent years at a company trying to convince upper management to set up a lab where we could simulate a production environment, which would allow us to do a real integration test. It's a hard idea to sell, because testing is nominally part of the budget in every project, so a lack of testing couldn't be blamed for our high rate of failures (going over budget fixing bugs during commissioning). Perhaps we should stop calling unit testing "testing", so that we don't confuse people. Until we put all the pieces together and do a proper stress test under close-to-realistic production conditions, our software cannot be considered tested. I think that's the case for 99% of software companies.
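To make the distinction concrete: a unit test exercises one function in isolation, while the kind of test described above drives the assembled system under near-production load. A toy harness (all names hypothetical):

    import concurrent.futures
    import random

    def stress_test(system, n_requests=10_000, workers=50):
        """Fire concurrent, realistic traffic at the whole stack, then check
        system-level invariants that no unit test could observe."""
        def one_request(i):
            return system.process_order({"id": i, "qty": random.randint(1, 6)})

        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(one_request, range(n_requests)))

        # End-to-end invariants: nothing dropped, every outcome well-formed.
        assert len(results) == n_requests
        assert all(r["status"] in {"ok", "rejected"} for r in results)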
Plausible article, but it reads like a preschooler frustrated that his new toy is broken. "Fix it! Make it work!" - without ever specifying how.
Granted, this is an exceedingly hard problem, and I suppose there's some value in reminding ourselves of it - but I'd much rather read thoughts on how to do it better, not just complaints that we're doing it poorly.
I wonder how much software project failure comes from lacking clear processes. Many teams, whether in companies or open source projects, never define step by step procedures for common tasks like feature development, bug triage, code review, or architectural decisions. When six developers each follow their own approach, even with consistent code style, the team can’t improve the system in a predictable and systematic way. Clear procedures don’t guarantee success, but without them teams often end up with chaos and inconsistent delivery. This lack of structured methodology seems far more common in software engineering than in other engineering disciplines.
This should be a criticism of the kinds of bloated firms that take on large government projects, the kinds of people they hire, the incentives at play, the bidding processes, the corruption and all the rest. It has very little to do with software and more just organizations that don't face any pressure to deliver.
> "Why worry about something that isn’t going to happen?”
Lots to break down in this article other than this initial quotation, but I find a lot of parallels in failing software projects, this attitude, and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.
It was a combination of failures like this. Why was the captain going full speed ahead into a known ice field? Well, the boat can't sink, and there (may have been) organizational pressure to arrive in New York at a certain time (aka, an imaginary deadline must be met). Why weren't there enough lifeboats and life jackets for crew and passengers? Well, the boat can't sink anyway; why worry about something that isn't going to happen? Why train the crew on how to deploy the lifeboats and run emergency procedures properly? Same reason. Why didn't the SS Californian rescue the ship? Well, the third-party Titanic telegraph operators were under immense pressure to send telegrams to New York, and the chatter about the ice field got on their nerves, so they mostly ignored it (misaligned priorities). If even a little caution and forward thinking had been used, the death toll would have been drastically lower, if not nearly nonexistent. It took more than 2 hours to sink, which is plenty of time to evacuate a boat of that size.
Same with software projects - they often fail over a period of multiple years, and if you go back and look at how they went wrong, there are often numerous points and decisions that could have reversed course. Yet often the opposite happens - management digs in even more. Project timelines are optimistic to the point of delusion and don't build failures/setbacks into schedules or roadmaps at all. I had to rescue one of these projects several years ago and it took a toll on me I'm pretty sure I carry to this day; I'm wildly cynical of "project management" as it relates to IT/devops.
> and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.
But the rest of your comment reveals nothing novel beyond what anyone would find after watching James Cameron's movie multiple times.
I suggest you go to the original inquiries (congressional in the US, Board of trade in the UK). There is a wealth of subtle lessons there.
Hint: Look at the Admiralty Manual of Seamanship that was current at that time and their recommendations when faced with an iceberg.
Hint: Look at the Board of Trade (UK) experiments with the turning behaviour of the sister ship. In particular of interest is the engine layout of the Titanic and the attempt by the crew, inexperienced with the ship, to avoid the iceberg. This was critical to the outcome.
Hint: Look at the behaviour of Captain Rostron. Lots of lessons there.
Thanks for your feedback, I’m well aware of the inquiries and the history there. However, this post was meant to be a simple analogy that related to the broader topic, not a deep dive into the theories of how and why the titanic sank. Thanks!
The lesson from “big software projects are still failing” isn’t that we need better methodologies, better project management, or stricter controls. The lesson is "don't do big software projects".
Software is not the same as building in the physical world where we get economies of scale.
Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc.); we'll eventually reach a stable, repeatable, scalable approach to building bridges. They will very rarely (in modernity) catastrophically fail (yes, Tacoma Narrows happened, but in properly functioning societies it's rare).
Nobody will say "I want to build a bridge upside-down, out of paper clips and can withstand a 747 driving over it". Because that's physically impossible. But nothing's impossible in software.
Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics), so anything goes and anything can be in scope; and since writing and integrating large amounts of code is a communication exercise, it suffers from diseconomies of scale.
Customers want the software to do exactly what they want and - within reason - no laws of physics are violated if you move a button or implement some business process.
Because everyone wants to keep working the way they want to work, no software project (even if it sounds the same) is the same. Your company's bespoke accounting software will be different than mine, even if we are direct competitors in the same market. Our business processes are different, org structures are different, sales processes are different, etc. So they all build different accounting software, even if the fundamentals (GAAP, double-entry bookkeeping, etc.) are shared.
It's also the same reason why enterprise software sucks - do you think that a startup building expense management starts off as a giant mess of garbage? No! It starts off simple and clean and beautiful, because its initial customer base (startups) are beggars and cannot be choosers, so they adapt their process to the tool. But then larger companies come along with dissimilar requirements, and Expense Management SaaS Co. wins those deals by changing the product to work with whatever oddball requirements they have, and so on, until the product is essentially a bunch of config options and workflows that you have to build yourself.
(Interestingly, I think these products become asymptotically stuck - any feature you add or remove will make some of your customers happy and some of your customers mad, so the product can never get "better" globally).
We can have all the retrospectives and learnings we want but the goal - "Build big software" - is intractable, and as long as we keep trying to do that, we will inevitably fail. This is not a systems problem that we can fix.
The lesson is: "never build big software".
(Small software is stuff like Bezos' two pizza team w/APIs etc. - many small things make a big thing)
I agree with you on "don't do big software projects". Especially, do not scale them out fast to hundreds of people. You have to scale them more organically, ensuring that every person added is a net gain. They think that adding more people will reduce the time.
I am surprised by the lack of creativity in these projects. Why not start 5 small projects building the same thing and let them work for a year? At the end of the year you cancel one of the projects, increasing the funding of the other four. You can do that every year based on the results. It may look like waste, but it will significantly increase your chances of succeeding.
>Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc.); we'll eventually reach a stable, repeatable, scalable approach to building bridges. They will very rarely (in modernity) catastrophically fail (yes, Tacoma Narrows happened, but in properly functioning societies it's rare).
Build 1,000 JSON parsers and tell me the next one isn't cheaper to develop, with "we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc."
>Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics)
Uh, maybe fewer, but "none" is way too far. Get 2 billion integer operations per second out of a 286, the 500-mile email, big data storage, etc. Physical limits are everywhere.
>It's also the same reason why enterprise software sucks.
The reason enterprise software sucks is the lack of introspection and learning from the garbage that went before.
Working on AI that helps manage IT shops and learns from failure & success might be better, for both results and culture, than most IT management roles - a profession (painting with an absurdly broad brush) that tends to attract a lot of miserable creatures.
LLMs themselves don't learn, but AI systems built around LLMs absolutely can! Not on their own, but as part of a broader system: RLHF leveraging LoRAs that get re-incorporated as model fine-tunings regularly, natural language processing for context aggregation, creative use of context retrieval with embedding databases updated in real time, etc.
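To make that concrete, here is a toy sketch of just the retrieval-memory piece of such a loop. Everything is illustrative: the bag-of-words embedding is a stand-in for a real embedding model, and the class and function names are made up.

    # Toy sketch: the model's weights never change; "learning" happens in an
    # embedding store that is updated in real time and queried on each request.
    import math
    from collections import Counter

    def embed(text):
        # Stand-in for a real embedding model.
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class Memory:
        def __init__(self):
            self.records = []  # (embedding, lesson) pairs

        def record(self, situation, lesson):
            self.records.append((embed(situation), lesson))

        def recall(self, situation, k=2):
            q = embed(situation)
            ranked = sorted(self.records, key=lambda r: cosine(q, r[0]), reverse=True)
            return [lesson for _, lesson in ranked[:k]]

    mem = Memory()
    mem.record("deployment failed at national scale", "pilot with one small office first")
    mem.record("estimate overran by 4x", "re-estimate after every increment")

    # These recalled lessons would be prepended to the LLM's prompt context.
    print(mem.recall("national deployment rollout failed"))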
A slightly different take: it's probably more of a people failure, a lack of required expertise, skill, motivation, and coordination. People are motivated to do the job to make a living; the success of any long-term project is rarely the driving factor for most people working on it. People know ahead of time when a project is heading toward failure; it's just how things are structured.
From a systems perspective, an unknown system/requirement is a good example of where you build iteratively; a known set of requirements should give a good enough idea of feasibility and rough timelines, even if it's complex.
There you have why it's failing: these systems are so massive and complex that sometimes even the original creators and architects who designed them can't foresee future needs.
You could say it's incompetence, but the fact that software has changed so much in the last 20 years means most people can't really design a "future proof" system in a way that won't cause trouble down the road.
>By then, the general public and the branch managers themselves finally joined Computer Weekly’s reporters (who had doggedly reported on Horizon’s problems since 2008) in the knowledge that there was something seriously wrong with Horizon’s software.
Computer Weekly first broke the story, for which they deserve much credit. But I believe Private Eye did much of the long term campaigning.
I like that the author advocates software developer liability. That makes sense. Unless we introduce such a system, the incentives are not there to avoid failure.
It's good that the author makes the distinction between developers and managers. This distinction is rarely made and most media outlets talk about the wrongdoings of developers, who are almost never the decision makers of failing projects. It's quite the opposite, they are the ones who if brave enough criticize bad management practices and the lack of proper management of the software project.
In my very humble opinion, the impact that software has on our lives is getting to the point where software engineering should become a true profession like the other engineering branches (electrical, mechanical, etc).
There should be things like professional certifications that engineers have to maintain through continuous education, a professional code of ethics, a board of review, and other functions.
My reasoning is that we are at the point where a software "engineer" can make a mistake that can have the same impact as a civil engineer making a bad calculation and causing a bridge collapse.
There are different levels to this, of course. An app for booking restaurant reservations wouldn't need that much oversight. But we've seen some outages with massive impacts that quite frankly did not happen twenty years ago.
The most mind boggling number in this article to me was PeopleSoft claiming it would cost $500 million to make a payroll system for the Canadian government. That’s thousands of working years of software developers. It’s such a huge scale that it seems pretty clear the project should never start. PeopleSoft should have been dumped and the project’s scope massively reevaluated.
Failure typically comes from two directions. Unknown and changing requirements, and management that relies on (often external) technical (engineering) leadership that is too often incompetent.
These projects are often characterized by very complex functional requirements, yet are undertaken by those who primarily only know (and endlessly argue about) non-functional requirements.
So I haven't looked through the comments, and assume this has been discussed, but the simple solution is to limit contracts to, say, $4M, and pay only on successful completion. Then build a large project through a series of smaller steps.
The main problems are incentives and risks: in most cases you are not incentivized to build secure and reliable SW because, most of the time, it's easy to fix later. With particular categories of SW (e.g. software distributed on remote systems, medical SW, military SW) or HW it's the opposite: a failure is not so easy to fix, so you are incentivized to do a better job.
Every improvement will be moderated by increased demands from management: crunch, pressure to release, "good enough", add this extra library that monetizes/spies on the customer, etc.
In the same way that hardware improvements are quickly gobbled up by more demanding software.
The people doing the programming will also be more removed technically.
I can do Python, Java, Kotlin. I can do a little C++, less C, and a lot less assembly.
An endless succession of new tools, methodologies, and roles but failure persists because success is rooted in good judgment, wisdom, and common sense.
This has dot-com bubble written all over it. But there are some deeper issues.
First, we as a society should really be scrutinizing what we invest in. Trillions of dollars could end homelessness as a rounding error.
Second, real people are going to be punished for this as the layoffs go into overdrive, people lose their houses and people struggle to have enough to eat.
Third, the ultimate goal of all this investment is to displace people from the labor pool. People are annoying. They demand things like fair pay, safe working conditions and sick leave.
Who will buy the results of all this AI if there’s no one left with a job?
Lastly, the externalities of all this investment are indefensible. For example, air and water pollution and rising utility prices.
We're barreling towards a future with a few thousand wealthy people where everyone else lives in worker housing, owns nothing, and is the next incarnation of brick-kiln workers on wealthy estates.
A trillion in a money market fund @ 5% is 50B/year.
Over the course of a few years (so as to not drive up the price of politicians too quickly) one could buy the top N politicians from most countries. From there on out your options are many.
After a decade or so you can probably have your trillion back.
AI will absolutely solve these problems, by inventing nimble AI native companies that disrupt these business models into the Stone Age, worker by worker, role by role. Death by a billion cuts.
Completely off topic, but when fonts are the size they are in this article I can't read it; the words don't register as words above a certain size. I assume this isn't normal, or it wouldn't be so common...
Nuclear power plants usually only cost about twice as much as projected in phase II planning. IT projects are sort of open-ended. Interestingly, the simulator I was involved with (many decades ago) at a nuclear power plant came within about 10% of initial projections. The last "scan uploads for viruses" project I worked on was about 20x - 40x more expensive than projected. (Unfortunately the person who suggested we just pay a third party for this service was fired.) The bit with projecting cost and schedules for nuke plants is to ignore any initial costing and multiply the phase II planning estimate by 2 or 4.
From a consulting point of view, there's a common joke we used to tell: customers demand a Ferrari, but are only willing to pay the development costs of a Fiat.
How much money do you need to build a skyscraper on top of a tarpit? None because it’s not possible. The whole stack has to be gutted. I can do it but no one wants to listen so I’ll do it myself.
Almost nobody who works in software development is a licensed professional engineer. Many are even self-taught, and that includes both ICs and managers. I'm not saying this is direct causation but I do think it odd that we are so utterly dependent on software for so many critical things and yet we basically YOLO its development compared to what we expect of the people who design our bridges, our chemicals, our airplanes, etc.
Licensing and the perceived rigor it signifies is irrelevant to whether something can be considered "professional engineering." Engineering exists at the intersection of applied science, business and economics. So most software projects can be YOLO'd simply because the economics permit it, but there are others where the high costs necessitate more rigor.
For instance, software in safety-critical systems is highly rigorously developed. However that level of investment does not make sense for run-of-the-mill internal LOB CRUD apps which constitute the vast majority of the dark matter of the software universe.
Software engineering is also nothing special when it comes to various failure modes, because you'll find similar examples in other engineering disciplines.
No big surprise. Taking a shitty process and "digitalizing" it leads, in the best case, to the same shitty process just on computers; in the worst case, everything collapses.
What a joke blaming the IT community for not doing better, when most businesses refuse to look past anything but shipping features as fast as they can. "We take security and reliability very seriously", until management gets wind of the price tag. Guess what areas always get considered last and cut first. We all know.
But sure, blame the community at large, not the execs who make bad decisions in the name of short-term profits, then fail upward with golden parachutes into their next gig.
And definitely don't blame government for not punishing egregious behavior of corporations. Don't worry, you get a year of free credit monitoring from some scummy company who's also selling your data. Oh and justice was served to the offending corp, who got a one-time fine of $300k, when they make billions every quarter.
Maybe if we just outsource everything to AI, consultants, and offshore sweat shops things will improve!
> blaming the IT community for not doing better, when most businesses refuse to look past anything but shipping features
IT != software engineering. IT is a business function that manages a company's internal information. Software engineering is a time-tested process of building software.
A lot of projects fail because management thinks that IT is a software engineering department. It is not. It never was, and it never will be. Its incentives will never be aligned such that software engineering projects are set up for success.
The success rate of implementing software outside of IT and dragging them along later is much higher than implementing it through IT from the beginning.
I understand, but also, IT is an umbrella term for a wider industry that includes your definition of IT, software, and anything adjacent. If you read the article, you'll see it's the latter being referenced, and why I chose that terminology.
> The success rate of implementing software outside of IT and dragging them along later is much higher than implementing it through IT from the beginning.
That's a pretty strong statement. Isn't that the opposite of why the devops movement started?
managing software requirements and the corresponding changes to user/group/process behaviors is by far the hardest part of software development, and it is a task no one knows how to scale.
absent understanding, large companies engage in cargo-cult behaviors: they create a sensible org chart, produce a Gantt chart, have the coders start whacking out code, and presumably in 9 months a baby comes out.
There is no such thing as a 'simplicity science' that can be directly applied when dealing with IT problems. However, many insights from complexity science are applicable to solving real-world IT problems. People love simple solutions. However, Simple is a scam: https://nocomplexity.com/simple-is-a-scam/
There are no generic, simple solutions for complex IT challenges. But there are ground rules for finding and implementing simple solutions. I have created a playbook to prevent IT disasters, "The art and science towards simpler IT solutions": see https://nocomplexity.com/documents/reports/SimplifyIT.pdf
The concerning aspect of all of this isn't the financial cost of these blunders, and what happened in the past. It is the increasing risk to human lives, and what will happen in the future. The Boeing case was only a sign of what's to come.
Take "AI", for instance. It is being adopted left and right as if it's the solution to all of our problems, and developers, managers, and executives are increasingly relying on it. Companies and governments love it because it can cut costs and potentially make us more productive. Most people are more than happy to offload their work to it, do a cursory check of its output, if at all, and ship it or publish it and claim the work as their own. After all, it can always serve as a scapegoat if things do go wrong, and its manufacturers can brush these off as user errors. Ultimately there is no accountability.
These are all components of a recipe for greater disasters. As these tools are adopted in industries where safety is paramount, in the military, etc., it's only a matter of time for more human lives to be impacted. Especially now when more egomaniacal autocrats are taking power, and surrounding themselves with yes-people. Losing face and admitting failure is not part of their playbook. We're digging ourselves into a hole we might not be able to get out of.
because most people are incompetent, produce incidental complexity to satisfy internal urge for busy work, and under-think the problem, greatly... that's why, and don't get me started on the morons who run the show
There are great big software projects and shitty ones.
IRCTC, UPI being examples of great ones.
Insurance and RTO being shitty ones.
I had an insurance deadline coming up, and the payment was not showing up in the insurance provider's dashboard, so I had to pay twice, and it was still not showing up.
Also I have faced huge problems with getting the learner's licence online.
I got my name wrong on the driver's licence and never went to correct it. However, most of the problems there were administrative, not software. I agree both IRCTC and UPI come to mind first as successes. Insurance could be a particular company, as I never faced such a problem. Websites for tax filing and even starting an MSME have been smooth.
The article is kind of dumb. E.g., it hangs its hat on the Phoenix payroll system, which
> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions. It also was attempting to implement 34 human-resource system interfaces across 101 government agencies and departments required for sharing employee data.
So basically people -- none of them in IT, but rather working for the government -- built something extraordinarily complex (80k rules!), and then act surprised that anything downstream is at least equally complex. And then the article blames IT in general. When this data point tells us that replacing a business process that used to require (per [1]) 2,000 pay advisors will be complex. And all this in an organization that has shit the bed so thoroughly that paying its employees requires 2k people -- for an organization of 290k, that's 0.6% of headcount spent on paying employees!
IT is complex, but incompetent people and incompetent orgs do not magically become competent when undertaking IT projects.
Also, making extraordinarily complex things and then shouting the word "computer" at them like you're playing D&D and it's a spell does not make them simple.
Slightly related but unpopular opinion I have: I think software, broadly, today is the highest quality it's ever been. People love to hate on specific issues, like how the Windows file explorer takes 900ms to open instead of 150ms, or how sometimes an iOS 26 liquid glass animation is a bit janky... we're complaining about minutiae instead of seeing the whole forest.
I trust my phone to work so much that it is now the single, non-redundant source for keys to my apartment, keys to my car, and payment method. Phones could only even hope to do all of these things as of like ~4 years ago, and only as of ~this year do I feel confident enough to not even carry redundancies. My phone has never breached that trust so critically that I feel I need to.
Of course, this article talks about new software projects. And I think the truth and reason of the matter lies in this asymmetry: Android/iOS are not new. Giving an engineering team agency and a well-defined mandate that spans a long period of time oftentimes produces fantastic software. If that mandate often changes; or if it is unclear in the first place; or if there are middlemen stakeholders involved; you run the risk of things turning sideways. The failure of large software systems is, rarely, an engineering problem.
But, of course, it sometimes is. It took us ~30-40 years of abstraction/foundation building to get to the pretty darn good software we have today. It'll take another 30-40 years to add one or two more nines of reliability. And that's ok; I think we're trending in the right direction, and we're learning. Unless we start getting AI involved; then it might take 50-60 years :)
Kind of a strange take, as though this were unique to software. Every sector that is large has issues, since ambitious projects stretch what can be done with current management and organizational practices. All software articles like these hark back to some mythical world smaller in scope/ambition/requirements. Humanity moves forward.
* Construction and Engineering -- Massive cost overruns and schedule delays on large infrastructure projects (e.g., public transit systems, bridges)
* Military and Government -- Defense acquisition programs notorious for massive cost increases and years-long delays, where complex requirements and bureaucratic processes create an environment ripe for failure.
* Healthcare -- Hospital system implementations or large research projects that exceed budgets and fail to deliver intended efficiencies, often due to resistance to change and poor executive oversight.
> IT projects suffer from enough management hallucinations and delusions without AI adding to them.
Software is also incredibly hard, the human mind can understand the physical space very well but once we're deep into abstractions it simply struggles to keep up with it.
It is easier to explain how to build a house from scratch to virtually anyone than to explain how to build a mobile app or an Excel model.
I came to opposite conclusions. Technology is pretty easy, people are hard and the business culture we have fostered in the last 40 years gets in the way of success.
> We are left with only a professional and personal obligation to reemphasize the obvious: Ask what you do know, what you should know, and how big the gap is between them before embarking on creating an IT system. If no one else has ever successfully built your system with the schedule, budget, and functionality you asked for, please explain why your organization thinks it can
translation: "leave it to us professionals". Gate-keeping of this kind is exactly how computer science (the one remaining technical discipline still making reliable progress) could become like all of the other anemic, cursed fields of engineering. people thinking "hey im pretty sure I could make a better version of this" and then actually doing it is exactly how progress happens. I hope nobody reads this article and takes it seriously
There are 2 big problems with large software projects:
1. Connecting pay to work - estimates (replanning is learning, not failure)
2. Connecting work to pay - management (the world is fractal-like, scar tissue and band-aids)
I do not pre-suppose that there are definite solutions to these problems - there may be solutions, but getting there may require going far out of our way. As the old farmer said "Oh, I can tell you how to get there, but if I was you, I wouldn't start from here"
1. Pay to Work - someone is paying for the software project, and they need to know how much it will cost. Thus estimates are asked for, an architecture is asked for, and the architecture is tied to the estimates.
This is 'The Plan!'. The project administrators will pick some lifecycle paradigm to tie the architecture to the cost estimate.
The implementation team will learn as they do their work. This learning is often viewed as failure, as the team will try things that don't work.
The implementation team will learn that the architecture needs to change in some large ways and many small ways. The smallest changes are absorbed in regular work. Medium and Large changes will require more time (thus money); This request for more money will be viewed as a failure in estimation and not as learning.
2. Work to Pay - as the architecture is implemented, development tasks are completed. The Money People want Numbers, because Money People understand how they feel about Numbers. Also these Numbers will talk to other Numbers outside the company. Important Numbers with names like Share Price.
Thus many layers of management are chartered and instituted. The lowest layer of management is the self-managed software developer. The software developer will complete development tasks related to the architecture, tied to the plan, attached to the money (and the spreadsheets grew all around, all around [0]).
When the developer communicates about work, the Management Chain cares to hear about Numbers, but sometimes they must also involve themselves in failures.
It is bad to fail, especially repeated failures at the same kind of task. So managers institute rules to prevent failures. These rules are put in a virtual cabinet, or bureau. Thus we have Rules of the Bureau or Bureaucracy. These rules are not morally bad or good; not factually incorrect or correct, but whenever we notice them, they feel bad; We notice the ones that feel bad TO US. We are often in favor of rules that feel bad to SOMEONE ELSE. You are free to opt out of this system, but there is a price to doing so.
----
Too much writing; let me stop decoding the verbiage and summarize:
Thus it is OK for individuals to learn many small things, but it is a failure for the organization to learn large things. Trying to avoid and prevent failure is viewed as admirable; trying to avoid learning is self-defeating.
This is a direct result of using leetcode in interviews instead of any other, more legitimate tests like winning a tekken 1v1. Have you ever seen a good developer who’s not good at real video games?
If companies had hired real developers instead of cosplayers who are stunlocked with imposter syndrome as the only candidate pool with time to memorize a rainbow table of arbitrary game trivia questions and answers, things would actually work.
The biggest reason is developer ego. Devs see their code as artwork, an extension of themselves, so it's really hard to have critical conversations about small things, and they erupt into holy wars. Off hand:
* Formatting
* Style
* Conventions
* Patterns
* Using the latest frameworks or what's en vogue
I think where I've seen results delivered effectively and consistently is where a universal style is enforced, which removes individualism from the codebase. Some devs will not thrive in that environment, but it makes the code a means to an end, rather than the end itself.
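For what it's worth, the enforcement piece is cheap these days. Here's a minimal sketch of a git pre-commit hook that gates commits on a team-wide format; it assumes the Python formatter black is installed (any equivalent formatter would do):

    #!/usr/bin/env python3
    # Sketch of a .git/hooks/pre-commit hook: block commits that don't match
    # the team-wide format. Assumes the `black` formatter is installed.
    import subprocess
    import sys

    result = subprocess.run(["black", "--check", "."],
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout or result.stderr)
        print("Formatting check failed; run `black .` and re-stage your changes.")
        sys.exit(1)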
As far as I can see in the modern tech industry landscape, virtually everyone has adopted style guides and automatic formatting/linting. Modern languages like Go even bake those decisions into the language itself.
I'd consider managing that stuff essentially table-stakes in big orgs these days. It doesn't stop projects from failing in highly expensive and visible ways.
Eh, you're not wrong, but management failures tend to be a bigger issue. On the hierarchy of ways software projects fail, developer ego is kind of upper-middle of the pack rather than top. Delusional, ignorant, or sadistic leadership tends to be higher.
There is no solution because these projects are not failing because of technical reasons.
They are failing because of political scheming and a bunch of people wanting a finger in the pie - "trillions spent" - I guess no one would mind earning a couple million.
Then you have "important people" who want to be important and want to have an opinion on font size and that some button should be 12px to the right because they are "important" it doesn't matter for the project but they have to assert the dominance.
You have 2 or 3 companies working on a project? Great! Now they will be throwing stuff over the fence to limit their own costs and blame the others, while trying to get away with the least work done and the most money cashed.
That is how the sausage is made. Coming up with a "reasonable approach" is not the solution, because as soon as you have different suppliers and different departments you end up with a power/money struggle.
> They are failing because of political scheming and a bunch of people wanting a finger in the pie - "trillions spent" - I guess no one would mind earning a couple million.
Not (necessarily) wrong, but if you start small, Important People may not want to bother with something that is Unimportant and may leave things alone so something useful and working can get going. If you starting with an Important project then Important People will start circling it right away.
Political corruption is like environmental radiation: a viable fix is never 'just get rid of political corruption'*. It's an environmental constant that needs to be handled by an effective approach.
That said, parent's size- and scope-iterative approach also helps with corruption, because corruption metastasizes in the time between {specification} and {deliverable}.
Shrink that, by tying incremental payments to working systems at smaller scales, and you shrink the blast radius for failure.
That said, there are myriad other problems the approach creates (encouraging architectures that won't scale to the final system, promoting duct taped features on top of an existing system, vendor-to-vendor transitions if the system builder changes, etc).
But on the whole, the pros outweigh the cons... for projects controlled by a political process (either public or private).
That's why military procurement has essentially landed on spiral development (i.e. iterative demonstrated risk burn-down) as a meta-framework.
* Limit political corruption, to the extent possible in a cost efficient manner, sure
> There is no solution because these projects are not failing because of technical reasons.
There is no technical solution. There are systems and governance solutions, if the will is there to analyze and implement them.
That's what works for products, not software systems. Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts. Eventually you want to rewrite to deal with all the technical debt, but nobody has enough confidence to say what is in the codebase that's important to Product and what isn't, so everybody is afraid and frozen.
Scale is separately a Product and Engineering question. You are correct that you cannot scale a Product to delight many users without it first delighting a small group of users. But there are plenty of scaled Engineering systems that were designed from the beginning to reach massive scale. WhatsApp is probably the canonical example of something that was a rather simple Product with very highly scaled Engineering and it's how they were able to grow so much with such a small team.
> Gradual growth inevitably results in loads of technical debt.
Why is this stated as though it's some de facto software law? The argument is not whether it's possible to waterfall a massive software system. It clearly is possible, but the failure ratios have historically been uncomfortable enough to give rise to entirely different (and evidently more successful) project development philosophies, especially when promoters were more sensitive to the massive sums involved (which in my opinion also helps explain why there are so many wasteful government examples). The lean startup did not appear in a vacuum. "Do things that don't scale" did not become a motto in these parts without reason. In case some are still confused about the historical purpose of this benign-sounding advice: no, it wasn't originally addressed at entrepreneurs aiming to run "lifestyle" businesses.
Software is a component of a product, if not the product itself. Treating software like a product, besides being the underlying truth, also means it makes sense to manage it like one.
Technical debt isn’t usually the problem people think it is. When it does become a problem, it’s best to think of it in product-like terms. Does it make the product less useful for its intended purpose? Does it make maintenance or repair inconvenient or costly? Or does it make it more difficult or even impossible to add competitive features or improvements? Taking a product evaluation approach to the question can help you figure out what the right response is. Sometimes it’s no response at all.
Designing or intending a system to be used at massive scale is not the same as building and deploying it so that it only initially runs at that massive scale.
That's just a recipe for disaster: "We don't even know if we can handle 100 users; let's now force 1 million people to use the system simultaneously." Even WhatsApp couldn't handle hundreds of millions of users on the day it was first released, nor did it attempt to. You build out slowly and make sure things work, at least if you're competent and sane.
> Gradual growth inevitably results in loads of technical debt that is not paid off as Product adds more feature requests to deliver larger and larger sales contracts.
This isn't technical debt, necessarily. Technical debt is a specific thing. You probably mean "an underlying design that doesn't perfectly map to what ended up being the requirements". But then the world moves on (what if a regulation is added that ruins your perfect structure anyway?) and you can't just wish for perfect requirements. Or not in software that interacts directly with the real world, anyway.
You have to design for scale AND deploy gradually
Yes, it can be very difficult to add “scale” after the fact, once you already have a lot of data persisted in a certain way.
There's nothing wrong with technical debt per se. As with all debt, the problem is incurring it without a plan or means to pay it off. Debt based financing is the engine of modern capitalism.
Gradual growth to large scale implies an ongoing refactoring cost--that's the price of paying off the technical debt that got you started and built initial success in small scale rollouts. As long as you keep "servicing" your debt (which can include throwing away an earlier chunk and building a more scalable replacement with the lessons learned), you're doing fine.
The magic words here to management/product owners is "we built it that way the first time because it got us running quickly and taught us what we need to know to build the scalable version. If we'd tried to go for the scalable version first, we wouldn't have known foo, bar and baz, and we'd have failed and wouldn't have learned anything."
Gradual growth =/= many tacked on features. Many tacked on features =/= technical debt. Technical debt =/= "everybody is afraid and frozen." Those are merely often correlated, but not required.
WhatsApp is a terrible example because it's barely a product; WhatsApp is mostly a free offering of goodwill riding on the back of actual products like Facebook Ads. A great example would be a product like Salesforce, SAP, or Microsoft Dynamics. Those products are forced to grow and change and adapt and scale, to massive numbers doing tons of work, all while being actual products and software systems. I think such products act as stark rebukes of what you've described.
we get paid to add to it, we don’t get paid to take away
The dominant factor is: there is a human who understands the entire system.
That is vastly easier to achieve by making a small, successful system, which gets buy in from both users and builders to the extent that the former pay sufficient money for the latter to be invested in understanding the entire system and then growing it and keeping up with the changes.
Occasionally a moonshot program can overcome all of that inertia, but the "90% of all projects fail" statistic is definitely overrepresented among large projects. And the Precautionary Principle says you shouldn't try, because the consequences are so high.
This works for Clojure, git and even Linux. It seems there's a human who understands the entire system and decides what's allowed to be added to it. But these things are meant to be used by technical people.
The non-technical people I know might want to use Linux but stay on Windows or choose Mac OS because it's more straightforward. I use Windows+WSL at work even though I would like to use a native Linux distribution.
I know someone who created a MUD (an online text game) and told him I wanted to make one with a browser client. He said something we could translate as "Good, you can have all the newbies." Not only was he right that a MUD should be played with a MUD client like tintin++, but making a good browser client is harder than it seems, and that's time not spent making content for the game or improving the engine.
My point is that he was an uncompromising person who refused to add layers to a project, because they would come at a cost which isn't only time or dollars but also things like motivation and focus.
You will never get to the moon by making a faster and faster bus.
I see a lot of software with that initial small scale "baked into it" at every level of its design, from the database engine choice, schema, concurrency handling, internal architecture, and even the form design and layout.
The best-engineered software I've seen (and written) always started at the maximum scale, with at least a plan for handling future feature extensions.
As a random example, the CommVault backup software was developed in AT&T to deal with their enormous distributed scale, and it was the only decently scalable backup software I have ever used. With its competitors, it was a serious challenge just to run a report of last night's backup job status!
I also see a lot of "started small, grew too big" software make hundreds of silly little mistakes throughout, such as using drop-down controls for selecting users or groups. Works great for that mom & pop corner store customer with half a dozen accounts, fails miserably at orgs with half a million. Ripping that out and fixing it can be a decidedly non-trivial piece of work.
Similarly, cardinality in the database schema has really irritating exceptions that only turn up at the million or billion row scale and can be obscenely difficult to fix later. An example I'm familiar with is that the ISBN codes used to "uniquely" identify books are almost, but not quite unique. There are a handful of duplicates, and yes, they turn up in real libraries. This means that if you used these as a primary key somewhere... bzzt... start over from the beginning with something else!
There is no way to prepare for this if you start with indexing the book on your own bookshelf. Whatever you cook up will fail at scale and will need a rethink.
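As an illustration of the defensive version of that schema decision, here is a minimal sqlite3 sketch (the table, column names, and the duplicated ISBN value are all made up): give every row a surrogate key and index the ISBN without a uniqueness constraint, so real-world duplicates don't break inserts.

    # Sketch: surrogate primary key instead of trusting ISBN uniqueness.
    # A plain (non-unique) index still makes ISBN lookups fast, but
    # real-world duplicate ISBNs no longer violate a constraint.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE book (
        book_id INTEGER PRIMARY KEY,   -- surrogate key
        isbn    TEXT NOT NULL,         -- almost, but not quite, unique
        title   TEXT NOT NULL
    );
    CREATE INDEX idx_book_isbn ON book(isbn);
    """)

    # Two physically different books sharing one ISBN insert cleanly:
    con.execute("INSERT INTO book (isbn, title) VALUES (?, ?)",
                ("0590764845", "Title A"))
    con.execute("INSERT INTO book (isbn, title) VALUES (?, ?)",
                ("0590764845", "Title B"))
    print(con.execute("SELECT COUNT(*) FROM book WHERE isbn = '0590764845'").fetchone())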
Counterpoint: the idea that your project will be the one to scale up to the millions of users/requests/etc is hubris. Odds are, your project won't scale past a scale of 10,000 to 100,000. Designing every project to scale to the millions from the beginning often leads to overengineering, adding needless complexity when a simpler solution would have worked better.
Naturally, that advice doesn't hold if you know ahead of time that the project is going to be deployed at massive scale. In which case, go ahead and implement your database replication, load balancing, and failover from the start. But if you're designing an app for internal use at your company of 500, well, feel free to just use SQLite as your database. You won't ever run into the problems of scale in this app, and single-file databases have unique advantages when your scale is small.
Basically: know when huge scale is likely, and when it's immensely UNlikely. Design accordingly.
You can by making a bigger and bigger rocket though.
While I think this is good advice in general, I don’t think your statement that “there is no process to create scalable software” holds true.
The UK gov development service reliably implements huge systems over and over again, and those systems go out to tens of millions of users from day 1. As a rule of thumb, the parts of the UK government's digital suite that suck are the parts the development service hasn't been assigned to yet.
The SWIFT banking org launches reliable features to hundreds of millions of users.
There’s honestly loads of instances of organisations reliably implementing robust and scalable software without starting with tens of users.
The UK government development service, as you call it, is not a service. It's more of a declared process that is adhered to across the diverse departments and organisations that make up the government. It's usually small teams that are responsible for exploring what a service is or needs and then implementing it. They are able to deliver decent services because they start small, design and user-test iteratively, and only when there is a really good understanding of what's being delivered do they scale out. The technology is the easy bit.
UK GDS is great, but the point there is that they're a crack team of project managers.
People complain about junior developers who pass a hiring screen and then can't write a single line of code. The equivalent exists for both project management and management in general, except it's much harder to spot in advance. Plus there's simply a lot of bad doctrine and "vibes management" going on.
("Vibes management": you give a prompt to your employees vaguely describing a desired outcome and then keep trying to correct it into what you actually wanted)
> and those systems go out to tens of millions from day 1
I like GDS (I even interviewed with them once and saw their dev process etc) but this isn't a great example. Technically GDS services have millions of users across decades, but people e.g. aren't constantly applying for new passports every day.
A much better example I think is Facebook's rollout of Messenger, which scaled to billions of actual users on day 1 with no issues. They did it by shipping the code early in the Facebook app, and getting it to send test messages to other apps until the infra held, and then they released Messenger after that. Great test strategy.
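The shape of that strategy (and of the "dummy mode" idea upthread) is easy to sketch. Here's a hypothetical shadow deployment: the legacy system keeps answering users while the candidate runs on the same inputs and divergences are logged. All names and the planted bug are illustrative, not anyone's actual rollout code.

    # Sketch of a shadow (dark-launch) rollout: the legacy system remains the
    # system of record; the candidate runs on the same input and any
    # disagreement is logged for investigation before cutover.
    import logging

    logging.basicConfig(level=logging.WARNING)

    def legacy_total(items):
        # Trusted system of record.
        return sum(price for _, price in items)

    def new_total(items):
        # Candidate replacement, with an illustrative off-by-one bug.
        return sum(price for _, price in items[:-1])

    def checkout(items):
        result = legacy_total(items)
        try:
            shadow = new_total(items)
            if shadow != result:
                logging.warning("divergence: legacy=%s new=%s items=%s",
                                result, shadow, items)
        except Exception:
            logging.exception("shadow system crashed")  # never affects the user
        return result  # the user always gets the legacy answer

    print(checkout([("widget", 4.50), ("gadget", 3.25)]))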
GDS's budget is about £90 million a year or something. There are many other contracts still being spent on digital, for example PA Consulting at £60 million (over a few years), which does a lot of the gov.uk Home Office work, and the fresh grads they hire cost the government more than GDS's most senior staff...
SWIFT? Hold my beer. SWIFT has not launched anything substantial since its startup days in the early '70s.
Moreover, their core tech has not evolved far from that era, and the '70s tech bros are still there through their progeny.
Here's an anecdote: The first messaging system built by SWIFT was text-based, somewhat similar to ASN.1.
The next one used XML, as it was the fad of the day. Unfortunately, neither SWIFT nor the banks could handle the 2-3 orders of magnitude increase in payload size in their ancient systems. Yes, as engineers, you would think compressing the XML would solve the problem, and you would be right. Moreover, XML Infoset already existed, and it defined compression as a function of the XML Schema, so it was somewhat more deterministic, even if not more efficient than LZMA.
But the suits decided differently. At one of the SIBOS conferences they abbreviated the XML tags, and did it literally on paper, without thinking about back-and-forth translation, dupes, etc.
And this is how we landed on the ISO 20022 abbreviations that we all know and love: Ccy for Currency, Pmt for Payment, Dt for Date, etc.
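The point about compression is easy to demonstrate with a toy comparison (the payloads below are illustrative, not real ISO 20022 messages): once you compress, hand-abbreviated tags buy almost nothing.

    # Toy comparison: hand-abbreviating tags vs. just compressing the XML.
    # Payloads are illustrative, not real ISO 20022 messages.
    import zlib

    verbose = b"<Payment><Currency>EUR</Currency><Date>2024-01-31</Date></Payment>" * 100
    abbrev = b"<Pmt><Ccy>EUR</Ccy><Dt>2024-01-31</Dt></Pmt>" * 100

    for label, payload in [("verbose", verbose), ("abbreviated", abbrev)]:
        print(label, len(payload), "bytes raw ->",
              len(zlib.compress(payload)), "bytes compressed")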
This is what https://www.amazon.com/How-Big-Things-Get-Done/dp/0593239512 advocates too: start small, modularize, and then scale. The example of Tesla's mega factory was particularly enticing.
> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
Gall’s law wins again.
> I'm afraid that the solution is: build something small, and use it in production before you add more features.
Gall's Law:
> A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.[8]
* https://en.wikipedia.org/wiki/John_Gall_(author)#Gall's_law
Came here to say this. I still think that Linus Torvalds has the most profound advice to building a large, highly successful software system:
"Nobody should start to undertake a large project. You start with a small trivial project, and you should never expect it to get large. If you do, you'll just overdesign and generally think it is more important than it likely is at that stage. Or worse, you might be scared away by the sheer size of the work you envision. So start small, and think about the details. Don't think about some big picture and fancy design. If it doesn't solve some fairly immediate need, it's almost certainly over-designed. And don't expect people to jump in and help you. That's not how these things work. You need to get something half-way useful first, and then others will say "hey, that almost works for me", and they'll get involved in the project."
-- Linux Times, October 2004.
I don't think this applies in any way to companies contracted to build a massive system for a government with a clear need. Linus is talking about growing a greenfield open-source project, which may or may not ever be used by anyone.
In contrast, if your purpose is "we need to manage our country's accounting without pen and paper", that's a clear need for a massive system. Starting work on this by designing a system that can solve accounting for a small firm is not the right way to go. Instead, you have to design with the end-goal in mind, since that's what you were paid for. But, you don't launch your system to the entire country at once: you first use this system designed for a country in a small shop, to make sure it actually handles the small scale well, before gradually rolling out to more and more people.
No, Linus Torvalds would stand against the people in the projects from the article - he would slam the door and quit.
The projects the author points out are basically political horror stories. I can imagine how dozens of people wanted a cut of the money in those projects, or wanted to push things through because "they are important people".
There is nothing you can do technically to save such projects, and it is NOT an IT failure.
This is a really dense paragraph of lifetime-accumulated wisdom in that single quote.
Works with implementations and not APIs though.
A bad API can constrain your implementation and often can't be changed once it's in use by loads of users. APIs should be right from day one if possible.
While I like the "start small and expand" strategy better than the "big project upfront", this trades project size for project length and often that is no better:
- It gives outside leadership types many more opportunities to add requirements later. This is nice if they are things missed in the original design, but it can also lead to massive scope creep.
- A big enough project that gets done the "start small and expand" way can easily grow into a decade-plus project. For an extreme example, see the multi-decade project by Indian Railways to gradually convert all its lines to a single gauge. It works fine if you have the organisational backing for a long duration, but the constant knowledge leaks from people leaving, retiring, getting promoted, etc. can be a real problem for a project like that. Especially in fields where the knowledge is the product, like in software.
- Not every project can feasibly start small.
> If you need to make a national payroll, you have to use it for a small town with a payroll of 50 people first, get the bugs worked out, then try it with a larger town, then a small city, then a large city, then a province, and then and only then are you ready to try it at a national level.
You could also try to buy some off-the-shelf solutions? Making payroll, even for very large organisations, isn't exactly a new problem.
As a corollary I would also suggest: subsidiarity.
> Subsidiarity is a principle of social organization that holds that social and political issues should be dealt with at the most immediate or local level that is consistent with their resolution.
(from https://en.wikipedia.org/wiki/Subsidiarity)
If you solve more problems more locally, you don't need that many people at the national level, thus making payroll there is easier.
I think you'll find that is exactly what people do. However, payroll solutions are highly customized for every individual company and even business unit. You don't buy payroll software in a box, deploy it, and now you have payroll. Instead, you pay a payroll software company, they come in and get information about your payroll systems, and then they roll out their software on some of your systems and work with you to make sure their customizations worked, etc. There's rarely any truly "off-the-shelf" software in B2B transactions, especially the type of end-user solutions that also interact with legal systems.
Also, governments are typically at least an order of magnitude larger than the largest companies operating in their countries, in terms of employees. So sure, the government of Liechtenstein has fewer employees than Google overall, but the US government certainly does not, and even Liechtenstein probably has way more government employees than Google employees in their country.
I work at a small shop, and I'm a big advocate of giving customers the 0.1 version and then talking out what they want. It's often not exactly what they asked for at the start ... but it often is better in the end.
It's hard to hit the target right the first time.
Yes. Also the same applies to companies. There should not be companies that are growing to $100 million revenue while losing money on a gamble that they will eventually get big enough to succeed. Good first, big later.
$100M maybe. But pretty much all tech needs an initial investment before you can start making profit. It takes a lot of development before you can get a product that anyone would want to pay for.
>It's a great article, until the end where they say what the solution would be. I'm afraid that the solution is: build something small, and use it in production before you add more features.
I think that is true for a lot of projects. But I'm not sure it is realistic to incrementally develop a control system for a nuclear reactor or an air traffic control system.
See also Gall's Law:
"All complex systems that work evolved from simpler systems that worked"
Not saying you're wrong, but I wonder what is the differentiating factor for software? We can build huge things like airliners, massive bridges and buildings without starting small.
Incremental makes less sense to me when you want to go to Mars. Would you propose to write the software for such a mission in an incremental fashion too?
Yet for software systems it is sometimes proposed as the best way.
> We can build huge things like airliners, massive bridges and buildings without starting small.
We did start small with all of those things. We developed rigorous disciplines around engineering, architecture, and materials science. And people died along the way in the thousands.[0][1]
People are still dying from those failures; the Boeing 737 MAX crashes were only two years ago.
> Incremental makes less sense to me when you want to go to mars.
This is yet another reason why a manned Mars mission will be exceedingly dangerous, NOT a strike against incremental development and deployment.
[0] https://en.wikipedia.org/wiki/List_of_building_and_structure...
[1] https://en.wikipedia.org/wiki/List_of_accidents_and_incident...
All of the things you mentioned are designed and tested incrementally. Furthermore, software has been used on Mars missions in the past, and that software was also developed incrementally. It's proposed as the best way because it's a way that has a track record.
1 reply →
That sounds like the way nature handles growth and complexity: slowly and over long time scales. Assume there will be failures, don't die and keep trying.
When you bite off too much complexity at once you end up not shipping anything or building something brittle.
You just need: Plan -> Implement -> Test -> Repeat.
Whether you are creating software, games, or whatever, these iterations are foundational. What these steps look like in detail of course depends on the project itself.
That's the ideal, but a lot of these big problems can't start small, because the problem they have is already big. A lot of government IT programs are set up to replace existing software and processes, often combining a lot of legacy software's jobs and the manual labor involved.
If you have something like a tax office or payroll, they need to integrate decades of legislation and rules. It's doable, but you need to understand the problem (which at those scales is almost impossible to fit in one person's head) and more importantly have diligent processes and architecture to slowly build up and deploy the software.
tl;dr it's hard. I have no experience in anything that scale, I've been at the edges of large organizations (e.g. consumer facing front-ends) for most of my career.
The accounting, legal and business process requirements are vastly different at different scales, different jurisdictions, different countries, etc.
There's a crazy amount of complexity and customizability in systems like ERPs for multinational corporations (SAP, Oracle).
When you start with a small town, you'll have to throw most of everything away when moving to a different scale.
That's true for software systems in general. If major requirements are bolted on after the fact, instead of designed into the system from the beginning, you usually end up with an unmaintainable mess.
Knowing that the rules for your first small deployment are not the same as the rules for everywhere is valuable for designing well. Trying to implement all of those sets of rules in your initial deployment is not a good idea. There is a general principle that you shouldn't code the abstraction until you've coded the concrete example 2 or 3 times, because otherwise you won't make the right abstraction. Looking ahead is not the same as starting with the whole enchilada for your initial deployment.
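A toy sketch of that rule of three, in Python (the towns, tax rates, and levies are all invented for illustration):

    # Two concrete cases, written out in full. Resist abstracting yet.
    def net_pay_town_a(gross):
        return gross - gross * 0.05            # flat 5% municipal tax

    def net_pay_town_b(gross):
        return gross - gross * 0.04 - 12.0     # 4% tax plus a fixed levy

    # Only after the second or third concrete case does the shape of the
    # abstraction become clear: gross minus a *list* of deduction rules,
    # where each rule can be arbitrarily quirky.
    def net_pay(gross, rules):
        return gross - sum(rule(gross) for rule in rules)

    print(net_pay(1000.0, [lambda g: g * 0.05]))                  # 950.0
    print(net_pay(1000.0, [lambda g: g * 0.04, lambda g: 12.0]))  # 948.0

Abstract after one case and you'd likely have hard-coded "percentage tax" into the signature, and the fixed levy would already need a hack.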
I do get concerned when the solution is to be more strict on the waterfall process.
I used to believe there were some worlds in which waterfall is better: where requirements are well known in advance and set in stone. I've since come to realize neither of those assumptions is ever true.
What works at small scale possibly won't work at a huge scale.
But what hasn’t even been tried at a small scale definitely won’t work at a huge scale.
Which is absolutely true, and a reason to try at medium scale second. But what doesn't work at small scale, almost certainly won't work at huge scale.
Imagine if the only way to build a skyscraper was to start with a dollhouse and keep tacking extensions and pieces onto it until it reached full height. Imagine if the only way to build a bridge across San Francisco Bay was to start with popsicle sticks.
The very specific example you chose, payroll, shows how difficult it can be to incrementally step from small to huge. As you grow from town to national, you will run into all the disadvantages without really hitting the advantages. I feel that incremental does help you move from one level to one just a few above it. But only if there are enough customers at exactly those starting levels.
When developing for towns, you will face small random subsets of the variations imposed by year after year of legal changes, BUT only small sales. You will have to implement niche variations in arbitrary aspects for all the towns you have to support, AND you will not have the customer size over which to amortize this work. Each new customer will bring a new arbitrary set of legal aspects to be met. Each new customer may be arbitrarily difficult to support.
By the time you reach national, you will have already covered most of the historical legal quirks - but that will have been done in one kludgy manner after another - and then you will hit one more set of legal quirks at the level of national organizations (some of them will have their very own laws). You will now have a very large budget to finalize things but you will be burdened by an illogical software base.
So I agree that you will need experience and subject matter experts that have worked at the various levels. BUT, now that you have this experience, you know the degree of flexibility that is required (you know where and what needs to be variable and quirk-friendly, and how far the quirks can go = "any size") as well as the size-related issues (mailing, transaction, and user support volume), and you can now plan for all this AS YOU restart a new development from scratch. Because at this new "master" level you need both systematic flexibility AND resilience at size.
Payroll is exactly the kind of topic where "adding features" will be "fun" - I mean bewildering - while you learn, but probably economically difficult to manage, until it kills you "as you climb up".
You will be killed by a large software project that can afford to hire away a bunch of your subject matter specialists (or hire new ones) and uses them in a "from scratch" project. If you are lucky, this large project will be from the same company - but only if you are lucky.
Now, AFTER you have done the one top-level project - for one country - you will probably be in a good situation to sell service to all kinds of organizations. Because you now have a system in which you can implement ridiculous quirks without breaking everything. And if you have done the job just right, you can onboard smaller customers (towns) economically enough that they can afford your solution.
That's different from where you deploy your solution first. Sure, deploy a national-design solution first to a subset of the target employees - although that does impose still more requirements: now you need to coexist with the legacy solutions. Which would be another hard-to-meet handicap when developing for towns first.
I study and write quite a bit of tech history. IMHO from what I've learned over the last few years of this hobby, the primary issue is quite simple. While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning. Typically, software folks build new and every generation of software developers must relearn the same problems.
I work at $FANG; every one of our org's big projects goes off the rails at the end, and there's always a mad rush to push developers to solve all the failures of project management in their off hours before the arbitrary deadline arrives.
After every single project, the org comes together to do a retrospective and asks "What can devs do differently next time to keep this from happening again?" The people leading the project take no action items, management doesn't hold itself accountable at all, and neither does product for late-changing requirements. And so, the cycle repeats next time.
I led an effort one time, after a big bug made it to production after one of those crunches, that painted the picture of the root cause: a huge, complicated project handed off to offshore junior devs with no supervision, and the junior devs managing it being completely switched out twice in the 8-month project with no handover, nor any introspection by leadership. My manager's manager killed the document and wouldn't allow publication until I removed any action items that would constrain management.
And thus, the cycle continues to repeat, balanced on the backs of developers.
Of course the reason it works this way is that it works. As much as we'd like accountability to happen on the basis of principle, it actually happens on the basis of practicality. Either the engineers organize their power and demand a new relationship with management, or projects start going so poorly that necessity demands a better working relationship, or nothing changes. There is no 'things get better out from wisdom alone' option; the people who benefit from improvements have to force the hand of the people who can implement them. I don't know if this looks like a union or something else but my guess is that in large part it's something else, for instance a sophisticated attempt at building a professional organization that can spread simple standards which organizations can clearly measure themselves against.
I think the reasons this hasn't happened are (a) tech has moved too fast for anyone to credibly say how things should be done for longer than a year or two, and (b) attempts at professional organizations borrowed too much from slower-moving physical engineering and so didn't adapt to (a). But I do think it can be done and would benefit the industry greatly (at the cost of slowing things down in the short term). It requires a very 'agile' sense of standards, though: if standards mean imposing big constraints on development, nobody will pay attention to them.
76 replies →
For one project I got so far as to include in the project proposal some outcomes that showed whether or not it was a success: quote from the PM “if it doesn’t do that then we should not have bothered building this”. They objected to even including something so obviously required in the plan.
Waste of my bloody time. Project completed, taking twice as many devs for twice as long, great success, PM promoted. Doesn’t do that basic thing that was the entire point of it. Nobody has ever cared.
Edit to explain why I care: there was a very nice third party utility/helper for our users. We built our own version because “only we can do amazing direct integration with the actual service, which will make it far more useful”. Now we have to support our worse in-house tool, but we never did any amazing direct integration and I guarantee we never will.
Glad to hear that $FANG has similar incompetency as every other mid-tier software shop I've ever worked in. Your project progression sounds like any of them. Here I was thinking that $FANG's highly-paid developers and project management processes were actually better than average.
3 replies →
Reminds me of the military. Senior leaders often have no real idea of what is happening on the ground because the information funneled upward doesn't fit into painting a rosy report. The middle officer ranks don't want to know the truth because it impacts their careers. How can executives even hope to lead their organizations this way?
9 replies →
For how much power they have over team organization and processes, software middle management has nearly no accountability for outcomes.
27 replies →
^ This. Not at FAANG, but I am too familiar with this.
This is why software projects fail. We lowly developers always take the blame and management skates. The lack of accountability among decision makers is why things like the UK Post Office scandals happen.
Heads need to be put on pikes. Start with John Roberts, Adam Crozier, Moya Greene, and Paula Vennells.
So much of the world, especially the world we see today around corporate leadership and national politics makes much more sense once you realize this fundamental law:
People who desire infinite power only want it because it gives them the power to avoid consequences, not because they want both the power and the consequences.
The people who believe that with great power comes great consequences are exactly the people who don't want great power because they don't want the weight of those consequences. The only people who see that bargain and think "sign me up!" are the ones who intend to drop the consequences on the floor.
I was a developer for a bioinformatics software startup in which the very essential 'data import' workflow wasn't defined until the release was in the 'testing' phase.
> wouldn't allow publication until I removed any action items that would constrain management.
That's what we call blameless culture lol
“I love deadlines. I love the whooshing noise they make as they go by.” ― Douglas Adams
Did they go off the rails at the end, or did deadlines force acknowledging that the project is not where folks want it to be?
That said, I think I would agree with your main concern there. If the question is "why did the devs make it so that project management didn't work?", it seems silly not to acknowledge why/how project management should have seen the evidence earlier.
Where I now work, in the government, all the devs are required to be part project managers. It’s a huge breath of fresh air. The devs are in all the customer meetings, assist in requirements gathering, and directly coach the customers as necessary to keep pushing the work towards completion.
This came about because our work isn’t too diverse but the requirements are wildly diverse and many of the customers have no idea how to achieve the proper level of readiness. I do management in an enterprise API project for a large organization.
Both happy and sad to know that the sh*t show is pretty much the same in FAANG as any regular corporate environment.
There are many pressures and this is all about a lack of transparent honesty about what the real priorities are. Getting the project done properly may be #1 priority but there's priority 0 and 0.1 and others which are unspoken because they don't sound good.
Obviously you work at AMZN. This is the most Amazonish HN comment I’ve ever seen.
1 reply →
I've also considered a side-effect of this. Each generation of software engineers learns to operate on top of the stack of tech that came before them. This becomes their new operating floor. The generations before, when faced with a problem, would have generally achieved a solution "lower" down in the stack (or at their present baseline). But the generations today and in the future will seek to solve the problems they face on top of that base floor because they simply don't understand it.
This leads to higher and higher towers of abstraction that eat up resources while providing little more functionality than if it was solved lower down. This has been further enabled by a long history of rapidly increasing compute capability and vastly increasing memory and storage sizes. Because they are only interacting with these older parts of their systems at the interface level they often don't know that problems were solved years prior, or are capable of being solved efficiently.
I'm starting to see ideas that will probably form into entire pieces of software "written" on top of AI models as the new floor, where the model basically handles all of the mainline computation, control flow, and business logic. What would have required a dozen MHz and 4MB of RAM to run now requires TFLOPS and gigabytes -- and, being built from a fresh start again, it will fail to learn from any of the lessons learned when it was done 30 years ago and 30 layers down.
Yeah, people tend to add rather than improve. It's possible to add into lower levels without breaking things, but it's hard. Growing up as a programmer, I was taught the UNIX philosophy as a golden rule, but there are sharp corners on this one:
To do a new job, build afresh rather than complicate old programs by adding new "features".
2 replies →
It's the "Lava Flow" antipattern [1][2] identified by the Gang of Five [3], "characterized by the lava-like 'flows' of previous developmental versions strewn about the code landscape, but now hardened into a basalt-like, immovable, generally useless mass of code which no one can remember much if anything about.... these flows are often so complicated looking and spaghetti-like that they seem important but no one can really explain what they do or why they exist."
[1] http://antipatterns.com/lavaflow.htm
[2] https://en.wikipedia.org/wiki/Lava_flow_(programming)
[3] http://antipatterns.com/
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not
I've been managing, designing, building and implementing ERP type software for a long time and in my opinion the issue is typically not the software or tools.
The primary issue I see is lack of qualified people managing large/complex projects because it's a rare skill. To be successful requires lots of experience and the right personality (i.e. low ego, not a person that just enjoys being in charge but rather a problem solver that is constantly seeking a better understanding).
People without the proper experience won't see the landscape in front of them. They will see a nice little walking trail over some hilly terrain that extends for about a few miles.
In reality, it's more like the Fellowship of the Rings trying to make it to Mt Doom, but that realization happens slowly.
> In reality, it's more like the Fellowship of the Rings trying to make it to Mt Doom, but that realization happens slowly.
And boy do the people making the decisions NOT want to hear that. You'll be dismissed as a naysayer being overly conservative. If you're in a position where your words have credibility in the org, then you'll constantly be asked "what can we do to make this NOT a quest to the top of Mt Doom?" when the answer is almost always "very little".
5 replies →
I think part of it is that reading code isn't a skill that most people are taught.
When I was in grad school ages ago, my advisor told me to spend a week reading the source code of the system we were working with (TinyOS), and come back to him when I thought I understood enough to make changes and improvements. I also had a copy of the Linux Core Kernel with Commentary that I perused from time to time.
Being able to dive into an unknown codebase and make sense of where the pieces are put together is a very useful skill that too many people just don't have.
Being good at reading code isn't a skill that helps large software projects stay on rails.
It's more about being good at juggling 1000 balls at the same time. It's 99.9% of the time a management problem, not a software problem.
1 reply →
Reading (someone else's) code is a whole lot harder than writing it. Which is unfortunate because I do an awful lot of it at work.
I'm curious, what does "read code" mean to you? What does that skill look like and how is it taught?
3 replies →
Oh TinyOS! Did my thesis on that!
This is one part of the issue. The other major piece, which I've seen over more than two decades in industry, is that most large projects are started by and run by (but not necessarily the same person) non-technical people who are exercising political power, rather than by technical people who can achieve the desired outcomes. When you put the nexus of power in the hands of non-technical people in a technical endeavor, you end up with outcomes that don't match expectations. Larger-scale projects suffer deeply from "not knowing what we don't know" at the top.
If this were true all of the time then the fix would be simple - only have technical people in charge. My experience has shown that this (only technical people in charge) doesn't solve the problem.
6 replies →
Sometimes giving people what they want can be bad for them; management wants cheap compliant workers, management gets cheap compliant workers, and then the projects fall apart in easily predictable and preventable ways.
Because such failures are so common, management typically isn't punished for them, so it's hard to keep interests aligned. And because many projects are run on a cost-plus basis, there can be a perverse incentive to do a bad job, or at least to avoid doing a good one.
I'm not entirely sure what you mean with "technical people" but it seems that you may not appreciate the problems that "non-technical people" try to tackle.
Do your two decades of experience cover both sides?
2 replies →
I have a theory that the churn in technology is by design. If a new paradigm, new language, new framework comes out every so many years, it allows the tech sector to always want to hire new graduates for lower salaries. It gives a thin veneer of we want to always hire the person who has X when really they just do not want to hire someone with 10 years of experience in tech but who may not have picked up X yet.
I do not think it is the only reason. The world is complex, but I do think it factors into why software is not treated like other engineering fields.
Constantly rewriting the same stuff in endless cycles of new frameworks and languages gives an artificial sense of productivity and justifies its own existence.
If we took the same approach to other engineering, we'd be constantly tearing down houses and rebuilding them just because we have better nails now. It sure would keep a lot of builders employed though.
7 replies →
The problem with that is that it would require a huge amount of coordination for it to be by design. I think it's better to look on it as systemic. Which isn't to say there aren't malign forces contributing.
3 replies →
There are rational explanations for this. When software fails catastrophically, people almost never die (considering how much software crashes every day). When hardware fails catastrophically, people tend to die, or lose a lot of money.
There's also the complexity gap. I don't think giving someone access to the Internet Explorer codebase is necessarily going to help them build a better browser. With millions of moving parts it's impossible to tell what is essential, superfluous, high quality, low quality. Fully understanding that prior art would be a years-long endeavor, with many insights no doubt, but of dubious value.
I would boil this down to something else, but possibly related: project requirements are hard. That's it.
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not. People do not regularly pull apart old systems for learning.
For most IT projects, software folks generally can NOT "pull apart" old systems, even if they wanted to.
> Typically, software folks build new and every generation of software developers must relearn the same problems.
Project management has gotten way better today than it was 20 years ago, so some learnings have definitely been passed on.
A CIO once told me that with Agile we didn't need requirements. He thought my suggestion to document the current system before modifying it was a complete waste of time. Instead he made all the developers go through a customer service workshop on how to handle and communicate with customers. Cough cough… most developers do not talk with customers. Where we worked, developers took orders from product and project people whose titles changed every year but who operated with the mindset of a drill sergeant. My way or the highway.
2 replies →
"While hardware folks study and learn from the successes and failures of past hardware, software folks do not." Couldn't be further from the truth. Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult (see microservices for example).
Once you've worked in both hardware and software engineering, you quickly realize that they are only superficially similar. Software is fundamentally philosophy, not physics.
Hardware is constrained by real world limitations. Software isn't except in the most extreme cases. Result is that there is not a 'right' way to do any one thing that everyone can converge on. The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.
Software doesn't operate in some magical realm outside of the physical world. It very much is constrained by real world limitations. It runs on the hardware that itself is limited. I wonder if some failures are a result of thinking it doesn't have these limitations?
4 replies →
> Software folks are obsessed with copying what has been shown to work to the point that any advance quickly becomes a cargo cult
Seems more accurate to say they are obsessed with copying "what sounds good". Software industry doesn't seem to copy what works, rather what sounds like it'd work, or what sounds cool.
If they copied what works software would just be faster by default, because very often big established tools are replaced by something that offers similar featurage, but offers it at a higher FPS.
I disagree. At least at the RTL level they're very similar. You don't really deal with physics there, except for timing (which is fairly analogous with software performance things like hard real-time constraints).
> Result is that there is not a 'right' way to do any one thing that everyone can converge on.
Are you trying to say there is in hardware? That must be why we have exactly one branch predictor design, lol
> The first airplane wing looks a whole lot like a wing made today, not because the people that designed it are "real engineers" or any such BS, but because that's what nature allows you to do.
"The first function call looks a whole lot like a function call today..."
4 replies →
What you and the GP said are not mutually exclusive. Software engineers are quick to drink every new Kool-Aid out there, which is exactly why we’re so damned blind to history and lessons learned before.
In my experience, a lot of the time the people who COULD be solving these issues are people who used to code or never have. The actual engineers who might do something like this aren't given authority or scope and you have MBAs or scrum masters in the way of actually solving problems.
I think this is too simple. First of all, hardware people have a high incentive to fully replace components and systems, for many reasons. Replacement is also the only way they can fix major design mistakes. Software people constantly fix bugs and design mistakes. There is certainly no strong culture of documenting or digging up former mistakes, but it's not like they don't learn from them; it's just a constant process. In contrast to hardware, there is usually no point in time to retrospect.
The incentives to rejuvenate systems are low, and doing so often seems expensive. Software engineers' own motivation is often ill-founded: new devs feel uncomfortable with the existing system and call for something "modern". But if the time comes to replace the "legacy" system, then you are right: no one looks back at the former mistakes, and the devs that know them are probably long gone. The question is whether we should ever replace a software system, or focus more on gradual and active modernization. But the latter can be very hard: in hardware everything is defined, most of the time backed by standards; in software we usually don't have that, so complex interconnected systems rarely have sane upgrade paths.
Agree 100%.
I know a lot of people on here will disagree with me saying this, but this is exactly how you get an ecosystem like JavaScript's being as fragmented, insecure, and "trend prone" as the old-school WordPress days. It's the same problems over and over, and every new "generation" of programmers has to relearn the lessons of old.
The difficulty lies in the fact that with most software it is quite cheap to generate very complex designs, compared to hardware. For software that is treated similarly to hardware (such as in medical devices or at NASA), you do gain back those benefits, at great expense.
Most of the time, there's no need to study anything. Any experienced software engineer can tell you about a project they worked on with no real requirements, management constantly changing their mind, etc.
How do you study software history? Most of the lessons seem forever locked away behind corporate walls - any honest assessments made public will either end careers or start lawsuits
IME, "Why systems fail" almost always boils down to a principal-agent problem. This is another way of expressing the Mungerism "show me the incentive, I'll show you the outcome".
Systems that "work" tend to have some way of correcting for or mitigating the principal agent problem by aligning incentives.
I'd also point out that hardware is a much older discipline, in terms of how long it's been operating at scale. It's had more time to formalize and crystallize. Intel is 56 years old. Google is 27.
Some consequences of NOT learning from prior successes and failures: (a) no more training for the next generation of developers/engineers; (b) fighting for the best developers, which manifests in leetcode grinding; (c) a decrease in cooperation among teammates; etc.
This is an interesting distinction, but it ignores the reasons software engineers do that.
First, hardware engineers are dealing with the same laws of physics every time. Materials have known properties etc.
Software: there are few laws of physics (mostly performance and asymptotic complexity). Most software isn't anywhere near those boundaries, so you get to pretend they don't exist. If you get to invent your own physics each time, yeah, the process is going to look very different.
For most generations of hardware, you’re correct, but not all. For example, high-k was invented to mitigate tunneling. Sometimes, as geometries shrink, the physics involved does change.
I think there is a ton more nuance, but can still be explained by a simple observation, which TFA hints at: "It's the economics, stupid."
Engineering is the intersection of applied sciences, economics and business. The economics aspect is almost never recognized and explains many things. Projects of other disciplines have significantly higher costs and risks, which is why they require a lot more rigor. Taking hardware as example, one bad design decision can sink the entire company.
On the other hand, software has economics that span a much more diverse range than any other field. Consider:
- The capital costs are extremely low.
- Development can be extremely fast at the task level.
- Software, once produced, can be scaled almost limitlessly for very cheap almost instantly.
- The technology moves extremely fast. Most other engineering disciplines have not fundamentally changed in decades.
- The technology is infinitely flexible. Software for one thing can very easily be extended for an adjacent business need.
- The risks are often very low, but can be very high at the upper end. The rigor applied scales accordingly. Your LoB CRUD app going down might bother a handful of people, so who cares about tests? But your flight control software better be (and is) tested to hell and back.
- Projects vary drastically in stacks, scopes and risk profiles, but the talent pool is more or less common. This makes engineering culture absolutely critical because hiring is such a crapshoot.
- Extreme flexibility also masks the fact that complexity compounds very quickly. Abstractions enable elegant higher-level designs, but they mask internal details that almost always leak and introduce minor issues that cause compounding complexity.
- The business rules that software automates are extremely messy to begin with (80K payroll rules!) However, the combination of a) flexibility, b) speed, and c) scalability engender a false sense of confidence. Often no attempt is made at all to simplify business requirements, which is probably where the biggest wins hide. This is also what enables requirements to shift all the time, a prime cause for failures.
Worse, technical and business complexity can compound. E.g. it's very easy to think "80K payroll rules linearly means O(80K) software modules" and not "wait, maybe those 80K payroll rules interact with each other, so it's probably super-linear growth in complexity." Your architecture is then oriented towards the simplistic view, and needs hacks when business reality inevitably hits, which then start compounding complexity in the codebase.
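A back-of-envelope sketch of that super-linearity (the 1% interaction rate is a number I made up purely for scale):

    n = 80_000                       # payroll rules
    pairs = n * (n - 1) // 2         # potential pairwise interactions
    print(f"{pairs:,}")              # 3,199,960,000

    # Even if only 1% of pairs genuinely interact, that's ~32 million
    # interactions to understand and test -- 400x the rule count itself.
    print(f"{int(pairs * 0.01):,}")  # 31,999,600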
And of course, if that's a contract up for bidding, your bid is going to be unsustainably low, which will be further depressed by the competitive bidding process.
If the true costs of a project -- which include human costs to the end users -- are not correctly evaluated, the design and rigor applied will be correspondingly out of whack.
As such I think most failures, in addition to regular old human issues like corruption, can be attributed to an insufficient appreciation of the economics involved, driven primarily by overindexing on the powers of software without an appreciation of the pitfalls.
As someone who's learning programming right now, do you have any suggestions on how one would go about finding and studying these successes and failures?
First, failures aren’t always obvious, and second, studying them isn’t either. This would likely need to be a formalized course. Still…
If people want to know why Microsoft hated DOS and wanted to kill it with Xenix, then OS/2, then Windows, and then NT, it is vital to know that DOS only came about as a result of IBM wanting a 16-bit source-compatible CP/M, which didn't yet exist. Then you would likely want to read Dissecting DOS to see what limits were imposed by DOS.
For other stuff, you would start backwards. Take the finished product and ask what the requirements were, then ask what the pain points are, then start digging through the source and flowcharting/mapping it. This part is a must because programs are often too difficult to really grok without some kind of map/chart.
There is likely an entire discipline to be created in this…
One place to look is https://thedailywtf.com/
The things people are talking about in this thread are less to do with the practice of programming, and more to do with the difficulties of managing (and being managed, within) an engineering organization.
You'll learn all of this for yourself, on the job, just via experience.
To be cynical, what's the point? You'll get employed and forced to be a part of them by circumstances.
Your company's root priorities are probably at odds with writing good software.
One Japanese company, not going to name names, kept trying to treat software as a depreciating asset. I didn't really understand it well, but the long and short of it was that fixing things that were supposed to be "done" was bad for the accounting. New things, however, were good.
How can you run a software company like that? But they did, and got the kind of outcome you'd expect. Japan made the laws this way and gets software to match.
Indeed.
That's why every now and then we see "new" programming paradigms that had once been declared obsolete.
I think this is a downstream effect of there being no real regulation or professional designations in software. Every company and team is wildly different, which leads to no standards, which leaves no time for anything but crunching: since there are no barriers protecting your time, nobody spends time doing much besides shipping constantly.
I’ve read one tech history book and I really enjoyed it. any you recommend?
Hardcore Software, Fire in the Valley, Life Under the Sun… there are many.
I was so annoyed when I found out about the OTP library and realized we've been reinventing things for 20+ years.
Software just feels so much more ephemeral than hardware. I haven't yet met a single 'old software enthusiast' in my life, yet there are so many enthusiasts for older hardware.
I am both a hardware and software enthusiast. Tons of DOS, Windows, and OS/2 software hanging around. While I don't use them every day, I do use them. From pre-Microsoft Visio to WordStar and MS Works for DOS, the applications are simple, powerful, and pleasing to use. While I don't recommend anyone pull out a Zenith 8-bit and fire up COBOL-80 or LISP-80, they are interesting. Testing yourself in 64K is quite a challenge.
The retro community is huge and varied. If it exists, someone is really into it.
I have a pet passion for an old simulation language called Dynamo. I think you will find people passionate about LISP and people that care about COBOL, and C is already multiple decades old.
Doesn't retro gaming count?
Yes, and it's because there aren't very many textbook ways to do software engineering, because it's evolving too fast to reach consensus.
... are you saying that hardware projects fail less than software ones? Just building a bridge is something that fails with regularity all over the world. Every chip comes with a list of errata longer than my arm.
Software folks treat their output as if it's their baby or their art project.
Hardware folks just follow best practices and physics.
They're different problem spaces though, and having done both I think HW is much simpler and easier to get right. SW is often similar if you're working on a driver or some low-level piece of code. I tried to stay in systems software throughout my career for this reason. I like doing things 'right' and don't have much need to prove to anyone how clever I am.
I've met many SW folks who insist on thinking of themselves as rock stars. I don't think I've ever met a HW engineer with that attitude.
Because the software market is bigger and more competitive; hardware is mature.
What are the silver bullets... I mean, best practices that keep getting ignored?
Having consulted on government projects - especially huge projects spanning dozens of government departments, what I have learnt is that the project is doomed right from the start. The specifications are written in such a way that it is impossible to get a working software which can address all of the millions (yes, literally) of specifications.
For instance, I had the opportunity to review an RFP put out by a state government for software to run the entire state government. The specifications stated that a SINGLE software system should be used for running the full administration of all of the departments of the government - including completely disparate things such as HR, CCTV management, AI-enabled monitoring of rodents and other animals near/at warehouses, all healthcare facilities, recruitment, emergency response services, etc...
ONE SOFTWARE for ALL of these!
There isn't a single company in the world that can write software to monitor rodents, book hospital appointments, run general payroll, etc. And since the integration required was so deep, it would be impossible to use existing best-of-breed software... everything would have to be written from scratch.
How is such a software project ever going to succeed?
I just fed the above into Claude Code and it one-shotted this in 5 minutes. Already doing $3B ARR after lunch.
Of course! This is actually very straightforward and easy, what you need is just:
- One MongoDB collection (`government_stuff`) to store employees, rodents, cardiac arrest surgeries and other items as JSON
- Core `/handler` API that forwards all requests to ChatGPT, figuring out if you're tracking a rodent or processing payroll
- AI Vision to analyze CCTV feeds for rodents, monitors employee productivity and verify hospital equipment status
- Blockchain layer for transparency (every rodent sighting minted as NFT).
Estimated timeline: 2 weeks, 1 junior developer. Cost: ~$10k including token credits. Should I start implementing the main.js?
1 reply →
This touches on the absolutely vital issue of domain knowledge. Everybody understands that you're not supposed to have the same people handle sewer maintenance and preschool teaching because these are two entirely separate skillsets. To an extent you can also treat kindergartens and treatment plants as black boxes that consume money and produce desired services.
For people who don't know much about programs it's sort of natural to assume that software engineering works the same way. Put in money and specs, get back programs. But of course it doesn't work like that, because software dev is not a single skillset. To write useful programs, you have to know how to code and understand the environment in which the program will be used.
But can this software monitor patients via CCTV and see if any of them are about to faint and call ER proactively for them? No? then your bid for the project will be discarded! :)
What about the CCTV monitoring software needing to verify if there are women in a particular room and trigger an alarm when too many men enter the area - I am not kidding, but this was really in the spec!
To be fair, that’s a rare exception. Most government tenders are quite narrow in scope.
What I have found is that they’re written by people with zero knowledge of either the solution requirements or the technology! Combine that with zero profit motive and zero personal consequences, and you can end up with total nonsense even on projects with billion dollar budgets.
A state school department here put out a tender for wiring over two thousand schools with fibre, but the way the contract was stipulated, only a single applicant could win it and would have to handle every single location across a thousand miles of territory. Hence, only the largest incumbent telco could possibly win… which they did… at 15x the cost of a bunch of local contractors doing the work. This cost something like a billion dollars to taxpayers.
The excuse of the guy writing the tender was “it’s easier for me to get one contract signed than fifty.”
He’s a public servant getting paid $50K. He’s got nothing else on, no other pressing needs or distractions, but he’s too busy, you see? So much easier to waste a billion dollars to save himself a few months of effort.
On some of the infamous large public IT project failures, you just have to look at who gets the contract, how they work, and what their incentives are. (For example, don't hire management consulting partner smooth talkers, and their fleet of low-skilled seat-warmers, to do performative hours billing.)
It's also hard when the team actually cares, but there are skills you can learn. Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).
But once you're a bit comfortable with the art and science of those, the big new challenges are more about political and environmental reality. It comes down to the alignment and competence of: workers, internal team leadership, partners/vendors, customers, and investors/execs.
Discussing this is a little awkward, but maybe start with alignment, since most of the competence challenges are rooted in mis-alignments: never developing nor selecting for the skills that alignment would require.
Right, it's largely politically and ego driven; a people not a software problem.
Large-scale software is always a people problem. The hard part in software is communication, not typing the code.
> Early in my career, I got into solving some of the barriers to software project management (e.g., requirements analysis and otherwise understanding needs, sustainable architecture, work breakdown, estimation, general coordination, implementation technology).
Was there any literature or other findings that you came across that ended up clicking and working for you that you can recommend to us?
I could blather for hours around this space. A few random highlights:
* The very first thing I read about requirements was Weinberg, and it's still worth reading. (Even if you are a contracting house, with a hopeless client, and you want to go full reactive scrum participatory design, to unblock you for sprints with big blocks of billable hours, not caring how much unnecessary work you do... at least you will know what you're not doing.)
* When interviewing people about business or technical, learn to use a Data Flow Diagram. You can make it accessible to almost everyone, as you talk through it, and answer all sorts of questions, at a variety of levels. There are a bunch of other system modeling tools you can use, at times, but do not underestimate the usefulness and accessibility of a good DFD.
* If you can (or have to) plan at all, find and learn to use a serious Gantt-chart-centric planning tool (work breakdown, dependencies, resource allocations, milestones), and keep it up to date (which probably includes having it linked with whatever task-tracking tool you use, but you'll usually also be changing it for bigger-picture reasons too). Even if you are a hardware company, with some hard external-dependency milestones, you will be changing things around those unmoveables. And have everyone work from the same source of truth (everyone can see the same Gantt chart and the task list).
* Also learn some kind of Kanban-ish board for tasking, and have it be an alternative view on the same data that's behind the Gantt view and the tasks/issues that people can/should/are working on at the moment, and anything immediately getting blocked.
* In a rare disruptive startup emergency, know when to put aside Gantt, and fall back to an ad hoc text file or spreadsheet of chaos-handling prioritization that's changing multiple times per day. (But don't say that your startup is always in emergency mode and you can never plan anything, because usually there is time for a Kanban board, and usually you should all share an understanding of how those tasks fit into a larger plan, and trace back to your goals, even if it's exploratory or reactive.)
* Culture of communicating and documenting, in low-friction, high-value, accessible ways. Respect it as team-oriented and professional.
* Avoid routine meetings; make it easy to get timely answers and discussion, as soon as possible. This includes reconsidering how accessible upper leadership should be: can you get closer to being responsive to the needs of the work on the project (e.g., if anyone needs a decision from the director/VP/etc., then quickly prep and ask, maybe with an async message, but don't wait for weekly status meeting or to schedule time on their calendar).
* Avoid unnecessary process. Avoid performances.
* People need blocks of time when they can get flow. Sometimes for plowing through a big chunk of stuff that only requires basic competence, and sometimes when harder thinking is required.
* Be very careful with individual performance metrics. Ideally you can incentivize everyone to be aligned towards team success, through monetary incentives (e.g., real equity whose value they can actually affect) and through culture (everyone around you seems to work as a team, and you like that, and that inspires you). I would even start by asking whether we can compensate everyone equally, shun titles, etc., and how close we can practically get to that.
* Be honest about resume-driven development. It doesn't have to be a secret misalignment. Don't let it be motivated solely as a secret goal of job-hoppers that is then lied about, or it will probably be to the detriment of your company (and also, that person will job-hop, fleeing the mess they made). If you're going to use a new resume-keyword framework for a project, the whole team should be honest that, say, there are elements of wanting to potentially get some win from it, wanting to trial it for possible greater use and build up organizational expertise in it, and also that it's a very conscious and honest perk for the workers who get to use the new toy.
* Infosec is an unholy dumpster fire, throughout almost the entire field. Decide if you want to do better, and if so, then back it up with real changes, not CYA theatre and what someone is trying to sell you.
* LeetCode frat pledging interviews select for so much misaligned thinking, and also signals that you are probably just more of the same as the low bar of our field, and people shouldn't take you seriously when you try to tell them you want to do things better.
* Nothing will work well if people aren't aligned and honest.
5 replies →
Most of the examples here are big government IT projects. But it's unfair to single out software projects here. There are a lot of big government projects that fail or face long and expensive delays. A lot of public sector spending is like that. In fact, you'd be hard pressed to find examples where everything worked on time and on budget.
Mostly the issues are non technical and grounded in a lack of accountability and being too big to fail. A lot of these failures are failing top down. Unrealistic expectations, hand wavy leadership, and then that gets translated into action. Once these big projects get going and are burning big budgets and it's obvious that they aren't working, people get very creative at finding ways to tap into these budgets.
Here in Germany, the airport in Berlin opened only a few years ago, after being stuck in limbo for a decade; the opening was cancelled only two weeks before it was supposed to happen. It was hilarious: they had signs all over town announcing how they were going to shut down the highway so the interior of the old airport could be transported to the new one. I kid you not. They were going to move all the check-in counters and other stuff over, bang on it for a day or two, and then open the airport. Politicians, project leadership, etc. kept insisting it was all fine right up until the moment they could not possibly ignore the fact that there was lots wrong with the airport and that it wasn't going to open. It then took a decade to fix all that. There's a railway station in Stuttgart that is at this point very late in opening. Nuclear plant projects tend to be very late and over budget too.
Government IT projects aren't that different from these. It's a very similar dynamic: decision making is highly political, there's a lack of accountability, lots of top-down pretending it's all going to be fine, big budgets and companies looking to tap into those, and a lot of wishful thinking. These are all common ingredients in big project failures.
The software methodology is the least of the challenges these projects face.
It is not just government. Private companies also have the same problem.
One reason why AWS got so big is that it took months to get internal infrastructure teams to provision a virtual machine.
There's an obvious inherent selection bias with government projects because they're by nature subject to public scrutiny, plus the stakeholders are literally everyone. Private companies can waste billions internally and it'll never make it into the news.
In my first big job in a big legacy company, 30% of ongoing effort was "how to implement this feature which needs a database without a database".
We also paid some security company to sit as a proxy in front of our server and implement some server redirects, because it was simpler than configuring our own servers. Simple one-liner conf changes were a week of emails with support staff.
Something left out there with government, though, is misaligned incentives and hence corruption, which is smaller at a private scale (but exists nonetheless).
I think if we look at the lack of accountability, it's obvious that one major problem is that many of these projects rely heavily on contract work. No company or government in the world can supply the perfect brain- and manpower necessary on day one (on a huge and complex project that requires expert knowledge). So there is a prevalent delusion that talent just spawns at project kickoff and that those people even care about what they do.
Maybe this is some artifact we carried over from the industrial era. We expect that complex machinery is built by expert companies overnight and just works, with a bit of maintenance and knowledge transfer. But software doesn't work like that.
Fundamentally this is not a statement about programming or software. It is a statement that management at almost all companies is abysmally inept and is hardly ever held to account.
Most sizeable software projects require understanding, in detail, what is needed by the business, what is essential and what is not, and whether any of that is changing over the lifetime of the project. I don't think I've ever been on a project where any of that was known; it was all guesswork.
Management is always a huge problem, but software engineers left to their own devices can be just as bad.
I very rarely hear actual technical reasons for why a decision was made. They're almost always invented after the fact to retroactively justify some tool or design pattern the developer wanted to use. Capabilities and features get tacked on just because it's something someone wanted to do, not because they solve an actual problem or can be traced back to requirements in any meaningful way.
Frankly as an industry we could learn a lot from other engineering fields, aerospace and electrical engineering in particular. They aren't perfect, but in general they're much better at keeping technical decisions tied to requirements. Their processes tend to be too slow for our industry of course, but that doesn't mean there aren't lessons to be learned.
Post-fact justification seems to be a 'feature' of most people's cognitive function, according to the latest research.
"The mind is just a bullshit maker".
Exactly this. It's not just large software projects that tend to fail often; large architectural and infrastructure projects do too. There are loads of examples; one famous one is the Berlin Airport.
Management is bad at managing large projects. Whatever those projects are. In particular when third parties are involved that have a financial interest.
This is precisely the point of the article. I mean, it's right there at the top in that weird arrow-shaped infographic. It's _almost_ always a management issue.
Software projects fail because humans fail. Humans are the drivers of everything in our world. All government, business, culture, etc... it's all just humans. You can have a perfect "process" or "tool" to do a thing, but if the human using it sucks, the result will suck. This means that the people involved are what determines if the thing will succeed or fail. So you have to have the best people, with the best motivations, to have a chance for success.
The only thing that seems to change this is consequences. Take a random person and just ask them to do something, and whether they do it or not is just based on what they personally want. But when there's a law that tells them to do it, and enforcement of consequences if they don't, suddenly that random person is doing what they're supposed to. A motivation to do the right thing. It's still not a guarantee, but more often than not they'll work to avoid the consequences.
Therefore if you want software projects to stop failing, create laws that enforce doing the things in the project that ensure it succeeds. Create consequences big enough that people will actually do what's necessary. Like a law that says how to build a thing to ensure it works, and how to test it, and then an independent inspection to ensure it was done right. Do that throughout the process, and impose some kind of consequence if those things aren't done. (The more responsibility, the bigger the consequence, so the motivation is commensurate with impact.)
That's how we manage other large-scale physical projects. Of course those aren't guaranteed to work; large-scale public works projects often go over-budget and over-time. But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process to encourage humans to do the right thing.
> Software projects fail because humans fail. Humans are the drivers of everything in our world.
Ah finally - I've had to scroll halfway down to find a key reason big software projects fail.
<rant>
I started programming in 1990 with PL/1 on IBM mainframes and for 35 years have dipped in and out of the software world. Every project I've seen fail was mainly down to people - egos, clashes, laziness, disinterest, inability to interact with end users, rudeness, lack of motivation, toxic team culture etc etc. It was rarely (never?) a major technical hurdle that scuppered a project. It was people and personalities, clashes and confusion.
</rant>
Of course the converse is also true - big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect. Of course, most of these projects were bland corporate business data ones... so not technically very challenging. But still big enough software projects.
Geez... I don't know why I'm getting so emotional (!) But the hard-core software engineering world is all about people at the end of the day.
> big software projects I've seen succeed were down to a few inspired leaders and/or engineers who set the tone. People with emotional intelligence, tact, clear vision, ability to really gather requirements and work with the end users. Leaders who treated their staff with dignity and respect.
I completely agree. I would just like to add that this only works where the inspired leaders are properly incentivized!
> But I think those have the same flaw, in that there isn't enough of a consequence for each part of the process
If there was sufficient consequence for this stuff, no one would ever take on any risk. No large works would ever even be started because it would be either impossible or incredibly difficult to be completely sure everything will go to plan.
So instead we take a medium amount of caution and take on projects knowing it's possible for them to not work out or to go over budget.
If software engineers want to be referred to as "engineers" then they should actually learn about engineering failures. The industry and educational pipeline (formal and informal) as a whole is far more invested in butterfly chasing. It's immature in the sense that many people with decades of experience are unwilling to adopt many proven practices in large scale engineering projects because they "get in the way" and because they hold them accountable.
Surely you mean managers, right? Most developers I interact with would love to do things the right way, but there's just no time, we have to chase this week's priority!
> While hardware folks study and learn from the successes and failures of past hardware, software folks do not.
I guess that’s the real problem I have with SV’s endemic ageism.
I was personally offended when I encountered it myself, but that's long past.
I just find it offensive, that experience is ignored, or even shunned.
I started in hardware, and we all had a reverence for our legacy. It did not prevent us from pursuing new/shiny, but we never ignored the lessons of the past.
Why do you find it offensive? It’s not personal. Someone who thought webvan was a great lesson in hubris could not have built an Instacart, right? Even evolution shuns experience, all but throwing most of it out each generation, with a scant few species as exceptions.
> Someone who thought webvan was a great lesson in hubris could not have built an Instacart, right?
Not at all. The mistake to learn from in Webvan's case was expanding too quickly and investing in expensive infrastructure all before achieving product-market fit. Not that they delivered groceries.
By the time you realise the error of your comment, you'll have reached the age where your opinion can be safely discarded.
I think you're mistaking the funding and starting of companies with the execution of their vision through software engineering -- the entire point of the article, and the OP.
This is a classic straw man argument, which depends on the assumption that all people of a certain age would think a certain way.
Also, your understanding of evolution is incorrect. All species on Earth are the results of an enormous amount of accumulated "experience", over periods of up to billions of years. Even the bacteria we have today took hundreds of millions of years to reach anything similar to their current form.
I have never seen an industry that works so hard to self-immolate.
Right now the industry is spending billions, if not trillions, of dollars to train AI on badly written open source code.
In universities we teach kids DSA, but never how to really think about scoping work, nor even the Unix principle of how software should compose, how to prevent errors, etc. Hell, how many working professionals know about the 10 NASA principles and actually use them in practice?
We encourage the bright kids to go work at the cathedrals of complexity, but never to seek simple solutions. And again, the merchants of complexity get paid more, find it easier to find jobs, etc.
The tragedy is that the failures are documented, but so are the fixes to those failures.
Soon enough we're going to raise a whole generation that doesn't know how to make reliable, robust software from scratch because of 'vibecoding'. Then, ultimately, civilization collapses.
> the 10 NASA principles
Could you be more specific?
I think GP means NASA/JPL's "Power of Ten" rules (Gerard Holzmann's rules for developing safety-critical code), but these are specifically for C development in mission-critical systems. There's been the odd software issue at NASA, but supposedly their 10 rules make C projects much safer.
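For the curious, here's a minimal sketch (mine, not NASA's code, and the function and constant names are invented) of what a few of those rules look like in practice, as I remember them: fixed upper bounds on loops, static allocation instead of malloc after initialization, at least two assertions per function, and callers checking return values.

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    #define MAX_READINGS 64   /* every loop gets a fixed upper bound */

    /* no dynamic memory allocation after initialization */
    static double readings[MAX_READINGS];

    /* Returns the number of values averaged, or -1 on bad input.
       Callers must check this return value. */
    static int average_readings(size_t count, double *out_avg)
    {
        /* healthy assertion density: at least two per function */
        assert(out_avg != NULL);
        assert(count > 0 && count <= MAX_READINGS);

        if (count == 0 || count > MAX_READINGS || out_avg == NULL)
            return -1;  /* defensive check even when assertions are disabled */

        double sum = 0.0;
        for (size_t i = 0; i < count; i++)  /* bounded, simple control flow */
            sum += readings[i];

        *out_avg = sum / (double)count;
        return (int)count;
    }

    int main(void)
    {
        readings[0] = 1.0;
        readings[1] = 3.0;
        double avg;
        if (average_readings(2, &avg) > 0)  /* checked return value */
            printf("avg = %.2f\n", avg);
        return 0;
    }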
So, I'm not a dev nor a project manager, but I found this article very enlightening. At the risk of asking a stupid question and getting an RTFM or a LMGTFY, can anyone provide simple and practical examples of software successes at a big scale? I work at a hospital, so healthcare specific would be ideal, but I'll take anything.
FWIW I have read The Phoenix Project and it did help me get a better understanding of "Agile" and the DevOps mindset but since it's not something I apply in my work routinely it's hard to keep it fresh.
My goal is to try to plant seeds of success in the small projects I work on and eventually ask questions to get people to think from a similar perspective.
Unix and Linux would be your quintessential examples.
Unix was an effort to take Multics, an operating system that had gotten too modular, and integrate the good parts into a more unified whole (book recommendation: https://www.amazon.com/UNIX-History-Memoir-Brian-Kernighan/d...).
Even though there were some benefits to the modularity of Multics (apparently you could unload and replace hardware in Multics servers without reboot, which was unheard of at the time), it was also its downfall. Multics was eventually deemed over-engineered and too difficult to work with. It couldn't evolve fast enough with the changing technological landscape. Bell Labs' conclusion after the project was shelved was that OSs were too costly and too difficult to design. They told engineers that no one should work on OSs.
Ken Thompson wanted a modern OS so he disregarded these instructions. He used some of the expertise he gained while working on Multics and wrote Unix for himself (in three weeks, in assembly). People started looking over Thompson's shoulder being like "Hey what OS are you using there, can I get a copy?" and the rest is history.
Brian Kernighan described Unix as "one of" whatever Multics was "multiple of". Linux eventually adopted a similar architecture.
More here: https://benoitessiambre.com/integration.html
Are you equating success with adoption or use? I would say there's lots of software that is widely used but is a mess.
What would be a competitor to Linux that is also FOSS? If there's none, how do you assess the success or otherwise of Linux?
Assume Linux did not succeed but was still adopted: what would that scenario look like? Is the current situation with it any different from that?
This is a noble and ambitious goal. I feel qualified to provide some pointers, not because I have been instrumental in delivering hugely successful projects, but because I have been involved, in various ways, in many, many failed projects. Take what you will from that :-)
- Define "success" early on. This usually doesn't mean meeting a deadline on time and budget. That is actually the start of the real goal. The real success should be determined months or years later, once the software and processes have been used in a production business environment.
- Pay attention to Conway's Law. Fight it at your peril.
- Beware of the risk of key people. This means if there is a single person who knows everything, you have a risk if they leave or get sick. Redundancy needs to be built into the team, not just the hardware/architecture.
- No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.
- Be prepared to say "no", a lot. (This is probably the most important one, and the hardest.)
- Define ownership early. If no one is clearly responsible for the key deliverables, you are doomed.
- Consider the human aspect as much as the technical. People don't like change. You will be introducing a lot of change. Balancing this needs to be built into the project at every stage.
- Plan for the worst, hope for the best. Don't assume things will work the way you want them to. Test _everything_, always.
[Edit. Adding some items.]
>No one cares about preventing fires from starting. They do care about fighting fires late in the project and looking like a hero. Sometimes you just need to let things burn.
As a Californian, I hate this mentality so much. Why can't we just have a smooth release with minimal drama because we planned well? Maybe we could properly fix some tech debt or even polish up some features if we're not spending the last 2 months crunching on some showstopper that was pointed out a year ago.
I find it kind of hard to define success or failure. Google search and Facebook are a success right? And they were able to scale up as needed, which can be hard. But the way they started is very different from a government agency or massive corporation trying to orchestrate it from scratch. I don't know if you'd be familiar with this, but maybe healthcare.gov is a good example... it was notoriously buggy, but after some time and a lot of intense pressure it was dealt with.
The untold story is of landing software projects at Google. Google has landed countless software projects internally in order for Google.com to continue working, and the story of those will never reach the light of day, except in back-room conversations never to be shared publicly. How did they go from internal platform product version one to version two? It's an amazing feat of engineering that can't be shown to the public, which is a loss for humanity, honestly, but capitalism isn't going to have it any other way.
I don't think you should focus on successful large projects. Generally you should consider that all big successes are outliers from a myriad of attempts. They have been lucky and you can't reproduce luck.
I'd like to try to correct your course a bit.
DevOps is a trash concept that had good intentions. But today it's just an industry cheat code to fill three dev positions with a single one that is on pager duty. The good takeaway from it: make people care that things work end to end. If Joe isn't caring about Bob's problems, something is off, either with the process or with the people.
Agile is a very loose term nowadays. Broadly speaking, it's the opposite of making big up-front plans and implementing them in one big swipe. Agile wants to start small and improve iteratively as needed. This tends to work in the industry, but the iterative time buckets have issues; some teams can move fast in 2-week cycles, others can't.

The original agile movement also wanted to give control and autonomy back to those who actually do stuff (devs and lower management). This is very well intended and highly functional, but is often buried or ignored in corporate environments. Autonomy is extremely valuable: it motivates people and fosters personal growth, while being backed by skilled peers also creates psychological safety.

One of the major complaints I hear about agile practices is that there are too many routines, meetings, and other in-person tasks with low value that keep you from working. This is really bad and in my perception was never intended, but companies love that shit. This part is about communication: make it easy for people to share and engage, while also keeping their focus hours high. Problems have to bubble up quickly, and everyone should be motivated and able to help solve them.

If you listen to agile hardliners, they will also tell you that software can't be reliably planned; you won't make deadlines, none of them, ever. That is very true, but companies are unable to deal with it.
India's UPI (digital payments) is almost as big a scale as it gets, and it's pretty universally considered a success: https://en.wikipedia.org/wiki/Unified_Payments_Interface
I heard Direct File was pretty successful. Something like 94% of users reported it as a positive experience.
If it makes anyone feel better, it's not just software:
https://en.wikipedia.org/wiki/Auburn_Dam
https://en.wikipedia.org/wiki/Columbia_River_Crossing
If you're 97% over budget, are you successful? https://en.wikipedia.org/wiki/Big_Dig
> If you're 97% over budget, are you successful?
I don't like this as a metric of success, because who came up with the budget in the first place?
If they did a good job and you're still 97% over then sure, not successful.
But if the initial budget was a dream with no basis in reality then 97% over budget may simply have been "the cost of doing business".
It's easier to say what the budget could be when you're doing something that has already been done a dozen times (as skyscraper construction used to be for New York City). It's harder when the effort is novel, as is often the case for software projects since even "do an ERP project for this organization" can be wildly different in terms of requirements and constraints.
That's why the other comment about big projects ideally being evolutions of small projects is so important. It's nearly impossible to accurately forecast a budget for something where even the basic user needs aren't yet understood, so the best way to bound the amount of budget/cost mismatch is to bound the size of the initial effort.
I just picked one metric from Wikipedia. It was also 22 years late.
I taught these issues several times in the graduate Software Engineering Course. Good resources are the Standish Report:
https://www.csus.edu/indiv/v/velianitis/161/chaosreport.pdf
Also, anything that T. Capers Jones wrote. The most comprehensive of these books is:
Estimating Software Costs: Bringing Realism to Estimating (Hardcover, ISBN-13: 978-0071483001)
Many believe the official recognition of the crisis in developing software came with the two NATO conferences in 1968 and 1969.
See the Wikipedia article on the History of Software Engineering.
There have been two small-scale experimental comparisons of the formal waterfall model (requirements, design, code, test) and the more prototyping-based, agile method. They seem to have the same productivity in terms of lines per programmer-month, but the formal method tends to produce larger software.
I've started calling it EDD - Executive Driven Development.
Senior stakeholders get into a room, decide they need xyz for the project to really succeed, and push this down to managers who in turn try to perform miracles with what little development resource they have. Very often they also have an offshore team that is only concerned with prolonging the contract as much as possible, rather than delivering. 2 weeks later senior stakeholders get back into the room...
I don't know any other industry where non-technical, unqualified people can just decide everything the way they do with software/IT.
Oh they TRY to ... it's just that the "non-technical unqualified people" get brought to heel (usually) by regulations. I've been in the room when people have tried to force a decision and a PEng, Lawyer, or CA/CPA had to say "absolutely not". It happens all the time, which is why you NEED regulations.
It's so "nice" to know that trillions spent on AI not only won't make this better, but will make it significantly worse.
"Worse" won't even start to describe the economical crisis we will be in once the bubble bursts.
And although that, in itself, should be scary enough, it is nothing compared to the political tsunami and unrest it will bring in its wake.
Most of the Western world is already on shaky political ground, flirting with the extreme-right. The US is even worse, with a pathologically incompetent administration of sociopaths, fully incapable of coming up with the measures necessary to slow down the train of doom careening out of control towards the proverbial cliff of societal collapse.
If the societal tensions are already close to breaking point now, in a period of relative economic prosperity, I cannot begin to imagine what they will be like once the next financial crash hits. Especially one in the multiple trillions of dollars.
They say that humanity progresses through episodes of turmoil and crisis. Now that we literally have all the knowledge of the world at our fingertips, maybe it is time to progress past this inadequate primeval advancement mechanism, and to truly enter an enlightened age where progress is made from understanding, instead of crises.
Unfortunately, it looks like it's going to take monumental changes to stop the parasites and the sociopaths from making a quick buck at the expense of humanity.
Year zero now. Reset real estate prices due to sudden lack of demand.
Not really, by most indications AI seems to be an amplifier more than anything else. If you have strong discipline and quality control processes it amplifies your throughput, but if you don't, it amplifies your problems. (E.g. see the DORA 2025 report.)
So basically things will still go where they were always going to go, just a lot faster. That's not necessarily a bad thing.
>If you have strong discipline and quality control processes
You're placing a lot of faith in that if-statement, in a thread about an article that pretty much says we in fact lack strong discipline and quality control.
Yes, AI can help, but it won’t. That’s my point.
In practice, it will make people care and pay attention even less. These big disasters will be written by people without any skills using AI.
I mean, I can fart into a megaphone and it'll get amplified, too.
The failed UK NHS IT project, known as the National Programme for IT (NPfIT), cost the UK government over £10 billion and produced almost nothing of value. I'm surprised that didn't get a mention.
Again those bastards, Fujitsu, were involved. They even sued the UK government and won a £465 million settlement when their contract was cancelled. But despite this, and their complicity in covering up the failures of the Horizon Post Office system, the UK government is still giving them fat contracts.
If senior managers can preside over a massive failure and walk away with a huge pension, there isn't much incentive for them to do better, is there?
The GOV.UK project is a rare success story for IT in the UK government. They need to take that team, give them big incentives to stay, and give them more projects. Why are we outsourcing to international companies that don’t give a shit when we have a wealth of talent at home? Why aren’t we investing in our own people?
The people who did the UK COVID app also did a good job, as far as I am aware. The lesson seems to be that it is better to employ a small, experienced, and talented team and get out of their way than to outsource to a huge government contractor.
Despite its overall failure, some parts of the infrastructure and national applications, such as the Summary Care Record and the Electronic Prescriptions Service, are considered to have survived and continue to be used.
That doesn't seem much of a return on a £10 billion investment.
>Global IT spending has more than tripled in constant 2025 dollars since 2005, from US $1.7 trillion to $5.6 trillion, and continues to rise. Despite additional spending, software success rates have not markedly improved in the past two decades.
Okay, but how much more software is used? If IT spending has tripled since 2005 but we use 10x more software, I'd say the trend is good.
Success rates imply a ratio. Constant dollars are adjusted.
Yes, there is a lot more spending overall. But nothing has improved quality-wise, despite everyone in software somehow claiming they "make software better". (Which is a phrase used by people who don't build software, but own it.)
The point is not that the growth of IT spending is bad; that was just to show the scale of spending. The point of the article is that a billion spent on software could well lead to a loss of a hundred billion.
The difference between success and failure of large projects comes down to technical leadership. I've seen it time and time again. Projects that are managed by external consulting companies (name brand or otherwise) have a very poor track record of delivering. An in-house technical lead that is committed to the success of the project will always do better. And yes, this technical lead must have the authority to limit the scope of the system rewrite. Endless scope creep is a recipe for failure. Outside consulting firms will never say "No" to any new request - it means more business for them - their goals are not aligned with the client.
Do non-software projects succeed at a higher rate in any industry? I get the impression that projects everywhere go over time, over budget, and frequently get canceled.
How many bridges have you used that have collapsed? How much software have you used that has been broken or otherwise not served your interests? If we built the rest of society like we build software, millions of people would be dead.
The reason bridges don't fail often is because they over-build them. There's no obvious equivalent with software. One mistake in a large code base can make it fail.
Bridges do go ridiculously over budget and schedule all the time, however.
The UK Post Office scandal would be the equivalent of the Morandi bridge collapse: the big, catastrophic failure you hope to see only a few times in your lifetime.
But bridges collapsing is not the only failure mode for non-software projects. I know plenty of newly built houses that had serious issues with insulation, wiring, painting, heating infrastructure, etc.
Systematic decimation of test teams, elimination of test managers, and contemptuous treatment of the role of tester over the past 40 years has not yet led to a more responsible software industry. But maybe if we started burning testers at the stake, all these problems would go away?
Many specialties were eliminated / absorbed over the past few decades. I started working almost 30 years ago. Today, I rarely see dedicated testers, just like I rarely see dedicated DBAs. Sysadmins went away with the "DevOps" movement. Now they are cloud engineers who are more likely to understand a vendor-specific implementation than networking fundamentals.
Except testers are needed. Testing is not merely a technical role. It's a social role. It's a commitment to think differently from the rest of the team. We do this because that provides insurance against complacency.
But, by the nature of testing, we testers are outsiders. No one is fully comfortable with a tester in the room, unless they are more afraid of failure than irritation.
Mandatory reference to Gall’s Law [0]:
> “A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
[0]: https://personalmba.com/galls-law/
So in the 1990s Canada failed to deliver a payroll system for which they paid Accenture Canada $70M.
Then in 2010s they spent $185M on a customized version of IBM's PeopleSoft that was managed directly by a government agency https://en.wikipedia.org/wiki/Phoenix_pay_system
And now in 2020s they are going to spend $385M integrating an existing SaaS made by https://en.wikipedia.org/wiki/Dayforce
That's probably one of the worst and longest software failures in history.
Oh, it's much more interesting than that. Phoenix started as an attempt to create a gun registry. Ottawa had a bunch of civil servants who'd be reasonably competent at overseeing such a thing, but the government decided that it wanted to build it in Miramichi, New Brunswick. The relevant people refused to move to Miramichi, so the project was built using IBM contractors and newbies. The resulting fiasco was highly predictable.
Then when Harper came in he killed the registry mostly for ideological reasons.
But then he didn't want to destroy a bunch of jobs in Miramichi, so he gave them another project to turn into a fiasco.
> Then when Harper came in he killed the registry mostly for ideological reasons.
The registry was started mostly for ideological reasons.
> Canada failed to do a payroll system
New Zealand tried to build a new payroll system for teachers called Novopay, which imploded spectacularly and is still creating grief. The system is now called EdPay (the government decided to take over the privately created system). The total cost of the debacle was north of $200M NZD. Somehow they managed to fail at replacing a working system!
Software development, like most other things, is part of the same make-believe market that we run our societies on in most countries around the world. Let's face it: most of the big money in software is belief money, not the actual proven value of a thing. The word "valuation" sort of already tells us this. It's not fact-checking ("How much did they sell?" or "How many users bought access or a license?"); it is "How much do we believe in the future of this thing?" and risky investment ("How much could we make if this thing takes off?").
For software, I am not sure this is helpful. Maybe we would develop way less trash software if it were different. But then again, we would probably still develop engagement-farming software, because people would still use or buy it.
As someone who has seen technological solutions applied where they make no sense, I think the next revolution in business processes will be de-computerization. The trend has probably already started, thanks to one of the major cloud outages.
> de-computerization
I would think cloud-disconnectedness (eg. computers without cloud hosted services) would come far before de-computerization.
Can you please provide a few examples to get the gist of such trend?
Honest question.
> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions.
Somehow I come away skeptical of the inevitable conclusion that Phoenix was doomed to fail, suspecting instead that perhaps they were hamstrung by architecture constraints dictated by assholes.
Wasn't the Agile movement kicked off by a group of people writing payroll software for Chrysler?
https://en.wikipedia.org/wiki/Chrysler_Comprehensive_Compens...
Payroll systems seem to be a massively complicated beast.
Arbitrary payroll is absurdly complicated. The trick is to not make it arbitrary: have a limited amount of stuff you do, and always have backdoors for manually pushing data through payroll.
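A minimal sketch of that shape in C (the rule set, field names, and numbers are all hypothetical, purely to illustrate the idea): keep the automated rules to a small, closed set, and make the manual backdoor a first-class record with an amount, a reason, and an approver, rather than a hack.

    #include <stdio.h>

    /* A deliberately small, closed set of automated pay rules. */
    typedef enum { BASE_SALARY, HOURLY, OVERTIME_1_5X } pay_rule_t;

    typedef struct {
        pay_rule_t rule;
        double rate;   /* per period, or per hour, depending on rule */
        double hours;  /* ignored for BASE_SALARY */
    } pay_line_t;

    /* The backdoor: a manual line bypasses rule evaluation entirely,
       but carries a reason and an approver for the audit trail. */
    typedef struct {
        double amount;
        const char *reason;
        const char *approved_by;
    } manual_line_t;

    static double evaluate(const pay_line_t *line)
    {
        switch (line->rule) {
        case BASE_SALARY:   return line->rate;
        case HOURLY:        return line->rate * line->hours;
        case OVERTIME_1_5X: return line->rate * 1.5 * line->hours;
        }
        return 0.0;  /* unreachable while the enum stays closed */
    }

    int main(void)
    {
        pay_line_t regular = { HOURLY, 30.0, 80.0 };
        /* The edge case nobody wants to automate gets pushed through by hand. */
        manual_line_t oddball = { 412.50, "retroactive adjustment", "j.smith" };

        double total = evaluate(&regular) + oddball.amount;
        printf("gross pay: %.2f (incl. manual %.2f: %s)\n",
               total, oddball.amount, oddball.reason);
        return 0;
    }

Every rule you refuse to automate is one the system doesn't have to replicate; the backdoor keeps the odd cases out of the rule engine at the cost of some manual work.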
You don't want to get me started on Agile.
My reaction also. 80K payroll rules! Without much prompting effort, I got a figure of about 350K Canada Federal Service employees (sorry if not correct).
Sounds like they put zero effort into simplifying those rules the first time around.
Now, in the new project, they have put together a committee to attempt it:
> The main objective of this committee also includes simplifying the pay rules for public servants, in order to reduce the complexity of the development of Phoenix's replacement. This complexity of the current pay rules is a result of "negotiated rules for pay and benefits over 60 years that are specific to each of over 80 occupational groups in the public service." making it difficult to develop a single solution which can handle each occupational groups specific needs.
Because you don’t just rewrite all your payroll systems with hundreds of variations in one go. That will never work. But they keep trying it.
You update the system for one small piece, while reconciling with the larger system. Then replace other pieces over time, broadening your scope until you have improved the entire system. There is no other way to succeed without massive pain.
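One concrete form of that reconciliation is a shadow run: the legacy engine stays authoritative while the replacement's output is merely compared against it, one slice at a time. A toy sketch in C, with stubbed-out engines standing in for the two real systems (everything here, including the seeded discrepancy, is invented for illustration):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stubs standing in for the legacy and replacement pay engines. */
    static double legacy_gross_pay(int id) { return 1000.0 + id * 7.0; }
    static double new_gross_pay(int id)
    {
        return 1000.0 + id * 7.0 + (id == 3 ? 0.02 : 0.0);  /* seeded bug */
    }

    int main(void)
    {
        const double tolerance = 0.005;  /* half a cent */
        int mismatches = 0;

        /* The legacy result is still what gets paid out; the new
           engine's output is only compared and logged. */
        for (int id = 1; id <= 5; id++) {
            double expected = legacy_gross_pay(id);
            double actual   = new_gross_pay(id);
            if (fabs(expected - actual) > tolerance) {
                printf("employee %d: legacy %.2f vs new %.2f -- investigate\n",
                       id, expected, actual);
                mismatches++;
            }
        }

        printf("%d mismatch(es); cut over only after clean cycles\n", mismatches);
        return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }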
> Finally, project cost-benefit justifications of software developments rarely consider the financial and emotional distress placed on end users of IT systems when something goes wrong.
Most users and most management of software projects live in denial that the norm is dystopia.
I can't think of any truly required and useful feature that has appeared in computer usage since the early days.
Easier to swallow is that the user interface of desktop operating systems hasn’t changed fundamentally in many years, yet hardware requirements continue to grow.
But even the invention of the mouse requires excessive movement: moving a pointer to click on something that a key combination could have done much more quickly. The original intention of the mouse was as just another device to use, not necessarily a primary device directing the majority of the workflow.
From a dark storage area I may someday again get out an early Sceptre gaming monitor from the DOS days.
I held on to it throughout the 1990's precisely because it was not a plug & play monitor; it was real good to install Windows with, so nothing would interfere with the higher-resolution alternative graphics you were going to install later.
By the 21st century it was seldom seen, but these were well made and it still worked. However, the most obsolete feature got the most interest: the sleek aftermarket plastic accessory unit attached to the side of the monitor with those sticky 3M tacky pads that are so tenacious.
Yes, you've all seen it and remember it fondly, the mouse holder.
Kind of like a custom cup holder that fits the standard mouse perfectly, it's obviously where you keep your mouse most of the time, except for those rare occasions when you dabble in a bit of software which actually supports a mouse.
You want to keep it out of the way of your everyday desktop activities :)
In the book "How Big Things Get Done" [1], Bent Flyvbjerg, among other things, identifies one common feature of the projects that do not have large outliers to go over-budget and under-deliver: modularity. Ideally, fractal modularity. His favorite examples: solar power, electric transmission, pipelines, roads. Ironically, IT/software is only slightly better than nuclear power and Olympic games [2].
[1] https://www.amazon.com/-/en/dp/B0B63ZG71H
[2] https://www.scribd.com/document/826859800/How-Big-Things-Get...
I often see big money put behind software projects, but the money then makes stakeholders feel entitled to get in the way.
>Frustratingly, the IT community stubbornly fails to learn from prior failures.
So far I believe that there has been too much emphasis in education on coding and algorithms (small-scale, tactical stuff) and not enough emphasis on the engineering side of things, like version control, QA, system design, management, etc. I think the situation has changed (40 years ago most professional programmers didn't even know what version control was, let alone use VC systems), but the scope of projects has increased faster than our skills and tools.
Please forgive me if I get something wrong; I'm not a native English speaker. The article boils it down to: it is all a management failure. This is also my feeling after 35 years in software development. There is no such thing as competent middle or upper management in software development. I sometimes see even devs being promoted and in an instant forgetting how software is made. On the other hand, I see the most stupid devs promoted. All this leads to massive mismanagement and the hiding of problems from upper managers. Even worse, sometimes I see the best devs promoted, only to watch them break, because the toxin they get from their managers kills them.
Frederick Brooks, in his essay "No Silver Bullet" (included in the collection The Mythical Man-Month), talked about the conventions of software development and, as I recall, called for taking an iterative approach to software development similar to what I followed for the Automunge project. I went into a little more detail about that in my 2019 essay of the same name: https://medium.com/automunge/no-silver-bullet-95c77bc4bde1
This is what I've been thinking about when I talk to other people in software development who can't stop talking about how efficient they are with AI... yet they haven't shipped anything in their startup or side project, or, in a corporate setting, the project is still bug-riddled, the performance is poor, and now their code quality suffers too, as people barely read what Cursor (etc.) is spitting out.
I have “magical moments” with these tools, sometimes they solve bugs and implement features in 5 minutes that I couldn’t do in a day… at the same time, quite often they are completely useless and cause you to waste time explaining things that you could probably just code yourself much faster.
I'm pretty sure that we can remove the word "software" from the article headline and it remains just as true. I don't believe that software projects are unique in this regard: big, complex projects are big and complex, and prone to unexpected issues, scope creep, etc. Throw in multiple stakeholders, ineffective management, the sunk cost fallacy etc. and it's a wonder that any large projects get finished at all.
Yup, and with an equal amount of mindblowing-units-of-money spent, infrastructure projects all around me are still failing as well, or at least being modified (read: downsized), delayed and/or budget-inflated beyond recognition.
So, what's the point here, exactly? "Only licensed engineers as codified by (local!) law are allowed to do projects?" Nah, can't be it, their track record still has too many failures, sometimes even spectacularly explosive and/or implosive ones.
"Any public project should only follow Best Practices"? Sure... "And only make The People feel good"... Incoherent!
Ehhm, so, yeah, maybe things are just complicated, and we should focus more on the amount of effort we're prepared to put in, the competency (read: pay grade) of the staff we're willing to assign, and exactly how long we're willing to wait before conceding defeat?
One of the problems is scale.
Large-scale systems tend to fail: large, centralised, centrally managed systems with big budgets and large numbers of people who need to coordinate, plus lots of people with an interest in the project pushing and lobbying for different things.
Multiple smaller systems is usually a better approach, where possible. Not possible for things like transport infrastructure, but often possible for software.
> Not possible for things like transport infrastructure
It depends what you define as a system. Arguably a lot of transport infrastructure is a bunch of small systems linked with well-understood interfaces (e.g. everyone agrees on the gauge of rail that's going to be installed and the voltage in the wires).
Consider how construction works in practice. There are hundreds or thousands of workers working on different parts of the overall project and each of them makes small decisions as part of their work to achieve the goal. For example, the electrical wiring of a single train station is its own self-contained system. It's necessary for the station to work, but it doesn't really depend on how the electrical system is installed in the next station in the line. The electricians installing the wiring make a bunch of tiny decisions about how and where the wires are run that are beyond the ability of someone to specify centrally - but thanks to well known best practices and standards, everything works when hooked up together.
In manufacturing there are economies of scale and adding more people increases workforce, in mindfacturing there are diseconomies of scale and adding more people increases confusion, yet many managers view software with a manufacturing mindset.
Nailed it, but I fear this wisdom will be easily passed by by someone who doesn’t already intuit it from years of experience. Like the Island de la Muerta: wisdom that can only be found if you already know where it is.
>For the foreseeable future, there are hard limits on what AI can bring to the table in controlling and managing the myriad intersections and trade-offs
?? Even if things stopped advancing in terms of significant model improvements, I don't think actual utility would be saturated for a while. We have barely begun to consolidate the potential into tooling, use cases, knowledge sharing, and depth of that knowledge throughout the workforce on how to make the best use of it.
If someone is looking at AI as a monolithic thing and thinking "oh, silver bullet for the problems of enterprise software, etc.", then I really don't know what to say, except that's on them, not on any true big claims being pushed, unless you're breaking out the long ladders to pick those cherries, or listening to people whose background and placement within things clearly makes them a bad messenger.
Looking at other domains where companies are developing complex products in highly regulated industries, there's one thing they all share in common: they invest a lot of capital in infrastructure for testing their designs.

I spent years at a company trying to convince upper management to set up a lab where we could simulate a production environment, which would allow us to do a real integration test. It's a hard idea to sell, because testing is nominally part of the budget in every project, so a lack of testing couldn't be blamed for our high rate of failures (going over budget fixing bugs during commissioning).

Perhaps we should stop calling unit testing "testing", so that we don't confuse people. Until we put all the pieces together and do a proper stress test under close-to-realistic production conditions, our software cannot be considered tested. I think that's the case for 99% of software companies.
Plausible article, but it reads like a preschooler frustrated that his new toy is broken. "Fix it! Make it work!" - without ever specifying how.
Granted, this is an exceedingly hard problem, and I suppose there's some value in reminding ourselves of it - but I'd much rather read thoughts on how to do it better, not just complaints that we're doing it poorly.
I wonder how much software project failure comes from lacking clear processes. Many teams, whether in companies or open source projects, never define step by step procedures for common tasks like feature development, bug triage, code review, or architectural decisions. When six developers each follow their own approach, even with consistent code style, the team can’t improve the system in a predictable and systematic way. Clear procedures don’t guarantee success, but without them teams often end up with chaos and inconsistent delivery. This lack of structured methodology seems far more common in software engineering than in other engineering disciplines.
This should be a criticism of the kinds of bloated firms that take on large government projects, the kinds of people they hire, the incentives at play, the bidding processes, the corruption and all the rest. It has very little to do with software and more just organizations that don't face any pressure to deliver.
> "Why worry about something that isn’t going to happen?”
Lots to break down in this article other than this initial quotation, but I find a lot of parallels in failing software projects, this attitude, and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.
It was a combination of failures like this. Why was the captain going full speed ahead into a known ice field? Well, the boat can't sink, and there (may have been) organizational pressure to arrive at a certain time in New York (aka, the imaginary deadline must be met). Why weren't there enough life jackets and boats for crew and passengers? Well, the boat can't sink anyway; why worry about something that isn't going to happen? Why train crew on how to deploy the life rafts and follow emergency procedures properly? Same reason. Why didn't the SS Californian rescue the ship? Well, the third-party Titanic telegraph operators were under immense pressure to send telegrams to New York, and the chatter about the ice field got on their nerves, so they mostly ignored it (misaligned priorities). If even a little caution and forward thinking had been used, the death toll would have been drastically lower, if not nearly nonexistent. It took 2 hours to sink, which is plenty of time to evacuate a boat of that size.
Same with software projects - they often fail over a period of multiple years and if you go back and look at how they went wrong, there often are numerous points and decisions made that could have reversed course, yet, often the opposite happens - management digs in even more. Project timelines are optimistic to the point of delusion and don't build in failure/setbacks into schedules or roadmaps at all. I've had to rescue one of these projects several years ago and it took a toll on me I'm pretty sure I carry to this day, I'm wildly cynical of "project management" as it relates to IT/devops.
> and my recent hyper-fixation (seems to spark up again every few years), the sinking of the Titanic.
But the rest of your comment reveals nothing novel, nothing beyond what anyone would find after watching James Cameron's movie multiple times.
I suggest you go to the original inquiries (congressional in the US, Board of Trade in the UK). There is a wealth of subtle lessons there.
Hint: Look at the Admiralty Manual of Seamanship that was current at that time and their recommendations when faced with an iceberg.
Hint: Look at the Board of Trade (UK) experiments with the turning behaviour of the sister ship. In particular of interest is the engine layout of the Titanic and the attempt by the crew, inexperienced with the ship, to avoid the iceberg. This was critical to the outcome.
Hint: Look at the behaviour of Captain Rostron. Lots of lessons there.
Thanks for your feedback, I’m well aware of the inquiries and the history there. However, this post was meant to be a simple analogy that related to the broader topic, not a deep dive into the theories of how and why the titanic sank. Thanks!
The lesson from “big software projects are still failing” isn’t that we need better methodologies, better project management, or stricter controls. The lesson is "don't do big software projects".
Software is not the same as building in the physical world where we get economies of scale.
Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc.); we'll eventually reach a stable, repeatable, scalable approach to building bridges. They will very rarely (in modernity) catastrophically fail (yes, Tacoma Narrows happened, but in properly functioning societies that's rare).
Nobody will say "I want to build a bridge upside-down, out of paper clips, that can withstand a 747 driving over it". Because that's physically impossible. But nothing's impossible in software.
Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics) - so anything goes and can be in scope; and since writing and integrating large amounts of code is a communication exercise, suffers from diseconomies of scale.
Customers want the software to do exactly what they want and - within reason - no laws of physics are violated if you move a button or implement some business process.
Because everyone wants to keep working the way they want to work, no software project (even if it sounds the same) is the same. Your company's bespoke accounting software will be different than mine, even if we are direct competitors in the same market. Our business processes are different, org structures are different, sales processes are different, etc.. So they all build different accounting software, even if the fundamentals (GaaP, double-entry bookkeeping, etc.) are shared.
It's also the same reason why enterprise software sucks. Do you think a startup building expense management starts off as a giant mess of garbage? No! It starts off simple and clean and beautiful, because their initial customer base (startups) are beggars and cannot be choosers, so they adapt their process to the tool. But then larger companies come along with dissimilar requirements, and Expense Management SaaS Co. wins those deals by changing the product to work with whatever oddball requirements they have, and so on, until the product essentially is a bunch of config options and workflows that you have to build yourself.
(Interestingly, I think these products become asymptotically stuck - any feature you add or remove will make some of your customers happy and some of your customers mad, so the product can never get "better" globally).
We can have all the retrospectives and learnings we want but the goal - "Build big software" - is intractable, and as long as we keep trying to do that, we will inevitably fail. This is not a systems problem that we can fix.
The lesson is: "never build big software".
(Small software is stuff like Bezos' two pizza team w/APIs etc. - many small things make a big thing)
I agree with you on "don't do big software projects". Especially, do not rapidly scale them out to hundreds of people. You have to scale them more organically, ensuring that every person added is a net gain. They think that adding more people will reduce the time.
I am surprised at the lack of creativity when doing these projects. Why don't they start 5 small projects building the same thing and let them work for a year? At the end of the year you cancel one of the projects, increasing the funding of the other four. You can do that every year based on the results. It may look like a waste, but it will significantly increase your chances of succeeding.
>Building 1,000 bridges will make the cost of the next incremental bridge cheaper due to a zillion factors, even if Bridge #1 is built from sticks (we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc.)
Build 1000 JSON parsers and tell me the next one isn't cheaper to develop: "we'll learn standards, stable fundamental engineering principles, predictable failure modes, etc."
>Software isn't scalable in this way. It's not scalable because it doesn't have hard constraints (like the laws of physics)
Uh, maybe fewer, but "none" is way too far. Get 2 billion integer operations per second out of a 286, the 500-mile email, big data storage, etc. Physical limits are everywhere.
>It's also the same reason why enterprise software sucks.
The reason enterprise software sucks is the lack of introspection and learning from the garbage that went before.
You have to be able to turn away unsuitable customers.
Working on AI that helps manage IT shops and learns from failure and success might be better, for both results and culture, than most IT management roles: a profession (painting with an absurdly broad brush) that tends to attract a lot of miserable creatures.
... If this happens, the next hacks will be context poisoning. A whole cottage industry will pop up around preserving and restoring context.
Sounds miserable.
Also, LLMs don't learn. :)
LLMs themselves don't learn, but AI systems based around LLMs can absolutely learn! Not on their own, but as part of a broader system: RLHF leveraging LoRAs that get re-incorporated as model fine-tunings regularly, natural language processing for context aggregation, creative use of context retrieval with embeddings databases updated in real time, etc.
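The retrieval piece of that loop is simple enough to sketch. Here's a toy in-memory version in C: new experiences get "embedded" and appended to a store, and later queries pull back the nearest neighbour by cosine similarity as context. The tiny hand-written vectors are stand-ins; a real system would get embeddings from a model and use a proper vector database.

    #include <math.h>
    #include <stdio.h>
    #include <string.h>

    #define DIM 3
    #define MAX_ENTRIES 16

    typedef struct { float v[DIM]; const char *text; } entry_t;

    static entry_t store[MAX_ENTRIES];
    static int n_entries = 0;

    /* "Learning": append a newly embedded memory to the store. */
    static void remember(const float v[DIM], const char *text)
    {
        if (n_entries < MAX_ENTRIES) {
            memcpy(store[n_entries].v, v, sizeof(float) * DIM);
            store[n_entries].text = text;
            n_entries++;
        }
    }

    static float cosine(const float a[DIM], const float b[DIM])
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < DIM; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (sqrtf(na) * sqrtf(nb) + 1e-9f);
    }

    /* Retrieval: the stored memory closest to the query embedding. */
    static const char *recall(const float q[DIM])
    {
        int best = -1;
        float best_sim = -2.0f;
        for (int i = 0; i < n_entries; i++) {
            float sim = cosine(q, store[i].v);
            if (sim > best_sim) { best_sim = sim; best = i; }
        }
        return best >= 0 ? store[best].text : "nothing remembered yet";
    }

    int main(void)
    {
        /* Toy embeddings; a real system would get these from a model. */
        float deploys[DIM] = { 0.9f, 0.1f, 0.0f };
        float tests[DIM]   = { 0.0f, 0.8f, 0.6f };
        remember(deploys, "Friday deploys failed twice in a row");
        remember(tests,   "integration tests green after the fix");

        float query[DIM] = { 0.8f, 0.2f, 0.1f };  /* "anything about deploys?" */
        printf("recalled: %s\n", recall(query));
        return 0;
    }

The model's weights never change here; the system "learns" only because its context does.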
A slightly different take: it's probably more of a people failure, the lack of required expertise, skill sets, motivation, and coordination. People are motivated to do the job to make a living; the success of any long-term project is rarely the driving factor for most people working on it. People often know ahead of time when a project is heading toward failure; it's just how things are structured. From a systems perspective, an unknown system or set of requirements is a good case for building iteratively, whereas a known set of requirements should give a good enough idea about feasibility and rough timelines, even if it's complex.
"big software project"
There you are, that's why it's failing: these systems are so massive and complex that sometimes even the original creators and architects who designed them can't foresee future needs, etc.
You could say it's incompetence, but the fact that software has changed so much in the last 20 years means most people can't really design a "future proof" system in a way that won't cause trouble down the line.
>By then, the general public and the branch managers themselves finally joined Computer Weekly’s reporters (who had doggedly reported on Horizon’s problems since 2008) in the knowledge that there was something seriously wrong with Horizon’s software.
Computer Weekly first broke the story, for which they deserve much credit. But I believe Private Eye did much of the long term campaigning.
I like that the author advocates software developer liability. That makes sense. Unless we introduce such a system, the incentives are not there to avoid failure.
https://queue.acm.org/detail.cfm?id=3489045
https://therecord.media/cybersecurity-software-liability-sta...
While I don't disagree, it would increase the cost of software by an extraordinary amount.
It's good that the author makes the distinction between developers and managers. This distinction is rarely made, and most media outlets talk about the wrongdoings of developers, who are almost never the decision makers on failing projects. It's quite the opposite: they are the ones who, if brave enough, criticize bad management practices and the lack of proper management of the software project.
It's possible that most business projects fail.
Most advertising campaigns fail.
In my very humble opinion, the impact that software has on our lives is getting to the point where software engineering should become a true profession like the other engineering branches (electrical, mechanical, etc).
There should be things like professional certifications that engineers have to maintain through continuous education, a professional code of ethics, a board of review, and other functions.
My reasoning is that we are at the point where a software "engineer" can make a mistake that can have the same impact as a civil engineer making a bad calculation and causing a bridge collapse.
There are different levels to this, of course. An app for booking restaurant reservations wouldn't need that much oversight. But we've seen some outages with massive impacts that quite frankly did not happen twenty years ago.
Software was failing and mismanaged.
So we added a language and cultural barrier, 12 hour offset, and thousands of miles of separation with outsourcing.
Software was failing and mismanaged.
So now we will take the above failures and tack on an AI "prompt engineering" barrier (done by the above outsourced labor).
And on top of that, all engineers that know what they are doing are devalued from the market, all the newer engineers will be AI braindead.
Everything will be fixed!
The most mind boggling number in this article to me was PeopleSoft claiming it would cost $500 million to make a payroll system for the Canadian government. That’s thousands of working years of software developers. It’s such a huge scale that it seems pretty clear the project should never start. PeopleSoft should have been dumped and the project’s scope massively reevaluated.
Failure typically comes from two directions. Unknown and changing requirements, and management that relies on (often external) technical (engineering) leadership that is too often incompetent.
These projects are often characterized by very complex functional requirements, yet are undertaken by those who primarily only know (and endlessly argue about) non-functional requirements.
So I haven't looked through the comments, and assume this has been discussed, but the simple solution is to limit contracts to, say, $4M, and pay only on successful completion. Then build a large project through a series of smaller steps.
The main problems are incentives and risks: in most cases you are not incentivized to build secure and reliable SW because, most of the time, it's easy to fix later. With particular categories of SW (e.g. software distributed to remote systems, medical SW, military SW) or with HW, it's the opposite: a failure is not so easy to fix, so you are incentivized to do a better job.
The second problem is the big con.
Every improvement will be moderated by increased demands from management: crunch, pressure to release, "good enough", add this extra library that monetizes/spies on the customer, etc.
In the same way that hardware improvements are quickly gobbled up by more demanding software.
The people doing the programming will also be more removed technically. I can do Python, Java, Kotlin. I can do a little C++, less C, and a lot less assembly.
An endless succession of new tools, methodologies, and roles but failure persists because success is rooted in good judgment, wisdom, and common sense.
This has dot-com bubble written all over it. But there are some deeper issues.
First, we as a society should really be scrutinizing what we invest in. Trillions of dollars could end homelessness as a rounding error.
Second, real people are going to be punished for this as the layoffs go into overdrive, people lose their houses and people struggle to have enough to eat.
Third, the ultimate goal of all this investment is to displace people from the labor pool. People are annoying. They demand things like fair pay, safe working conditions and sick leave.
Who will buy the results of all this AI if there’s no one left with a job?
Lastly, the externalities of all this investment are indefensible. For example, air and water pollution and rising utility prices.
We're barreling towards a future with a few thousand wealthy people where everyone else lives in worker housing, owns nothing, and is the next incarnation of brick kiln workers on wealthy estates.
Systemically, how would you solve homelessness, if I gave you a trillion dollars?
A trillion in a money market fund @ 5% is 50B/year.
Over the course of a few years (so as to not drive up the price of politicians too quickly) one could buy the top N politicians from most countries. From there on out your options are many.
After a decade or so you can probably have your trillion back.
The article isn't really about AI (for a change).
Is it a failure if we ship the project a year late? What if everyone involved would have predicted exactly that outcome?
AI will absolutely solve these problems, by inventing nimble AI native companies that disrupt these business models into the Stone Age, worker by worker, role by role. Death by a billion cuts.
Hot take: It's not technical problems causing these projects to fail.
It's leadership and accountability (well, the lack of them).
And that often takes a particular form: The requirements never converge, or at least never converge on anything realistically buildable.
I spent way less - and they still fail!
Completely off topic, but when fonts are the size they are in this article I can't read it; the words don't register as words above a certain size. I assume this isn't normal, or it wouldn't be so common...
>not only are IT projects risky, they are the riskiest from a cost perspective.
Nuclear reactor projects seem to be regularly delivered massively late and over budget.
Nuclear power plants usually only cost about twice as much as projected in phase II planning. IT projects are sort of open-ended. Interestingly, the simulator I was involved with (many decades ago) at a nuclear power plant came within about 10% of initial projections. The last "scan uploads for viruses" project I worked on was about 20x - 40x more expensive than projected. (Unfortunately the person who suggested we just pay a third party for this service was fired.) The bit with projecting cost and schedules for nuke plants is to ignore any initial costing and multiply the phase II planning estimate by 2 or 4.
>20x - 40x more expensive than projected
That is an impressive cost overrun!
I guess the issue with nuclear is that the projects are so expensive that even a 2x overrun is disastrous.
From a consulting point of view, there's a common joke we used to tell: customers demand a Ferrari, but are only willing to pay the development costs of a Fiat.
Change your idea of success to being "Propping up the consultant market", and by that definition these projects are smashing it out of the park.
Worth a view also. Is software engineering still an oxymoron?
https://youtu.be/D43PlUr1x_E?si=em2nNYuI8WDvtP21
How much money do you need to build a skyscraper on top of a tarpit? None, because it's not possible. The whole stack has to be gutted. I can do it, but no one wants to listen, so I'll do it myself.
Almost nobody who works in software development is a licensed professional engineer. Many are even self-taught, and that includes both ICs and managers. I'm not saying this is direct causation but I do think it odd that we are so utterly dependent on software for so many critical things and yet we basically YOLO its development compared to what we expect of the people who design our bridges, our chemicals, our airplanes, etc.
Licensing and the perceived rigor it signifies are irrelevant to whether something can be considered "professional engineering." Engineering exists at the intersection of applied science, business, and economics. So most software projects can be YOLO'd simply because the economics permit it, but there are others where the high cost of failure necessitates more rigor.
For instance, software in safety-critical systems is highly rigorously developed. However that level of investment does not make sense for run-of-the-mill internal LOB CRUD apps which constitute the vast majority of the dark matter of the software universe.
Software engineering is also nothing special when it comes to various failure modes, because you'll find similar examples in other engineering disciplines.
I commented about this at length a few days ago: https://news.ycombinator.com/item?id=45849304
https://en.wikipedia.org/wiki/Productivity_paradox
No big surprise. Taking a shitty process and "digitalizing" it will lead, in the best case, to a shitty process just on computers; in the worst case, everything collapses.
What a joke blaming the IT community for not doing better, when most businesses refuse to look past anything but shipping features as fast as they can. "We take security and reliability very seriously", until management gets wind of the price tag. Guess what areas always get considered last and cut first. We all know.
But sure, blame the community at large, not the execs who make bad decisions in the name of short-term profits, then fail upward with golden parachutes into their next gig.
And definitely don't blame government for not punishing egregious behavior of corporations. Don't worry, you get a year of free credit monitoring from some scummy company who's also selling your data. Oh and justice was served to the offending corp, who got a one-time fine of $300k, when they make billions every quarter.
Maybe if we just outsource everything to AI, consultants, and offshore sweat shops things will improve!
Okay cool, good article.
> blaming the IT community for not doing better, when most businesses refuse to look past anything but shipping features
IT != software engineering. IT is a business function that manages a company's internal information. Software engineering is a time-tested process of building software.
A lot of projects fail because management thinks that IT is a software engineering department. It is not. It never was, and it never will be. Its incentives will never be aligned such that software engineering projects are set up for success.
The success rate of implementing software outside of IT and dragging them along later is much higher than implementing it through IT from the beginning.
I understand, but also, IT is an umbrella term for a wider industry that includes your definition of IT, software, and anything adjacent. If you read the article, you'll see it's the latter being referenced, and why I chose that terminology.
> The success rate of implementing software outside of IT and dragging them along later is much higher than implementing it through IT from the beginning.
That's a pretty strong statement. Isn't that the opposite of why the devops movement started?
managing software requirements and the corresponding changes to user/group/process behaviors is by far the hardest part of software development, and it is a task no one knows how to scale.
absent understanding, large companies engage in cargo cult behaviors: they create a sensible org chart, produce a Gantt chart, have the coders start whacking code, and presumably in 9 months a baby comes out.
every time, ugly baby
There is no such thing as 'simplicity science' that can be directly applied when dealing with IT problems. However, many insights from complexity science are applicable to solving real-world IT problems. People love simple solutions. However, simple is a scam: https://nocomplexity.com/simple-is-a-scam/
There are no generic, simple solutions for complex IT challenges. But there are ground rules for finding and implementing simple solutions. I have created a playbook to prevent IT disasters, The Art and Science Towards Simpler IT Solutions; see https://nocomplexity.com/documents/reports/SimplifyIT.pdf
Interesting article ... with the wisdom of software engineering being forgotten, it will unfortunately get worse ...
Projects don’t fail, people do. If projects fail it means the wrong people are hired for them.
The concerning aspect of all of this isn't the financial cost of these blunders, and what happened in the past. It is the increasing risk to human lives, and what will happen in the future. The Boeing case was only a sign of what's to come.
Take "AI", for instance. It is being adopted left and right as if it's the solution to all of our problems, and developers, managers, and executives are increasingly relying on it. Companies and governments love it because it can cut costs and potentially make us more productive. Most people are more than happy to offload their work to it, do a cursory check of its output, if at all, and ship it or publish it and claim the work as their own. After all, it can always serve as a scapegoat if things do go wrong, and its manufacturers can brush these off as user errors. Ultimately there is no accountability.
These are all components of a recipe for greater disasters. As these tools are adopted in industries where safety is paramount, in the military, etc., it's only a matter of time for more human lives to be impacted. Especially now when more egomaniacal autocrats are taking power, and surrounding themselves with yes-people. Losing face and admitting failure is not part of their playbook. We're digging ourselves into a hole we might not be able to get out of.
because most people are incompetent, produce incidental complexity to satisfy an internal urge for busywork, and under-think the problem, greatly... that's why, and don't get me started on the morons who run the show
Fujitsu has never been served justice for Horizon.
'Managers' aren't really getting any better as time goes on...
Why don't I hear news of such failures from India and China?
There are great big software projects and shitty ones. IRCTC, UPI being examples of great ones.
Insurance and RTO being shitty ones.
I had an insurance deadline coming up and the payment was not showing up in the insurance provider's dashboard, so I had to pay twice, and it still didn't show up.
Also, I have faced huge problems with getting a learner's licence online.
Do let me know if you've faced anything similar.
I got my name wrong on my driver's card and never went to correct it. However, most of the problems there were administrative, not software. I agree both IRCTC and UPI come to mind first as the successes. Insurance could be down to a particular company, as I never faced such a problem. Websites for tax filing and even starting an MSME have been smooth.
The article is kind of dumb. E.g., it hangs its hat on the Phoenix payroll system, which
> Phoenix project executives believed they could deliver a modernized payment system, customizing PeopleSoft’s off-the-shelf payroll package to follow 80,000 pay rules spanning 105 collective agreements with federal public-service unions. It also was attempting to implement 34 human-resource system interfaces across 101 government agencies and departments required for sharing employee data.
So basically people -- none of them in IT, but rather working for the government -- built something extraordinarily complex (80k rules!), and then are like, wow, who could have foreseen that anything downstream would be at least equally complex. And then the article blames IT in general, when this data point tells us that replacing a business process that used to require (per [1]) 2,000 pay advisors to perform will be complex. While working in an organization that has shit the bed so thoroughly that paying its employees requires 2k people. For an organization of 290k, that's roughly 0.7% of headcount spent on paying employees!
IT is complex, but incompetent people and incompetent orgs do not magically become competent when undertaking IT projects.
Also, making extraordinarily complex things and then shouting the word "computer" at them like you're playing D&D and it's a spell does not make them simple.
[1] https://www.oag-bvg.gc.ca/internet/English/parl_oag_201711_0...
To stop failing we could use AI to replace managers, not software developers.
No need to waste GPUs, a simple bash script that alternates between asking for status updates and randomly changing requirements would do
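In that spirit, a minimal sketch (Python rather than bash, and every ping and requirement change below is invented for the bit):

    import random
    import time

    # Hypothetical "AI manager": no GPUs were harmed.
    STATUS_PINGS = [
        "Any updates?",
        "Where are we on this?",
        "Can you give me a quick ETA?",
    ]
    REQUIREMENT_CHANGES = [
        "Actually, it needs to be real-time now.",
        "Leadership wants this as a mobile app.",
        "Can we add AI to it?",
    ]

    def manager_bot(sprints: int = 6) -> None:
        """Alternate between demanding status and changing the requirements."""
        for sprint in range(1, sprints + 1):
            if sprint % 2:
                print(f"[sprint {sprint}] {random.choice(STATUS_PINGS)}")
            else:
                print(f"[sprint {sprint}] {random.choice(REQUIREMENT_CHANGES)}")
            time.sleep(0.1)  # generous deliberation time

    manager_bot()

Runs anywhere Python does, which is more than can be said for the projects it manages.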
We are seeking improvements, not the status quo.
Slightly related but unpopular opinion I have: I think software, broadly, today is the highest quality it's ever been. People love to hate on some specific issues concerning how the Windows file explorer takes 900ms to open instead of 150ms, or how sometimes an iOS 26 liquid glass animation is a bit janky... we're complaining about so much minutiae instead of seeing the whole forest.
I trust my phone to work so much that it is now the single, non-redundant source for keys to my apartment, keys to my car, and payment method. Phones could only even hope to do all of these things as of like ~4 years ago, and only as of ~this year do I feel confident enough to not even carry redundancies. My phone has never breached that trust so critically that I feel I need to.
Of course, this article talks about new software projects. And I think the truth and reason of the matter lies in this asymmetry: Android/iOS are not new. Giving an engineering team agency and a well-defined mandate that spans a long period of time oftentimes produces fantastic software. If that mandate often changes; or if it is unclear in the first place; or if there are middlemen stakeholders involved; you run the risk of things turning sideways. The failure of large software systems is, rarely, an engineering problem.
But, of course, it sometimes is. It took us ~30-40 years of abstraction/foundation building to get to the pretty darn good software we have today. It'll take another 30-40 years to add one or two more nines of reliability. And that's ok; I think we're trending in the right direction, and we're learning. Unless we start getting AI involved; then it might take 50-60 years :)
Kind of a strange take, as though this is unique to software. Every sector that is large has issues, since ambitious projects stretch what can be done with current management and organizational practices. All software articles like these hark back to some mythical world smaller in scope/ambition/requirements. Humanity moves forward.
* Construction and Engineering -- Massive cost overruns and schedule delays on large infrastructure projects (e.g., public transit systems, bridges)
* Military and Government -- Defense acquisition programs notorious for massive cost increases and years-long delays, where complex requirements and bureaucratic processes create an environment ripe for failure.
* Healthcare -- Hospital system implementations or large research projects that exceed budgets and fail to deliver intended efficiencies, often due to resistance to change and poor executive oversight.
Throwing money at a problem never works and never will!
> IT projects suffer from enough management hallucinations and delusions without AI adding to them.
Software is also incredibly hard; the human mind can understand physical space very well, but once we're deep into abstractions it simply struggles to keep up.
It is easier to explain how to build a house from scratch to virtually anyone than to explain how to build a mobile app or an Excel model.
I came to opposite conclusions. Technology is pretty easy, people are hard and the business culture we have fostered in the last 40 years gets in the way of success.
Easy, just imagine a 1GB array as a 2.5mm long square in RAM (assuming a DRAM cell is 10nm). Now it's physical.
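For what it's worth, that mental image roughly checks out. A quick sketch below: the 10nm feature size is from the comment above, while the ~6F² cell area typical of modern DRAM is my assumption:

    import math

    GIB_BYTES = 2**30        # 1 GiB
    BITS = GIB_BYTES * 8     # one DRAM cell stores one bit
    F = 10e-9                # assumed feature size: 10 nm

    # Naive lower bound: each bit is a bare F x F square, nothing else on the die.
    side_naive = math.sqrt(BITS * F * F)

    # Real DRAM cells are ~6 F^2, before counting sense amps and decoders.
    side_6f2 = math.sqrt(BITS * 6 * F * F)

    print(f"1F^2 cells: {side_naive * 1e3:.2f} mm per side")   # ~0.93 mm
    print(f"6F^2 cells: {side_6f2 * 1e3:.2f} mm per side")     # ~2.27 mm

So the bare bits alone make a ~1mm square, and with realistic cell area you land right around the quoted 2.5mm.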
> We are left with only a professional and personal obligation to reemphasize the obvious: Ask what you do know, what you should know, and how big the gap is between them before embarking on creating an IT system. If no one else has ever successfully built your system with the schedule, budget, and functionality you asked for, please explain why your organization thinks it can
Translation: "leave it to us professionals." Gate-keeping of this kind is exactly how computer science (the one remaining technical discipline still making reliable progress) could become like all of the other anemic, cursed fields of engineering. People thinking "hey, I'm pretty sure I could make a better version of this" and then actually doing it is exactly how progress happens. I hope nobody reads this article and takes it seriously.
People concerned about small benefits from AI should consider that IT and the internet have been failing upwards for more than three decades now.
The purpose of a system is what it does.
1. Enable grift to cronies
2. Promo-driven culture
3. Resume-oriented software architecture
There are 2 big problems with large software projects:
1. Connecting pay to work - estimates (replanning is learning, not failure)
2. Connecting work to pay - management (the world is fractal-like, scar tissue and band-aids)
I do not pre-suppose that there are definite solutions to these problems - there may be solutions, but getting there may require going far out of our way. As the old farmer said "Oh, I can tell you how to get there, but if I was you, I wouldn't start from here"
1. Pay to Work - someone is paying for the software project, and they need to know how much it will cost. Thus estimates are asked for, an architecture is asked for, and the architecture is tied to the estimates.
This is 'The Plan!'. The project administrators will pick some lifecycle paradigm to tie the architecture to the cost estimate.
The implementation team will learn as they do their work. This learning is often viewed as failure, as the team will try things that don't work.
The implementation team will learn that the architecture needs to change in some large ways and many small ways. The smallest changes are absorbed in regular work. Medium and Large changes will require more time (thus money); This request for more money will be viewed as a failure in estimation and not as learning.
2. Work to Pay - as the architecture is implemented, development tasks are completed. The Money People want Numbers, because Money People understand how they feel about Numbers. Also these Numbers will talk to other Numbers outside the company. Important Numbers with names like Share Price.
Thus many layers of management are chartered and instituted. The lowest layer of management is the self-managed software developer. The software developer will complete development tasks related to the architecture, tied to the plan, attached to the money (and the spreadsheets grew all around, all around [0]).
When the developer communicates about work, the Management Chain cares to hear about Numbers, but sometimes they must also involve themselves in failures.
It is bad to fail, especially repeated failures at the same kind of task. So managers institute rules to prevent failures. These rules are put in a virtual cabinet, or bureau. Thus we have Rules of the Bureau or Bureaucracy. These rules are not morally bad or good; not factually incorrect or correct, but whenever we notice them, they feel bad; We notice the ones that feel bad TO US. We are often in favor of rules that feel bad to SOMEONE ELSE. You are free to opt out of this system, but there is a price to doing so.
----
Too much writing; dropping the verbiage, the short version:
Thus it is OK for individuals to learn many small things, but it is a failure for the organization to learn large things. Trying to avoid and prevent failure is viewed as admirable; trying to avoid learning is self-defeating.
----
0. https://www.google.com/search?q=the+green+grass+grew+all+aro...
> git commit -am "decomposing recapitulating and recontextualizing software development bureaucracy" && git push
Bureaucracy is: scar tissue, someone else's moat, someone else's data model
AI will fix this
People are _so amazingly close_ to realizing what is wrong with this entire industry. So close.
Enlighten us.
This is a direct result of using leetcode in interviews instead of any other, more legitimate tests like winning a tekken 1v1. Have you ever seen a good developer who’s not good at real video games?
If companies had hired real developers instead of cosplayers who are stunlocked with imposter syndrome as the only candidate pool with time to memorize a rainbow table of arbitrary game trivia questions and answers, things would actually work.
The biggest reason is developer ego. Devs see their code as artwork, an extension of themselves, so it's really hard to have critical conversations, and small things erupt into holy wars. Off hand:
* Formatting
* Style
* Conventions
* Patterns
* Using the latest frameworks or what's en vogue
I think where I've seen results delivered effectively and consistently is where there's a universal style enforced, which removes the individualism from the codebase. Some devs will not thrive in that environment, but it makes the code a means to an end rather than the end itself.
As far as I can see in the modern tech industry landscape, virtually everyone has adopted style guides and automatic formatting/linting. Modern languages like Go even bake those decisions into the language itself.
I'd consider managing that stuff essentially table-stakes in big orgs these days. It doesn't stop projects from failing in highly expensive and visible ways.
> in the modern tech industry landscape, virtually everyone has adopted style guides and automatic formatting/linting
the modern tech industry landscape, in absolute terms, is small compared to the wider tech industry landscape afaik
The UK Post Office lied and made people kill themselves ... because of dev ego?
To me it screams more like an organization not wanting to assume blame and risk paying for their errors.
Ironically, the downvotes pretty much prove this is exactly correct.
Eh, you're not wrong, but management failures tend to be a bigger issue. On the hierarchy of ways software projects fail, developer ego is kind of upper-middle of the pack rather than top. Delusional, ignorant, or sadistic leadership tends to be higher.