Comment by ellius

5 years ago

I'm a younger engineer, so I say this with a degree of humility, but is trying to modify these monstrous legacy systems on the fly during a worldwide catastrophe really the most responsible or intelligent solution? Is there no way to authorize some kind of emergency system built with modern tooling to offload the system burden and then process it later through the system of record? If I were trying to deal with this problem, my approach would be the following:

- Organize the legislature(s) to pass some kind of emergency act that gives protection around PCI compliance and that sort of thing to new engineering teams

- Hire business and technical experts, fast

- Build a system on AWS that can handle massive scale and start offloading the most important business functionality to that system (things like registering for aid, fraud prevention)

- When things begin to return to normal, start processing the stored requests through the original system and deal with problems as they come up

I know all of this is not easy. Legal and compliance stuff is tough, and I know the domain itself must not exactly be simple. But this is an emergency, and it calls for emergency measures. Trying to hack your way through an ancient code base on the fly seems like it might be doomed to failure.
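To make the "offload now, process later through the system of record" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `HoldingArea` class, its table layout, and the `submit` callback (standing in for whatever interface the original system actually exposes) are hypothetical, and SQLite stands in for whatever durable store a real team would pick.

```python
import json
import sqlite3
from typing import Callable


class HoldingArea:
    """Hypothetical intake buffer: accept claims now, replay them later."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS intake ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " processed INTEGER NOT NULL DEFAULT 0)"
        )

    def accept(self, claim: dict) -> int:
        """Store the claim exactly as submitted and return a receipt number."""
        with self.db:  # commits on success, rolls back on error
            cur = self.db.execute(
                "INSERT INTO intake (payload) VALUES (?)", (json.dumps(claim),)
            )
        return cur.lastrowid

    def pending(self) -> int:
        """Number of claims not yet pushed through the system of record."""
        row = self.db.execute(
            "SELECT COUNT(*) FROM intake WHERE processed = 0"
        ).fetchone()
        return row[0]

    def replay(self, submit: Callable[[dict], bool], batch_size: int = 100) -> int:
        """Feed up to batch_size pending claims, oldest first, to `submit`.

        A claim is marked processed only when `submit` returns True, so
        failed claims stay queued for a later attempt.
        """
        rows = self.db.execute(
            "SELECT id, payload FROM intake WHERE processed = 0 "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        done = 0
        for row_id, payload in rows:
            if submit(json.loads(payload)):
                with self.db:
                    self.db.execute(
                        "UPDATE intake SET processed = 1 WHERE id = ?", (row_id,)
                    )
                done += 1
        return done
```

The batch size is the throttle: the original system only ever sees as many requests per run as it can handle. The sketch leaves out validation, fraud checks, and translating each claim into whatever format the system of record expects.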

4 comments

gregjor 5 years ago

I think you underestimate the scope and scale of systems like this. And maybe overestimate the success rate of rewrites, made worse when the developers have no domain experience and a rushed schedule. Under the best conditions rewrites fail more often than not.

Maintaining legacy code isn’t always easy, but it has a big advantage over a rewrite: it works and satisfies requirements to some non-zero percentage, whereas imaginary code built with “modern tooling” does not work or meet any requirements. COBOL and old business systems aren’t that hard to understand.

chberry 5 years ago

I completely agree. Rewriting a system on the fly is absolutely going to fail. There are no requirements, and without the domain knowledge there is no way that anything could be completed correctly in any reasonable time. This is a project that should have been scheduled years ago.

Some level of technical debt will exist in every organization. The problem is that NJ never prioritized this work, and now they have an emergency and are looking for volunteers. They should absolutely be paying for this work.

ellius 5 years ago

I wasn't imagining a full rewrite, more like a "holding area" that could defer some of the load. Again, I may be totally wrong about this. I have never written a COBOL system, and if there is some plausible way to make it effective for this problem in a reasonable timeframe, then I understand why that would make sense. I understand all of your reasoning and agree with it in general. I know rewrites are very expensive and often ineffective. I'm just skeptical that the system can be modified quickly to deal with this new load. But again, I am not an expert and could be totally wrong, and I'm open to being convinced.
- gregjor 5 years ago
  
  The big problems likely have very little to do with COBOL, which is just another programming language, not an alien artifact. The problem is that legacy systems like this, which pre-date relational databases (mid-1980s), have huge volumes of data stored in application-specific formats that modern languages don't have drop-in packages or libraries for. The data is always a bigger problem than the code in large systems.
  For the record, I wrote enterprise software in COBOL back in the late 70s and early 80s. COBOL itself is no big deal to learn or read. Working with huge volumes of data stored in proprietary binary formats (which COBOL was designed for) is not something I want to go back to -- relational databases rule the world for a good reason.
  As for shunting some work to another system, imagine how you would do that with something much simpler. Suppose Excel suddenly stopped working for you with some big spreadsheets. How would you shunt that to another application? How would you do it without risking the important data? It's not a simple problem of just spinning up more servers and replacing some old data management code with MySQL or Redis.
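For a feel of what "application-specific binary formats" can mean in practice, here is a small Python sketch that reads one fixed-width record. The layout is invented for illustration (a 9-byte ID, a 20-byte name, and a 4-byte amount), and it assumes EBCDIC text (Python's `cp037` codec) and a packed-decimal (COMP-3) amount, both of which are common on IBM mainframes. Nothing here describes the actual New Jersey system.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()
    if sign < 0x0A or any(d > 9 for d in nibbles):
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in nibbles) or "0")
    if sign in (0x0B, 0x0D):  # B and D mark negative values
        value = -value
    # `scale` is the number of implied decimal places (the V in a PIC clause)
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Split one 33-byte record using the hypothetical layout described above."""
    if len(record) != 33:
        raise ValueError("expected a 33-byte record")
    return {
        "claimant_id": record[0:9].decode("cp037"),
        "name": record[9:29].decode("cp037").rstrip(),
        "weekly_amount": unpack_comp3(record[29:33], scale=2),
    }


# Build a sample record the way a mainframe file might hold it.
sample = (
    "A12345678".encode("cp037")
    + "JANE DOE".ljust(20).encode("cp037")
    + bytes.fromhex("0012345C")  # 123.45, positive
)
parsed = parse_record(sample)
```

Decoding one field is the easy part. The record gives no hint of its own layout, so every offset and scale has to come from the program source or from documentation, and that has to be repeated for every file the system touches.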