Comment by abigail95

2 days ago

Something is missing here, why do batch jobs take 13 hours? If this thing was started on an old mainframe why isn't the downtime just 5 minutes at 3:39 AM?

Exactly how much data is getting processed?

Edit: Why does rebuilding take a decade or more? This is not a complex system. It doesn't need to solve any novel engineering challenges to operate efficiently. Article does not give much insight into why this particular task couldn't be fixed in 3 months.

The batch jobs don't take 13 hours. They're just scheduled to run at some point during the night, when the old offices used to be closed and the jobs could be run with some expectation of data stability over the period. There are probably many jobs scheduled to run at 1AM, then 2AM, etc., each depending on the previous one having finished, so a large delay is built in to ensure that a job does not start before the previous one is done.
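
To make the scheduling point concrete, here is a minimal Python sketch. The job names, durations and start slots are all made up for illustration; nothing here is taken from the DVLA. It compares the length of the night-time window when each job starts at a fixed clock slot with the length when each job starts as soon as its predecessor finishes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration_min: int  # typical run time in minutes (hypothetical)
    slot_min: int      # fixed start, minutes after the window opens (hypothetical)


# Hypothetical nightly chain: each job consumes the previous job's output.
JOBS = [
    Job("extract_renewals", 20, 0),
    Job("apply_tax_rules", 35, 60),
    Job("update_register", 15, 120),
    Job("print_queue", 10, 180),
]


def fixed_slot_window(jobs):
    """Window length when every job starts at its fixed clock slot."""
    return max(j.slot_min + j.duration_min for j in jobs)


def chained_window(jobs):
    """Window length when each job starts as soon as its predecessor ends."""
    return sum(j.duration_min for j in jobs)


def slots_are_safe(jobs):
    """True if no slot opens before the previous job has finished."""
    return all(
        prev.slot_min + prev.duration_min <= nxt.slot_min
        for prev, nxt in zip(jobs, jobs[1:])
    )


if __name__ == "__main__":
    print(fixed_slot_window(JOBS), chained_window(JOBS), slots_are_safe(JOBS))
```

With these invented numbers the fixed slots give a 190-minute window for 80 minutes of actual work. The padding is what buys safety when nothing checks whether the previous job has finished.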

As to your "not a complex system" remark, when a system is built for 60 years, piling up new rules to implement new legislation and needs over time, you tend to end up with a tangled mess of services all interdependent that are very difficult to replace piece-wise with a new shiny architecturally pure one. This is closer to a distributed monolith than a microservices architecture. In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.

  • Anyone who doesn't understand what's so difficult should read this:

    https://wiki.c2.com/?WhyIsPayrollHard

    It's from a different domain, but it gives you a flavour of the headaches you encounter. These systems always look simple from the outside, but once you get inside you find endless reams of interrelated and arbitrary business rules that have accumulated. There is probably no complete specification (unless you count the accumulated legal, regulatory and procedural history of the DVLA), and the old code will have little or no accurate documentation (if you are lucky there will be comments).

    • Basically this. The people running the show would desperately like to make it simpler, but ultimately it’s left overly complicated due to priorities from past leadership well above our paygrade.

      The right solution is always to just rip off the bandaid and do it again by hand in a new language or platform, and to eliminate useless complexity while doing so. Unfortunately no leader would ever do this because the Board and/or Shareholders would crucify them for not outsourcing it to McKinsey first and using the fancy-pants automation tool their report recommended.


  • The code will be spaghettified and hideous. The queries will be nonsense.

    That doesn't change the fact that the ultimate goal of the system is to manage drivers licenses.

    > In my experience you can't rebuild such a thing "in 3 months".

    Me and my team rebuilt the core stack for the central bank of a developing country. In 3 months. The tech started in the 70s just like this. Think bigger.

    • Good for you, it means either your system was sufficiently simple to be fully implemented in 3 months from scratch including all business rules, or you built a new system which left out some of the rules from the original system without this posing a problem. I don't know much about how central banks work so that might be possible. But not all systems present those characteristics.

    • For some reason those comments always seem to imply that every business doesn't have these problems too.

      Every business has these problems. In most cases, the ones who don't change get swept away. The places that do not change are usually ones that can't go out of business. But every place has systems like this, you have to rebuild them, it isn't fun but there is no choice.

      The idea that a tiny system like the DVLA is complex is hilarious. This is the same place that has had to reduce service provision because some staff just stopped turning up for weeks after Covid. Public-sector productivity in the UK is at the same level as 1997; just to get to the same level as the private sector, which isn't growing productivity very fast, you would need to fire ~2m workers (the total workforce in the UK is 30m, btw).


    • > Think bigger.

      One of the easier parts of this involves addressing, which in the UK is notoriously easy, reliable, and easy to process - especially the best-in-class Ordnance Survey stuff like AddressBase Premium, right?

      A quick trawl of GitHub will shed some light on it - especially how much of a pain it is to get ABP into a usable state - and this is data that's core and integral to the service, the "are you a real user, a typo, a fraudster, a data supply issue, or getting things wrong in good or bad faith?" kind of business logic.

      And it's doubly hard, because the government requires people to update their license when they change address - which often enough involves a new-build property, where the address (let alone UPRN - sometimes even the USRN!) is completely new to you.

      Thinking bigger: imagine sitting at your desk during the first couple of weeks on the job, database validation checks running merrily in the background while you're staring at a screen. There's a mild frown forming on your face. You'd been scrolling over a list of rejected records in front of you, largely looking good - _how did they miss THAT fraud_ you'd briefly chuckled to yourself - but _this_ one...

      It's a valid business entity, trading from the valid address, and you've hand-checked both _and_ got a junior who lives nearby to send you a photo of it, and, well, the wit running the business has decided to trade under the name _FUCKOFFEE_, and... that's... just going to have to be someone else's problem, you shrug.

      (to be clear: the hard part of the DVLA project is _not_ implementing the coding, database, and systems design work)


    • Yeah, I always raise an eyebrow at attitudes like that too.

      I've also reimplemented or gradually replaced several out-of-date systems. Albeit on a smaller scale.

      In my experience, when you start picking the programs apart you find 90% of the code is redundant or boilerplate. Much of it isn't even called from anywhere, abandoned code, and can be deleted en masse. A lot of programmers don't clean code up "just in case" and then no-one else deletes it.

      They can also often be vastly simplified because programmers back then didn't have the patterns and knowledge to write concisely.

      I often find myself simplifying the original code first, which gets rid of 50% of it. Then I can see what the code actually does and rewrite it which gets rid of the other 40%.

      On the other hand, many programmers don't have the patience, stubbornness or skill to do this kind of work.

      And the ability to get through the major panic you have when you're halfway through and wondering if you were mad to even start.


  • > In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.

    Rebuilding a legacy system doesn't require you to support every single edge case that the older system did. It's okay to start off with some minor limitations and gradually add functionality to account for those edge cases.

    Furthermore, you've got a huge advantage when remaking something: you can see all the edge cases from the start, and make an ideal design for that, rather than bolting on things as you go (which is done in the case of many of these legacy systems, where functionality was added over time with dirty code in lieu of refactoring).

    • > Rebuilding a legacy system doesn't require you to support every single edge case that the older system did.

      Depends on context.

      This isn't some social media fun site where you can live with some rough edges; in this context an "edge case" may be someone with a health condition who is still entitled to a driver's license, or it could be someone who normally could get one but due to a health condition really shouldn't be allowed one!

    • This generally isn’t true in the case of government systems. For the most part they are performing tasks that are required by law, and it is not acceptable to stop some of them, even temporarily. If you’re lucky you can run the old and new systems side-by-side while the 100% feature migration occurs, but that isn’t always feasible.
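
      Running old and new side by side usually means feeding both the same inputs and looking at where they disagree. A minimal Python sketch of that idea follows. Both rule functions and the sample records are hypothetical stand-ins, not real DVLA rules; the "new" one deliberately forgets a special case to show what the comparison catches.

```python
def legacy_renewal_years(rec):
    # Hypothetical legacy behaviour, including a special case bolted on long ago.
    if rec.get("medical_review"):
        return 1
    return 3 if rec["age"] >= 70 else 10


def new_renewal_years(rec):
    # Hypothetical rewrite that missed the medical special case.
    return 3 if rec["age"] >= 70 else 10


def shadow_compare(records, old, new):
    """Run both implementations on the same inputs and return the disagreements."""
    mismatches = []
    for rec in records:
        expected, actual = old(rec), new(rec)
        if expected != actual:
            mismatches.append((rec, expected, actual))
    return mismatches


# Invented sample records for illustration only.
SAMPLE = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 72},
    {"id": 3, "age": 45, "medical_review": True},
]

if __name__ == "__main__":
    for rec, expected, actual in shadow_compare(
        SAMPLE, legacy_renewal_years, new_renewal_years
    ):
        print(f"record {rec['id']}: legacy={expected} new={actual}")
```

      The catch is that this only finds edge cases that actually show up in the inputs you replay, which is one reason the side-by-side period tends to be long.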


Per their own data, the DVLA are responsible for the records of 52 million drivers and 46 million vehicles. Those records are immensely complex, because they reflect decades of accumulated legislation, regulation and practice. Every edge case has an edge case.

There's someone, somewhere in the bowels of the DVLA who understands the rules for drivers with visual field defects who use a bioptic device. There's someone who knows which date code applies to a vehicle that has been built with a brand new kit chassis but an old engine and drive train. There's someone who understands the special rates of tax that apply to goods vehicles that are solely used by showmen, or are based on certain offshore islands. God help any outsider who has to condense all of that institutional knowledge into a working piece of software.

Government does not have a good track record of ground-up refactors of complex IT systems. The British government in particular does not have a good track record. Considering all that, the fact that most interactions with DVLA can be done entirely online is borderline miraculous.

https://assets.publishing.service.gov.uk/media/675ad406fd753...

  • I would be really really surprised if the database actually encodes all of these edge cases you are thinking about in a structured way. In other words, I really doubt there's code like `if engine_age > drivetrain_age` or whatever.

    • The point is until you start ripping the application apart you have no idea what the internal logic looks like.

      When you look you can find terrors that will haunt you in the night, where some ancient limit (say, the number of columns in a database) means a single column ends up holding multiple structures that get if/then'd later in the application.

      I would completely and totally believe there is code just like that.
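
      For anyone who hasn't met one of these, a minimal Python sketch of an overloaded column. The field layout, the prefixes and the sample values are entirely invented for illustration; I have no idea what the DVLA schema looks like.

```python
def decode_overloaded_field(raw: str) -> dict:
    """Decode a hypothetical fixed-width column reused for three different things.

    Invented layout: the first character says which structure the rest holds.
    """
    kind, payload = raw[:1], raw[1:].strip()
    if kind == "E":
        # Engine build date packed as YYYYMM.
        return {
            "type": "engine_date",
            "year": int(payload[:4]),
            "month": int(payload[4:6]),
        }
    if kind == "T":
        # Tax class code.
        return {"type": "tax_class", "code": payload}
    if kind == " ":
        # Rows that predate the reuse: plain free text.
        return {"type": "note", "text": payload}
    raise ValueError(f"unknown field layout: {raw!r}")


if __name__ == "__main__":
    for value in ["E198703     ", "TSHOWMAN    ", " old note   "]:
        print(decode_overloaded_field(value))
```

      Every program that reads the column has to repeat this branching, and nothing in the database schema tells you that it exists.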


Our systems took 8 hours to back up. Then it grew to 12 hours [0]. The system was a side project by an intern fresh out of college. Over the years, it grew into a crucial piece of software the company relied on. I joined over 10 years later and was able to bring it down to a few minutes.

[0]: https://news.ycombinator.com/item?id=38456429

It’s funny to me that I would never ask those questions. I’ve specialized in legacy rehab projects (among other things) and there seems to be no upper bound on how bad things can be or how many annoying reasons there are for why we can’t “just fix it.” Those “just” questions—which I ask too—end up being hopelessly naive. The answers will crush your soul if you let them, so you can’t let them, and you should always assume things are worse than you think.

TFA is spot on - the way to make progress is to cut problems up and deliver value. The unfortunate consequence is that badness gets more and more concentrated into the systems that nobody can touch, sort of like the evolution of a star into an eventual black hole.

  • I made a lot of money moving mid size enterprises from legacy ERP systems to custom in house ones.

    The DVLA dataset and the computations that are run on it can be studied and replicated in 3 months by a competent team. From there it can be improved.

    There is no way that this system requires 13 hours of downtime. If it required two hours - even if the code was generated through automation it can be reverse engineered and optimized.

    It is absolute rubbish that this thing is still unavailable outside of 8am-7pm.

    I maintain my position that it could be replaced in 3 months.

    I got my start in this business when I was in university and they told us our online learning software was going offline for 3 days for an upgrade. Those are the gatekeepers and low achievers we fight against. Think bigger.

    • Ya I don't think I'd let you within two miles of a system like this.

      Replacing legacy stuff always expands in scope far beyond the initial changes.

      Having to come back and add wait() entries to your new program, because it spits data back faster than the old program ever could, which then causes peripheral devices/drivers to crash, which then pulls a dev and testers off something else important for days figuring out what kind of fresh hell is occurring, is just the status quo for ancient systems.
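
      A minimal Python sketch of that kind of retrofit, assuming the downstream device needs some minimum gap between messages. The class name, the interval and the injectable clock and sleep functions are my own choices for illustration, not anything from a real system.

```python
import time


class PacedSender:
    """Forward items no faster than a downstream device is assumed to tolerate."""

    def __init__(self, send, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._send = send                      # callable that delivers one item
        self._min_interval_s = min_interval_s  # assumed minimum gap between sends
        self._clock = clock
        self._sleep = sleep
        self._last_sent = None

    def send(self, item):
        now = self._clock()
        if self._last_sent is not None:
            remaining = self._min_interval_s - (now - self._last_sent)
            if remaining > 0:
                # The wait the old system provided for free by being slow.
                self._sleep(remaining)
                now = self._clock()
        self._send(item)
        self._last_sent = now


if __name__ == "__main__":
    sender = PacedSender(print, min_interval_s=0.2)
    for line in ["first", "second", "third"]:
        sender.send(line)
```

      Passing the clock and sleep in as parameters is only there so the pacing can be tested without really waiting.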


    • > The DVLA dataset and the computations that are run on it can be studied and replicated in 3 months by a competent team. From there it can be improved.

      Such an HN comment. Made me lol. Think funnier!

> Edit: Why does rebuilding take a decade or more? This is not a complex system. It doesn't need to solve any novel engineering challenges to operate efficiently. Article does not give much insight into why this particular task couldn't be fixed in 3 months.

You do know the UK government has been cutting all their budgets to the bone for about 10 years? That means everywhere is pretty much understaffed.

And how do you know it's not a complex system? I would think that a system like that would be somewhat complex. It's not just driving licenses but a whole bunch of other things that are handled by the DVLA.

  • Public-sector employment in the UK is at record highs. Despite apparently cutting inputs, productivity has collapsed to the same level as 1997 in the public sector. It is wildly overstaffed/overfunded by any estimation (and to be clear, there have been no cuts...the cuts in the early 2010s were not particularly significant, around 2-3% of GDP, public spending is as high as it has ever been in the UK, it was significantly lower under Blair...the only time it has reached this level is WW2 and 1975, the financial year the UK govt was bankrupted).

    DVLA isn't complex. We live in a world of regulation, rules, and standards. Almost every large business does stuff like this at a global scale. It isn't complex, it just has to be complex so the budget is filled (and Fujitsu can get their contract).

    • This is what we call cherry-picking stats. You pick a single stat but ignore everything else. Your comment seems to imply that the UK government has been lying about austerity for 10 years. While I don't trust the tory party and think they're corrupt with their deals-for-buddies approach, I don't think they outright lie for 10 years.

      The policing budget is so bare-bones that the police have literally admitted they will not attend all 999 calls. To make that clear, they have admitted that if you call them in an emergency they may not show up. NHS waiting times are sky-high. The number of NHS hospitals and beds is rock bottom. We can dig and dig into various public sectors and see them being terrible because of austerity, which Labour is kinda being forced to continue due to the effects of Covid (and Brexit) on the finances of the UK.

    • Public sector employment is at a high because we had to hire thousands of staff due to Brexit… probably the most stupid productivity destroying own goal a country has ever committed (so far)

  • The system may or may not be complex but the data it has to store and transform is not. Because it handles drivers licenses. A function that has been done on pen and paper and filing cabinets.

    Study the data, study the operations, reduce complexity.

    Since you imply you know more about UK budgets than I do - how much is the DVLA budgeted for IT operations like this and how much more would you give them to expect this problem solved?

    I can argue real numbers but vibes about bone dry budgets I cannot.

    • Are you suggesting that a process once done using pen and paper can't possibly be complicated?

      I have no insight into the DVLA, but the idea that no paper process could ever be complicated is really funny. The UK enjoyed/loathed centuries of bureaucracy before computers were invented. At one point getting a divorce required an Act of Parliament specifically naming the unhappy couple! Being restricted to pen and paper hardly inhibited the human ability to create complex systems.

    • > The system may or may not be complex but the data it has to store and transform is not. Because it handles drivers licenses. A function that has been done on pen and paper and filing cabinets.

      It handles more than just driving licenses... The DVLA do more than just driving licenses.

      > Since you imply you know more about UK budgets than I do - how much is the DVLA budgeted for IT operations like this and how much more would you give them to expect this problem solved?

      It's not budgeted anything for this as far as I know. I believe it's handled by Government Digital Services which handles lots of the digital services for various departments. The budget for all of GDS is about 90 million most of which isn't for .gov.uk. A rewrite of that size I would expect to cost about 50-60 million in total but take several years.


  • I think it's "local councils, yes cut to the bone; Whitehall and its satellites, no". Similarly, Whitehall came out of the Thatcher and Major ministries with more regulators for privatized industries, more centralized decision-making, and a larger bureaucracy than ever.

    • Central government departments have been cut too… the staff needed for the Brexit disaster disguise this

The problem is that the full set of specifications accumulated over three decades of usage is exactly as complicated as the code that still runs.

Just wait 10 more years and hope AI can solve it.

In the meantime, people can't renew their driver's license at 3:36. So what? Is that a requirement?