Comment by ajnin

2 days ago

The batch jobs don't take 13 hours. They're just scheduled to run at some time during the night, when the old offices used to be closed and the jobs could be run with some expectations regarding data stability over the period. There are probably many jobs scheduled to run at 1AM, then 2AM, etc., each depending on the previous one having finished, so there is a large delay built in to ensure that a job does not start before the previous one is finished.
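
To make that concrete, here is a minimal sketch of the clock-based approach. The job names and the one-hour slots are made up for illustration; nothing here is taken from the actual DVLA setup.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly jobs: each job maps to the set of jobs it depends on.
JOBS = {
    "extract_records": set(),
    "apply_rule_updates": {"extract_records"},
    "generate_renewals": {"apply_rule_updates"},
    "print_letters": {"generate_renewals"},
}


def run_order(jobs):
    """Return an order in which every job comes after its dependencies."""
    return list(TopologicalSorter(jobs).static_order())


def fixed_slot_schedule(jobs, start_hour=1, slot_hours=1):
    """Mimic the clock-based approach: one job per time slot, padded so the
    previous job has (hopefully) finished before the next one starts."""
    return {
        name: (start_hour + i * slot_hours) % 24
        for i, name in enumerate(run_order(jobs))
    }


schedule = fixed_slot_schedule(JOBS)
```

With four chained jobs the last one starts at 4AM even if each job only takes minutes, which is how a short workload ends up occupying most of the night.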

As to your "not a complex system" remark, when a system is built for 60 years, piling up new rules to implement new legislation and needs over time, you tend to end up with a tangled mess of services all interdependent that are very difficult to replace piece-wise with a new shiny architecturally pure one. This is closer to a distributed monolith than a microservices architecture. In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.

Anyone who doesn't understand what's so difficult should read this:

https://wiki.c2.com/?WhyIsPayrollHard

It's from a different domain, but it gives you a flavour of the headaches you encounter. These systems always look simple from the outside, but once you get inside you find endless reams of interrelated and arbitrary business rules that have accumulated. There is probably no complete specification (unless you count the accumulated legal, regulatory and procedural history of the DVLA), and the old code will have little or no accurate documentation (if you are lucky there will be comments).

  • Basically this. The people running the show would desperately like to make it simpler, but ultimately it’s left overly complicated due to priorities from past leadership well above our paygrade.

    The right solution is always to just rip off the bandaid and do it again by hand in a new language or platform, and to eliminate useless complexity while doing so. Unfortunately no leader would ever do this because the Board and/or Shareholders would crucify them for not outsourcing it to McKinsey first and using the fancy-pants automation tool their report recommended.

    • There are a few shareholder-friendly patterns to get this done, but it is domain-specific. I’d say it’s more “rip off the bandaid slowly and carefully”.

      E.g. a common one is to wrap a new no-op service around the old one, and gradually replace parts of the old one (the “strangler fig pattern”).

      This is technically great, but it’s also financially great because you aren’t spending large sums on a big-bang rewrite. You’re spending relatively small sums on a “pay as you go” basis, something board members and shareholders do like.

      But of course this depends on how your systems are set up.
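
      A minimal sketch of the strangler fig idea, assuming requests can be routed by an operation name. The facade, the handlers and the operation names are all hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def legacy_system(request: dict) -> dict:
    """Stand-in for the old system (hypothetical)."""
    return {"handled_by": "legacy", "operation": request["operation"]}


def new_change_of_address(request: dict) -> dict:
    """Stand-in for one piece that has been rewritten (hypothetical)."""
    return {"handled_by": "new", "operation": request["operation"]}


class StranglerFacade:
    """Starts as a no-op wrapper that forwards everything to the legacy
    system, then takes over one operation at a time."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        self._migrated[operation] = handler

    def handle(self, request: dict) -> dict:
        # Fall back to the legacy system for anything not migrated yet.
        handler = self._migrated.get(request["operation"], self._legacy)
        return handler(request)


facade = StranglerFacade(legacy_system)
before = facade.handle({"operation": "change_of_address"})
facade.migrate("change_of_address", new_change_of_address)
after = facade.handle({"operation": "change_of_address"})
untouched = facade.handle({"operation": "renew_licence"})
```

      Each `migrate` call is a small, separately fundable step, and anything not yet migrated keeps going to the old system.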

    • Well, that, and any organization that has gotten itself into this situation tends to have a very strong risk-aversion principle. Which means they _can't_ approve something like this organisationally, since there is simply too much risk embedded and someone has to accept it.

The code will be spaghettified and hideous. The queries will be nonsense.

That doesn't change the fact that the ultimate goal of the system is to manage driver's licenses.

> In my experience you can't rebuild such a thing "in 3 months".

Me and my team rebuilt the core stack for the central bank of a developing country. In 3 months. The tech started in the 70s just like this. Think bigger.

  • Good for you, it means either your system was sufficiently simple to be fully implemented in 3 months from scratch including all business rules, or you built a new system which left out some amount of rules from the original system without this posing a problem. I don't know much about how central banks work so that might be possible. But not all systems present those characteristics.

  • For some reason those comments always seem to imply that every business doesn't have these problems too.

    Every business has these problems. In most cases, the ones who don't change get swept away. The places that do not change are usually ones that can't go out of business. But every place has systems like this, you have to rebuild them, it isn't fun but there is no choice.

    Calling a tiny system like the DVLA complex is hilarious. This is the same place that has had to reduce service provision because some staff just stopped turning up for weeks after Covid. Public-sector productivity in the UK is at the same level as in 1997; just to get to the same level as the private sector (which isn't growing productivity very fast) you would need to fire ~2m workers, and the total workforce in the UK is 30m, btw.

    • I worked for an aircraft parts manufacturer, they closed an entire factory / production site rather than try and upgrade the manufacturing system or move the part production onto the new one they had implemented.

      500 people out of work. Tell me again how simple everything is to fix.

  • > Think bigger.

    One of the easier parts of this involves addressing, which in the UK is notoriously easy, reliable, and easy to process - especially the best-in-class Ordnance Survey stuff like AddressBase Premium, right?

    A quick trawl of GitHub will shed some light on it - especially how much of a pain it is to get ABP into a usable state - and this is data that's core and integral to the service, the "are you a real user, a typo, a fraudster, a data supply issue, or getting things wrong in good or bad faith?" kind of business logic.

    And it's doubly hard, because the government requires people to update their license when they change address - which often enough involves a new-build property, where the address (let alone UPRN - sometimes even the USRN!) is completely new to you.

    Thinking bigger: imagine sitting at your desk during the first couple of weeks on the job, database validation checks running merrily in the background while you're staring at a screen. There's a mild frown forming on your face. You'd been scrolling over a list of rejected records in front of you, largely looking good - _how did they miss THAT fraud_ you'd briefly chuckled to yourself - but _this_ one...

    It's a valid business entity, trading from the valid address, and you've hand-checked both _and_ got a junior who lives nearby to send you a photo of it, and, well, the wit running the business has decided to trade under the name _FUCKOFFEE_, and... that's... just going to have to be someone else's problem, you shrug.

    (to be clear: the hard part of the DVLA project is _not_ implementing the coding, database, and systems design work)

    • You've sort of identified how to do it: break it up into problems.

      Addresses are hard? Use https://postcodes.io or make your own - that's a project in its own right.

      Separating out trading names from registered names needs an API from Companies House, or an internal service that API-ifies Companies House data.

      Fraud detection? That needs to sit somewhere - let's break out all the fraud detection into a separate system that can talk to the other systems, and have it running continuously over the data. It'll need people to update fraud queries and also to make sure the other systems' data stays integrated with it.

      Finally you need something on top that orchestrates the services and exposes them via a gov.uk website, and copes with things like "I don't have my address yet; can I use What3Words instead?" and another one with a UI and lots of RBAC and approvals for DVLA users to do lookups and internal admin.
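
      Very roughly, the split above could look like the sketch below. Every service here is a hypothetical stub with made-up data; the real ones would sit behind their own APIs, and none of these names come from postcodes.io or Companies House.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Application:
    name: str
    postcode: str


def lookup_address(postcode: str) -> Optional[str]:
    """Hypothetical address service: returns a town, or None if unknown."""
    known = {"AB1 2CD": "Exampleton"}  # illustrative data only
    return known.get(postcode)


def fraud_flags(app: Application) -> List[str]:
    """Hypothetical fraud service: returns a list of reasons for review."""
    return ["name_missing"] if not app.name.strip() else []


def process(app: Application,
            address_service=lookup_address,
            fraud_service=fraud_flags) -> dict:
    """Orchestrator: calls each service and combines the results."""
    town = address_service(app.postcode)
    flags = list(fraud_service(app))
    if town is None:
        # e.g. a new-build property that the address data doesn't know yet
        flags.append("address_not_found")
    return {"accepted": not flags, "flags": flags, "town": town}


ok = process(Application("A. Driver", "AB1 2CD"))
new_build = process(Application("A. Driver", "ZZ9 9ZZ"))
```

      Passing the services in as parameters is just one way to keep each piece replaceable on its own schedule.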

  • Yeah, I always raise an eyebrow at attitudes like that too.

    I've also reimplemented or gradually replaced several out-of-date systems. Albeit on a smaller scale.

    In my experience, when you start picking the programs apart you find 90% of the code is redundant or boilerplate. Much of it isn't even called from anywhere, abandoned code, and can be deleted en masse. A lot of programmers don't clean code up "just in case" and then no-one else deletes it.

    They can also often be vastly simplified because programmers back then didn't have the patterns and knowledge to write concisely.

    I often find myself simplifying the original code first, which gets rid of 50% of it. Then I can see what the code actually does and rewrite it which gets rid of the other 40%.

    On the other hand, many programmers don't have the patience, stubbornness or skill to do this kind of work.

    And the ability to get through the major panic you have when you're half way through and wondering if you were mad to even start.
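
    On the code that isn't called from anywhere: a crude first pass at finding it can be sketched like this, assuming the code is Python and lives in one module. The sample source and the fee values are made up.

```python
import ast


def unreferenced_functions(source: str) -> set[str]:
    """Return names of functions that are defined but never referenced by
    name in the same source. A heuristic only: it cannot see dynamic
    dispatch, reflection, or callers in other modules."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return defined - referenced


SAMPLE = '''
def calc_fee(age):
    return 14 if age < 70 else 0

def calc_fee_old(age):
    return 12

def main():
    print(calc_fee(30))

main()
'''

dead = unreferenced_functions(SAMPLE)
```

    Anything it reports still needs a human check before deletion, which is where the patience and stubbornness come in.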

    • > And the ability to get through the major panic you have when you're half way through and wondering if you were mad to even start.

      I feel seen, thank you.

> In my experience you can't rebuild such a thing "in 3 months". People who believe that are those that don't realize the complexity and the extraordinary amount of specifics, special cases, that are baked into the system, and any attempt to just rebuild from scratch in a few months hits that wall and ends up taking years.

Rebuilding a legacy system doesn't require you to support every single edge case that the older system did. It's okay to start off with some minor limitations and gradually add functionality to account for those edge cases.

Furthermore, you've got a huge advantage when remaking something: you can see all the edge cases from the start, and make an ideal design for that, rather than bolting on things as you go (which is done in the case of many of these legacy systems, where functionality was added over time with dirty code in lieu of refactoring).

  • > Rebuilding a legacy system doesn't require you to support every single edge case that the older system did.

    Depends on context.

    This isn't some social media fun site where you can live with some rough edges; in this context an "edge case" may be someone with a health condition who is still entitled to a driver's license; or it could be someone who normally could get one but due to a health condition really shouldn't be allowed one!

  • This generally isn’t true in the case of government systems. For the most part they are performing tasks that are required by law, and it is not acceptable to stop some of them, even temporarily. If you’re lucky you can run the old and new systems side-by-side while the 100% feature migration occurs, but that isn’t always feasible.
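
    A minimal sketch of what running side-by-side can mean in practice: keep serving the old system's answer and log every case where the new one disagrees. The entitlement rules below are invented for the example and are not the real ones.

```python
def legacy_entitled(applicant: dict) -> bool:
    """Stand-in for the old system's decision (hypothetical rules)."""
    if applicant["age"] < 17:
        return False
    return not applicant.get("medical_bar", False)


def new_entitled(applicant: dict) -> bool:
    """Stand-in for an incomplete rewrite that forgot the medical rule."""
    return applicant["age"] >= 17


def parallel_run(cases, old, new):
    """Compare old and new decisions; return the cases where they differ."""
    mismatches = []
    for case in cases:
        old_result = old(case)  # this is the answer that is actually served
        if new(case) != old_result:
            mismatches.append(case)
    return mismatches


cases = [
    {"id": 1, "age": 16},
    {"id": 2, "age": 40},
    {"id": 3, "age": 52, "medical_bar": True},
]
mismatches = parallel_run(cases, legacy_entitled, new_entitled)
```

    Here only the third case differs, which is exactly the kind of edge case that cannot be dropped, even temporarily.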

    • Ya it's funny looking at all these 'business' programmers who, if the application doesn't work, can just lose the customer/money to a competitor. In regulated stuff you have to serve everyone. Much worse, if your systems don't work there are potential consequences where people die and/or there are riots in the street.
