Comment by abigail95
2 days ago
Do you know why the downtime window hasn't been shrinking as the system gets deployed onto faster hardware over the years?
Nobody would care or notice if this thing had 99.5% availability and went read only for a few minutes per day.
Most likely because it's not just a single batch job, but a whole series of them, each scheduled based on a rough estimate of how long the jobs around it will take.
For example, imagine it's 1997 and you're creating a job which produces a summary report based on the total number of cars registered, grouped by manufacturer and model.
Licensed car dealers can submit updates to the list of available models by uploading an EDIFACT file using FTP or AS1. Those uploads are processed nightly by a job which runs at 0247. You check the logs for the past year, and find that this usually takes less than 5 minutes to run, but has on two occasions taken closer to 20 minutes.
Since you want to have the updated list of models available before you run your summary job, you therefore schedule it to run at 0312 - leaving a gap of 25 minutes just in case. You document your reasoning as a comment in the production control file used to schedule this sequence of jobs.
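To make the arithmetic concrete, here's a minimal Python sketch of that reasoning (the times and margins follow the example above; everything else is illustrative, not anything the DVLA actually uses):

    from datetime import datetime, timedelta

    # Hypothetical figures from a year of logs: the ingest job
    # usually finishes in under 5 minutes, worst case ~20.
    worst_observed = timedelta(minutes=20)
    safety_margin = timedelta(minutes=5)

    ingest_start = datetime.strptime("02:47", "%H:%M")

    # Start the summary report after the worst observed runtime
    # plus a little slack: 02:47 + 20 + 5 = 03:12.
    summary_start = ingest_start + worst_observed + safety_margin
    print(summary_start.strftime("%H:%M"))  # prints 03:12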
Ten years later, and manufacturers can now upload using SFTP or AS2, and you start thinking about ditching EDIFACT altogether and providing a SOAP interface instead. In another ten years you switch off the FTP facility, but still accept EDIFACT uploads via AS2 as a courtesy to the one dealership that still does that.
Another eight years have passed. The job which ingests the updated model data is now a no-op and reliably runs in less than a millisecond every night. But your summary report is still scheduled for 0312.
And there might well be tens of thousands of jobs, each with hundreds of dependencies. Altering that schedule is going to be a major piece of work in itself.
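For a sense of what fixing it would mean: the obvious modernisation is to trigger each job off its actual dependencies rather than off hard-coded clock times. The scheduling logic itself is trivial; here's a toy sketch in Python (the graph is made up and reflects nothing about the real system):

    from graphlib import TopologicalSorter

    # Toy dependency graph: job -> the jobs it must wait for.
    deps = {
        "ingest_models": set(),
        "summary_report": {"ingest_models"},
        "print_run": {"summary_report"},
    }

    # Run each job as soon as its predecessors finish, instead of
    # at a fixed time derived from a 1997 worst-case estimate.
    for job in TopologicalSorter(deps).static_order():
        print("run", job)

The hard part isn't this loop; it's recovering tens of thousands of dependency edges that today exist only as clock times and comments in control files.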
Maybe it isn't running on faster hardware? These systems are often horrifyingly outdated.
Or maybe it is running on faster hardware, but the UK budget office decided not to pay IBM's fees required to make use of the extra speed, so it has been "throttled" to run at the same speed that it ran on the old hardware.
Why would they spend the money to deploy it on faster hardware when the new cloud-based system rewrite is just around the corner? It's just 3 months away, this time, for sure...
It doesn't get deployed onto faster hardware. Mainframes haven't really got faster.
Mainframes have absolutely gotten faster. They're basically small supercomputers.
You're mistaken about this. IBM's z-series had 5GHz CPUs well over a decade ago and they haven't gotten any slower.
It must be. Maintaining the original hardware would be more expensive than upgrading to compatible but faster systems.
What compatible systems? Mainframes are maintained in more or less their original state by teams from IBM. They are designed to be single machines that scale vertically and never shut down; every component can be hot-swapped, including CPUs, but IBM charge a lot for CPU capacity, if I recall correctly. Given that nighttime doesn't get shorter, the DVLA probably don't see much reason to pay a lot more for a slightly smaller window.
And mainframes from the 80s are slow. It sounds like they're running on the original.