Comment by ndriscoll

2 days ago

Getting rid of batch jobs shouldn't be a goal; batch processing is generally more efficient because fixed costs get amortized, caches get better hit ratios, etc.

What software engineers should understand is that there's no reason a batch can't take 3 ms to process and run every 20 ms. "Batch" and "real-time" aren't antonyms. In a language/framework with promises and thread-safe queues, it's easy to turn a real-time API into a batch one, possibly giving an order-of-magnitude increase in throughput.
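
To make that concrete, here's a rough sketch of the idea in Python with asyncio, where futures play the role of promises. Everything here is made up for illustration: `MicroBatcher`, `submit`, and the `batch_fn` callback are hypothetical names, and I'm assuming `batch_fn` is an async function that returns one result per input, in input order.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: callers submit one item, the backend sees batches."""

    def __init__(self, batch_fn, interval=0.02, max_size=100):
        self._batch_fn = batch_fn    # assumed: async, one result per item, same order
        self._interval = interval    # how long to let callers pile up (20 ms here)
        self._max_size = max_size
        self._queue = asyncio.Queue()
        self._task = None

    async def submit(self, item):
        # Start the background loop lazily, on first use.
        if self._task is None:
            self._task = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut             # resolved when this item's batch completes

    async def _run(self):
        while True:
            batch = [await self._queue.get()]    # wait until there is work
            await asyncio.sleep(self._interval)  # give other callers time to join
            while len(batch) < self._max_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            try:
                results = await self._batch_fn([item for item, _ in batch])
                for (_, fut), result in zip(batch, results):
                    fut.set_result(result)
            except Exception as exc:
                # Fail every caller in the batch that has not been answered yet.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
```

Callers still write `result = await batcher.submit(x)` as if it were a per-item API, but the backend only ever sees grouped calls. The 20 ms interval and the size cap are the knobs you'd tune against your latency budget.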

Batch size is usually fixed by the business problem in these scenarios. I doubt you can process a batch in 3 ms if, for instance, the job requires reading in every driving license in the country and doing some work on each of them.

  • This particular thing might be difficult to change because it's 50-year-old COBOL or whatever, but my point was more that I've encountered pushes from architects to "eliminate batches" and it makes no sense. It just means that now I have to re-batch things in my code. The correct way to think about it is that you want smaller, more frequent batches.

    Do they really need to do work on all records every night? Probably not. Most people aren't changing their license or vehicle info most days. So the problem is that somewhere they're (conceptually) doing a table scan instead of using an index. That might still be hard to fix, but at least identify the correct problem. Otherwise, as you say, moving to different tech won't fix it.
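
As a toy illustration of "use an index instead of a table scan", here's a sketch with Python's built-in sqlite3. The `licenses` table, its `updated_at` column, and the function names are all invented for the example and have nothing to do with how any real licensing system is laid out. The idea is just that the job asks for rows changed since the last run, which an index on `updated_at` can answer without touching every record.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory licenses table with an index on updated_at."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE licenses (id INTEGER PRIMARY KEY, holder TEXT, updated_at INTEGER)"
    )
    # The index is what lets the job find recent changes without a full scan.
    conn.execute("CREATE INDEX idx_licenses_updated_at ON licenses (updated_at)")
    return conn


def changed_since(conn, last_run):
    """Return only the rows modified after the previous run's timestamp."""
    return conn.execute(
        "SELECT id, holder FROM licenses WHERE updated_at > ? ORDER BY id",
        (last_run,),
    ).fetchall()
```

The nightly job then processes `changed_since(conn, last_run)` and stores the new high-water mark, so the batch size tracks how much actually changed rather than how many records exist.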