
Comment by vasco

2 days ago

The pragmatic answer is Jenkins. Always has been.

Jenkins is a place where you can be safe for a long time. However, it starts to break down at scale. I see it time after time with these batch workflow jobs. At the start, jobs run in seconds and everyone is happy.

Over time, jobs take long enough that you need to split them. Separate jobs are assigned slices of the original batch. Eventually there are so many slices that you create a Jenkins job whose sole responsibility is firing off these individual jobs.
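To make the slicing pattern concrete, here is a minimal Python sketch of that dispatcher job. It is only an illustration: `slice_batch`, `fan_out`, and the `trigger` callback are hypothetical names, and `trigger` stands in for whatever actually starts a downstream job. No real Jenkins API is called.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def slice_batch(items: Sequence[T], slice_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most slice_size items."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    for start in range(0, len(items), slice_size):
        yield items[start:start + slice_size]


def fan_out(
    items: Sequence[T],
    slice_size: int,
    trigger: Callable[[int, Sequence[T]], None],
) -> int:
    """The 'dispatcher' job: fire one downstream job per slice.

    trigger is a hypothetical callback (assumption) that would start
    the per-slice job. Returns the number of jobs fired.
    """
    fired = 0
    for index, chunk in enumerate(slice_batch(items, slice_size)):
        trigger(index, chunk)
        fired += 1
    return fired


if __name__ == "__main__":
    started: List[str] = []
    count = fan_out(
        list(range(7)),
        3,
        lambda i, chunk: started.append(f"slice {i}: {list(chunk)}"),
    )
    print(count, started)
```

Note that nothing here decides where each slice runs or how many run at once. That part is left to the scheduler.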

Then you start hitting the real pain points in Jenkins: poor allocation of jobs across your nodes/agents, which often overloads CPU and memory on machines, and the struggle of managing the ungodly interface that is the Jenkins REST endpoint. You install many Jenkins plugins to try to address the scheduling problems, and end up with a team dedicated to managing this Jenkins infrastructure.

The scaling struggles keep piling up and you end up needing separate Jenkins instances to handle the load. Any attempt at replacing the Jenkins infrastructure comes to a standstill, as the number of random scripts found in Jenkinsfiles has created an insurmountable vendor lock-in.

You read a post about a select-for-update job scheduler and reflect on simpler times. You cry as you refactor your Jenkins Groovy DSL.
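For anyone who hasn't seen the pattern, a select-for-update scheduler boils down to workers claiming a job row inside a transaction so that no two workers take the same job. Below is a rough sketch, not the implementation from the post. It uses Python's `sqlite3` so it runs anywhere, and since SQLite has no `SELECT ... FOR UPDATE`, it swaps in `BEGIN IMMEDIATE` (which takes the write lock up front) as a stand-in. On Postgres you would typically use `SELECT ... FOR UPDATE SKIP LOCKED` instead. The `jobs` table and the function names are made up for illustration.

```python
import sqlite3
from typing import Optional, Tuple


def make_queue(conn: sqlite3.Connection) -> None:
    """Create a hypothetical jobs table (schema is an assumption)."""
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " worker TEXT)"
    )


def claim_next_job(
    conn: sqlite3.Connection, worker: str
) -> Optional[Tuple[int, str]]:
    """Atomically claim the oldest pending job, or return None.

    BEGIN IMMEDIATE is the SQLite stand-in for SELECT ... FOR UPDATE:
    it blocks other writers until COMMIT, so the select and the update
    happen as one unit.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    make_queue(conn)
    conn.execute("INSERT INTO jobs (payload) VALUES ('a'), ('b')")
    print(claim_next_job(conn, "worker-1"))
    print(claim_next_job(conn, "worker-2"))
    print(claim_next_job(conn, "worker-1"))
```

The appeal is that the whole scheduler is one table and one transaction. Priorities, retries, and timeouts are yours to add, which is also the cost.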

  • It’s actually much more common than you think for people to reuse CI systems for cron-style tasks.

    It’s always a mistake, but it’s easy in the moment and sticks around longer than I’d like.

    • CI systems like Jenkins are there and they're corp-approved.

      Getting a weird 3rd party scheduling system with access to internal stuff approved is HARD in big corps.

      So we (ab)use the CI system we have. It has scheduling and it already accesses internal resources.

  • What's the thing you should replace Jenkins with at scale?

    • I'm a firm believer that there will never be a perfect general-purpose job scheduler. The priority for how jobs are scheduled is always deeply coupled to your business needs. General-purpose schedulers always end up as a jack of all trades but master of none. With a custom-built scheduler you get that control, but you do have to reinvent the wheel for a lot of features. Jenkins, Argo, Airflow, Cron, etc. all have their own pros and cons.

Ugh no. It was good enough for its time, but times have moved on.

The danger is that it's so easy to start with and it's decent for small and simple applications. Once your jobs start growing, both in number of contributors and in workload, the problems start. The DSL is difficult to debug, plugins are buggy, and the brittle master node becomes your most precious pet, needing constant supervision so it doesn't grind the whole system to a halt. By the time you realize this, you have a hard time getting out of the lock-in.

Jenkins is terrible for just about everything. Cron has real problems, but at least you can version control the crontab. Jenkins is fat, hard to work with since you'll just have one shared instance, and everything is buried in special objects hidden behind a very unergonomic and undiscoverable web GUI.