Comment by MortyWaves

5 days ago

It’s why I’ve started making CI simply a script that I can run locally or on GitHub Actions etc.

Then the CI just becomes a bit of yaml that runs my script.
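
Roughly, the whole workflow ends up looking like this (a minimal sketch; the script name and triggers are placeholders rather than my actual setup):

    # .github/workflows/ci.yml
    name: CI
    on: [push, pull_request]
    jobs:
      ci:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # the exact same script I run locally
          - run: ./ci.sh

All the logic lives in the script, so debugging a CI failure starts with running ./ci.sh on my own machine.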

How does that script handle pushing to ghcr, or pulling an artifact from a previous stage for testing?

In my experience these are the bits that fail all the time, and they’re the most important parts of CI once your build takes more than 20–30 seconds.
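
Concretely, I mean steps like this (a sketch with made-up image names, not a real project), which work locally and then break in CI because the credentials and auth context are different:

    # the kind of step that fails, not the compile itself
    # (pushing to ghcr also needs the packages: write permission)
    - name: Push image
      run: |
        echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
        docker push ghcr.io/example-org/example-app:latest
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}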

A clean build of my project in an ephemeral VM would take about 6 hours on a 16-core machine with 64 GB of RAM.

  • Sheesh. I've got a multimillion-line modern C++ project that consists of a large number of dylibs and a few hundred delivered apps. A completely cache-free build takes only a few minutes. Incremental and clean (cached) builds are seconds, or hundreds of milliseconds.

    It sounds like you've got hundreds of millions of lines of code! (Maybe a billion!?) How do you manage that?

    • It’s a few million lines of C++ combined with content pipelines. Shader compilation is expensive and the tooling is horrible.

      Our cached builds on CI are 20 minutes from submit to running on Steam, which is OK. We also build with MSVC, so none of the normal ccache tooling works for us, which is super frustrating.


    • I have 15 million lines of C++, and builds are several hours. We split into multiple repos (for other reasons), and that helps because compiling is memory-bandwidth limited; on the CI system we can split the different repos across different CI nodes.

  • To be honest I haven’t really thought about it, and it’s definitely something my script can’t do; you’d probably need to call their APIs or something.

    I am fortunate in that the only thing I want to reuse is package manager caches.
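
    For those, the hosted cache action covers me; a sketch (the path and key depend on the package manager; these are npm-style placeholders):

        - uses: actions/cache@v4
          with:
            path: ~/.npm
            key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}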

    • That’s fair, but surely you must see that’s a very simple build.

      The complicated part comes when you have job A that builds and job B that deploys. They run on two different machine specs, so you’re not paying for a 16-core machine to sit idle for 5 minutes waiting on helm apply, which means they need somewhere secure to shuffle that artifact around (sketched below). The build machine’s access to that service is likely different from your local access, so you run your build locally and it’s fine, but then the build machine doesn’t have write access to the new path you’ve just tested, and it fails.

      90% of the time, this is where I see CI failures.
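
      As a sketch of the shape I mean (runner labels, script names, and the chart path are all invented):

          jobs:
            build:
              runs-on: big-16-core-runner        # the expensive builder
              steps:
                - uses: actions/checkout@v4
                - run: ./build.sh                # produces dist/app.tar
                - uses: actions/upload-artifact@v4
                  with:
                    name: app
                    path: dist/app.tar
            deploy:
              needs: build
              runs-on: ubuntu-latest             # small box that only runs helm
              steps:
                - uses: actions/download-artifact@v4
                  with:
                    name: app
                - run: helm upgrade --install example-app ./chart

      The artifact handoff and the differing credentials between those two machines are exactly the parts a run-it-locally script never exercises.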

You must be very lucky to be in a position where you know what needs to be done before the run begins. Not everyone is in that position.

At my place, we have ~400 wall-clock hours of testing, and my run begins by figuring out which tests should run and which can be skipped. This depends on many factors, and calculating the plan already involves talking to many external systems. Once we have figured out a plan for the tests, we can derive the plan for the build. Only then can we build, and test afterwards. I haven't been able to express all of that in "a bit of yaml" so far.
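
The overall shape is more a program than a pipeline definition. Very roughly, and with every name invented (the real planner talks to several external systems):

    # rough shape of the driver, not our real code
    changed=$(git diff --name-only origin/main...HEAD)
    test_plan=$(./plan-tests "$changed")      # consults external systems, prunes the ~400h
    build_plan=$(./plan-build "$test_plan")   # build targets derived from the test plan
    ./build "$build_plan"
    ./run-tests "$test_plan"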

Are you not worried about parallelisation in your case? Or have you solved that in another way (one big beefy build machine maybe?)

  • Honestly, not really… sure, it might not be as fast, but knowing I can debug it and build it exactly the same way locally is worth the performance hit. It probably helps that I don’t write C++, so builds are not a multi-day event!