Comment by jjmarr
13 hours ago
I'm hosting a Kubernetes cluster on Azure and trying to autoscale it to tens of thousands of vCPUs. The goal is to transparently replace dedicated developer workstations (edit: transparently replace compiling) because our codebase is really big and we've hired enough people this is viable.
edit: to clarify, I'm using recc which wraps the compiler commands like distcc or ccache. It doesn't require developers to give up their workspace.
Right now I'm using buildbarn. Originally, I used sccache but there's a hard cap on parallel jobs.
In terms of how LLMs help, they got me through all the gruntwork of writing jsonnet and dockerfiles. I have barely touched that syntax before so having AI churn it out was helpful to driving towards the proof of concept. Otherwise I'd be looking up "how do I copy a file into my Docker container".
AI also meant I didn't have to spend a lot of time evaluating competing solutions. I got sccache working in a day and when it didn't scale I threw away all that work and started over.
In terms of where the LLM fell short, it constantly lies to me. For example, it mounted the host filesystem into the docker image so it could get access to the toolchains instead of making the docker images self-contained like it said it would.
It also kept trying to not to the work, e.g. It randomly decides in the thinking tokens "let's fall back to a local caching solution since the distributed option didn't work" then spams me with checkmark emojis and claims in the chat message the distributed solution is complete.
A decent amount of it is slop, to be honest, but an 80% working solution means I am getting more money and resources to turn this into a real initiative. At which point I'll rewrite the code again but I'll pay closer attention now that I know docker better.
Isn't there a less convoluted way of making the best engineers leave? I am half serious here. If you want your software to run slow, IT could equally well install corporate security software on developer laptops. Oops, I did it again. Oh well, in all seriousness, I have never seen any performance problem being solved by running it on Azure's virtualization. I am afraid you are replacing the hardware layer by a software layer with ungodly complexity, which you are sure of will be functionally incomplete.
Are you sure they don't have to fix the build pipeline first? Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
> I have never seen any performance problem being solved by running it on Azure's virtualization
Sorry, I wasn't clear. I am not virtualizing the workspace. I'm using `recc` which is like `distcc` or `ccache` in that it wraps the compiler job. Every developer keeps their workstation. It just routes the actual `clang` or `gcc` calls to a Kubernetes cluster which provides distributed build and cache.
> Isn't there a less convoluted way of making the best engineers leave?
We have 7000+ compiler jobs in a clean build because it is a big codebase. People are waiting hours for CI.
I'm sure that drives attrition and bringing that down to minutes will help retain talent.
> Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
Because it uses remote execution, it will ideally do both. My belief is that an individual developer launching 6000 compiler jobs because they changed a header will smooth out over 300 developers that generally do incremental builds. Likewise, this'll eliminate redundant recompilation when git pulling since this also serves as a cache.
This makes absolutely no sense to me. Are you really recompiling 6000 things each time a dev in the company needs to add a line somewhere in the codebase? Have you thought about splitting that giant thing in smaller chunks?
2 replies →
You seem exceptionally bright. Most people are not like this. This is why they are struggling.
It sounds like you have a job, right out of college, but you're griping about not getting promoted faster. People generally don't get promoted 9 months into a job.
I'm reading your post and I am genuinely impressed but what you claim to have done. At the same time I am confused about what you would like to achieve within the first year of your professional career. You seem to be doing quite well, even in this challenging environment.
> At the same time I am confused about what you would like to achieve within the first year of your professional career.
I am in great fear of ending up on the wrong side of the K shaped recovery.
Everyone is telling me I need to be exceptional or unemployed because the middle won't exist in 2 years.
I want to secure the résumé that gives me the highest possibility of retaining emoloyment if there's a sudden AI layoff tomorrow. A fast career trajectory catches HR's eye even if they don't understand the technicals.