Comment by hansmayer
7 months ago
Sure man, maybe also share that bit with your clients and see how excited they'll be to learn their vital code or infrastructure may be designed by a stochastic system (*reliable a solid number of times).
My clients are perfectly happy about that, because they care about the results, not FUD. They know the quality of what I deliver from first-hand experience.
Human-written code also needs reviews, and is also frequently broken until it has been through testing, iteration, and review. So our processes are built around proper QA and proper reviews, and then the original source does not matter much.
It is, however, a lot easier to force an LLM into a straitjacket of enforced linters, enforced test-suite runs, enforced sanity checks, and enforced processes at a level that human developers would quit over. So as we build out the harness around the AI code generation, we're seeing the quality of that code increase a lot faster than the quality delivered by human developers. It still doesn't beat a good senior developer, but it does often deliver code that handles tasks I could never hand to my juniors.
(In fact, the harness I'm forcing my AI-generated code through was written about 95%+ by an LLM, iteratively, with its own code being forced through the verification steps with every new iteration after the first 100 lines of code or so.)
So to summarise - the quality of the code you generate with an LLM is increasing a lot faster, but somehow never reaching senior level. How is that a lot faster? I mean, if it never reaches the (fairly modest) goal. But that's not the end of it. Your mid-junior LLMs are also enforcing quality gates and harnesses on the rest of your LLM-mid-juniors. If only there was some proof for that, like a project demo, so it could at least look believable...
It's a lot faster compared to new developers, who still cost orders of magnitude more from day 1. It's not cost-effective to hand every task to someone senior. I still have juniors on teams because in the long term we still need actual people who need a path to becoming senior devs, but in financial terms they are now a drain.
You can feel free not to believe it, as I have no plans to open up my tooling anytime soon - though partly because I'm considering turning it into a service. In the meantime these tools are significantly improving the margins for my consulting, and the velocity increases steadily as every time we run into a problem we make the tooling revise its own system prompt or add additional checks to the harness it runs to avoid it next time.
A lot of it is very simple. E.g., a lot of these tools can produce broken edits. They'll usually realise and fix them, but adding an edit tool that forces the code through syntax checks / linters, for example, has saved a lot of pain. As does forcing regular test and coverage runs, not just on builds.
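To make the "edit tool that forces the code through syntax checks" idea concrete, here is a minimal sketch in Python. It is not my actual tooling: the names `apply_checked_edit` and `EditRejected` are made up for illustration, it only handles Python files, and it only does a parse check via the standard `ast` module where a real harness would also run linters and tests.

```python
import ast
from pathlib import Path


class EditRejected(Exception):
    """Hypothetical error type: the proposed edit was refused."""


def apply_checked_edit(path: Path, old: str, new: str) -> None:
    """Replace exactly one occurrence of `old` with `new`,
    but only write the file if the result still parses as Python."""
    source = path.read_text(encoding="utf-8")

    # Refuse ambiguous or missing targets instead of guessing.
    matches = source.count(old)
    if matches != 1:
        raise EditRejected(f"expected exactly one match for the target text, found {matches}")

    candidate = source.replace(old, new, 1)

    try:
        # Syntax gate: parse the whole candidate file before touching disk.
        ast.parse(candidate, filename=str(path))
    except SyntaxError as exc:
        # The message goes back to the model so it can retry with a fixed edit.
        raise EditRejected(f"syntax error at line {exc.lineno}: {exc.msg}") from exc

    path.write_text(candidate, encoding="utf-8")
```

The point of the design is that a rejected edit never reaches the file, so the model gets the parser's complaint immediately rather than discovering the breakage several steps later.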
For one of my projects I now let this tooling edit without asking permission, and just answer yes/no to whether it can commit once it's ready. If no, I'll tell it why and review again when it thinks it's fixed things, but a majority of commit requests are now accepted on the first try.
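The yes/no commit flow can be sketched roughly like this. Again, this is an illustration under assumptions, not the real tool: `request_commit`, `propose`, and `review` are hypothetical names, and the cap on rounds is something added here so the loop always terminates.

```python
from typing import Callable, Optional, Tuple


def request_commit(
    propose: Callable[[Optional[str]], str],
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for commit approval, feeding each rejection reason into the next attempt.

    `propose` takes the previous rejection reason (None on the first round)
    and returns a candidate change. `review` returns (approved, reason).
    Returns the approved change, or None if every round was rejected.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        change = propose(feedback)
        approved, reason = review(change)
        if approved:
            return change
        # A "no" always carries a reason, which drives the next revision.
        feedback = reason
    return None
```

Here `review` stands in for the human answering yes/no, and `propose` stands in for the assistant revising its work.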
For the same project I'm now also experimenting with asking the assistant to come up with a todo list of enhancements for it based on a high level goal, then work through it, with me just giving minor comments on the proposed list.
I'm vaguely tempted to let this assistant reload its own modified code when tests pass and leave it to work on itself for a while and see what comes of it. But I'd need to sandbox it first. It has already tried (and was stopped by a permissions check) to figure out how to restart itself to enable new functionality it had written, so it "understands" when it is working on itself.
But, by all means, you can choose to just treat this as fiction if it makes you feel better.