
Comment by jacquesm

3 hours ago

Interesting, so this is effectively 'guided closed loop' software development with the testset as the control.

It gives me a bit of a 'turtles all the way down' feeling: if the test set can be 'good', why couldn't the code be good as well?

I'm quite wary of all of this, as you've probably gathered by now. The idea that you can toss a bunch of 'pass' tests into a box and then generate code until all of the tests pass is effectively a form of fuzzing: you end up with something that passes your test set, but it may do a lot more than just that, and your test set is not going to be able to exhaustively enumerate the negative cases.

This could easily result in 'surprise functionality' that you did not anticipate during the specification phase. The only way to deal with that then is to audit the generated code, which I presume would then be farmed out to yet another LLM.

This all places a very high degree of trust into a chain of untrusted components and that doesn't sit quite right with me. It probably means my understanding of this stuff is still off.

You are right.

What you are missing is that the thing driving this untrusted pile of hacks keeps getting better at a rapid pace.

So much so that the quality of the output is passable now, mimicking man-years of software engineering in a matter of hours.

If you don’t believe me, pick a project that you have always wanted to build from scratch and let Cursor/Claude Code have a go at it. You get to make the key decisions, but the quality of the work is pretty good now, so much so that you don’t really have to double-check much.

  • Thank you, I will try that and see where it leads. This all suggests that a massive downward valuation adjustment for any capitalized software is on the menu.