Comment by baq
9 hours ago
They miss but can self-correct; this is the paradigm shift. You need a harness to unlock the potential, and the harness is usually very buildable by LLMs, too.
Reply
9 hours ago
Hm, that is a lot of generic talk - but very little concrete data and examples.
Reply by baq:
Concrete examples are in your code just as they're in my employer's, which I'm not at liberty to share - but every little bit counts, starting from the simplest lints, typechecks and tests and going up to more esoteric methods like model checkers.

You're trying to get the probability of a miss down with the initial context; then you want to minimize the probability of not catching a miss; then you want to maximize the probability of the model being able to fix a miss itself. Because the process is multiplicative, the pipeline rapidly jumps from 'doesn't work' to 'works well most of the time', and outsiders perceive that as a step function.

Concrete examples are all over the place; they're just being laughed at (yesterday's post about 100% coverage was spot on even if it was an ad).
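For what it's worth, here is a minimal sketch of the kind of harness loop being described, assuming a Python project checked with ruff, mypy and pytest; `generate_patch` is a hypothetical stand-in for however you call the model, not a real API:

```python
import subprocess

def generate_patch(task: str, feedback: str) -> None:
    """Hypothetical stand-in: call your LLM and apply its edits to the repo."""
    raise NotImplementedError

# Assumed check commands for a typical Python project, cheapest first:
CHECKS = [
    ["ruff", "check", "."],  # simplest lints
    ["mypy", "."],           # typechecks
    ["pytest", "-q"],        # tests
]

def run_with_harness(task: str, max_attempts: int = 5) -> bool:
    feedback = ""
    for _ in range(max_attempts):
        generate_patch(task, feedback)  # model writes or edits code
        failures = []
        for cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(result.stdout + result.stderr)
        if not failures:
            return True                 # every check passed
        feedback = "\n".join(failures)  # feed the misses back to the model
    return False
```

The point is that each layer in CHECKS only has to catch misses that the earlier layers let through.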
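And a back-of-the-envelope illustration of the multiplicative argument, with made-up numbers:

```python
# Purely illustrative probabilities: the residual failure rate is the product
# of (miss) x (miss not caught) x (miss not fixed), so modest gains in each
# factor compound into a large drop overall.
scenarios = [
    (0.5, 0.8, 0.9),  # weak context, no checks, no retry loop
    (0.3, 0.3, 0.5),  # decent context, lints and typechecks catch misses
    (0.2, 0.1, 0.2),  # good context, tests and checkers, model retries
]
for p_miss, p_uncaught, p_unfixed in scenarios:
    residual = p_miss * p_uncaught * p_unfixed
    print(f"success per task: {1 - residual:.1%}")
# Prints 64.0%, 95.5%, 99.6% - roughly the jump from 'doesn't work' to
# 'works well most of the time' that outsiders read as a step function.
```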