Comment by pron

2 months ago

If you carefully review the code then you're not doing what Armstrong was talking about. If you're not reviewing the code, then you don't really know what it is that the AI built. Of course it passes tests; that's not the problem. The problem is that the code is complicated and obtuse, even if it doesn't seem that way on the surface, and after some rounds of evolution, the agents are no longer able to evolve or maintain the code.

The difference between it's working now and it will continue working in two years is exactly the problem with AI-generated code because the tests can't tell you that, and you don't know which one you have if you don't look really carefully.

1 comment

pron

aprilthird2021 1 month ago

I was pretty clear that we did not review all the code, and we have kept the original service on standby exactly because we are aware code is complicated and could have obscure failure modes while passing our whole test suite.