Comment by pron

4 hours ago

> The code must pass property-based tests

Who writes the tests? It can be ok to trust code that passes tests if you can trust the tests.

There are, however, other problems. I frequently see agents write code that's functionally correct but that they can't keep evolving for long. That's also what happened with Anthropic's failed attempt to have agents write a C compiler (not a trivial task, but far from an exceptionally difficult one). They had thousands of good human-written tests, but the agents couldn't get the software to converge: they fixed one bug only to create another.


lielcohen 21 minutes ago

The "who writes the tests" thing is spot on. I ran into this exact problem: the agent writes code, then writes tests for it, and surprise, the tests pass because it's testing what it built, not what it should have built. Same blind spots everywhere.

The Anthropic C compiler thing makes sense too. I've seen this where the agent keeps fixing stuff and each fix just makes the next bug weirder because it's dragging all its previous failed attempts along in the context. Just going in circles.

What helped me was keeping the test-writing agent totally separate from the coder: different model, clean context, only sees the spec. And for the fix loops, wiping the context between attempts instead of letting it pile up. Not a magic fix, but it stops the tail-chasing thing.

The evolvability point though, no idea how to solve that, honestly. That feels like a fundamentally different problem.
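
For anyone who wants the separate-test-writer and clean-context fix loop spelled out, here is a rough sketch of how it could be wired up. Everything in it is an assumption for illustration: `Model` and `TestRunner` are hypothetical callables standing in for whatever model API and test harness you actually use, and `write_tests` and `fix_loop` are made-up names, not anyone's real tooling.

```python
from typing import Callable, Optional

# Hypothetical interfaces (assumptions, not a real API):
# a Model maps a prompt string to generated text,
# a TestRunner returns None when the code passes, else a failure message.
Model = Callable[[str], str]
TestRunner = Callable[[str], Optional[str]]


def write_tests(test_model: Model, spec: str) -> str:
    """Ask a separate model for tests. It sees only the spec, never the code."""
    return test_model(f"Write tests for this spec:\n{spec}")


def fix_loop(
    coder: Model,
    spec: str,
    run_tests: TestRunner,
    max_attempts: int = 5,
) -> Optional[str]:
    """Retry until the tests pass, rebuilding the prompt from scratch each time."""
    prompt = f"Implement this spec:\n{spec}"
    for _ in range(max_attempts):
        code = coder(prompt)
        failure = run_tests(code)
        if failure is None:
            return code
        # Clean context: only the spec, the latest code, and the latest failure.
        # Earlier attempts and their error output are deliberately dropped.
        prompt = (
            f"Spec:\n{spec}\n\n"
            f"Current code:\n{code}\n\n"
            f"Failing test output:\n{failure}\n\n"
            "Fix the code."
        )
    return None  # did not converge within max_attempts
```

The point of the design is that the prompt is a pure function of the spec and the most recent attempt, so failed history can't pile up. It does nothing for the evolvability problem.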