Comment by ChoGGi
2 days ago
I assume most of the complaints about the massive rewrite are in regard to AI, not Rust.
As lots of large and small companies have shown, test suites can only find what you test for. So what can a vibe-coded test suite find?
On the whole, I think vibe-coded test suites can be pretty good. But it really depends on how you prompt. I often get the AI to brainstorm needed tests into a text file while it works. Then later I get another agent to write tests based on that list.
It does a reasonable job. It's also pretty good at writing regression tests when it fixes a bug.
Where LLMs struggle, or at least where Claude struggles, is fixing the actual bugs. It's very good at getting the test suite to pass. But it cheats. It'll sometimes disable a test, or apply some hacky workaround that makes the test pass without fixing the underlying issue. It'll say "All done, the tests pass". But sometimes you really wish they didn't.
I'm wondering if it might be better to set up two agents adversarially for bug hunting. Give one agent the goal of finding as many bugs as possible (via tests and other techniques), and give the other the goal of fixing them.
I find that adversarial multi-agent setups eventually fall apart, because given enough time one side always manages to convince the other to give up.
I've tried all sorts of things to keep Claude from cheating, but the only one that works is to restrict access to the test files, which obviously isn't a real solution.
We recently had an “AI week” at work and I spent $1000 in tokens trying out different iterations of this.
What did you find works best?
I have the opposite experience. The library I develop depends significantly on external behavior, so if I point an AI at it, it comes up with tests like "test if a string is NULL" (even though I annotated it as a non-null, null-terminated string argument; LLMs love ignoring my compiler annotations unless I tell them otherwise). So the "tests" end up being nonsensical ones like "test this piecewise linear range transformation: if you pass in x and round-trip it, does it come back out as x?".
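To see why a round-trip test says so little about external behavior, here is a small sketch in Python. The mapping function `piecewise_linear` and its breakpoint tables are hypothetical stand-ins, not the commenter's library; it assumes both breakpoint lists are strictly increasing, so the inverse is just the same function with the axes swapped.

```python
from bisect import bisect_right


def piecewise_linear(x: float, xs: list[float], ys: list[float]) -> float:
    """Hypothetical mapping: interpolate x along breakpoints (xs[i], ys[i]).

    Assumes xs is strictly increasing; x is clamped to [xs[0], xs[-1]].
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1  # segment with xs[i] <= x < xs[i + 1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def round_trip_ok(x: float, xs: list[float], ys: list[float], tol: float = 1e-9) -> bool:
    """The kind of test described above: map forward, map back by swapping axes, compare."""
    y = piecewise_linear(x, xs, ys)
    return abs(piecewise_linear(y, ys, xs) - x) <= tol


xs = [0.0, 1.0, 2.0]
correct = [0.0, 50.0, 100.0]  # assume this matches the external system
wrong = [0.0, 10.0, 100.0]    # a bad middle breakpoint

for x in [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]:
    # Both tables pass, because the inverse reuses the same (possibly wrong) table.
    assert round_trip_ok(x, xs, correct)
    assert round_trip_ok(x, xs, wrong)
```

The round trip only checks that the mapping is invertible. Catching the bad table needs expected outputs pinned to what the external system actually does (e.g. asserting that 0.5 maps to 25.0), which is exactly the knowledge the AI doesn't have.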
Disagree: I've played with vibe coding, and the tests come out worse than the code. That makes sense, since real-world tests tend to trend the same way.
That was a snarky question, but I appreciate knowing where AI stands when it comes to test suites. Let it write them, and babysit the crap outta them when it comes to passing.