Comment by cowboy_henk

5 hours ago

This only holds if you understand what's in the tests, and the tests are realistic. The moment you let the LLM write the tests without understanding them, you may as well just let it write the code directly.

> The moment you let the LLM write the tests without understanding them, you may as well just let it write the code directly.

I disagree. Having tests (even if the LLM wrote them itself!) gives the model some grounding, and exposes some of its inconsistencies. LLMs are not logically-omniscient; they can "change their minds" (next-token probabilities) when confronted with evidence (e.g. test failure messages). Chain-of-thought allows more computation to happen; but it doesn't give the model any extra evidence (i.e. Shannon information; outcomes that are surprising, given its prior probabilities).

I disagree to some degree. Tests have value even beyond whether they test the right thing. At the very least they show something worked and now doesnt work or vice versa. That has value in itself.