Comment by whatever1

11 hours ago

If you have a solid test environment that would allow for an agent to check if it is right or wrong, I encourage you to do the experiment.

Put the agent on the wheel and observe it as it tries ruthlessly to pass the test. These days, likely it will manage to pass the tests after 3-5 loops, which I find fascinating.

Close the loop, and try an LLM. You will be surprised.