Comment by whatever1
11 hours ago
If you have a solid test environment that would allow for an agent to check if it is right or wrong, I encourage you to do the experiment.
Put the agent on the wheel and observe it as it tries ruthlessly to pass the test. These days, likely it will manage to pass the tests after 3-5 loops, which I find fascinating.
Close the loop, and try an LLM. You will be surprised.
No comments yet
Contribute on Hacker News ↗