Comment by raesene9

3 hours ago

Yeah, they definitely can be true (IME), as there's a massive difference in the quality of the output depending on how LLMs are used.

For example, if you just ask an LLM in a browser, with no tool use, to "find a vulnerability in this program", it'll likely give you something, but it's very likely to be hallucinated or irrelevant.

However, if you use the same model via an agent and provide it with concrete guidance on how to test for success, plus the environment needed to prove that success, you're much more likely to get a good result.

It's like with Claude Code: if you don't provide a test environment, it will often make coding mistakes and tell you all is well, but if you provide a testing loop it'll iterate until it actually works.
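
To make the "testing loop" idea concrete, here's a rough sketch of the pattern. The `ask_model` callable and the `pytest` command are placeholders for whatever LLM interface and success check you actually have; none of this is specific to Claude Code:

```python
import subprocess
from typing import Callable

def run_tests(test_cmd: list[str]) -> tuple[bool, str]:
    """Run whatever command proves success and capture its output."""
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def testing_loop(
    task: str,
    ask_model: Callable[[str], str],  # placeholder: plug in your LLM/agent call
    test_cmd: list[str],              # e.g. ["pytest", "-x"], or whatever proves success
    max_iters: int = 5,
) -> bool:
    """Ask the model to attempt the task, then iterate on real test failures."""
    feedback = ""
    for _ in range(max_iters):
        # Give the model the task plus concrete evidence of any failure,
        # rather than trusting its own "all is well".
        ask_model(f"{task}\n\nLatest test output:\n{feedback}")
        passed, output = run_tests(test_cmd)
        if passed:
            return True  # only declare success once the tests actually pass
        feedback = output  # feed the real failures back for the next attempt
    return False
```

The point is that success is decided by the test run, not by the model's own claims about its work.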