Comment by uhx

2 months ago

Thank you for the clarification. It actually helped: at first I was overcomplicating it in my head.

After thinking about it for an hour I came up with this:

An LLM claims that there is a bug, but we don't know whether it really exists. We run a second LLM that can write unit tests/reproducers (they don't have to be E2E; a shorter data flow means a higher success rate for the LLM), compile the program, and run the test to see whether ASAN reports an error. An ASAN error means a proven bug. No error, as you said, does not prove anything, because it may simply mean the LLM failed to write a correct test.
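A minimal sketch of that verification step, assuming `clang` is on the PATH and the second LLM emits a single self-contained C reproducer. The names `Verdict`, `classify` and `build_and_run` are hypothetical. The point it illustrates is the asymmetry: an ASAN report confirms the bug, while a clean run, a failed compile or a timeout only gives "inconclusive", never "no bug".

```python
import os
import re
import subprocess
from enum import Enum
from typing import Optional, Tuple

# ASan reports start with a line like
# "==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address ..."
ASAN_REPORT = re.compile(r"==\d+==ERROR: AddressSanitizer: ([\w-]+)")


class Verdict(Enum):
    CONFIRMED = "confirmed"        # ASan fired: the claimed bug is real
    INCONCLUSIVE = "inconclusive"  # no ASan report: proves nothing either way


def classify(stderr: str) -> Tuple[Verdict, Optional[str]]:
    """Map a reproducer's stderr to a verdict plus the ASan error kind, if any."""
    match = ASAN_REPORT.search(stderr)
    if match:
        return Verdict.CONFIRMED, match.group(1)
    # No report may simply mean the generated test never hit the bug.
    return Verdict.INCONCLUSIVE, None


def build_and_run(test_source: str, binary: str = "./repro") -> Tuple[Verdict, Optional[str]]:
    """Compile a (hypothetical) LLM-written C reproducer with ASan, run it, classify it."""
    compile_cmd = ["clang", "-fsanitize=address", "-fno-omit-frame-pointer",
                   "-g", "-o", binary, test_source]
    if subprocess.run(compile_cmd, capture_output=True, text=True).returncode != 0:
        # A test that does not compile says nothing about the bug.
        return Verdict.INCONCLUSIVE, None
    # Assumption: count memory-safety errors only, not leaks.
    env = dict(os.environ, ASAN_OPTIONS="detect_leaks=0")
    try:
        result = subprocess.run([binary], capture_output=True, text=True,
                                env=env, timeout=60)
    except subprocess.TimeoutExpired:
        return Verdict.INCONCLUSIVE, None
    return classify(result.stderr)
```

In a real project the reproducer would have to be linked against the project's own sources, so the compile command above is only a placeholder.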

I still don't know how much the LLM reasoning would cost, but technically this should work much better than investigating everything manually.