Comment by mbh159
13 hours ago
The methodology debate in this thread is the most important part.
The commenter who says "add obfuscation and success drops to zero" is right, but that's also the wrong framing, imo. The experiment isn't claiming AI can defeat a competent attacker. It's asking whether AI agents can replicate what a skilled reverse-engineering (RE) specialist does on an unobfuscated binary. That's a legitimate, deployable use case (internal audit, code review, legacy binary analysis), even if it doesn't cover adversarial-grade malware.
The more useful framing: what's the right threat model? If you're defending against script kiddies and automated tooling, AI-assisted RE might already be good enough. If you're defending against targeted attacks by people who know you're using AI detection, the bar is much higher and this test doesn't speak to it.
What would actually settle the "ready for production" question: run the same test with the weakest obfuscation that matters in real deployments (import hiding, string encoding), not adversarial-grade obfuscation. That's the boundary condition.
Why does that matter? Being oblivious to obfuscated binaries is like failing a CAPTCHA.
Let's say that instead of reversing, the job was to pick apples. Let's say an AI can pick all the apples in an orchard in normal weather conditions, but add overcast skies and its success drops to zero. Is this, in your opinion, still a skilled apple-picking specialist?
What if it’s 10x as fast during clear conditions? Then it doesn’t matter.
No hate. My only point is that it’s easy for analogies to fail. I can’t tell the point of either of your analogies, whereas the OP made several clear and cogent points.
Maybe not, but maybe you would also no longer need skilled apple-picking specialists.