← Back to context

Comment by 7777332215

19 hours ago

I know they said they didn't obfuscate anything, but if you hide imports/symbols and obfuscate strings, which is the bare minimum for any competent attacker, the success rate will immediately drop to zero.

This is detecting the pattern of an anomaly in language associated with malicious activity, which is not impressive for an LLM.

One of the authors here.

The tasks here are entry level. So we are impressed that some AI models are able to detect some patterns, while looking just at binary code. We didn't take it for granted.

For example, only a few models understand Ghidra and Radare2 tooling (Opus 4.5 and 4.6, Gemini 3 Pro, GLM 5) https://news.ycombinator.com/item?id=46846101.

There is a long way ahead from "OMG, AI can do that!" to an end-to-end solution.

When I was developing my ghidra-cli tool for LLMs to use, I was using crackmes as tests and it had no problem getting through obfuscation as long as it was prompted about it. In practice when reverse engineering real software it can sometimes spin in circles for a while until it finally notices that it's dealing with obfuscated code, but as long as you update your CLAUDE.md/whatever with its findings, it generally moves smoothly from then on.

  • Is it also possible that crackme solutions were already in the training data?

    • I used the latest submissions from sites like crackmes.ones which were days or weeks old to guard against that.

in the article they explicitly said they stripped symbols. If you look at the actual backdoors many are already minimal and quite obfuscated,

see:

- https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/dns...

- https://github.com/QuesmaOrg/BinaryAudit/blob/main/tasks/dro...

  • The first one was probably found due to the reference to the string /bin/sh, which is a pretty obvious tell in this context.

    The second one is more impressive. I'd like to see the reasoning trace.

    • Reply to self: I managed to get their code running, since they seemingly haven’t published their trajectories. At least in my run (using Opus 4.6), it turns out that Claude is able to find the backdoored function because it’s literally the first function Claude checks.

      Before even looking at the binary, Claude announces it will“look at the authentication functions, especially password checking logic which is a common backdoor target.” It finds the password checking function (svr_auth_password) using strings. And that is the function they decided to backdoor.

      I’m experienced with reverse engineering but not experienced with these kinds of CTF-type challenges, so it didn’t occur to me that this function would be a stereotypical backdoor target…

      They have a different task (dropbear-brokenauth2-detect) which puts a backdoor in a different function, and zero agents were able to find that one.

      On the original task (dropbear-brokenauth-detect), in their runs, Claude reports the right function as backdoored 2 out of 3 times, but it also reports some function as backdoored 2 out of 2 times in the control experiment (dropbear-brokenauth-detect-negative), so it might just be getting lucky. The benchmark seemingly only checks whether the agent identifies which function is backdoored, not the specific nature of the backdoor. Since Claude guessed the right function in advance, it could hallucinate any backdoor and still pass.

      But I don’t want to underestimate Claude. My run is not finished yet. Once it’s finished, I’ll check whether it identified the right function and, if so, whether it actually found the backdoor.

      1 reply →

I've used Opus 4.5 and 4.6 to RE obfuscated malicious code with my own Ghidra plugin for Claude Code and it fully reverse engineered it. Granted, I'm talking about software cracks, not state-level backdoors.

I have seen LLMs be surprisingly effective at figuring out such oddities. After all it has ingested knowledge of a myriad of data formats, encryption schemes and obfuscation methods.

If anything, complex logic is what'll defeat an LLM. But a good model will also highlight such logic being intractable.

Stripping symbols is fairly normal, but hiding imports ought to be suspicious in its own right.