Comment by VonGuard
3 days ago
I think this is popping up in Hacker News because the concept of decompilers has become a bit more acceptable recently. (strokes beard) Time was, decompilation was said to be Impossible (as my wise friend syke said: most things people say are impossible are just tedious). Then, it just became "something you could only do in a targeted, single-application fashion."
Somewhere in there, Alan Kay laughed and handed everyone dynamic code.
These days, with AI in tow, decompilation is becoming the sort of thing that could be in the toolchain, replacing IDA and such. Why debug and examine when you can literally decompile?!
So, maybe, that idea being considered to be newly on the table, someone felt the need to post a counter-point, proving once again that everything old is new again.
Hats off for decompiling Java apps that mostly predate generics and annotations... both of which were added in 5.
I'm not sure you lived the same history I did. Decompiling for intermediate languages has always been a thing. Hell, back in college as an intern I was looking at the assembly of a decompiled C# binary, and back in high school I was using IntelliJ's Java decompiler to poke at some game applets to see if there were hacking opportunities. This was back when RuneScape didn't have a paid version.
Is there anything especially hard about decompiling (to) Java?
.NET/C# decompilers are widespread and generally work well (there is one built into Visual Studio nowadays, JetBrains have their own, there were a bunch of stand-alone tools too back in the day).
< disclaimer - I wrote CFR, which is one of the original set of 'modern' java decompilers >
Generic erasure is a giant pain in the rear. C# doesn't do this. You don't actually keep any information about generics in the bytecode instructions themselves; however, some of it is present as metadata. BUT IT COULD BE FULL OF LIES.
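To make the erasure point concrete, here is a minimal sketch (the class and method names are made up for illustration, not taken from CFR). It shows that two differently parameterized lists share one runtime class, that the declared type argument of a field can still be read back from class-file metadata via reflection, and that nothing at runtime stops an `Integer` from ending up in a `List<String>`, which is one reason a decompiler can't blindly trust that metadata.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.ArrayList;
import java.util.List;

final class ErasureDemo {
    // The type argument of this field survives only as class-file metadata.
    static List<String> names = new ArrayList<>();

    /** Both lists have the same runtime class because type arguments are erased. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    /** Reads the declared type argument of the 'names' field back via reflection. */
    static String declaredTypeArgument() {
        try {
            Field field = ErasureDemo.class.getDeclaredField("names");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            return type.getActualTypeArguments()[0].getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Heap pollution: a raw reference lets an Integer into a List<String>. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object pollute() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // same object, viewed without its type argument
        raw.add(42);          // no runtime check, since the element type is erased
        return raw.get(0);
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(declaredTypeArgument());
        System.out.println(pollute().getClass().getName());
    }
}
```

The metadata says `String`, the list holds an `Integer`, and the JVM is happy with both.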
There's also a huge amount of syntactic sugar in later java versions - take for example switch expressions.
https://www.benf.org/other/cfr/switch_expressions.html
and OH MY GOD FINALLY
https://www.benf.org/other/cfr/finally.html
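A rough sketch of why `finally` hurts, under the assumption (true of modern javac as far as I know) that the compiler copies the finally body onto every exit path instead of calling a shared subroutine. `inlined` below is a hand-written approximation of that compiled shape, not actual decompiler output; both names are hypothetical. The decompiler has to notice that the copies are the same block and fold them back into one `finally`.

```java
import java.util.ArrayList;
import java.util.List;

final class FinallyDemo {
    /** Source form: one finally block, three ways out of the try. */
    static String withFinally(int x, List<String> log) {
        try {
            if (x < 0) {
                return "negative";
            }
            if (x == 0) {
                throw new IllegalArgumentException("zero");
            }
            return "positive";
        } finally {
            log.add("cleanup");
        }
    }

    /**
     * Approximation of the compiled shape: the cleanup is duplicated once per exit.
     * Real bytecode also excludes the copied cleanup from the handler's range,
     * which plain Java source cannot express exactly.
     */
    static String inlined(int x, List<String> log) {
        try {
            if (x < 0) {
                log.add("cleanup"); // copy 1: before the first return
                return "negative";
            }
            if (x == 0) {
                throw new IllegalArgumentException("zero");
            }
            log.add("cleanup");     // copy 2: before the second return
            return "positive";
        } catch (Throwable t) {
            log.add("cleanup");     // copy 3: catch-all handler, then rethrow
            throw t;
        }
    }

    /** Runs one variant and reports the result together with the cleanup log. */
    static String run(boolean useInlined, int x) {
        List<String> log = new ArrayList<>();
        String result;
        try {
            result = useInlined ? inlined(x, log) : withFinally(x, log);
        } catch (IllegalArgumentException e) {
            result = "threw " + e.getMessage();
        }
        return result + " " + log;
    }

    public static void main(String[] args) {
        for (int x = -1; x <= 1; x++) {
            System.out.println(run(false, x) + " | " + run(true, x));
        }
    }
}
```

Nest a few of these and the number of copies multiplies, which is where the pain comes from.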
>Generic erasure is a giant pain in the rear
Personally, I don't get the sentiment. Yeah, decompiling might not produce the original source code, which is fair. It's possible to generate code using invokedynamic and whatnot, and that is still valid code if a compiler opts to do so.
When decompiling bytecode there has to be a reason for it, and a good one. There has to be a goal.
If the code is somewhat humanly understandable, that's ok. If it's more readable than just bytecode, that's already an improvement.
Reading bytecode alone is not hard when it comes to reverse engineering. Java already comes with methods and fields available by design. Having local variable names and line numbers preserved is very common, due to exception stack traces being an excellent debugging tool. Hence debugging info gets to be preserved.
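A small sketch of the "methods and fields available by design" point: member names are stored in the class file and can be listed with plain reflection, and a stack trace reports a line number whenever the class was compiled with line-number info. The class and its members are invented for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MemberNamesDemo {
    private int hitPoints;
    private String playerName;

    int heal(int amount) {
        hitPoints += amount;
        return hitPoints;
    }

    /** Lists the declared (non-synthetic) field and method names of a class, sorted. */
    static List<String> memberNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                names.add("field " + field.getName());
            }
        }
        for (Method method : type.getDeclaredMethods()) {
            if (!method.isSynthetic()) {
                names.add("method " + method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns the line number recorded for a throw site, or -1 if none is available. */
    static int lineOfThrow() {
        try {
            throw new IllegalStateException("probe");
        } catch (IllegalStateException e) {
            StackTraceElement[] trace = e.getStackTrace();
            return trace.length == 0 ? -1 : trace[0].getLineNumber();
        }
    }

    public static void main(String[] args) {
        memberNames(MemberNamesDemo.class).forEach(System.out::println);
        System.out.println("throw site line: " + lineOfThrow());
    }
}
```

Unless an obfuscator has renamed things, names like `hitPoints` and `heal` come out exactly as written in the source.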
try/finally shares the same issues, albeit less pronounced.
C# doesn't erase all generics; but there's also some type erasure happening: nullable reference types, tuple element names, and the object/dynamic distinction are all not present in .NET bytecode; these are only stored in attributes for public signatures, but are erased for local variable types.
C# also has huge amounts of syntactic sugar: `yield return` and `await` compile into huge state machines; `fixed` statements come with similar problems as "finally" in java (including the possibility of exponential code growth during decompilation).
You're awesome! I had really good experiences with CFR in the mid 2010s.
I used it for game modding and documentation (and caught/reported a few game bugs + vulnerabilities along the way). I'd pull game files from Steam depots with steamkit, decompile with CFR, and run the resulting java through doxygen.
My personal experience with both is that decompilers work great for easy code. I still have both Java and C# projects that I wish I could decompile even to the worst possible but almost compilable code. Instead I just get decompiler errors, or code where all variables got the same letter/name and, of course, different types...
I think I've tried all the available free tools, and some paid ones in the Java case. In the end I just deduced the logic and reverse engineered the most important path.
One of the use cases of decompilation is bug hunting / vulnerability research. And that’s still one of the use cases where AI isn’t that good because you must be precise.
I’m not saying that won’t change but I still see a bright future for reversing tools, with or without AI sidekicks (like the BN plugin)
I used codex 5.1 yesterday to point at a firmware blob and let it extract and explore it targeting a specific undisclosed vulnerability and it managed (after floundering for a bit) to read the Lua bytecode and identify and exploit the vuln on a device running the firmware.
Do you have a write up of what exactly happened, how trivial the vulnerability was?
If anything, vulnerability research should be a good target for AI because failure to find an exploit isn't costly (and easily verified) but 1 in N success is very useful.
>Hats off for decompiling Java apps that mostly predate generics and annotations... both of which were added in 5.
The first very famous and good decompiler was written in C. Other than that, generics and annotations did not make the work any easier, decompilation-wise.
Is AI really useful in decompiling or does it just create similar code that does the same as the original?