Show HN: I wrote a Java decompiler in pure C language

4 days ago (github.com)

You've used GPL2 code taken from git (hashmap.c) in your Apache 2.0 project.

https://opensource.stackexchange.com/questions/10737/inclusi...

  • I'm curious, how did you notice this?

    Do you have a scanner that checks these sorts of things or is it something that you are passionate about?

    • By the following very short garden path:

      1. How silly to write such a thing in C from scratch. Such a project will invariably invent half of Lisp in order to have the right kind of infrastructure for doing this and that.

      2. Let's look for some of it up and down the tree. Oh look, there is a bitset and hashmap, see? I don't see test cases for these anywhere; is it original work from this project or battle-tested code taken from elsewhere?

      3. Open hashmap.c ...

      GPL violation found in half a minute.


I am always curious how different C programs decide how to manage memory.

In this case there is a custom string library. Functions return owned heap-allocated strings.

However, I think there's a problem where static strings are used interchangeably with heap-allocated strings, such as in the function `string class_simple_name(string full)` ( https://github.com/neocanable/garlic/blob/72357ddbcffdb75641... )

Sometimes it returns a static string like `g_str_int` and sometimes a newly heap-allocated string, such as returned by `class_type_array_name(g_str_int, depth)`.

Callers have no way to properly release the memory allocated by this function.
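
One way to make the ownership unambiguous is to always return a heap copy, so every result can be passed to free(). This is only a sketch: `simple_name_owned`, `dup_string`, and `STR_INT` are hypothetical names, not functions from the garlic codebase, and the "I" check merely stands in for whichever branch returns a static string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a static string such as g_str_int. */
static const char STR_INT[] = "int";

/* Duplicate a string on the heap; the caller owns the result. */
static char *dup_string(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Always returns a heap-allocated string (or NULL if malloc fails),
 * so callers can free() every result without knowing which branch
 * produced it. */
static char *simple_name_owned(const char *full)
{
    if (strcmp(full, "I") == 0)
        return dup_string(STR_INT); /* copy instead of returning the static */

    const char *dot = strrchr(full, '.');
    return dup_string(dot ? dot + 1 : full);
}
```

The cost is an extra allocation for the static cases. The alternative is to never free anything and say so in the function's contract.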

  • Many command line tools do not need memory management at all, at least to a first approximation. Free nothing and let the OS clean up on process exit. Most libraries can either use an arena internally and copy any values that get returned to the user to the heap at boundaries, or require the user to externally create and destroy the arena. This can be made ergonomic with one macro that injects an arena argument into function defs and another that replaces malloc by bumping the local arena data pointer that the prior macro injected.
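
    A minimal sketch of the bump-pointer arena described above, without the macros. The names (`arena`, `arena_init`, `arena_alloc`, `arena_destroy`) are made up for illustration, and it assumes a C11 compiler for `_Alignof` and `max_align_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base; /* start of the backing block */
    size_t cap;          /* total bytes available */
    size_t used;         /* bytes handed out so far */
} arena;

/* Returns 1 on success, 0 if the backing block could not be allocated. */
static int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Bump allocation: round the offset up to the strictest fundamental
 * alignment, then advance it. Individual blocks are never freed. */
static void *arena_alloc(arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases everything handed out by this arena in one call. */
static void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}

/* Usage: allocate freely, then drop it all at once. */
static void arena_demo(void)
{
    arena a;
    if (!arena_init(&a, 1024))
        return;
    int *xs = arena_alloc(&a, 16 * sizeof *xs);
    assert(xs != NULL);
    xs[0] = 42;
    arena_destroy(&a);
}
```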

    • That might be true, but leaking is neither the most critical nor the hardest-to-find memory management issue, and good luck trying to adapt or even run valgrind with a codebase that mindlessly allocates and leaks everywhere.


  • Interesting. Someone should come up with a language that prevents these sorts of mistakes!

  • In the same file:

      static bool is_java_identifier_start(char c)
      {
        return (isalpha(c) || c == '_' || c == '$');
      }
    

    Undefined behavior in isalpha if c happens to be negative (and not equal to EOF), which happens for UTF-8 bytes of 0x80 and above on platforms where char is signed.

    I think some <ctype.h> implementations are hardened against this issue, but not all.
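
    The usual fix is to convert to unsigned char before calling the <ctype.h> function. A sketch of the same function with that change; it only removes the undefined behavior and still treats non-ASCII bytes as non-letters in the default "C" locale, so it does not cover the Unicode letters that Java identifiers allow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

static bool is_java_identifier_start(char c)
{
    /* Convert to unsigned char first so bytes >= 0x80 are never passed
     * to isalpha() as negative ints. */
    return isalpha((unsigned char)c) || c == '_' || c == '$';
}

/* Quick check, including a byte that is negative when char is signed. */
static void identifier_start_check(void)
{
    assert(is_java_identifier_start('a'));
    assert(is_java_identifier_start('$'));
    assert(!is_java_identifier_start('1'));
    (void)is_java_identifier_start((char)0xC3); /* well-defined now */
}
```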

  • > I am always curious how different C programs decide how to manage memory.

    At a basic level, you can create memory on the stack or on the heap. Obviously I will focus on the heap, as that is where memory of a given size is allocated dynamically.

    The C programming language does not dictate how you handle memory. You are pretty much on your own. Some C programmers (likely the more inexperienced ones) will malloc individual variables as if they were creating a 'new' instance in a typical OOP language like Java. This can be a telltale sign of a programmer who comes to C from an OOP background. As they learn and improve their C skills, they realise they should allocate a chunk of memory of a certain type, but they may still be malloc(ing) and free(ing) all over the code, making it difficult to understand what is being used and where -- especially if you are looking at code you did not write.

    You can also have programs that do not bother free(ing) memory. For example, a simple shell program that just does simple input->process->output and terminates. For these types of programs, just let the OS deal with freeing the memory.

    Good C code (in my opinion) uses malloc and free in only a handful of functions. There are higher level functions for proper Allocators. One example is an Arena Allocator. Then if you want a function which may require dynamic memory, you can tell it which allocator to use. It gives you control, generally speaking. You can create a simple string library or builder with an allocator.

    Of course an Allocator does not have to use memory on the heap. It can use memory on the stack as well.

    There are various other patterns to use in the world of memory, especially in C.
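
    A rough sketch of the "tell the function which allocator to use" idea, with a stack buffer as the backing store. The `allocator` interface, `fixed_buffer`, and `str_concat` are hypothetical names for illustration, and there is no alignment handling, so it is only suitable for char data as written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allocator interface: a function pointer plus its state. */
typedef struct {
    void *(*alloc)(void *ctx, size_t n);
    void *ctx;
} allocator;

/* Fixed-buffer state; the buffer may live on the stack or on the heap. */
typedef struct {
    unsigned char *buf;
    size_t cap;
    size_t used;
} fixed_buffer;

/* Hands out the next n bytes of the buffer, or NULL when it is full. */
static void *fixed_alloc(void *ctx, size_t n)
{
    fixed_buffer *fb = ctx;
    if (n > fb->cap - fb->used)
        return NULL;
    void *p = fb->buf + fb->used;
    fb->used += n;
    return p;
}

/* Joins two strings using whichever allocator the caller supplies. */
static char *str_concat(allocator al, const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *out = al.alloc(al.ctx, la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* copies the terminating NUL too */
    return out;
}

/* Usage: the string lives in a stack buffer, so there is nothing to free. */
static void concat_demo(void)
{
    unsigned char storage[64];
    fixed_buffer fb = { storage, sizeof storage, 0 };
    allocator al = { fixed_alloc, &fb };
    char *s = str_concat(al, "java.", "lang");
    assert(s != NULL && strcmp(s, "java.lang") == 0);
}
```

    Swapping in a heap-backed arena only means passing a different `allocator` value; `str_concat` itself does not change.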

I don't think it's available in a standalone repo but it IS available as a standalone library, IntelliJ's FernFlower decompiler is the gold standard https://github.com/JetBrains/intellij-community/blob/master/... https://www.jetbrains.com/intellij-repository/releases

I guess there's some history there that I'm not familiar with because JBoss also has a FernFlower decompiler library https://mvnrepository.com/artifact/org.jboss.windup.decompil...

Very cool project! Love the idea of a Java decompiler written in C — the speed must be great.

Any plan to support `.dex` in the future? Also curious how you handle inner classes inside JARs.

Nice job! I don't know whether you know https://github.com/java-decompiler/jd-gui or not, but in case you haven't seen it before, maybe you could use it as a reference, since it's written in Java, for extra fun with your adventure?

By hand or with AI? Fascinating. So much work! What was your motivation for this?

  • 90% by hand, 10% AI. I do this for fun and to learn about jvm.

    • I think that sort of ratio is the sweet spot for learning. I've been writing an 8086 simulator in C++ and using an LLM for answering specific technical questions I come up with has drastically sped up my progress without it actually doing the work for me.

  • Irrelevant to me. People would never ask whether someone has created something looking at SO or not. If the thing works as advertised, good for them!

  • A great question to ask. We're in the middle of learning where AI can and can't be effective. Knowing where and how it's being used is quite useful.

I cannot help but wonder why anyone would start a new project in C in 2025. It’s like driving a car with no seat belts. You sure you want to do that?

  • I moved from C++ to C and I am more productive. I also think this "no seat belts" meme is exaggerated, as there are plenty of tools and strategies to make C fairly safe to use. (it is true though that many people do not put the seat belts on).

  • In my experience, although many of the other programming languages do improve some things compared with C, they also make many things worse and avoid some of the benefits of C programming.

    • I can't recall anything in that sense regarding Modula-2 and Object Pascal, other than not bringing UNIX to the party.

  • I cannot help but wonder why I would learn a whole new language before even beginning to start a new project when I already know C. Though, generally speaking, I tend to use C++ for new projects -- usually depending on what libraries I'm using, if the lib is in C I use C and if the lib is in C++ I use C++. The current thing I'm working on is intended as a Python extension module and Python is written in C so...

    And, yes, I know it's trivial to interface the Python C-API with C++ and quite often better as the 'object model' is very similar but the underlying concept I wanted to explore (guaranteed tailcalls) isn't possible in C++ from what I can tell.

  • This is the best question for me. Writing this code in C is the best way to learn the file structure of jvm/dalvik/pe. The process makes me like C even more. For me, it is simple and pure, and that is enough.

  • I only write in C. If I built a car it wouldn't have seatbelts. Boring, put in ejector seats! Not safe? No problem for C :).

    • ejector seats in C car?

      goto eject; ...more code we are going to ignore, it could be important but nah, ignore it, what could happen?...

      eject: up_through_the_roof();

      :D

  • When debugging complex projects, the C language is more flexible and convenient to view data in memory.

  • It's thanks to people like you that Rust is not more widely used. You actively make people avoid the Rust community because they will think everybody is like you!