Jank is C++

1 day ago (jank-lang.org)

I'm not surprised to see that Jank's solution to this is to embed LLVM into their runtime. I really wish there was a better way to do this.

There are a lot of things I don't like about C++, and close to the top of the list is the lack of standardization for name-mangling, or even a way to mangle or de-mangle names at compile-time. Sepples is a royal pain in the ass to target for a dynamic FFI because of that. It would be really nice to have some way to get symbol names and calling semantics as constexpr const char* and not have to deal with generating (or writing) a ton of boilerplate and extern "C" blocks.

It's absolutely possible, but it's not low-hanging fruit so the standards committee will never put it in. Just like they'll never add a standardized equivalent for alloca/VLAs. We're not allowed to have basic, useful things. Only more ways to abuse type deduction. Will C++26 finally give us constexpr dynamic allocations? Will compilers ever actually implement one of the three (3) compile-time reflection standards? Stay tuned to find out!
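
For anyone who hasn't had to write it, here is a minimal sketch of the extern "C" boilerplate in question. The `Greeter` class and the `greeter_*` functions are hypothetical names made up for illustration; a real shim would also need to keep exceptions from escaping across the C boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical C++ class we want to expose to a dynamic FFI.
class Greeter {
public:
    explicit Greeter(std::string name) : name_(std::move(name)) {}
    // Length of "Hello, " (7 characters) plus the stored name.
    std::size_t greeting_length() const { return 7 + name_.size(); }
private:
    std::string name_;
};

// The boilerplate: one unmangled, C-callable function per operation,
// with the object passed around as an opaque pointer.
extern "C" {
    void* greeter_new(const char* name) { return new Greeter(name); }
    std::size_t greeter_greeting_length(const void* self) {
        return static_cast<const Greeter*>(self)->greeting_length();
    }
    void greeter_delete(void* self) { delete static_cast<Greeter*>(self); }
}
```

A dynamic FFI can then look up `greeter_new` by its plain name (for example via `dlsym`) without knowing any mangling scheme, at the cost of one hand-written wrapper per operation.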

  • Carmack did almost exactly the same with the Trinity / Quake3 Engine: IIRC it was LCC, maybe tcc, one of the C compilers you can actually understand totally as an individual.

    He compiled C with some builtins for syscalls, and then translated that to his own stack machine. But, he also had a target for native DLLs, so same safe syscall interface, but they can segv so you have to trust them.
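
    A minimal sketch of that shape, assuming a made-up four-opcode instruction set (this is not the Quake III bytecode format): the bytecode can only reach the outside world through a host-supplied syscall handler. `Op`, `Instr`, `SyscallHandler` and `run` are all hypothetical names.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <stdexcept>
    #include <vector>

    // Hypothetical opcodes for illustration only.
    enum class Op : std::uint8_t { Push, Add, Syscall, Halt };

    struct Instr { Op op; std::int32_t arg; };

    // Host-provided handler: the only door out of the sandbox.
    using SyscallHandler = std::function<std::int32_t(std::int32_t id, std::int32_t arg)>;

    std::int32_t run(const std::vector<Instr>& code, const SyscallHandler& host) {
        std::vector<std::int32_t> stack;
        auto pop = [&stack] {
            if (stack.empty()) throw std::runtime_error("stack underflow");
            std::int32_t v = stack.back();
            stack.pop_back();
            return v;
        };
        for (std::size_t pc = 0; pc < code.size(); ++pc) {
            const Instr& in = code[pc];
            switch (in.op) {
                case Op::Push: stack.push_back(in.arg); break;
                case Op::Add: { auto b = pop(); auto a = pop(); stack.push_back(a + b); break; }
                // Pop one argument, hand it to the host, push the host's answer.
                case Op::Syscall: stack.push_back(host(in.arg, pop())); break;
                case Op::Halt: return pop();
            }
        }
        throw std::runtime_error("fell off the end of the program");
    }
    ```

    A native DLL target would call the same handler directly instead of going through `run`, which is where the "they can segv so you have to trust them" part comes in.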

    Crazy to think that in one computer program (that still reads better than high-concept FAANG C++ from elite legends, truly unique) this wasn't even the most dramatic innovation. It was the third* most dramatic revolution in one program.

    If you're into this stuff, call in sick and read the plan files all day. Gives me goosebumps.

    • Carmack actually deserves the moniker of 10x engineer. Truly, his work in his domain has reached far outside it because of the quality of his ideas and methodologies.

    • Linking directly to C++ is truly hell just considering symbol mangling. The syntax <-> semantics relationship is ghastly. I haven't seen a single project tackle the C++ interface in its entirety (outside of clang). It nearly seems impossible.

      There's a reason Carmack tackled the C abi and not whatever the C++ equivalent is.

  • I hear you when it comes to C++ portability, ABI, and standards. I'm not sure what you would imagine jank using if not for LLVM, though.

    Clojure uses the JVM, jank uses LLVM. I imagine we'd need _something_ to handle the JIT runtime, as well as jank's compiler back-end (for IR optimization and target codegen). If it's not LLVM, jank would embed something else.

    Having to build both of these things myself would make an already gargantuan project insurmountable.

  • > the lack of standardization for name-mangling, or even a way mangle or de-mangle names at compile-time.

    Like many things, this isn't a C++ problem. There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.

    Now, standards do evolve, and this does give room for different system libraries/tools to have a different view of what is acceptable/correct (I still have nightmares of trying to work through `I...E` vs `J...E` errors) ... but all the functionality does exist and work well if you aren't on the bleeding edge (fortunately, C++11 provided the bits that are truly essential; everything since has been merely nice-to-have).

    • Like many things people claim "isn't a C++ problem but an implementation problem"... This is a C++ problem. Anything that's not nailed down by the standard should be expected to vary between implementations.

      The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.

      The fact that the standard doesn't specify a mechanism to mangle and demangle names (be it at runtime or at compile time) leads to the completely predictable result that different implementations provide different mechanisms to mangle and demangle names, and that some implementations don't provide such a mechanism.

      These issues could, and should, have been fixed in the only place they can be fixed -- the standard. ISO is the mechanism through which different implementation vendors collaborate and find common solutions to problems.
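
      To illustrate the "different mechanisms" point, here is a sketch of runtime demangling as GCC and Clang offer it through `<cxxabi.h>`. This header is an Itanium C++ ABI extension, not ISO C++, and MSVC has no equivalent with this interface. The `demangle` helper is a hypothetical wrapper name.

      ```cpp
      #include <cassert>
      #include <cstdlib>
      #include <cxxabi.h>  // GCC/Clang (Itanium C++ ABI) only; not part of ISO C++
      #include <memory>
      #include <string>

      // Demangle an Itanium-ABI symbol name; returns the input unchanged on failure.
      std::string demangle(const char* mangled) {
          int status = 0;
          // __cxa_demangle returns a malloc'ed buffer, so it is released with std::free.
          std::unique_ptr<char, void (*)(void*)> out(
              abi::__cxa_demangle(mangled, nullptr, nullptr, &status), std::free);
          return (status == 0 && out) ? std::string(out.get()) : std::string(mangled);
      }
      ```

      With this, `demangle("_Z3addii")` gives back `add(int, int)`. Nothing comparable exists in the standard library, and nothing here works at compile time.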

    • > There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.

      Sounds like there isn't a standard, then.

  • > the lack of standardization for name-mangling

    I don't see the point of standardizing name mangling. Imagine there is a standard, now you need to standardize the memory layout of every single class found in the standard library. Without that, instead of failing at link-time, your hypothetical program would break in ugly ways while running because, e.g., two functions that invoke one another have differing opinions about where exactly the length of a std::string can be found in memory.

    • The naive way wouldn't be any different than what it's like to dynamically load sepples binaries right now.

      The real way, and the way befitting the role of the standards committee, is actually putting effort into standardizing a way to talk to and understand the interfaces and structure of a C++ binary at load-time. That's exactly what linking is for. It should be the responsibility of the software using the FFI to move its own code around and adjust it to conform with information provided by the main program as part of the dynamic linking/loading process... which is already what it's doing. You can mitigate a lot of the edge cases by making interaction outside of this standard interface undefined behavior.

      The canonical way to do your example is to get the address of std::string::length() and ask how to appropriately call it (to pass "this", for example).
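
      Within a single program, the closest existing analogue is a pointer to member function, sketched below. `Text`, `LengthFn`, `resolve_length` and `call_length` are hypothetical names; a stand-in type is used because forming pointers to standard library member functions such as std::string::length is not guaranteed to work. No such load-time query exists across binaries today.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>

      // Hypothetical stand-in for a library type.
      struct Text {
          std::string data;
          std::size_t length() const { return data.size(); }
      };

      // What a loader would hand back: the member's address as a typed pointer.
      using LengthFn = std::size_t (Text::*)() const;

      LengthFn resolve_length() { return &Text::length; }

      // Call through the pointer, supplying the object explicitly as "this".
      std::size_t call_length(LengthFn fn, const Text& self) {
          return (self.*fn)();
      }
      ```

      The caller never looks inside `Text`; it only needs the function's address and the rule for passing the object, which is the information the comment wants exposed at load-time.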

    • This standard already exists, it's called the ABI and the reason the STL can't evolve past 90s standards in data structures is because breaking it would cause immeasurable (read: quite measurable) harm

      Like, for fuck's sake, we're using red/black trees for hash maps, in std - just because thou shalt not break thy ABI

  • I would think name mangling is out of scope for a programming language definition, more so for C and C++, which target running on anything under the sun, including systems that do not have libraries, do not have the concept of shared libraries or do not have access to function names at runtime.

    > It would be really nice to have some way to get symbol names and calling semantics

    Again, I think that’s out of scope for a programming language. Also, is it even possible to have a way to describe low level calling semantics for any CPU in a way such that a program can use that info? The target CPU may not have registers or may not have a stack, may have multiple types of memory, may have segmented memory, etc.

  • > embed LLVM into their runtime

    That comically reads like "embed a blue whale into your hammock".

  • > LLVM into their runtime

    they're not embedding LLVM - they're embedding clang. if you look at my comment below, you'll see LLVM is not currently sufficient.

    > [C++] is a royal pain in the ass to target for a dynamic FFI because of that

    name mangling is by far the easiest part of cpp FFI - the hard part is the rest of the ABI. anyone curious can start here

    https://github.com/rust-lang/rust-bindgen/issues/778

    • To be fair, jank embeds both Clang and LLVM. We use Clang for C++ interop and JIT C++ compilation. We use LLVM for IR generation and jank's compiler back-end.

    • > they're not embedding LLVM - they're embedding clang

      They're embedding both, according to the article. But it's also just sloppy semantics on my part; when I say LLVM, I don't make a distinction of the frontend or any other part of it. I'm fully relying on context to include all relevant bits of software being used. In the same way I might use "Windows" to refer to any part of the Windows operating system like dwm.exe, explorer.exe, command.com, ps.exe, etc. LLVM is a generic catch-all for me, I don't say "LLI" I say "the LLVM VM", for example. I can't really consider clang to be distinct from that ecosystem, though I know it's a discrete piece of software.

      > name mangling is by the easiest part of cpp FFI

      And it still requires a lot of work, and increases in effort when you have multiple compilers, and if you're on a tiny code team that's already understaffed, it's not really something you can worry about.

      https://en.m.wikiversity.org/wiki/Visual_C%2B%2B_name_mangli...

      You're right, writing platform specific code to handle this is more than possible. But it takes manhours that might just be better spent elsewhere. And that's before we get to the part where embedding a C++ compiler is extremely inappropriate when you just want a symbol name and an ABI.

      But this is beside the point: The fact that it's not a problem solved by the gargantuan standard is awful. I also consider the ABI to be the exact same issue, that being absolutely awful support of runtime code loading, linking and interoperation. There's also no real reason for it, other than the standards committee being incompetent.

  • > de-mangle names at compile-time

    Far from being standardized but it's possible today on GCC and Clang. You just abuse __PRETTY_FUNCTION__.
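
    A minimal sketch of that trick, assuming C++17 and the GCC/Clang `__PRETTY_FUNCTION__` extension. `type_name` is a hypothetical helper, and the layout of the string it parses is compiler-specific and not documented as stable.

    ```cpp
    #include <cassert>
    #include <string_view>

    // Relies on the GCC/Clang extension __PRETTY_FUNCTION__; not ISO C++.
    template <typename T>
    constexpr std::string_view type_name() {
        std::string_view sig = __PRETTY_FUNCTION__;
        // GCC:   "... [with T = int; ...]"    Clang: "... [T = int]"
        const auto start = sig.find("T = ") + 4;
        // The name ends at ';' (GCC) or ']' (Clang), so types spelled with
        // either character, such as arrays, are cut short by this sketch.
        const auto end = sig.find_first_of(";]", start);
        return sig.substr(start, end - start);
    }
    ```

    `type_name<int>()` yields `int` on both compilers. Note that it reads the readable spelling of a type the compiler already knows; it never sees a mangled symbol.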

    • That's not demangling a mangled name, it's retrieving the unmangled name of a symbol.

So, if I import a C++ library in Jank, all the internal memory allocations/deallocation will go through Jank's GC? How is this implemented? And what if a C++ library relies on some 3rd party allocator library?

Recently I tried D lang and was surprised by the nice interop with C++ (the language in general feels pretty good), Carbon is nowhere to be seen and I haven't tried Swift's yet. I hope this is a good one.

That's great! Interop with C++ is such a complex task. Congrats on your work! It's definitely not an easy thing.

I've always wondered what is the best way to interact with C++ template instantiation while keeping performance.

For a static language, you'd probably need to translate your types to C++ during compilation, ask Clang/GCC/MSVC to compile the generated C++ file, and then link the final result.

And finally, pray to the computer gods that name mangling was done right.
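
A rough sketch of the "translate your types to C++" step, under the assumption that the foreign compiler already knows the C++ spelling of each type. `CallSpec` and `emit_wrapper` are hypothetical names; the emitted text would still need the right `#include` lines prepended before being handed to Clang/GCC/MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one call the foreign language wants to make.
struct CallSpec {
    std::string wrapper_name;            // unmangled symbol to export
    std::string callee;                  // e.g. "std::max<int>"
    std::string return_type;             // C++ spelling of the return type
    std::vector<std::string> arg_types;  // C++ spellings of the parameter types
};

// Emit an extern "C" wrapper; compiling it forces the template instantiation
// and sidesteps mangling on the caller's side.
std::string emit_wrapper(const CallSpec& spec) {
    std::string params, args;
    for (std::size_t i = 0; i < spec.arg_types.size(); ++i) {
        const std::string sep = (i == 0) ? "" : ", ";
        params += sep + spec.arg_types[i] + " a" + std::to_string(i);
        args += sep + "a" + std::to_string(i);
    }
    return "extern \"C\" " + spec.return_type + " " + spec.wrapper_name + "(" +
           params + ") { return " + spec.callee + "(" + args + "); }\n";
}
```

The C++ compiler does the overload resolution and instantiation; the generated wrapper only has to be linked by its plain name.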

A long long time ago, at ClojureConj 2014, I asked Rich Hickey whether a cpp-based clojure was possible, and his answer was "well, the primary impediment there is a lack of a garbage collector". There were a lot of conversations going on at the same time, so I didn't get an opportunity to "delve" into it, but:

1. Does that objection make sense? 2. How does jank approach that hurdle?

  • A GC is nowhere near the most difficult part of this. In 2014, there was no viable technology for JIT compiling C++, and very little technology for JIT compiling native code in general.

  • Jank likely uses a combination of LLVM's garbage collection support (GC intrinsics) and smart pointers, similar to how Clasp implemented GC for Common Lisp on C++.

  • It's the first section in the article -

    "I have implemented manual memory management via cpp/new and cpp/delete. This uses jank's GC allocator (currently bdwgc), rather than malloc, so using cpp/delete isn't generally needed. However, if cpp/delete is used then memory collection can be eager and more deterministic.

    The implementation has full bdwgc support for destructors as well, so both manual deletion and automatic collection will trigger non-trivial destructors."

  • In the article - it garbage collects always, but if you call delete the garbage collector will be more aggressive about cleaning that up.

Neat project, I can only marvel at your ability to deal with such madness. But it would be nice to have better C++ interop in higher level languages, there's some useful C++ code out there. I also appreciate the brief mention of Clasp, as I was immediately thinking of it as I was reading through.

Cool stuff for sure, I've been brainstorming making a language that has some of the same characteristics as Jank. I'm jelly that you took the opportunity to work full time on this for a year, wish I could do the same!

Clojure syntax (or clojure like, whatever this is) is easily the worst I’ve ever seen. (How [do you guys] (live) like this) it’s just awful. One could say it’s jank

I used Clojure back in the day and use Nim at work these days. Linking in to C is trivially easy in Nim. Happy to see this working for jank, but C++ is...such a nightmare target.

Any chance of Jank eventually settling on reference counting? It checks so many boxes in my book: Simple, predictable, few edge cases, fast. I guess it really just depends on how much jank programs thrash memory, I remember Clojure having a lot of background churn.

  • I started with reference counting, but the amount of garbage Clojure programs churn out ends up bogging everything down unless a GC is used. jank's GC will change, going forward, and I want jank to grow to support optional affine typing, but the Clojure base is likely always going to be garbage collected.
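
    A sketch of the bookkeeping involved, using a hypothetical hand-rolled count rather than anything from jank: with reference counting, every copy and every drop of a reference pays a count update, even when nothing ends up being freed. `Node`, `Stats`, `Ref` and `share` are made-up names.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    struct Node {
        int value = 0;
        std::size_t refs = 0;
    };

    // Tallies the work done purely to maintain counts.
    struct Stats { std::size_t count_updates = 0; std::size_t frees = 0; };

    class Ref {
    public:
        Ref(Node* n, Stats& s) : node_(n), stats_(&s) { retain(); }
        Ref(const Ref& other) : node_(other.node_), stats_(other.stats_) { retain(); }
        Ref& operator=(const Ref&) = delete;
        ~Ref() {
            ++stats_->count_updates;
            if (--node_->refs == 0) { delete node_; ++stats_->frees; }
        }
    private:
        void retain() { ++node_->refs; ++stats_->count_updates; }
        Node* node_;
        Stats* stats_;
    };

    // Share one node among `holders` short-lived references and report the cost.
    Stats share(std::size_t holders) {
        Stats stats;
        {
            Ref owner(new Node{42}, stats);
            std::vector<Ref> copies(holders, owner);  // each copy bumps the count
        }  // every Ref dies here; the last one frees the node
        return stats;
    }
    ```

    Sharing one node three ways costs eight count updates for a single free. Code that passes immutable values around constantly pays this on every hand-off, whereas a tracing collector generally does no work at those points.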

    • For a novice, could you elaborate the difference that GC does? Naively, it seems like the only difference would be whether you pay the deallocation fee immediately or later on.

      Is there less of a problem when done in bulk if the volume of trash to collect is high enough?

i commented on reddit (and got promptly downvoted) but since i think jank's author is around here (and hopefully is receptive to constructive criticism): the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls. there's no reason to do this except that libclang currently doesn't support any other way. that's not jank's fault but it could be "fixed" in libclang. at a minimum you could use https://github.com/llvm/llvm-project/blob/main/clang/lib/Cod... to emit the code based on clang ast. at a maximum would be to use something like

https://github.com/Mr-Anyone/abi

or this if/when it comes to fruition

https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...

to generate ABI compliant calls/etc for cpp libs.

note, i say all this with maximum love in my heart for a language that would have first class cpp interop - i would immediately become jank's biggest proponent/user if its cpp interop were robust.

EDIT: for people wanting/needing receipts, you can skim through https://github.com/compiler-research/CppInterOp/blob/main/li...

  • Hey! I'm here and receptive.

    I completely agree that Clang could solve this by actually supporting my use case. Unfortunately, Clang is very much designed for standalone AOT compilation, not intertwined with another IR generating mechanism. Furthermore, Clang struggles to handle some errors gracefully which can get it into a bad state.

    I have grown jank's fork of CppInterOp quite significantly, in the past quarter, with the full change list being here: https://gist.github.com/jeaye/f6517e52f1b2331d294caed70119f1... Hoping to get all of this upstreamed, but it's a lot of work that is not high priority for me right now.

    I think, based on my experience in the guts of CppInterOp, that the largest issue is not the C++ code generation. Basically any code generation is some form of string building. You linked to a part of CppInterOp which is constructing C++ functions. What's _actually_ wrong with that, in terms of robustness? The strings are generated not based on arbitrary user input, but based on Clang QualTypes and Decls. i.e. you need valid Clang values to actually get there anyway. Given that the ABI situation is an absolute mess, and that jank is already using Clang's JIT C++ compiler, I think this is a very viable solution.

    However, in terms of robustness, I go back to Clang's error handling, lack of grace, and poor tooling for use cases like this. Based on my experience, _that_ is what will cause robustness issues.

    Please don't take my response as unreceptive or defensive. I really do appreciate the discussion and if I'm saying something wrong, or if you want to explain further, please do. For alternatives, you linked to https://github.com/Mr-Anyone/abi which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing). You also linked to https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-... which I agree would be great, _if/when it becomes available_.

    So, out of all of the options, I'll ask clearly and sincerely: is there really a _better_ option which exists today?

    CppInterOp is an implementation detail of jank. If we can replace C++ string generation with more IR generation and a portable ABI mechanism, _and_ if Clang can provide the sufficient libraries to make it so that I don't need to rely on C++ strings to be certain that my template specializations get the correct instantiation, I am definitely open to replacing CppInterOp. From all I've seen, we're not there yet.

  • > the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls.

    So, I agree that this sounds janky as heck. My question is: besides sounding janky as heck, is there something wrong with this? Is it slow/unreliable?

    • i mean it's as prone to error as any other thing that relies on string munging. it's probably not that much slower than the alternative i proposed - because the trampolines/wrappers are jitted and then reused - but it's just not robust enough that i would ever imagine building a prod system on top of it (eg using cppyy in prod) let alone baking it into my language/runtime.

These recursive initialism PL names are getting out of hand /s

  • I've pondered this for a while and I have no idea how jank is a recursive acronym. What're you seeing that I'm not?

    • It’s a joke (hence the “/s”) on the “[PL name] is [words beginning with the rest of the letters of the PL name]” snowclone. However as time approaches infinity I’m sure it will get a recursive backronym.