← Back to context

Comment by almostgotcaught

2 days ago

i commented on reddit (and got promptly downvoted) but since i think jank's author is around here (and hopefully is receptive to constructive criticism): the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls. there's no reason to do this except that libclang currently doesn't support any other way. that's not jank's fault but it could be "fixed" in libclang. at a minimum you could use https://github.com/llvm/llvm-project/blob/main/clang/lib/Cod... to emit the code based on clang ast. at a maximum would be to use something like

https://github.com/Mr-Anyone/abi

or this if/when it comes to fruition

https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...

to generate ABI compliant calls/etc for cpp libs.

note, i say all this with maximum love in my heart for a language that would have first class cpp interop - i would immediately become jank's biggest proponent/user if its cpp interop were robust.

EDIT: for people wanting/needing receipts, you can skim through https://github.com/compiler-research/CppInterOp/blob/main/li...

Hey! I'm here and receptive.

I completely agree that Clang could solve this by actually supporting my use case. Unfortunately, Clang is very much designed for standalone AOT compilation, not intertwined with another IR generating mechanism. Furthermore, Clang struggles to handle some errors gracefully which can get it into a bad state.

I have grown jank's fork of CppInterOp quite significantly, in the past quarter, with the full change list being here: https://gist.github.com/jeaye/f6517e52f1b2331d294caed70119f1... Hoping to get all of this upstreamed, but it's a lot of work that is not high priority for me right now.

I think, based on my experience in the guts of CppInterOp, that the largest issue is not the C++ code generation. Basically any code generation is some form of string building. You linked to a part of CppInterOp which is constructing C++ functions. What's _actually_ wrong with that, in terms of robustness? The strings are generated not based on arbitrary user input, but based on Clang QualTypes and Decls. i.e. you need valid Clang values to actually get there anyway. Given that the ABI situation is an absolute mess, and that jank is already using Clang's JIT C++ compiler, I think this is a very viable solution.

However, in terms of robustness, I go back to Clang's error handling, lack of grace, and poor tooling for use cases like this. Based on my experience, _that_ is what will cause robustness issues.

Please don't take my response as unreceptive or defensive. I really do appreciate the discussion and if I'm saying something wrong, or if you want to explain further, please do. For alternatives, you linked to https://github.com/Mr-Anyone/abi which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing). You also linked to https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-... which I agree would be great, _if/when it becomes available_.

So, out of all of the options, I'll ask clearly and sincerely: is there really a _better_ option which exists today?

CppInterOp is an implementation detail of jank. If we can replace C++ string generation with more IR generation and a portable ABI mechanism, _and_ if Clang can provide the sufficient libraries to make it so that I don't need to rely on C++ strings to be certain that my template specializations get the correct instantiation, I am definitely open to replacing CppInterOp. From all I've seen, we're not there yet.

  • I think that some packages that generate Python bindings for C++ use Clang to do it as well.

  • > which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing)

    ah my bad i meant to link to this one https://github.com/scrossuk/llvm-abi

    which inspired the gsoc.

    > is there really a _better_ option which exists today?

    today the "best in class" approach is swift's which fully (well tries to) model cpp AST and do what i suggested (emitting code directly):

    https://github.com/swiftlang/swift/blob/c09135b8f30c0cec8f5f...

    • There are upsides to this approach. Coupling Swift's AST with Clang's AST will allow for the best codgen, for sure.

      However, the huge downside to this approach, which cannot be overlooked, is that Clang (not libclang) is not designed to be a library. It doesn't have the backward compatibility of a library. Swift (i.e. Apple) is already deep into developing Clang, and so I'm sure they can afford the cost of keeping up with the breaking changes that happen on every Clang release. For a solo dev, I'm not yet sure this is actually viable, but I will give it more consideration.

      However, I think that raising alarms at C++ codegen is unwarranted. As I said before, basically any query builder or codegen takes some form of string generation. The way we make those safe is to add types in front of them, so we're not just formatting user strings into other strings. That's exactly what CppInterOp does, where the types added are Clang QualTypes and Decls.

      4 replies →

> the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls.

So, I agree that this sounds janky as heck. My question is: besides sounding janky as heck, is there something wrong with this? Is it slow/unreliable?

  • i mean it's as prone to error as any other thing that relies on string munging. it's probably not that much slower than the alternative i proposed - because the trampolines/wrappers are jitted and then reused - but it's just not robust enough that i would ever imagine building a prod system on top of it (eg using cppyy in prod) let alone baking it into my language/runtime.

    • > i mean it's as prone to error as any other thing that relies on string munging.

      This is misleading. Having done a great deal of both (as jank also supports C++ codegen as an alternative to IR), if the input is a fully analyzed AST, generating IR is significantly more error prone than generating C++. Why? Well, C++ is statically typed and one can enable warnings and errors for all sorts of issues. LLVM IR has a verifier, but it doesn't check that much. Handling references, pointers, closures, ABI issues, and so many more things ends up being a huge effort for IR.

      For example, want to access the `foo.bar` member of a struct? In IR, you'll need to access foo, which may require loading it if it's a reference. You'll need to calculate the offset to `bar`, using GEP. You'll need to then determine if you're returning a reference to `bar` or if a copy is happening. Referencing will require storing a pointer, whereas copying may involve a lot more code. If we're generating C++, though, we just take `foo` and add a `.bar`. The C++ compiler handles the rest and will tell us if we messed anything up.

      If you're going to hand wave and say anything that's building strings is error prone and unsafe, regardless of how richly typed and thoroughly analyzed the input is, the stance feels much less genuine.

    • The delta between the title and the content gave me extreme pause, thanks for sharing that there's, uh, worse problems.

      I'm a bit surprised I've seen two articles about jank here the last 2 days if these are exemplars of the technical approach and communication style. Seems like that wouldn't be enough to get on people's radars.

      5 replies →