Comment by derefr

3 years ago

Erlang's semantics are deeply intertwined with the unique things the Erlang runtime does.

For example, the Erlang abstract machine does straight-line non-preemptable atomic execution within bytecode basic-blocks, with reduction-checking for yield exactly/only at stack-frame manipulation points (i.e. call/ret/tail-call.)

Those points are guaranteed to occur after O(1) reductions, because of an ISA design that contains no unbounded local looping primitives — i.e. no way to encode relative jumps with negative offsets. (Note that this design requirement — and not any functional-programming ideal — is why Erlang uses tail-calls for looping. It has to; there's no other way to do loops given the ISA constraints!)
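To make that concrete: the only way to write a loop in Erlang is a tail call back into the same function. A minimal sketch (module and function names are mine):

    -module(loop_demo).
    -export([count/1]).

    %% "Looping" is a tail call back into count/1. Every iteration
    %% passes through a call, i.e. a stack-frame manipulation point, so
    %% the scheduler gets to count a reduction (and possibly yield) on
    %% each pass.
    count(0) -> done;
    count(N) when N > 0 -> count(N - 1).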

This atomicity of bytecode basic-blocks is what guarantees that actors can be hard-killed without corrupting the abstract-machine scheduler they run on (they die at their next yield-point, with the scheduler in a coherent state). It's a fundamental difference between Erlang scheduling and JVM scheduling.

The JVM doesn't have this atomicity, and so you can't hard-kill a Java thread without corrupting the JVM. Instead, you can only softly "interrupt" threads — sending them a special "please die" signal they have to explicitly check for. This means that JVM languages can't support anything like Erlang's process links — i.e. JVM concurrency frameworks can't propagate failure downwards through a supervision hierarchy in a way that actually releases resources from long-running CPU-bound sub-tasks. This in turn means you can't reliably bound resource usage under high-concurrency scenarios, which means that, essentially, all the things that people get excited about adding to Java with Akka, Loom, etc. don't actually do much to help the use-cases they attempt to address.
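To make the contrast concrete, here's a minimal Erlang sketch (module and function names are mine) of what links buy you: a purely CPU-bound, linked subtask is reliably torn down when its parent is hard-killed, without its code ever checking for any signal:

    -module(link_demo).
    -export([run/0]).

    run() ->
        Parent = spawn(fun() ->
            %% A CPU-bound busy loop, linked to its parent. It never
            %% checks any "please die" flag.
            spawn_link(fun Busy() -> Busy() end),
            receive stop -> ok end
        end),
        timer:sleep(100),
        %% Hard-kill the parent: the exit signal propagates down the
        %% link, and the busy loop dies at its next yield point.
        exit(Parent, kill).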

That last claim about Akka and Loom is personal experience, by the way. My company develops backend server software in both Erlang (Elixir) and Java. We actually tried Loom as a way of fixing some of the robustness-under-concurrency problems with the JVM; but the problems are much more fundamental than anything that adding features like virtual threads can resolve.

I didn’t mean it as “only” a library, but as a fork. I still don’t necessarily see why it would be an insurmountable problem to fork the OpenJDK project and add atomic execution to it (do note that I’m sure it has the necessary mechanism for that — JIT deoptimizations pretty much require dropping the previously calculated results, so with some changes it could be doable).

It wouldn’t be a soft fork and would no doubt still require plenty of resources, but I still feel that going in this direction would make better use of the insane dev-hours that went into the OpenJDK’s stellar performance than trying to fix the performance issues of Erlang would. But do take everything I said with a huge grain of salt, as I am not an OpenJDK contributor and, as you can see, don’t know much about the Erlang ecosystem. And thank you for the very informative look behind the scenes.

  • Yeah no. The problem here is that a huge part of said JIT performance on the JVM depends on the assumption that you can elide information that the schedulers and tracing would need.

    Reverting that would mean reverting nearly all advanced optimisations, plus totally changing the memory model and the GC. At that point, you have already lost all the benefits.

This is not exactly how the Beam VM works.

> an ISA design that contains no unbounded local looping primitives

Actually, most of the Beam instructions are non-O(1). For example since integers are unbounded in Erlang even simple arithmetic (+, -, etc.) may turn out to be a non-constant time operation (even though most of the time your integers will fit into a single machine word, so it'd rarely be a problem). But there's also a built-in for appending lists (++), which obviously contains an unbounded loop in it.

The solution is that these instructions are written so that they do work on chunks of data (e.g. ++ may process 1000 elements of a list in a chunk) after which they increment the reduction counter and possibly schedule out the process.
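You can watch that accounting from Erlang itself, since `process_info/2` with the `reductions` item exposes the counter. A minimal sketch (the function name is mine; exact numbers and chunk sizes vary by OTP release):

    %% Roughly what does a big ++ cost this process in reductions?
    reds_for_append() ->
        {reductions, R0} = process_info(self(), reductions),
        _ = lists:duplicate(100000, x) ++ [y],
        {reductions, R1} = process_info(self(), reductions),
        R1 - R0.   %% grows with the list length; not a constant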

> there's no other way to do loops given the ISA constraints

The Beam ISA allows looping. You can even write a hand-crafted loop that will never increment the reduction counter and thus will deadlock a scheduler. But the Erlang compiler will never generate such a loop for you.

On the ISA level there are only labels where you can jump to. You can jump forwards and backwards. So you could implement a language that offers loops and still compiles to Beam. However, since jumps don't increment the reduction counter, you would either risk your loops breaking the fair scheduling of processes, or you would have to ensure that the loop body contains an operation that increments the reduction counter and allows the scheduler to suspend the process.
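For reference, this is roughly what the two cases look like at the BEAM assembly level (the compiled form is the shape `erlc -S` emits, which varies by OTP release; the hand-crafted variant is mine):

    %% What the compiler generates for "loop() -> loop().":
    {function, loop, 0, 2}.
      {label,1}.
        {func_info,{atom,loop_demo},{atom,loop},0}.
      {label,2}.
        {call_only,0,{f,2}}.    % tail call: bumps the reduction counter

    %% A hand-crafted backward jump the compiler will never emit:
    %%   {label,2}.
    %%     {jump,{f,2}}.        % no reduction bump: pins a scheduler forever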

> This atomicity of bytecode basic-blocks is what guarantees that actors can be hard-killed without corrupting the abstract-machine scheduler they run on

Well, it is of course important that you don't interrupt the scheduler at an arbitrary point, midway through executing an opcode. But there are no atomically executed bytecode blocks. Actors are safe to kill not because they would run their code in uninterrupted atomic blocks, but because they don't share state (their heap) with each other. So if you have an actor that holds e.g. a binary tree, and it is halfway through inserting a value into the binary tree when you kill it, it may leave the binary tree in an inconsistent state, but that doesn't matter, because no one else has access to this data structure: it lives on this process's own heap.

When processes use shared resources (such as ETS tables, files or a gen_server process) and are killed, they may very well leave that shared resource in an inconsistent state, just not at the VM layer but at the application-logic layer. So the file will still be usable as a file, but it may contain corrupted data, for example.
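A minimal sketch of that distinction (the function, table and key names are mine): kill a writer midway through a two-step ETS update, and the table stays perfectly healthy at the VM layer while the application-level invariant (the two balances sum to 100) is silently broken:

    corrupt_demo() ->
        T = ets:new(accounts, [public, set]),
        true = ets:insert(T, [{a, 100}, {b, 0}]),
        Writer = spawn(fun() ->
            ets:update_counter(T, a, -50),
            timer:sleep(1000),           %% ...killed somewhere in here
            ets:update_counter(T, b, 50)
        end),
        timer:sleep(10),
        exit(Writer, kill),
        timer:sleep(10),
        %% Still a perfectly valid ETS table, but it now reads {a,50},
        %% {b,0}: 50 units vanished at the application-logic layer.
        ets:tab2list(T).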

> The JVM doesn't have this atomicity, and so you can't hard-kill a Java thread without corrupting the JVM. Instead, you can only softly "interrupt" threads.

If you were to port Erlang to the JVM, that would be the least of your problems. The compiler could just insert code to check for these signals every now and then. I'd go further: if you ran Erlang code (and only Erlang code) on the JVM, it wouldn't even matter that you don't have separate heaps. Every process would only use a separate part of the shared heap, so they couldn't step on each other's toes. The GC could take care of the rest as usual.

I think there are two real issues with porting to the JVM:

* Mapping an Erlang process to an OS thread would only work up to some reasonably low number of Erlang processes. After that you'd have to switch to a green thread model with schedulers, which is a lot of work to implement.

* The Beam put a lot of effort into making the VM scale well to a lot of schedulers. Things like how to implement a message box that 100+ schedulers can concurrently push messages to. You'd probably have to implement similar optimisations for the data structures you'd use for message boxes, ETS tables etc. on the JVM too.

  • As others discussed under my first comment, the JVM will soon get Loom, which might solve the first issue you mention. It will effectively give one the option to run code on a virtual thread, with the runtime jumping to another virtual thread at any blocking operation.

  • > Actually, most of the Beam instructions are non-O(1).

    I said O(1) in BEAM reductions per basic-block, not O(1) in underlying CPU instructions. This is why reductions, rather than pure "instructions executed", are counted: it allows each op (or BIF/NIF call) to account for how expensive executing it was.

    > The solution is that these instructions are written so that they do work on chunks of data (e.g. ++ may process 1000 elements of a list in a chunk) after which they increment the reduction counter and possibly schedule out the process.

    I was eliding reference to these for simplicity. The precise semantics are that "simple ops" are required to be O(1) reduction-bounded; while BIFs/NIFs (incl. things like `erlang:'++'/2`) aren't, but then must necessarily be implemented with their own internal yield points; and the instructions which invoke them will also potentially yield before/after the invocation. Essentially, within the Erlang abstract-machine model, BIF/NIF invocations act as optimized remote function calls that might have associated "bytecode intrinsics" for invoking them, rather than as regular ISA instructions per se.
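    You can see that "optimized remote function call" framing from the shell, where the operator form and the fully-qualified BIF call are interchangeable:

        1> [1, 2] ++ [3].
        [1,2,3]
        2> erlang:'++'([1, 2], [3]).
        [1,2,3]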

    > The Beam ISA allows looping.

    Yes, but BEAM programs that use such code shouldn't be considered valid.

    The BEAM loader doesn't do load-time checks like that, but that's because the BEAM loader is on the inside of an assumed trust zone created by the Erlang compiler. (I.e. BEAM bytecode is implicitly assumed by the loader to be pre-validated at compile time. This is the reason every attempt at untrusted mobile code execution in Erlang has failed—to use Raymond Chen's phrasing, load-time is already "on the other side of the airtight hatchway." If you wanted to allow people to execute untrusted code, you'd need to move the hatchway!)

    Tangent: this is an annoying aspect of calling the Erlang emulator the "Erlang Abstract Machine" — it's not. An abstract machine is a model of runtime semantics, formed by a compiler/interpreter, VM, runtime libraries, and even standard library, all working together to run code under that model.

    (Compare and contrast: the C abstract machine. It is a model of runtime semantics that exists as only 1. compile-time enforcement by C compilers, and 2. libc. It has no VM component at all.)

    This part might be "just my opinion, man" but: given that BEAM was designed purely for the execution of Erlang; and given that BEAM is written to assume that you used an Erlang compiler to compile the bytecode it's running (thus the trust-zone); then any feature of BEAM bytecode that goes unused by Erlang codegen, should be considered undefined behavior for the purposes of the Erlang abstract machine. Whether the BEAM VM allows the bytecode or not, the Erlang abstract machine doesn't.

    In other words, yes, you can create back-references in a BEAM bytecode file. You can also load and call into a NIF that doesn't bother to do reduction accounting. In both cases, you're breaking the runtime semantics of the Erlang abstract machine by doing so, and thereby discarding the properties (e.g. soft-realtime max-bounded-latency scheduling) that you get when you stay within those runtime semantics.

    (And I would argue that, if we did move the trust zone to allow for untrusted mobile code execution, such that we were doing static analysis at load time, then the bytecode loader would almost certainly toss out programs that contain back-references. They're semantically invalid for the abstract-machine model it's trying to enact.)

    > Actors are safe to kill not because they would run their code in uninterrupted atomic blocks, but because they don't share state (their heap) with each other.

    Untrue. Many things in ERTS manipulate global emulator (or more precariously, per-scheduler) state in careful ways: fused port packet generation + enqueue done inside the calling process; ETS updates for tables with write concurrency enabled; module-unload-time constant propagation; etc.

    You're even free to manipulate arbitrary shared state yourself, inside a NIF! It's not breaking Erlang abstract-machine semantics as long as 1. that state isn't ERTS state, and 2. the results aren't visible inside the abstract-machine model. (Thus NIF memory handles being visible in Erlang as zero-width reference binaries — that's only necessary because NIFs are assumed to be manipulating shared mutable buffers, and so Erlang actually being able to see into those buffers would cause undefined behavior!)

    BEAM can't assume that any process isn't currently executing inside a NIF that's doing precarious must-be-atomic things to shared out-of-ERTS resources. (I realize that this wasn't a part of the initial design of Erlang — NIFs didn't always exist — but it wasn't in conflict with the design, either, and after much iteration, is now fundamental to it.)

    But this manipulation of global state doesn't break the runtime guarantees of Erlang's abstract machine model, so long as these operations are never pre-empted. And so the BEAM never pre-empts them.

    I also didn't mention the other, maybe more interesting things that this constraint gets you: hot-code upgrade, process hibernation, and dynamic tracing. To work, these three features all require that a process's current heap state have a clean bijection to a continuation (i.e. a remote function call MFA tuple.) This is only true at yield points; between these, the heap state's meaning is undefined to the Erlang abstract machine, and has meaning only to BEAM itself. It's only the guarantee of O(1)-in-reductions distance between yield points — and never having to yield between these points — that makes all these features practical.
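    Hibernation is the most direct illustration of that bijection: `erlang:hibernate/3` throws away the process's entire call stack and records nothing but an MFA continuation to wake up at. A minimal sketch (the module shape is mine):

        -module(hib_demo).
        -export([loop/1]).

        loop(State) ->
            receive
                hibernate ->
                    %% Discard the stack; all that survives is the heap
                    %% (State) plus an MFA continuation to resume at.
                    erlang:hibernate(?MODULE, loop, [State]);
                Msg ->
                    loop([Msg | State])
            end.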

    ("Erlang with pre-emptible actors" would basically have to use OS threads for each actor, because anything it did instead would be just as heavy in terms of context-switching costs. No green-threading architecture allows the schedulers to pre-empt the green threads they're running, for exactly this reason.)

    > After that you'd have to switch to a green thread model with schedulers, which is a lot of work to implement.

    My whole point is: how do you cleanly de-schedule arbitrary JVM bytecode that's doing something compute-bounded without explicit yield points? You can't, without replacing both the ISA and the compiler with ones that enforce Erlang-abstract-machine-alike semantics, as described above. And any attempt to do that would mean that this hypothetical forked JVM would now be unable to load original-JVM bytecode — and that you'd have to code in a version of Java that only supports tail calls — which makes it useless as a JVM. It'd just be an Erlang emulator.