Comment by augusteo

10 hours ago

The threading story here is what grabbed my attention. Pass-by-value with copy-on-write means you get data-race immunity without any locks or channels. You just pass data to a thread and mutations stay local. That's a genuinely useful property.

I've worked on systems where we spent more time reasoning about shared state than writing actual logic. The typical answer is "just make everything immutable" but then you lose convenient imperative syntax. This sits in an interesting middle ground.

Curious about performance in practice. Copy-on-write is great until you hit a hot path that triggers lots of copies. Have you benchmarked any real workloads?

Hmm, this is a bit like peeling a banana only to throw away the banana and eat the peel. Pass by value reduces the true benefit of copy-on-write.

Use immutable pass by reference. Make a copy only if mutability is requested in the thread. This makes concurrent reads lock-free but also cuts down on memory allocations.

  • I think that what you are calling "immutable pass by reference" is what the OP is calling "pass by value". See, when used abstractly, "pass by value" means that the argument is passed as a value, hence it is immutable and the callee can't mutate it. One way to implement this is by copying the data that represents the value. In the OP's language, and in many other languages that work this way, instead of copying the data, we implement "pass by value" by incrementing the reference count and passing a pointer to the original data. These differing implementations provide the same abstract semantics, but differ in performance.

  • > Use immutable pass by reference. Make a copy only if mutability is requested in the thread.

    This is essentially what Herd does. It's only semantically a pass by value, but the same reference counting optimizations still apply.

    In fact, Herd's approach is a bit more powerful than this because (in theory) it can remove the copy entirely if the caller doesn't use the old value any more after creating the thread. In practice, my optimizations aren't perfect and the language won't always detect this.

    The big downside is that we have to use atomic reference counts for _everything_. From memory this was about a 5-15% performance hit versus non-atomic counters, though the number might be higher if other bottlenecks were removed.
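
For anyone who wants the mechanism spelled out, here is a rough Python sketch of "semantically pass by value, physically a counted reference". It is not Herd's implementation. `CowList`, `share` and `set` are made-up names, and the lock is only a stand-in for the atomic counter a real runtime would use. Sharing bumps a count, and a write copies only when someone else still holds the same storage.

```python
import threading


class CowList:
    """A list-like value that shares storage until someone writes to it."""

    class _Box:
        # Shared storage plus an explicit owner count. The lock stands in
        # for the atomic reference count a real runtime would use.
        def __init__(self, items):
            self.items = items
            self.owners = 1
            self.lock = threading.Lock()

    def __init__(self, items=()):
        self._box = CowList._Box(list(items))

    def share(self):
        """'Pass by value': bump the count and hand out the same storage."""
        other = CowList.__new__(CowList)
        with self._box.lock:
            self._box.owners += 1
        other._box = self._box
        return other

    def set(self, index, value):
        box = self._box
        with box.lock:
            shared = box.owners > 1
        if shared:
            # Someone else can still see this data: copy first, then let go
            # of the old storage, and only then write.
            fresh = CowList._Box(list(box.items))
            with box.lock:
                box.owners -= 1
            self._box = fresh
        self._box.items[index] = value

    def to_list(self):
        return list(self._box.items)


original = CowList([1, 2, 3])
results = {}


def worker(values):
    values.set(0, 99)  # copies, because the caller still holds the data
    results["worker"] = values.to_list()


t = threading.Thread(target=worker, args=(original.share(),))
t.start()
t.join()
results["caller"] = original.to_list()

# A value with a single owner is mutated in place, with no copy at all.
solo = CowList([1, 2, 3])
storage_before = solo._box
solo.set(0, 7)
mutated_in_place = solo._box is storage_before
```

The worker sees its own mutation and the caller's list is untouched, which is the "mutations stay local" property. The single-owner case at the end is the situation where the copy can be skipped entirely.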

> Have you benchmarked any real workloads?

Nothing "real", just the synthetic benchmarks in the ./benchmarks dir.

Unnecessary copies are definitely a risk, and there are certain code patterns that are much harder for my interpreter to detect and remove. E.g. the nbodies benchmark has lots of modifications to dicts stored in arrays, and is also the only benchmark that loses to Python.

The other big performance limitation with my implementation is just the cost of atomic reference counting, and that's the main tradeoff versus deep cloning to pass between threads. There would definitely be room to improve this with better reference counting optimizations though.

  • There is some prior work on mitigating the performance cost of immutability that you might be interested in. For example, Clojure's persistent vectors allow fast modifications without destroying the original vector, because internally they're wide trees rather than just linear arrays of memory. This allows for assignments to be implemented without a copy of the full vector. https://hypirion.com/musings/understanding-persistent-vector...
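
A toy illustration of the "wide tree" idea, in Python. This is a simplified sketch and not Clojure's actual data structure: it uses 4-way nodes instead of 32-way, a fixed depth, and no tail optimization, and `build`, `get` and `assoc` are names I made up. The point is that an update copies only the nodes on the path to one slot and shares every other subtree with the old version.

```python
BITS = 2            # Clojure uses 5 (32-way nodes); 2 keeps the demo small
WIDTH = 1 << BITS   # children per node
MASK = WIDTH - 1


def build(items, depth):
    """Build a tree of tuples with up to WIDTH ** (depth + 1) slots."""
    if depth == 0:
        return tuple(items)
    step = WIDTH ** depth
    return tuple(build(items[i:i + step], depth - 1)
                 for i in range(0, len(items), step))


def get(node, depth, index):
    """Walk down from the root, using BITS bits of the index per level."""
    for level in range(depth, 0, -1):
        node = node[(index >> (level * BITS)) & MASK]
    return node[index & MASK]


def assoc(node, depth, index, value):
    """Return a new root with `index` set to `value`, copying one path."""
    slot = (index >> (depth * BITS)) & MASK
    if depth == 0:
        new_child = value
    else:
        new_child = assoc(node[slot], depth - 1, index, value)
    # Only this node is rebuilt; sibling subtrees are reused as-is.
    return node[:slot] + (new_child,) + node[slot + 1:]


root = build(list(range(16)), 1)     # 4 leaves of 4 items each
new_root = assoc(root, 1, 9, "x")    # "assignment" without a full copy
```

Here `root` still reads 9 at index 9 while `new_root` reads "x", and three of the four leaves are the very same objects in both versions. With 32-way nodes the tree stays very shallow, so an update touches only a handful of small nodes even for large vectors.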

Why don’t we just do this by default for threading in most languages? It’s pretty rare for me to actually want to do memory sharing while threading (mostly because of the complexity)

  • Because it's super slow and shared memory is super fast. And people generally prefer fast code rather than safe code.

Why exactly is imperative syntax "convenient" specifically in the context of inter-thread communication?

  • He's likely referring to the fact that with immutable data you need a different syntax and style, such as re-assigning a variable or chaining calls, as when working with a String in Java.

    In C, you can simply mutate the underlying characters. So changing the fifth character in a string is as easy as:

        str[4] = 0;

    Whereas using the immutable syntax you might have to do something like:

        str = str.substr(0, 4) + "\0" + str.substr(5);

    • Well, yes, that's how it becomes convenient in general. But why would you be doing things like that when communicating between threads?

Pass-by-value is already a copy.

  • It's only semantically a pass-by-value, in reality a reference is passed and the data is only copied if needed (i.e. value is mutated while shared).

    • So the language has reference semantics, and (per the edit) for every object (like in Python)?

      (Ah, no, your example elsewhere in the thread suggests that the referred-to structures get implicitly copied all over the place.)
