Comment by augusteo

13 days ago

The threading story here is what grabbed my attention. Pass-by-value with copy-on-write means you get data-race immunity without any locks or channels. You just pass data to a thread and mutations stay local. That's a genuinely useful property.
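A minimal C++ sketch of the property described above. It uses plain eager copying, not the refcounted scheme the OP's language uses. The lambda captures the vector by value, so the worker thread owns a private copy, and caller and worker can both mutate at the same time without locks. The function name `mutate_both` is hypothetical.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Eager pass-by-value: the worker's lambda captures `mine` by value, so it owns
// a private copy. Caller and worker can then mutate "the same" value at the same
// time without locks, because they never touch the same memory.
std::pair<std::vector<int>, std::vector<int>> mutate_both(std::vector<int> mine) {
    assert(!mine.empty());
    std::vector<int> theirs;
    std::thread worker([copy = mine, &theirs]() mutable {
        for (int& x : copy) x *= 10;   // writes go to the worker's copy only
        theirs = std::move(copy);      // the caller reads this only after join()
    });
    mine[0] = -1;                      // concurrent write to the caller's own copy
    worker.join();
    return {std::move(mine), std::move(theirs)};
}
```

Calling `mutate_both({1, 2, 3})` gives the caller `{-1, 2, 3}` and the worker `{10, 20, 30}`. Copy-on-write keeps exactly these semantics but defers the copy until one side actually writes.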

I've worked on systems where we spent more time reasoning about shared state than writing actual logic. The typical answer is "just make everything immutable" but then you lose convenient imperative syntax. This sits in an interesting middle ground.

Curious about performance in practice. Copy-on-write is great until you hit a hot path that triggers lots of copies. Have you benchmarked any real workloads?

21 comments


sheepscreek 13 days ago

Hmm, this is a bit like peeling a banana only to throw away the banana and eat the peel. Pass by value reduces the true benefit of copy-on-write.

Use immutable pass by reference, and make a copy only if the thread requests mutability. This makes concurrent reads lock-free and also cuts down on memory allocations.

doug-moen 13 days ago

I think that what you are calling "immutable pass by reference" is what the OP is calling "pass by value". See, when used abstractly, "pass by value" means that the argument is passed as a value, hence it is immutable and the callee can't mutate it. One way to implement this is by copying the data that represents the value. In the OP's language, and in many other languages that work this way, instead of copying the data, we implement "pass by value" by incrementing the reference count and passing a pointer to the original data. These differing implementations provide the same abstract semantics, but differ in performance.
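One way to sketch that implementation in C++, assuming a hand-rolled handle rather than anything taken from the OP's language: copying a `CowVec` (a hypothetical name) only bumps an atomic reference count, and `set` clones the buffer only if another handle still shares it. Moving a handle hands over ownership without touching the count, so a caller that no longer needs its value never triggers a copy.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical copy-on-write handle. Copies share one heap buffer and bump an
// atomic reference count; a write to a shared buffer clones it first.
class CowVec {
    struct Buf {
        std::atomic<long> refs{1};
        std::vector<int> data;
        explicit Buf(std::vector<int> d) : data(std::move(d)) {}
    };
    Buf* b_;

    // Drop one reference; whoever drops the last one frees the buffer.
    static void release(Buf* b) {
        if (b != nullptr && b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete b;
        }
    }

public:
    explicit CowVec(std::vector<int> d) : b_(new Buf(std::move(d))) {}

    // "Pass by value": share the buffer and bump the count instead of copying data.
    CowVec(const CowVec& o) : b_(o.b_) { b_->refs.fetch_add(1, std::memory_order_relaxed); }
    // Moving hands over our reference; the count doesn't change.
    CowVec(CowVec&& o) noexcept : b_(std::exchange(o.b_, nullptr)) {}
    CowVec& operator=(CowVec o) noexcept { std::swap(b_, o.b_); return *this; }
    ~CowVec() { release(b_); }

    // The acquire load pairs with other owners' releasing fetch_sub, so their
    // last reads of the buffer happen-before any in-place write we make here.
    bool is_unique() const {
        assert(b_ != nullptr && "moved-from handle");
        return b_->refs.load(std::memory_order_acquire) == 1;
    }

    int get(std::size_t i) const {
        assert(b_ != nullptr && "moved-from handle");
        return b_->data[i];
    }

    void set(std::size_t i, int v) {
        if (!is_unique()) {                 // shared: clone, then drop our old reference
            Buf* fresh = new Buf(b_->data);
            release(b_);
            b_ = fresh;
        }
        b_->data[i] = v;                    // sole owner: write in place
    }
};
```

The count has to be atomic because handles can be copied and dropped on different threads, and that atomic read-modify-write on every copy is where the refcounting overhead comes from.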

jcparkyn 13 days ago

> Use immutable pass by reference. Make a copy only if mutability is requested in the thread.
This is essentially what Herd does. It's only semantically a pass by value, but the same reference counting optimizations still apply.
In fact, Herd's approach is a bit more powerful than this because (in theory) it can remove the copy entirely if the caller doesn't use the old value any more after creating the thread. In practice, my optimizations aren't perfect and the language won't always detect this.
The big downside is that we have to use atomic reference counts for _everything_. From memory, this was about a 5-15% performance hit versus non-atomic counters, though the number might be higher if other bottlenecks were removed.

postepowanieadm 13 days ago

Peels are rich in fiber.

jcparkyn 13 days ago

> Have you benchmarked any real workloads?

Nothing "real", just the synthetic benchmarks in the ./benchmarks dir.

Unnecessary copies are definitely a risk, and there are certain code patterns that are much harder for my interpreter to detect and remove. For example, the nbodies benchmark makes lots of modifications to dicts stored in arrays, and it's also the only benchmark that loses to Python.
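A single-threaded C++ sketch of why nested updates can be costly under copy-on-write, assuming an array of dicts where both layers are reference-counted (`std::shared_ptr` here). `set_field`, `ensure_unique` and `CopyStats` are hypothetical names, not the OP's interpreter. While another owner still holds the old array, writing `arr[i][key]` first clones the outer array (copying only the element pointers), after which every dict is shared, so each dict touched gets cloned once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical two-level CoW structure: an array of dicts, both reference-counted.
using Dict = std::map<std::string, double>;
using DictRef = std::shared_ptr<Dict>;
using Array = std::vector<DictRef>;
using ArrayRef = std::shared_ptr<Array>;

struct CopyStats {
    int arrays = 0;  // outer arrays cloned
    int dicts = 0;   // inner dicts cloned
};

// Clone *p if another owner can still see it. Single-threaded sketch:
// use_count() is only a reliable uniqueness test when no other thread holds refs.
template <class T>
void ensure_unique(std::shared_ptr<T>& p, int& copies) {
    if (p.use_count() != 1) {
        p = std::make_shared<T>(*p);
        ++copies;
    }
}

// arr[i][key] = v with value semantics: may clone the array, then the dict.
void set_field(ArrayRef& arr, std::size_t i, const std::string& key, double v,
               CopyStats& stats) {
    assert(i < arr->size());
    ensure_unique(arr, stats.arrays);   // copies the slot pointers, not the dicts
    DictRef& d = (*arr)[i];             // after an array clone, every dict is shared
    ensure_unique(d, stats.dicts);
    (*d)[key] = v;
}
```

If nothing else holds the old array, both checks see a single owner and every write happens in place; the copies only appear when an old version is kept alive.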

The other big performance limitation of my implementation is simply the cost of atomic reference counting, and that's the main tradeoff versus deep cloning to pass data between threads. There's definitely room to improve this with better reference-counting optimizations, though.

wging 13 days ago

There is some prior work on mitigating the performance cost of immutability that you might be interested in. For example, Clojure's persistent vectors allow fast modifications without destroying the original vector, because internally they're wide trees rather than flat arrays. An update copies only the nodes on the path from the root to the changed element and shares the rest, so no copy of the full vector is needed. https://hypirion.com/musings/understanding-persistent-vector...
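A stripped-down C++ sketch of the wide-tree idea, assuming 32-way branching as in Clojure but omitting append, pop and the tail optimization the linked article covers. `PVec` is a hypothetical name. `set` copies one node per tree level (roughly log32(n) nodes) and shares every other node with the original vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Minimal persistent vector (get/set only): a 32-way tree whose set() copies the
// nodes on the root-to-leaf path and shares every other node with the original.
class PVec {
    static constexpr unsigned kBits = 5;
    static constexpr std::size_t kWidth = std::size_t{1} << kBits;  // 32 slots per node
    static constexpr std::size_t kMask = kWidth - 1;

    // Simplification: every node has room for both children and leaf values.
    struct Node {
        std::array<std::shared_ptr<const Node>, kWidth> kids{};
        std::array<int, kWidth> vals{};
    };
    using NodePtr = std::shared_ptr<const Node>;

    NodePtr root_;
    std::size_t size_ = 0;
    unsigned shift_ = 0;  // kBits * (tree height - 1)

    PVec(NodePtr root, std::size_t n, unsigned shift)
        : root_(std::move(root)), size_(n), shift_(shift) {}

    // Build the subtree covering indices [start, start + (kWidth << shift)).
    static NodePtr build(const std::vector<int>& xs, std::size_t start, unsigned shift) {
        auto node = std::make_shared<Node>();
        if (shift == 0) {
            for (std::size_t j = 0; j < kWidth && start + j < xs.size(); ++j)
                node->vals[j] = xs[start + j];
        } else {
            const std::size_t span = std::size_t{1} << shift;  // indices per child
            for (std::size_t j = 0; j < kWidth && start + j * span < xs.size(); ++j)
                node->kids[j] = build(xs, start + j * span, shift - kBits);
        }
        return node;
    }

    // Path copying: clone this node, recurse into the one child that changes.
    static NodePtr assoc(const NodePtr& n, unsigned shift, std::size_t i, int v) {
        auto copy = std::make_shared<Node>(*n);  // shares all 32 child pointers
        if (shift == 0) {
            copy->vals[i & kMask] = v;
        } else {
            const std::size_t slot = (i >> shift) & kMask;
            copy->kids[slot] = assoc(n->kids[slot], shift - kBits, i, v);
        }
        return copy;
    }

public:
    explicit PVec(const std::vector<int>& xs) : size_(xs.size()) {
        while ((kWidth << shift_) < size_) shift_ += kBits;  // grow height to fit
        root_ = build(xs, 0, shift_);
    }

    std::size_t size() const { return size_; }

    int get(std::size_t i) const {
        assert(i < size_);
        const Node* n = root_.get();
        for (unsigned s = shift_; s > 0; s -= kBits) n = n->kids[(i >> s) & kMask].get();
        return n->vals[i & kMask];
    }

    // Returns a new vector; *this is left unchanged.
    PVec set(std::size_t i, int v) const {
        assert(i < size_);
        return PVec(assoc(root_, shift_, i, v), size_, shift_);
    }
};
```

For a 1,100-element vector the tree has three levels, so `set` allocates three new nodes instead of copying all 1,100 elements.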

MetricExpansion 11 days ago

Swift actually uses this copy-on-write strategy as well for its standard library’s container types, such as Array and Dictionary, and it does indeed make multithreaded programming much easier without requiring full copies. The downside is that you pay reference-counting overhead.

rao-v 13 days ago

Why don’t we just do this by default for threading in most languages? It’s pretty rare for me to actually want to share memory between threads (mostly because of the complexity).

vbezhenar 13 days ago
Because it's super slow and shared memory is super fast. And people generally prefer fast code to safe code.
- gf000 13 days ago
  
  It's not "super slow" and most languages do something very similar within concurrent data structures.
  Also, copy by value is in itself just a semantic requirement; it doesn't say how it's implemented.
  And shared mutable memory is pretty damn slow (assuming you aren't fine with data-race garbage), because atomic operations keep bouncing cache lines between cores. So it's the usual space-time tradeoff at the end of the day.
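A small C++ illustration of that space-time tradeoff, with hypothetical function names: both functions sum the same vector on several threads. In `sum_shared` every thread updates one shared atomic, so each `fetch_add` needs exclusive ownership of the same cache line. `sum_private` gives each thread its own accumulator and merges the partial sums after `join()`, spending a little extra memory to avoid that contention. How much it helps depends on hardware and thread count.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Shared mutable state: every thread updates one atomic. Correct, but each
// fetch_add needs exclusive ownership of the same cache line, which then
// bounces between cores.
long sum_shared(const std::vector<int>& xs, unsigned nthreads) {
    assert(nthreads > 0);
    std::atomic<long> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < xs.size(); i += nthreads)
                total.fetch_add(xs[i], std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Private copies: each thread accumulates locally (a bit more memory, no
// contention) and writes its result once; the partials are merged after join().
long sum_private(const std::vector<int>& xs, unsigned nthreads) {
    assert(nthreads > 0);
    std::vector<long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            long local = 0;
            for (std::size_t i = t; i < xs.size(); i += nthreads) local += xs[i];
            partial[t] = local;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```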

zahlman 13 days ago

Why exactly is imperative syntax "convenient" specifically in the context of inter-thread communication?

ddtaylor 13 days ago
He's likely referring to the fact that you would need a different syntax and style, such as reassigning a variable or chaining calls, as when working with a String in Java.
In C, you can simply mutate the underlying characters. So changing the fifth character in a string is as easy as:
str[4] = 0;
Whereas using the immutable syntax you might have to do something like:
str = str.substr(0, 4) + '\0' + str.substr(5);
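For comparison, a C++ sketch (function names hypothetical) where the string has value semantics: the parameter is the function's own copy, so the imperative `s[4] = '\0'` changes only that copy and the caller's string is untouched. C++11 rules effectively rule out a copy-on-write `std::string`, so the copy here is eager. The second function makes the same edit by rebuilding the string from the prefix, the new character and everything after index 4.

```cpp
#include <cassert>
#include <string>

// std::string has value semantics: the parameter is the function's own copy, so
// the imperative edit below can't affect the caller's string.
std::string with_fifth_char_zeroed(std::string s) {
    assert(s.size() > 4);
    s[4] = '\0';  // same in-place syntax as the C version
    return s;
}

// The same edit in "immutable" style: build a new string around the change.
std::string with_fifth_char_zeroed_rebuild(const std::string& s) {
    assert(s.size() > 4);
    return s.substr(0, 4) + '\0' + s.substr(5);
}
```

Both return the same 11-character result for `"hello world"`, and the original string still reads `"hello world"` afterwards.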
- zahlman 13 days ago
  
  Well, yes, that's how it becomes convenient in general. But why would you be doing things like that when communicating between threads?

jagged-chisel 13 days ago

Pass-by-value is already a copy.

jcparkyn 13 days ago
It's only semantically pass-by-value; in reality a reference is passed, and the data is copied only if needed (i.e. when the value is mutated while shared).
- zahlman 13 days ago
  
  So the language has reference semantics, and (per the edit) for every object (like in Python)?
  (Ah, no, your example elsewhere in the thread suggests that the referred-to structures get implicitly copied all over the place.)
  