Comment by WorldMaker

6 hours ago

> I probably could but it's definitely not going to feel like second nature to me and I suspect I'd really feel like I'm fighting the language. I mean, seriously, I'd be curious, how do you deal with the fact that you can't stack allocate? I can spawn a thread in Rust and share a pointer back to the parent stack, that just seems very hard to do in javascript if not outright impossible?

I had alluded to it before, but this is maybe where some additional experience with other garbage collected backend languages like C# or Java could help build some "muscle memory" here.

The typical lens in a GC-based language is value types versus reference types. Value types are generally stack allocated and pass-by-value (copy-by-value; copied from stack frame to stack frame when passed). Reference types are usually heap allocated and pass-by-reference. A reference is generally a "fat pointer", with the qualification that you generally can't dereference one like a pointer without complex GC locks, because the GC reserves the right to move the objects that references point to (for instance, due to compaction, but also due to things like promotion to another heap). References themselves generally follow the same pass-by-value rules (stack allocated and copied).

(The lines are often blurry, hence "generally" and "usually": a GC language may choose to allocate particularly large value types on the heap and apply copy-on-write semantics in a way that preserves the pass-by-value semantics. A GC language is also free to stack allocate small reference types that it believes won't escape a particular part of the stack. I bring up these edge cases not to suggest complexity but as a reminder that profiling-driven optimization is often the best strategy in any language, because any good compiler, even a JIT compiler, is trying to optimize what it can.)

In JS, the breakdown is generally that your value types are string, number, boolean, and your reference types are object, array, and function. `const a = 12` is a static, stack allocated number. `const x = 'foo'` is a static, stack allocated string. It will get copied if you pass it anywhere. Though there's one more optimization here that most GC languages use (and that goes all the way back to early Lisp) called "string interning". Strings are always treated as immutable and essentially copy-on-write. Common strings and strings passed to a large number of stack frames get "interned" to shared memory (sometimes the heap; sometimes even just reusing the memory of their first compiled instance in the compiled binary). But because copy-on-write is so easy to trigger, and because those copies often start out stack allocated, strings are still considered value types, even though with "interning" they sometimes exhibit reference-like behavior and are sort of the "border type".

One thing to look out for: `+` or `+=` where one of the sides is a string can be a huge memory allocator from copying string bytes alone, which should be easy to anticipate.

On the reference type side, in `let x = {a: 5}; let y = x`, the `{a: 5}` part is an object and does allocate to the heap (probably, modulo again things like escape analysis by the JIT compiler), but `x` and `y` themselves are stack allocated references. That `let y = x` is only a reference copy.
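To make the copy semantics concrete, here's a minimal sketch. It only shows the observable semantics (what gets copied and what gets shared); where the engine actually puts each value is still up to the JIT and GC.

```javascript
// Value types: assignment copies the value itself.
let a = 12;
let b = a;   // b gets its own copy of 12
b += 1;      // a is unaffected

// Reference types: assignment copies the reference, not the object.
const x = { a: 5 };
const y = x; // x and y now refer to the same object
y.a = 6;     // the change is visible through x as well

// An explicit (shallow) copy creates a second object, i.e. a new allocation.
const z = { ...x };
z.a = 7;     // x and y are unaffected

console.log(a, b);                  // 12 13
console.log(x.a, y.a, z.a);         // 6 6 7
console.log(x === y, x === z);      // true false
```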

> it's not like js where I'd have to be like "Okay, I can't write {a: 5} here because it would allocate" or something. I feel like that shouldn't be too contentious? Surely one must learn how to avoid much of javascript if they want to learn how to avoid allocations.

Generally, it's not about "avoiding" the easy language constructions because they allocate; it's about balancing the trade-offs of when you want to allocate and how much.

Just like you might preallocate a vector before a tight loop, you might preallocate an array or an object, or even an object pool. (Build an array of objects, with a "free" counter, borrow them, mutate them, return them to the "free" section when done.)
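A rough sketch of that pool idea, assuming a fixed pool size and a trivial `{x, y}` object shape. The names `createVecPool`, `acquire`, and `release` are made up for illustration and there's no guard against releasing the same object twice.

```javascript
// A tiny fixed-size object pool: every object is allocated up front and reused.
// createVecPool / acquire / release are hypothetical names for this sketch.
function createVecPool(size) {
  const items = Array.from({ length: size }, () => ({ x: 0, y: 0 }));
  let free = size; // items[0 .. free-1] are currently available

  return {
    acquire() {
      if (free === 0) throw new Error("pool exhausted");
      free -= 1;
      return items[free]; // borrow the last free object
    },
    release(item) {
      item.x = 0; // reset before returning it to the free section
      item.y = 0;
      items[free] = item;
      free += 1;
    },
    get available() {
      return free;
    },
  };
}

// Usage: the loop body mutates a borrowed object instead of allocating a new one.
const pool = createVecPool(4);
let sum = 0;
for (let i = 0; i < 1000; i++) {
  const v = pool.acquire();
  v.x = i;
  v.y = 2 * i;
  sum += v.x + v.y;
  pool.release(v);
}
console.log(sum, pool.available); // 1498500 4
```

Whether this actually beats just writing `{x: i, y: 2 * i}` in the loop is exactly the kind of thing to check with a profiler first.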

But some of that is trade-offs: preallocation is sometimes harder to read and reason about. On the other side, the "over-allocation" you are worried about might be caught entirely by the JIT's escape analysis and compiled out. For almost all languages it is best to let a profiler or real data guide what to try to optimize (premature optimization is rarely a good idea), but especially for a GC language it can be crucial. Not because the GC language is more complicated or "magic" or "mysterious", but simply because a GC language is tuned for a lot of auto-optimizations that a manually managed memory language doesn't necessarily get "for free". The trade-off for references being much more opaque boxes than pointers is that a JIT compiler has more optimization options because it can just assume pointer math is off the table. More often than not, where an allocation lives is between the JIT and the GC, and there are some simple optimization answers such as "the JIT stack allocated that because it doesn't escape this method". It shouldn't feel like a surprise when such things happen, when you get such benefits "for free". The JIT and GC are still maintaining the value-type or reference-type "semantics" at all times; those are just (intentionally) big easy "traits" with a lot of useful middle ground and a lot of cross-implementation.

> stack allocation feels like a pretty obvious one, reasoning about mutability, access to pointers

A lot of the above should be a decent starting place for learning those tools. `let` versus `const` is maybe a remaining JS piece not explicitly dived into.
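As a quick sketch of that remaining piece: `const` fixes the binding (the reference), not the object behind it. `Object.freeze` is the standard shallow tool for the object itself. The variable names here are just for illustration.

```javascript
const point = { x: 1 };
point.x = 2; // allowed: only the binding is constant, the object is still mutable

let rebindError = null;
try {
  point = { x: 3 }; // rebinding a const throws at runtime
} catch (err) {
  rebindError = err;
}

// Object.freeze makes the object itself (shallowly) immutable.
const frozen = Object.freeze({ x: 1 });
try {
  frozen.x = 2; // silently ignored in sloppy mode, throws in strict mode/modules
} catch (err) {
  // strict mode lands here; either way frozen.x is unchanged
}

console.log(point.x, rebindError instanceof TypeError, frozen.x); // 2 true 1
```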

References are generally "pointer enough" for most work. The JS GC doesn't have a way to manually lock a reference to dereference it for pointer math today, but that doesn't mean it never will. Parts of WASM GC are applicable here, but mostly restricted to shared array buffers (blocks of bytes).

In other GC languages, C# has been exploring a space for GC-safe, pointer-like views over blocks of memory that support (range checked) pointer-like math, called Span<T> and Memory<T> (Span<T> itself is stack-only). They're roughly comparable to Rust's slices (`&[T]` and `&mut [T]`) rather than anything `Arc`-like, but subtly different as you would expect for existing in a larger GC environment. As that approach has become very successful in C#, I am starting to expect variations of it in more GC languages in the next few years.

> control over locking, access to atomics, access to mutexes

For the most part JS is single threaded, stack data is copied (value types), and reference-types get auto-locking for "free" from the GC. So locks aren't important for most JS work and there's not much to control.

If you start to share memory buffers from JS to a Service/Web Worker or to a WASM process you may need to do more manual locks. The big family of tools for that is the Atomics global object: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
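A minimal single-threaded sketch of what that looks like. In real use the `SharedArrayBuffer` would be handed to a worker with `postMessage` (and browsers require cross-origin isolation before they expose it at all). The slot layout and the `tryLock`/`unlock` helpers are assumptions for illustration, not a production lock.

```javascript
// A shared block of bytes viewed as two 32-bit integer cells.
const shared = new SharedArrayBuffer(2 * Int32Array.BYTES_PER_ELEMENT);
const cells = new Int32Array(shared);
const COUNTER = 0; // hypothetical layout: cell 0 is a counter
const LOCK = 1;    //                      cell 1 is a lock flag (0 = free, 1 = held)

// Atomic read-modify-write: safe even if another thread touches the same cell.
Atomics.add(cells, COUNTER, 5);
Atomics.add(cells, COUNTER, 1);

// A try-lock built on compare-and-swap: succeeds only if the flag was 0.
function tryLock() {
  return Atomics.compareExchange(cells, LOCK, 0, 1) === 0;
}

function unlock() {
  Atomics.store(cells, LOCK, 0);
  Atomics.notify(cells, LOCK, 1); // wake one waiter blocked in Atomics.wait, if any
}

const gotFirst = tryLock();  // lock was free
const gotSecond = tryLock(); // lock is already held
unlock();

console.log(Atomics.load(cells, COUNTER), gotFirst, gotSecond); // 6 true false
```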

But a lot of that is new and rare in JS today.

> the ability to `join` two futures

`Promise.all` and `Promise.any` are the two most common "standard library" combinators. `Promise.all` is the most like Rust `join`.
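A small sketch of both, using a hypothetical `delay` helper to stand in for real async work:

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

async function demo() {
  // Promise.all: wait for everything; results keep the input order
  // regardless of which promise finishes first (closest to Rust's join).
  const [a, b] = await Promise.all([delay(20, "a"), delay(10, "b")]);

  // Promise.any: the first promise to fulfill wins; rejections are skipped
  // unless every input rejects.
  const first = await Promise.any([
    Promise.reject(new Error("failed")),
    delay(10, "fallback"),
  ]);

  return { a, b, first };
}

const demoResult = demo();
demoResult.then((result) => console.log(result));
```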

There are also libraries with even higher-level combinators.

> manage their polling myself

Promises don't poll. JS lives in a browser-owned event loop. Superficially you are in a browser-provided "tokio"-like runtime at all times.

There are some "low-level" tricks you can pull, though, in that the Promise abstraction is especially thin compared to Rust Futures. The entire "trait" that async/await syntax abstracts is just the "thenable pattern" in JS. All you need to make a new non-Promise Promise-like is to create an object that supports `.then(callBack)` (optionally with a second parameter for a catchCallback, and/or a separate `.catch(callBack)`). Though the Promise constructor is also powerful enough that you generally don't need to make your own thenable; just implement your logic in the closure you provide to the Promise constructor.
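For illustration, a bare-bones thenable next to the usual Promise-constructor version. The object and function names are made up, and the thenable here is deliberately minimal (it never rejects).

```javascript
// Not a Promise: just an object with a `then` method.
const thenable = {
  then(onFulfilled, onRejected) {
    // Produce the value asynchronously, then hand it to the callback.
    setTimeout(() => onFulfilled(42), 0);
  },
};

async function useThenable() {
  // `await` calls thenable.then(resolve, reject) behind the scenes.
  const value = await thenable;
  return value;
}

// The usual route: put the same logic in the Promise constructor's closure.
const viaConstructor = new Promise((resolve, reject) => {
  setTimeout(() => resolve(42), 0);
});

const thenableResults = Promise.all([useThenable(), viaConstructor]);
thenableResults.then(([fromThenable, fromPromise]) => {
  console.log(fromThenable, fromPromise); // 42 42
});
```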

Similarly, on the flipside, if you need a more complex combinator than `Promise.all` (which is the reason some higher-level libraries exist), you just have to build the right callbacks to `.then()` and coordinate what you need to.
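As one sketch of that kind of hand-rolled coordination, here's a made-up `firstN` combinator that resolves once `count` of its inputs have fulfilled. It isn't a standard API, just `.then()` callbacks plus some bookkeeping.

```javascript
// Hypothetical combinator: resolves with the first `count` fulfilled values
// (in completion order); rejects once too many inputs have failed to get there.
function firstN(inputs, count) {
  return new Promise((resolve, reject) => {
    const values = [];
    const errors = [];
    if (count === 0) return resolve(values);
    if (count > inputs.length) {
      return reject(new RangeError("count exceeds number of inputs"));
    }
    for (const input of inputs) {
      Promise.resolve(input).then(
        (value) => {
          if (values.length >= count) return; // already settled, ignore stragglers
          values.push(value);
          if (values.length === count) resolve(values);
        },
        (error) => {
          errors.push(error);
          // Not enough inputs left to ever reach `count` fulfillments.
          if (inputs.length - errors.length < count) {
            reject(new AggregateError(errors, "too many rejections"));
          }
        },
      );
    }
  });
}

// Usage, with a hypothetical timer helper standing in for real work.
const wait = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

const firstTwo = firstN(
  [wait(30, "slow"), wait(10, "fast"), wait(20, "medium")],
  2,
);
firstTwo.then((values) => console.log(values)); // "fast", then "medium"
```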

It's generally recommended to stick with things like Promise.all, but low level tricks exist.

> I mean even if you say "I can do it", that's great, but how is it surprising?

I think what continues to surprise me is that it sometimes reads like a lack of curiosity for other languages and for the commonalities between languages. Any GC language is built on the same exact kind of building blocks as "lower level" languages. There is a learning curve involved in reasoning about a GC language, but I don't think it should seem like a steep one. The vocabulary has strong overlaps: value types and stack allocated; reference types and heap allocated; references and pointers. The intuitions of one often benefit the other ("this is a reference type, can I simplify what I need from it inside this loop to a value type or two to keep it stack allocated or would it make more sense to preallocate a pool of them?"). Just because you don't have access to the exact same kinds of low level tools doesn't mean that they don't exist or that you can't learn how to take what you would do with the low level tools and apply them in the higher level space. (Plus there is work like C#'s Span<T> and Memory<T>, where the low level tools themselves are also starting to blur together more than ever before.)

It just takes a little bit of curiosity, I think, to ask that next question of "how does a GC language stack allocate?" and to allow that to lead you to more of the vocabulary. Hopefully, I've done an okay job in this post illustrating that.
