Comment by staticassertion

14 hours ago

> I still think that's a training/familiarity problem more than a language issue?

But that's fine. Even if we say it's a familiarity problem, that's fine. I'm only saying that it's not reasonable to expect my skills in optimizing backend code to somehow transfer. Obviously many things are the same - reducing allocation, improving algorithmic performance, etc. But that looks very different when you go from the backend to the frontend because the languages can look very different.

> You can just as easily start with `rg \bnew\b` as you can `rg \.clone`.

That's not true though. In Rust you have to have a clone somewhere if you're allocating on the heap, or one of the pointer types like `new`. If I pass a struct around it's either cheaply moveable (ie: Copy) or I have to `clone` it. Granted, many APIs will clone "invisibly" within them, but I can always grep to find the clone.

In Javascript, things seem to allocate by default. A new object allocates. A closure allocates. Things are very implicit, you sort of are in an "allocates by default" mode with js, it seems. In Rust I can just do `[u8; n]` or whatever if I want to, I can just do `let x = "foo"` for a static string, or `let y = 5;` etc. I don't really have to question the memory layout much.

Regardless, you can just learn those rules, of course, but you have to learn them. It seems much easier to "trip onto" an allocation, so to speak, in js.

> Again, perhaps this is just a familiarity issue

I largely agree, though I think that js does a lot more allocation in its natural syntax.

> Do you treat all your backend library dependencies as black boxes as well?

No, but I don't really use frameworks in backend languages much. The heaviest dependency I use is almost always the HTTP library, which is reliably quite optimized. Frameworks impose patterns on how code is structured, which, to me, makes it much harder to reason about performance. I now have to learn the details of the framework. Perhaps the only thing close to this in Rust would be tokio.

> I've used profilers to guide building my own "frameworks" and help prove "Vanilla" approaches to other developers over frameworks in use.

I suspect that this is merely an issue of my own biased experience where I have inherited codebases with javascript that are already using frameworks.

> Maybe I'm missing what primitives you are looking for. async/await is about the same primitive in JS and Rust and there are very similar higher-level tools on top of them.

I mean, stack allocation feels like a pretty obvious one, reasoning about mutability, control over locking, the ability to `join` two futures or manage their polling myself, access to operating system threads, access to atomics, access to mutexes, access to pointers, etc. These just aren't available in javascript. async/await in js is only superficially similar to Rust.

I mean, a simple example is that I recently switched to CompactString and foldhash in Rust for a significant optimization. I used Arc to avoid expensive `.clone` calls. I preallocated vectors and reused them, I moved other work to threads, etc. I feel really comfy doing this in Rust where all of this is sort of just... first class? Like, it's not "weird" rust to do any of this. I don't have to really avoid much in the language, it's not like js where I'd have to be like "Okay, I can't write {a: 5} here because it would allocate" or something. I feel like that shouldn't be too contentious? Surely one must learn how to avoid much of javascript if they want to learn how to avoid allocations.

> To be fair, I get the impulse to want to leave it as someone else's problem.

I just reject that framing. People focus on what they focus on. Optimizing their website is not necessarily their interest.

> I feel like if you can profile and performance tune Rust you should be able to profile and performance tune JS.

I probably could but it's definitely not going to feel like second nature to me and I suspect I'd really feel like I'm fighting the language. I mean, seriously, I'd be curious, how do you deal with the fact that you can't stack allocate? I can spawn a thread in Rust and share a pointer back to the parent stack, that just seems very hard to do in javascript if not outright impossible?

> I think I'm saying that it is surprising to me that people who have developed skillsets for optimizing backend code in languages designed to be fast seem to struggle applying the same skills to a language with simpler/"slower" mechanics

Yeah I don't really see it tbh. I mean even if you say "I can do it", that's great, but how is it surprising?

> I probably could but it's definitely not going to feel like second nature to me and I suspect I'd really feel like I'm fighting the language. I mean, seriously, I'd be curious, how do you deal with the fact that you can't stack allocate? I can spawn a thread in Rust and share a pointer back to the parent stack, that just seems very hard to do in javascript if not outright impossible?

I had alluded to it before, but this is maybe where some additional experience with other garbage collected backend languages like C# or Java could help build some "muscle memory" here.

The typical lens in a GC-based language is value types versus reference types. Value types are generally stack allocated and pass-by-value (copy-by-value; copied from stack frame to stack frame when passed). Reference types are usually heap allocated and pass-by-reference. A reference is generally a "fat pointer", with the qualification that you generally can't dereference one like a pointer without complex GC locks because the GC reserves the right to move the objects pointed to by references (for instance, due to compaction, but also due to things like promotion to another heap). References themselves follow the same pass-by-value rules generally (stack allocated and copied).

(The lines are often blurry hence "generally" and "usually": a GC language may choose to allocate particularly large value types on the heap and apply copy-on-write semantics in a way to meet the pass-by-value semantics. A GC language is also free to stack allocate small reference types that it believes won't escape a particular part of the stack. I bring up these edge cases not to suggest complexity but to remind that profile-guided optimization is often the best strategy in any language because any good compiler, even a JIT compiler, is trying to optimize what it can.)

In JS, the breakdown is generally that your value types are string, number, boolean, and your reference types are object, array, and function. `const a = 12` is a static, stack allocated number. `const x = 'foo'` is a static, stack allocated string. It will get copied if you pass it anywhere. Though there's one more optimization here that most GC languages use (and that goes all the way back to early Lisp) called "string interning". Strings are always treated as immutable and essentially copy-on-write. Common strings and strings passed to a large number of stack frames get "interned" to shared memory (sometimes the heap; sometimes even just reusing the memory of their first compiled instance in the compiled binary). But because copy-on-write is so easy to trigger, and because those copies often start out stack allocated, strings are still considered value types, even though with "interning" they sometimes exhibit reference-like behavior and are sort of the "border type".

One thing to look out for: `+` or `+=` where one of the sides is a string can be a huge memory allocator due to copying string bytes alone, which should be easy to anticipate.
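A minimal sketch of the usual way to sidestep repeated string copies: collect the pieces and build the final string once. The function names and the sample data are hypothetical. Engines may also represent concatenations lazily under the hood, so whether this helps in a given hot path is something to confirm with a profiler rather than assume.

```javascript
// Builds the line piece by piece; each += may create a new intermediate string.
function concatInLoop(fields) {
  let line = '';
  for (let i = 0; i < fields.length; i++) {
    if (i > 0) line += ',';
    line += fields[i];
  }
  return line;
}

// Builds the final string in a single join call.
function joinOnce(fields) {
  return fields.join(',');
}

const fields = ['id', 'name', 'email'];
console.log(concatInLoop(fields) === joinOnce(fields)); // true
```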

On the reference type side `let x = {a: 5}; let y = x`, the `{a: 5}` part is an object and does allocate to the heap (probably, modulo again things like escape detection by the JIT compiler), but `x` and `y` themselves are stack allocated references. That `let y = x` is only a reference copy.
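As a rough illustration of the difference in semantics (this shows what the language guarantees, not where an engine actually places the memory), copying a number gives an independent value, while copying an object binding only copies the reference:

```javascript
// Value type: the number is copied, so the two bindings are independent.
let a = 12;
let b = a;
b += 1;

// Reference type: only the reference is copied; both names see one object.
const x = { a: 5 };
const y = x;
y.a = 6;

// An explicit shallow copy creates a second object (a new allocation).
const z = { ...x };
z.a = 7;

console.log(a, b);          // 12 13
console.log(x.a, y.a);      // 6 6
console.log(x === y);       // true
console.log(x === z, z.a);  // false 7
```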

> it's not like js where I'd have to be like "Okay, I can't write {a: 5} here because it would allocate" or something. I feel like that shouldn't be too contentious? Surely one must learn how to avoid much of javascript if they want to learn how to avoid allocations.

Generally, it's not about "avoiding" the easy language constructions because they allocate, it is balancing the trade-offs of when you want to allocate and how much.

Just like you might preallocate a vector before a tight loop, you might preallocate an array or an object, or even an object pool. (Build an array of objects, with a "free" counter, borrow them, mutate them, return them to the "free" section when done.)
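A minimal sketch of the object pool described above, assuming a fixed size and plain `{x, y}` objects. `PointPool`, `borrow`, and `release` are hypothetical names, not a standard API, and the `indexOf` lookup in `release` is linear, which a real pool might want to avoid.

```javascript
class PointPool {
  constructor(size) {
    // Allocate every object once, up front.
    this.items = Array.from({ length: size }, () => ({ x: 0, y: 0 }));
    // Items at indexes [0, used) are borrowed; [used, size) are free.
    this.used = 0;
  }

  borrow() {
    if (this.used === this.items.length) {
      throw new Error('pool exhausted');
    }
    return this.items[this.used++];
  }

  release(item) {
    const index = this.items.indexOf(item);
    if (index === -1 || index >= this.used) {
      throw new Error('item is not currently borrowed');
    }
    // Swap the released item with the last borrowed one,
    // then shrink the borrowed range by one.
    const last = this.used - 1;
    this.items[index] = this.items[last];
    this.items[last] = item;
    this.used = last;
  }
}

const pool = new PointPool(2);
const p = pool.borrow();
p.x = 3;
p.y = 4;
pool.release(p);
const q = pool.borrow();
console.log(p === q); // true: the same object was reused, not reallocated
```

The caller is responsible for resetting fields on borrowed objects, which is part of the readability trade-off.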

But some of that is trade-offs: preallocation is sometimes harder to read/reason with. On the other side, the "over-allocation" you are worried about might be caught entirely by the JIT's escape analysis and compiled out. For almost all languages it is best to let a profile or real data guide what to try to optimize (premature optimization is rarely a good idea), but especially for a GC language it can be crucial. Not because the GC language is more complicated or "magic" or "mysterious", but simply because a GC language is tuned for a lot of auto-optimizations that a manually managed memory language doesn't necessarily get "for free". The trade-off for references being much more opaque boxes than pointers is that a JIT compiler has more optimization options because it can just assume pointer math is off the table. It's between the JIT and the GC where an allocation lives, more times than not, and there are some simple optimization answers such as "the JIT stack allocated that because it doesn't escape this method". It shouldn't feel like a surprise when such things happen, when you get such benefits "for free". The JIT and GC are still maintaining the value-type or reference-type "semantics" at all times, those are just (intentionally) big easy "traits" with a lot of useful middle ground and a lot of cross-implementation.

> stack allocation feels like a pretty obvious one, reasoning about mutability, access to pointers

A lot of the above should be a decent starting place for learning those tools. `let` versus `const` is maybe the one remaining JS piece not explicitly covered.
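For the `let` versus `const` piece, a small sketch of what `const` does and does not promise. The variable names are made up, and `Object.freeze` is shown only as one (shallow) way to make the object itself immutable.

```javascript
const point = { x: 1 };

// `const` only fixes the binding; the object it refers to stays mutable.
point.x = 2;

let rebindFailed = false;
try {
  // Rebinding a const throws a TypeError at runtime.
  point = { x: 3 };
} catch (err) {
  rebindFailed = err instanceof TypeError;
}

// Object.freeze makes the object itself (shallowly) immutable.
const frozen = Object.freeze({ x: 1 });
try {
  frozen.x = 2; // silently ignored in sloppy mode, TypeError in strict mode
} catch (err) {
  // Strict mode code (including ES modules) lands here.
}

console.log(point.x, rebindFailed, frozen.x); // 2 true 1
```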

References are generally "pointer enough" for most work. The JS GC doesn't have a way to manually lock a reference to dereference it for pointer math today, but that doesn't mean it never will. Parts of WASM GC are applicable here, but mostly restricted to shared array buffers (blocks of bytes).

In other GC languages, C# has been exploring a space for GC-safe stack allocated pointers to blocks of memory that support (range checked) pointer-like math called Span<T> and Memory<T>. It's roughly equivalent to Rust's slice-like mechanics (`&[T]`, `&mut [T]`), but subtly different as you would expect for existing in a larger GC environment. As that approach has become very successful in C# I am starting to expect variations of it in more GC languages in the next few years.

> control over locking, access to atomics, access to mutexes

For the most part JS is single threaded, stack data is copied (value types), and reference-types get auto-locking for "free" from the GC. So locks aren't important for most JS work and there's not much to control.

If you start to share memory buffers from JS to a Service/Web Worker or to a WASM process you may need to do more manual locks. The big family of tools for that is the Atomics global object: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
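A single-threaded sketch of the `Atomics` primitives, just to show their shape. In real use the `SharedArrayBuffer` would be posted to a worker, and in browsers it typically requires a cross-origin isolated page; the variable names here are hypothetical.

```javascript
// A SharedArrayBuffer is the kind of memory that can be shared with a worker.
const shared = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const cells = new Int32Array(shared);

// Atomic read-modify-write; returns the previous value.
const before = Atomics.add(cells, 0, 5);

// Compare-and-swap: only stores 1 if the cell currently holds 0.
// A simple lock can be built on this primitive.
const previous = Atomics.compareExchange(cells, 1, 0, 1);
const secondTry = Atomics.compareExchange(cells, 1, 0, 1);

console.log(before, Atomics.load(cells, 0)); // 0 5
console.log(previous, secondTry);            // 0 1
```

The second `compareExchange` returns 1 without storing anything, which is how a second party would learn that the "lock" is already held.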

But a lot of that is new and rare in JS today.

> the ability to `join` two futures

`Promise.all` and `Promise.any` are the two most common "standard library" combinators. `Promise.all` is the most like Rust `join`.
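A short sketch of both combinators. The `delay` helper is a hypothetical stand-in for real asynchronous work such as a network request.

```javascript
// Hypothetical helper: resolves with `value` after roughly `ms` milliseconds.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

// Promise.all: wait for every promise; results keep the input order.
const joined = Promise.all([delay(20, 'a'), delay(10, 'b')]);

// Promise.any: resolve with the first promise to fulfill.
const first = Promise.any([delay(20, 'slow'), delay(10, 'fast')]);

joined.then((values) => console.log(values)); // [ 'a', 'b' ]
first.then((value) => console.log(value));    // fast
```

One difference worth knowing: `Promise.all` rejects as soon as any input rejects, while `Promise.allSettled` waits for every input regardless of outcome.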

There are also libraries with even higher-level combinators.

> manage their polling myself

Promises don't poll. JS lives in a browser-owned event loop. Superficially you are in a browser-provided "tokio"-like runtime at all times.

There are some "low-level" tricks you can pull, though, in that the Promise abstraction is especially thin compared to Rust Futures. The entire "trait" that async/await syntax abstracts is just the "thenable pattern" in JS. All you need to make a new non-Promise Promise-like is to create an object that supports `.then(callBack)` (optionally with a second parameter for a catchCallback and/or a `.catch(callBack)`). Though the Promise constructor is also powerful enough that you generally don't need to make your own thenable, just implement your logic in the closure you provide to the Promise constructor.
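A minimal sketch of both routes: a hand-written thenable and the equivalent via the Promise constructor. The object and function names are made up for illustration.

```javascript
// A non-Promise object: anything with a then(onFulfilled, onRejected)
// method can be awaited.
const thenable = {
  then(onFulfilled, onRejected) {
    setTimeout(() => onFulfilled(42), 0);
  },
};

// The Promise constructor covers the same ground with less ceremony.
const viaConstructor = new Promise((resolve) => {
  setTimeout(() => resolve(42), 0);
});

async function main() {
  const a = await thenable;
  const b = await viaConstructor;
  return [a, b];
}

const result = main();
result.then((values) => console.log(values)); // [ 42, 42 ]
```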

Similarly, on the flip side, if you need a more complex combinator than Promise.all (which is the reason some higher-level libraries exist), you just have to build the right callbacks to `.then()` and coordinate what you need to.
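As a sketch of what such a hand-built combinator might look like, here is a hypothetical `firstN` that resolves once `n` of its inputs have fulfilled, in completion order. It is not part of any standard library, and it assumes `promises` is an array.

```javascript
function firstN(promises, n) {
  return new Promise((resolve, reject) => {
    if (n <= 0) {
      resolve([]);
      return;
    }
    if (promises.length < n) {
      reject(new RangeError('not enough inputs'));
      return;
    }
    const results = [];
    let failures = 0;
    for (const p of promises) {
      // Promise.resolve lets plain values and thenables be passed in too.
      Promise.resolve(p).then(
        (value) => {
          results.push(value);
          if (results.length === n) resolve(results.slice());
        },
        (error) => {
          failures += 1;
          // Reject once n successes are no longer possible.
          if (promises.length - failures < n) reject(error);
        }
      );
    }
  });
}

// Hypothetical helper standing in for real async work.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

const quickest = firstN([delay(30, 'c'), delay(10, 'a'), delay(20, 'b')], 2);
quickest.then((values) => console.log(values)); // [ 'a', 'b' ]
```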

It's generally recommended to stick with things like Promise.all, but low level tricks exist.

> I mean even if you say "I can do it", that's great, but how is it surprising?

I think what continues to surprise me is that it sometimes reads like a lack of curiosity for other languages and for the commonalities between languages. Any GC language is built on the same exact kind of building blocks as "lower level" languages. There is a learning curve involved in reasoning about a GC language, but I don't think it should seem like a steep one. The vocabulary has strong overlaps: value types and stack allocated; reference types and heap allocated; references and pointers. The intuitions of one often benefit the other ("this is a reference type, can I simplify what I need from it inside this loop to a value type or two to keep it stack allocated or would it make more sense to preallocate a pool of them?"). Just because you don't have access to the exact same kinds of low level tools doesn't mean that equivalents don't exist or that you can't learn how to take what you would do with the low level tools and apply it in the higher level space. (Plus, with work like C#'s Span<T> and Memory<T>, the low level and high level tools themselves are starting to blur together more than ever before.)

It just takes a little bit of curiosity, I think, to ask that next question of "how does a GC language stack allocate?" and to allow that to lead you to more of the vocabulary. Hopefully, I've done an okay job in this post illustrating that.