Comment by aapoalas
16 days ago
Hey, thank you for the comment! I'll try to answer as best I can.
A pointer is 64 bits, but it carries much less useful payload than that. A JavaScript engine only rarely deals with more than 4 GiB of memory, so a 32-bit integer would be enough to address all the memory needed. If you turn that into indexes, though, a 32-bit index can refer to about 4 billion separate items, and most programs never have that many distinct heap items alive at the same time. Note that the index no longer corresponds to a byte address, so we're no longer bound by the 4 GiB limit.
We actually do keep the 64-bit Value, though! We just use all that extra space to store a lot of data directly on the stack, avoiding heap allocations altogether.
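To make that concrete, here is a rough sketch of what such a 64-bit value could look like. This is not Nova's actual definition: the type names, the variants, and the 6-byte inline string limit are all assumptions made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical 32-bit handle into the objects vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ObjectIndex(u32);

/// Hypothetical 32-bit handle into the strings vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct StringIndex(u32);

/// Hypothetical value type: a tag plus at most seven bytes of payload.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Undefined,
    Boolean(bool),
    /// Small integers are stored inline, with no heap allocation.
    Integer(i32),
    /// Short strings are stored inline as well (limit chosen for this sketch).
    SmallString { len: u8, bytes: [u8; 6] },
    /// Everything else is a typed 32-bit index into a heap vector.
    Object(ObjectIndex),
    String(StringIndex),
}

/// Builds an inline string value, or returns None if the string is too long
/// and would have to go into the strings vector instead.
fn small_string(s: &str) -> Option<Value> {
    let src = s.as_bytes();
    if src.len() > 6 {
        return None;
    }
    let mut bytes = [0u8; 6];
    bytes[..src.len()].copy_from_slice(src);
    Some(Value::SmallString { len: src.len() as u8, bytes })
}

/// True if the whole value fits in the space of one 64-bit pointer.
fn value_fits_in_64_bits() -> bool {
    size_of::<Value>() <= 8
}
```

The point of the sketch is only that a 32-bit index leaves room next to the tag, and that room can hold small integers or short strings without touching the heap.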
> That just sounds like a pointer.
A pointer points to one place and one place only. An index can point to as many places as there are "parallel vectors" associated with it. Think of a table: a row index refers to as many cells as there are columns, whereas a cell pointer only identifies one cell.
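The table analogy can be sketched like this. The column names are hypothetical and chosen only for the example; the idea is that one index selects a "row" across several vectors.

```rust
/// Hypothetical index shared by several parallel vectors.
#[derive(Clone, Copy)]
struct ObjectIndex(u32);

/// Each field is one "column"; the same index picks out a row across all of them.
struct ObjectHeap {
    prototypes: Vec<Option<ObjectIndex>>,
    property_counts: Vec<u32>,
    extensible: Vec<bool>,
}

impl ObjectHeap {
    fn new() -> Self {
        ObjectHeap {
            prototypes: Vec::new(),
            property_counts: Vec::new(),
            extensible: Vec::new(),
        }
    }

    /// Appends one row to every column and returns the index of that row.
    fn push(
        &mut self,
        prototype: Option<ObjectIndex>,
        property_count: u32,
        extensible: bool,
    ) -> ObjectIndex {
        let index = ObjectIndex(self.prototypes.len() as u32);
        self.prototypes.push(prototype);
        self.property_counts.push(property_count);
        self.extensible.push(extensible);
        index
    }

    /// Reads only the property-count column; the other columns are not touched.
    fn property_count(&self, index: ObjectIndex) -> u32 {
        self.property_counts[index.0 as usize]
    }

    /// Reads only the extensibility column.
    fn is_extensible(&self, index: ObjectIndex) -> bool {
        self.extensible[index.0 as usize]
    }
}
```

A pointer to a single cell could not be reused to look up the other columns; the index can.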
> The last case also seems like a security hole, not protection.
Usually JS engines don't consider the JS-accessible contents of the JS heap itself part of the threat model: any object in the heap is liable to be leaked by the JS code running in the engine anyway. For example, V8's object placement is fairly static and easy to exploit. The important thing for safety is to avoid type confusion, which can be used to create read/write primitives to punch out of the sandbox. So an attacker can freely read through the heap data by creating heap indexes out of thin air, but they cannot use that to reinterpret one type of data as another type and then feed that back to the engine to cause it to misbehave.
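A minimal sketch of why a forged index does not give you type confusion, assuming one vector per type and bounds-checked lookups. The names and the `Option`-returning API are assumptions for illustration, not Nova's real interface.

```rust
/// Hypothetical typed indexes: each one can only be used with its own vector.
#[derive(Clone, Copy)]
struct StringIndex(u32);

#[derive(Clone, Copy)]
struct NumberIndex(u32);

struct Heap {
    strings: Vec<String>,
    numbers: Vec<f64>,
}

impl Heap {
    /// A forged StringIndex can only ever yield a string (or nothing at all).
    fn string(&self, index: StringIndex) -> Option<&str> {
        self.strings.get(index.0 as usize).map(String::as_str)
    }

    /// Likewise, a NumberIndex can only ever yield a number.
    fn number(&self, index: NumberIndex) -> Option<f64> {
        self.numbers.get(index.0 as usize).copied()
    }
}
```

An index made up out of thin air, say `StringIndex(0)`, may well read someone else's string, which is the leak the threat model accepts. What it cannot do is make the engine treat the bytes of a number as a string, because the index type decides which vector is consulted.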
> But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Yes, this would be split into the ordinary object vector and the object property vector. If the keys were longer they'd end up in the strings vector, and if the values were heap-allocated doubles they'd end up in yet another vector. Looking at it one object at a time, it is split here and there.
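Here is a rough sketch of how {a: 1, b: 2} could land in those vectors. The record layout (a start offset plus a count into a shared property vector) and all names are assumptions for the example, not Nova's actual layout.

```rust
/// Hypothetical property key: short keys inline, long keys in the strings vector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Key {
    Small([u8; 7]),
    HeapString(u32),
}

impl Key {
    /// Builds an inline key; panics if `s` is longer than 7 bytes (sketch only).
    fn small(s: &str) -> Key {
        let mut bytes = [0u8; 7];
        bytes[..s.len()].copy_from_slice(s.as_bytes());
        Key::Small(bytes)
    }
}

/// Hypothetical property value: small integers inline, doubles in the numbers vector.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Integer(i32),
    HeapNumber(u32),
}

/// One entry in the object vector: a slice of the shared property vector.
struct ObjectRecord {
    first_property: u32,
    property_count: u32,
}

#[derive(Default)]
struct Heap {
    objects: Vec<ObjectRecord>,
    properties: Vec<(Key, Value)>,
    strings: Vec<String>,
    numbers: Vec<f64>,
}

impl Heap {
    /// Appends the properties contiguously and records where they start.
    fn create_object(&mut self, props: &[(Key, Value)]) -> u32 {
        let index = self.objects.len() as u32;
        self.objects.push(ObjectRecord {
            first_property: self.properties.len() as u32,
            property_count: props.len() as u32,
        });
        self.properties.extend_from_slice(props);
        index
    }

    /// Linear scan over this object's slice of the property vector.
    fn get(&self, object: u32, key: Key) -> Option<Value> {
        let record = &self.objects[object as usize];
        let start = record.first_property as usize;
        let end = start + record.property_count as usize;
        self.properties[start..end]
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| *v)
    }
}
```

With short keys and small integer values, creating the object touches only the object vector and the property vector; the strings and numbers vectors stay empty.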
That being said, this doesn't really change much from how traditional engines do it: strings are not going to be near the objects that use them as keys, nor are heap numbers, and (added) properties also go into a separate backing store which is likely not next to the object. Worst of all, even if all of these were next to the object, they'd span multiple cache lines and wouldn't really benefit from being close to each other, as they're pointer chased and thus wouldn't get much of a guarantee of prefetching.
When you look at multiple objects, however, then you'll see that Nova's object data is still found in those 4 vectors, whereas the traditional engine design... It may have tried its best to keep the data together but it's probably still spread out here and there. And you're loading all sorts of unnecessary stuff, like the elements pointer (for indexed properties) and any other inline properties, together with the properties that you actually wanted to read.
Sorry, this ended up a bit disjointed. Let me know if you have more questions! Thanks.