Comment by kaoD

17 days ago

> Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order.

There's actually a guarantee that things are mostly going to be accessed in a linear order because player actions don't matter to the execution of the simulation. The whole simulation is run at 1/FPS intervals across the whole set of entities, regardless of player input (or lack thereof).

In an ECS the whole World is run by Systems, which operate on Components. This is why cache locality works there: when the Movement System is acting, it's operating on the Position Component for all (or at least many) Entities, so linear array access pattern is very favorable. Any other component in your cache is going to be unused until the next system runs (and then the Position Component will become the useless data in cache). That's why you'd rather have an array of Components in cache instead of an array of Entities.

This access pattern is very suitable for games because the simulation is running continuously in an infinite loop (the game loop) consisting of even more loops (the Systems running), but not so much for general purpose computation where access patterns are mostly random. (EDIT: or rather, local to each "entity".)

It is very true that a general purpose computation can theoretically do anything and mess linearity of access patterns entirely up. But in practice programs do most of their work in very linear fashion. It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other. So in a sense we can say that the JavaScript program itself becomes the System with a capital S.

That is not to say that Nova's heap vectors will necessarily make sense: The two big possible stumbling blocks are 1) growing of heap vectors possibly taking too long, and 2) compacting of heap vectors during GC taking too long.

The first point basically comes down to the fact that, at present, each heap vector is truly a single Rust Vec. When it can no longer fit all the heap data into it, it needs to reallocate. Imagine you have 2 billion ordinary objects, and suddenly the ordinary objects vector needs to reallocate: This will cause horrible stalls in the VM. This can be mitigated at the cost of splitting each heap vector into chunks, but this of course comes at the cost of an extra indirection and some lack of linearity in the memory layout.

The second point is more or less a repeat of the first: Imagine you have 2 billion ordinary objects, and suddenly a single one at the beginning of the vector is removed by GC: The GC has to now move every object remaining in the vector down a step to make the vector dense again. This is something that I cannot really do anything about: I can make this less frequent by introducing a "minor GC" but eventually a "major GC" must happen and something like this can then be experienced. I can only hope that this sort of things are rare.

The alternative would be to do a "swap to tail", so the last item in the vector is moved to take the removed item's place. But that then means that linear access is no longer guaranteed. It also plays havoc on how our GC is implemented but that's kind of a side point.

Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.

  • > It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other.

    Yes, but note this is still a different pattern of access (array of "entities"). V8 does this because it assumes that e.g. `foo.name` is very likely going to be accessed along with `foo.lastName` (which is likely the 99% case for general computing) whereas ECS assumes `foo.name` is very likely going to be accessed along with `foo2.name`, `foo3.name`, ..., `fooN.name` (which is the 99% case for videogame timestep loops).

    > Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense.

    To clarify: my comment is not a criticism of Nova's design decisions. I was only trying to clarify the answer to "Why would a game benefit from ECS?" for those foreign to ECS's existential motive.

    I'm sure Nova's tradeoffs make sense for some workloads and I wish you the best!

    • Thank you very much for your well-wishes <3

      > Yes, but note this is still a different pattern of access (array of "entities").

      I was referring to the `[foo, foo2, foo3]` objects themselves; V8 does use an "cache local" placement for those so you'll find them laid out in memory as:

      > [foo_proto, foo_elems, foo_props, foo_name, foo_lastName, foo2_proto, foo2_elems, foo2_props, foo2_name, foo2_lastName, ...]

      For what it's worth, I am interested in laying object properties out in an ECS like manner in Nova, so the properties would be laid out as `[foo.name, foo2.name, foo3.name, ...]`, but currently the properties are laid out similarly to V8, `[foo.name, foo.lastName]`. The only difference is that we do not have "in object properties".

      That being said: I am obviously biased, but I do wonder if an ECS-like layout wouldn't be nearly universally beneficial. Thinking of the `foo.name` and `foo.lastName` access: If those are on the same cache line then accessing the two only reads one cache line. This is nice. But if there are more properties in the objects (and there often are), then those will pollute the cache. If you do this access once, it doesn't matter. If you do this a million times, now the cache pollution becomes a real issue: In Node.js even the optimal case would be that you read read 625,000 cache lines worth of data, only to discard 250,000 cache lines of it.

      If instead we use an ECS-like layout, then accessing these two properties reads two 10100cache lines: That's bad, but on the other hand if this happens once then it won't even make a blip on the screen. If a million of these accesses are done, you could think that we'd suddenly be slow as molasses but now the ECS-like layout is probably going to help you: You're more likely reading the next `name` and `lastName` property values on each access. If you have it bad and only half of the property data you read is actually the `name` and `lastName` properties you want, then you read 750,000 cache lines and lose out to the traditional engine by 100,000 cache lines. If you get 67% "hit rate" then you break even. And that's comparing to the case where the objects only contain `name` and `lastName` and nothing more.

      It of course all comes down to statistics but... I'm very interested in the potential benefits here :)

      Again, thank you for your comments, I've enjoyed discussing and pondering this <3

      4 replies →

  • Virtual alloc your vectors so you can add more backing memory without having to modify the addresses of existing items. Compaction can reap only empty pages but you’ll still need some moving to avoid over fragmentation.

    • Yeah, virtual alloc for the Vec backing memory is something I hope to do _one day_. It's not a very pressing concern however, as it requires going much lower in the stack.

  • > The GC has to now move every object remaining in the vector down a step to make the vector dense again

    Is there something which forces you to compact everything here? Or could you do what most GCs do and track that free entry in a free list?

    • I would have to keep a free list per vector, and there's a lot of those, and it would defeat the point of keeping data packed and temporally colocated: I want to ensure that data was created together stays together and only ever gets more compact as it grows older.

      Filling in empty slots would mean that likely unrelated data comes and pollutes the cache for the old data :(

      2 replies →