
Comment by jayd16

5 hours ago

Even more so, it shows that an SoA data structure means you can add fields to your 1M monsters with little impact.

This is valid for sequential scanning of the data. The CPU fills whole cache lines at once from the arrays that actually get used, and the algorithm touches every element of those arrays.
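To put rough numbers on the sequential case, here is a back-of-the-envelope sketch. It assumes 64-byte cache lines, 4-byte fields, tightly packed arrays, and a scan that reads a single field from every monster; the function names and sizes are made up for illustration, and it ignores prefetching and alignment details.

```python
import math

CACHE_LINE = 64  # bytes; an assumed, typical line size


def lines_scanned_aos(n, struct_size):
    """Approximate cache lines fetched to read ONE field from each of n structs."""
    # Structs smaller than a line are packed several per line, so the whole
    # array gets pulled in. Larger structs cost one line per struct.
    return math.ceil(n * min(struct_size, CACHE_LINE) / CACHE_LINE)


def lines_scanned_soa(n, field_size):
    """Cache lines fetched to read one field array of n elements."""
    return math.ceil(n * field_size / CACHE_LINE)


N = 1_000_000
FIELD = 4  # bytes per field (hypothetical)

soa_lines = lines_scanned_soa(N, FIELD)           # independent of field count
aos_4_fields = lines_scanned_aos(N, 4 * FIELD)    # 16-byte struct
aos_20_fields = lines_scanned_aos(N, 20 * FIELD)  # 80-byte struct

print(soa_lines, aos_4_fields, aos_20_fields)
```

Under these assumptions the SoA scan stays at the same number of lines no matter how many fields the monster has, while the AoS scan grows with the struct until it hits one line per monster.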

Now think about random access to single struct instances instead: the CPU loads a cache line's worth of data for each field and uses only one element out of the whole cache line. This is much worse than a compact structure representation of the same data.
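The random-access case can be sketched the same way. This assumes 64-byte cache lines, a hypothetical monster with 20 fields of 4 bytes each, a packed AoS array starting on a line boundary, and that each SoA field array lives in its own region of memory; none of these numbers come from the article.

```python
CACHE_LINE = 64  # bytes; assumed


def lines_touched_aos(index, struct_size):
    """Cache lines covered by one whole struct in a packed array."""
    start = index * struct_size
    end = start + struct_size - 1
    return end // CACHE_LINE - start // CACHE_LINE + 1


def lines_touched_soa(num_fields):
    """One line per field array, since every field sits in a different array."""
    return num_fields


NUM_FIELDS = 20
FIELD = 4  # bytes per field (hypothetical)
STRUCT = NUM_FIELDS * FIELD  # 80 bytes

aos_lines = lines_touched_aos(0, STRUCT)
soa_lines = lines_touched_soa(NUM_FIELDS)

# Fraction of the fetched bytes that the access actually uses.
aos_useful = STRUCT / (aos_lines * CACHE_LINE)
soa_useful = STRUCT / (soa_lines * CACHE_LINE)

print(aos_lines, soa_lines, aos_useful, soa_useful)
```

With these made-up sizes, reading one full monster costs 2 lines in AoS and 20 in SoA, and most of each fetched SoA line goes unused.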

SoA is not universally better.

  • This sounds similar to relational databases vs document-oriented databases, at least when I briefly looked into databases like MongoDB when such things were all the rage 15-20 years ago.

    For the internal web site that customer support people used, a document-oriented database would have been great, because that site wants to load everything about one customer and pretty much doesn't need anything else until the user is done supporting that customer.

    For the dozens of periodic reports that needed to be generated, relational was way better. A given report generally only wanted a small amount of per-customer data but wanted that for all customers.

    A little bit of searching and LLM querying suggests that nowadays there are databases that are good at both kinds of tasks, in particular Postgres with JSONB, at least at the scale we were looking at (maybe 30k or so customers), but maybe really big operations would need more specialized software.

  • No, it's not always better, and I didn't mean to imply it was. I was simply saying that the article argues against its title.

    In both cases you want to think about locality of the next read and structure the data accordingly.

> you can add fields to your 1M monsters with little impact.

Great for this access pattern, but I wouldn't make a general statement like that. This is the same thing as row-oriented vs column-oriented databases, OLTP vs OLAP. SoA is weak if you are adding/removing monsters more often than accessing a single "hot" field.

  • > SoA is weak if you are adding/removing monsters more often than accessing a single "hot" field.

    Why is that? Genuinely curious. Does "weak" mean that it performs worse than AoS, or that the gains aren't as significant versus AoS?

    • It's because removing a monster with 20 fields from an SoA structure means resizing 20 arrays. Removing the same monster from an AoS array involves resizing a single array, which you're going to process in a very cache friendly way.
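A minimal sketch of the removal cost described in this subthread. It uses swap-remove (overwrite the removed slot with the last element, then shrink) as one assumed removal strategy, which does not preserve order; the `MonstersSoA` class and field names are hypothetical. Python lists only model the structure here, not real cache behaviour.

```python
class MonstersSoA:
    """Struct-of-arrays container: one list per field (hypothetical)."""

    def __init__(self, field_names):
        self.columns = {name: [] for name in field_names}

    def add(self, **values):
        # Adding a monster appends to every field array.
        for name, column in self.columns.items():
            column.append(values[name])

    def swap_remove(self, i):
        """Remove monster i; return how many arrays had to be modified."""
        touched = 0
        for column in self.columns.values():
            column[i] = column[-1]  # move the last element into the hole
            column.pop()            # shrink this field array by one
            touched += 1
        return touched


def swap_remove_aos(monsters, i):
    """Same removal on an array of structs: a single array is modified."""
    monsters[i] = monsters[-1]
    monsters.pop()
    return 1


soa = MonstersSoA(["hp", "x", "y"])
soa.add(hp=10, x=0, y=0)
soa.add(hp=20, x=1, y=1)
soa.add(hp=30, x=2, y=2)
soa_touched = soa.swap_remove(0)

aos = [{"hp": 10}, {"hp": 20}, {"hp": 30}]
aos_touched = swap_remove_aos(aos, 0)

print(soa_touched, aos_touched)
```

The number of arrays modified per removal equals the field count in the SoA version, so a 20-field monster means 20 separate writes and shrinks, versus one for AoS.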


Yes. I think one of the big advantages of SoA is that you only pay for the fields you're currently using. If you need a field somewhere, you can add it and only pay the cost of iterating it where you need it.