Comment by furyofantares
2 days ago
Here's the one that showed a lot more speedup than the article:
Looks like the LLM invented a somewhat different test for it than the article had. I tried again and got this, using the same data structure as the article:
That gave similar results to the article.
All the other tests still give little-to-no speedup on my machine.
Many thanks for providing the source. It also works on my machine.
TIL.
I tried the others on my x86 machine and they all do something for me - not nearly as much as the article, but something.
The "_ [0]byte" trick has no basis as far as I know. For the author's specific example, a [1024]float64 will always be allocated on a whole page, i.e., it is always 64-byte aligned.
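One way to check the alignment claim on a given machine is to print the addresses of a few heap-allocated values. This is only a rough sketch: the `block` type, the `sink` variable, and the 64-byte cache-line size are assumptions for illustration, not taken from the article, and the result may differ between Go versions and platforms.

```go
package main

import (
	"fmt"
	"unsafe"
)

// block is a hypothetical stand-in for the article's [1024]float64 example.
type block struct {
	data [1024]float64
}

// sink keeps the allocations reachable so they escape to the heap.
var sink []*block

// isAligned reports whether addr is a multiple of n.
// n must be a power of two.
func isAligned(addr, n uintptr) bool {
	return addr&(n-1) == 0
}

func main() {
	const cacheLine = 64 // assumed cache-line size, typical on x86-64

	for i := 0; i < 4; i++ {
		b := new(block)
		sink = append(sink, b)
		addr := uintptr(unsafe.Pointer(b))
		fmt.Printf("%#x aligned to %d bytes: %v\n", addr, cacheLine, isAligned(addr, cacheLine))
	}
}
```

If every line reports true, the extra padding field is not buying any alignment for this particular type; smaller types may behave differently.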
For "Array of Structs vs Struct of Arrays", using slices as fields is a good idea. If the purpose is to have each field allocated in its own memory block, just use pointers instead.