Comment by furyofantares

4 months ago

FWIW, which may not be much: I had Codex CLI try to verify the results. On my M2 MacBook Air, only the first example (False Sharing) did anything: a 23x speedup, compared to the article's 6x. None of the others produced any speedup at all.

Of course I didn't verify the results I got either - I'm not about to spend hours trying to figure out if this is just slop. But I think it is.


tapirl 4 months ago

Could you share the benchmark source code of the first example?

furyofantares 4 months ago
Here's the one that showed a lot more speedup than the article:
https://pastebin.com/v9tczpus
Looks like the LLM invented a somewhat different test for it than the article had. I tried again and got this one, which uses the same data structure as the article:
https://pastebin.com/SDdcchZG
That gave similar results to the article.
All the other tests still give little-to-no speedup on my machine.
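
For anyone who can't open the pastebins, here is a minimal sketch of what a false-sharing test of this kind generally looks like. It is not the code from either link. It assumes Go (the thread doesn't say which language the article uses), and every name in it (`packed`, `padded`, `hammer`, `cacheLine`) is hypothetical. The 128-byte line size is also an assumption, picked because Apple silicon uses 128-byte cache lines while most x86-64 parts use 64.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// cacheLine is an ASSUMED cache-line size in bytes (see the note above).
const cacheLine = 128

// packed keeps both counters side by side, so they will most likely
// land on the same cache line.
type packed struct {
	a int64
	b int64
}

// padded puts cacheLine bytes between the start of a and the start of b,
// so the two counters can never share a line of that size.
type padded struct {
	a int64
	_ [cacheLine - 8]byte
	b int64
}

// hammer starts one goroutine per counter, has each do n atomic
// increments on its own counter, and returns the elapsed wall time.
func hammer(a, b *int64, n int) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for _, p := range []*int64{a, b} {
		wg.Add(1)
		go func(p *int64) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				atomic.AddInt64(p, 1)
			}
		}(p)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	const n = 5_000_000
	var p packed
	var q padded
	fmt.Println("adjacent counters:", hammer(&p.a, &p.b, n))
	fmt.Println("padded counters:  ", hammer(&q.a, &q.b, n))
}
```

The ratio between the two timings depends heavily on the machine, the scheduler, and how the loop is written, which may be part of why the numbers in this thread differ so much from the article's. A one-shot timing like this is only a rough check; a proper run would use the `testing` package's benchmark support and repeat the measurement.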
- tapirl 4 months ago
  
  Many thanks for providing the source. It also works on my machine.
  TIL.
  