Comment by highfrequency

8 months ago

> CPU, Apple M3 Max, 1 thread: 3.5 minutes

> CPU, Apple M3 Max, 16 threads: 10.26 seconds

Surprised to see a more than linear speedup in CPU threads. What’s going on here?

I believe the single-core run was slower because memory was filling up. The benchmark was adding 2^30 numbers, but 32-bit HVM2 has a limit of 2^29 nodes. I've re-run it with 2^28 numbers instead, and the results are `33.39 seconds` (1 core) vs `2.94 seconds` (16 cores). You can replicate the benchmark on an Apple M3 Max. I apologize for the mistake.
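
For anyone who wants to check the arithmetic, here is a rough sketch of the speedup and per-thread efficiency implied by the two sets of timings quoted in this thread. The helper names `speedup` and `efficiency` are made up for illustration and are not part of HVM2, and "3.5 minutes" is taken as exactly 210 seconds, which is an approximation.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-thread time to multi-thread time."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, threads: int) -> float:
    """Speedup divided by thread count; above 1.0 means superlinear."""
    return speedup(t_single, t_parallel) / threads


THREADS = 16

# Timings quoted in the thread, in seconds.
# Assumption: "3.5 minutes" is treated as exactly 210 s.
runs = {
    "original (2^30 numbers)": (3.5 * 60, 10.26),
    "re-run (2^28 numbers)": (33.39, 2.94),
}

for label, (t1, t16) in runs.items():
    s = speedup(t1, t16)
    e = efficiency(t1, t16, THREADS)
    print(f"{label}: {s:.1f}x speedup, {e:.0%} efficiency")

# original (2^30 numbers): 20.5x speedup, 128% efficiency
# re-run (2^28 numbers): 11.4x speedup, 71% efficiency
```

The original numbers imply more than 16x on 16 threads, which is what prompted the question; the re-run comes in below linear.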