Comment by highfrequency
8 months ago
> CPU, Apple M3 Max, 1 thread: 3.5 minutes
> CPU, Apple M3 Max, 16 threads: 10.26 seconds
Surprised to see a more than linear speedup in CPU threads. What’s going on here?
I believe the single-core version was running slower because memory was filling up. The benchmark was adding 2^30 numbers, but HVM2 32-bit has a limit of 2^29 nodes. I've re-run it with 2^28 instead, and the numbers are `33.39 seconds` (1 core) vs `2.94 seconds` (16 cores). You can replicate the benchmark on an Apple M3 Max. I apologize for the mistake.
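For anyone checking the arithmetic, here is a minimal sketch of the speedup calculation for both runs. The helper names `speedup` and `efficiency` are just illustrative, and the only inputs are the timings quoted in this thread (3.5 minutes taken as 210 seconds).

```python
# Speedup and parallel efficiency from the timings quoted in the thread.
# The function names are illustrative, not part of HVM2 or any benchmark tool.

def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-thread time to multi-thread time."""
    return t_single / t_parallel

def efficiency(t_single: float, t_parallel: float, threads: int) -> float:
    """Speedup divided by thread count; above 1.0 means superlinear."""
    return speedup(t_single, t_parallel) / threads

THREADS = 16

# Original run (2^30 numbers): 3.5 minutes = 210 s vs 10.26 s
original = speedup(210.0, 10.26)

# Re-run (2^28 numbers): 33.39 s vs 2.94 s
rerun = speedup(33.39, 2.94)

print(f"original: {original:.2f}x, efficiency {efficiency(210.0, 10.26, THREADS):.2f}")
print(f"re-run:   {rerun:.2f}x, efficiency {efficiency(33.39, 2.94, THREADS):.2f}")
```

The original numbers work out to roughly 20.5x on 16 threads, which is the superlinear result being asked about. The re-run gives roughly 11.4x, which is sublinear.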
More cores = more caches?