Comment by LightMachine

8 months ago

I believe the single-core version ran slower because memory was filling up. The benchmark was adding 2^30 numbers, but 32-bit HVM2 has a limit of 2^29 nodes. I've re-run it with 2^28 numbers instead, and the timings are `33.39 seconds` (1 core) vs `2.94 seconds` (16 cores). You can replicate the benchmark on an Apple M3 Max. I apologize for the mistake.
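
For anyone who wants the speedup spelled out, here is a rough sketch of the arithmetic in Python. It only uses the figures quoted above; the variable names are mine, and comparing the input size directly against the node limit assumes each number costs at least one node, which is an assumption rather than something measured.

```python
# Timings quoted in the comment above, in seconds.
single_core_s = 33.39
sixteen_core_s = 2.94
cores = 16

# Speedup is the ratio of the two wall-clock times;
# parallel efficiency divides that by the core count.
speedup = single_core_s / sixteen_core_s
efficiency = speedup / cores

# Sizes from the comment. Treating one number as at least one node
# is an assumption made only for this comparison.
node_limit = 2**29
original_size = 2**30
rerun_size = 2**28

print(f"speedup: {speedup:.2f}x")
print(f"parallel efficiency: {efficiency:.0%}")
print(f"original size exceeds node limit: {original_size > node_limit}")
print(f"rerun size fits under node limit: {rerun_size < node_limit}")
```

That works out to roughly an 11.36x speedup on 16 cores, or about 71% parallel efficiency.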