Comment by jerf
10 hours ago
While that speed increase is real, of course, you're really just looking at the general speed delta between Python and C there. To be honest I'm a bit surprised you didn't get another factor of 2 or 3.
"Cimba even processed more simulated events per second on a single CPU core than SimPy could do on all 64 cores"
This is one of the reasons I don't care in the slightest about Python "fixing" the GIL. When your language is already running at a speed where a compiled language can quite reasonably be expected to outdo your 32- or 64-core performance on a single core, who really cares if removing the GIL lets you get twice the speed of an unthreaded Python program by running on 8 cores? If speed was important, you shouldn't have been using pure Python.
(And let me underline the "pure" in "pure Python". There are many ways to be in the Python ecosystem without actually running Python, and they all have their own complicated cost/benefit tradeoffs, with speeds ranging all over the board. I'm talking about pure Python here.)
Good point. The profiler tells me that the context switch between coroutines is the most time-consuming part, even though I tried to keep it as light as possible. So I guess the explanation for "only" getting a 45x speed improvement rather than 100x is that a significant share of the time is spent moving register contents to and from memory.
Any ideas for how to speed up the context switches would be welcome, of course.
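For concreteness, here is a minimal sketch of the kind of hand-rolled stackful switch I mean (this is not Cimba's actual code, just an illustration of the technique): on x86-64 System V, only the callee-saved registers and the stack pointer have to cross the switch, so the memory traffic is six pushes, six pops and two stack-pointer moves. For comparison, POSIX swapcontext() also saves the FP environment and the signal mask, the latter via a syscall, which is why hand-rolled switches are usually much faster.

    /*
     * Minimal stackful coroutine switch sketch for x86-64 System V.
     * Hypothetical illustration only -- not Cimba's implementation.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { void *sp; } coro;      /* a coroutine is just a saved stack pointer */

    void coro_switch(coro *from, coro *to); /* implemented in the asm block below */

    __asm__(
        ".text\n"
        ".globl coro_switch\n"
        "coro_switch:\n"            /* rdi = from, rsi = to                  */
        "  push %rbp\n"             /* save the SysV callee-saved registers  */
        "  push %rbx\n"
        "  push %r12\n"
        "  push %r13\n"
        "  push %r14\n"
        "  push %r15\n"
        "  mov  %rsp, (%rdi)\n"     /* from->sp = current stack pointer      */
        "  mov  (%rsi), %rsp\n"     /* switch onto the target coroutine stack */
        "  pop  %r15\n"
        "  pop  %r14\n"
        "  pop  %r13\n"
        "  pop  %r12\n"
        "  pop  %rbx\n"
        "  pop  %rbp\n"
        "  ret\n"                   /* resume at the saved return address    */
    );

    static coro main_ctx, worker_ctx;

    static void worker(void)        /* must always yield, never return */
    {
        puts("in worker");
        coro_switch(&worker_ctx, &main_ctx);
    }

    int main(void)
    {
        /* Build the worker's initial stack so the first switch to it pops six
         * dummy registers and "returns" into worker().  The extra NULL keeps
         * %rsp 16-byte aligned at worker entry, as the ABI requires. */
        enum { STACK_SIZE = 64 * 1024 };
        char *stack = malloc(STACK_SIZE);
        void **top = (void **)(stack + STACK_SIZE);
        *--top = NULL;
        *--top = (void *)worker;
        for (int i = 0; i < 6; i++)
            *--top = NULL;
        worker_ctx.sp = top;

        coro_switch(&main_ctx, &worker_ctx);
        puts("back in main");
        free(stack);
        return 0;
    }

Built with gcc -O2 on Linux/x86-64 this prints "in worker" followed by "back in main". If your switch already looks roughly like this, the remaining register traffic is hard to shave further; the usual next step is to avoid the stackful switch entirely (stackless, state-machine-style coroutines keep their live state in the event object instead of in registers), but whether that fits depends on Cimba's scheduler design.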