Comment by tgbugs

3 years ago

I don't have anything rigorous, but I consistently see roughly a 4x speedup when using rdflib to parse large files: a 20-minute workload on CPython drops to 4 or 5 minutes when run on PyPy3.

I just reran one of my usual benchmarks: 2 minutes on PyPy3 (PyPy 7.3.12, Python 3.10.12) with peak memory usage of about 8 GB, versus 4.8 minutes on CPython 3.11.4 with peak memory usage of about 3.6 GB, a 2.4x speedup. On another computer running the exact same workload I see 6.3 minutes versus 19 minutes (a 3x speedup) with the same peak memory usage.
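For reference, the two numbers reported above (wall-clock time and peak memory) can be collected from inside the process with the standard library alone. This is a minimal sketch, not the benchmark harness actually used; the stand-in workload is hypothetical, and a real run would call rdflib's parser instead:

```python
import resource
import time

def measure(workload):
    """Run `workload`, returning (result, wall seconds, peak RSS).

    Note: ru_maxrss is reported in kilobytes on Linux but in bytes
    on macOS; this sketch assumes Linux.
    """
    start = time.perf_counter()
    result = workload()
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, elapsed, peak_kb

# Stand-in workload (hypothetical). The real benchmark would parse a
# large file, e.g. rdflib.Graph().parse(...), which is not reproduced here.
def workload():
    return sum(i * i for i in range(1_000_000))

result, elapsed, peak_kb = measure(workload)
print(f"{elapsed:.2f}s, peak RSS ~{peak_kb} kB")
```

Running the same script under both interpreters (`python3 bench.py` vs `pypy3 bench.py`) gives directly comparable time and memory figures.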

I don't have any numbers for the dataset pipelines because I never ran them in production on CPython; I went straight to PyPy3. It is easy for me to switch between the two implementations in that context, so I could run a side-by-side comparison (with the usual caveat that it would be completely non-rigorous).

I also have some internal notes from a project that I didn't list because it isn't public, isn't in production, and the benchmarks were collected quite a while ago. There I saw a 4x increase in throughput when pulling large amounts of data from a PostgreSQL database: from 20 mbps on CPython 3.6 to 80 mbps on PyPy3.
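A throughput figure like the one above can be obtained by timing the fetch loop and dividing bytes consumed by elapsed seconds. This is only a sketch under assumed conditions: the original pipeline is not public, and the in-memory generator below is a hypothetical stand-in for a real database cursor:

```python
import time

def measure_throughput(rows):
    """Consume an iterable of byte rows and return throughput in MB/s.

    In the real setup `rows` would be a PostgreSQL cursor being
    iterated; here it is any iterable of bytes-like objects.
    """
    start = time.perf_counter()
    total_bytes = 0
    for row in rows:
        total_bytes += len(row)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1_000_000

# Stand-in data source instead of a real database connection.
rows = (b"x" * 1024 for _ in range(10_000))
print(f"{measure_throughput(rows):.1f} MB/s")
```

The interpreter comparison works the same way as for the parsing benchmark: run the identical script under CPython and PyPy3 and compare the reported rates.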