Comment by torginus
13 hours ago
I'm going to go out on a limb and say no - this library seems to do the parsing in Python, and Python is slow, like many times slower than Java, C#, or other languages in that class - which you find out if you try to do heavy data processing with it, and which is one of the reasons I dislike the language. It's also very hard to parallelize: in C#, if you feed stuff into LINQ and the entries are independent, you can make the work parallel with PLINQ very quickly, while threads aren't really a thing in Python (or at least they weren't back then).
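To be fair, the usual Python-side workaround is to go multi-process rather than multi-threaded, since the GIL keeps pure-Python threads from running CPU-bound work in parallel. A rough sketch, with a made-up parse_record standing in for the per-entry work (roughly the shape PLINQ gives you for free):

```python
from concurrent.futures import ProcessPoolExecutor

def parse_record(line: str) -> int:
    # made-up stand-in for real per-record parsing work
    return len(line.split(","))

def parse_all(records: list[str]) -> list[int]:
    # threads would serialize on the GIL for CPU-bound work;
    # separate processes actually run in parallel
    with ProcessPoolExecutor() as pool:
        return list(pool.map(parse_record, records, chunksize=1000))

if __name__ == "__main__":
    sample = ["a,b,c"] * 10_000
    print(sum(parse_all(sample)))
```

It works, but you pay for process startup and for pickling data across the boundary, which is a lot more friction than flipping LINQ to PLINQ.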
I've seen data processing in Python become a bottleneck and a source of frustration so many times, to the point where the code had to be rewritten, that I came to not bother writing that kind of code in Python in the first place.
You can make Python fast by relying on NumPy and pandas with array programming, but formatting and massaging the data so that what you want can be expressed as array-programming ops can be quite challenging - it usually became too much of a burden for me.
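To give a feel for what I mean, a toy example (made-up data, nothing to do with this library): the same computation written as a plain loop and then reshaped into whole-array ops, and the second form is where the massaging effort goes.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=1_000_000)   # made-up data

# the obvious pure-Python loop: easy to write, slow at this scale
total_loop = 0.0
for v in values:
    if v > 0.5:
        total_loop += v * 2.0

# the same computation pushed into NumPy: one pass in C, but the logic
# has to be reshaped into whole-array masking and arithmetic
total_vec = float((values[values > 0.5] * 2.0).sum())

print(total_loop, total_vec)   # same result, very different speed
```

This case is trivial to vectorize; the pain starts when the per-element logic has branches, state, or ragged structure that doesn't map cleanly onto array ops.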
I wish Python was at least as fast as Node (which also can have its own share of performance cliffs)
It's possible that nowadays Python has JITs that improve performance to Java levels while keeping compatibility with most existing code - I haven't used Python professionally in quite a few years.
From the README, features include:
> native code parsing speedups for most common platforms
Which is to say, roughly analogous to "relying on NumPy". (A well-designed system avoids repeatedly calling from Python into C and prefers to let the loops live within the C code; that applies at least as much to tree-like data as to array-like data.)
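You can see the boundary-crossing cost even in a toy benchmark; this has nothing to do with this parser in particular, it just shows where the loop runs:

```python
import math
import timeit
import numpy as np

data = list(range(1_000_000))
arr = np.array(data, dtype=np.float64)

# Python-level loop: one tiny trip into C (math.sqrt) per element
t_loop = timeit.timeit(lambda: sum(math.sqrt(x) for x in data), number=3)

# one call whose loop runs entirely inside compiled code
t_vec = timeit.timeit(lambda: np.sqrt(arr).sum(), number=3)

print(f"python-level loop: {t_loop:.3f}s   numpy: {t_vec:.3f}s")
```

Exact numbers vary by machine, but the gap is typically an order of magnitude or more, and it comes from where the loop lives, not from what the arithmetic costs.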
> I wish Python was at least as fast as Node (which also can have its own share of performance cliffs)

> It's possible that nowadays Python has JITs that improve performance to Java levels while keeping compatibility with most existing code - I haven't used Python professionally in quite a few years.
No guarantees, but have you tried PyPy? It's existed since 2007 and definitely improved over time.
I would say that "performance cliffs" are just endemic to programming. Even in C you find people writing bad algorithms because better ones seem (at least superficially) much harder to write — especially if the good algorithm requires, say, a hash table. (C++ standard library containers definitely ameliorate this effect, but you pay in code complexity, especially where templates are needed.) And on the other hand you sometimes see big improvements from dropping to assembly (cf. ffmpeg).
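A toy instance of that cliff, in Python only because that's the language under discussion: deduplication written without a hash table versus with one. Same output, very different scaling.

```python
import timeit

items = list(range(5_000)) * 2   # made-up data with duplicates

def dedup_quadratic(xs):
    # the version that's easiest to write without a hash table: O(n^2)
    out = []
    for x in xs:
        if x not in out:          # linear scan over everything kept so far
            out.append(x)
    return out

def dedup_hashed(xs):
    # same result, order preserved, using a hash set: O(n)
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

assert dedup_quadratic(items) == dedup_hashed(items)
print(timeit.timeit(lambda: dedup_quadratic(items), number=1))
print(timeit.timeit(lambda: dedup_hashed(items), number=1))
```

The quadratic version is the one people reach for when a hash table feels like extra machinery, which is exactly the effect standard-library containers (in any language) help avoid.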