Comment by nurettin

7 days ago

It just litters perfectly reasonable python code with async/await. Maybe they are preparing for something we don't know, like a parallel async executor which can be set up to use native threads without changing code and somehow protects you if it detects shared state.

Caveat: I have not looked at the neither the API nor the implementation of Kreuzberg, this is purely from personal work.

Even with CPU bound code in Python, there are valid reasons to be using async code. Recognizing that the code is CPU bound, it is possible to use thread and/or process pools to achieve a certain level of parallelism in Python. Threading won't buy you much in Python, until 3.13t, due to the GIL. Even with 3.12+ (with the GIL enabled), it's possible (but not trivial) to use threading with sub interpreters (that have their own, separate GIL). See PEP 734 [0].

I'm currently investigating the use of sub interpreters on a project at work where I'm now CPU bound. I already use multiprocessing & async elsewhere, but I am curious if PEP 734 is easier/faster/slower or even feasible for me. I haven't gotten as far as to actually run any code to compare (I need to refactor my code a bit with the idea of splitting the work up a bit differently to account for being CPU instead of just IO bound).

[0] https://peps.python.org/pep-0734/

  • Will it lock the GIL if you use thread executor with asyncio for a native c / ffi extension? If that’s the case, that would also add to benefits of asyncio.

> It just litters perfectly reasonable python code with async/await

Yeah. As an API consumer I would not expect a PDF API do IO, hence be async. Have the library be sans-io, the interfaces sync and callers from async code handle IO on their end, offloading to IO threads.

Async is also referred to as “best practice”, but it’s just a tool, for specific use cases. And I say that as an “async fan”!

That said, perhaps it’s easier nowadays to just do async by default, as you say. The real world is async anyway, so why not program closer to that reality.

  • Async is great when you truly need it, but it can overcomplicate things when misused. Having both sync and async options, seems like the best approach. Lets devs choose based on their needs rather than forcing one paradigm.