Comment by shay_ker
1 month ago
It's really fascinating to read this, since I've encountered similar memory issues in other languages (ruby, go, etc.). Debugging these issues is a pain.
Is there a way to make all this much easier to debug and to prevent memory issues in the first place? Is the abstraction level not quite right?
With CPython's reference counting, if you're careful not to build strong reference cycles, you really can avoid garbage pressure. It's not even that complicated; it's mostly a question of making a weak reference _somewhere_ along the chain. Often the ergonomics are not great, but Python's @property is nice here.
So for example
class Request: ...
class Session: ...
request.session exists, and the session is "part" of the request. But session.request often exists too, as a convenience. That's a reference cycle, which prevents the request (and anything it points to!) from being deallocated by reference counting at the end of a request; it all lingers until the cyclic GC runs.
But in this case, you could easily do something like:
session._request = weakref.ref(request) # on session creation
and then have session.request call session._request() (and maybe assert session._request() is not None if you want to be certain). If you're confident that the session is a "child" of the request, and that you would _never_ have a hold of the session after the request is done, this is a cheap trick that makes session.request cost a little bit more but not much.
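To make that concrete, here's a minimal sketch of the trick. The Request and Session classes are bare stand-ins I made up for illustration (not any particular framework's API), and raising ReferenceError instead of asserting is just one possible choice:

```python
import weakref


class Request:
    def __init__(self):
        # The request owns its session through a normal (strong) reference.
        self.session = Session(self)


class Session:
    def __init__(self, request):
        # Only a weak reference points back, so no strong cycle is formed.
        self._request = weakref.ref(request)

    @property
    def request(self):
        # Calling the weakref returns the request, or None once it is gone.
        request = self._request()
        if request is None:
            raise ReferenceError("session outlived its request")
        return request
```

Callers still just write session.request; the only difference is an extra call on each access, plus a loud failure if someone holds on to a session after its request has been freed.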
I think most Python libraries just don't do memory perf analyses here, and also "believe" in the garbage collector. When GC runs, both request and session will get deallocated, after all! But the long-term effects of everyone relying on the GC are that GC is expensive when it doesn't need to be, and when looking through memory you just have more stuff to dig through.
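If you want to see the "waiting for the GC" part for yourself, a rough sketch like this should show it on CPython. Node is a hypothetical stand-in class, and the weakref is only there as a probe to check whether the object is still alive:

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a request or session object."""


def make_cycle():
    request, session = Node(), Node()
    request.session = session
    session.request = request  # strong back-reference closes the cycle
    return weakref.ref(request)  # probe only; does not keep request alive


gc.disable()  # keep the collector from running in the middle of the demo
try:
    probe = make_cycle()
    # No names refer to the pair any more, but each object keeps the other alive.
    alive_before = probe() is not None
    gc.collect()  # the cycle detector finds the unreachable pair and frees it
    alive_after = probe() is not None
finally:
    gc.enable()
```

With a strong cycle, alive_before stays true until the collector runs; with a weak back-reference the request would already be gone as soon as the last outside reference went away.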