Comment by anticodon
3 months ago
Yep, a __del__ in the redis client code caused almost random deadlocks at my job for several years. Manual intervention was required to restart stuck Celery jobs. Took me about 2-3 weeks to find the culprit (had to deploy python interpreter compiled with debug info into production, wait for deadlock to happen again, attach with gdb and find where it happens). One of the most difficult production issues I had to solve in my life (because it happened randomly and it was impossible to even remotely guess what is causing it).