← Back to context

Comment by xapata

8 years ago

> Processes have the same problem in addition to being generally less efficient.

What I meant was that you should consider a multiprocessing approach that shares essentially no data between processes. As you say, the memory copying overhead is highly inefficient. Once you approach a problem like that, you've already implemented an essentially distributed system and the change is trivial.

I've regretted multithreading enough times to convince me it's almost never the right choice. Mostly because I find I've underestimated the project scale and needed to rewrite as distributed. Maybe those new monstrous instances available on EC2 will change my habits. I've never had such flexible access to a 4TB RAM / 128 core machine before.

> the fact that Python needs these projects

Apache Arrow solves problems for many languages. The ACM article that popped up the other day, "C is not low-level" touched on some of the issues.

https://arrow.apache.org