Comment by 2RTZZSro

8 years ago

Would it be feasible to keep a set of Python interpreters around at all times, use a round-robin approach to feed commands to each already-running interpreter, and then perform an interpreter-environment cleanup out-of-band after a task completes?
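A minimal sketch of that idea, with all names (`WarmPool`, the worker script) purely illustrative rather than from any real library: a few Python workers stay warm, and expressions are handed out round-robin over pipes.

```python
# Hypothetical warm-interpreter pool: N Python processes stay alive,
# and tasks are fed to them round-robin so no task pays startup cost.
import itertools
import subprocess
import sys

WORKER_SRC = r"""
import sys
for line in sys.stdin:
    # Each line is a Python expression; the result goes back on stdout.
    print(eval(line))
    sys.stdout.flush()
"""

class WarmPool:
    def __init__(self, size=2):
        self.workers = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER_SRC],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
            for _ in range(size)
        ]
        self._rr = itertools.cycle(self.workers)  # round-robin iterator

    def run(self, expr):
        w = next(self._rr)           # pick the next warm interpreter
        w.stdin.write(expr + "\n")
        w.stdin.flush()
        return w.stdout.readline().strip()

    def close(self):
        for w in self.workers:
            w.stdin.close()
            w.wait()

pool = WarmPool()
print(pool.run("2 + 2"))   # served by an already-running interpreter
pool.close()
```

The out-of-band cleanup step is exactly the part this sketch omits, and (as noted below) it's the hard part for Python.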

The Java ecosystem had this with Drip, and I think it turned out not to be a great idea in practice - the magazine of VMs gets exhausted when you don't want it to, the VMs get into odd states, and other things I can't quite remember.

Or just use the operating system's `fork` system call?
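As a rough sketch of the fork() route in Python (the `json` import here is just a stand-in for expensive warm-up work): the child starts with a copy-on-write view of the parent's already-initialized state, so it skips startup entirely.

```python
# Sketch of the fork() approach: a warm parent process forks per task,
# and each child inherits the initialized interpreter via COW pages.
import json  # loaded once before any fork; children get it for free
import os

def run_task_in_child(task):
    pid = os.fork()
    if pid == 0:
        # Child: already has imports, caches, etc. from the parent.
        print(json.dumps(task))
        os._exit(0)
    # Parent: reap the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    code = run_task_in_child({"cmd": "hello"})
    print("child exit code:", code)
```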

There's also Nailgun for Java, which sounds like it works a little differently: http://martiansoftware.com/nailgun/

  • I guess a fork()'ed process triggers copy-on-write behavior in the kernel once it starts writing to its pages. So the copying is latency you could still optimize away.

    • A common solution for web application servers ("preforking").

      The idea of keeping persistent interpreters doesn't really work for Python because the interpreter is full of state in places you'd never expect -- it's hard to reset the interpreter to a sane state after it ran some unknown program.

    • I may be wrong, but I would bet that copy-on-write of pages would hardly be visible for most workloads. Copying is quite fast when you do it page by page (4 KiB at a time).

    • You might want to measure it before you optimize it! Oftentimes I find that forks where I don't write much are quite inexpensive, with little COW action.

      1 reply →
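Taking the "measure it first" suggestion above literally, here's one hypothetical way to time it: run a batch of fork()+exit+wait round trips and average them. A child that does no work touches few pages, so this mostly measures the fixed fork overhead rather than COW copying.

```python
# Rough measurement sketch: average the cost of a fork/exit/wait cycle.
import os
import time

def time_forks(n=200):
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)   # child exits immediately; minimal COW traffic
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

per_fork = time_forks()
print(f"avg fork+wait: {per_fork * 1e3:.3f} ms")
```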

Yes, but with the added complexity and resource usage, it's not a good general solution. If every app behaved this way, we'd be in a worse place overall.

I imagine this could be handled by some kind of "fork", where you instantly duplicate the whole process with copy-on-write.

What would be really nice is checkpoint and restart (i.e., unexec), but it turns out that it's extremely hard to implement and get right in a non-managed environment.