Comment by t-kalinowski

4 hours ago

I agree that at first glance, it seems like tmux, or even long-running PTY shell calls in harnesses like Claude, solve this. They do keep processes alive across discrete interactions. But in practice, it’s kind of terrible, because the interaction model presented to the LLM is basically polling. Polling is slow and bloats context.

To avoid polling, you need to run the process with some knowledge of the internal interpreter state. Then a surprising number of edge cases start showing up once you start using it for real data science workflows. How do you support built-in debuggers? How do you handle in-band help? How do you handle long-running commands, interrupts, restarts, or segfaults in the interpreter? How do you deal with echo in multi-line inputs? How do you handle large outputs without filling the context window? Do you spill them to the filesystem somewhere instead of just truncating them, so the model can navigate them? What if the harness doesn’t have file tools? And so on.

Then there is sandboxing, which becomes another layer of complexity wrapped into the same tool.

I’ve been building a tool around this problem: `mcp-repl` https://github.com/posit-dev/mcp-repl

So tmux helps, but even with a skill and some shims, it does not really solve the core problem.