Comment by mccoyb

6 hours ago

Something something medical researcher reinvents calculus.

In 2026: frontend web developer reinvents tmux.

Guys, please do us the service of pre-filtering your crack token dreams by investigating the tool stack that is already available in the terminal ... or at least give us the courtesy of explaining why your vibecoded Greenspun's-10th something is a significant leg up on what already exists, and has perhaps existed for many years (and is therefore in the training set, and therefore probably going to work perfectly out of the box).

Maybe, just maybe, this is of obvious utility to the many people who have needs that are not yours?

I very regularly need to interact with my work through a Python interpreter. My work is scientific programming, so the variables might be arrays with millions of elements. To debug, optimize, verify, or otherwise improve my work, I can't rely on any method other than interacting with the code as it runs, while everything is still in memory. So if I want to really leverage LLMs, especially to let them work semi-autonomously, they must be able to do the same.

I'm not going to dump tens of GB of stuff to a log file or send it around via pipes or whatever. Why is there a NaN in an array that is the product of many earlier steps in a code that took an hour to run? Why are certain data in a 200k-variable system of equations much harder to fit than others, and which equations are in tension with each other, preventing better convergence?
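For instance, a minimal sketch of the kind of live interrogation I mean (with a synthetic stand-in array, since the whole point is that the real one only exists in memory):

```python
import numpy as np

# Stand-in: in reality `result` is the in-memory product of an hour-long run.
result = np.random.default_rng(0).normal(size=1_000_000)
result[123_456] = np.nan  # the mystery NaN

bad = np.flatnonzero(np.isnan(result))
print(f"{bad.size} NaN(s), first at index {bad[0]}")
# From here you walk back through the intermediates still in memory to find
# which upstream step first produced a non-finite value.
```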

Aren't interpreters and pdb great, previously existing tools for exactly this kind of work? Does a new tool that lets LLMs/agents use them really amount to a hack job just because other solutions have existed for years?

  • I agree that at first glance, it seems like tmux, or even long-running PTY shell calls in harnesses like Claude, solve this. They do keep processes alive across discrete interactions. But in practice, it’s kind of terrible, because the interaction model presented to the LLM is basically polling. Polling is slow and bloats context.

    To avoid polling, you need to run the process with some knowledge of the internal interpreter state. Then a surprising number of edge cases show up once you start using it for real data science workflows. How do you support built-in debuggers? How do you handle in-band help? How do you handle long-running commands, interrupts, restarts, or segfaults in the interpreter? How do you deal with echo in multi-line inputs? How do you handle large outputs without filling the context window? Do you spill them to the filesystem instead of just truncating them, so the model can navigate them (see the sketch at the end of this comment)? What if the harness doesn't have file tools? And so on.

    Then there is sandboxing, which becomes another layer of complexity wrapped into the same tool.

    I’ve been building a tool around this problem: `mcp-repl` https://github.com/posit-dev/mcp-repl

    So tmux helps, but even with a skill and some shims, it does not really solve the core problem.
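
    As one concrete example of the output-size question above, a minimal sketch of spilling instead of truncating (the size budget and naming are made up, not necessarily how `mcp-repl` does it):

    ```python
    import os
    import tempfile

    MAX_CHARS = 4_000  # assumed per-result context budget

    def render_output(text: str) -> str:
        """Hypothetical: rather than truncating a huge REPL result, spill it
        to disk and return a preview plus a path the model can navigate with
        file tools (grep, head, ...)."""
        if len(text) <= MAX_CHARS:
            return text
        fd, path = tempfile.mkstemp(prefix="repl-output-", suffix=".txt")
        with os.fdopen(fd, "w") as f:
            f.write(text)
        return (f"[{len(text)} chars; full output spilled to {path}]\n"
                + text[:MAX_CHARS // 2]
                + f"\n... (read {path} for the rest)")
    ```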

  • In the data science scenario you should just have proper tooling; for you, it sounds like that means a REPL the agent can interface with. I do this with nREPL/CIDER; in Python-land, maybe a Jupyter kernel over MCP (a sketch of the kernel part follows below). For stateful introspection where you don't control the tooling, tmux plus trivial glue gets you most of the way.

    edit: There are much better solutions for Python-land below, it seems :)
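
    A rough sketch of the Jupyter-kernel route using `jupyter_client` directly (no MCP layer shown; just the persistent-state part the agent needs):

    ```python
    from jupyter_client.manager import start_new_kernel

    # Start a persistent python3 kernel; state survives across execute() calls.
    km, kc = start_new_kernel(kernel_name="python3")

    def run(code: str) -> None:
        # Execute code in the kernel, printing stream output until it goes idle.
        kc.execute(code)
        while True:
            msg = kc.get_iopub_msg(timeout=10)
            if msg["msg_type"] == "stream":
                print(msg["content"]["text"], end="")
            elif (msg["msg_type"] == "status"
                  and msg["content"]["execution_state"] == "idle"):
                break

    try:
        run("import numpy as np; big = np.arange(1_000_000)")
        run("print(big.sum())")  # `big` is still alive in the kernel
    finally:
        kc.stop_channels()
        km.shutdown_kernel()
    ```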

  • Are you aware that you can use tmux (or zellij, etc.), spin up the interpreter in a tmux session, and then have the LLM interact with it perfectly normally via send-keys? This works quite well because LLMs are trained on it; you just need to tell the LLM "I have ipython open in a tmux session named pythonrepl" (see the sketch below).

    This is exactly how I do most of my data analysis work in Julia.
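
    A rough sketch of that flow (assuming the session already exists via `tmux new -s pythonrepl ipython`; the sleep is exactly the polling weakness the sibling comment complains about):

    ```python
    import subprocess
    import time

    def send(cmd: str) -> None:
        # Type a command into the named session, as the LLM would via send-keys.
        subprocess.run(["tmux", "send-keys", "-t", "pythonrepl", cmd, "Enter"],
                       check=True)

    def capture() -> str:
        # Read back the visible pane contents (-p prints to stdout).
        out = subprocess.run(["tmux", "capture-pane", "-p", "-t", "pythonrepl"],
                             capture_output=True, text=True, check=True)
        return out.stdout

    send("big_array.shape")  # assumes `big_array` already exists in the session
    time.sleep(0.5)          # crude wait-then-poll
    print(capture())
    ```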

  • See related sibling: the use cases are compelling!

    My complaint is that tmux handles them perfectly. Exactly the capability OP claims for their software is already served by robust, 18-year-old software.

    In 2026, it costs nearly nothing to thoroughly and autonomously investigate related software, so yes, I am going to be purposefully abrasive about it.

  • > I'm not going to dump tens of GB of stuff to a log file

    In the same vein as the parent comment, the curious thing is why you would vibe-code a solution instead of reaching for grep.

  • What I do is have a quick command that spins up a worktree on a repo, lays out my Ghostty splits the way I like them, and names the tmux session after the worktree (sketch below). I then tell Claude Code about the tmux session when it needs to look. It's pretty good at handling the tmux interactions natively.

    Ideally Ghostty would offer primitives to launch splits, but c'est la vie. Apple automation it is.
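
    For concreteness, a minimal sketch of the git/tmux half of that glue (branch and session names are illustrative; the Ghostty/AppleScript part is omitted):

    ```python
    import subprocess
    import sys

    # Hypothetical glue: create a worktree on a new branch plus a tmux session
    # named after it, which the agent can then be pointed at.
    branch = sys.argv[1]
    path = f"../wt-{branch}"
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    subprocess.run(["tmux", "new-session", "-d", "-s", branch, "-c", path],
                   check=True)
    print(f"tell the agent: tmux session '{branch}' is running in {path}")
    ```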

The problem is, they'll find there is typically already a good solution to their problem, and then they'll have nothing to write about.

At this point, it’s easier to (have the agent) build a simple tool like this than it is to find and set up an existing one.

I sincerely think the chatbot phenomenon is giving people the impression that whatever hallucinatory conversation they're having is profound, because it's the first time they personally have thought about it.

On one hand, it is normal in education and pedagogy to have the student or apprentice put the boring pieces together to discover the wonder of the puzzle themselves; on the other, this is how we end up with https://xkcd.com/927/

I agree. We skipped CLIs and went all the way to TUIs because TUIs are "easy to make now"? Or maybe because of claude/codex?

But in practice you are padding agents' token counts by making them read TUI screen streams, instead of leveraging the standard unix pipes that have been around since day one.

TLDR - your agent wants a CLI anyway.
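
To make that concrete, a toy sketch (the file format and naming are invented): plain stdout composes with pipes and is cheap for an agent to read, with no TUI screen-scraping involved:

```python
import argparse

import numpy as np

# Hypothetical summarizer CLI: plain text to stdout, so the agent can do
# `python summarize.py run.npy | head` instead of scraping a TUI.
p = argparse.ArgumentParser(description="summarize a .npy array")
p.add_argument("npy", help="array file to summarize")
args = p.parse_args()

a = np.load(args.npy, mmap_mode="r")
nans = int(np.isnan(a).sum()) if np.issubdtype(a.dtype, np.floating) else 0
print(f"shape={a.shape} dtype={a.dtype} nan_count={nans}")
```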

Disclaimer: still a cool project and thank you to the author for sharing.

  • The TUI makes more sense to humans who don’t understand the difference between a human and a machine.