Comment by gck1

17 hours ago

Start building your own lightweight "harness" that does the things you need. Ignore all the functionality of clients like CC or Codex and just implement whatever you start missing in your harness.

You can replace pretty much everything - skills system, subagents, etc with just tmux and a simple cli tool that the official clients can call.

Oh and definitely disable any form of "memory" system.

Essentially, treat all tooling that wraps the models as dumb gateways to inference. Then a provider switch is basically a one-line config change.
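
A minimal sketch of what a "one-line config change" could look like, assuming the harness keeps a small table of launch commands. The table, the names `BACKENDS`, `DEFAULT_BACKEND` and `launch_argv`, and the command strings are all illustrative placeholders, not verified invocations of any real client.

```python
import shlex

# Hypothetical backend table. The command strings are placeholders;
# check each client's own docs for the real invocation.
BACKENDS = {
    "claude": "claude",
    "codex": "codex",
    "gemini": "gemini",
}

# The single line you would edit to switch providers.
DEFAULT_BACKEND = "claude"


def launch_argv(backend: str = DEFAULT_BACKEND) -> list[str]:
    """Return the argv used to start the chosen client."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return shlex.split(BACKENDS[backend])
```

Everything else in the harness would call `launch_argv()` and never mention a specific client by name.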

lol this is literally the same advice us ancient devops nerds were telling others back when ci/cd was new

write scripts that work anywhere and have your ci/cd pipeline be a "dumb" executor of those scripts. unless you want to be stuck on jenkins forever.

what's old is new again!

> You can replace pretty much everything - skills system, subagents, etc with just tmux and a simple cli tool that the official clients can call.

I'm very interested in this. Can you go into a bit more detail?

ATM for example I'm running Claude Code CLI in a VM on a server and I use SSH to access it. I don't depend on anything specific to Anthropic. But it's still a bit of a pain to "switch" to, say, Codex.

How would that simple CLI tool work? And would CC / Codex call it?

  • Not the OP but here is a good example: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

    Initially I read it just because it was interesting, but it has ended up being the harness I have stuck with - pi is well designed, nicely extensible and supports many model provider APIs. Though sadly gemini and claude's subscriptions can't really be used with it anymore thanks to openclaw.

  • Check out github.com/ralabarge/beigebox -- OSS AI Harness, started as a way to save all of my data but has agentic features, MCP server, point it at any endpoint (or use any front end with it as well, transparent middleware)

    So far what I am finding is that you just get the basics working and then use the tool and inference to improve the tool.

  • I wish I had lower standards towards sharing absolute AI slop, then I could just drop a link to my implementation. But since I don't, let me just describe it. I essentially had claude build the initial version in a single session which I've been extending as I noticed any gaps in my process.

    First, you need an entrypoint that kicks things off. You never run `claude` or `codex`, you always start by running `mycli-entrypoint` that:

    1. Creates a tmux session
    2. Creates a pane
    3. Spawns claude/codex/gemini - whichever your default configured backend is
    4. Automatically delivers a prompt (essentially a 'system message') to that process via tmux paste, telling it what `mycli` is, how to use it, what commands are available, and that it should never use the built-in tools this cli provides alternatives for.
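
    The four steps above can be sketched roughly as follows. This is only a sketch under assumptions: the function names, the buffer name `mycli-intro` and the intro wording are made up for illustration, the session's initial pane is reused instead of creating a separate one, and a real version would need to wait for the client to finish starting before pasting.

```python
import subprocess

# Hypothetical intro text; the real wording is up to you.
INTRO_TEMPLATE = """You are running inside a harness called mycli.
Use these commands instead of your built-in equivalents:
{commands}
"""


def build_intro(commands: dict[str, str]) -> str:
    """Render the 'system message' listing the available mycli commands."""
    lines = [f"- mycli {name}: {desc}" for name, desc in sorted(commands.items())]
    return INTRO_TEMPLATE.format(commands="\n".join(lines))


def entrypoint_steps(session: str, backend_cmd: str, intro: str):
    """Return (argv, stdin) pairs; stdin is None when nothing is piped in."""
    return [
        # 1-2. Detached session; it starts with one window and one pane.
        (["tmux", "new-session", "-d", "-s", session], None),
        # 3. Type the client's launch command into that pane.
        (["tmux", "send-keys", "-t", session, backend_cmd, "Enter"], None),
        # 4. Load the intro into a named paste buffer from stdin, paste it
        #    (bracketed if the app asked for it), then press Enter.
        (["tmux", "load-buffer", "-b", "mycli-intro", "-"], intro),
        (["tmux", "paste-buffer", "-d", "-p", "-b", "mycli-intro", "-t", session], None),
        (["tmux", "send-keys", "-t", session, "Enter"], None),
    ]


def run(steps) -> None:
    """Execute the steps in order, stopping at the first failure."""
    for argv, stdin in steps:
        subprocess.run(argv, input=stdin, text=True, check=True)
```

    Keeping the tmux calls as plain data makes them easy to print or dry-run before anything touches a real session.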

    After that, you build commands in `mycli` that CC/Codex are prompted to call when appropriate.

    For example, if you want a "subagent", you have a `mycli spawn` command that takes a role (just a preconfigured markdown file living in the same project), a backend (claude/codex/...) and a model. Then whenever CC wants to spawn a subagent, it will call that command instead, which will create a pane, spawn a process and return an agent ID to CC. The agent ID is auto-generated by your cli, and the tmux pane is renamed to it so you can easily match them up later.
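
    A rough sketch of such a `spawn` command, assuming role files live in a `roles/` directory and that each backend's launch line is a template with a `{model}` placeholder. Both are assumptions, as are the helper names; how a given client accepts a model name should be checked against its own docs.

```python
import subprocess
import uuid
from pathlib import Path


def tmux(*args: str) -> str:
    """Run a tmux subcommand and return its trimmed stdout."""
    result = subprocess.run(
        ["tmux", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def new_agent_id(role: str) -> str:
    """Generate a short, readable, unique agent ID."""
    return f"{role}-{uuid.uuid4().hex[:8]}"


def spawn(role: str, launch_template: str, model: str,
          roles_dir: Path = Path("roles"), run=tmux) -> dict:
    """Create a pane, start the client in it, and return the bookkeeping.

    `launch_template` is a hypothetical per-backend string such as
    "some-client --model {model}".
    """
    agent_id = new_agent_id(role)
    # -d keeps focus where it is; -P -F prints the new pane's ID.
    pane_id = run("split-window", "-d", "-P", "-F", "#{pane_id}")
    # Title the pane with the agent ID so it is easy to match later.
    run("select-pane", "-t", pane_id, "-T", agent_id)
    run("send-keys", "-t", pane_id, launch_template.format(model=model), "Enter")
    return {
        "agent_id": agent_id,
        "pane_id": pane_id,
        "role_file": roles_dir / f"{role}.md",  # delivered to the agent separately
    }
```

    The returned agent ID is what gets printed back to the calling agent; the pane ID stays internal to the cli.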

    Then you also need a way for these agents to talk to each other. So your cli also has a `send` command that takes an agent ID and a message and delivers it to the appropriate pane using an automatically tracked mapping of pane_id<>agent_id.
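
    One way the mapping and the `send` command might look, assuming the pane_id<>agent_id mapping is persisted as a small JSON file. The file location and the function names are hypothetical.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical location for the agent_id -> pane_id mapping.
REGISTRY = Path(".mycli/agents.json")


def load_registry(path: Path = REGISTRY) -> dict[str, str]:
    """Read the mapping, or return an empty one if nothing is registered."""
    return json.loads(path.read_text()) if path.exists() else {}


def register(agent_id: str, pane_id: str, path: Path = REGISTRY) -> None:
    """Record a newly spawned agent's pane."""
    registry = load_registry(path)
    registry[agent_id] = pane_id
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(registry, indent=2))


def send_argvs(agent_id: str, message: str, path: Path = REGISTRY) -> list[list[str]]:
    """Build the tmux calls that type `message` into the agent's pane."""
    pane_id = load_registry(path)[agent_id]  # KeyError for an unknown agent
    return [
        # -l sends the text literally instead of looking up key names.
        ["tmux", "send-keys", "-t", pane_id, "-l", "--", message],
        ["tmux", "send-keys", "-t", pane_id, "Enter"],
    ]


def send(agent_id: str, message: str, path: Path = REGISTRY) -> None:
    """Deliver the message by running the tmux calls in order."""
    for argv in send_argvs(agent_id, message, path):
        subprocess.run(argv, check=True)
```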

    Claude and codex automatically store everything that happens in the process as jsonl files in their config dirs. Your cli should have adapters for each backend and parse them into a common format.
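
    The adapter idea could be sketched like this. The two record shapes below are invented purely for illustration and are not the real log schemas of any client; you would replace them after inspecting the actual jsonl files. `Event`, `ADAPTERS` and the `backend_a`/`backend_b` keys are likewise made-up names.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    """Common format that the rest of the harness works with."""
    backend: str
    role: str
    text: str


def adapt_flat(record: dict) -> Optional[tuple[str, str]]:
    # ASSUMED shape: {"role": "...", "text": "..."}
    if "role" in record and "text" in record:
        return record["role"], record["text"]
    return None


def adapt_nested(record: dict) -> Optional[tuple[str, str]]:
    # ASSUMED shape: {"kind": "...", "payload": {"content": "..."}}
    payload = record.get("payload")
    if "kind" in record and isinstance(payload, dict) and "content" in payload:
        return record["kind"], payload["content"]
    return None


# Placeholder keys; map your real backends to adapters you have verified.
ADAPTERS: dict[str, Callable[[dict], Optional[tuple[str, str]]]] = {
    "backend_a": adapt_flat,
    "backend_b": adapt_nested,
}


def parse_jsonl(lines: Iterable[str], backend: str) -> Iterator[Event]:
    """Yield common Events, skipping blank lines and unrecognised records."""
    adapt = ADAPTERS[backend]
    for line in lines:
        if not line.strip():
            continue
        parsed = adapt(json.loads(line))
        if parsed is not None:
            role, text = parsed
            yield Event(backend, role, text)
```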

    At this point, your possibilities are pretty much endless. You can have a sidecar process per agent that, say, detects when the model is reaching its context window limit (it's in the jsonl) and automatically sends it a message asking it to wrap up and report to a supervisor agent that will spawn a replacement.
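
    The core of such a sidecar might look like the sketch below, assuming the token count has already been extracted from the parsed jsonl. The 85% threshold, the message wording and the `ContextWatcher` name are arbitrary choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical wording for the wrap-up request.
WRAP_UP_MESSAGE = (
    "You are close to your context limit. Wrap up, then report your "
    "status to the supervisor so a replacement can be spawned."
)


@dataclass
class ContextWatcher:
    """Fires a single wrap-up notification once usage crosses a threshold."""
    agent_id: str
    context_limit: int                   # tokens the model can hold
    notify: Callable[[str, str], None]   # e.g. the cli's send(agent_id, message)
    threshold: float = 0.85              # arbitrary; tune to taste
    warned: bool = False

    def observe(self, tokens_used: int) -> bool:
        """Feed in the latest usage; return True if a warning was sent now."""
        if self.warned or tokens_used < self.threshold * self.context_limit:
            return False
        self.notify(self.agent_id, WRAP_UP_MESSAGE)
        self.warned = True
        return True
```

    The `warned` flag matters: without it the agent would be nagged on every new log line after crossing the threshold.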

    I also don't use "skills" because skills are a loaded term that each of the harnesses interprets and loads/uses differently. So I call them "crafts", which are, again, just markdown files in my project with an ID and a supporting command `read-craft <craft-id>`. The list of available "crafts" is delivered using the same initialization message that each agent gets. If I like any third party skill, I just copy it to my "crafts" dir manually.
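
    A small sketch of the "crafts" lookup, assuming each craft is `<craft-id>.md` in a `crafts/` directory and that the ID is simply the file name without its extension. That layout and the ID pattern are assumptions, not something stated above.

```python
import re
from pathlib import Path

CRAFTS_DIR = Path("crafts")  # assumed location inside the project

# Restrict IDs to a safe character set so an ID can never escape the dir.
_CRAFT_ID = re.compile(r"[a-z0-9][a-z0-9-]*")


def list_crafts(crafts_dir: Path = CRAFTS_DIR) -> list[str]:
    """Return the craft IDs to advertise in the initialization message."""
    return sorted(p.stem for p in crafts_dir.glob("*.md"))


def read_craft(craft_id: str, crafts_dir: Path = CRAFTS_DIR) -> str:
    """Return the markdown for one craft (backs `read-craft <craft-id>`)."""
    if not _CRAFT_ID.fullmatch(craft_id):
        raise ValueError(f"invalid craft id: {craft_id!r}")
    path = crafts_dir / f"{craft_id}.md"
    if not path.is_file():
        raise FileNotFoundError(f"no such craft: {craft_id}")
    return path.read_text(encoding="utf-8")
```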

    My implementation is absolute junk, just Python + markdown files, and I have never looked at the actual code, but it works and I can adapt it to my process very easily without being dependent on any third party tool.