Comment by mccoyb
21 days ago
The article seems to be about fun, which I'm all for, and I appreciate the use of MAKER as an evaluation task (finally, people are actually evaluating their theories on something quantitative), but the messaging here seems inherently contradictory:
> Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Then:
> Working effectively in Gas Town involves committing to vibe coding. Work becomes fluid, an uncountable that you sling around freely, like slopping shiny fish into wooden barrels at the docks. Most work gets done; some work gets lost. Fish fall out of the barrel. Some escape back to sea, or get stepped on. More fish will come. The focus is throughput: creation and correction at the speed of thought.
I see -- so where exactly is my focus supposed to sit?
As someone who sits comfortably in the "Stage 8" category that this article defines, my concern has never been throughput. It has always been about retaining a high degree of quality while organizing work so that, when context switching occurs, it moves me to near-orthogonal tasks that are easy to hold in mind, so I can give high-quality feedback before switching again.
For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C.
On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
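For concreteness, a minimal sketch of that kind of loop is below. The agent CLIs ("worker-agent", "reviewer-a", "reviewer-b") are placeholders for however you actually invoke Claude Code, Codex, and Gemini, not real command names:

```python
# Sketch of a single-worker, multi-reviewer gate loop. The command names below
# are placeholders, not real CLIs; swap in your actual agent invocations.
import subprocess

def run_agent(cmd: list[str], prompt: str) -> str:
    """Run an agent CLI non-interactively and return its text output."""
    result = subprocess.run(cmd + [prompt], capture_output=True, text=True)
    return result.stdout

def review_gate(task: str, max_rounds: int = 5) -> str:
    work, feedback = "", ""
    for _ in range(max_rounds):
        # One worker per project; feedback from the previous round is folded in.
        work = run_agent(["worker-agent"], f"Task: {task}\n\nReviewer feedback:\n{feedback}")
        # Independent reviewers, fresh context each round, different models.
        reviews = [
            run_agent(["reviewer-a"], f"Review this work. Say APPROVED if acceptable:\n{work}"),
            run_agent(["reviewer-b"], f"Review this work. Say APPROVED if acceptable:\n{work}"),
        ]
        if all("APPROVED" in r for r in reviews):
            return work
        feedback = "\n---\n".join(reviews)
    return work  # hand back to the human after max_rounds
```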
This works and actually gets shit done. I'm not convinced that 20 Claudes or massively parallel worktrees or whatever improve on quality, because, indeed, I always have to intervene at some point. The blocker for me is not throughput; it's me -- a human being -- my focus, and the random points of intervention which, by definition, occur stochastically (because agents).
Finally:
> Opus 4.5 can handle any reasonably sized task, so your job is to make tasks for it. That’s it.
This is laughably untrue for anyone who has used Opus 4.5 on non-trivial tasks. Claude Code constantly gives up early, corrupts itself with self-bias; the list goes on. It's getting better, but it's not that good.
a response like this is confusing to me. what you are saying makes sense, but seems irrelevant. something like gas town is clearly not attempting to be a production grade tool. it's an opinionated glimpse into the future. i think the aesthetic was fitting and intentional.
this is the equivalent of some crazy inventor in the 19th century strapping a steam engine onto a unicycle and telling you that someday you'll be able to go 100 mph on a bike. He was right in the end, but no one is actually going to build something usable with current technology.
Opus 4.5 isn't there. But will there be a model in 3-5 years that's smart enough, fast enough, and cheap enough for a refined vision of this to be possible? I'm going to bet on yes to that question.
I think this read is generous:
> something like gas town is clearly not attempting to be a production grade tool.
Compare to the first two sentences:
> Gas Town is a new take on the IDE for 2026. Gas Town helps you with the tedium of running lots of Claude Code instances. Stuff gets lost, it’s hard to track who’s doing what, etc. Gas Town helps with all that yak shaving, and lets you focus on what your Claude Codes are working on.
Compared to your read, my read is confused: is it or is it not intended to be a useful tool (we can debate "production" quality; here I'm just thinking of something I'd actually use meaningfully, like Claude Code)?
I think the author wants us to take this post seriously, so I'm taking it seriously, and my critique in the original post was a serious reaction.
The blog post says, many times, not to use Gastown. It makes fun of the tool's inconsistent branding and describes a lot of jankiness.
This tool is dangerous, largely untested, and yet may be of interest if you are already doing similar things in production.
in 3-5 years, sure, just like we are all currently using crypto to pay for groceries and smart contracts for all legal matters.
... no one ever used crypto to buy things. most engineers are currently already using AI. such a dumb comparison that really just doesn't pass the sniff test.
Meanwhile, here I am at stage 0. I work on several projects where we are contractually obliged not to use any AI tools, even self-hosted ones. And AFAIK there's now a growing niche of mostly government projects with strict no-AI policies.
I'm super interested to hear more on anything you can share about your projects, or the niche of gov projects you're aware of - I've been doing some work with gov and haven't seen this requirement yet, so want to be prepared if it does come up.
(contact details in profile if you prefer)
I’m luckily in a situation where I can afford to explore this stuff without the concerns that come from using it within an organization (and those concerns are 100% valid and haven’t been solved yet, especially not by this blog post).
> For instance, I know Project A -- these are the concerns of Project A. I know Project B -- these are the concerns of Project B. I have the insight to design these projects so they compose, so I don't have to keep track of a hundred parallel issues in a mono Project C. On each of those projects, run a single agent -- with review gates for 2-3 independent agents (fresh context, different models! Codex and Gemini). Use a loop, let the agents go back and forth.
Can you talk more about the structure of your workflow and how you evolved it to be that?
I've tried most of the agentic "let it rip" tools. I quickly realized that GPT-5 was significantly better at reasoning and more exhaustive than Claude Code (Opus, RL-finetuned for Claude Code).
"What if Opus wrote the code, and GPT 5~ reviewed it?" I started evaluating this question, and started to get higher quality results and better control of complexity.
I could also trust this process to a greater degree than my previous process of trying to drive Opus, looking at the code myself, trying to drive Opus again, etc. Codex was catching bugs I would not have caught in the same amount of time, including bugs in hard math -- so I developed a great degree of trust in its reasoning capabilities.
I've codified this workflow into a plugin which I've started developing recently: https://github.com/evil-mind-evil-sword/idle
It's a Claude Code plugin -- it combines "don't let Claude stop until a condition is met" (a Stop hook) with a few CLI tools to induce (what the article calls) review gates: Claude will work indefinitely until the reviewer is satisfied.
In this case, the reviewer is a fresh Opus subagent which can invoke and discuss with Codex and Gemini.
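To make the mechanism concrete, here's a rough sketch of what a Stop-hook review gate can look like. It is not the idle plugin's actual code, and `./review.sh` is a stand-in for whatever reviewer (fresh Opus subagent, Codex, Gemini) you wire in:

```python
#!/usr/bin/env python3
# Rough sketch of a Stop-hook review gate (not the idle plugin's actual code).
# Claude Code passes hook input to the Stop hook as JSON on stdin; printing
# {"decision": "block", "reason": ...} tells Claude to keep working, with the
# reason fed back to it as the next instruction.
import json
import subprocess
import sys

hook_input = json.load(sys.stdin)  # session_id, transcript_path, etc.

# Placeholder reviewer: any command that exits 0 when satisfied and prints
# its objections otherwise (e.g. a script that calls Codex / Gemini CLIs).
review = subprocess.run(["./review.sh"], capture_output=True, text=True)

if review.returncode != 0:
    # Block the stop: Claude keeps working on the reviewer's objections.
    # In practice you'd also want a max-rounds guard to avoid looping forever.
    print(json.dumps({
        "decision": "block",
        "reason": "Reviewer is not satisfied yet:\n" + review.stdout + review.stderr,
    }))
sys.exit(0)
```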
One perspective I have that relates to this article is that the thing one wants to optimize is minimizing the error per unit of work. If you have a dynamic-programming-style orchestration pattern for agents, you want the thing that solves the smallest unit of work (a task) to have as low an error rate as possible, or else I suspect the errors compound quickly in these stochastic systems.
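A back-of-the-envelope way to see the compounding (treating tasks as independent, which is a simplification): if each task succeeds with probability $p$ and a pipeline chains $n$ of them, the unchecked success probability is roughly

$$P(\text{pipeline succeeds}) \approx p^{n},$$

so $p = 0.95$ over $n = 20$ tasks gives about $0.36$, while pushing per-task reliability to $p = 0.99$ with a review gate lifts it to about $0.82$.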
I'm trying this stuff for fairly advanced work (in a PhD), so I'm dogfooding ideas (like the ones presented in this article) in complex settings. I think there is still a lot of room to learn here.
I'm sure we're just working with the same tools and thinking through the same ideas. Just curious if you've seen my newsletter/channel @enterprisevibecode: https://www.enterprisevibecode.com/p/let-it-rip
It's cool to see others thinking the same thing!