Comment by simonw

20 days ago

This is the stealth team I hinted at in a comment on here last week about the "Dark Factory" pattern of AI-assisted software engineering: https://simonwillison.net/2026/Feb/7/software-factory/

This one is worth paying attention to. They're the most ambitious team I've seen exploring the limits of what you can do with this stuff. It's eye-opening.

This right here is where I feel most concerned:

> If you haven’t spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement

Seems to me like if this is true, I'm screwed whether I want to "embrace" the "AI revolution" or not. No way my manager's going to approve me to blow $1000 a day on tokens, they budgeted $40,000 for our team to explore AI for the entire year.

And from a personal perspective I'm screwed regardless, because I don't have $1000 a month in the budget to blow on tokens, thanks to pesky things that also demand financial resources, like a mortgage and food.

At this point it seems like damned if I do, damned if I don't. Feels bad man.

  • My friend works at Shopify and they are 100% all in on AI coding. They let devs spend as much as they want on whatever tool they want. If someone ends up spending a lot of money, they ask them what is going well and please share with others. If you’re not spending they have a different talk with you.

    As for me, we get Cursor seats at work, and at home I have a GPU, a cheap Chinese coding plan, and a dream.

    • > If someone ends up spending a lot of money, they ask them what is going well and please share with others. If you’re not spending they have a different talk with you.

      Make a "systemctl start tokenspender.service" and share it with the team?

    • I get $200 a month, I do wish I could get $1000 and stop worrying about trying the latest AI tools.

    • > I have a GPU, a cheap Chinese coding plan, and a dream

      Right in the feels

  • Yeah, that's one part of this that didn't sit right with me.

    I don't think you need to spend anything like that amount of money to get the majority of the value they're describing here.

    Edit: added a new section to my blog post about this: https://simonwillison.net/2026/Feb/7/software-factory/#wait-...

    • This is the part that feels right to me because agents are idiots.

      I built a tool that writes (non shit) reports from unstructured data to be used internally by analysts at a trading firm.

      It cost between $500 and $5,000 per day per seat to run.

      It could have cost a lot more but latency matters in market reports in a way it doesn't for software. I imagine they are burning $1000 per day per seat because they can't afford more.


    • I wonder if this is just a byproduct of factories being very early and very inefficient. Yegge and Huntley both acknowledge that their experiments in autonomous factories are extremely expensive and wasteful!

      I would expect cost to come down over time, using approaches pioneered in the field of manufacturing.

  • > No way my manager's going to approve me to blow $1000 a day on tokens, they budgeted $40,000 for our team to explore AI for the entire year.

    To be fair, I’ll bet many of those embracing concerning advice like that have never worked at the same company for a full year.

  • Same. Feels like it goes against the entire “hacker” ethos that brought me here in the first place. That sentence made me actually feel physically sick on initial read as well. Every day now feels like a day where I have exponentially less & less interest in tech. If all of this AI that’s burning the planet is so incredible, where are the real world tangible improvements? I look around right now and everything in tech, software, internet, etc. has never looked so similar to a dumpster fire of trash.

    • Yes, exactly this. My biggest issue is how uncurious the approach seems. Setting a "no-look" policy seems cutting edge for two seconds, but prevents any actual learning about how and why things fail when you have all the details. They are just hamstringing their learning.

      We still need to specify precisely what we want to have built. All we know from this post is what they aren't doing and that they are pissing money away on LLMs. I want to know how they maintain control and specificity, share control and state between employees, handle conflicts and errors, manage design and architectural choices, etc.

      All of this seems fun when hacking out a demo but how in the world does this make sense when there are any outside influences or requirements or context that needs to be considered or workflows that need to be integrated or scaling that needs to occur in a certain way or any of the number of actual concerns that software has when it isn't built in a bubble?


    • The biggest rewards for human developers came from building addictive eyeball-getters for adverts so I don’t see how we can expect a very high bar for the results of their replacement AI factories. Real-world and tangible just seem completely out of the picture.

    • Maybe think about it like this: a dev costs ~$1k per day. If the tool gives you 3x the output, then 2x in extra cost is fine.

      (The current cost of 1k is "real" and ultimately, even if you tinker on your own, you're paying this in opportunity cost)

      ((caveats, etc))
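      A back-of-envelope version of that arithmetic (all numbers are illustrative, as in the comment above):

      ```python
      # Illustrative break-even math: a tool that triples a dev's output while
      # adding 2x the dev's daily cost in tokens is a wash on output-per-dollar.
      dev_cost = 1000          # ~$1k/day fully loaded human cost (assumption)
      token_cost = 2000        # 2x the human cost spent on tokens
      multiplier = 3           # tool makes the engineer 3x as productive

      baseline = 1 / dev_cost                          # dev-days of output per $
      with_tool = multiplier / (dev_cost + token_cost)  # with the tool

      print(baseline == with_tool)  # True: 3x output at 3x total cost breaks even
      ```

      Anything better than 3x output for that $2k/day of tokens is then a net win under these assumptions.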

  • I read that as combined, up to this point in time. You have 20 engineers? If you haven't spent at least $20k up to this point, you've not explored or experienced enough of the ins and outs to know how best to optimize the use of these tools.

    I didn't read that as you need to be spending $1k/day per engineer. That is an insane number.

    EDIT: re-reading... it's ambiguous to me. But perhaps they mean per day, every day. This will only hasten the elimination of human developers, which I presume is the point.

  • I think corporate incentives vs personal incentives are slightly different here. As a company trying to experiment in this moment, you should be betting on token cost not being the bottleneck. If the tooling proves valuable, $1k/day per engineer is actually pretty cheap.

    At home on my personal setup, I haven't even had to move past the cheapest codex/claude code subscription because it fulfills my needs ¯\_(ツ)_/¯. You can also get a lot of mileage out of the higher tiers of these subscriptions before you need to start paying the APIs directly.

Until there's something verifiable it's just talk. Talk was cheap. Now talk has become an order of magnitude cheaper since ChatGPT.

Yet they have produced almost nothing. You can give $10k to a couple of college grads and get a better product.

  • This idea was the explicit business proposition of my college scholarship program. It included "summer internships" which turned out to be selling undergrads for $30 an hour while paying us $9 an hour.

    Unfortunately, it doesn't matter how many hours you hire someone for when they don't actually know what they are doing (even if they sometimes teach themselves new things), because writing a few hundred lines of TI-BASIC is not a foundation you can build a "Turbotax for end of life care planning" service out of.

    Unfortunately, I was charismatic and good at meetings and sounding smart and confident, so they loved my work even though I never delivered a single thing other than the barest of mockups.

    I can't help but see terrible similarities to AI slop.

It is tempting to be stealthy when you start seeing discontinuous capabilities go from totally random to somewhat predictable. But most of the key stuff is on GitHub.

The moats here are around mechanism design and values (to the extent they differ): the frontier labs are doomed in this world, the commons locked up behind paywalls gets hyper mirrored, value accrues in very different places, and it's not a nice orderly exponent from a sci-fi novel. It's nothing like what the talking heads at Davos say, Anthropic aren't in the top five groups I know in terms of being good at it, it'll get written off as fringe until one day it happens in like a day. So why be secretive?

You get on the ladder by throwing out Python and JSON and learning lean4, you tie property tests to lean theorems via FFI when you have to, you start building out rfl to pretty printers of proven AST properties.
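A minimal Lean 4 sketch of what a "proven AST property" can look like (everything here, the `Expr` type, `eval`, and `fold`, is invented for illustration; the point is that the invariant a property test would only sample gets proven for all inputs, closing with `rfl` in each case):

```lean
-- Hypothetical toy AST with a proven optimization property.
inductive Expr where
  | lit : Nat → Expr
  | add : Expr → Expr → Expr

def eval : Expr → Nat
  | .lit n   => n
  | .add a b => eval a + eval b

-- A trivial "optimization": fold the addition of two literals.
def fold : Expr → Expr
  | .add (.lit a) (.lit b) => .lit (a + b)
  | e => e

-- What a QuickCheck-style test would sample randomly is here proven
-- for every possible expression.
theorem fold_preserves_eval (e : Expr) : eval (fold e) = eval e := by
  cases e with
  | lit n => rfl
  | add a b =>
    cases a with
    | lit x =>
      cases b with
      | lit y => rfl
      | add _ _ => rfl
    | add _ _ => rfl
```

The comment's pipeline, as I read it, would then wire generated property tests to theorems like `fold_preserves_eval` so that the sampled invariant and the proven one are the same statement.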

And yeah, the droids run out ahead in little firecracker VMs reading from an effect/coeffect attestation graph and writing back to it. The result is saved, useful results are indexed. Human review is about big picture stuff, human coding is about airtight correctness (and fixing it when it breaks despite your "proof" that had a bug in the axioms).

Programming jobs are impacted but not as much as people think: droids do what David Graeber called bullshit jobs for the most part and then they're savants (not polymath geniuses) at a few things: reverse engineering and infosec they'll just run you over, they're fucking going in CIC.

This is about formal methods just as much as AI.

Can you make an ethical declaration here, stating whether or not you are being compensated by them?

Their page looks to me like a lot of invented jargon and pure narrative. Every technique is just a renamed existing concept. Digital Twin Universe is mocks, Gene Transfusion is reading reference code, Semport is transpilation. The site has zero benchmarks, zero defect rates, zero cost comparisons, zero production outcomes. The only metric offered is "spend more money".

Anyone working honestly in this space knows 90% of agent projects are failing.

The main page of HN now has three to four posts daily with no substance, just Agentic AI marketing dressed as engineering insight.

With Google, Microsoft, and others spending $600 billion over the next year on AI, panicking to get a return on that capex, and now paying influencers over $600K [1] to manufacture AI enthusiasm to justify this infrastructure spend, I won't engage with any AI thought leadership that lacks a clear disclosure of financial interests and reproducible claims backed by actual data.

Show me a real production feature built entirely by agents with full traces, defect rates, and honest failure accounting. Or stop inventing vocabulary and posting vibes charts.

[1] - https://news.ycombinator.com/item?id=46925821

  • > Every technique is just a renamed existing concept. Digital Twin Universe is mocks, Gene Transfusion is reading reference code, Semport is transpilation. The site has zero benchmarks, zero defect rates, zero cost comparisons, zero production outcomes. The only metric offered is "spend more money".

    Repeating for emphasis, because this is the VERY obvious question anyone with a shred of curiosity would be asking not just about this submission but about what is CONSTANTLY on the frontpage these days.

    There could be a very simple five-question questionnaire that could eliminate 90+% of AI coding posts before they start:

    - Is this a small wrapper around just querying an existing LLM?

    - Does a brief summary of this searched with "site:github" already return dozens or hundreds of results?

    - Is this a classic scam (pump & dump, etc.) redone using "AI"?

    - Is this needless churn between already high-level abstractions of technology (dashboard of dashboards, YAML to JSON, Python to JavaScript, automation of an automation framework)?