Comment by amangsingh

14 days ago

A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming: frustration regexes, context sanitizers, tool-retry loops, and state rollbacks, just to stop the agent from drifting or silently breaking things.
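To make concrete the kind of defensive scaffolding I mean, here is a hypothetical sketch in Python (all names are invented, this is not from the actual codebase): a tool call wrapped in a retry loop with external validation, where the inputs are snapshotted per attempt so a failed call can't corrupt them.

```python
# Hypothetical defensive wrapper: retry a tool call until its output passes
# an external validator, treating crashes and bad output the same way.
# run_tool and validate are illustrative callables, not any real harness API.
def call_with_retries(run_tool, validate, args, max_attempts=3):
    last_error = None
    for attempt in range(max_attempts):
        snapshot = dict(args)  # cheap per-attempt "rollback" of the inputs
        try:
            result = run_tool(**snapshot)
            if validate(result):
                return result
            last_error = f"validation failed on attempt {attempt + 1}"
        except Exception as e:  # tool crashed: treat it like drift and retry
            last_error = str(e)
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")
```

Multiply this pattern by every tool and every failure mode and half a million lines stops looking surprising.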

The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.

I find it really strange that there is so much negative commentary on the _code_, but so little commentary on the core architecture.

My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).

Overall, when I see this I think they are focused on the right issues, and I think their tool list looks pretty simple/elegant/general. I picture the server team constantly thinking - we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them. That is where the secret sauce lives.
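A hedged sketch of what that client/server split could look like (tool names and shapes here are invented, not Claude Code's actual API): the client ships a tiny, fixed tool registry, and all the cleverness about when and how to invoke those tools lives server-side.

```python
# Illustrative only: a minimal, fixed client-side tool registry. The server
# can change how it composes these tools without the client shipping an update.
from pathlib import Path

TOOLS = {
    "read_file": lambda path: Path(path).read_text(),
    "write_file": lambda path, text: Path(path).write_text(text),
    "output_rich_text": lambda text: text,  # stand-in for a rendering tool
}

def dispatch(name, **kwargs):
    # The server names a tool and its arguments; the client only executes.
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

The point is that the registry is boring on purpose: the "secret sauce" is the server-side policy for calling it, which never leaks with the client.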

  • The tools were mostly already known, no? (I wish they had a "present" tool which allowed the model to copy-paste from files/context/etc., showing the user some content without forcing it through the model)

    • Yeah, in fact one thing Claude is freaking great at is decompilation.

      If you can download it client-side, you can likely place a copy in a folder and ask Claude:

      ‘Decompile the app in this folder to answer further questions on how it works. As an example first question, explain what happens when a user does X.’

      I do this with obscure video games where I want a guide to how some mechanics work, e.g. https://pastes.io/jagged-all-69136 as a result of a session.

      It can ruin some games, but despite the possibility of hallucinations I find it waaay more reliable than random internet answers.

      Works for apps too. Obfuscation doesn’t seem to stop it.

  • > but so little commentary on the core architecture.

    The core architecture isn't interesting? It's an LLM TUI; there's not much there to discuss architecturally. The code itself is the actual fascinating train wreck to look at.

  • Why are "tools" for local IO interesting and not just the only way to do it? I can't really imagine a server architecture that gets to read your local files and present them without a fat client of some kind.

    What is the naive implementation you're comparing against? Ssh access to the client machine?

    • It's early days and we don't fully understand LLM behavior to the extent that we can assume questions like this about agent design are resolved. For instance, is an agent smarter with Claude Code's tools or `exec_command` like Codex? And does that remain true for each subsequent model release?

It’s not surprising. There has been quite a bit of industrial research into how to get mere apes to behave deterministically within huge software control systems, and they are an unruly bunch, I assure you.

It's hard to tell how much it says about difficulty of harnessing vs how much it says about difficulty of maintaining a clean and not bloated codebase when coding with AI.

  • Why not both? AI writes bloated spaghetti by default. The control plane needs to be human-written and rigid, at least until the state machine is solid enough to dogfood itself. Then you can safely let the AI enhance the harness from within the sandbox.

We propped the entire economy up on it. Just look at the S&P top 10. Actually, even the top 50 holdings.

If it doesn't deliver on the promise we have bigger problems than "oh no the code is insecure". We went from "I think this will work" to "this has to work because if it doesn't we have one of those 'you owe the bank a billion dollars' situations"

  • It's weird to look at the world like this. If they deliver, doesn't that invalidate thousands of other business plans? What about paying for that?

    If they fail, doesn't software and the giant companies that make it go back to owning the world?

    • “if they deliver”

      As I’m reading this, I’m thinking about how, in 1980, it was imagined that everyone needed to learn how to program in BASIC or COBOL, and that the way computers would become ubiquitous would be that everybody would be writing programs for them. That turned out to be a quaint and optimistic idea.

      It seems like the pitch today is that every company that has a software-like need will be able to use AI to manifest that software into existence, or more generally, to manifest some kind of custom solution into existence. I don’t buy it. Coding the software has never been the true bottleneck, anyone who’s done a hackathon project knows that part can be done quickly. It’s the specifying and the maintenance that is the hard part.

      To me, the only way this will actually bear the fruit it’s promising is if they can deliver essentially AGI in a box. A company will pay to rent some units of compute that they can speak to like a person and describe the needs, and it will do anything - solve any problem - a remote worker could do. IF this is delivered, indeed it does invalidate virtually all business models overnight, as whoever hits AGI will price this rental X%[1] below what it would cost to hire humans for similar work, breaking capitalism entirely.

      [1] X = 80% below on day 1 as they’ll be so flush with VC cash, and they’d plan to raise the price later. Of course, society will collapse before then because of said breaking of capitalism itself.

The time is ripe for deterministic AI; incidentally, this was also released today: https://itsid.cloud/ - presumably will be useful for anyone who wants to quickly recreate an open source Python package or other copyrighted work to change its license.

  • Can you please explain the use here? I tried the demo, and cat, cp, echo, etc... seem to do the exact same thing without the cost.

    Their demo even says:

       `Paste any code or text below. Our model will produce an AI-generated, byte-for-byte identical output.`

    Unless this is a parody site can you explain what I am missing here?

    Token echoing isn't even at the lexeme/pattern level, and not even close to WSD, Ogden's lemma, symbol grounding, etc.

    The intentionally 'probably approximately correct' nature of statistical learning fundamentally limits reproducibility for PAC/statistical methods like transformers.

    Inherent CFG ambiguity == post correspondence problem == halting problem == open-domain frame problem == system identification problem == symbol-grounding problem == Entscheidungsproblem

    The only way to get around that is to construct a grammar that isn't. It will never exist for CFGs, programs, types, etc... with arbitrary input.

    I just don't see why placing a `14-billion parameter identity transformer` that basically echoes tokens is a step forward on what makes the problem hard.

    Please help me understand.

Kinda depends how much of it is vibe coded. It could easily be 5x larger than it needs to be just because the LLM felt like it if they've not been careful.

  • Claude folks proudly claim to have Claude effectively writing itself. The CEO claims it will read an issue and automatically write a fix, tests, commit and submit a PR for it.

  • Bingo. And them 'being careful' is exactly what bloats it to 500k lines. It's a ton of on-the-fly prompt engineering, context sanitizers, and probabilistic guardrails just to keep the vibes in check.

> Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos

Can you expand on this?

My experience is they require excessive steering but do not “break”

  • I think the "breakage" is in terms of conciseness and compactness, not outright brokenness.

    Like that drunk uncle that takes half an hour and 20 000 words to tell you a 500 word story.

Indeed. In some ways, this is just kind of an extrapolation of the overall trend toward extreme bloat that we’ve seen in the past 15 years, just accelerated because LLMs code a lot faster. I’m pretty accustomed to dealing with Web application code bases that are 6-10 years old, where the hacks have piled up on top of other hacks, piled on top of early, tough-to-reverse bad decisions and assumptions, and nobody has had time to go back and do major refactors. This just seems like more of the same, except now you can create a 10 year-old hack-filled code base in three hours.

  • The terrifying thing is that LLMs turn "technical debt" into "synthetic debt" that accumulates in real-time.

    When we use an agent that lacks a native way to consolidate its own context, we essentially force it to generate these 10-year-old hack-filled codebases by design. We’re over-engineering the "container" (the CLI logic) to babysit a "leaky" context.

    If the architecture doesn't start treating long-term memory as a first-class citizen, we’re just going to see more of these 500k-line "safety nets" masking the underlying fragility of the agents.

There seem to be multiple mechanisms compensating for imperfect, lossy memory. "Dreaming" is another band-aid on inability to reliably store memory without loss of precision. How lossy is this pruning process?

It's one thing to give Claude a narrow task with clear parameters, and another to watch errors or incorrect assumptions snowball as you have a more complex conversation or open-ended task.

> they break at large enterprise repos.

I don't know where you get this. You should ask the folks at Meta. They are probably the biggest and happiest users of CC.

You need state-oriented programming to handle that. I know, because I made one. The keyword is "unpredictability". Embrace nondeterminism.

Exactly right. Files on disk as the shared state, not the conversation window. Each step reads current state, does its job, writes output. Next step starts fresh from those files. No accumulated context means no drift, and the LLM can hallucinate in its reasoning all it wants as long as the output passes a check before anything advances.
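A minimal sketch of the files-as-state pattern described above (function and file names are hypothetical): each step loads the state file, does its work, and nothing is persisted unless the output passes an external check.

```python
# Hypothetical files-on-disk pipeline step: disk is the only shared state,
# and state only advances when the output passes an external check.
import json
from pathlib import Path

def run_step(state_dir, step_name, work, check):
    state_file = Path(state_dir) / "state.json"
    # Read current state fresh from disk; no accumulated conversation context.
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    output = work(state)  # the step itself (possibly LLM-backed)
    if not check(output):
        raise ValueError(f"{step_name}: output failed validation, state unchanged")
    state[step_name] = output
    state_file.write_text(json.dumps(state))  # persist only validated output
    return output
```

The model can hallucinate freely inside `work`; only checked output ever becomes state for the next step.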

What do you mean by "actually governing the agents at the system level", and how is it different from "herding cats"?

  • Herding cats is treating the LLM's context window as your state machine. You're constantly prompt-engineering it to remember the rules, hoping it doesn't hallucinate or silently drop constraints over a long session.

    System-level governance means the LLM is completely stripped of orchestration rights. It becomes a stateless, untrusted function. The state lives in a rigid, external database (like SQLite). The database dictates the workflow, hands the LLM a highly constrained task, and runs external validation on the output before the state is ever allowed to advance. The LLM cannot unilaterally decide a task is done.

    I got so frustrated with the former while working on a complex project that I paused it to build a CLI to enforce the latter. Planning to drop a Show HN for it later today, actually.
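A minimal sketch of that kind of DB-governed loop (the table schema and function names are invented, just to illustrate the shape): the database decides what runs and whether it is done; the LLM is a stateless function whose output cannot advance state without passing validation.

```python
# Hypothetical SQLite-governed task loop: the DB dictates the workflow,
# the LLM is an untrusted stateless function, and state only advances
# after external validation. Schema and names are illustrative.
import sqlite3

def advance(conn, task_id, llm_fn, validate):
    row = conn.execute(
        "SELECT prompt FROM tasks WHERE id = ? AND status = 'pending'",
        (task_id,)).fetchone()
    if row is None:
        return False  # nothing pending: the state machine, not the model, says so
    output = llm_fn(row[0])  # stateless call; no orchestration rights
    if not validate(output):
        return False  # state never advances on unvalidated output
    conn.execute("UPDATE tasks SET status = 'done', output = ? WHERE id = ?",
                 (output, task_id))
    conn.commit()
    return True
```

Note the model cannot mark its own task done; only the validator gates the `UPDATE`.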

Thousands of developers are using Claude Code successfully (I think?).

So what specifically is the gripe? If it works, it works right?

So this is more like an art than science - and Claude Code happens to be the best at this messy art (imo).

> A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare.

Considering what the entire system ends up being capable of, 500k lines is about 0.001% of what I would have expected something like that to require 10 years ago.

You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

It boggles the mind, really.

  • > You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

    You really need to compare it to the model weights though. That’s the “code”.

    • >You really need to compare it to the model weights though

      Then you'd need to compare the education of any developer to the LOC of their IDE. That's the "code".

      So yea, the analogy doesn't make a whole lot of sense.

  • ... what are you even talking about? "The system that literally writes code" has a few hundred trillion parameters. How is this smaller than LibreOffice?

    I know xkcd 1053, but come on.

Brute-forcing pattern-matching at scale. These are brittle systems with enormous amounts of duct tape holding everything together. Workarounds on workarounds.

[flagged]

    If writing concise architectural analysis without the fluff makes me an AI, I'll take the compliment. But no - just a tired architect who has spent way too many hours staring at broken agent state loops haha.

    • This reply is quite literally AI as well, and so was your initial comment. It's so so obvious after spending enough time on Twitter and seeing the pattern used by all the AI reply bots. Absolutely insane that the HN crowd isn't able to see this.

>A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

Is that the case? I'm pretty sure Claude Code is one of the most massively successful pieces of software made in the last decade. I don't know how that proves your point. Will this codebase become unmanageable eventually? Maybe, but literally every agent harness out there is just copying their lead at this point.

  • Claude Code is a massively successful generator (I use it all the time), but it's not a governance layer.

    The fact that the industry is copying a 500k-line harness is the problem. We're automating security vulnerabilities at scale because people are trying to put the guardrails inside the probabilistic code instead of strictly above it.

    Standardizing on half a million lines of defensive spaghetti is a huge liability.

    • >Standardizing on half a million lines of defensive spaghetti is a huge liability.

      Again, maybe it will be. Or maybe the way we make software and what is considered good practice will completely change with this new technology. I'm betting on the latter at this point.