Comment by r_lee

11 hours ago

if we are at 10x with AI and near AGI or ASI, then how is it possible that these products (Codex, Claude Code CLI) are still such garbage?

shouldn't this "agentic AI revolution" have long solved this already?

no way they're over there saying "we are on it plz wait" or that "it's too much effort"?

43 comments


igleria 10 hours ago

This is the biggest elephant in the room I have seen in my decade+ career. At the same time, look how bad Apple is in software compared to its hardware... It's not an AI only problem, it's almost like software in general gets a free pass on being very unsafe or low quality because no one wants to face the same "profit reducing red tape" that civil engineers or similar face.

CharlieDigital 10 hours ago
Anthropic were the progenitors of the Model Context Protocol. Claude Code does not fully implement the client end of the protocol. A protocol; a literal pre-defined spec that an agent should be able to one-shot. Neither does Codex. Codex does not implement MCP Prompts.
(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).
The fact that neither platform can implement a protocol given what are functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.
- deathbob 7 hours ago
  
  It still boggles my mind that Anthropic would invent the MCP protocol but not fully implement it.
  Especially when fully implementing it (prompts, resources, tools) is easily done in harnesses that don’t ship with MCP but allow good extension / modification like Pi.
  Claude not being able to see its own usage or self invoke slash commands is also very frustrating.
  
thewebguyd 6 hours ago
> same "profit reducing red tape" that civil engineers or similar face.
I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.
A good start would be to stop allowing companies to disclaim all warranties of fitness for a particular purpose in their EULAs. The joke of Microsoft Copilot applies here, where they have a big disclaimer that "Copilot is for entertainment purposes only" while the advertising says otherwise. Not even the Chrome EULA will agree that it's fit for purpose as a web browser. The clause is a get-out-of-jail-free card that shifts all liability and risk to the end user.
- datsci_est_2015 4 hours ago
  
  > I don't think we should ever head toward licensing/a credential body for software development, but I do think now is a good time to have discussions around liability for defective products.
  Liability is how a credential body would organically grow. It already exists in the security, compliance, and enterprise parts of the software world.
  
forshaper 6 hours ago

How much of all this is due to hardware improving, and software bloating enough to fill the capacity?

thewebguyd 6 hours ago

> shouldn't this "agentic AI revolution" have long solved this already?

Daily reminder that Anthropic took over a year to fix the Claude Code terminal flickering issue despite proclaiming all over the internet that software development is a "solved problem."

Apple forked over $250 million in a class action over false advertising for Apple Intelligence. When do we start seeing the same for the misleading and outright false claims coming out of the frontier labs about the model capabilities? At this point the marketing is doing more harm than the technology itself because it's warping the perceptions of those at the top who make decisions. The only reason tokenmaxxing was ever a thing was that marketing misled execs and technology decisions were made based on vibes instead of evidence.

mannanj 12 minutes ago

Why is it not a thing that people track the lies of people as they are public, and tie them to their reputation over time for anyone to find?
mannanj 3 hours ago

As long as a majority of the people of the living class are gullible, naive, and sick, with behavior entrained by the institutions and media they are made to consume, they stop seeing the misleading and false claims. Or at least they see them only briefly, long enough to complain in an ineffective way, then go on to consume the next big lie or slop. That lasts until something happens that finally channels the accumulated rage into a cause they feel makes things right (assuming they have not already died and the next generation has been groomed to fall for the rich man's trap), while those whose families and next generation are to continue the extraction and trickery hide behind an anonymous personality or system.

jeffybefffy519 9 hours ago

Because vibe coding is a toy… that's the secret.

You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.

Nullsession 30 minutes ago

[flagged]

hombre_fatal 9 hours ago

Like anything, you have to decide between polish vs switch to any other task in the queue. If you choose too much from the latter, then polish suffers, yet that's a human thing.

Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.

It's kind of like how HNers would claim to your face that you can't actually build anything with JavaScript and Node.js (JS just sucks too much), then they'd list off a few footguns that were supposed to demonstrate why. In other words, champing at the bit for JS to fail led people to catastrophize issues that were pretty mediocre.

coldtea 8 hours ago
>Like anything, you have to decide between polish vs switch to any other task in the queue
Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
>Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.
Why shouldn't it? They're not the ones making the extraordinary claims.
- hombre_fatal 6 hours ago
  
  > Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
  Because your code is still marching somewhere in tokens per second. You have to decide where they are allocated: polish or the next thing. Humans still are the ones prompting LLMs and deciding what is done.
  > isn't that what they claim? Why shouldn't it? They're not the ones making the extraordinary claims.
  Even if I grant that someone else makes excessive claims, why would that let you off the hook to stay grounded?
  Though I don't grant it. Maybe if Anthropic claimed that Opus makes all decisions at the company and builds all software without humans doing all the prompting, the critics would make more sense.
  Until then, it looks more like a double standard: if software built with AI has any issues, then see, AI is shit and the humans who invoked it had no role in it. e.g. it could be the case that Anthropic's Claude Code engineers just aren't doing as much polish as they should.
  Better answer: Someone asked why might it be the case that AI-written software has issues, and it has a real answer. Marketing claims are a different conversation.
  
geodel 6 hours ago
> yet that's a human thing.
is this a joke?
Here we are talking about trillion-dollar AI companies who claim AI can fix decade-old bugs and create new compilers, OSs and what not. Are parallel agents working autonomously to fix issues as well as create new features not allowed at these companies?
- hombre_fatal 6 hours ago
  
  Humans still decide what LLMs do in a code base, full stop.
  

rjh29 1 hour ago

Gemini is also buggy as heck and has been buggy for years. For a company of Google's size with "all the power of AI" it's seriously embarrassing.

ValentineC 8 hours ago

The "AI revolution" feels like it's creating a bunch of ultra-smart AI models that are scarily good at cracking most human-created security (Mythos), but also happen to be careless snobs that just leave litter and mess in their wake.

mnicky 8 hours ago

If the code churn is high, the investment in refactoring etc. is less beneficial than may be obvious. I don't remember the details, but I heard in some podcast that the code base of Claude Code changes so fast that any piece of code won't be there for long.

coldtea 8 hours ago
In other words it's an ever moving vibe fest, with random bugs and misbehaviors each time they roll the dice...
- tartoran 7 hours ago
  
  Yes, it’s very characteristic of the gen-AI era.
tartoran 7 hours ago

If they respected their users they’d at least pin some versions that are more stable.

layer8 4 hours ago

The issue is that apparently AI coding means that developers stop caring about software quality. Which puts the whole purpose into question.

fg137 10 hours ago

You are asking too many good questions.

user43928 10 hours ago

The products generally work just fine on my MacBook.

I have not encountered major issues in the Claude Code CLI, the Codex Desktop app, or the Claude Desktop app.

They generally get the job done. I don't measure disk writes or analyze the GPU usage.

Zababa 9 hours ago

A simple explanation is that they are "good enough" for most people and they have better things to do. Even if tomorrow I was 100 times as productive, I still wouldn't have time to do literally everything and I would have to prioritize.

coldtea 8 hours ago
You might not.
But the Claude Code team has ONE job.
And they have full access to a platform that they advertise as "humanity-threat" level good, and claim that it can automate everything code related...
- Zababa 7 hours ago
  
  I think they have more than one job, they have to balance new features with improving the software itself. And Anthropic has to balance investing resources into Claude Code vs on infra or other things.
  Not that I'm happy with the current state of things; in fact I'm quite sad that improvements in capacity to do things don't translate into better quality.
  

reducesuffering 3 hours ago

Claude Code has been out for just 1 year and has millions of users already, being a major contributor to roughly $40 billion in revenue. By any measure it is one of the fastest-developed products, already driving the most important workflow for millions of people.

"Why isn't literally everything about a product that came out a year ago with an extremely fast scaling userbase solved?" is what I hear.

The goalposts will keep moving until AGI is undeniable.