Comment by kstenerud

6 days ago

The polarization comes from the very disparate coding experiences and output quality that different people find when using these tools.

For example, I've had the opposite experience of yours, generating very high quality work using Claude (such as https://github.com/kstenerud/yoloai). Just in dealing with all the bugs and idiosyncrasies in the technologies I'm using, the agent has been a godsend in discovering and cataloguing them so that the implementation phase doesn't keep tripping over them: https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe...

And the agents keep getting better all the time. Even in the past month I've noticed a considerable jump in their ability to anticipate issues and correctly infer implications as we build out research, design, architecture and planning docs. By the time it comes to coding, it's mostly a mechanical process that can be passed off to Sonnet with a negligible defect rate.

Philip-J-Fry 6 days ago

I don't want to offend (it's AI coded anyway :)) but that does not scream "high quality" to me. The headline gif on that repo just paints a terrible picture. It can't draw a box correctly, there's random underscores all over the screen. The UI itself is just incredibly incoherent. I don't even know what I'm looking at.

Like, no it doesn't seem like very high quality work... It just seems like a vibe coded tool.

Edit: yes it's wrapping Claude. It's BREAKING the TUI. Not sure what people aren't getting here...

walthamstow 6 days ago
Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.
The problem with being such a naysayer is that you're entirely disconnected from what's going on. You haven't tried an agent like Claude Code and experienced it for yourself, so you don't recognise what it looks like when it's in front of you.
- SlinkyOnStairs 5 days ago
  
  There are two possibilities here:
  1) This tool breaks the Claude TUI. Exactly as described by the comment.
  2) The Claude TUI itself is broken. The comment is wrong, but assuming that the "billion dollar TUI product" is capable of basic rendering and that it's the wrapper that broke it is entirely reasonable.
  The fun here is that both of these pieces of software were made extensively using AI. No matter which of our options is the case here, the point stands. An AI-built product was shown, and it looks obviously ass.
  
- Philip-J-Fry 6 days ago
  
  I have tried Claude code. It doesn't look like that!
  I don't know what the project is. All I see is a TUI that looks completely broken.
  Go and use Claude Code right now. Does it look like that? Random underscores all over the page. No it doesn't.
  
- embedding-shape 5 days ago
  
  > Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.
  That's like blaming the company making hammers because you're unable to build a lasting house with the hammer. It really isn't up to Anthropic; it's all about how you use the tool you're holding.
  
- gcr 5 days ago
  
  They’re talking past each other. For some, “high quality” is a comment about implementation elegance. For others, “high quality” is about duct-taping crude implementations together to fashion a kickass user experience. To most, quality probably involves some convex combination of these.
- my-next-account 5 days ago
  
  I have used those tools, I don't think they're THAT good tbh :P
- godelski 5 days ago
  
  I use claude every single day at work. I've burned hundreds of dollars a week in tokens. But I still think you're being too defensive while attacking Philip.
  I'm sorry, but you need to look yourself in the mirror. You didn't like what they said so you jumped to the assumption that they must not have used CC (or any other agent). That if they had, they would have the same experience as you did/do. But this whole thread is exactly that conversation, that those experiences aren't shared. That this assumption is baseless. And you know what? That's okay. We're not robots. We're human. Each of us has our own unique world we live in. It's okay that people don't have the same experience as you. It's okay that their favorite color, food, activity, or whatever isn't the same as yours. I'm glad that we live in that kind of world. That's what makes things like culture. I don't want to live in a hive mind, and I don't think anyone else does either.
vdelpuerto 5 days ago

That is the same fight the 2D animators were having with 3D animation 30 years ago. The resolution is likely to be the same: the tool wins but the fundamentals stay, and the line between competent and incompetent practitioners moves but does not disappear.
godelski 5 days ago
> I don't want to offend (it's AI coded anyway :)) but that does not scream "high quality" to me.
Honestly, I think this is where the big divide is. People have massively different opinions on what "quality" is. Which is okay, but it feels like everyone is working under some assumption that quality is this very clear objective measure that we all agree on. Clearly we don't. We didn't before AI and well... if you can't tell that we don't with AI... you need to take a step back.
FWIW, I agree with Philip here. I don't think this screams "high quality" to me. I'm also not trying to take a shit on your project. Nothing screams "terrible" to me, but yeah, it does look a bit sloppy. There's no polish to it. It looks like someone that grades on "it works" and that's fine. But it also isn't everyone's cup of tea. Where the sloppiness comes in is like what Philip said. First thing I saw was the gif and well... I think Claude Code is sloppy. But this is also a great example of how and where LLMs visibly fail. Creating a box in text is pretty simple. There's tons of tools to do it. And the LLM 100% knows about characters like ⌜⌝⌞⌟⎜, it just doesn't use them and doesn't care. The code itself also looks very LLM generated.
It's fine and I don't think you have any reason to be ashamed of it, but I also wouldn't go around boasting that it is an example of high quality work either. And FWIW, I can't think of a single heavily LLM-assisted codebase where I don't have similar feelings. I've seen stuff with more polish, but yeah, they feel off.
> TUI
This is a space I feel weird in. I love the terminal. I love that there's a lot of new TUIs. But it also feels very weird because it is extremely clear that a lot of these new TUIs were written by people (or machines) that don't really have a lot of experience in the terminal itself. There's a real shared language among people like me who live in the CLI. There's a reason people like me can pick up a new tool and guess certain flags and certain ways to use them. It's because of a shared design language that we know of, and we end up writing that way because we know it reduces the cognitive load on our peers. But the LLMs? They don't have that shared experience.
I think this is true for a lot of stuff, not just TUIs or bash tools. Things just smell... off...
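
To make the "creating a box in text is pretty simple" point concrete, here is a minimal sketch. It assumes the standard Unicode box-drawing characters (┌ ─ ┐ │ └ ┘) rather than the corner-bracket characters listed in the comment, and `draw_box` is a hypothetical helper, not code from the linked repo or from Claude Code. It also assumes every character is one terminal cell wide, which does not hold for wide (e.g. CJK or emoji) characters.

```python
def draw_box(lines, pad=1):
    """Frame a list of strings with Unicode box-drawing characters.

    Assumption: each character occupies exactly one terminal cell.
    """
    # Inner width = longest line plus padding on both sides.
    width = max((len(line) for line in lines), default=0) + 2 * pad
    top = "┌" + "─" * width + "┐"
    bottom = "└" + "─" * width + "┘"
    body = [
        "│" + " " * pad + line.ljust(width - 2 * pad) + " " * pad + "│"
        for line in lines
    ]
    return "\n".join([top, *body, bottom])


print(draw_box(["hello", "hi"]))
```

Every row comes out the same length, so the right-hand border lines up as long as the one-cell-per-character assumption holds.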
kstenerud 6 days ago
You do realize that you're complaining about the Claude Code TUI, right?
That's not what this product is; merely a tool it uses.
- pprotas 6 days ago
  
  You claim "very high quality" but can't even get the basic UI working properly. You wrap tmux and a container in 2k lines of code and claim quality, I think the comment above was aimed at this claim.
  
- Philip-J-Fry 6 days ago
  
  So why has your tool completely broken the Claude Code UI then?
  Can't you see in the gif? It's completely broken. My Claude doesn't look like that. Neither does anyone else's.
  
wanderlust123 5 days ago

I think at this point there is no convincing people. Clearly there is value in these tools and it generates code when steered properly. Perhaps your struggles are down to a skill issue.

timr 6 days ago

While reading this thread, I literally just caught an agent putting in the following CSS selector in a rule:

> .row > div > div, .alert

This is fairly simple CSS, not multi-threaded systems development. A bar low enough that you could trip over it. I catch this kind of stuff all the time (literally every run), but only because I read every line. Most of it wouldn't be the end of the world for any particular task, but would eventually result in a complete mess.

I think the people doing the heaviest breathing around the elimination of programmers either aren't very good at programming, or they're not paying close attention. Or they're hyping their book.

kstenerud 5 days ago
I haven't done any CSS/HTML/JS level work with Claude yet. I've mainly been using it for systems level stuff.
LLMs have traditionally had problems with visual rendering (the good ol' pelican on the bicycle test). I wonder if this is more of the same?
- timr 5 days ago
  
  In this case, the visual display was fine -- I was instructing it to fix bad code from a previous round that happened to deliver the right results.
  Like I said, this is just an example that happens to be CSS. I see this stuff daily, if not hourly.
  
habinero 5 days ago

> I think the people doing the heaviest breathing around the elimination of programmers either aren't very good at programming, or they're not paying close attention.
Yeah, absolutely. People think you're picking on, like, code formatting and no, dawg, your code doesn't do what you think it does, or it only handles the happiest of happy paths.
I do find it funny when people get mad about you critiquing their AI project. You didn't even write it, dude.
sjagauanbdvva 5 days ago
Or they don’t know CSS.
Amazing how the LLM is godly with things I don’t understand, and falls over completely when it works in my domain… I wonder why that is /s
- timr 5 days ago
  
  Yes, it's a mystery, isn't it?
  Specifically for CSS, these bots really want to just barf out tailwind-style crap. If you deviate even slightly from the standards and practices of the modal front-end developer, you quickly see how these things are brittle, and no amount of prompting and cajoling will truly affect their behavior. In this case, you're kind of seeing the downstream effects of saying "no, do NOT do tailwind, make actual CSS with actual semantic class names please and thank you."
  Perhaps ironically, this results in the quality of output I might expect if I had prompted a right-out-of-bootcamp coder to do the same. (But at least it doesn't whine about it!)
  

hollowturtle 6 days ago

Don't want to be rough, but I'd like to read experiences about novel ideas that solve people's real problems in the real world; your project is just about selling new shovels.

As I commented on another thread

> If you're trying to solve a HARD problem people REALLY have, it's a novelty that agents can't help with, otherwise if it gets 97% there MAYBE it's just a signal that your idea isn't that novel!

jiggawatts 6 days ago
As a random example of a "hard" problem solved by AI that I couldn't have realistically done myself, despite having decades of wide industry experience:
Reverse engineering a proprietary protocol from a binary executable.
I heard about people finding security vulnerabilities in compiled code with the combination of Claude Mythos wired up to a disassembler like NSA's Ghidra. Someone here mentioned that GPT 5.5 "extra high" is just as capable, I had a problem to solve, spare token quota for the week, so... I gave it a go.
My problem was that I'm working with a product that uses a legacy 1990s style network appliance output log format that is proprietary, undocumented, and has no publicly available decoders other than an app by the same vendor, and that app has fundamental limitations. (I.e.: it's nothing like Splunk or Elastic.)
Codex with a Ghidra MCP bridge figured it all out: the framing, bit and byte packing, endian order, field names, data types, etc. It made me a neat little protocol parser in a modern language that I can use to spit out something sane like NDJSON or OTLP protobufs.
There is no way I could have reverse engineered this myself from compiled C++ code and/or packet captures! The format isn't self-describing and is incredibly dense (similar to NetFlow). In a hex viewer it looks like line noise!
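
For readers unfamiliar with what "framing, bit and byte packing, endian order" look like in a parser, here is a minimal sketch. The record layout is entirely hypothetical (it is NOT the vendor's format, which the comment does not disclose), and `parse_frames` and `to_ndjson` are made-up names for illustration.

```python
import json
import struct

# Hypothetical record layout (an assumption for illustration only):
#   uint16  length    big-endian, counts the payload bytes that follow
#   uint32  timestamp
#   uint8   packed    high nibble = severity, low nibble = facility
#   uint16  port
RECORD = struct.Struct(">IBH")  # ">" = big-endian, no padding


def parse_frames(buf):
    """Yield one dict per length-prefixed frame in buf."""
    offset = 0
    while offset + 2 <= len(buf):
        (length,) = struct.unpack_from(">H", buf, offset)
        offset += 2
        payload = buf[offset:offset + length]
        if len(payload) < length or length < RECORD.size:
            raise ValueError(f"truncated or short frame at offset {offset - 2}")
        ts, packed, port = RECORD.unpack_from(payload)
        yield {
            "ts": ts,
            "severity": packed >> 4,    # upper four bits
            "facility": packed & 0x0F,  # lower four bits
            "port": port,
        }
        offset += length


def to_ndjson(buf):
    """Render every frame as one JSON object per line."""
    return "\n".join(json.dumps(rec) for rec in parse_frames(buf))


sample = struct.pack(">HIBH", 7, 1700000000, 0x35, 443)
print(to_ndjson(sample))
```

The hard part described in the comment is discovering the layout; once it is known, the decoder itself tends to be this mechanical.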
- bluGill 5 days ago
  
  > There is no way I could have reverse engineered this myself from compiled C++ code and/or packet captures! The format isn't self-describing and is incredibly dense (similar to NetFlow). In a hex viewer it looks like line noise!
  I think you could have. However, I don't think you would have - there is a big difference. It is a lot of work to do that, and people who try normally give up. However, if your boss told you to, you could have. Note that I suspect from your story this is more like give this to a dozen people and in 2 years you get results - at a cost of several million dollars.
  
paulluuk 6 days ago
This is a pretty wild take. What percentage of human engineers are creating novel solutions for hard problems, you think? I work in R&D and even my work is 90% doing things that other people already solved. If you are really doing cutting edge SOTA work that has never been done by another human in some form or another, then kudos to you and I want your job.
- hollowturtle 6 days ago
  
  > What percentage of human engineers are creating novel solutions for hard problems, you think?
  IMO every engineer should try spending their time in a company that tries to solve new problems.
  Otherwise we will be stuck, as we are now, with big tech paying you mountains of money for doing nothing, incentivizing you to embark on useless activities so that other managers can have a career, to fear layoffs and, when they happen, to complain about it because "I've been looking for a new job for a year" while expecting the same compensation and environment. Web development jobs are particularly affected by that.
  In the game industry, for example, if you don't do something interesting your game won't sell a copy.
  Let me stress this again: if LLMs get you 97% there, maybe you should try another idea.
  
kstenerud 6 days ago

The comment was directed at:
> For generating production code even with a lot of steering and baby sitting? Absolutely not, not quite there not even close in my experience.
As I said, this is an example of using AI successfully to produce a high quality product (one that I use every day).
But to your point: I am solving hard problems that people really have. You just don't see those because I haven't mentioned them publicly yet. And they won't be released or talked about until they're ready.
ThrowawayTestr 6 days ago

Claude wrote me a little Python script to help me sort and rank all the AI videos I've generated. It also extracted the metadata and organized it into a CSV. I sent it some hex dumps of the header and it got it first try. The header structure of webms generated by comfy is pretty novel.
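
The script itself is not shown, so the following is only a sketch of the general idea, not a reconstruction. It relies on one fact about the container format: Matroska/WebM files open with the four-byte EBML header ID `1A 45 DF A3`. Where ComfyUI stores its own metadata is not covered here, and `has_ebml_magic` and `write_index` are hypothetical names.

```python
import csv
import io

# EBML header ID that opens Matroska/WebM files.
EBML_MAGIC = bytes.fromhex("1a45dfa3")


def has_ebml_magic(header):
    """True if the given leading bytes start with the EBML header ID."""
    return header[:4] == EBML_MAGIC


def write_index(files, out):
    """Write one CSV row per (name, leading_bytes) pair to the file-like `out`."""
    writer = csv.writer(out)
    writer.writerow(["name", "ebml_magic"])
    for name, head in files:
        writer.writerow([name, has_ebml_magic(head)])


buf = io.StringIO()
write_index([("a.webm", EBML_MAGIC + b"\x00"), ("b.txt", b"hello")], buf)
print(buf.getvalue())
```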

wickedsight 6 days ago

> The polarization comes from the very disparate coding experiences and output quality that different people find when using these tools.

Not just when using tools, also when using humans. The frame of reference of what is considered 'production code' differs immensely between organizations, teams and people. The code I get from LLM's is usually much better than what I get from my peers. Maybe not one shot, but after some steering it gets there.

It also isn't lazy. When generating test cases for relatively simple pieces of code, it usually tests pretty much every path and doesn't stop right at the 80% code coverage quality gate.

I can imagine if you're at the level of Linus or something, you might conclude differently, but most people aren't there at all.

ryandrake 6 days ago

> The frame of reference of what is considered 'production code' differs immensely between organizations, teams and people.
I think it’s really down to this. Nobody can agree on what counts as production-quality code. I remember joining a company with what I think (hope) most of us would call horrible quality code. It was an absolute mess, barely compiled with hundreds of warnings, and had an uncountable number of bugs. They didn’t even have a bug tracker so nobody even knew how many they had.
But the people working there already were so proud of it! None of them had ever worked for another company so they had no idea how bad their code was in comparison with the rest of the software industry (which itself is a very low bar). I told the founder we had a huge code quality problem and he looked at me like I had horns growing out of my head.
When someone says their LLM is producing “production-quality” code, actually look at it and see. Arguing about it on HN is pointless because everyone’s quality bar is different.
kstenerud 6 days ago

Absolutely! I find its test generation, properly steered, to be top notch. In many ways it's like having a second head, because it'll spontaneously come up with test paths that I'd normally only get to after a month or so in one of my "aha! What about XYZ?" shower thoughts.
You'll also notice that Linus doesn't poo-poo AI at all. His only gripe is with people using it wrong, such as flooding security lists with drive-by security reports after pointing their agent to the code and saying "find me some VULNS!!1!1!!"
hollowturtle 6 days ago

> The code I get from LLM's is usually much better than what I get from my peers
Then you should seriously question who you're working for imo.
> It also isn't lazy.
It is indeed lazy in my experience, as in being overly zealous when creating useless test cases and ignoring the important ones. I don't want it to test a sum; I want a test that can "guarantee" me that a further change doesn't break existing code. And producing tests of this quality is HARD, and requires a lot of steering with agents. This culture of test code coverage is just wrong: the best code base I worked with had code coverage only on the portion of code that matters, and the rest was covered by static type checking and integration tests.

windexh8er 5 days ago

A standard Docker container, with the container UID/GID mirrored to the host user, holding the host user's API keys, with the host user's project directory bind-mounted. The tooling doesn't even use gVisor / Kata by default which could implement the claim made, but in reality this entire project appears to be security theater.
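
As a rough illustration of the setup this comment describes, the sketch below builds (but does not run) a `docker run` command line. The image name, mount target, and the `docker_run_argv` helper are hypothetical and not taken from the project. `--user`, `--volume`, and `--runtime` are standard `docker run` options; `runsc` is the runtime name gVisor is conventionally registered under, and it only works if gVisor has been installed and configured on the Docker daemon.

```python
def docker_run_argv(image, project_dir, uid, gid, runtime=None):
    """Build a `docker run` argument list without executing it."""
    argv = [
        "docker", "run", "--rm",
        # Mirror the host user's UID/GID inside the container.
        "--user", f"{uid}:{gid}",
        # Bind-mount the host project directory read-write.
        "--volume", f"{project_dir}:/workspace",
    ]
    if runtime is not None:
        # Opt in to an alternative OCI runtime (e.g. gVisor's "runsc").
        argv += ["--runtime", runtime]
    argv.append(image)
    return argv


print(docker_run_argv("example/agent:latest", "/home/me/project", 1000, 1000))
print(docker_run_argv("example/agent:latest", "/home/me/project", 1000, 1000,
                      runtime="runsc"))
```

Without the `runtime` argument the container shares the host kernel under Docker's default runtime, which is the configuration the comment is criticizing.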

timacles 5 days ago

I’d like people to notice that those who claim this amazing AI productivity boost are always: pushing out software they don’t know how to judge the quality of and pushing projects that are 70% done. Every. Single. Time.
I use Claude all the time, it is immensely helpful. It is also very nuanced and requires a high level of expertise in a specific domain to produce quality work. Even then, that takes time and effort. Anyone saying otherwise, quite frankly, doesn’t know what they’re doing.
kstenerud 5 days ago

[dead]

dominotw 5 days ago

not going to look at your vibeslop