Comment by dchuk

21 days ago

I’m very bought into the idea that raw coding is now a solved problem with the current models and agentic harnesses, let alone what’s coming in the near term.

That being said, I think we’re in a weird phase right now where people’s obvious mental health issues are appearing as “hyper productivity” due to the use of these tools to absolutely spam out code that isn’t necessarily broadly coherent but is locally impressive. I’m watching multiple people, both publicly and privately, clearly breaking down mentally because of the “power” AI is bestowing on them. Their wires are completely crossed when it comes to the value of outputs vs outcomes, and they’re espousing generated nonsense as if it were thoughtful insight.

It’s an interesting thing to watch play out.

Mm.

I'd agree, the code "isn’t necessarily broadly coherent but is locally impressive".

However, I've seen some totally successful, even award-winning, human-written projects where I could say the same.

Ages back, I heard a woodworking analogy:

  LLM code is like MDF. Really useful for cheap furniture, massively cheaper than solid wood, but it would be a mistake to use it as a structural element in a house.

Now, I've never made anything more complex than furniture, so I don't know how well that fit the previous models, let alone the current ones… but I've absolutely seen success coming out of bigger balls of mud than the balls of mud I got from letting Claude loose for a bit without oversight.

Still, just because you can get success even with sloppy code doesn't mean I think this is true everywhere. It's not like the award was for industrial equipment or anything; the closest I've come to life-critical code is helping to find and schedule video calls with GPs.

  • "Without oversight" is the key here.

    You need to define the problem space so that the agent knows what to do. Basically give it the tools to determine when it's "done" as defined by you.

This has also been an interesting social experiment in that we get to see what work people think is actually impressive vs trivial.

Folks who have spent years effectively snapping together other people’s APIs like LEGOs (and being well-compensated for it) are understandably blown away by the current state of AI. Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.

The gap in reactions says more about the nature of the work than it does about the tools themselves.

  • >Compare that to someone writing embedded firmware for device microcontrollers, who would understandably be underwhelmed by the same.

    One datum for you: I recently asked Claude to make a jerk-limited and jerk-derivative-limited motion planner and to use the existing trapezoidal planner as a reference for fuzz-testing various moves (to ensure the total pulses sent was correct), and it totally worked. Only a few rounds of guidance to get it to where I wanted to commit it.
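
    Roughly the shape of that test, as a heavily simplified sketch (the planner functions here are stand-ins, not the real interfaces; the invariant being fuzzed is just "both planners emit the same total pulse count for any move"):

      import random

      STEPS_PER_MM = 80  # hypothetical axis resolution

      def trapezoidal_pulses(dist_mm):
          # Stand-in for the existing trapezoidal planner: however the move is
          # segmented, the total pulse count must match the commanded distance.
          return round(dist_mm * STEPS_PER_MM)

      def jerk_limited_pulses(dist_mm):
          # Stand-in for the new jerk-limited planner under test.
          return round(dist_mm * STEPS_PER_MM)

      def fuzz(iterations=10_000, seed=0):
          rng = random.Random(seed)
          for _ in range(iterations):
              dist = rng.uniform(0.01, 500.0)      # random move length in mm
              expected = trapezoidal_pulses(dist)  # reference planner
              actual = jerk_limited_pulses(dist)   # planner under test
              assert actual == expected, f"pulse mismatch at {dist:.3f} mm"

      fuzz()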

    • I hope my comment above wasn't read to mean "LLMs are only good at web dev", only that there are different capability magnitudes.

      I often do experiments where I will clone one of our private repos, revert a commit, trash the .git path, and then see if any of the models/agents can re-apply the commit after N iterations. I record the pass@k score and compare between model generations over time.
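
      For concreteness, the scoring side of that harness is roughly the following sketch (attempt counts here are illustrative; pass@k is the standard unbiased estimator):

        from math import comb

        def pass_at_k(n, c, k):
            # Unbiased pass@k: probability that at least one of k sampled
            # attempts passes, given c passing attempts out of n total.
            if n - c < k:
                return 1.0
            return 1.0 - comb(n - c, k) / comb(n, k)

        # e.g. a model re-applied the reverted commit correctly in 3 of 20 runs
        n_attempts, n_passed = 20, 3
        for k in (1, 5, 10):
            print(f"pass@{k} = {pass_at_k(n_attempts, n_passed, k):.3f}")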

      In one of those recent experiments, I saw gpt-oss-120b add API support to swap tx and rx IQ for digital spectral inversion at higher frequencies on our wireless devices. This is for a proprietary IC running a Quantenna radio, the SDK of which is very likely not in-distribution. It was moderately impressive to me in part because just writing the IQ swap registers had a negative effect on performance, but the model found that swapping the order of the IQ imbalance coefficients fixed the performance degradation.

      I wouldn't say this was the same level of "impressive" as what the hype demands, but I remain an enthusiastic user of AI tooling due to somewhat regular moments like that. Especially when it involves open-weight models of low-to-moderate param count. My original point, though, is that those moments are far more common in web dev than they are elsewhere currently.

      EDIT: Forgot to add that the model also did some work that the original commit did not. It removed code paths that were clobbering the rx IQ swap register and instead changed it to explicitly initialize during baseband init so it would come up correct on boot.

      1 reply →

  • This is not true. You can see people who are much older and who built a lot of the "internet scale" infrastructure equally excited about it, e.g. FreeBSD OG developers, Steve himself (who wrote Gas Town), etc.

    In fact, I would say I've seen more "OG coders" (people in their 50s and older) excited than the mid-career generation.

    • I think you're shadow-boxing with a point I never made. I never said experienced devs are not or cannot be excited about current AI capabilities.

      Lots of experienced devs who work in more difficult domains are excited about AI. In fact, I am one of them (see one of my responses in this thread about gpt-oss being able to work on proprietary RF firmware in my company [1]).

      But that in no way suggests that there isn't a gap in what impresses or surprises engineers across different domains. Antirez is probably one of the better, more reasoned examples of this.

      [1] https://news.ycombinator.com/item?id=46682604

  • I think this says a lot about yourself and where your prejudices and preferences lie.

    • Preferences I think I get, but prejudices?

      The OED defines prejudice as a "preconceived opinion that is not based on reason or actual experience."

      My day-to-day work involves full stack web dev, distributed systems, embedded systems, and machine learning. In addition to using AI tooling for dev tasks, we use agents in production for various workflows, and we also train/fine-tune models (some LLMs, but also other types of neural networks for anomaly detection, fault localization, time series forecasting, etc.). I am basing my original commentary in this thread on all of that cumulative experience.

      It has been my observation over the last almost 30 years of being a professional SWE that full stack web dev has been much easier and simpler than the other domains I work in. And even further, I find that models are much better at that domain on average than the other domains, measured by pass@k scores on private evals representing each domain. Anecdotal experience also tends to match the evals.

      This tracks with all the other information we have pertaining to benchmark saturation; the "we need harder evals" crowd has been ringing this bell for the last 8-12 months. Models are getting very good at the less complex tasks.

      I don't believe it will remain that way forever, but at present it's far more common to see someone one-shot a full stack web app from a single prompt than something like a kernel driver for a NIC. One class of devs is seeing a massive performance jump; another class is not.

      I don't see how that can be perceived as prejudice; it may just be an opinion you don't agree with or an observation that doesn't match your own experience (both of which are totally valid and understandable).

If you give every idiot a voice that can be heard worldwide, you will hear every idiot in the whole world. If you give every idiot a tool to make programs, you will see a lot of programs made by idiots.

  • Steve Yegge is not an idiot or a bad programmer. Possibly just hypomanic at most. And a good, entertaining writer. https://en.wikipedia.org/wiki/Steve_Yegge

    Gas Town is ridiculous and I had to uninstall Beads after seeing it only confuse my agents, but he's not completely insane or a moron. There may be some kernels of good ideas inside Gas Town that could be extracted into a better system.

    • > Steve Yegge is not an idiot or a bad programmer.

      I don't think he's an idiot; in my opinion there are almost no actual idiots here on HN, and actual idiots don't write articles like that or build systems like Steve Yegge's. I'm only commenting on giving more tools to idiots. Even tools made by geniuses will give you idiotic results when used by actual idiots, but a lot of smart people want to lower the barriers to entry so that idiots can use more tools. And there are a lot of idiots who were inactive just because they didn't have the tools. There's a famous quote from the Polish essayist/futurist Stanisław Lem: "I didn't know there are so many idiots in this world until I got the internet".

    • Even if I looked past the overwrought, self-indulgent Mad Max LARP (and the poor judgment evidenced by the prioritization of world-building minutiae while the basic architecture is imploding), the cost of finding those kernels in a monstrosity of this size negates any ROI. 189k lines in four weeks will inevitably surface interesting pattern combinations — that's not merit, that's sample size. You might as well search the Library of Babel; at least the patterns are guaranteed to exist there.

      The other problem with that reasoning is that whatever patterns ARE interesting are more likely to be new to AI-assisted coding generally – meaning a cleaner system built for the same use case will surface them without the archaeological dig, just by virtue of its builder having the skill to design it (and crucially, being more interested in designing it than in creating AI drawings of polecats in steampunk-adjacent garb).

      I'm also a bit curious about at which point you start considering someone an idiot when they keep making objectively idiotic moves – the whimsical Disneyfied presentation, the "please don't download this" false modesty while keeping the repo public, the inexplicable code growth all come from the same place. They're not separate quirks: they're the same inability to edit, the same need for immediate audience validation, the same substitution of volume and narrative for actual engineering discipline. Someone who thinks "Polecats" and "Guzzoline" are good names for production abstractions is not suddenly going to develop the editorial rigor to scrap a codebase and rebuild.

      Which is why it's worth remembering that Yegge's one successful shipped project was Grok, an internal tool used by Google engineers. He seems to have bought his own hype, missing how much of that project's success was likely subsidized by a user base comprising people skilled enough to route around its limitations.

      These days he seems to be building for developers in general, but critically might be missing that actual developers immediately clock the project's ineptitude + Yegge's immature, narcissistic prioritization and peace the fuck out. The end result of this is filtering for the self-described vibe-coder types, people already Dunning-Krugered enough to believe you can prompt your way into a complete system without knowing how to reason about that system in order to guide the AI.

      Which, fittingly, is how you end up with users who can't even follow "please don't download this yet".

Well put. I can't help thinking of this every time I see the 854594th "agent coordination framework" on GitHub. They all look strangely similar, are obviously themselves vibe-coded, and make no real effort to present any type of evidence that they can help development in any way.

> where people’s obvious mental health issues

I think the kids would call this "getting one-shotted by AI"

> raw coding is now a solved problem

Surely this was solved with Fortran. What changed? I think most people just don't know what program they want.

  • You no longer have to be very specific about syntax. There's now an AI that can translate your idea into whatever language you want.

    Previously, if you had an idea of what the program needed to do, you needed to learn a new language. This is so hard that we use language itself as the metaphor: it's hard to learn a new language, and only a few people can translate from French to English, for example. Likewise, few people can translate English to Fortran.

    Now, you can just think about your program in English, and so long as you actually know what you want, you can get a Fortran program.

    The issue is now what it was originally for senior programmers: to decide what to make, not how to make it.

    • The hard part of software development is equivalent to the hard part of engineering:

      Anyone can draw a sketch of what a house should look like. But designing a house that is safe, conforms to building regulations, and which wouldn't be uncomfortable to live in (for example, poor choice of heat insulation for the local climate) is the stuff people train on. Not the sketching part.

      It's the same for software development. All we've done is replace FORTRAN / JavaScript / whatever with a subset of a natural language. But we still need to thoroughly understand the problem and describe it to the LLM. Plus, given the way we format these markdown prompts, you're basically still programming, albeit in a less strict syntax and with a non-deterministic "compiler".

      This is why I get so miffed by comments about AI replacing programmers. That's not what's happening. Programming is just shifting to a language that looks more like Jira tickets than source code. And the orgs that think they can replace developers with AI (and I don't for one second believe many of the technology leaders think this, but some smaller orgs likely do) are heading for a very unpleasant realisation soon.

      I will caveat this by saying: there are far too many naff developers out there who genuinely aren't any better than an LLM. And maybe what we need is more regulation around software development, just as there is in the proper engineering professions.

      6 replies →

    • Again, I don't think most people are prepared to articulate what behavior they want. Fortran (and any other formal language) used to force this, but now you just kind of jerk off on the keyboard or into the microphone and expect mind-reading.

      Reactionarily? Sure. Maybe AI has some role to play there. Maybe you can ask the chatbot to modify settings.

      I am no fan of chatbots. But I do have empathy for the people responsible for them when their users start complaining that programs don't do what they want, despite the chatbots delivering precisely the code demanded.

      https://youtu.be/5IsSpAOD6K8?si=FtfQZzgRU8K2z4Ub

There is a lot of research on how words and language influence what we think, and even what we can observe, like the Sapir-Whorf hypothesis. If in a language there is one word for two different colors, speakers of it are unable to see the difference between the colors.

I have a suspicion that extensive use of LLMs can result in damage to your brain. That's why we are seeing so many mental health issues surfacing, and we are getting a bunch of blog posts about "an agentic coding psychosis".

It could be that LLMs go from being bicycles for the brain to smoking for the brain, once we figure out the long-term effects.

  • > If in a language there is one word for two different colors, speakers of it are unable to see the difference between the colors.

    That is quite untrue. It is true that people may be slightly slower or less accurate in distinguishing colors that are within a labeled category than those that cross a category boundary, but that's far from saying they can't perceive the difference at all. The latter would imply that, for instance, English speakers cannot distinguish shades of blue or green.

  • > If in a language there is one word for two different colors, speakers of it are unable to see the difference between the colors.

    Perhaps you mean to say that speakers are unable to name the difference between the colours?

    I can easily see differences between (for example) different shades of red. But I can't name them other than "shade of red".

    I do happen to subscribe to the Sapir-Whorf hypothesis, in the sense that I think the language you think in constrains your thoughts - but I don't think it is strong enough to prevent you from being able to see different colours.