Comment by micw

1 day ago

For me, AI is an enabler for things you can't do otherwise (or that would take many weeks of learning). But you still need to know how to do things properly in general, otherwise the results are bad.

E.g. I've been a software architect and developer for many years. So I already know how to build software, but I'm not familiar with every language or framework. AI enabled me to write kinds of software I never learned or had time for. E.g. I recently re-implemented an Android widget that has not been updated for a decade by its original author. Or I fixed a bug in a Linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of these I could have done properly without my knowledge and experience, even with AI.

Same for daily tasks at work. AI makes me faster here, but it also lets me do more. Implement tests for all edge cases? Sure, always; before, I would skip them to save time. More code reviews. More documentation. Better quality in the same (always limited) time.

I use Claude Code a lot, but one thing that really made me concerned was when I asked it about some ideas I have had in an area I am very familiar with. Its response was to constantly steer me away from what I wanted to do towards something else that was fine but a mediocre way to do things. It made me question how many times I've let it go off and do stuff without checking it thoroughly.

  • I've had quite a bit of the "tell it to do something in a certain way" experience: it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

    • > it does that at first, then a few messages of corrections and pointers, it forgets that constraint.

      Yup, most models suffer from this. Everyone is raving about million-token context windows, but none of the models can actually get past 20% of that and still give responses as high quality as the very first message.

      My whole workflow right now is basically composing prompts outside the agent, letting it run with them, and if something is wrong, restarting the conversation from 0 with a rewritten prompt. None of that "No, what I meant was ..."; instead I rewrite the prompt so the agent essentially solves it without the back and forth, just because of this issue that you mention.

      Seems to happen in Codex, Claude Code, Qwen Coder and Gemini CLI as far as I've tested.

  • Call me a conspiracy theorist, and granted much of this could be attributed to the fact that the majority of code in existence is shit, but I'm convinced that these models are trained and encouraged to produce code that is difficult for humans to work on, further driving and cementing the usage of them when you inevitably have to come back and fix it.

    • I don't think they would be able to have an LLM without the flaws. The problem is that an LLM cannot make a distinction between sense and nonsense in a logical way. If you train an LLM on a lot of sensible material, it will try to reproduce it by matching training material context and prompt context. The system does not work on the basis of logical principles, but it can sound intelligent.

      I think LLM producers can improve their models by quite a margin if customers train the LLM for free, meaning: if people correct the LLM, the companies can use the session context + feedback as training data. This enables more convincing responses for finer nuances of context, but it still does not work on logical principles.

      LLM interaction with customers might become the real learning phase. This doesn't bode well for players late in the game.

    • This could be the case even without an intentional conspiracy. It's harder to give negative feedback to poor quality code that's complicated vs. poor quality code that's simple.

      Hence the feedback these models get could theoretically funnel them to unnecessarily complicated solutions.

      No clue whether any research has been done into this; just a thought OTTOMH.

    • Or it takes a lot of time, effort, and intelligence to produce good code, and AI is not there yet…

  • Mediocre is fine for many tasks. What makes a good software engineer is spotting the few places in every piece of software where mediocre is not good enough.

Yes, but in my experience this sometimes works great, and other times you paint yourself into a corner and the sum total is that you still have to learn the thing anyway, just with a less steep initial ramp. For example, I built myself a nice pipeline for converting JPEGs on disk to H.264 on disk via zero-copy nvjpeg to nvenc, with Python bindings, but I have been pulling out my hair over B-frame ordering and weird delays in playback etc. Nothing unsolvable, but I had to learn a great deal, and when we were in the weeds, Opus was suggesting stupid quick-fix hacks that turned the tests into whack-a-mole. In the end I had to read enough to be able to ask it, with the right vocabulary, to make it work. Similarly with entering many novel areas: initially I get a rush because it "just works", but it really only works for the median case, and it's up to you to even know what to test. And AIs can be quite dismissive of edge cases, saying things like "this will not happen in most cases so we can skip it", etc.
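
For anyone curious, the B-frame issue I hit boils down to the usual PTS/DTS split: with B-frames enabled, the encoder emits packets in decode order and holds a few back, so the muxer needs both timestamps and an explicit flush at the end. Here is a minimal sketch of the same pitfall using PyAV with a toy CPU-side frame source (not the zero-copy nvjpeg/nvenc path, which needs the vendor bindings; frame contents and filenames are made up):

```python
import av
import numpy as np

# Toy stand-in for decoded JPEG frames; the real pipeline decodes on the GPU.
frames = [np.full((720, 1280, 3), i * 8, dtype=np.uint8) for i in range(30)]

container = av.open("out.mp4", mode="w")
stream = container.add_stream("libx264", rate=30, options={"bf": "2"})  # allow B-frames
stream.width, stream.height = 1280, 720
stream.pix_fmt = "yuv420p"

for img in frames:
    frame = av.VideoFrame.from_ndarray(img, format="rgb24")
    for packet in stream.encode(frame):
        # With B-frames the packets come out in decode order,
        # so packet.pts != packet.dts and some packets arrive late.
        container.mux(packet)

for packet in stream.encode():  # flush the frames the encoder is still holding back
    container.mux(packet)
container.close()
```

Forgetting that final flush, or mishandling the pts/dts pair, is the kind of thing that shows up as the weird playback delays I mentioned.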

  • Yeah, knowing what words to use is half the battle. Quickly throw out a prompt like "Hey, `make build` takes five minutes, could you make it fast enough to run in under 1 minute" and the agent will do some work and say "Done, now the build takes 25 seconds as we're skipping the step of building the images; use `make build INCLUDE_IMAGES=true` when you want to build with images". It's not wrong, given the prompt, but it takes a while to get used to how they approach things.

Huh. I'm extremely skeptical of AI in areas where I don't have expertise, because in areas where I do have expertise I see how much it gets wrong. So it's fine for me to use it in those areas because I can catch the errors, but I can't catch errors in fields I don't have any domain expertise in.

  • I feel the same way. LLM errors sound most plausible to those who know the least.

    On complex topics where I know what I'm talking about, model output contains so much garbage with incorrect assumptions.

    But on complex topics where I'm out of my element, the output always sounds strangely plausible.

    This phenomenon writ large is terrifying.

I'm in the same boat. I've been taking on much more ambitious projects both at work and personally by collaborating with LLMs. There are many tasks that I know I could do myself but would require a ton of trial and error.

I've found that giving the LLMs the input and output interfaces really helps keep them on rails, while still being involved in the overall process without just blindly "vibe coding."

Having the AI also help with unit tests around business logic has been super helpful, in addition to manual testing like normal. It feels like our overall velocity and code quality have been going up regardless of what some of these articles are saying.
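
For what it's worth, the "interfaces first" artifact can be as small as a typed contract the agent isn't allowed to change. A hypothetical sketch (Python here purely for illustration, all names invented) of what gets handed over before any implementation:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Invoice:
    customer_id: str
    line_items: list[tuple[str, int, float]]  # (sku, quantity, unit_price)


@dataclass(frozen=True)
class InvoiceSummary:
    subtotal: float
    tax: float
    total: float


class InvoiceSummarizer(Protocol):
    def summarize(self, invoice: Invoice, tax_rate: float) -> InvoiceSummary:
        """Return totals for the invoice; must not mutate the input."""
        ...
```

The agent then writes a class satisfying the Protocol plus the unit tests against it; the dataclasses and the signature stay fixed, which is what keeps it on rails.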

  • How granular do you go with the interfaces? Full function signatures + types, or more like module-level contracts?

    Wondering what sort of artifacts beyond ADRs/natural-language prompts help steer LLMs to do the right thing.

  • 100% agree with AI expanding the test coverage out from my own edge and key tests.

    I agree. I write out a sketch of what I want. With a recent embedded project in C, I gave it a list of function signatures and a high-level description and was very satisfied with what it produced. It would have taken me days to nail down the particulars of the HAL (like what kind of sleep I want, or precisely how to set up the WDT and ports).

    I think it's also language dependent.

    I imagine JavaScript can be a crap shoot. The language is too forgiving.

    Rust is where I have had the most success. That is likely a personal skill issue: I know we want an Arc<DashMap>, but will I remember all the foibles of accessing it? No.

    But given the rigidity of the compiler and strong typing, I can focus on what the code is functionally doing: that I'm happy with the shape/interface and function signatures, and that the compiler is happy with the code.

    It's quite fast work. It lets me use my high level skills without my lower level skills getting in the way.

    And I'd rather rewrite the code at a mid-level than start fresh. I agree with others that once it's a large code base, I'm too far behind in understanding the overall system to easily work on it. That's true of human products too - someone else's code always gives me the ick.

    • Vanilla JavaScript is hit or miss for anything complex.

      Using TypeScript works great because you can still build out the interfaces, and with IDE integrations the AIs can read the language server results, so they get all the type hints.

      I agree that the AI code is usually a pretty good starting point and gets me up to speed on new features fast rather than starting everything from scratch. I usually end up refactoring the last 10-20% manually to give it some polish, because some of the code still feels off sometimes.

In my case I built a video editing tool fully customized for a community of which I am a member. I could do it in a few hours. I wouldn't have even started this project as I don't have much free time, though I have been coding for 25+ years.

I see it as empowering: it lets me build custom tooling that need not be a high-quality, maintained project.

> Or I fixed a bug in a Linux scanner driver. None of these I could have done properly (within an acceptable time frame) without AI. But also none of these I could have done properly without my knowledge and experience, even with AI

There are some things here that folks making statements like yours often omit, and it makes me very sus about your (over)confidence. Mostly these statements are made in a short-term, business-results-oriented mode without mentioning any introspective gains (i.e. empirically supported understanding) or long-term gains (do you feel confident now in making further changes _without_ the AI, now that you have gained new knowledge?).

1. Are you 100% sure your code changes didn't introduce unexpected bugs?

1a. If they did, would you be able to tell whether they were behaviour bugs (i.e. no crashing or exceptions thrown) without the AI?

2. Did you understand why the bug was happening without the AI giving you an explanation?

2a. If you didn't, did you empirically test the AI's explanation before applying the code change?

3. Has fixing the bug improved your understanding of the driver behaviour beyond what the AI told you?

3a. Have you independently verified your gained understanding or did you assume that your new views on its behaviour are axiomatically true?

Ultimately, there are two things here: one is understanding the code change (why it is needed, why that particular implementation is better than the alternatives, what future improvements could be made to it), and the other is skill (has this experience boosted your OWN ability in this particular area? In other words, could you make further changes WITHOUT using the AI?).

This reminds me of people that get high and believe they have discovered amazing truths. They believe it because they FEEL it, not because they have actual evidence. When asked to write down these amazing truths while high, all you get in the notes are meaningless words. While these assistants are more amenable to being empirically tested, I don't believe most of the AI hypers (and I include you in that category) are actually approaching this with the rigour it entails. It is likely why people often think that none of you (people writing software for a living) are experienced in or qualified to understand and apply scientific principles to building software.

Arguably, AI hypers should lead with data not with anecdotal evidence. For all the grandiose claims, empirical data obtained under controlled conditions on this particular matter is conspicuous by its absence.

  • It's incredible that within two minutes of posting, this comment is already grayed out, even though it makes a number of excellent points.

    I've been playing with various AI tools and homebrew setups for a long time now and while I see the occasional advantage it isn't nearly as much of a revolution as I've been led to believe by a number of the ardent AI proponents here.

    This is starting to get into 'true believer' territory: you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.

    AI has served me well, no doubt about that. But it certainly isn't a passe-partout, and the number of times it has caused a gross waste of time by insisting on chasing some rabbit simply because it was familiar with the rabbit adds up to a considerable loss in productivity.

    The scientific principle is a very powerful tool in such situations and anybody insisting on it should be applauded. It separates fact from fiction and allows us to make impartial and non-emotional evaluations of both theories and technologies.

    • > (...) you get these two camps 'for and against' whereas the best way forward is to insist on data rather than anecdotes.

      I think that's an issue with online discussions. It barely happens to me in the real world, but it's huge on HN.

      I'm overall very positive about AI, but I also try to be measured and balanced and learn how to use it properly. Yet here on HN, I always get the feeling people responding to me have decided I am a "true believer" and respond to the true believer persona in their head.

  • Thanks for pointing these things out. I always try to learn and understand the generated code and changes. Maybe not so deeply for the Android app (since it's just my own pet project), but especially for every pull request to a project. Everyone should do this out of respect for the maintainers who review the change.

    > Are you 100% sure your code changes didn't introduce unexpected bugs?

    Who ever is? But I do code reviews, and I usually generate a bunch of tests along with my PRs (if the project has at least _some_ test infrastructure).

    The same applies to the rest of the points. But that's only _my_ way of doing these things. I can imagine that others do it differently and that the points above are more problematic in those cases.

    • > I always try to learn and understand the generated code and changes

      Not to be pedantic but, do you _try_ to understand? Or do you _actually_ understand the changes? This suggests to me that there are instances where you don't understand the generated code on projects other than your own, which is literally my point and that of many others. And even if you did understand it, as I pointed out earlier, that's not enough; it is a low bar imo. I will continue to keep my mind open, but yours isn't a case study supporting the use of these assistants - it's the opposite.

      In science, when a new idea is brought forward, it gets grilled to no end. The greater the potential, the harder the grilling. Software should be no different if its builders want to lay claim to the name "engineer". It is sad to see a field that claims to apply scientific principles to the development of software not walking the walk.

  • >Arguably, AI hypers should lead with data not with anecdotal evidence

    This reminds me of people who get sad when they realize they haven’t discovered anything amazing.

    I am pedantic and “people that” → “people who” (for people, who is preferred).

  • > 1. Are you 100% sure your code changes didn't introduce unexpected bugs?

    How often have you written code and been 100% sure your code didn't introduce ANY bugs?

    Seriously, for most of the code out there who cares? If it's in a private or even public repo, it doesn't matter.

I think what we'll see, as AI companies collect more usage data, is that the requirement for knowing what you're doing will sink lower and lower. Whatever advantage we have now is transient.

Also, most of the studies shown are starting to become obsolete given AI's rapid pace of improvement. Opus 4.5 has been a huge game changer for me since December (combined with CC, which I had not used before). Claude Code arrived this summer, if I'm not mistaken.

So I'm not sure a study from 2024, or the impact on code produced during 2024-2025, can be used to judge current AI coding capabilities.

> But you still need to know how to do things properly in general, otherwise the results are bad.

Even that could use some nuance. I'm generating presentations in interactive JS. If they work, they work - that's the result, and I extremely don't care about the details for this use case. Nobody needs to maintain them, nobody cares about the source. There's no need for "properly" in this case.

  • Agree. Or internal or personal tooling, where it doesn't have to be perfect and you can (1) verify by observing the behavior of the software -- if it works, it works, and (2) deal with edge cases as they arise, because it's not mission critical.

I've found this is the exact opposite of what I'd dare do with AI: things you don't understand are things you can't verify. Say you want a window pane for your cool project, so you ask an AI to draft a design. It looks cool and it works! Until you bring it outside, where after 30 minutes it turns into explosive shrapnel, because the model didn't understand thermal expansion, and neither did you.

Contrast this to something you do know but can't be arsed to make; you can keep re-rolling a design until you get something you know and can confirm works. Perfect, time saved.