Comment by kace91

1 day ago

These articles keep popping up, analyzing a hypothetical usage of AI (and guessing it won’t be useful) as if it wasn’t something already being used in practice. It’s kinda weird to me.

“It won’t deal with abstractions” -> try asking cursor for potential refactors or patterns that could be useful for a given text.

“It doesn’t understand things beyond the code” -> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context

“Reading code and understanding whether it’s wrong will take more time than writing it yourself” -> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.

It’s as if I wrote an article today arguing that exercise won’t make you able to lift more weight - every gymgoer would raise an eyebrow, and it’s hard to imagine even the non-gymgoers would be sheltered enough to buy the argument either.

While I tend to agree with your premise that the linked article seems to be reasoning to the extreme from a very small code snippet, I think the core critique the author wants to make stands:

AI agents alone, unbounded, currently cannot provide huge value.

> try asking cursor for potential refactors or patterns that could be useful for a given text.

You, the developer, will be selecting this text.

> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context

You still selected a JIRA ticket and provided context.

> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.

Yes, that is true, but again, what you are providing as counterexamples are very bounded, aka easy, contexts.

In any case, the industry (both the LLM providers as well as tooling builders and devs) is clearly going in the direction of constantly etching out small improvements by refining which context is deemed relevant for a given problem and the most efficient ways to feed it to LLMs.

And let's not kid ourselves: Microsoft, OpenAI, hell, Anthropic all have 2027-2029 plans where these things will be significantly more powerful.

  • Here's an experience I've had with Claude Code several times:

    1. I'll tell Claude Code to fix a bug.

    2. Claude Code will fail, and after a few rounds of explaining the error and asking it to try again, I'll conclude this issue is outside the AI's ability to handle, and resign myself to fixing it the old fashioned way.

    3. I'll start actually looking into the bug on my own, and develop a slightly deeper understanding of the problem on a technical level. I still don't understand every layer to the point where I could easily code a solution.

    4. I'll once again ask Claude Code to fix the bug, this time including the little bit I learned in #3. Claude Code succeeds in one round.

    I'd thought I'd discovered a limit to what the AI could do, but just the smallest bit of digging was enough to un-stick the AI, and I still didn't have to actually write the code myself.

    (Note that I'm not a professional programmer and all of this is happening on hobby projects.)

    • > I once again ask Claude Code to fix the bug, this time including the little bit I learned in #3. Claude Code fixes the problem in one round.

      Context is king, which makes sense since LLM output is based on probability. The more context you can provide it, the more aligned the output will be. It's not like it magically learned something new. Depending on the problem, you may have to explain exactly what you want. If the problem is well understood, a sentence would most likely suffice.


    • I had Claude go into a loop because I have cat aliased as bat

      It wanted to check a config json file, noticed that it had missing commas between items (because bat prettifies the json) and went into a forever loop of changing the json to add the commas (that were already there) and checking the result by 'cat'ing the file (but actually with bat) and again finding out they weren't there. GOTO 10

      The actual issue was that Claude had left two overlapping configuration parsing methods in the code. One with Viper (The correct one) and one 1000% idiotic string search system it decided to use instead of actually unmarshaling the JSON :)

      I had to use pretty explicit language to get it to stop fucking with the config file and look for the issue elsewhere. It did remember it, but forgot on the next task of course. I should've added the fact to the rule file.

      (This was a vibe coding experiment, I was being purposefully obtuse about not understanding the code)

  • Why does it matter that you're doing the thinking? Isn't that good news? What we're not doing any more is any of the rote recitation that takes up most of the day when building stuff.

    • I think "AI as a dumb agent for speeding up code editing" is kind of a different angle and not the one I wrote the article to address.

      But, if it's editing that's taking most of your time, what part of your workflow are you spending the most time in? If you're typing at 60WPM for an hour then that's over 300 lines of code in an hour without any copy and paste which is pretty solid output if it's all correct.


  • In lots of jobs, the person doing work is not the one selecting text or the JIRA ticket. There's lots of "this is what you're working on next" coding positions that are fully managed.

    But even if we ignored those, this feels like goalpost moving. They're not selecting the text - ok, ask LLM what needs refactoring and why. They're not selecting the JIRA ticket with context? Ok, provide MCP to JIRA, git and comms and ask it to select a ticket, then iterate on context until it's solvable. Going with "but someone else does the step above" applies to almost everyone's job as well.

  • > etching out

    Could you explain what you mean by etching out small improvements? I've never seen the phrase "etching out" before.

  • I think maybe you have unrealistic expectations.

    Yesterday I needed to import a 1GB CSV into ClickHouse. I copied the first 500 lines into Claude and asked it for a CREATE TABLE and CLI to import the file. Previous day I was running into a bug with some throw-away code so I pasted the error and code into Claude and it found the non-obvious mistake instantly. Week prior it saved me hours converting some early prototype code from React to Vue.
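
    For anyone who wants to reproduce that workflow, here is a minimal sketch of the same import done through the clickhouse-connect Python client rather than the CLI approach Claude gave me; the table name, columns, and localhost connection are hypothetical stand-ins, not the actual schema from my CSV:

      import csv
      from datetime import datetime

      import clickhouse_connect

      # connect to a local ClickHouse server (hypothetical host, default credentials)
      client = clickhouse_connect.get_client(host='localhost')

      # hypothetical schema inferred from the first few hundred CSV lines
      client.command("""
          CREATE TABLE IF NOT EXISTS trips (
              id UInt64,
              pickup_time DateTime,
              fare Float64
          ) ENGINE = MergeTree ORDER BY id
      """)

      # stream the 1GB file in batches rather than loading it all into memory
      def insert_csv(path: str, batch_size: int = 100_000) -> None:
          with open(path, newline='') as f:
              reader = csv.reader(f)
              next(reader)  # skip the header row
              batch = []
              for row in reader:
                  batch.append([int(row[0]), datetime.fromisoformat(row[1]), float(row[2])])
                  if len(batch) >= batch_size:
                      client.insert('trips', batch, column_names=['id', 'pickup_time', 'fare'])
                      batch = []
              if batch:
                  client.insert('trips', batch, column_names=['id', 'pickup_time', 'fare'])

      insert_csv('trips.csv')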

    I do this probably half a dozen times a day, maybe more if I'm working on something unfamiliar. It saves at a minimum an hour a day by pointing me in the right direction - an answer I would have reached myself, but slower.

    Over a month, a quarter, a year... this adds up. I don't need "big wins" from my LLM to feel happy and productive with the many little wins it's giving me today. And this is the worst it's ever going to be.

Out of interest, what kind of codebases are you able to get AI to do these things on? Every time I have tried it with even simpler things than these it has failed spectacularly. Every example I see of people doing this kind of thing seems to be on some kind of web development, so I have a hypothesis that AI might currently be much worse for the kinds of codebases I work on.

  • I currently work for a finance-related scaleup. So backend systems, with significant challenges related to domain complexity and scalability, but nothing super low level either.

    It does take a bit to understand how to prompt in a way that makes the results useful. Can you share what you have tried so far?

    • I have tried on a lot of different projects.

      I have a codebase in Zig and it doesn't understand Zig at all.

      I have another which is embedded C using Zephyr RTOS. It doesn't understand Zephyr at all, and even if it could, it can't read the documentation for the different sensors, nor can it plug in cables.

      I have a TUI project in Rust using ratatui. The core of the project is dealing with binary files, and the time it takes to explain to it how specific bits of data are organised in the file and then check it got everything perfectly correct (it never has) is more than the time to just write the code. I expect I could have more success on the actual TUI side of things but haven't tried too much, since I am trying to learn Rust with this project.

      I just started an Android app with Flutter/Dart. I get the feeling it will work well for this, but I am yet to verify since I need to learn enough Flutter to be able to judge it.

      My day job is a big C++ codebase making a GUI app with Qt. The core of it is all dealing with USB devices and Bluetooth protocols, which it doesn't understand at all. We also have lots of very complicated C++ data structures; I had hoped that the AI would be able to at least explain them to me, but it just makes stuff up every time. This also means that getting it to edit any part of the codebase touching this sort of thing doesn't work. It just rips up any thread safety or allocates memory incorrectly, etc. It also doesn't understand the compiler errors at all; I had a circular dependency and tried to get it to solve it, but I had to give so many clues I basically told it what the problem was.

      I really expected it to work very well for the Qt interface since building UI is what everyone seems to be doing with it. But the amount of hand holding it requires is insane. Each prompt feels like a monkey's paw. In every experiment I've done it would have been faster to just write it myself. I need to try getting it to write an entirely new piece of UI from scratch, since I've only been editing existing UI so far.

      Some of this is clearly a skill issue since I do feel myself getting better at prompting it and getting better results. However, I really do get the feeling that it either doesn't work or doesn't work as well on my code bases as other ones.


  • I work in Python, Swift, and Objective-C. AI tools work great in all of these environment. It's not just limited to web development.

    • I suppose saying that I've only seen it in web development is a bit of an exaggeration. It would be more accurate to say that I haven't seen any examples of people using AI on a codebase that looks like one of the ones I work on. Clearly I am biased and just lump all the types of coding I'm not interested in into "web development".


  • That’s my experience too. It also fails terribly with Elasticsearch, probably because the documentation doesn’t have a lot of examples. ChatGPT, Copilot and Claude were all useless for that and gave completely plausible nonsense. I’ve had the most success using it for writing unit tests and especially shell scripts.

Agreed. It isn’t like crypto, where the proponents proclaimed some value-proving use case that was always on the verge of arriving. AI is useful right now. People are using these tools now and enjoying them.

  • > Observer bias is the tendency of observers to not see what is there, but instead to see what they expect or want to see.

    Unfortunately, people enjoying a thing and thinking that it works well doesn't actually mean much on its own.

    But, more than that I suspect that AI is making more people realize that they don't need to write everything themselves, but they never needed to to begin with, and they'd be better off to do the code reuse thing in a different way.

  • I'm not sure that's a convincing argument given that crypto heads haven't just been enthusiastically chatting about the possibilities in the abstract. They do an awful lot of that, see Web3, but they have been using crypto.

  • Even in 2012 bitcoin could very concretely be used to order drugs. Many people have used it to transact and preserve value in hostile economic environments. Etc etc. Ridiculous comment.

    Personally i have still yet to find LLMs useful at all with programming.

  • I don't (use AI tools). I've tried them and found that they got in the way, made things more confusing, and did not get me to a point where the thing I was trying to create was working (let alone working well/safe to send to prod).

    I am /hoping/ that AI will improve, to the point that I can use it like Google or Wikipedia (that is, have some trust in what's being produced)

    I don't actually know anyone using AI right now. I know one person on Bluesky has found it helpful for prototyping things (and I'm kind of jealous of him because he's found how to get AI to "work" for him).

    Oh, I've also seen people pasting AI results into serious discussions to try and prove the experts wrong, but only to discover that the AI has produced flawed responses.

    • If you are interested, try the following experiment.

      Presuming you are logged into a Google account, open Gemini 2.5 and ask it “create a go program that connects one thread to a usb device and another thread to generate a GUI”. The results might surprise you.

    • Essentially the same for me. I had one incident where someone was arguing in favor of it and then immediately embarrassed themselves badly because they were misled by a ChatGPT error. I have the feeling that this hype will collapse as this happens more and people see how bad the consequences are when there are errors.

If AI gives a bad experience 20% of the time, and if there are 10M programmers using it, then about 3000 of them will have a bad experience 5 times in a row. You can't really blame them for giving up and writing about it.
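
Back-of-the-envelope, assuming a 20% failure rate per attempt and independent attempts:

  0.2^5 = 0.00032, and 0.00032 × 10,000,000 = 3,200 programmers with five bad experiences in a row.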

It’s all good to me - let these folks stay in the simple times while you and I arbitrage our efforts against theirs? I agree, there’s massive value in using these tools and it’s hilarious to me when others don’t see it. My reaction isn’t going to be to convince them they’re wrong, it’s just to find ways to use it to get ahead while leaving them behind.

I need some information/advice -> I feed that into an imprecise aggregator/generator of some kind -> I apply my engineering judgement to evaluate the result and save time by reusing someone's existing work

This _is_ something that you can do with AI, but it's something that a search engine is better suited to because the search engine provides context that helps you do the evaluation, and it doesn't smash up results in weird and unpredictable ways.

Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.

If I search for "refactor <something> to <something else>" and I get good results, that doesn't make the search engine capable of abstract thought.

  • AI is usually a better search engine than a search engine.

    • AI alone can't replace a search engine very well at all.

      AI with access to a search engine may present a more useful solution to some problems than a bare search engine, but the AI isn't replacing a search engine; it is using one.


  • This seems like a great example of someone reasoning from first principles that X is impossible, while someone else doing some simple experiments with an open mind can easily see that X is both possible and easily demonstrated to be so.

    > Y'all think that AI is "thinking" because it's right sometimes, but it ain't thinking.

    I know the principles of how LLMs work, I know the difference between anthropomorphizing them and not. It's not complicated. And yet I still find them wildly useful.

    YMMV, but it's just lazy to declare that anyone who sees it differently than you just doesn't understand how LLMs work.

    Anyway, I could care less if others avoid coding with LLMs, I'll just keep getting shit done.

Weird metaphor, because a gymgoer practices what they are doing by putting in the reps in order to increase personal capacity. It's more like you're laughing at people at the gym, saying "don't you know we have forklifts already lifting much more?"

  • That’s a completely different argument, however, and a good one to have.

    I can buy “if you use the forklift you’ll eventually lose the ability to lift weight by yourself”, but the author is going for “the forklift is actually not able to lift anything” which can trivially be proven wrong.

    • More like, "We had a nice forklift, but the boss got rid of it and replaced it with a pack of rabid sled dogs which work sometimes? And sometimes they can also sniff out expiration dates on the food (although the boxes were already labeled?). And, I'm pretty sure one of them, George, understands me when I talk to him because the other day I asked him if he wanted a hotdog and he barked (of course, I was holding a hotdog at the time). But, anyway, we're using the dogs, so they must work? And I used to have to drive the forklift, but the dogs just do stuff without me needing to drive that old forklift"

  • I see it as almost the opposite. It’s like the pulley has been invented but some people refuse to acknowledge its usefulness and make claims that you’re weaker if you use it. But you can grow quite strong working a pulley all day.

  • "If you want to be good at lifting, just buy an exoskeleton like me and all my bros have. Never mind that your muscles will atrophy and you'll often get somersaulted down a flight of stairs while the exoskeleton makers all keep trying, and failing, to contain the exoskeleton propensity for tossing people down flights of stairs."

It's the barstool economist argument style, on long-expired loan from medieval theology. Responding to clear empirical evidence that X occurs: "X can't happen because [insert 'rational' theory recapitulation]"

There are people at the gym I go to benching 95 lbs and asking what does it take to get to 135, or 225? The answer is "lift more weight" not "have someone help you lift more weight"

If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy. If you can bench 225 and then you stop doing it, you soon will not be able to do that anymore.

  • > If you already know how to code, yes AI/LLMs can speed you along at certain tasks, though be careful you don't let your skills atrophy.

    This isn't a concern. Ice-cutting skills no longer have value, and cursive writing is mostly a 20th century memory. Not only have I let my assembly language skills atrophy, but I'll happily bid farewell to all of my useless CS-related skills. In 10 years, if "app developer" still involves manual coding by then, we'll talk about coding without an AI partner like we talk about coding with punch cards.

    • Maybe. I've seen a lot of "in 10 years..." predictions come and go and I'm still writing code pretty much the same way I did 40 years ago: in a terminal, in a text editor.

I don’t think the argument from such a simple example does much for the author’s point.

The bigger risk is skill atrophy.

Proponents say, it doesn’t matter. We shouldn’t have to care about memory allocation or dependencies. The AI system will eventually have all of the information it needs. We just have to tell it what we want.

However, knowing what you want requires knowledge about the subject. If you’re not a security engineer you might not know what funny machines are. If someone finds an exploit using them you’ll have no idea what to ask for.

AI may be useful for some but at the end of the day, knowledge is useful.

I don't know. Cursor is decent at refactoring. ("Look at x and ____ so that it ____." With some level of elaboration, where the change is code or code organization centric.)

And it's okay at basic generation - "write a map or hash table wrapper where the input is a TZDB zone and the output is ______" will create something reasonable and get some of the TZDB zones wrong.
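
For a concrete flavor of that kind of request, a hand-written version might look like the Python sketch below; the output I picked (the zone's current UTC offset) is a hypothetical stand-in for the blank above, and it derives the mapping from zoneinfo rather than hardcoding a table, which is typically how a generated version ends up getting some zones wrong:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo, available_timezones

    def build_offset_map() -> dict[str, str]:
        """Map each TZDB zone name to its current UTC offset string, e.g. 'Europe/Paris' -> '+0200'."""
        now = datetime.now(timezone.utc)
        return {
            name: now.astimezone(ZoneInfo(name)).strftime('%z')  # offset like '+0200' or '-0500'
            for name in sorted(available_timezones())
        }

    offsets = build_offset_map()
    print(offsets['America/New_York'])  # '-0500' or '-0400' depending on DST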

But it hasn't been that great for me at really extensive conceptual coding so far. Though maybe I'm bad at prompting.

Might be there's something I'm missing w/ my prompts.

  • For me, the hard part of programming is figuring out what I want to do. Sometimes talking with an AI helps with that, but with bugs like “1 out of 100 times a user follows this path, a message gets lost somewhere in our pipeline”, which are the type of bugs that really require knowledge and skill, AIs are completely worthless.

Well said. It's not that there would not be much to seriously think about and discuss – so much is changing, so quickly – but the stuff that a lot of these articles focus on is a strange exercise in denial.

There really is a category of these posts that are coming from some alternate dimension (or maybe we're in the alternate dimension and they're in the real one?) where this isn't one of the most important things ever to happen to software development. I'm a person who didn't even use autocomplete (I use LSPs almost entirely for cross-referencing --- oh wait that's another thing I'm apparently never going to need to do again because of LLMs), a sincere tooling skeptic. I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".

  • > I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".

    Honestly, I'm curious why your experience is so different from mine. Approximately 50% of the time for me, LLMs hallucinate APIs, which is deeply frustrating and sometimes costs me more time than it would have taken to just look up the API. I still use them regularly, and the net value they've imparted has been overall greater than zero, but in general, my experience has been decidedly mixed.

    It might be simply that my code tends to be in specialized areas in which the LLM has little training data. Still, I get regular frustrating API hallucinations even in areas you'd think would be perfect use cases, like writing Blender plugins, where the documentation is poor (so the LLM has a relatively higher advantage over reading the documentation) and examples are plentiful.

    Edit: Specifically, the frustrating pattern is: (1) the LLM produces some code that contains hallucinated APIs; (2) in order to test (or even compile) that code, I need to write some extra supporting code to integrate it into my project; (3) I discover that the APIs were hallucinated because the code doesn't work; (4) now I not only have to rewrite the LLM's code, but I also have to rewrite all the supporting code I wrote, because it was based around a pattern that didn't work. Overall, this adds up to more time than if I had just written the code from scratch.

    • You're writing Rust, right? That's probably the answer.

      The sibling comment is right though: it matters hugely how you use the tools. There's a bunch of tricks that help and they're all kind of folkloric. And then you hear "vibe coding" stories of people who generate their whole app from a prompt, looking only at the outputs; I might generate almost my whole project from an LLM, but I'm reading every line of code it spits out and nitpicking it.

      "Hallucination" is a particularly uninteresting problem. Modern LLM coding environments are closed-loop ("agentic", barf). When an LLM "hallucinates" (ie: is wrong, like I am many times a day) about something, it figures it out pretty quick when it tries to build and run it!


    • One of the frustrating things about talking about this is that the discussion often sounds like we're all talking about the same thing when we talk about "AI".

      We're not.

      Not only does it matter what language you code in, but the model you use and the context you give it also matter tremendously.

      I'm a huge fan of AI-assisted coding, it's probably writing 80-90% of my code at this point, but I've had all the same experiences that you have, and still do sometimes. There's a steep learning curve to leveraging AIs effectively, and I think a lot of programmers stop before they get far enough along on that curve to see the magic.

      For example, right now I'm coding with Cursor and I'm alternating between Claude 3.7 max, Gemini 2.5 pro max, and o3. They all have their strengths and weaknesses, and all cost for usage above the monthly subscription. I'm spending like $10 per day on these models at the moment. I could just use the models included with the subscription, but they tend to hallucinate more, or take odd steps around debugging, etc.

      I've also got a bunch of documents and rules setup for Cursor to guide it in terms of what kinds of context to include for the model. And on top of that, there are things I'm learning about what works best in terms of how to phrase my requests, what to emphasize or tell the model NOT to do, etc.

      Currently I usually start by laying out as much detail about the problem as I can, pointing to relevant files or little snippets of other code, linking to docs, etc, and asking it to devise a plan for accomplishing the task, but not to write any code. We'll go back and forth on the plan, then I'll have it implement test coverage if it makes sense, then run the tests and iterate on the implementation until they're green.

      It's not perfect, I have to stop it and backup often, sometimes I have to dig into docs and get more details that I can hand off to shape the implementation better, etc. I've cursed in frustration at whatever model I'm using more than once.

      But overall, it helps me write better code, faster. I never could have built what I've built over the last year without AI. Never.


  • > tools that reliably turn slapdash prose into median-grade idiomatic working code

    This may be the crux of it.

    Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.

    I think I'm better at describing code in code than I am in prose.

    I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating Junior, who maybe I should help out because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.

> “It won’t deal with abstractions” -> try asking cursor for potential refactors or patterns that could be useful for a given text.

That is not what abstraction is about. Abstraction is having a simpler model to reason about, not simply rearranging code.

> “It doesn’t understand things beyond the code” -> try giving them an abstract jira ticket or asking what it thinks about certain naming, with enough context

Again, that is still pretty much coding. What matters is the overall design (or at least the current module).

> “Reading code and understanding whether it’s wrong will take more time than writing it yourself” -> ask any engineer that saves time with everything from test scaffolding to run-and-forget scripts.

Imagine having a script and not checking the man pages for expected behavior. I hope the backup games are strong.