
Comment by simonw

6 days ago

tptacek wasn't making this argument six months ago.

LLMs get better over time. In doing so they occasionally hit points where things that didn't work start working. "Agentic" coding tools that run commands in a loop hit that point within the past six months.
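
To make "run commands in a loop" concrete, here is a minimal sketch of the pattern; complete() and the RUN:/DONE: convention are hypothetical stand-ins, not any particular tool's API:

    import subprocess

    def complete(transcript: str) -> str:
        """Hypothetical LLM call: returns either a shell command prefixed
        with 'RUN:' or a final answer prefixed with 'DONE:'."""
        raise NotImplementedError("wire this up to a model provider")

    def agent_loop(task: str, max_steps: int = 20) -> str:
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            reply = complete(transcript)
            if reply.startswith("DONE:"):
                return reply.removeprefix("DONE:").strip()
            if reply.startswith("RUN:"):
                cmd = reply.removeprefix("RUN:").strip()
                # Feed the command's output back into the context so the
                # model can react to compiler errors, failing tests, etc.
                out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                transcript += f"\n$ {cmd}\n{out.stdout}{out.stderr}\n"
            else:
                transcript += f"\n{reply}\n"
        return "Gave up after max_steps."

The loop itself is trivial; what changed recently is that models got good enough to act usefully on the command output fed back to them.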

If your mental model is "people say they got better every six months, therefore I'll never take them seriously because they'll say it again in six months time" you're hurting your own ability to evaluate this (and every other) technology.

> tptacek wasn't making this argument six months ago.

Yes, but other smart people were making this argument six months ago. Why should we trust the smart person we don't know now if we (looking back) shouldn't have trusted the smart person before?

Part of evaluating a claim is evaluating the source of the claim. For basically everybody, the source of these claims is always "the AI crowd", because those outside the AI space have no way of telling who is trustworthy and who isn't.

  • If you automatically lump anyone who argues that AI is capable - not even good for the world on net, just useful for some tasks - into "the AI crowd", you will tautologically never hear that argument from anywhere else. But if you've been paying attention to software development discussion online for a few years, you've plausibly heard of tptacek and kentonv, e.g. from their prior work. If you haven't heard of them in particular, no judgement, but you gotta have someone you can classify as credible independently of their AI take if you want to be able to learn anything at all from other people on the subject.

  • Thomas is one of the pickier, crankier, least faddish technologists I've ever met. If he has gone fanboy, that holds a lot of weight with me.

  • Part of being on Hacker News is learning that there are people in this community - like tptacek - who are worth listening to.

    In general, part of being an effective member of human society is getting good at evaluating who you should listen to and who is just hot air. I collect people who I consider to be credible and who have provided me with useful information in the past. If they start spouting junk I quietly drop them from my "pay attention to these people" list.

But they say "yes, it didn't work 6 months ago, but it does now", and they say this every month. They're constantly moving the goalposts.

Today it works; it didn't in the past, but it does now. Rinse and repeat.

  • It doesn’t really matter what this or that person said six months ago or what they are saying today. This morning I used Cursor to write something in under an hour that would previously have taken me a couple of days. That is what matters to me. I gain nothing from posting about my experience here. I’ve got nothing to sell and nothing to prove.

    You write like this is some grand debate you are engaging in and trying to win. But to people on what you see as the other side, there is no debate. The debate is over.

    You drag your feet at your own peril.

    • The thing about claims like “An LLM did something for me in an hour that would take me days” is that people conveniently leave out their own skill level.

      I’ve definitely seen humans do stuff in an hour that takes others days to do. In fact, I see it all the time. And sometimes, I know people who have skills to do stuff very quickly but they choose not to because they’d rather procrastinate and not get pressured to pick up even more work.

      And some people waste even more time writing stuff from scratch when libraries exist for whatever they’re trying to do, which could get them up and running quickly.

      So really I don’t think these bold claims of LLMs being so much faster than humans hit as hard as some people think they do.

      And here’s the thing: unless you’re using the time you save to fill yourself up with even more work, you’re not really making productivity gains, you’re just using an LLM to acquire more free time on the company dime.


  • This is only a compelling counter-argument if you are referring to a single, individual person who says this repeatedly - and there probably are such people! But the author of this article is not that person, and is also speaking to a very specific loop that first became truly prevalent 6-9 months ago.

  • I don’t think this is true, actually. There was a huge shift in LLM coding ability with the release of Claude 3.5 Sonnet. That was a real shift in how people started using LLMs for coding. Before that it was more of a novelty, not something people used a lot for real work. As someone who is not a software engineer, as of about November 2024 I “write” hundreds of lines of code a day to get meaningful work done.

  • "they say this every month" But I think the commenter is saying "they" comprises many different people, and they can each honestly say, at different times, "LLMs just started working". I had been loving LLMs for solving NLP since they came out, and playing with them all the time, but in my field I've only found them to improve productivity earlier this year (gemini 2.5).

  • Why focus on the six months, or however long you think the cycle is? The milestones of AI coding are self-explanatory: autocomplete (shit) -> multi-file edits (useful for simple cases) -> agents (a feedback loop with RAG & tool use), which is where we are now.

    Really think about it and ask yourself: is it possible that AI can make any, ANY, work a little more efficient?

  • I don't really get this argument. Technology can be improving, can't it? You're just saying that people saying it's improving isn't a great signal. Maybe not, but you still can't conclude that the tech isn't improving, right? If you're old enough, you'll remember the internet was very much hyped. Al Gore was involved. But it's probably been every bit as transformative as promised.

    • Technology improving is not the issue.

      1. LLM fanboy: "LLMs are awesome, they can do x, y, and z really well."

      2. LLM skeptic: "OK, but I tried them and found them wanting for doing x, y, and z"

      3. LLM fanboy: "You're doing it wrong. Do it this way ..."

      4. The LLM skeptic goes to try it that way, and still finds it unsatisfactory. A few months pass...

      5. LLM fanboy: "Hey, have you tried model a.b.c-new? The problems with doing x, y, and z have now been fixed" (implicitly now agrees that the original complaints were valid)

      6. LLM skeptic: "What the heck, I thought you denied there were problems with LLMs doing x, y, and z? And I still have problems getting them to do it well"

      7. Goto 3


I stopped paying attention for a few days so I'm way out of date. What is the state of the art for agentic coding now?

I've been using Cline and it can do a few of the things suggested as "agentic", but I'd have no idea how to leave it writing and then running tests in a VM and creating a PR for me to review. Or let it roam around in the file tree and create new files as needed. How does that work? Are there better tools for this? Or do I need to configure Cline in some way?

  • tptacek is using Zed, which I've not tried myself.

    I actually do most of my "agentic coding" (not a fan of the term, but whatever) in ChatGPT Code Interpreter, which hasn't changed much in two years other than massive upgrades to the model it uses - I run that mainly via o4-mini-high or o3 these days.

    OpenAI's Codex is a leading new thing, but only if you pay $200/month for it. Google's equivalent https://jules.google/ is currently free.

    GitHub Copilot gained an "agent mode" recently: https://github.blog/ai-and-ml/github-copilot/agent-mode-101-...

    There's also Copilot Coding Agent, which is confusingly an entirely different product: https://github.blog/changelog/2025-05-19-github-copilot-codi...

    • I'd be quite interested in a more formal post with a detailed analysis of the effectiveness of the different agent implementations, including Claude Code and JetBrains Junie.

      Do you use ChatGPT Code Interpreter because it's better, or is it just something you're more familiar with and you're sticking with it for convenience?

      Of course, I don't know how one would structure a suitable test, since doing it sequentially would likely bias the later agents with clearer descriptions & feedback on the tasks. I imagine familiarity with how to prompt each particular model is also a factor.


  • The current state of agents in the latest batch of launches (Copilot Agent, Jules, Devin...) is to take over and do the work in a PR, as you describe. However, the jury is still out on whether these implementations prove more useful than agentic coding in an IDE.

Have the models significantly improved, or have we just developed new programs that take better advantage of them?

  • Both, but it’s mostly the models. Programs like Claude Code are actually simpler than their predecessors because of this.

[flagged]

  • How is this "appeal to authority"?

    I'm countering the argument that "people say the same thing every six months" by arguing that the individual in question did NOT say the same thing six months ago.