Comment by iLoveOncall

6 days ago

It is all marketing. The easiest way to tell is that a year ago the same people said the inflection point was X or Y model.

When people claim LLMs just don't work for them, the first question is whether they're using the latest model, and if they aren't, the poster gets dismissed.

The thing is that the same question was being asked a year ago, and even a year before that, but about the very models that get you dismissed today.

Just run the experiment yourself: wait 6 months, say LLMs just aren't working for the software engineering that you do, and people will dismiss you if you say that you use Opus 4.5 and not the latest model, Claude MegaMind 8.8 pro max gigathinking. This despite Opus 4.5 being touted as the inflection point in this article.

I think it's because both sides are talking about different things. If you go in expecting it to be good enough to make developers obsolete today (a reasonable impression to get from the way a lot of people hype it), you will be disappointed, and after the first couple of tries every few months you will probably not bother much with the next generations. That's reasonable if you treat it as a dichotomy.

But a lot of people excited about new generations (including me, now) are not seeing it as a dichotomy but rather as a spectrum where models are getting better, and indeed once a year, or even every 6 months at times, there comes a sudden jump which feels like an inflection point compared to what came before. Practically, it's a tool like any other: you evaluate it based on whether the benefit you get is worth the effort and cost, and if it is and it has a good DX, you use it. If the calculation doesn't work for you, it doesn't.

For me, it has gone from a novelty, to good for some kinds of quick manual search, to "I guess it can debug some kinds of errors at times in very specific conditions", to "hey, I think I am getting a bit addicted to the autocomplete in the IDE, even if I don't use it for anything intelligent; it's becoming indispensable now, but only this part", to "it's good for areas I lack expertise in", to "agentic sucks, I will stick with discussing algorithms and architecture with it on greenfield projects", to "holy shit, it can do agentic decently well now, though I am skeptical of giving it access in more than limited cases", to now "I am getting close to letting it run free on my device in the not so distant future, I guess".

Some of these were big jumps, and at each point I was skeptical of further growth. Every time I thought the growth would slow down now, and yet we went from the days of 2k context windows to millions now, and from basic chat completion to working on complex adaptive systems, game-theoretic modelling, heuristics and constraint modelling, and other things I throw at it. I am still needed in the loop: it can be so smart at times and then will do something so stupid, but the frequency of stupidity is rapidly decreasing. I am still needed; I don't think it could have accomplished alone all that it has done for me. But I do at times lie awake at night reflecting on my self-worth, thinking about the potential day when I no longer add that value, when I have a harder time keeping up.

Also, had someone told me even as late as 2019 that in 2026 we would have NLP models doing what they do today, I would have dismissed it all as sci-fi, and here I am waking up in awe of the world we live in and how quickly we adapt.

  • You're completely twisting what I said. I've never talked about people claiming it's not making developers obsolete. We are obviously extremely far from that. I'm talking about people who say it doesn't work to build basic features in their projects correctly.

    Just take a look at this comment on a different topic, which lists all the prerequisites for those AI models to work well, from the perspective of someone who has bought into the hype: https://news.ycombinator.com/item?id=48157235

    If this is everything needed for an LLM to generate acceptable code, what is even the point of them?

    • Maybe we come from different cultures, and context is harder to grasp in text alone, so maybe for those reasons your response feels ruder than I hope it was intended to be.

      I am sorry for not being clear in my response, but I didn't intend to twist your words, and I am not sure where I did so. My response was intended as a more general remark on the kind of discourse I see on this topic: I think both sides are right from the context they are looking at it from, and that is also why I think both sides come out of this discussion exhausted by the other. I'm not discounting the presence of bad actors, but generally I think most people are engaging in good faith, as you probably are.

      Coming specifically to your last response, I don't think one needs all of these prerequisites to get value out of LLMs. In fact, LLMs have helped me untangle some very messy balls of mud on projects where we previously deemed it not worth the effort and basically carried some codebases as legacy. Now we can write enough tests to feel confident and do a port against those tests, all in the span of a few days, which we found impressive.

      Now having said all this, I think I understand your perspective a bit better on your original comment.

      While it's a very versatile hammer, if it doesn't work for your use case, that's all fine. I just think that a bit more patience in honing it could maybe help you find areas where it could work for you. If not, cheers!

    • That's a list of like 6 things. And each of those is a less complicated question than the seven thousand questions people throw at you when you complain about something not working right on a Linux distro, or about speeding up build times for a new tool, or configuring webpack, or like pretty much any software tool. What lint rules are you using? Are you using poetry or uv? Are you running on Mac, Windows, Linux, or WSL? How are your security groups configured in AWS? Some tools are more plug and play, but it's quite the stretch to say that asking "how is your code organized, do you have your agents.md config file set up, do you have tests, and how large is the codebase" is some sort of unmanageable list of questions for a software engineer to think through when figuring out wtf is going on with some new tooling they're using.

My take is there was one big inflection point around Opus 4.5, when they got the agentic stuff working, and now whether or not it works depends on whether your use case/area of software engineering is profitable enough for the companies to have spent a bunch of money generating synthetic data to RL on, or whether it's similar enough to areas they've done that for, with "similar enough" being a very loose constraint given how much overlap there is in a lot of coding fundamentals. Tbh if the models aren't working for you now, I don't think they're gonna be working for you in 6 months.