Comment by tombert

1 day ago

I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.

Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well; we just need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.

> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, with Antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.

It can one shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it work on.

  • You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.

    We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.

    There are two types of people: The first group learns to work within these limits and adapt to using them where they’re helpful while writing the code when they’re not.

    The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.

    • Someone else said this perfectly farther down:

      > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

      As I’ve said, I use LLMs, and I use tools that are assisted by LLMs. They help. But they don’t work anywhere near as reliably as people talk about them working. And that hasn’t changed in the 18 months since I first prompted v0 to make me a website.

  • Sure, but think about what it's replacing.

    If you hired a human, it would cost you thousands a week. Humans also fail at basic tasks and get stuck in useless loops, and you still have to pay them for all that time.

    For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.

    They're not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if they misfire fairly often, they can still be useful.

    I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.

    • I am an AI skeptic, but I would agree this looks impressive from certain angles, especially if you're an early startup (maybe) or you are very high up the chain and just want to focus on cutting costs. On the other hand, if you are about to be unemployed, this is less impressive. Can it replace a human? I would say no, there's still a long way to go, but a good salesman can convince executives that it can, and that's all that matters.

    • You’ve missed my point here - I agree that gen AI has changed everything and is useful, _but_ I disagree that it’s improved substantially - which is what the comment I replied to claimed.

      Anecdotally I’ve seen no difference from model changes in the last year, but going from a plain LLM to Claude Code (where we told the LLMs they can use tools on our machines) was a game changer. The improvement there was the agent loop and the support for tools.

      In 2023 I asked v0.dev to one shot me a website for a business I was working on and it did it in about 3 minutes. I feel like we’re still stuck there with the models.

  • There’s a subtle point, a moment, when you HAVE to take the wheel back from the AI. All the issues I see come from people insisting on using it far beyond the point where it stops being useful.

    It is a helper, a partner; it is still not ready to go the last mile.

    • It's funny how many people don't get that. It's like adding a pretty great senior- or staff-level engineer to sit on call next to every developer and assist them, for basically free (I've never used any of the expensive stuff yet, just things like Copilot, Grok Code in JetBrains, or asking Gemini to write bits of code for me).

      If you hired a staff engineer to sit next to me, and I just had them write 100% of the code and never tried to understand it, that would be an unwise decision on my part, and I'd have little room to complain about the times they made mistakes.

    • As someone else said in this thread:

      > The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

      I’m perfectly happy to write code, and to use these tools. I do use them, and sometimes they work (well). Other times they have catastrophic failures. But apparently it's my failure for not understanding the tool, or for expecting too much of it, while others are screaming from the rooftops about how this new model changes everything (which happens every 3 months at this point).

  • > We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.

    If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.

    The rest of us learn how to be productive with them despite these problems.

    • > If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.

      I struggle to take comments like this seriously - yes, it is very reasonable to expect these magical tools to copy and paste something without alterations. How on earth is that an unreasonable ask?

      The whole discourse around LLMs is so utterly exhausting. If I say I don't like them for almost any reason, I'm a luddite. If I complain about their shortcomings, I'm just using it wrong. If I try and use it the "right" way and it still gets extremely basic things wrong, then my expectations are too high.

      What, precisely, are they good for?

  • >Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding

    I haven't heard that at all. I hear about models that come out and are a bit better. And other people saying they suck.

    >Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits.

    Is it bringing you any value? I find it speeds things up a LOT.

  • I have a hard time believing that this v0, from 2023, achieved comparable results to Gemini 3 in web design.

    Gemini now often produces output that looks significantly better than what I could produce manually, and I'm a web expert, although my expertise is more in tooling and package management.

  • Frankly, I think the 'latest' generation of models from a lot of providers, which switch between 'fast' and 'thinking' modes, are really just the 'latest' because they encourage users to use cheaper inference by default. In ChatGPT I still trust o3 the most. It gives me fewer flat-out wrong or nonsensical responses.

    I'm suspecting that once these models hit 'good enough' for ~90% of users and use cases, the providers started optimizing for cost instead of quality, but still benchmark and advertise for quality.

We implement pretty cool workflows at work using "GenAI", and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.

>Generative AI, as we know it, has only existed ~5-6 years

Probably less than that, practically speaking. ChatGPT's initial release date was November 2022, so it's closer to 3 years in terms of any significant number of people using them.

I don't think LLMs are an abject failure, but I find it equally odd that so many people think that transformer-based LLMs can be incrementally improved to perfection. It seems pretty obvious to me now that we're not gonna RLHF our way out of hallucinations. We'll probably need a few more fundamental architecture breakthroughs to do that.

> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.

I think the big problem is that the pace of improvement was UNBELIEVABLE for about 4 years, and it appears to have plateaued to almost nothing.

ChatGPT has barely improved in, what, 6 months or so.

They are driving costs down incredibly, which is not nothing.

But, here's the thing, they're not cutting costs because they have to. Google has deep enough pockets.

They're cutting costs because - at least with the current known paradigm - the cost is not worth it to make material improvements.

So unless there's a paradigm shift, we're not seeing MASSIVE improvements in output like we did in the previous years.

You could see costs go down to 1/100th over 3 years, seriously.

But they need to make money, so it's possible none of that will be passed on.

  • I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve, though I don't think AGI is going to happen any time soon.

    I have no idea what this is called, but it feels like a lot of people assume that progress will continue at a linear pace forever, when I think progress is generally closer to a "staircase" shape. A new invention or discovery will lead to a lot of really cool new inventions and discoveries in a very short period of time; eventually people exhaust the low-to-middle-hanging fruit, and progress kind of levels out.

    I suspect it will be the same way with AI; I don't know if we've reached the top of our current plateau, but if not, I think we're getting fairly close.

    • Yes, I've read about something like this before - like the jump from living in 1800 to 1900: you go from no electricity at home to having electricity at home, for example. The jump from 1900 to 2000 is much less groundbreaking for the electricity example - you have more appliances and more reliable electricity, but it's nothing like the jump from candle to light bulb.

  • They are focused on reducing costs in order to survive. Pure and simple.

    Alphabet / Google doesn’t have that issue. OAI and other money losing firms do.

>and is likely to keep improving.

I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.

  • Totally reasonable question, and I am only making an assumption based on observed progress. AI-generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think there's still more room for improvement.

    I will acknowledge that I don't have any evidence for this claim, so maybe the word "likely" was unwise, as it suggests probability. Feel free to replace "is likely to" with "it feels like it will".

Because the likes of Altman have set short term expectations unrealistically high.

  • I mean that's every tech company.

    I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".

    The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single out AI. I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing, and they do get rewarded for it because it often bumps the stock price.

    Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so, I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.

  • I maintain that most anti-AI sentiment is actually anti-lying-tech-CEO sentiment misattributed.

    The technology is neat, the people selling it are ghouls.

    • Exactly: the technology is useful, but the executive class is hyping it as close to AGI because their buddies are slavering for layoffs. If that “when do you get fired?” tone weren't behind the conversation, I think a lot of people would be interested in applying LLMs to the smaller subset of things they actually perform well at.

    • I hate the Anthropic guy so much... when I see his face it just brings back all the nonsense lies and "predictions" he makes. Altman is kind of the same, but for some reason Dario takes the cake.

You're saying the same thing cryptobros say about bitcoin right now, and that's 17 years later.

It's a business, but it won't be the thing the first movers thought it was.

  • It’s different in that Bitcoin was never useful in any capacity when it was new. AI is at least useful right now and it’s improved considerably in the last few years.