Comment by lufenialif2

16 hours ago

I sent this to accounting friends and this aligns with what I've been going through trying to use LLMs to create a game from scratch. Seems like the current best use case for language models (even with agent mode) is to feed them exactly what you want to get out, essentially turning them into a better autocomplete. Still saves tons of time, but it isn't a panacea.

this is exactly right. remember, these models were trained to be functions: f(x) = y. that's an interface at its heart. when x and y are language, then it's a translator.

they have emergent capabilities, like "translating" instructions/questions in X into the probable answers in Y, but i think people are getting way, way ahead of themselves with those. these things still fundamentally can't think, and we can try to mimic thinking with scaffolding, but then you're just going to learn the bitter lesson again.
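
To make the f(x) = y framing concrete, here is a minimal sketch (Python; the `model` callable is a hypothetical stand-in for whatever completion API you actually use, not a specific library): the model is just a string-to-string function, and both "translation" and the "better autocomplete" use from the parent comment are the same interface with different prompts.

```python
# Illustrative sketch of "LLM as a function": f(x) = y where x and y are text.
# `LLM` is any callable mapping a prompt string to a completion string;
# plug in your own API wrapper, nothing below assumes a specific provider.
from typing import Callable

LLM = Callable[[str], str]  # f: prompt -> completion

def translate(model: LLM, text: str, target_language: str) -> str:
    # "Translation" in the broad sense: language in X mapped to language in Y.
    return model(f"Translate the following into {target_language}:\n{text}")

def autocomplete(model: LLM, partial_code: str) -> str:
    # The "better autocomplete" use: spell out exactly what you want,
    # then let the model fill in the remainder.
    return model(f"Complete this code, keeping the existing parts unchanged:\n{partial_code}")
```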

I'm not even sure it saves a ton of time to be honest. It sure _feels_ like I spend more time writing up tasks and researching/debugging hallucinations than just doing the thing myself.

  • This is consistently my experience too; I'm seriously just baffled by reports of time saved. I think it costs me more time cleaning up its mistakes than it saves me by solving my problems.

    • i've found that the shorter the "task horizon", the more time saved

      essentially, a longer horizon increases the chances of mistakes, which increases the time needed to find and fix them. so at some point that outweighs the time saved by not doing it myself (rough numbers sketched below, after this thread)

      this is why i'm not bullish on AI agents. the task horizon is too long and too dynamic

    • There's really pernicious stuff I've noticed cropping up too, over the months of use.

      Not just subtle bugs, but unused variables (with names that seem to indicate some important use), comments that don't accurately describe the lines of code they precede, and other things that feel very 'uncanny.'

      The problem is, the code often looks really good at first glance. Generally LLMs produce well-structured code with good naming conventions, etc.

    • I think people are doing one of several things to get value:

      1. Use it for research and prototyping, aka throwaway stuff.

      2. Use it for studying an existing, complex project. More or less read-only or very limited writes.

      3. Use it for simple stuff they don't care much about and can validate quickly and reasonably accurately, the standard examples being CLI scripts and GUI layouts.

      4. Segment the area in which the LLM works very precisely. Small functions, small modules, ideally with tests added from another source.

      5. Boilerplate.

      There can be a lot of value in those areas.

    • The reports of time saved are so cooked it's not funny. It's just part of the overall AI grift going on; the actual productivity gains will shake out over the next couple of years, we just gotta live through the current "game changer" and "paradigm shifting event" nonsense the upper management types and VCs are pushing.

      When I see stuff like "Amazon saved 4500 dev years of effort by using AI", I know it's on stuff we would have used automation for anyway, so it's not really THAT big of a difference over what we've done in the past. But it sounds better if we pretend we can compare AI solutions to literally having thousands of developers write Java SDK upgrades manually.
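
A rough way to put numbers on the task-horizon point made earlier in this thread (illustrative only; the 95% per-step success rate below is an assumed figure, not a measurement): if each step has some chance of being wrong, the chance that a long chain of steps comes out clean shrinks multiplicatively, which is why review and fix-up time can end up swamping the time saved.

```python
# Back-of-the-envelope illustration of the task-horizon argument above.
# The 0.95 per-step success rate is an assumption chosen for illustration.

def chance_all_steps_correct(p_step: float, n_steps: int) -> float:
    # Treating mistakes as roughly independent, success compounds multiplicatively.
    return p_step ** n_steps

for n in (1, 5, 20, 50):
    print(f"{n:>2} steps: {chance_all_steps_correct(0.95, n):.0%} chance of no mistakes")
# Prints roughly: 95%, 77%, 36%, 8% respectively.
```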

I feel it essentially does save a lot of time in bookkeeping, but it doesn't negate the need for a human bookkeeper who knows what they're doing.