Comment by roenxi

6 hours ago

Just because something didn't work out doesn't mean it was a waste, and it isn't particularly clear that the LLM boom was wasted, or that it is over, or that it isn't working. I can't figure out what people mean when they say "AGI" any more; we appear to be past that. We've got something that seems to be general and seems to be more intelligent than an average human. Apparently AGI means a sort of Einstein-Tolstoy-Jesus hybrid that can ride a unicycle and is far beyond the reach of most people I know.

Also, if anyone wants to know what a real effort to waste a trillion dollars can buy ... https://costsofwar.watson.brown.edu/

> Just because something didn't work out doesn't mean it was a waste

It's all about scale.

If you spend $100 on something that didn't work out, that money wasn't wasted if you learned something amazing. If you spend $1,000,000,000,000 on something that didn't work out, the expectation is that you learn something close to 10,000,000,000x more than from the $100 spend. If the value of the learning is several orders of magnitude less than the level of investment, there is absolutely tremendous waste.
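
Spelling out the naive proportionality behind that claim (a back-of-the-envelope sketch, nothing more):

    # If learning scaled linearly with spend, a $1T experiment would owe
    # us ten billion times the lessons of a $100 one.
    small_spend = 100
    big_spend = 1_000_000_000_000
    print(big_spend // small_spend)  # 10_000_000_000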

For example: nobody would count spending a billion dollars on a failed project as valuable if all the learning did was help you avoid future paper cuts.

  • It's not waste; it's a way to get rid of excess liquidity caused by massive money-printing operations.

We currently have human-in-the-loop AGI.

While it doesn't seem we can agree on a meaning for AGI, I think a lot of people think of it as an intelligent entity that has 100% agency.

Currently we need to direct LLMs from task to task. They don't yet possess the capability of full real-world context.

This is why I get confused when people talk about AI replacing jobs. It can replace work, but you still need skilled workers to guide it. To me, this could result in humans being even more valuable to businesses, and in even greater demand for labor.

If this is true, individuals need to race to learn how to use AI and use it well.

  • > Currently we need to direct LLMs from task to task.

    Agent-loops that can work from larger-scale goals work just fine. We can't let them run with no oversight, but we certainly also don't need to micro-manage every task. Most days I'll have 3-4 agent-loops running in parallel, executing whole plans, that I only check in on occasionally.

    I still need to review their output occasionally, but I certainly don't direct them from task to task.

    I do agree with you that we still need skilled workers to guide them, so I don't think we necessarily disagree all that much, but we're past the point where they need to be micromanaged.
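
    In pseudocode terms, the pattern is roughly this (a minimal sketch; every name here is a hypothetical stand-in, not any particular framework):

      # Hypothetical agent loop: a high-level goal goes in, the agent drafts
      # and executes its own plan, and a human reviews only periodically.
      def agent_loop(goal, plan_fn, execute_fn, review_every=5):
          plan = plan_fn(goal)               # agent drafts its own task list
          for i, task in enumerate(plan, 1):
              execute_fn(task)               # works unattended on each step
              if i % review_every == 0:      # occasional human check-in
                  input(f"Reviewed through step {i}? Press Enter to continue.")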

  • If we can't agree on a definition of AGI, then what good is it to say we have "human-in-the-loop AGI"? The only folks that will agree with you will be using your definition of AGI, which you haven't shared (at least in this posting). So, what is your definition of AGI?

> We've got something that seems to be general and seems to be more intelligent than an average human.

We've got something that occasionally sounds as if it were more intelligent than an average human. However, if we stick to areas of interest of that average human, they'll beat the machine in reasoning, critical assessment, etc.

And in just about any area, an average human will beat the machine wherever a world model is required, i.e., a generalized understanding of how the world works.

This isn't to criticize the usefulness of LLMs. Yet broad statements that an LLM is more intelligent than an average Joe are necessarily misleading.

I like how Simon Wardley assesses how good the most recent models are. He asks them to summarize an article or a book he's deeply familiar with (his own or someone else's). It's a test of trust: if he can't trust the summary of material he knows, he can't trust a summary of material that's foreign to him either.
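
A rough sketch of running that trust test yourself, assuming the OpenAI Python SDK (the model name and file path are placeholders):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Placeholder: a text you know inside out, e.g. your own article.
    article = open("my_own_article.txt").read()

    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Summarize this article in five bullet points:\n\n" + article,
        }],
    )

    # The actual test happens in your head: does the summary match
    # what you, the expert on this text, know is really in it?
    print(resp.choices[0].message.content)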

AI capabilities today are jagged, and people see in them what they want to see.

Boosters: it can answer PhD-level questions and it helps me a lot with my software projects.

Detractors: it can't learn to do a task it doesn't already know how to do.

Boosters: But it actually can sometimes do things it otherwise couldn't, if you give it lots of context and instructions.

Detractors: I want it to be able to actually figure out and retain the context itself, without being given detailed instructions every time, and do so reliably.

Boosters: But look, in this specific case it sort of does that.

Detractors: But not in my case.

Boosters: you're just using it wrong. There must be something wrong with your prompting strategy or how you manage context.

etc etc etc...

AFAICT "AGI" is a placeholder for peoples fears and hopes for massive change caused by AI. The singularity, massive job displacement, et cetera.

None of this is a binary, though. We already have AGI that is superhuman in some ways and subhuman in others. We are already using LLMs to help improve themselves. We already have job displacement.

That continuum is going to continue. AI will become more superhuman in some ways, but likely stay subhuman in others. LLMs will help improve themselves. Job displacement will increase.

Thus the question is whether this rate of change will be fast or slow. It seems mundane, but it's a big deal. Humans can adapt to slow changes, but not so well to fast ones. Thus AGI is a big deal, even if it's a crap stand-in for the things people actually care about.

> Just because something didn't work out doesn't mean it was a waste

Here I think it's more about opportunity cost.

> I can't figure out what people mean when they say "AGI" any more, we appear to be past that

What I ask of an AGI is to not hallucinate idiotic stuff. I don't care too much about being bullshitted if the bullshit is at least logical, but when I ask "fix mypy errors using pydantic", instead of declaring a type for a variable it invents weird algorithms that make no sense and don't work (and the fix would have taken 5 minutes for any average dev). I mean, Claude 4.5 and Codex have replaced my sed/search-and-replaces, write my sanity tests, write my commit comments, write my migration scripts (and most of my scripts), and make refactoring so easy that I now do a refactor every month or so. But if that is AGI, I _really_ wonder what people mean by intelligence.
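
To be concrete about the kind of five-minute fix meant here (a hypothetical illustration, not the actual code in question): a typical mypy complaint like `Need type annotation for "events"` is resolved by annotating the variable with a pydantic model. One line, no new algorithm required:

    from pydantic import BaseModel

    class Event(BaseModel):
        name: str
        payload: dict

    # mypy flags a bare `events = []` with: Need type annotation for "events"
    # The five-minute fix is the annotation below, not a rewrite:
    events: list[Event] = []

    def record(name: str, payload: dict) -> None:
        events.append(Event(name=name, payload=payload))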

> Also, if anyone wants to know what a real effort to waste a trillion dollars can buy

100% agree. Please, Altman, Ilya, and others: I will happily let you use whatever money you want if that money is taken from war profiteers and warmongers.

> Just because something didn't work out doesn't mean it was a waste

One thing to keep in mind is that most of the people who go around spreading unfounded criticism of LLMs, "Gen-AI", and AI in general aren't usually very deep into computer science, and understand science itself even less. In their mind, if someone does an experiment and it doesn't pan out, they'll assume that means "science itself failed", because they literally don't know how research and science work in practice.

  • Maybe true in general, but Gary Marcus is an experienced researcher and entrepreneur who’s been writing about AI for literally decades.

    I’m quite critical, but I think we have to grant that he has plenty of credentials and understands the technical nature of what he’s critiquing quite well!

    • Yeah, my comment was mostly about the ecosystem at large rather than a specific dig at this particular author. I mostly agree with your comment.

> Just because something didn't work out doesn't mean it was a waste, and it isn't particularly clear that the LLM boom was wasted, or that it is over, or that it isn't working

Agreed. Has there been waste? Inarguably. Has the whole thing been a waste? Absolutely not. There are lessons from our past that, in an ideal world, would have allowed us to navigate this much more efficiently and effectively. However, if we're being honest with ourselves, that's been true of any nascent technology (especially hyped ones) for as long as we've been recording history. The path to success is paved with failure, hindsight is 20/20, history rhymes, and all that.

> I can't figure out what people mean when they say "AGI" any more

We've been asking "What is intelligence?" (and/or sentience) for as long as we've been alive, and still haven't come to a consensus. Plenty of people will confidently claim they have an answer, which is great, but it's entirely irrelevant if there's no broad consensus on that definition or a well-defined way to verify AI/people/anything against it. Case in point...

> we appear to be past that. We've got something that seems to be general and seems to be more intelligent than an average human

Hard disagree, specifically as regards intelligence. They are certainly useful utilities when you use them right, but I digress. What are you basing that on? How can we be sure we're past a goal-post when we don't even know where the goal-post is? For starters, how much is speed (or latency, or IOPS/TPS, or however you wish to contextualize it) a function of "intelligence"? For a tangible example: if an AI came to a conclusion derived from 100 separate sources, and a human manually went through those same 100 sources and came to the same conclusion, is the AI more intelligent by virtue of completing that task faster? I can absolutely see (and agree with) how that is convenient/useful, but the question specifically is: does the speed at which it can provide answers (assuming both are correct/the same) make it smarter than, or as smart as, the human?

How do they rationalize and reason their way through new problems? How do we humans? How important is the reasoning, the "how" by which it arrives at answers, if those answers are correct? For a tangible example: what is happening when you ask an AI to compute the sum of 1 plus 1? What are we doing when we perform the same task? What about proving the answer correct? More broadly, in the context of AGI/intelligence, does it matter if the "path of reason" differs, so long as the answers are correct?

What about how confidently it presents those answers (correct or not)? It's well known that we humans are incredibly biased towards confidence. Personally, I might start buying into the hype the day that AI starts telling me "I'm not sure" or "I don't know." Ultimately, until I can trust it to tell me it doesn't know/isn't certain, I won't trust it when it tells me it does know/is certain, regardless of how "correct" it may be. We'll get there one day, and until then I'm happy to use it for the utility and convenience it provides while doing my part to make it better and more useful.

Eh, tearing down a straw man is not an impressive argument from you either.

As a counter-point, LLMs still produce embarrassing amounts of hallucinations, some of them quite hilarious. When that is gone, and they start doing web searches -- or have any mechanism that mimics actual research when they don't know something -- then the agents will be much closer to whatever most people imagine AGI to be.

Have LLMs learned to say "I don't know" yet?

  • > Have LLMs learned to say "I don't know" yet?

    Can they, fundamentally, do that? That is, given the current technology.

    Architecturally, they don't have a concept of "not knowing." They can say "I don't know," but that simply means it was the most likely continuation given the training data (the toy sketch at the end of this comment illustrates the point).

    A perfect example: an LLM citing chess rules and still making an illegal move: https://garymarcus.substack.com/p/generative-ais-crippling-a...

    Heck, it can even say the move would have been illegal. And it would still make it.
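
    A minimal toy sketch of why, assuming nothing but a next-token distribution (all the tokens and scores below are made up): the model's only output channel is "which continuation is likeliest," so "I don't know" carries no special epistemic status.

      import math

      # Made-up logits for the next move; this is the model's entire "belief".
      logits = {"e4": 2.1, "Nf3": 1.8, "I don't know": 0.3}

      def softmax(scores):
          z = sum(math.exp(v) for v in scores.values())
          return {k: math.exp(v) / z for k, v in scores.items()}

      probs = softmax(logits)
      print(max(probs, key=probs.get))  # greedy decoding picks the likeliest token

      # Nothing here encodes "knowing" vs "not knowing": "I don't know" is
      # emitted only if training happened to make it the likeliest string.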

  • > When that is gone and it starts doing web searches -- or it has any mechanisms that mimic actual research when it does not know something

    ChatGPT and Gemini (and maybe others) can already perform and cite web searches, and it vastly improves their performance. ChatGPT is particularly impressive at multi-step web research. I have also witnessed them saying "I can't find the information you want" instead of hallucinating.

    It's not perfect yet, but it's definitely climbing human percentiles in terms of reliability.

    I think a lot of LLM detractors are still thinking of 2023-era ChatGPT. If everyone tried the most recent pro-level models with all the bells and whistles then I think there would be a lot less disagreement.

    • Well, please don't include me in some group of Luddites or something.

      I use the mainstream LLMs and I've noted them improving. They still have a ways to go.

      I was objecting to my parent poster's implication that we have AGI. However muddy that definition is, I don't feel that we do.