Comment by joelthelion

3 days ago

> When do you expect that impact? I think the models seem smarter than their economic impact would imply.

> Yeah. This is one of the very confusing things about the models right now.

As someone who's been integrating "AI" and algorithms into people's workflows for twenty years, the answer is actually simple. It takes time to figure out how exactly to use these tools, and integrate them into existing tooling and workflows.

Even if the models don't get any smarter, just give it a few more years and we'll see a strong impact. We're just starting to figure things out.

No doubt LLMs and tooling will continue to improve, and the best use cases for them will become better understood. But what Ilya seems to be referring to is the massive disconnect between headline-grabbing benchmarks such as "AI performs at PhD level on math" and the real-world stupidity of these models, such as his example of a coding agent toggling between generating bug #1 and bug #2. That disconnect largely explains why the current economic and visible impact is much less than it would be if the "AI is PhD level" benchmark narrative were actually true.

Calling LLMs "AI" makes them sound much more futuristic and capable than they actually are, and such a vague term invites extrapolation to equally vague terms like AGI and to visions of human-level capability.

Let's call LLMs what they are - language models - tools for language-based task automation.

Of course we eventually will do this. Fuzzy, meaningless names like AI/AGI will always be reserved for the cutting-edge technology du jour, while older tech that is recognized in hindsight to be much more limited reverts to more specific names such as "expert system", "language model", etc.

  • There is actually an interesting scenario in this disconnect that we are experiencing. Maybe "real" AGI, in the sense of intelligence that self-corrects effectively like a human, is still a long way off. Maybe we will be stuck with the kind of ever-improving but still somewhat deficient LLM intelligence we have right now.

    There are tons of use cases even for such a limited type of intelligence. No, it is not a million math PhDs at your disposal. It is a narrow intelligence that is still hugely useful, and businesses will need a few years to adapt. The impact on areas like customer service, with LLM + RAG + triggering actions, is very close already and should transform the industry in the next few years.

    • Yes - LLMs are useful, even if auto-regressively trained GPTs aren't the answer to human intelligence. Outside of software development (and maybe there too), it seems we're still very early in companies trying to figure out what LLMs can and cannot usefully be used for.

      It seems the LLM companies generating all the hype (mostly OpenAI & Anthropic) may be shooting themselves in the foot a bit here, raising false expectations of what LLMs can do, or soon will be able to do, and thereby encouraging all the misapplication and failed corporate projects that are currently happening. Anthropic are talking out of both sides of their mouth here, saying that AGI is imminent and about to replace developers and remote workers, yet acknowledging that the technology and use-case selection are so fickle that corporations aren't likely to be successful without 1-on-1 guidance from Anthropic.

      The mythical AGI, an artificial human, will presumably be transformative if/when it ever arrives, but even granting that we're still in the early days of LLM adoption, it's not clear that LLMs themselves will be. Developers get a new tool to use, consumers get a new frustrating AI customer service to deal with, corporate e-mails, marketing literature and PowerPoints become enshittified LLM-generated AI slop, etc. Maybe the biggest "transformative" (widely felt) impact of LLMs is chatbots and AI search, but people seem to be taking that in their stride, and it's not obvious that the experience and impact from that is going to change much going forwards.

  • > the real-world stupidity of these models such as his example of a coding agent toggling between generating bug #1 vs bug #2, which in fact largely explains why the current economic and visible impact is much less than if the "AI is PhD level" benchmark narrative was actually true.

    This could have been true in the past, but in recent weeks I've started to trust top AI models more and more, and the PhDs I work with less. The quality jump is very real imo.

    • Are you a mathematician? I'm not an expert in the math field, but it seems like mathematicians are hitting the same issues everyone else has: current LLMs still more or less need to be supervised by an expert, and they struggle to do something actually novel or to build out a complicated proof correctly.


Could this be a problem not with AI, but with our understanding of how modern economies work?

The assumption here is that employees are already tuned to be efficient, so if you help them complete tasks more quickly then productivity improves. A slightly cynical alternate hypothesis could be that employees are generally already massively over-provisioned, because an individual leader's organisational power is proportional to the number of people working under them.

If most workers are already spending most of their time doing busy-work to pad the day, then reducing the amount of time spent on actual work won't change the overall output levels.

  • You're describing the "fake email jobs" theory of employment. Given that there are way fewer fake email jobs in China, does this imply that China will benefit more from AI? I think it might.

    • As China's population gets older and more middle class, is this shifting to be more like America?

      I really don’t know and am curious.

  • This is a part of it indeed. Most people (and even a significant number of economists) assume that the economy is somehow supply-limited (and it doesn't help that most econ 101 classes introduce markets as a way of managing scarcity), but in reality demand is the limit in 90-ish% of cases.

    And when it's not, supply generally doesn't increase as much as it could, because suppliers expect to be demand-limited again at some point and don't want to invest in overcapacity.

    • Agreed. If you "create demand", it usually just means people are spending on the thing you provide, and consequently less on something else. Ultimately it goes back to a few basic needs, something like Maslow's hierarchy of needs.

      And then there's followup needs, such as "if I need to get somewhere to have a social life, I have a need for transportation following from that". A long chain of such follow-up needs gives us agile consultants and what not, but one can usually follow it back to the source need by following the money.

      Startup folks like to highlight how they "create value", they added something to the world that wasn't there before and they get to collect the cash for it.

      But assuming that population growth will eventually stagnate, I find it hard not to ultimately see it all as a zero-sum game. Limited people with limited time and money means limited demand. What companies ultimately do is fight each other for that. And when the winners emerge and the dust settles, supply can come back down to meet demand.


  • Varies depending on the field and company. Sounds like you may be speaking from your own experiences?

    In medicine, we're already seeing productivity gains from AI charting leading to an expectation that providers will see more patients per hour.

    • > In medicine, we're already seeing productivity gains from AI charting leading to an expectation that providers will see more patients per hour.

      And not, of course, an expectation of more minutes of contact per patient, which would be the better outcome optimization for both provider and patient. Gotta pump those numbers until everyone but the execs is an assembly-line worker in activity and pay.


  • It is the delusion of the Homo Economicus religion.

    I think the problem is a strongly tied network of inefficiency, so vast across economic activity that it will take a long time to erode and replace.

    The reason it feels like things are moving slowly is the delusion that the economy is made up of a network of Homo Economicus agents who would instantaneously adopt the efficiencies of automated intelligence.

    It is actually a network of human beings who, having finite lives, care about their own existence and don't have much to gain from economic activity changing at that speed.

    That is different though than the David Graeber argument. A fun thought experiment that goes way too far and has little to do with reality.

AI makes the parts of my work that I spend the least time on a whole lot quicker, but (so far / still) has negligible effects on the parts of my work that I spend the most time on.

I'm still not sure if this is due to a technological limitation or an organizational one. Most of my time is not spent on solving tech problems but rather solving "human-to-human" problems (prioritization between things that need doing, reaching consensus in large groups of people of how to do things that need doing, ...)

Oh yes, this is 100% accurate.

Very often, when designing an ERP or other system, people think: "This is easy, I just do XYZ and I am done." Then you find that there are many corner cases. XYZ can be split into phases, you might need to add approvals, logging, data integrations... and what was a simple task becomes 10 tasks.

In the first year of CompSci at uni, a teacher told us something I still remember: every system is 90% finished 90% of the time. He was right.

Yeah but it's just one model.

Call it Dave. Now Microsoft hires Dave, and OpenAI hires Dave. And Meta hires Dave, and Oracle hires Dave, and the US govt hires Dave. And soon each of those has hired not just one Dave but 50 identical copies of Dave.

It doesn't matter if Dave is a smart-ish, OK guy. That's not the problem with this scenario. The problem is that the only thing on the market is Dave and people who think exactly like Dave thinks.

  • That seems like a valid problem, and it was also mentioned in the podcast: 50 copies of Ilya, Dave or Einstein will have diminishing returns. I think the proposed solution is ongoing training, making them individuals, so that MS Dave will be a different individual than Dave.gov. But then why don't we just train humans in the first place?

That's exactly how I feel about it. In the end, the product companies like OpenAI will reap the monetary benefits of the academic advances.

You integrate, you build the product, you win; you don't need to understand anything in terms of academic disciplines, you need the connections and the business smarts. In the end, the majority of the population will be much more familiar with the terms ChatGPT and Copilot than with the names behind them, even though academic behemoths such as Ilya and Andrej are quite prominent in their public appearances.

For the general population, I believe it all began with search over knowledge graphs. Wikipedia presented a dynamic and vibrant corpus. NLP began to become more prominent. With OCR, more and more printed works were digitized, and the corpus kept growing. With the opening of the gates of scientific publishers, the quality may have improved as well. All of this was part of the grunt work that made today's LLMs capable, while the growth of cloud DCs and compute advancements made deep nets more and more feasible. This is just an arbitrary observation of the surface of the pieces that fell into place. And LLMs are likely just another composite piece of something bigger yet to come.

To me, that’s the fascination of how scientific theory and business applications live in symbiosis.

As someone who is building an LLM-powered product on the side, using AI coding agents to help with development of said LLM-powered product and for my day job, and has a long-tail of miscellaneous uses for AI, I suspect you're right.

Beyond that the smartness is very patchy. They can do math problems beyond 99% of humans but lack the common sense understanding to take over most jobs.

  • Yep, the lack of common sense is sometimes very evident.

    For instance, one of these popular generative AI services refused to remove a copyright watermark from an image when asked directly. Then I told it that the image had weird text artifacts on it and asked it to remove them. That worked perfectly.

  • Most jobs involve complex long-term tasks, which isn't something that comes naturally to LLMs.

Yeah, I spend most of my time keeping up with current AI development these days, and I'm only scratching the surface of how to integrate it in my own business. For people for whom it's not their actual job, it will take a lot more time to figure out even which questions to ask about where it makes sense to integrate in their workflows.

Another limitation that I see right now is that for "economic impact" you want the things to have initiative and some agency, and there is well-justified hesitancy in providing that even where possible.

Having a bunch of smart developers that are not allowed to do anything on their own and have to be prompted for every single action is not too advantageous if everyone is human, either ;)

  • A screwdriver doesn't have agency, but it certainly helps me get tasks done faster. AIs don't need agency to accelerate a ton of work.

    • I did not mean to imply that AI isn't helpful already.

      But a screw-driving assistant is more useful if he drives in screws on his own than if you have to prompt his every action. I'm not saying that a "dumb" assistant does not help at all.

We're also still at a point where security is a big question mark. My employer won't let us hook GenAI up to Office 365 or Slack, so any project- or product-management use of GenAI first requires manually importing docs into a database and pointing to that. Efficiency gains are hard to come by when you can't meet people where their "knowledge" is already stored.
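For what it's worth, the "manually import docs into a database and point to that" workaround is roughly this shape in miniature. This is a toy sketch only: the bag-of-words "embedding" and all the names here are made-up stand-ins for a real embedding model and vector store.

```python
# Toy sketch of a hand-rolled doc store for retrieval, standing in for the
# "import docs into a database and point the model at that" workflow.
# The bag-of-words embedding and all names are hypothetical placeholders.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts (a real setup would call an
    # embedding model here).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class DocStore:
    """Stand-in for the manually populated database the docs get copied into."""
    def __init__(self) -> None:
        self.docs: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.docs.append((text, embed(text)))

    def query(self, question: str, k: int = 1) -> list[str]:
        # Return the k docs most similar to the question.
        qv = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = DocStore()
store.add("Q3 roadmap: ship the billing migration by October")
store.add("Slack escalation policy: page on-call after 15 minutes")
context = store.query("when is the billing migration due?")[0]
# `context` would then be pasted into the model prompt, by hand or by a
# thin wrapper, since the model can't reach the real systems directly.
```

The manual import step is exactly the `store.add(...)` calls: every doc has to be copied out of Office 365 or Slack by hand before the model can see it, which is where the efficiency gains leak away.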

> Even if the models don't get any smarter, just give it a few more years and we'll see a strong impact. We're just starting to figure things out.

2 years? 15 years? It matters a lot to people, the stock market and governments.