
Comment by Retric

5 days ago

These are all really arbitrary metrics across such wildly different fields. IMO LLMs are where computer vision was 20+ years ago in terms of real-world accuracy. Other people feel LLMs offer far more value to the economy, etc.

I understand the temptation to compare LLMs and computer vision, but I think it’s misleading to equate generative AI with feature-identification or descriptive AI systems like those in early computer vision. LLMs, which focus on generating human-like text and reasoning across diverse contexts, operate in a fundamentally different domain than descriptive AI, which primarily extracts patterns or features from data, like early vision systems did for images.

Comparing their 'real-world accuracy' oversimplifies their distinct goals and applications. While LLMs drive economic value through versatility in language tasks, their maturity shouldn’t be measured against the same metrics as descriptive systems from decades ago.

  • I don’t think it’s an oversimplification, as accuracy is what constrains LLMs across so many domains. If you’re a wealthy person, asking ChatGPT to write a prenup or other contract you actually intend to use would be an act of stupidity unless you vetted it with an actual lawyer. My most desired use case is closer, but LLMs are still more than an order of magnitude below the reliability I’m willing to tolerate.

    IMO that’s what maturity means in AI systems. Self-driving cars aren’t limited by the underlying mechanical complexity; it’s all about the long quest for a system that makes reasonably correct decisions hundreds of times a second, for years, across widely varying regions and weather conditions. Individual cruise missiles, on the other hand, only needed to operate over a single short, pre-mapped flight in specific conditions, which is why they could use visual navigation decades earlier.

    • You're conflating two different questions. I'm not arguing LLMs are mature or reliable enough for high-stakes tasks. My argument is about why they produce output that creates the illusion of understanding in the language domain, while the same techniques applied to other domains (video generation, molecular modeling, etc.) don't produce anything resembling 'understanding' despite comparable or greater effort.

      The accuracy problems you're describing actually support my point: LLMs navigate linguistic structures effectively enough to fool people into thinking they understand, but they can't verify their outputs against reality. That's exactly what you'd expect from a system that only has access to the map (language) and not the territory (reality).
