Comment by impossiblefork

7 hours ago

LLMs may appear to do well on certain programming tasks on which they are trained intensively, but they are incredibly weak. If you try to use an LLM to generate, for example, a story, you will find that it will make unimaginable mistakes. If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do. The longer the exchange the more these problems are exacerbated.

We are incredibly far from AGI.

We do have AI systems that write stories [0]. They work. The quality might not be spectacular but if you've ever gone out and spent time reading fanfiction you'd have to agree there are a lot of rather terrible human writers too (bless them). It still hits this issue that if we want LLMs to compete with the best of humanity then they aren't there yet, but that means defining human intelligence as something that most people don't have access to.

> If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do.

AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

[0] https://github.com/google-deepmind/dramatron

  • I haven't tried Dramatron, but my experience is that it isn't possible to do sensibly. With regard to the second part

    >AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

    Transcription and summarization is indeed fine, but try posting a longer reddit or HN discussion you've been part of into any model of your choice and ask it to analyze it, and you will see severe errors very soon. It will consistently misrepresent the views expressed and it doesn't really matter what model you go for. They can't do it.

> We are incredibly far from AGI.

This and we don't actually know what the foundation models are for AGI, we're just assuming LLMs are it.

This seems distant from my experience. Modern LLMs are superb at summarisation, far better than most people.