Comment by roenxi

2 months ago

We do have AI systems that write stories [0]. They work. The quality might not be spectacular but if you've ever gone out and spent time reading fanfiction you'd have to agree there are a lot of rather terrible human writers too (bless them). It still hits this issue that if we want LLMs to compete with the best of humanity then they aren't there yet, but that means defining human intelligence as something that most people don't have access to.

> If you ask an LLM to analyze a conversation from the internet it will misrepresent the positions of the participants, often restating things so that they mean something different or making mistakes about who said what in a way that humans never do.

AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

[0] https://github.com/google-deepmind/dramatron

5 comments

roenxi

impossiblefork 2 months ago

I haven't tried Dramatron, but my experience is that it isn't possible to do sensibly. With regard to the second part

>AI transcription & summary seems to be a strong point of the models so I don't know what exactly you're trying to get to with this one. If you have evidence for that I'd actually be quite interested because humans are so bad at representing what other people said on the internet it seems like it should be an easy win for an AI. Humans typically have some wild interpretations of what other people write that cannot be supported from what was written.

Transcription and summarization is indeed fine, but try posting a longer reddit or HN discussion you've been part of into any model of your choice and ask it to analyze it, and you will see severe errors very soon. It will consistently misrepresent the views expressed and it doesn't really matter what model you go for. They can't do it.

roenxi 2 months ago
I can see why they'd struggle, I'm not sure what you're trying to ask the model to do. What type of analysis are you expecting? If the model is supposed to represent the views expressed that would be a summary. If you aren't asking it for a summary what do you want it to do? Do you literally mean you want the model to perform conversational analysis (ie, https://en.wikipedia.org/wiki/Conversation_analysis#Method)?
- impossiblefork 2 months ago
  
  Usually I use the format "Analyze the following ...".
  For simple discussions this is fine. For complex discussions, especially when people get into conflict-- whether that conflict is really complex or not, problems usually result. The big problems are that the model will misquote or misrepresent views-- attempted paraphrases that actually change the meaning, the ordinary hallucinations etc.
  For stories the confusion is much greater. Much of it is due to the basic way LLMs work: stories have dialogue, so if the premise contains people not being able to speak each other's language problems come very soon. I remember asking some recent Microsoft Copilot variant to write some portal scenario-- some guys on vacation to Teneriffe rent a catamaran and end up falling through a hole in the world of ASoIAF and into the seas off Essos, where they obviously have a terrible time, and it kept forgetting that they don't know English.
  This is of course not obviously relevant for what Copilot is intended for, but I feel that if you actually try this you will understand how far we are from something like AGI, because if things like OpenAIs or whoever's systems were in fact close, this would be close too. If we were close we'd probably see silly errors too, but it'd be different kinds of errors, things like not telling you the story you want, not ignoring core instructions or failing to understand conversations.
  
  2 replies →