Comment by impossiblefork

2 months ago

Usually I use the format "Analyze the following ...".

For simple discussions this is fine. For complex discussions, especially when people get into conflict-- whether that conflict is really complex or not, problems usually result. The big problems are that the model will misquote or misrepresent views-- attempted paraphrases that actually change the meaning, the ordinary hallucinations etc.

For stories the confusion is much greater. Much of it is due to the basic way LLMs work: stories have dialogue, so if the premise contains people not being able to speak each other's language problems come very soon. I remember asking some recent Microsoft Copilot variant to write some portal scenario-- some guys on vacation to Teneriffe rent a catamaran and end up falling through a hole in the world of ASoIAF and into the seas off Essos, where they obviously have a terrible time, and it kept forgetting that they don't know English.

This is of course not obviously relevant for what Copilot is intended for, but I feel that if you actually try this you will understand how far we are from something like AGI, because if things like OpenAIs or whoever's systems were in fact close, this would be close too. If we were close we'd probably see silly errors too, but it'd be different kinds of errors, things like not telling you the story you want, not ignoring core instructions or failing to understand conversations.

2 comments

impossiblefork

jfreds 2 months ago

Your points about misquotes and language troubles are very valid and interesting. But a word of caution on your prompt: you’re asking a lot of the word “analyze” here; if the LLM responded that the thread had 15 comments by 10 unique authors, and a total of 2000 characters, I would classify that as a completely satisfactory answer (assuming the figures were correct) based on the query

roenxi 2 months ago

> Usually I use the format "Analyze the following ...".

It doesn't surprise me that you're getting nonsense, that is an ill-formed request. The AI can't fulfil it because it isn't asking it to do anything. I'm in the same boat as an AI would be, I can't tell what outcome you want. I'd probably interpret it as "summarise this conversation" if someone asked that of me, but you seem to agree that AI are good at summery tasks so that doesn't seem like it would be what you want. If I had my troll hat on I'd give you a frequency analysis of the letters and call it a day which is more passive-aggressive than I'd expect of the AI, they tend to just blather when they get a vague setup. They aren't psychic, it is necessary to give them instructions to carry out.