Comment by handoflixue
14 days ago
> Our main experiment is a round-trip relay with N = 10 consecutive round-trips per environment, simulating 20 delegated interactions. In each interaction, the model receives all work environment documents as text in its context window in a single turn
The LLM isn't being given an actual file system it can work with - it's expected to receive the documents as text in the prompt, perform a task, and then re-output the text into the conversation?
Maybe I'm misunderstanding the methodology, but this feels a lot like the human game of Telephone - or perhaps like asking someone to do a similar editing task using only Microsoft Outlook with copy/paste disabled.
I'd imagine that one gets radically different results if one uses the appropriate desktop tools, just like humans do much better outside games of Telephone.