Comment by handoflixue

14 days ago

> Our main experiment is a round-trip relay with N = 10 consecutive round-trips per environment, simulating 20 delegated interactions. In each interaction, the model receives all work environment documents as text in its context window in a single turn

So the LLM isn't given an actual file system to work with? It's expected to receive the documents as text in the prompt, perform a task, and then re-output the full text into the conversation?

Maybe I'm misunderstanding the methodology, but this feels a lot like the human game of Telephone - or perhaps like asking someone to do a similar editing task using only Microsoft Outlook with copy/paste disabled.

I'd imagine that one gets radically different results if one uses the appropriate desktop tools, just like humans do much better outside games of Telephone.
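
To make the Telephone worry concrete, here's a toy sketch of the relay as I understand it from the quote: 10 round-trips, 2 interactions each, with the whole document re-emitted every time. The names `relay`, `faithful` and `slightly_lossy` are mine, not the paper's, and the "model" is just a stand-in function, so treat this as an illustration of how small per-pass losses compound rather than a claim about the actual setup.

```python
from typing import Callable, Tuple


def relay(document: str, model: Callable[[str], str], round_trips: int = 10) -> Tuple[str, int]:
    """Pass the full document text through `model` twice per round trip."""
    interactions = 0
    for _ in range(round_trips):
        for _ in range(2):  # one leg out, one leg back
            # The entire text is re-output each time; nothing is edited in place.
            document = model(document)
            interactions += 1
    return document, interactions


def faithful(text: str) -> str:
    # Hypothetical perfect copier: returns the text unchanged.
    return text


def slightly_lossy(text: str) -> str:
    # Hypothetical stand-in for a small copying error: drops the last character.
    return text[:-1]


original = "x" * 100
final_doc, count = relay(original, slightly_lossy)
print(count, len(original), len(final_doc))
```

With an in-place editing tool, a pass that changes nothing costs nothing. Here every pass is a fresh chance to lose something, even when the task itself touches only a small part of the document.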