Comment by caetris2

3 months ago

LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here. Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

The ideas here are not supported by any validated understanding of the limitations of language models. I want to be clear -- the kind of AI purportedly being used in the paper is something that has existed in video games for over two decades, akin to the NPCs in StarCraft or Diablo.

The key issue is that this is an intentional false claim that can certainly damage mainstream understanding of LLM safety and of what is possible at the current state of the art.

Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.

> LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here.

That's not what they said. They said that an LLM knows what elections are, which suggests it could have the requisite knowledge to act one out.

> Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

No, it doesn't. They aren't passing in all prior context at once: they are providing relevant subsets of memory as context. This is a common technique for language agents.
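
To make that concrete, here's a toy sketch of the retrieval-into-context pattern (everything below is made up for illustration, not the paper's code; real agent stacks score memories with embeddings, recency and importance as in Generative Agents, rather than plain word overlap):

```python
# Rough sketch of "relevant subsets of memory as context" (hypothetical names).
# Word overlap is used only to keep the example self-contained and runnable.
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    entries: list = field(default_factory=list)

    def remember(self, observation: str) -> None:
        self.entries.append(observation)

    def retrieve(self, query: str, k: int = 3) -> list:
        """Return the k stored memories that share the most words with the query."""
        words = set(query.lower().split())
        ranked = sorted(
            self.entries,
            key=lambda m: len(words & set(m.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(memory: AgentMemory, situation: str) -> str:
    # Only a small, relevant slice of memory ever enters the context window.
    relevant = memory.retrieve(situation)
    return (
        "Relevant memories:\n- " + "\n- ".join(relevant)
        + f"\n\nCurrent situation: {situation}\nWhat do you do next?"
    )


mem = AgentMemory()
mem.remember("Alice told me the town election is on Friday.")
mem.remember("I mined iron ore near the river yesterday.")
mem.remember("Bob is running for mayor and asked for my vote.")
print(build_prompt(mem, "Someone asks who I will vote for in the election."))
```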

> Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.

This is not new ground. Much of the base social behaviour here comes from Generative Agents [0], which they cite. Much of the Minecraft-related behaviour is inspired by Voyager [1], which they also cite.

There isn't a fundamental breakthrough or innovation here that was patently impossible before, or that they are lying about: this combines prior work, iterates upon it, and scales it up.

[0]: https://arxiv.org/abs/2304.03442

[1]: https://voyager.minedojo.org/

  • Voyager's claims that it's a "learning agent" and that it can "make new discoveries consistently without human intervention" are pretty much wrong, considering that part of that system is GPT's giant memory of ~~all~~ a lot of human knowledge (including how to play Minecraft, the most popular game ever made).

    In the same sense, saying LLMs are "not remembering the past" is wrong (especially when they're part of a larger system). This seems like claiming humans / civilizations don't have a "memory" because you've redefined long-term memory / repositories of knowledge like books so they don't count as "memory"?

    Or am I missing something??

Perhaps I've made a big assumption / oversimplification about how this works. But...

> LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here

Yes. I never said they were stateful? The context given is the state. And training data is hugely important. Once upon a time there was a guy who claimed ChatGPT could simulate a command-line shell. "Simulate" ended up being the wrong word; "largely hallucinate" was a more accurate description. Shell commands and sessions were for sure part of ChatGPT's training data, and that's how it could be prompted into largely hallucinating one. Same deal here with "election activities", I think.
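
In other words, something like this (a minimal sketch, with a made-up `call_llm` standing in for any stateless chat-completion endpoint): nothing persists inside the model between calls, so whatever history the caller chooses to resend is the only "memory" there is.

```python
# Sketch of "the context given is the state". call_llm is a hypothetical
# stand-in for a stateless chat-completion API; it returns a canned string
# so the example runs as-is.
def call_llm(messages: list) -> str:
    return "(model reply based only on the messages passed in)"

history = [{"role": "system", "content": "You are a villager in a simulated town."}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)  # the full history is resent on every call
    history.append({"role": "assistant", "content": reply})
    return reply

ask("The election is on Friday. Remember that.")
ask("When is the election?")  # answerable only because the first turn was resent

# Pass only the last question without `history` and the model has nothing to
# go on: that is the statelessness being described above.
```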

> Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

Well no, they can always trim the data put into the context. And then the agents would start "forgetting" things and the "election activities" would be pretty badly "simulated".
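
For illustration, trimming to a context budget can be as blunt as the sketch below (a made-up helper; word count stands in for a real tokenizer), and whatever falls outside the budget is exactly what the agent "forgets":

```python
# Hypothetical sketch of context trimming. Anything older than the budget
# allows is silently dropped from the prompt, i.e. "forgotten".
def trim_to_budget(memories: list, max_tokens: int = 50) -> list:
    kept, used = [], 0
    for m in reversed(memories):   # favour the most recent entries
        cost = len(m.split())      # crude token estimate
        if used + cost > max_tokens:
            break                  # older memories are cut off here
        kept.append(m)
        used += cost
    return list(reversed(kept))
```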

Honestly, I think you're right that the paper is misleading people into thinking the system is doing way more than it actually is. But you make it sound like the whole thing is made up and impossible. The reality is somewhere in the middle. Yes, they set up hundreds of agents; they give the agents data about the world, some memory of their interactions, and a system prompt saying what actions they can perform. This led to some interesting and surprising behaviours. No, this isn't intelligence, and it isn't much more than a fancy representation of what is in the model weights.

  • These are extremely hard problems to solve, and it is important for any claims to be validated at this early phase of generative AI.