Comment by caetris2

3 months ago

I've reviewed the paper and I'm confident it is built on a collection of false claims. The claims made are not genuine and should not be taken at face value without peer review. In many cases, when you vet the provided charts and graphics against the claims they are supposed to support, they look like sophisticated forgeries.

It is currently not possible for any kind of LLM to do what is being proposed. Maybe the intentions are good with regard to commercial interests, but I want to be clear: this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation. These kinds of claims require substantial evidence, and that was not provided.

The prompts that are provided are not in any way connected to the applied usage of LLMs that is described.

I don't think you understood the paper.

The "election" experiment was a prefined scenario. There isn't any "coordination" of election activities. There were preassigned "influencers" using the conversation system built into PIANO. The sentiment was collected automatically by the simulation and the "Election Manager" was another predefined agent. Specically this part of the experiment was designed to look at how the presence or absence of specific modules in the PIANO framework would affect the behavior.

> this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation

I mean, that's surely within the training data of LLMs? The effectiveness etc. of the election activities is likely very low, but I don't think it's outside the realm of possibility that the agents prompted each other into the parts of the LLM's latent space to do with elections.

  • LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here. Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

    The ideas here are not supported by any kind of validated understanding of the limitations of language models. I want to be clear -- the kind of AI that is purported to be used in the paper is something that has been in video games for over two decades, akin to Starcraft's or Diablo's NPCs.

    The key issue is that this is an intentional false claim that can certainly damage mainstream understanding of LLM safety and of what is possible at the current state of the art.

    Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.

    • > LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here.

      That's not what they said. They said that an LLM knows what elections are, which suggests they could have the requisite knowledge to act one out.

      > Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

      No, it doesn't. They aren't passing in all prior context at once: they are providing relevant subsets of memory as context. This is a common technique for language agents (there's a rough sketch at the end of this comment).

      > Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.

      This is not new ground. Much of the base social behaviour here comes from Generative Agents [0], which they cite. Much of the Minecraft related behaviour is inspired by Voyager [1], which they also cite.

      There isn't a fundamental breakthrough or innovation here that was patently impossible before, or that they are lying about: this combines prior work, iterates upon it, and scales it up.

      [0]: https://arxiv.org/abs/2304.03442

      [1]: https://voyager.minedojo.org/
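
      To make "relevant subsets of memory as context" concrete, here's a rough sketch of the kind of retrieval loop language agents commonly use. This isn't the paper's code; keyword overlap stands in for the embedding similarity a real system would use:

      ```python
      # Rough sketch: the agent keeps a long memory log, but only the few most
      # relevant entries are put into each LLM call's context window.
      # Keyword overlap stands in for the embedding similarity a real agent would use.
      import re

      STOPWORDS = {"the", "a", "i", "me", "is", "for", "in", "to", "who", "should"}

      def words(text: str) -> set[str]:
          """Lowercased content words, punctuation and stopwords removed."""
          return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS

      def score(memory: str, query: str) -> float:
          """How relevant a stored memory is to the current situation/query."""
          q = words(query)
          return len(words(memory) & q) / (len(q) or 1)

      def build_context(memories: list[str], query: str, k: int = 3) -> str:
          """Pick the top-k relevant memories to include in the prompt."""
          top = sorted(memories, key=lambda m: score(m, query), reverse=True)[:k]
          return "\n".join(top)

      memories = [
          "Alice told me she supports raising the tax rate.",
          "I mined 12 iron ore near the river.",
          "The election for village leader is tomorrow.",
          "Bob complained that the tax chest is almost empty.",
      ]
      # Only the election/tax memories make it into the prompt; the mining memory is
      # left out, so the full history never has to fit in the context window at once.
      print(build_context(memories, "Who should I vote for in the tax election?"))
      ```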


    • Perhaps I've made a big assumption / oversimplification about how this works. But..

      > LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here

      Yes. I never said they were stateful? The context given is the state. And training data is hugely important. Once upon a time there was a guy that claimed ChatGPT could simulate a command line shell. "Simulate" ended up being the wrong word. "Largely hallucinate" was a more accurate description. Shell commands and sessions were for sure part of the training data for ChatGPT, and that's how it could be prompted into largely hallucinating one. Same deal here with "election activities" I think.

      > Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

      Well no, they can always trim the data put into the context. And then the agents would start "forgetting" things and the "election activities" would be pretty badly "simulated". (There's a rough sketch of that kind of trimmed-context loop at the end of this comment.)

      Honestly, I think you're right that the paper is misleading people into thinking the system is doing way more than it actually is. But you make it sound like the whole thing is made up and impossible. The reality is somewhere in the middle. Yes, they set up hundreds of agents: they give the agents data about the world, some memory of their interactions, and a system prompt saying what actions they can perform. This led to some interesting and surprising behaviours. No, this isn't intelligence, and it isn't much more than a fancy representation of what is in the model weights.
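
      To make "the context given is the state" and the trimming point concrete, here's a rough sketch of a stateless loop where the only memory is the history that gets re-sent (and truncated) every call. `call_llm` is a placeholder, not any particular API:

      ```python
      # Rough sketch: the model keeps no state between calls. The agent's "memory"
      # is just the history we choose to re-send each turn, and trimming that
      # history down to a budget is exactly how the agent "forgets" things.
      MAX_CONTEXT_CHARS = 2000  # crude stand-in for a token budget

      def call_llm(prompt: str) -> str:
          """Placeholder for a real model call; any real API is stateless too."""
          return f"(model reply based on {len(prompt)} chars of context)"

      def trim(history: list[str], budget: int) -> list[str]:
          """Drop the oldest turns until the history fits within the budget."""
          while history and sum(len(turn) for turn in history) > budget:
              history = history[1:]   # the earliest memory is simply gone now
          return history

      history: list[str] = []
      for user_turn in ["Who is running in the election?", "Remind me what I said earlier."]:
          history.append(f"User: {user_turn}")
          history = trim(history, MAX_CONTEXT_CHARS)   # older turns fall out entirely
          reply = call_llm("\n".join(history))         # the whole remaining history is re-sent
          history.append(f"Agent: {reply}")
          print(reply)
      ```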


For others, it's probably worth pointing out that this person's account is about a day old and they have left no contact information for the authors of the paper to follow up with them.

For "caetris2" I'll just use the same level of rigor and authenticity that you used in your comment when I say "you're full-of-shit/jealous and clearly misunderstood large portions of this paper".

Yeah, I haven't looked into this much so far but I am extremely skeptical of the claims being made here. For one agent to become a tax collector and another to challenge the tax regime without such behavior being hard coded would be extremely impressive.

  • They were assigned roles to examine the spread of information and behaviour. The agents pay tax into a chest, as decreed by the (dynamic) rules. There are agents assigned to the roles of pro- and anti-tax influencers; agents in proximity to these influencers would change their own behaviour appropriately, including voting for changes in the tax. (There's a toy sketch of that kind of mechanism at the end of this comment.)

    So yes, they didn't take on these roles organically, but no, they weren't aiming to do so: they were examining behavioral influence and community dynamics with that particular experiment.

    I'd recommend skimming over the paper; it's a pretty quick read and they aren't making any truly outrageous claims IMO.
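
    For anyone who wants the proximity-influence mechanism spelled out, here's a toy sketch of that kind of setup. It is not the paper's code, and every name and number is made up:

    ```python
    # Toy sketch: preassigned influencers nudge the sentiment of nearby agents,
    # and sentiment determines how an agent votes on the tax rules.
    # Purely illustrative; not PIANO or the paper's implementation.
    import math
    from dataclasses import dataclass

    @dataclass
    class Villager:
        name: str
        x: float
        y: float
        tax_sentiment: float = 0.0     # -1 anti-tax .. +1 pro-tax
        influencer_bias: float = 0.0   # preassigned: +0.5 pro, -0.5 anti, 0 otherwise

    INFLUENCE_RADIUS = 10.0

    def tick(villagers: list[Villager]) -> None:
        """Agents near an influencer drift toward that influencer's stance."""
        for v in villagers:
            for other in villagers:
                if other.influencer_bias and other is not v:
                    if math.dist((v.x, v.y), (other.x, other.y)) < INFLUENCE_RADIUS:
                        v.tax_sentiment += 0.1 * other.influencer_bias

    def vote(villagers: list[Villager]) -> str:
        """The rules change when a majority's sentiment tips one way."""
        pro = sum(1 for v in villagers if v.tax_sentiment > 0)
        return "raise tax" if pro > len(villagers) / 2 else "lower tax"

    villagers = [
        Villager("pro_influencer", 0, 0, tax_sentiment=+1.0, influencer_bias=+0.5),
        Villager("anti_influencer", 50, 50, tax_sentiment=-1.0, influencer_bias=-0.5),
        Villager("ann", 2, 1),
        Villager("ben", 48, 51),
        Villager("cal", 3, 2),
    ]
    for _ in range(5):
        tick(villagers)
    print(vote(villagers))   # "raise tax": more villagers sit near the pro-tax influencer
    ```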

  • You can imagine a conversation with an LLM getting into that territory pretty quickly if you pretend to be an unfair tax collector. It sounds impressive on the surface, but in the end it's all LLMs talking to each other, and they'll emit whatever completions are likely given the context.