I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
I spent quite a bit of time building a multi-agent simulation last year and wound up at the same conclusion every day - this is all just a roundabout form of prompt engineering. Perhaps it is useful as a mental model, but you can flatten the whole thing to a few SQL tables and functions. Each "agent" is essentially a SQL view that maps stored state through a string template to form the prompt.
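To make the "agent as a view" point concrete, here's a minimal sketch (all names and fields hypothetical) of an agent that is nothing but stored state plus a template that renders the next prompt:

```python
# Hypothetical sketch: each "agent" is just a row of state plus a
# string template that renders the next prompt. Nothing more.
agents = {
    "miner": {
        "goal": "collect iron ore",
        "memory": ["found a cave at (10, 64, -3)"],
    },
}

PROMPT_TEMPLATE = (
    "You are an agent whose goal is: {goal}\n"
    "Recent memories:\n{memory}\n"
    "What do you do next?"
)

def render_prompt(agent_id: str) -> str:
    """The 'view': flatten stored state into the next LLM prompt."""
    state = agents[agent_id]
    return PROMPT_TEMPLATE.format(
        goal=state["goal"],
        memory="\n".join(f"- {m}" for m in state["memory"]),
    )

print(render_prompt("miner"))
```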
I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process. There is clearly no "inner world" in these LLMs, so trying to entertain them with a rich outer environment seems pointless.
TBH I haven't seen a single use of LLMs in games that wasn't better served by traditional algorithms beyond less repetitive NPC interactions. Maybe once they get good enough to create usable rigged and textured meshes with enough control to work in-game? They can't create a story on the fly that's reliable enough to be a compelling accompaniment to a coherent game plot. Maps and such don't seem to need anything beyond what current procedural algorithms provide, and they're still working with premade assets— the implementations I've seen can't even reliably place static meshes on the ground in believable positions. And as far as NPCs go— how far does that actually go? It's pure novelty worth far less than an hour of time. Let's even say you get a guided plot progression worded on the fly using an LLM, is that even as good, let alone better, than a dialog tree put together by a professional writer?
This Civ idea at least seems like a somewhat new approach, but it still doesn't seem to add much conceptually. Even if it doesn't, learning that is still worthwhile. But almost universally these ideas seem to be either buzzwordy solutions in search of problems, or a cheaper-than-people source of creativity with some serious quality tradeoffs that still requires far too much developer wrangling to actually save money.
I'm a tech artist so I'm a bit biased towards the value of human creativity, but also likely the primary demographic for LLM tools in game dev. I am, so far, not compelled.
It's been posted in-depth a few times across this forum to varying degrees by game developers - I was initially very excited about the implementation of LLM's in NPC interactions, until I read some of these posts. The gist of it was - the thing that makes a game fundamentally a game is its constraints. LLM-based NPC's fundamentally break these constraints in a way that is not testable or predictable by the developer and will inevitably destroy the gameplay experience (at least with current technology).
You've absolutely nailed it here, I agree. To make any progress at all on the tremendously difficult problem they are trying to solve, they need to be frank about just how far away they are from what it is they are marketing.
I am wholeheartedly in support of commercial interests drumming up awareness and engagement by the authors. This is definitely a cool thing to be working on. However, what would make more sense is to frame the situation more honestly and attract folks who are drawn to solving tremendously hard problems, based on a level of expertise and awareness that truly moves the ball forward.
What would be far more interesting would be for the folks involved to say all the ten thousand things that went wrong in their experiments and to lay out the common-sense conclusions from those findings (just like the one you shared, which is truly insightful and correct).
We need to move past this industry and their enablers that continually try to win using the wrong methodology -- pushing away the most inventive and innovative people that are ripe and ready to make paradigm shifts in the AI field and industry.
It would however be very interesting to see these kinds of agents in a commercial video game. Yes they are shallow in their perception of the game world. But they’re a big step up from the status quo.
> I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process.
I don't know how you expect agents to self organize social structures if they don't have a shared reality. I mean, you could write all the prompts yourself, but then that shared reality is just your imagination and you're just DMing for them.
The point of the minecraft environment isn't to "enrich" the "inner world" of the agents and the goal isn't to "entertain" them. The point is to create a set of human understandable challenges in a shared environment so that we can measure behavior and performance of groups of agents in different configurations.
I know we aren't supposed to bring this up, but did you read the article? Nothing of your comment addresses any of the findings or techniques used in this study.
I wrote and played with a fairly simple agentic system and had some of the same thoughts RE higher order behaviour. But I think the counter-points would be that they don't have to all be the same model, and what you might call context management - keeping each agent's "chain of thought" focused and narrow.
The former is basically what MoE is all about, and I've found that at least with smaller models they perform much better with a restricted scope and limited context. If the end result of that is something that can do things a single large model can't, isn't that higher order?
You're right that there's no "inner world" but then maybe that's the benefit of giving them one. In the same way that providing a code-running tool to an LLM can allow it to write better code (by trying it out) I can imagine a 3D world being a playground for LLMs to figure out real-world problems in a way they couldn't otherwise. If they did that wouldn't it be higher order?
>I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
It's a matter of entropy; producing new behaviours requires exploration on the part of the models, which requires some randomness. LLMs have only a minimal amount of entropy introduced, via temperature in the sampler.
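As a rough illustration of where that entropy enters, here is a minimal temperature-sampling sketch (logit values made up); at very low temperature the sampler collapses to near-greedy decoding and exploration disappears:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Higher temperature flattens the distribution, adding entropy;
    temperature -> 0 collapses to greedy (argmax) decoding."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]
# At temperature 0.01 the argmax token is picked essentially always,
# so repeated runs produce no behavioural variety at all.
greedy_pick = sample_with_temperature(logits, temperature=0.01)
```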
As I've pointed out in the past, I also think it's fair to say that we overestimate human variability, and that most human behaviours and language coalesce for the most part.
The same goes for the creative industry, where a talking point is that "AIs just rehash existing stuff, they don't produce anything new". Neither do most artists; everything we make is almost always some riff on prior art or nature. Elves are just humans with pointy ears. Goblins are just small elves with green skin. Dwarves are just short humans. Dragons are just big lizards. Aliens are just humans with an odd-shaped head and body.
I don't think people realise how very rare it is that any human being experiences or creates something truly novel and not yet experienced or created by our species yet. Most of reality is derivative.
Maybe we need gazelles and cheetahs - many gazelle-agents getting chased towards a goal, doing the brute-force work - and the constraint cheetahs chase them, evaluate them, and leave them alive (memory intact) as long as they come up with better and better solutions. Basically an evolutionary algo, running on top of many agents, running simultaneously on the same hardware?
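A loose sketch of that gazelle/cheetah idea as a plain evolutionary loop; the fitness function and mutation step here are stand-ins for an actual evaluator and for re-prompting agents with extra randomness:

```python
import random

random.seed(0)  # reproducible run for this toy example

def evaluate(solution):
    # Stand-in fitness: the "cheetah" scoring how good a proposal is
    # (here, distance from an arbitrary target value of 42).
    return -abs(solution - 42)

def mutate(solution):
    # Stand-in for re-prompting an agent with some added randomness.
    return solution + random.uniform(-5, 5)

# A herd of "gazelle" agents, each proposing a candidate solution.
population = [random.uniform(0, 100) for _ in range(20)]

for generation in range(50):
    ranked = sorted(population, key=evaluate, reverse=True)
    survivors = ranked[:10]                       # cheetahs cull the weaker half
    population = survivors + [mutate(s) for s in survivors]

best = max(population, key=evaluate)
```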
I had the opposite thought. Opposite to evolution...
What if we are a CREATED (i.e. instant created, not evolved) set of humans, and evolution and other backstories have been added so that the story of our history is more believable?
Could it be that humanity represents a de novo (Latin for "anew") creation, bypassing the evolutionary process? Perhaps our perception of a gradual ascent from primitive origins is a carefully constructed narrative designed to enhance the credibility of our existence within a larger framework.
What if we are like the Minecraft people in this simulation?
This only works (genetic algo) if you have some random variability in the population. For different models it would work but I feel like it's kind of pointless without the usual feedback mechanism (positive traits are passed on).
That depends on giving them a goal/reward like increasing "data quality".
I mean, frogs don't use their brains much either; in spite of the rich world around them, they don't really explore.
But chimps do. They can't sit quiet in a tree forever, and that boils down to their Reward/Motivation Circuitry. They get pleasure out of exploring. And if they didn't, we wouldn't be here.
Now these seem to be truly artificially intelligent agents. Memory, volition, autonomy, something like an OODA loop or whatever you want to call it, and a persistent environment. Very nice concept, and I'm positive the learnings can be applied to more mundane business problems, too.
If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
But then again their jobs probably depend on selling something that looks like real innovation happening to the C-levels...
Yup, and "ask" is a verb, God damn it, not a noun. But people in the tech world frequently use "learnings" instead of "lessons," "ask" as a noun, "like" as filler, and "downfall" when they mean "downside." Best to make your peace and move on with life.
"learning" as a noun descends from Old English so has always been current in the language in the intended sense.[1]
"lesson" came from Old French in the 13th century and has changed its original meaning over time.[2]
There's not one single dialect of English so your comment comes off as unnecessarily prescriptivist and has spawned significant off-topic commentary (including this very comment) in response to an otherwise perfectly worded composition.
>If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
It should never be this way. Even with narrow AI, there needs to be a governance framework that helps measure the output and capture potential risks (hallucinations, wrong data / links, wrong summaries, etc)
I've reviewed the paper and I'm confident this paper was fabricated over a collection of false claims. The claims made are not genuine and should not be taken at face value without peer review. The provided charts and graphics are sophisticated forgeries in many cases when reviewing and vetting their applicability to the claims made.
It is currently not possible for any kind of LLM to do what is being proposed. While maybe the intentions are good with regard to commercial interests, I want to be clear: this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation. These kinds of claims require substantial evidence, and that was not provided.
The prompts that are provided are not in any way connected to an applied usage of LLMs that are described.
The "election" experiment was a prefined scenario. There isn't any "coordination" of election activities. There were preassigned "influencers" using the conversation system built into PIANO. The sentiment was collected automatically by the simulation and the "Election Manager" was another predefined agent. Specically this part of the experiment was designed to look at how the presence or absence of specific modules in the PIANO framework would affect the behavior.
> this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation
I mean, that's surely within the training data of LLMs? The effectiveness etc of the election activities is likely very low. But I don't think it's outside the realms of possibility that the agents prompted each other into the latent spaces of the LLM to do with elections.
LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here. Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.
The ideas here are not supported by any kind of validated understanding of the limitations of language models. I want to be clear -- the kind of AI that is being purported to be used in the paper is something that has been in video games for over 2 decades, which is akin to Starcraft or Diablo's NPCs.
The key issue is that this is an intentional false claim that can certainly damage mainstream understanding of LLM safety and what is possible at the current state of the art.
Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.
For others, it's probably worth pointing out that this person's account is about a day old and they have left no contact information for the authors of the paper to follow up with them.
For "caetris2" I'll just use the same level of rigor and authenticity that you used in your comment when I say "you're full-of-shit/jealous and clearly misunderstood large portions of this paper".
Yeah, I haven't looked into this much so far but I am extremely skeptical of the claims being made here. For one agent to become a tax collector and another to challenge the tax regime without such behavior being hard coded would be extremely impressive.
They were assigned roles to examine the spread of information and behaviour. The agents pay tax into a chest, as decreed by the (dynamic) rules. There are agents assigned to the roles of pro- and anti-tax influencers; agents in proximity to these influencers would change their own behaviour appropriately, including voting for changes in the tax.
So yes, they didn't take on these roles organically, but no, they weren't aiming to do so: they were examining behavioral influence and community dynamics with that particular experiment.
I'd recommend skimming over the paper; it's a pretty quick read and they aren't making any truly outrageous claims IMO.
You can imagine a conversation with an LLM getting into that territory pretty quickly if you pretend to be an unfair tax collector. It sounds impressive on the surface, but in the end it's all LLMs talking to each other, and they'll emit whatever completions are likely given the context.
I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time
Principles would be things like self-preservation, food, shelter and procreating, communication and memory through a risk-reward calculation prism. Maybe establishing what is "known" vs what is "unknown" is a key component here too, but not in such a binary way.
"Memory" can mean many things, but if you codify it as a function of some type of subject performing some type of action leading to some outcome with some ascribed "risk-reward" profile compared to the value obtained from empirical testing that spans from very negative to very positive, it seems both wide encompassing and generally useful, both to the individual and to the collective.
From there you derive the need to connect with others, disputes over resources, the need to take risks, explore the unknown, share what we've learned, refine risk-rewards, etc. You can guide the civilization to discover certain technologies or inventions or locations we've defined ex ante as their godlike DM which is a bit like cheating because it puts their development "on rails" but also makes it more useful, interesting and relatable.
It sounds computationally prohibitive, but the game doesn't need to play out in real time anyway...
I just think that you can describe a lot of the human condition in terms of "life", "liberty", "love/connection" and "greed".
Looking at the video in the repo, I don't like how this throws "cultures", "memes" and "religion" into the mix instead of letting them be an emergence from the need to communicate and share the belief systems that emerge from our collective memories. Because it seems like a distinction without a difference for the purposes of analyzing this. Also "taxes are high!" without the underlying "I don't have enough resources to get by" seems too much like a mechanical turk
Evolving is another beast... but as for the "I've thought about this a lot... I think I'd like to start from 'principles' and let systems evolve or at least be discoverable over time" part, hunt up a copy of "The Society of Mind" by Minsky, who was both, and wrote about that idea.
> The work, which first appeared in 1986, was the first comprehensive description of Minsky's "society of mind" theory, which he began developing in the early 1970s. It is composed of 270 self-contained essays which are divided into 30 general chapters. The book was also made into a CD-ROM version.
> In the process of explaining the society of mind, Minsky introduces a wide range of ideas and concepts. He develops theories about how processes such as language, memory, and learning work, and also covers concepts such as consciousness, the sense of self, and free will; because of this, many view The Society of Mind as a work of philosophy.
> The book was not written to prove anything specific about AI or cognitive science, and does not reference physical brain structures. Instead, it is a collection of ideas about how the mind and thinking work on the conceptual level.
It's very approachable for a layperson in that part of the field of AI.
Wow, you are maybe the first person I’ve seen cite Minsky on HN, which is surprising since he’s arguably the most influential AI researcher of all time, maybe short of Turing or Pearl. To add on to the endorsement: the cover of the book is downright gorgeous, in a retro-computing way
Many of these projects are an inch deep into intelligence and miles deep into the current technology. Some things will see tremendous benefits, but as far as artificial intelligence goes, we're not there yet. I'm thinking gaming will benefit a lot from these.
You mean we're not there in simulating an actual human brain? Sure. But we're seeing AI work like a human well enough to be useful, isn't that the point?
Memory is really interesting. For example, say you play 100,000 rounds of 5x5 Tic-Tac-Toe. Do you really need to remember game 51,247, or do you recognize and remember a winning pattern? In Reinforcement Learning you would revise the policy based on each win. How would that work for genAI?
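For the tabular-RL side of that comparison, a minimal sketch: the table stores one value per position *pattern*, so shared patterns across games reinforce a single entry instead of storing 100,000 transcripts (board strings and the update rule here are simplified stand-ins):

```python
# Pattern-vs-episode distinction in miniature: a tabular value update
# keeps one number per position pattern, with no per-game memory.
values = {}          # pattern (board serialized as a string) -> value estimate
ALPHA = 0.1          # learning rate

def update(positions_seen, result):
    """After each game, nudge the value of every pattern that occurred
    toward the final result (+1 win, -1 loss)."""
    for pattern in positions_seen:
        old = values.get(pattern, 0.0)
        values[pattern] = old + ALPHA * (result - old)

# Two winning games that share the pattern "X.O" reinforce the same entry.
update(["X..", "X.O"], result=1)
update(["O..", "X.O"], result=1)
```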
It does not strike me as particularly useful from a scientific research perspective. There does not appear to be much thought put into experimental design and really no clear objectives. Is the bar really this low for academic research these days?
it looks like a group consisting largely of ex-academics using aspects of the academic form, but they stop short of framing it as a research paper as such. they call it a technical report, where it's generally more okay to be like 'here's a thing that we did', along with detailed reporting on the thing, without necessarily having definite research questions. this one does seem to be pretty diffuse though. the sections on Specialization and Cultural Transmission were both interesting, but lacked precise experimental design details to the point where i wish they had just focused on one or the other.
one disappointment for me was the lack of focus on external metrics in the multi-agent case. their single-agent benchmark focuses on an external metric (time to block type), but all the multi-agent analyses seem to be internal measures (role specialization, meme spread) without looking at (AFAICT?) whether or not the collective multi-agent systems could achieve more than the single agents on some measure of economic productivity/complexity. this is clearly related to the specialization section, but without consideration of whether said emergent role division had economic consequences/antecedents it makes me wonder to what degree the whole thing is a pantomime.
I'm curious if it might be possible that an AI "civilization", similar to the one proposed by Altera, could end up being a better paradigm for AGI than a single LLM, if a workable reward system for the entire civilization was put in place. Meaning, suppose this AI civilization was striving to maximize [scientific_output] or [code_quality] or any other eval, similar to how modern countries try to maximize GDP - would that provide better results than a single AI agent working towards that goal?
Yes, good sense for progress! This has been a central design component of most serious AI work since the ~90s, most notably popularized by Marvin Minsky’s The Society of Mind. Highly, highly recommend for anyone with an interest in the mind and AI — it’s a series of one-page essays on different aspects of the thesis, which is a fascinating, Martin-Luther-esque format.
Of course this has been pushed to the side a bit in the rush towards shiny new pure-LLM approaches, but I think that’s more a function of a rapidly growing user base than of lost knowledge; the experts still keep this in mind, either in these terms or in terms of “Ensembles”. A great example is GPT-4, which AFAIU got its huge performance increase mostly through employing a “mixture of experts”, which is clearly a synonym for a society of agents or an ensemble of models.
I don't think "mixture of experts" can be assimilated to a society of agents. It is just routing a prompt to the most performant model: the models do not communicate with each other, so how could they form a society ?
This seems very cool - I am sceptical of the supposed benefits for "civilization" but it could at least make for some very interesting sim games. (So maybe it will be good for Civilization moreso than civilization.)
I think the Firaxis Civilization needs a cheap AlphaZero AI rather than an LLM: there are too many dumb footguns in Civ to economically hard-code a good strategic AI, yet solving the problem by making the enemies cheat is plain frustrating. It would be interesting to let an ANN play against a "classical" AI until it consistently beats each difficulty level, building a hierarchy. I am sure someone has already looked into this but I couldn't find any sources.
I am a bit skeptical about how computationally expensive a very crappy Civ ANN would be to run at inference time, though I actually have no idea how that scales - it hardly needs to be a grandmaster, but the distribution of dumb mistakes has a long tail.
Also, the DeepMind Starcraft 2 AI is different from AlphaZero since Starcraft is not a perfect information game. The AI requires a database of human games to "get off the ground"; otherwise it would just get crushed over and over in the early game, having no idea what the opponent is doing. It's hard to get that training data with a brand new game. Likewise Civ has always been a bit more focused on artistic expression than other 4x strategy games; maybe having to retrain an AI for every new Wonder is just too much of a burden.
Galactic Civilizations 2 (also 1, 3, 4?) in the same genre is well-known for its AI, which is good even without handicaps or cheats. This includes trading negotiations, BTW.
(At least good compared to what other 4X have, and your average human player - not the top players that are the ones that tend to discuss the game online in the first place.)
EDIT: I suspect that it's not unrelated that GalCiv2 is kind of... boring as 4X go - a result of a good AI having been a base requirement?
Speaking of StarCraft AI... (for SC1, not 2, and predating AlphaZero by many years):
I really dig namechecking Sid Meier for the name of the project. I'm also skeptical that this project actually works as presented, but building a Civilization game off of a Minecraft engine is a deeply interesting idea.
I'm somewhat amazed that companies releasing strategy games aren't using AI to test out different cards and what not to find broken things before release (looking at you, Hearthstone)
Yeah, I was disappointed (and thrilled, from a p(doom) perspective) to see it implemented in Minecraft instead of Civilization VI, Humankind, or any of the main Paradox grand strategies (namely Stellaris, Victoria, Crusader Kings, and Europa Universalis). To say the least, the stakes are higher and more realistic than "let's plan a feast" "ok, I'll gather some wood!"
To be fair, they might tackle this in the paper -- this is a preprint of a preprint, somehow...
I suspect that Minecraft might have the open-source possibilities (or at least programming interfaces?) that the other games you listed lack?
For Civilization, the more recent the game, the more closed off it tends to be: Civ 1 and/or 2 have basically been remade from scratch as open source, and Civ 4 has most of the game open-sourced in the two tiers of C++ and Python... but AFAIK Civ 5 (and also 6?) were large regressions in modding capabilities compared to 4?
I'm reminded of Dwarf Fortress, which simulates thousands of years of dwarf world time, the changing landscapes and the rise and fall and rise and fall of dwarf kingdoms, then drops seven player-controlled dwarves on the map and tells the player "have fun!" It'd be a useful toy model perhaps for identifying areas of investigation to see if it can predict behavior of real civilizations, but I'm not seeing any AI breakthroughs here.
In case anyone is wondering, this is a reference to the movie Virtuosity (1995). I thought it was a few years later, considering the content. It’s a good watch if you like 90s cyberpunk movies.
Reading the paper, this seems like putting the cart before the horse: the agents individually are not actually capable of playing Minecraft and cannot successfully perform the tasks they've assigned or volunteered for, so in some sense the authors are having dogs wear human clothes and declaring it's a human-like civilization. Further, crucial things are essentially hard-coded: what types of societies are available and (I believe) the names of the roles. I am not exactly sure what the social organization is supposed to imply: the strongest claim you could make is that the agent framework could work for video game NPCs because the agents stick to their roles and factions. The claim that agents "can use legal structures" strikes me as especially specious, since "use the legal structure" is hard-wired into the various agents' behavior. Trying to extend all this to actual human society seems ridiculous, and it does not help that the authors blithely ignore sociology and anthropology.
There are some other highly specious claims:
- I said "I believe" the names of the roles are hard-coded, but unless I missed something the information is unacceptably vague. I don't see anything in the agent prompts that would make them create new roles, or assign themselves to roles at all. Again I might be missing something, but the more I read the more confused I get.
- claiming that the agents formed long-term social relationships over the course of 12 Minecraft days, but that's only four real hours and the agents experience real time: the length of a Minecraft day is immaterial! I think "form long-term social relationships" and "use legal structures" aren't merely immodest, they're dishonest.
- the meme / religious transmission stuff totally ignores training data contamination with GPT-4. The summarized meme clearly indicates awareness of the real-world Pastafarian meme, so it is simply wrong to conclude that this meme is being "transmitted," when it is far more likely that it was evoked in an agent that already knew the meme. Why not run this experiment with a truly novel fake religion? Some of the meme examples do seem novel, like "oak log crafting syndrome," but others like "meditation circle" or "vintage fashion and retro projects" have nothing to do with Minecraft and are almost certainly GPT-4 hallucinations.
In general using GPT-4 for this seems like a terrible mistake (if you are interested in doing honest research).
You are on the right track in my opinion. The key is to encode the interface between the game and the agent so that the agent can make a straightforward choice. For example, by giving the agent the state of an n x n board as the world model, and then a finite set of choices, an agent is capable of playing the game robustly and explaining the decision to the game master. This gives the illusion that the agent reasons. I guess my point is that it's an encoding problem: break the world model down into a simple choice.
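A minimal sketch of that encoding idea, with a hypothetical prompt format: the full board state plus an enumerated, finite move list turns the model's job into a constrained choice rather than open-ended planning:

```python
# Illustrative sketch: serialize the world model (a small board) and an
# enumerated set of legal moves, so the model only has to pick a number.
def build_prompt(board, legal_moves):
    rows = "\n".join(" ".join(row) for row in board)
    moves = "\n".join(f"{i}: place at {mv}" for i, mv in enumerate(legal_moves))
    return (
        f"Board state:\n{rows}\n\n"
        f"Choose exactly one move by number and explain why:\n{moves}"
    )

board = [["X", ".", "O"],
         [".", "X", "."],
         [".", ".", "."]]
legal = [(0, 1), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
prompt = build_prompt(board, legal)
```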
Selfishness is the main reason life exists in the universe. Literally the only requirement for a lump of stuff to become alive is to become selfish. So you’re semi right that these LLMs can never become truly sentient unless they actually become selfish.
While selfishness is a basic requirement, some stupidity (imo) is also important for intelligent life. If you as an AI agent don’t have some level of stupidity, you’ll instantly see that there’s no point to doing anything and just switch yourself off.
The first point is absolutely correct, and (apologies in advance…) was a large driver of Nietzsche’s philosophy of evolution, most explicitly covered in The Gay Science. Not only “selfishness”, but the wider idea of particularized standpoints, each of which may stand in contradiction to the direct needs of the society/species in the moment. This is a large part of what he meant by his notoriously dumb-sounding quotes like “everything is permitted”; morality isn’t relative/nonexistent, it’s just evolving in a way that relies on immorality as a foil.
For the second part, I think that’s a good exposition of why “stupidity” and “intelligence” aren’t scientifically useful terms. I don’t think it’s necessarily “stupid” to prefer the continuation of yourself/your species, even if it doesn’t stand up to certain kinds of standpoint-specific intellectual inquiry. There’s lots of standpoints (dare I say most human ones) where life is preferable to non-life.
Regardless, my daily thesis is that LLMs are the first real Intuitive Algorithms, and thus the solution to the Frame Problem. In a certain colloquial sense, I’d say they’re absolutely already “stupid”, and this is where they draw their utility from. This is just a more general rephrasing of the common refrain that we’ve hopefully all learned by now: hallucinations are not a bug in LLMs, they’re a feature.
The entire paper demonstrates the results of the simulation, but they did not mention how they achieved it. Running 500-1000 LLM agents in parallel would take enormous computing resources, and they did not substantiate the claims they made about their parallel architecture. I remember a paper published about an AI town in which they clearly explained how they implemented it; they also released a recording of the simulation along with the real data of the results. If anyone understands how this paper's system was implemented, please tell me.
> Professor Dobb's book is devoted to personetics, which the Finnish philosopher Eino Kaikki has called 'the cruelest science man ever created'. . . We are speaking of a discipline, after all, which, with only a small amount of exaggeration, for emphasis, has been called 'experimental theogony'. . . Nine years ago identity schemata were being developed—primitive cores of the 'linear' type—but even that generation of computers, today of historical value only, could not yet provide a field for the true creation of personoids.
> The theoretical possibility of creating sentience was divined some time ago, by Norbert Wiener, as certain passages of his last book, God and Golem, bear witness. Granted, he alluded to it in that half-facetious manner typical of him, but underlying the facetiousness were fairly grim premonitions. Wiener, however, could not have foreseen the turn that things would take twenty years later. The worst came about—in the words of Sir Donald Acker—when at MIT "the inputs were shorted to the outputs".
Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale. In just a few decades this will finally be made into proper games. I can't wait.
> I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate)
The future of gaming is going to get weird fast with all this new tech, and there are a lot of new mechanics emerging that just weren't possible before LLMs, generative AI, etc.
At our game studio we're already building medium-scale sandbox games where NPCs form memories, opinions, problems (that translate to quests), and have a continuous "internal monologue" that uses all of this context plus sensory input from their place in a 3D world to constantly decide what actions they should be performing in the game world. A player can decide to chat with an NPC about their time at a lake nearby and then see that NPC deciding to go visit the lake the next day.
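A rough sketch of that kind of NPC tick, with every name and structure invented for illustration (this is not our actual code): fold memories and sensory input into one context, ask a model for the next action, and store the decision back as a memory.

```python
# Illustrative sketch of an NPC "internal monologue" loop.
# All names (npc_tick, ask_model, field names) are made up.

def npc_tick(npc, world, ask_model):
    """One decision step: context in, action out, memory updated."""
    context = (
        f"You are {npc['name']}. "
        f"Memories: {'; '.join(npc['memories'])}. "
        f"You currently see: {world['visible']}. "
        "What do you do next?"
    )
    action = ask_model(context)  # any LLM call could stand in here
    npc["memories"].append(f"I decided to: {action}")
    return action

npc = {"name": "Mira", "memories": ["talked about the lake"]}
world = {"visible": "a path to the lake"}
action = npc_tick(npc, world, lambda ctx: "walk to the lake")
```

The player-chats-about-the-lake example above is just this loop: the chat lands in `memories`, and a later tick's context makes "go visit the lake" a likely action.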
A paper last year ("Generative Agents: Interactive Simulacra of Human Behavior", [0]) is a really good sneak-peek into the kind of evolving sandboxes LLMs (with memory and decisionmaking) enable. There's a lot of cool stuff that happens in that "game", but one anecdote I always think back to is this: in a conversation between two NPCs, one happens to mention they have a birthday coming up to the other; and that other NPC then goes around town talking to other NPCs about a birthday party, and _those_ NPCs mention the party to other NPCs, and so on until the party happened and most of the NPCs in town arrived on time. None of it was scripted, but you very quickly start to see emergent behavior from these sorts of "flocks" of agents as soon as you add persistence and decision-making. And there are other interesting layers games can add for even more kinds of emergent behavior; that's what we're exploring at our studio [1], and I've seen lots of other studios pop up this last year to try their hand at it too.
I'm optimistic and excited about the future of gaming (or, at least some new genres). It should be fun. :)
I think it can be quite interesting, especially if you consider different character types (in Anthropic lingo, this is "personality"). The only problem right now is that using a proprietary LLM is incredibly expensive, so a local LLM might be the best option. Unfortunately, these are still not on the same level as their larger brethren.
Rimworld is heavily inspired by Dwarf Fortress, so if you’re looking for more complex examples you don’t have to look far. DF is pretty granular with the physical and mental states of its characters - to the point that a character might lose a specific toe or get depressed about their situation - but of course it’s still a video game, not a scientific simulation of an AI society.
> Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale.
I don't believe that you want this. Even really good players don't have a chance against super-advanced NPCs (think of how chess grandmasters have barely any chance against modern chess programs running on a fast computer). You will just get crushed.
What you likely want is NPCs that "behave more human-like (or animal-like)" - whatever that means.
Oh, I should've clarified - I don't want to fight against them, I just want to watch and sometimes interfere to see how the agents react ;) A god game like WorldBox/Galimulator, if you will. Or observer mode in tons of games like almost all Paradox ones.
>Even really good players don't have a chance against super-advanced NPCs
I guess you can make them dumber by randomly switching to hardcoded behavioral trees (without modern AI) once in a while, so that they make mistakes (while feeling pretty intelligent overall) and the player then has a chance to outsmart them.
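As a minimal sketch of that idea (all names and the blunder rate are invented): with some probability the NPC hands control to a predictable scripted fallback instead of the strong policy, giving the player exploitable mistakes.

```python
import random

# Illustrative mixed policy: mostly play the "smart" policy, but
# occasionally fall back to a simple hardcoded behavior so the NPC
# makes human-beatable mistakes. Names are hypothetical.

def choose_action(state, smart_policy, scripted_policy, blunder_rate=0.15):
    """With probability blunder_rate, use the scripted fallback."""
    if random.random() < blunder_rate:
        return scripted_policy(state)
    return smart_policy(state)

smart = lambda s: "flank the player"
scripted = lambda s: "charge straight ahead"
action = choose_action({}, smart, scripted)
```

Tuning `blunder_rate` per difficulty level would be the obvious knob.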
I'm very confused; is there any emergent behavior in this paper, or is it just "role-play" based on the LLM's data about what humans do? Like, wouldn't they create novel social structures if they had needs? That doesn't seem so hard to program (the needs part).
Just yesterday I was wondering how the Midjourney equivalent world gen mod for Minecraft might be coming along. Imagine prompting the terrain gen?? That could be pretty mind blowing.
Describe the trees, hills, vines, tree colors/patterns, castles, towns, and details of all buildings and other features - and have it generate output in Minecraft at the same quality that image gen reaches in Stable Diffusion?
Interesting context, but highlights all the problems of machine learning models: the lack of reason and abstraction and so on. Hard to say yet how much of an issue this might be, but the medium will almost certainly reveal something about our potential options for social organization.
I think their top-down approach is a problem. What they call human civilization wasn't and isn't centrally-planned, and its goals and ideologies are neither universal nor implicit. The integration of software agents (I refuse to call them "AI") into civilization won't occur in a de facto cooperative framework where such agents are permitted to fraternize and self-modify. Perhaps that will happen in walled gardens where general-purpose automatons can collectively 'plan' activities to maximize efficiency, but in our broader human world, any such collaboration is going to have to occur from the bottom-up and for the initial benefit of the agents' owners.
This kind of research needs to take place in an adversarial environment. There might be something interesting to learn from studying the (lack of?) emergence of collaboration there.
Really interesting, but I'm curious how civilization here holds up without deeper human-like complexity; it feels like it might lean more toward scripted behaviors than real societies.
They will probably fall fast into tragedy-of-the-commons situations. We developed most of our civilization while there was enough room to grow and big decisions were centralized, and started to get into bad trouble when things became global enough.
With AIs some of those "protections" may not be there. And hardcoding strategies to avoid this may already put a limit on what we are simulating.
> We developed most of our civilization while there was enough room for growing and big decisions were centralized, and started to get into bad troubles when things became global enough.
Citation needed. But even if I get on board with you on that, wouldn't it be better to start developing for global scale right from the start, instead of starting on small local islands and then trying to rework that into a global ecosystem?
The problem with emulations is human patience. If you don't need/have human interaction, this can run pretty fast. And in the end, what matters is how sustainable it is in the long run.
Does this mean that individual complexity is a natural enemy of group cohesiveness? Or is individual 'selfishness' more a product of evolutionary background?
On our planet we don't have ant colony dynamics at the physical scale of high intelligence (that I know of), but there are very physical limitations to things like food sources.
Virtual simulations don't have the same limitations, so the priors may be quite different.
Taking the "best" course of action from your own point of view may not be so good from a broader perspective. We might have evolved some small-group collaboration approaches that play better in the long run, but in large groups that doesn't go so well. And for AIs trying to optimize something without some big-picture vision, things may go wrong faster.
I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
I spent quite a bit of time building a multi-agent simulation last year and wound up at the same conclusion every day - this is all just a roundabout form of prompt engineering. Perhaps it is useful as a mental model, but you can flatten the whole thing to a few SQL tables and functions. Each "agent" is essentially a SQL view that maps a string template forming the prompt.
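To make that "agents are just views over tables" claim concrete, here's a minimal sketch (the table fields and template are invented for illustration): each agent is a row of state, and the whole "agent" abstraction is a function that renders that row into a prompt string.

```python
# Illustrative only: flattening a multi-agent sim into rows plus a
# string template. Nothing here is from any real framework.

AGENTS = [
    {"name": "Ada", "role": "farmer", "memory": ["traded wheat with Bo"]},
    {"name": "Bo", "role": "miner", "memory": ["found iron at the ridge"]},
]

PROMPT_TEMPLATE = (
    "You are {name}, a {role}.\n"
    "Things you remember: {memory}.\n"
    "What do you do next?"
)

def render_prompt(agent: dict) -> str:
    """The entire 'agent': one row mapped through a string template."""
    return PROMPT_TEMPLATE.format(
        name=agent["name"],
        role=agent["role"],
        memory="; ".join(agent["memory"]),
    )

print(render_prompt(AGENTS[0]))
```

Swap the list of dicts for a SQL table and `render_prompt` for a view over it, and you have the whole architecture.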
I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process. There is clearly no "inner world" in these LLMs, so trying to entertain them with a rich outer environment seems pointless.
TBH, beyond less repetitive NPC interactions, I haven't seen a single use of LLMs in games that wasn't better served by traditional algorithms. Maybe once they get good enough to create usable rigged and textured meshes with enough control to work in-game? They can't create a story on the fly that's reliable enough to be a compelling accompaniment to a coherent game plot. Maps and such don't seem to need anything beyond what current procedural algorithms provide, and those are still working with premade assets - the implementations I've seen can't even reliably place static meshes on the ground in believable positions. And as far as NPCs go, how far does that actually go? It's pure novelty, worth far less than an hour of time. Even granting a guided plot progression worded on the fly by an LLM, is that as good, let alone better, than a dialog tree put together by a professional writer?
This Civ idea at least seems like a new approach to some extent, but it still doesn't seem to conceptually add much. Even if it doesn't, learning that it doesn't is still worthwhile. But almost universally these ideas seem to be either buzzwordy solutions in search of problems, or a cheaper-than-people source of creativity with serious quality tradeoffs - and they still require far too much developer wrangling to actually save money.
I'm a tech artist so I'm a bit biased towards the value of human creativity, but also likely the primary demographic for LLM tools in game dev. I am, so far, not compelled.
It's been posted in depth a few times across this forum, to varying degrees, by game developers - I was initially very excited about the use of LLMs in NPC interactions, until I read some of those posts. The gist of it was: the thing that makes a game fundamentally a game is its constraints. LLM-based NPCs fundamentally break these constraints in a way that is not testable or predictable by the developer and will inevitably destroy the gameplay experience (at least with current technology).
Nobody will know for sure until a big budget game is actually released with a serious effort behind its NPCs.
You've absolutely nailed it here; I agree. To make any progress at all on the tremendously difficult problem they are trying to solve, they need to be frank about just how far away they are from what they are marketing.
I whole-heartedly support the authors drumming up awareness and engagement for commercial reasons. This is definitely a cool thing to be working on; however, it would make more sense to frame the situation more honestly and attract people with the desire to solve tremendously hard problems, grounded in a level of expertise and awareness that truly moves the ball forward.
What would be far more interesting would be for the folks involved to lay out all the ten thousand things that went wrong in their experiments and the common-sense conclusions from those findings (just like the one you shared, which is truly insightful and correct).
We need to move past this industry and its enablers that continually try to win using the wrong methodology, pushing away the most inventive and innovative people who are ripe and ready to make paradigm shifts in the AI field and industry.
It would however be very interesting to see these kinds of agents in a commercial video game. Yes they are shallow in their perception of the game world. But they’re a big step up from the status quo.
> I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underly the prompt generation process.
I don't know how you expect agents to self organize social structures if they don't have a shared reality. I mean, you could write all the prompts yourself, but then that shared reality is just your imagination and you're just DMing for them.
The point of the minecraft environment isn't to "enrich" the "inner world" of the agents and the goal isn't to "entertain" them. The point is to create a set of human understandable challenges in a shared environment so that we can measure behavior and performance of groups of agents in different configurations.
I know we aren't supposed to bring this up, but did you read the article? Nothing of your comment addresses any of the findings or techniques used in this study.
I wrote and played with a fairly simple agentic system and had some of the same thoughts RE higher order behaviour. But I think the counter-points would be that they don't have to all be the same model, and what you might call context management - keeping each agent's "chain of thought" focused and narrow.
The former is basically what MoE is all about, and I've found that smaller models at least perform much better with a restricted scope and limited context. If the end result is something that can do things a single large model can't, isn't that higher order?
You're right that there's no "inner world" but then maybe that's the benefit of giving them one. In the same way that providing a code-running tool to an LLM can allow it to write better code (by trying it out) I can imagine a 3D world being a playground for LLMs to figure out real-world problems in a way they couldn't otherwise. If they did that wouldn't it be higher order?
>I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
It's a matter of entropy; producing new behaviours requires exploration on the part of the models, which requires some randomness. LLMs have only a minimal amount of entropy introduced, via temperature in the sampler.
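To illustrate the temperature point, here's a minimal sketch of temperature-scaled softmax sampling (standard textbook form, not tied to any particular LLM's sampler): logits are divided by the temperature before the softmax, so a higher temperature flattens the distribution and raises the entropy available for exploration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/T, then softmax. Higher T -> flatter
    distribution -> more entropy -> more exploratory samples."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.1)
hot = softmax_with_temperature(logits, temperature=10.0)
# cold concentrates almost all mass on the top token;
# hot approaches a uniform distribution.
```

At temperature near zero the model is nearly deterministic, which is exactly the "minimal entropy" regime that limits novel behaviour.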
As I've pointed out in the past, I also think it's fair to say that we overestimate human variability, and that most human behaviours and language coalesce for the most part.
The same goes for the creative industry, where a common talking point is that "AIs just rehash existing stuff, they don't produce anything new". Neither do most artists; everything we make is almost always some riff on prior art or nature. Elves are just humans with pointy ears. Goblins are just small elves with green skin. Dwarves are just short humans. Dragons are just big lizards. Aliens are just humans with an odd-shaped head and body.
I don't think people realise how very rare it is that any human being experiences or creates something truly novel and not yet experienced or created by our species yet. Most of reality is derivative.
Maybe we need gazelles and cheetahs - many gazelle-agents getting chased towards a goal, doing the brute-force work - while constraint-cheetahs chase them, evaluate them, and leave them alive (memory intact) as long as they come up with better and better solutions. Basically an evolutionary algo running on top of many agents, all running simultaneously on the same hardware?
Do you want stressed and panicking agents? Do you think they'll produce good output?
In my prompting experience, I mostly do my best to give the AI way, way more slack than it thinks it has.
I had the opposite thought. Opposite to evolution...
What if we are a CREATED (i.e. instant created, not evolved) set of humans, and evolution and other backstories have been added so that the story of our history is more believable?
Could it be that humanity represents a de novo (Latin for "anew") creation, bypassing the evolutionary process? Perhaps our perception of a gradual ascent from primitive origins is a carefully constructed narrative designed to enhance the credibility of our existence within a larger framework.
What if we are like the Minecraft people in this simulation?
This (a genetic algo) only works if you have some random variability in the population. For different models it would work, but I feel like it's kind of pointless without the usual feedback mechanism (positive traits being passed on).
That depends on giving them a goal/reward like increasing "data quality".
I mean, frogs don't use their brains much either; despite the rich world around them, they don't really explore.
But chimps do. They can't sit quietly in a tree forever, and that boils down to their reward/motivation circuitry. They get pleasure out of exploring. And if they didn't, we wouldn't be here.
so well put. exactly how I've been feeling and trying to verbalize.
Now these seem to be truly artificially intelligent agents. Memory, volition, autonomy, something like an OODA loop or whatever you want to call it, and a persistent environment. Very nice concept, and I'm positive the learnings can be applied to more mundane business problems, too.
If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
But then again their jobs probably depend on selling something that looks like real innovation happening to the C-levels...
> If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
It's unclear to me how the linked project is different from what you described.
Plenty of existing agents have "memory" and many other things you named.
Just so you know, the English noun for things that have been learned is, "lessons."
I believe that “learnings” is also a word that could be applied in this context.
It seems to me “learnings” would actually be less ambiguous than “lessons”. A lesson brings to mind a thing being taught, not just learned.
Also: "learnings".
https://dictionary.cambridge.org/us/dictionary/english/learn...
"knowledge or a piece of information obtained by study or experience"
"I am already incorporating some of these learnings into my work and getting better results."
Yup, and "ask" is a verb, God damn it, not a noun. But people in the tech world frequently use "learnings" instead of "lessons," "ask" as a noun, "like" as filler, and "downfall" when they mean "downside." Best to make your peace and move on with life.
Just FYI: that second comma is incorrect.
I'm an old man and have heard "learnings" used to mean "lessons" for most of my life.
I think "learnings" has advantages over "lessons" given that "learnings" has one meaning, while "lessons" can have more than one meaning.
Whether it's correct or not, are we surprised it's used this way? Consider the word "earnings" and how similar its definition is to "learnings."
"learning" as a noun descends from Old English so has always been current in the language in the intended sense.[1]
"lesson" came from Old French in the 13th century and has changed its original meaning over time.[2]
There's not one single dialect of English so your comment comes off as unnecessarily prescriptivist and has spawned significant off-topic commentary (including this very comment) in response to an otherwise perfectly worded composition.
[1]: https://www.etymonline.com/word/learning [2]: https://www.etymonline.com/word/lesson
Learnings is also correct...
Learned can also be learnt (my preference), etc. English has a lot of redundancy, but that's why we love it, right?
>If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
It should never be this way. Even with narrow AI, there needs to be a governance framework that helps measure the output and capture potential risks (hallucinations, wrong data / links, wrong summaries, etc)
Do you have any resources on that topic? I’d be interested.
I've reviewed the paper and I'm confident it was fabricated from a collection of false claims. The claims made are not genuine and should not be taken at face value without peer review. In many cases, when vetted against the claims made, the provided charts and graphics are sophisticated forgeries.
It is currently not possible for any kind of LLM to do what is being proposed. Maybe the intentions are good with regard to commercial interests, but I want to be clear: this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation. These kinds of claims require substantial evidence, and that was not provided.
The prompts that are provided are not in any way connected to an applied usage of the LLMs described.
I don't think you understood the paper.
The "election" experiment was a predefined scenario. There isn't any "coordination" of election activities. There were preassigned "influencers" using the conversation system built into PIANO. The sentiment was collected automatically by the simulation, and the "Election Manager" was another predefined agent. Specifically, this part of the experiment was designed to look at how the presence or absence of specific modules in the PIANO framework would affect the behavior.
> this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation
I mean, that's surely within the training data of LLMs? The effectiveness of the election activities is likely very low, but I don't think it's outside the realm of possibility that the agents prompted each other into the parts of the LLM's latent space to do with elections.
LLMs are stateless and do not remember the past (as in, they don't have a database), making the training data a non-issue here. Therefore, the claims made in this paper are not possible, because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.
The ideas here are not supported by any validated understanding of the limitations of language models. I want to be clear: the kind of AI purported to be used in the paper is something that has been in video games for over two decades, akin to the NPCs in StarCraft or Diablo.
The key issue is that this is an intentional false claim that can certainly damage mainstream understanding of LLM safety and of what is possible at the current state of the art.
Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.
For others, it's probably worth pointing out that this person's account is about a day old and they have left no contact information for the authors of the paper to follow up with.
For "caetris2" I'll just use the same level of rigor and authenticity that you used in your comment when I say "you're full-of-shit/jealous and clearly misunderstood large portions of this paper".
Yeah, I haven't looked into this much so far but I am extremely skeptical of the claims being made here. For one agent to become a tax collector and another to challenge the tax regime without such behavior being hard coded would be extremely impressive.
They were assigned roles to examine the spread of information and behaviour. The agents pay tax into a chest, as decreed by the (dynamic) rules. There are agents assigned to the roles of pro- and anti-tax influencers; agents in proximity to these influencers would change their own behaviour appropriately, including voting for changes in the tax.
So yes, they didn't take on these roles organically, but no, the researchers weren't aiming for that: they were examining behavioral influence and community dynamics with that particular experiment.
I'd recommend skimming over the paper; it's a pretty quick read and they aren't making any truly outrageous claims IMO.
You can imagine a conversation with an LLM getting into that territory pretty quickly if you pretend to be an unfair tax collector. It sounds impressive on the surface, but in the end it's all LLMs talking to each other, and they'll emit whatever completions are likely given the context.
I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time
Principles would be things like self-preservation, food, shelter, and procreation, plus communication and memory, all seen through a risk-reward prism. Maybe establishing what is "known" vs. what is "unknown" is a key component here too, but not in such a binary way.
"Memory" can mean many things, but if you codify it as a function of some type of subject performing some type of action leading to some outcome with some ascribed "risk-reward" profile compared to the value obtained from empirical testing that spans from very negative to very positive, it seems both wide encompassing and generally useful, both to the individual and to the collective.
From there you derive the need to connect with others, disputes over resources, the need to take risks, to explore the unknown, to share what we've learned, to refine risk-rewards, etc. You can guide the civilization to discover certain technologies, inventions, or locations you've defined ex ante as their godlike DM - which is a bit like cheating because it puts their development "on rails", but it also makes the exercise more useful, interesting, and relatable.
It sounds computationally prohibitive, but the game doesn't need to play out in real time anyway...
I just think that you can describe a lot of the human condition in terms of "life", "liberty", "love/connection" and "greed".
Looking at the video in the repo, I don't like how this throws "cultures", "memes", and "religion" into the mix instead of letting them emerge from the need to communicate and share the belief systems that grow out of our collective memories; it seems like a distinction without a difference for the purposes of analyzing this. Also, "taxes are high!" without the underlying "I don't have enough resources to get by" seems too much like a mechanical Turk.
Evolving is another beast... but for the "I'd like to start from 'principles' and let systems evolve or at least be discoverable over time" part, hunt up a copy of "The Society of Mind" by Minsky, who was both a philosopher and an AI researcher and wrote about exactly that idea.
https://en.wikipedia.org/wiki/Society_of_Mind
> The work, which first appeared in 1986, was the first comprehensive description of Minsky's "society of mind" theory, which he began developing in the early 1970s. It is composed of 270 self-contained essays which are divided into 30 general chapters. The book was also made into a CD-ROM version.
> In the process of explaining the society of mind, Minsky introduces a wide range of ideas and concepts. He develops theories about how processes such as language, memory, and learning work, and also covers concepts such as consciousness, the sense of self, and free will; because of this, many view The Society of Mind as a work of philosophy.
> The book was not written to prove anything specific about AI or cognitive science, and does not reference physical brain structures. Instead, it is a collection of ideas about how the mind and thinking work on the conceptual level.
It's very approachable for a layperson in that part of the field of AI.
Wow, you are maybe the first person I’ve seen cite Minsky on HN, which is surprising since he’s arguably the most influential AI researcher of all time, maybe short of Turing or Pearl. To add on to the endorsement: the cover of the book is downright gorgeous, in a retro-computing way
https://d28hgpri8am2if.cloudfront.net/book_images/cvr9780671...
Many of these projects are an inch deep into intelligence and miles deep into the current technology. Some things will see tremendous benefits, but as far as artificial intelligence goes, we're not there yet. I'm thinking gaming will benefit a lot from these.
You mean we're not there in simulating an actual human brain? Sure. But we're seeing AI work like a human well enough to be useful, isn't that the point?
Memory is really interesting. For example, say you play 100,000 rounds of 5x5 tic-tac-toe. Do you really need to remember game 51,247, or do you just recognize and remember a winning pattern? In reinforcement learning you would revise the policy based on each win. How would that work for genAI?
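A rough sketch of the RL side of that contrast (a bare-bones tabular value update, with everything simplified for illustration): individual games are discarded, and experience is compressed into one value per board pattern, nudged after each result.

```python
# Illustrative tabular value learning: instead of storing game 51,247,
# store a value per position (pattern), revised after each win/loss.

values = {}  # board pattern (as a string) -> estimated value

def update_after_game(positions, won, alpha=0.1):
    """Revise the value table from one finished game: every visited
    position's value moves a step toward the final result."""
    result = 1.0 if won else 0.0
    for p in positions:
        v = values.get(p, 0.5)  # optimistic-neutral prior
        values[p] = v + alpha * (result - v)

# The game itself is forgotten; only the pattern values remain.
update_after_game(["X..|...|...", "X..|.O.|..X"], won=True)
```

The genAI analogue is murkier, since an LLM's "policy revision" would mean fine-tuning or rewriting its prompt context rather than bumping a table entry.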
So a modernized version of Spore.
Basically what we all wished Spore had been ;-)
Huh, so the video actually works? It just shows «No video with supported format and MIME type found.» for me...
Yeah, memes and genes are both memory, though at different timescales.
It works on some browsers. I'm normally on Firefox but had to dust off Safari to watch it. Crazy I still have to do this in 2024...
This looks like it is a really cool toy.
It does not strike me as particularly useful from a scientific research perspective. There does not appear to be much thought put into experimental design and really no clear objectives. Is the bar really this low for academic research these days?
Keep in mind anyone can publish on Arxiv and it's not at the top of HN on the merit of its research contributions.
It looks like a group consisting largely of ex-academics using aspects of the academic form, but they stop short of framing it as a research paper as such. They call it a technical report, where it's generally more okay to say "here's a thing that we did", along with detailed reporting on the thing, without necessarily having definite research questions. This one does seem pretty diffuse, though. The sections on Specialization and Cultural Transmission were both interesting, but lacked precise experimental-design details, to the point where I wish they had just focused on one or the other.
One disappointment for me was the lack of focus on external metrics in the multi-agent case. Their single-agent benchmark focuses on an external metric (time to block type), but all the multi-agent analyses seem to be internal measures (role specialization, meme spread) without looking at (AFAICT) whether the collective multi-agent systems could achieve more than single agents on some measure of economic productivity/complexity. This is clearly related to the specialization section, but without considering whether the emergent role division had economic consequences/antecedents, it makes me wonder to what degree the whole thing is a pantomime.
wouldn't surprise me if in a few weeks/months we see this repo packaged up as a for-sale product for the games industry
The scientific method has utility, but it's not a pre-requisite for utility.
Some people prefer speed and the uncertainty that comes with it.
I'm curious if it might be possible that an AI "civilization", similar to the one proposed by Altera, could end up being a better paradigm for AGI than a single LLM, if a workable reward system for the entire civilization was put in place. Meaning, suppose this AI civilization was striving to maximize [scientific_output] or [code_quality] or any other eval, similar to how modern countries try to maximize GDP - would that provide better results than a single AI agent working towards that goal?
Yes, good sense for progress! This has been a central design component of most serious AI work since the ~90s, most notably popularized by Marvin Minsky’s The Society of Mind. Highly, highly recommend for anyone with an interest in the mind and AI — it’s a series of one-page essays on different aspects of the thesis, which is a fascinating, Martin-Luther-esque format.
Of course this has been pushed to the side a bit in the rush towards shiny new pure-LLM approaches, but I think that’s more a function of a rapidly growing user base than of lost knowledge; the experts still keep this in mind, either in these terms or in terms of “Ensembles”. A great example is GPT-4, which AFAIU got its huge performance increase mostly through employing a “mixture of experts”, which is clearly a synonym for a society of agents or an ensemble of models.
I don't think "mixture of experts" can be equated with a society of agents. It just routes a prompt to the most performant model: the models do not communicate with each other, so how could they form a society?
Paperclip production?
This seems very cool - I am sceptical of the supposed benefits for "civilization" but it could at least make for some very interesting sim games. (So maybe it will be good for Civilization moreso than civilization.)
I think the Firaxis Civilization needs a cheap AlphaZero AI rather than an LLM: there are too many dumb footguns in Civ to economically hard-code a good strategic AI, yet solving the problem by making the enemies cheat is plain frustrating. It would be interesting to let an ANN play against a "classical" AI until it consistently beats each difficulty level, building a hierarchy. I am sure someone has already looked into this but I couldn't find any sources.
I am a bit unsure how computationally expensive even a very crappy Civ ANN would be to run at inference time, though I actually have no idea how that scales - it hardly needs to be a grandmaster, but the distribution of dumb mistakes has a long tail.
Also, the DeepMind Starcraft 2 AI is different from AlphaZero since Starcraft is not a perfect information game. The AI requires a database of human games to "get off the ground"; otherwise it would just get crushed over and over in the early game, having no idea what the opponent is doing. It's hard to get that training data with a brand new game. Likewise Civ has always been a bit more focused on artistic expression than other 4x strategy games; maybe having to retrain an AI for every new Wonder is just too much of a burden.
Galactic Civilizations 2 (also 1, 3, 4?) in the same genre is well known for its AI, which is good even without handicaps or cheats. This includes trading negotiations, BTW.
(At least good compared to what other 4X have, and your average human player - not the top players that are the ones that tend to discuss the game online in the first place.)
EDIT: I suspect it's not unrelated that GalCiv2 is kind of... boring as 4X go - a result of a good AI having been a base requirement?
Speaking of StarCraft AI... (for SC1, not 2, and predating AlphaZero by many years):
https://arstechnica.com/gaming/2011/01/skynet-meets-the-swar...
Indeed sounds better for Civilization than civilization. This could be quite exciting for gaming.
GTA6 suddenly needs another 2 years :)
I really dig namechecking Sid Meier for the name of the project. I'm also skeptical that this project actually works as presented, but building a Civilization game off of a Minecraft engine is a deeply interesting idea.
I'm somewhat amazed that companies releasing strategy games aren't using AI to test out different cards and whatnot to find broken things before release (looking at you, Hearthstone).
Yeah, I was disappointed (and thrilled, from a p(doom) perspective) to see it implemented in Minecraft instead of Civilization VI, Humankind, or any of the main Paradox grand strategies (namely Stellaris, Victoria, Crusader Kings, and Europa Universalis). To say the least, the stakes are higher and more realistic than "let's plan a feast" "ok, I'll gather some wood!"
To be fair, they might tackle this in the paper -- this is a preprint of a preprint, somehow...
I suspect that Minecraft might have the open source possibilities (or at least programming interfaces?) that the other games you listed lack?
For Civilizations, the more recent they are, the more closed off they tend to be: Civ 1 and/or 2 have basically been remade from scratch as open source, and Civ 4 has most of the game open sourced in the two tiers of C++ and Python... but AFAIK Civ 5 (and also 6?) were large regressions in modding capabilities compared to 4?
Rather, a concept of a preprint
I'm reminded of Dwarf Fortress, which simulates thousands of years of dwarf world time, the changing landscapes and the rise and fall and rise and fall of dwarf kingdoms, then drops seven player-controlled dwarves on the map and tells the player "have fun!" It'd be a useful toy model perhaps for identifying areas of investigation to see if it can predict behavior of real civilizations, but I'm not seeing any AI breakthroughs here.
Maybe when Project Sid 6.7 comes out...
> Maybe when Project Sid 6.7 comes out...
In case anyone is wondering, this is a reference to the movie Virtuosity (1995). I thought it was a few years later, considering the content. It’s a good watch if you like 90s cyberpunk movies.
https://www.imdb.com/title/tt0114857/
https://en.wikipedia.org/wiki/Virtuosity
Here's their blog post announcement too: https://digitalhumanity.substack.com/p/project-sid-many-agen...
Reading the paper, this seems like putting the cart before the horse: the agents individually are not actually capable of playing Minecraft and cannot successfully perform the tasks they've been assigned or volunteered for, so in some sense the authors are having dogs wear human clothes and declaring it's a human-like civilization. Further, crucial things are essentially hard-coded: what types of societies are available and (I believe) the names of the roles. I am not exactly sure what the social organization is supposed to imply: the strongest claim you could make is that the agent framework could work for video game NPCs, because the agents stick to their roles and factions. The claim that agents "can use legal structures" strikes me as especially specious, since "use the legal structure" is hard-wired into the various agents' behavior. Trying to extend all this to actual human society seems ridiculous, and it does not help that the authors blithely ignore sociology and anthropology.
There are some other highly specious claims:
- I said "I believe" the names of the roles are hard-coded, but unless I missed something the information is unacceptably vague. I don't see anything in the agent prompts that would make them create new roles, or assign themselves to roles at all. Again I might be missing something, but the more I read the more confused I get.
- claiming that the agents formed long-term social relationships over the course of 12 Minecraft days, but that's only four real hours and the agents experience real time: the length of a Minecraft day is immaterial! I think "form long-term social relationships" and "use legal structures" aren't merely immodest, they're dishonest.
- the meme / religious transmission stuff totally ignores training data contamination with GPT-4. The summarized meme clearly indicates awareness of the real-world Pastafarian meme, so it is simply wrong to conclude that this meme is being "transmitted," when it is far more likely that it was evoked in an agent that already knew the meme. Why not run this experiment with a truly novel fake religion? Some of the meme examples do seem novel, like "oak log crafting syndrome," but others like "meditation circle" or "vintage fashion and retro projects" have nothing to do with Minecraft and are almost certainly GPT-4 hallucinations.
In general using GPT-4 for this seems like a terrible mistake (if you are interested in doing honest research).
You are on the right track in my opinion. The key is to encode the interface between the game and the agent so that the agent can make a straightforward choice. For example, by giving the agent the state of an n×n board as the world model, plus a finite set of choices, an agent is capable of playing the game robustly and explaining its decision to the game master. This gives the illusion that the agent reasons. I guess my point is that it's an encoding problem: break the world model down into a simple choice.
[1] https://jdsemrau.substack.com/p/evaluating-consciousness-and...
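A minimal sketch of the encoding I mean, using a tic-tac-toe-style board (all function names are illustrative, not from any real agent framework):

```python
# Hypothetical sketch: flatten an n x n board plus a finite action set
# into a prompt, so the "agent" only has to pick one legal move by number.

def render_board(board):
    """Render the board as a plain-text grid, '.' for empty cells."""
    return "\n".join(" ".join(cell or "." for cell in row) for row in board)

def legal_moves(board):
    """Enumerate empty cells as the finite choice set."""
    return [(r, c) for r, row in enumerate(board)
            for c, cell in enumerate(row) if not cell]

def build_prompt(board, player):
    """Compose the world model and choice set into a single prompt."""
    moves = legal_moves(board)
    choices = "\n".join(f"{i}: place at {m}" for i, m in enumerate(moves))
    return (f"You are player '{player}'. Board:\n{render_board(board)}\n"
            f"Pick exactly one move by number:\n{choices}\n"
            "Answer with the number only, then one sentence of reasoning.")

board = [["X", None, None], [None, "O", None], [None, None, None]]
print(build_prompt(board, "X"))
```

Because the answer space is a closed set of numbered choices, the game master can validate the reply trivially instead of parsing free-form text.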
The video cannot be played in Mozilla Firefox (Windows); the browser claims that the file is damaged.
Simulate selfishness because that is the main reason why there are problems in the world.
Selfishness is the main reason life exists in the universe. Literally the only requirement for a lump of stuff to become alive is to become selfish. So you’re semi right that these LLMs can never become truly sentient unless they actually become selfish.
While selfishness is a basic requirement, some stupidity (imo) is also important for intelligent life. If you as an AI agent don’t have some level of stupidity, you’ll instantly see that there’s no point to doing anything and just switch yourself off.
The first point is absolutely correct, and (apologies in advance…) was a large driver of Nietzsche’s philosophy of evolution, most explicitly covered in The Gay Science. Not only “selfishness”, but the wider idea of particularized standpoints, each of which may stand in contradiction to the direct needs of the society/species in the moment. This is a large part of what he meant by his notoriously dumb-sounding quotes like “everything is permitted”; morality isn’t relative/nonexistent, it’s just evolving in a way that relies on immorality as a foil.
For the second part, I think that’s a good exposition of why “stupidity” and “intelligence” aren’t scientifically useful terms. I don’t think it’s necessarily “stupid” to prefer the continuation of yourself/your species, even if it doesn’t stand up to certain kinds of standpoint-specific intellectual inquiry. There’s lots of standpoints (dare I say most human ones) where life is preferable to non-life.
Regardless, my daily thesis is that LLMs are the first real Intuitive Algorithms, and thus the solution to the Frame Problem. In a certain colloquial sense, I’d say they’re absolutely already “stupid”, and this is where they draw their utility from. This is just a more general rephrasing of the common refrain that we’ve hopefully all learned by now: hallucinations are not a bug in LLMs, they’re a feature.
ETA: I, again, hate that I’m somehow this person now, but here’s a fantastic 2 hour YouTube video on the Nietzsche references above: https://youtu.be/fdtf53oEtWU?si=_bmgk9zycNBn2oCa
Which is an evolved behaviour, a derivative of which is war. We are animals, apes together strong!
The entire paper demonstrates the results of the simulation, but they never explain how they achieved it. Running 500-1000 LLM agents in parallel would take enormous computing resources, and they do not substantiate the claims they make about their parallel architecture. I remember a paper published about an AI town in which the authors clearly explained how they implemented it; they also released a recording of the simulation along with the real data behind the results. If anyone understands how this paper was implemented, please tell me.
Here is our version we did about a year ago: https://arxiv.org/abs/2401.10910
"Non Serviam", Lem 1971:
> Professor Dobb's book is devoted to personetics, which the Finnish philosopher Eino Kaikki has called 'the cruelest science man ever created'. . . We are speaking of a discipline, after all, which, with only a small amount of exaggeration, for emphasis, has been called 'experimental theogony'. . . Nine years ago identity schemata were being developed—primitive cores of the 'linear' type—but even that generation of computers, today of historical value only, could not yet provide a field for the true creation of personoids.
> The theoretical possibility of creating sentience was divined some time ago, by Norbert Wiener, as certain passages of his last book, God and Golem, bear witness. Granted, he alluded to it in that half-facetious manner typical of him, but underlying the facetiousness were fairly grim premonitions. Wiener, however, could not have foreseen the turn that things would take twenty years later. The worst came about—in the words of Sir Donald Acker—when at MIT "the inputs were shorted to the outputs".
Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale. In just a few decades this will finally be made into proper games. I can't wait.
> I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate)
The future of gaming is going to get weird fast with all this new tech, and there are a lot of new mechanics emerging that just weren't possible before LLMs, generative AI, etc.
At our game studio we're already building medium-scale sandbox games where NPCs form memories, opinions, problems (that translate to quests), and have a continuous "internal monologue" that uses all of this context plus sensory input from their place in a 3D world to constantly decide what actions they should be performing in the game world. A player can decide to chat with an NPC about their time at a lake nearby and then see that NPC deciding to go visit the lake the next day.
A paper last year ("Generative Agents: Interactive Simulacra of Human Behavior", [0]) is a really good sneak-peek into the kind of evolving sandboxes LLMs (with memory and decisionmaking) enable. There's a lot of cool stuff that happens in that "game", but one anecdote I always think back to is this: in a conversation between two NPCs, one happens to mention they have a birthday coming up to the other; and that other NPC then goes around town talking to other NPCs about a birthday party, and _those_ NPCs mention the party to other NPCs, and so on until the party happened and most of the NPCs in town arrived on time. None of it was scripted, but you very quickly start to see emergent behavior from these sorts of "flocks" of agents as soon as you add persistence and decision-making. And there are other interesting layers games can add for even more kinds of emergent behavior; that's what we're exploring at our studio [1], and I've seen lots of other studios pop up this last year to try their hand at it too.
I'm optimistic and excited about the future of gaming (or, at least some new genres). It should be fun. :)
[0] https://arxiv.org/abs/2304.03442
[1] https://www.chromagolem.com/games
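For anyone curious what that memory-plus-decision loop looks like in the abstract, here's a toy sketch (all names are hypothetical, and the `decide` step stands in for what would really be an LLM call over retrieved memories):

```python
import random
from dataclasses import dataclass, field

@dataclass
class NPC:
    """Toy NPC with a memory stream and a decide step.
    In a real game the decide step would be an LLM call conditioned on
    retrieved memories plus sensory input; here it's a stub."""
    name: str
    memories: list = field(default_factory=list)

    def observe(self, event: str):
        # Append to the memory stream (a real system would also score
        # memories for importance and recency).
        self.memories.append(event)

    def decide(self, senses: str) -> str:
        # Naive "retrieval": just the most recent memories.
        context = self.memories[-3:]
        if any("lake" in m for m in context):
            return "walk to the lake"
        return random.choice(["tend shop", "chat with neighbor", "wander"])

npc = NPC("Mara")
npc.observe("Player said they enjoyed their time at the lake nearby.")
print(npc.decide(senses="morning, town square"))  # -> "walk to the lake"
```

Even this crude version shows the shape of the emergent behavior: one conversation leaves a memory, and a later decision pass surfaces it as an action.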
I think it can be quite interesting, especially if you consider different character types (in Anthropic lingo, "personality"). The only problem right now is that using a proprietary LLM is incredibly expensive, so having a local LLM might be the best option. Unfortunately, these are still not on the same level as their larger brethren.
[1] https://jdsemrau.substack.com/p/evaluating-consciousness-and...
Game designers have barely scratched the surface of NPC modeling even as it is. Rimworld is considered deep but it's nothing close to it.
Rimworld is heavily inspired by Dwarf Fortress, so if you’re looking for more complex examples you don’t have to look far. DF is pretty granular with the physical and mental states of its characters - to the point that a character might lose a specific toe or get depressed about their situation - but of course it’s still a video game, not a scientific simulation of an AI society.
Yeah I think there is a lot of potential here.
Especially in city building games etc.
> Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale.
I don't believe that you want this. Even really good players don't have a chance against super-advanced NPCs (think of how chess grandmasters have barely any chance against modern chess programs running on a fast computer). You will just get crushed.
What you likely want is NPC that "behave more human-like (or animal-like)" - whatever this means.
Oh, I should've clarified - I don't want to fight against them, I just want to watch and sometimes interfere to see how the agents react ;) A god game like WorldBox/Galimulator, if you will. Or observer mode in tons of games like almost all Paradox ones.
>Even really good players don't have a chance against super-advanced NPCs
I guess you could make them dumber by randomly switching to hardcoded behavior trees (without modern AI) once in a while, so that they make mistakes (while feeling pretty intelligent overall) and the player has a chance to outsmart them.
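Something like a per-decision coin flip between the strong policy and a scripted fallback (all names illustrative, not from any real engine):

```python
import random

def scripted_fallback(state):
    # Hardcoded behavior-tree stand-in: always rush the nearest objective,
    # an exploitable habit the player can learn to punish.
    return "rush_nearest_objective"

def strong_policy(state):
    # Stand-in for the trained/"modern AI" policy.
    return "optimal_move"

def handicapped_ai(state, blunder_rate=0.15, rng=random):
    """Occasionally fall back to the dumb scripted behavior so the
    player gets exploitable mistakes without the AI feeling random."""
    if rng.random() < blunder_rate:
        return scripted_fallback(state)
    return strong_policy(state)
```

Tuning `blunder_rate` per difficulty level would give you a handicap ladder without the AI having to cheat on resources.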
I'm very confused; is there any emergent behavior in this paper, or is it just "role-play" based on data about what humans do in the LLM's training set? Like, wouldn't they create novel social structures if they had needs? That doesn't seem so hard to program (the needs part).
Sigh, why call it this when it's more about a Minecraft-like game than a Civilization-like game??
Also, mandatory quote from another ~~Sid Meier's~~ Brian Reynolds game:
https://youtu.be/iGh9G3tPNbY?list=PLyR1OIuULeP4qz0a9tQxgsKNF...
Just yesterday I was wondering how the Midjourney equivalent world gen mod for Minecraft might be coming along. Imagine prompting the terrain gen?? That could be pretty mind blowing.
Describe the trees, hills, vines, tree colors/patterns, castles, towns, details of all buildings and other features - and have it generate results in Minecraft as high-quality as image gen can be in Stable Diffusion?
Interesting context, but highlights all the problems of machine learning models: the lack of reason and abstraction and so on. Hard to say yet how much of an issue this might be, but the medium will almost certainly reveal something about our potential options for social organization.
I think their top-down approach is a problem. What they call human civilization wasn't and isn't centrally-planned, and its goals and ideologies are neither universal nor implicit. The integration of software agents (I refuse to call them "AI") into civilization won't occur in a de facto cooperative framework where such agents are permitted to fraternize and self-modify. Perhaps that will happen in walled gardens where general-purpose automatons can collectively 'plan' activities to maximize efficiency, but in our broader human world, any such collaboration is going to have to occur from the bottom-up and for the initial benefit of the agents' owners.
This kind of research needs to take place in an adversarial environment. There might be something interesting to learn from studying the (lack of?) emergence of collaboration there.
Agentic is an annoying word.
All of their domains and branding are .aL
I had no idea .aL was even a domain name. That's wild. I wonder how many of those are going to take off.
.al is just the TLD for Albania, similarly as .ai is for Anguilla. No idea why anyone would choose the former.
Agreed, it seems tangenti.al at best
I cannot open the PDF, is it available somewhere else?
Doesn't this bring us closer to Nick Bostrom's three-pronged simulation argument from his paper on the topic?
i think this is a github link so hn is more likely to click on it
Fascinating
interesting
Really interesting, but curious how "civilization" here holds up without deeper human-like complexity; feels like it might lean more toward scripted behaviors than real societies.
> feels like it might lean more toward scripted behaviors than real societies
Guess what's happening with "real societies" now... There's a reason "NPC" is used as an insult.
They will probably quickly fall into tragedy-of-the-commons situations. We developed most of our civilization while there was enough room to grow and big decisions were centralized, and started to get into bad trouble once things became global enough.
With AIs some of those "protections" may not be there. And hardcoding strategies to avoid this may already put a limit on what we are simulating.
> We developed most of our civilization while there was enough room for growing and big decisions were centralized, and started to get into bad troubles when things became global enough.
Citation needed. But even if I get on board with you on that, wouldn't it be better to develop for global scale right from the start, instead of starting on small local islands and then trying to rework that into a global ecosystem?
The problem with emulations is human patience. If you don't need/have human interaction, this may run pretty fast. And in the end, what matters is how sustainable it is in the long run.
Does this mean that individual complexity is a natural enemy of group cohesiveness? Or is individual 'selfishness' more a product of evolutionary background?
On our planet we don't have ant colony dynamics at the physical scale of high intelligence (that I know of), but there are very physical limitations to things like food sources.
Virtual simulations don't have the same limitations, so the priors may be quite different.
Taking the "best" course of action from your own point of view may not be so good from a broader perspective. We might have evolved some small-group collaboration approaches that play better in the long run, but in large groups that doesn't go so well. And for AIs trying to optimize something without a big-picture vision, things may go wrong faster.