Comment by bob1029

3 months ago

I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.

I spent quite a bit of time building a multi-agent simulation last year and wound up at the same conclusion every day - this is all just a roundabout form of prompt engineering. Perhaps it is useful as a mental model, but you can flatten the whole thing to a few SQL tables and functions. Each "agent" is essentially a SQL view that fills in a string template to form the prompt.
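
To make that concrete, here's a minimal sketch of the kind of flattening I mean - hypothetical schema and template, with sqlite3 standing in purely for illustration:

    import sqlite3

    # Hypothetical schema: each "agent" is just a row of persona fields
    # plus whatever "memories" have accumulated for it.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE agents (name TEXT, role TEXT, goal TEXT)")
    db.execute("CREATE TABLE memories (agent TEXT, event TEXT)")
    db.executemany("INSERT INTO agents VALUES (?, ?, ?)",
                   [("ada", "builder", "construct a shelter"),
                    ("sam", "scout", "map the area")])
    db.execute("INSERT INTO memories VALUES ('ada', 'found wood nearby')")

    # The "agent" is nothing more than a view over those rows, rendered
    # through a string template to form the prompt.
    def render_prompt(agent_name):
        name, role, goal = db.execute(
            "SELECT name, role, goal FROM agents WHERE name = ?",
            (agent_name,)).fetchone()
        events = [e for (e,) in db.execute(
            "SELECT event FROM memories WHERE agent = ?", (agent_name,))]
        return (f"You are {name}, a {role}. Your goal: {goal}. "
                f"You remember: {'; '.join(events) or 'nothing'}. "
                "What do you do next?")

    print(render_prompt("ada"))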

I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process. There is clearly no "inner world" in these LLMs, so trying to entertain them with a rich outer environment seems pointless.

TBH I haven't seen a single use of LLMs in games, beyond less repetitive NPC interactions, that wasn't better served by traditional algorithms. Maybe once they get good enough to create usable rigged and textured meshes with enough control to work in-game? They can't create a story on the fly that's reliable enough to be a compelling accompaniment to a coherent game plot. Maps and such don't seem to need anything beyond what current procedural algorithms provide, and those still work with premade assets— the implementations I've seen can't even reliably place static meshes on the ground in believable positions. And as far as NPCs go— how far does that actually get you? It's pure novelty, worth far less than an hour of time. Even if you get a guided plot progression worded on the fly by an LLM, is that as good, let alone better, than a dialog tree put together by a professional writer?

This Civ idea at least seems like a new approach to some extent, but it still doesn't seem to add much conceptually. Even if it doesn't, learning that is still worthwhile. But almost universally these ideas seem to be either buzzwordy solutions in search of problems, or a cheaper-than-people source of creativity with serious quality tradeoffs that still requires far too much developer wrangling to actually save money.

I'm a tech artist so I'm a bit biased towards the value of human creativity, but also likely the primary demographic for LLM tools in game dev. I am, so far, not compelled.

  • It's been posted in depth a few times across this forum, to varying degrees, by game developers - I was initially very excited about using LLMs for NPC interactions, until I read some of these posts. The gist of it was: the thing that makes a game fundamentally a game is its constraints. LLM-based NPCs fundamentally break these constraints in a way that is not testable or predictable by the developer and will inevitably destroy the gameplay experience (at least with current technology).

    • Yeah, same. Epic's Matrix demo implemented it, and even without a plot, the interactions were so heavily guided that the distinction was pointless. So you can find out what that NPC's spouse's name is and their favorite color. Is that neat? Sure, it's neat. Is it going to make a better game? Probably less than hiring another good writer to do NPC dialog. To be truly useful, I think they would have to be able to affect the world in meaningful ways that work with the game plot, and again, once you clamp that down as much as you'd need to in order to still have a plot, you're looking at a fancy decision tree.

  • Nobody will know for sure until a big budget game is actually released with a serious effort behind its NPCs.

    • I can't see anything that Gen AI NPCs would add, unless maybe you're talking about a Sims kind of game where the interactions are the point and they don't have to adhere to a defined progression. Other than that, it's a chatbot. We already have chatbots, and having one in the context of a video game doesn't seem like it would add anything revolutionary to that product. And would that even stand a chance of being as compelling to socially-focused role-playing gamers as online games are?

      This is my field so I'm always looking for the angle that new tech will take. I still rank this lower than VR— with all of its problems— for potential to significantly change player interactions. Tooling to make games is a different story, but for actual use in games? I don't see it yet.

You've absolutely nailed it here, I agree. To make any progress at all on the tremendously difficult problem they are trying to solve, they need to be frank about just how far away they are from what they are marketing.

I whole-heartedly support the authors' commercial interest in drumming up awareness and engagement. This is definitely a cool thing to be working on; however, it would make more sense to frame the situation more honestly and attract people to the challenge of solving tremendously hard problems, with a level of expertise and awareness that truly moves the ball forward.

What would be far more interesting would be for the folks involved to lay out all ten thousand things that went wrong in their experiments, along with the common-sense conclusions from those findings (just like the one you shared, which is truly insightful and correct).

We need to move past this industry and its enablers, which continually try to win using the wrong methodology -- pushing away the most inventive and innovative people, who are ripe and ready to make paradigm shifts in the AI field and industry.

  • It would, however, be very interesting to see these kinds of agents in a commercial video game. Yes, they are shallow in their perception of the game world, but they're a big step up from the status quo.

    • Yes... Imagine a blog post of the same quality as this paper that framed their work and their pursuits in a way that genuinely got people excited about what could be around the corner, but with context that makes clear exactly how far away they are from achieving the ultimate vision.

> I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process.

I don't know how you expect agents to self-organize social structures if they don't have a shared reality. I mean, you could write all the prompts yourself, but then that shared reality is just your imagination and you're just DMing for them.

The point of the Minecraft environment isn't to "enrich" the "inner world" of the agents, and the goal isn't to "entertain" them. The point is to create a set of human-understandable challenges in a shared environment so that we can measure the behavior and performance of groups of agents in different configurations.

I know we aren't supposed to bring this up, but did you read the article? Nothing of your comment addresses any of the findings or techniques used in this study.

I wrote and played with a fairly simple agentic system and had some of the same thoughts RE higher order behaviour. But I think the counter-points would be that they don't have to all be the same model, and what you might call context management - keeping each agent's "chain of thought" focused and narrow.

The former is basically what MoE is all about, and I've found that, at least with smaller models, they perform much better with a restricted scope and limited context. If the end result of that is something that can do things a single large model can't, isn't that higher order?

You're right that there's no "inner world", but then maybe that's the benefit of giving them one. In the same way that providing a code-running tool to an LLM can allow it to write better code (by trying it out), I can imagine a 3D world being a playground for LLMs to figure out real-world problems in a way they couldn't otherwise. If they did, wouldn't that be higher order?

>I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.

It's a matter of entropy; producing new behaviours requires exploration on the part of the models, which requires some randomness. LLMs have only a minimal amount of entropy introduced, via temperature in the sampler.
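
For concreteness, temperature is just a rescaling of the logits before sampling, so it's the one knob controlling how much randomness each token draw gets. A toy sampler, not any particular library's implementation:

    import math, random

    def sample_token(logits, temperature=1.0):
        # T < 1 sharpens the distribution (less entropy, less exploration);
        # T > 1 flattens it (more entropy, more exploration).
        scaled = [l / temperature for l in logits]
        m = max(scaled)                              # subtract max for stability
        weights = [math.exp(l - m) for l in scaled]  # unnormalized softmax
        return random.choices(range(len(logits)), weights=weights)[0]

    # Toy 4-token vocabulary: at T=0.2 this almost always picks index 2;
    # at T=2.0 the other tokens get sampled far more often.
    logits = [1.0, 0.5, 2.0, -1.0]
    print(sample_token(logits, temperature=0.2))
    print(sample_token(logits, temperature=2.0))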

  • As I've pointed out in the past, I also think it's fair to say that we overestimate human variability, and that most human behaviour and language coalesces around common patterns.

    The same goes for the creative industry, where a common talking point is that "AIs just rehash existing stuff, they don't produce anything new". Neither do most artists; everything we make is almost always some riff on prior art or nature. Elves are just humans with pointy ears. Goblins are just small elves with green skin. Dwarves are just short humans. Dragons are just big lizards. Aliens are just humans with an odd-shaped head and body.

    I don't think people realise how very rare it is for any human being to experience or create something truly novel, not already experienced or created by our species. Most of reality is derivative.

Maybe we need gazelles and cheetahs - many gazelle-agents getting chased towards a goal, doing the brute-force work - while the constraint-cheetahs chase them, evaluate them, and leave them alive (memory intact) as long as they come up with better and better solutions. Basically an evolutionary algo running on top of many agents, all running simultaneously on the same hardware?
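
Something like this toy loop, with stand-in functions where the LLM calls and evaluators would go (all names hypothetical):

    import random

    def gazelle_step(memory):
        # Stand-in for a gazelle-agent: in practice an LLM call that extends
        # its surviving context with a new candidate solution.
        return memory + [random.uniform(-1, 1)]

    def cheetah_score(memory):
        # Stand-in for a cheetah-agent: evaluates the latest candidate.
        # In practice this could be tests, a judge model, etc.
        return memory[-1]

    population = [[] for _ in range(8)]
    for generation in range(20):
        population = [gazelle_step(m) for m in population]
        ranked = sorted(population, key=cheetah_score, reverse=True)
        survivors = ranked[:4]                 # the bottom half gets culled
        clones = [list(m) for m in survivors]  # survivors keep memory intact
        population = survivors + clones        # and seed the next generation

    print(max(cheetah_score(m) for m in population))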

  • Do you want stressed and panicking agents? Do you think they'll produce good output?

    In my prompting experience, I mostly do my best to give the AI way, way more slack than it thinks it has.

    • No, I want the hunters to zap the prey with tiredness. Basically electron holes hunting for free electrons, annihilating states. Neurons have something similar, where they usually prevent endless excitement and hyperfixation, which is why a coder in flow is such a strange thing.

  • I had the opposite thought. Opposite to evolution...

    What if we are a CREATED (i.e. created instantly, not evolved) set of humans, and evolution and other backstories have been added so that the story of our history is more believable?

    Could it be that humanity represents a de novo (Latin for "anew") creation, bypassing the evolutionary process? Perhaps our perception of a gradual ascent from primitive origins is a carefully constructed narrative designed to enhance the credibility of our existence within a larger framework.

    What if we are like the Minecraft people in this simulation?

    • I feel that is too complicated. The simplest explanation is usually the right one. I think we live on an earth with actual history. Note that this does not necessarily mean that we are not living in a simulation, as history itself can be simulated.

      If we are indeed in a simulation, I feel there are too many details for it to have been "designed" by a being. There are too many interconnected facts, and unless they fix the "bugs" as they appear and constantly reboot the simulation, I don't think it is designed. Otherwise we would have noticed the glitches by now.

      If we are in a simulation, it has probably been generated by a computer following a set of rules. Maybe it ran a simplified version to evolve millions of possible earths, and we are living in the version they selected for the final simulation? In that case all the facts would align and it could be much harder to notice the glitches.

      I don't think we are living in a simulation, because bugs are hard to avoid, even with close to "infinite" computing power. With great power comes great potential for bugs.

      Perhaps we are in fact living in one of the simplified simulations and will be turned off at any second after I have finished this senten

    • We also can't rule out that Gaia or Odin made the world five minutes ago, and went to great lengths to make the world appear ancient.

      It certainly makes sense if you assume that the world is a simulation. But does it actually explain anything that isn't equally well explained by assuming the simulation simulated the last 13 billion years, and evolution really happened?

  • This (a genetic algo) only works if you have some random variability in the population. With different models it would work, but I feel like it's kind of pointless without the usual feedback mechanism (positive traits being passed on).

That depends on giving them a goal/reward like increasing "data quality".

I mean, frogs don't use their brains much either; in spite of the rich world around them, they don't really explore.

But chimps do. They can't sit quietly in a tree forever, and that boils down to their Reward/Motivation Circuitry. They get pleasure out of exploring. And if they didn't, we wouldn't be here.