Comment by aurareturn

10 days ago

  Prompting an LLM to generate and run a game like this gave immediate, impressive results; 10 mins after starting we had something that looked great. The problem was that the game sucked. It always went 3-4 rounds of input regardless. It constantly gave the game away because it had all the knowledge in its context, and it just didn't have the right flow at all.

I'm confused. Did you ask the LLM to write the game in code? Or did the LLM run the entire game via inference?

Why do you expect that the LLM can generate the entire game with a few prompts and have it work exactly the way you want? Did your prompt specify the exact conditions for the game?

> Or did the LLM run the entire game via inference?

This; this was our 10-minute prototype, with a prompt along the lines of "You're running a CYOA game about this scenario...".
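
Schematically it was just a chat loop with one big system prompt. Here's a rough sketch of the shape of it, not our actual code; the OpenAI client, model name, and scenario text are just stand-ins:

```python
# Rough shape of the 10-minute prototype: the LLM runs the whole game via
# inference, with all of the game knowledge sitting in one system prompt.
# Illustrative only: the client, model name, and scenario are placeholders.
from openai import OpenAI

client = OpenAI()

messages = [{
    "role": "system",
    "content": (
        "You're running a CYOA game about this scenario: <scenario text here>. "
        "Each turn, narrate what happens and offer a few numbered options."
    ),
}]

while True:
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    text = reply.choices[0].message.content
    print(text)
    messages.append({"role": "assistant", "content": text})
    messages.append({"role": "user", "content": input("> ")})
```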

> Why do you expect that the LLM can generate the entire game with a few prompts

I did not expect it to work, and indeed it didn't. However, why it didn't work wasn't obvious to the whole group, and much of the iteration process during the hackathon was breaking things down into smaller components so that we could retain more control over the gameplay.

One surprising thing I hinted at there was using RAG not for its ability to expose more info to the model than can fit in context, but rather for its ability to hide info from the model until it's "discovered" in some way. I hadn't considered that before, and it was fun to figure out.

  • > using RAG not for its ability to expose more info to the model than can fit in context, but rather for its ability to hide info from the model until it's "discovered" in some way

    Would you be willing to expand on this?

    • Yeah sure. The problem we had was that we had some "facts" to base the game on, but when the LLM generated multiple-choice, choose-your-own-adventure-style options, they would end up being leading questions towards the facts. I.e. the LLM knows what's behind the door, so an option might have been "check for the thing behind the door", and the user now knows it's there, because why else would the game have offered it?

      Instead we put all the facts in a RAG database. Now when we ask the LLM to generate options, it does so without knowing the actual answers, so they can't really be leading questions. We then take the user input, use RAG to get the relevant facts, and "reveal" those facts to the LLM in subsequent prompts (roughly the shape sketched at the end of this comment).

      Honestly we still didn't nail the gameplay or anything; it was pretty janky. But it was 2 days, a bunch of learning, and probably only 300 lines of Python in the end, so I don't want to overstate what we did. This one detail stuck with me, though.
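
      To make that concrete, here's a stripped-down sketch of the idea (not our actual code: a toy keyword-overlap retriever stands in for the real RAG lookup, and the facts, model name, and prompts are placeholders):

      ```python
      # Using RAG to *hide* facts from the model until the player "discovers" them.
      # Illustrative sketch: keyword overlap stands in for a real vector store.
      from openai import OpenAI

      client = OpenAI()

      HIDDEN_FACTS = [
          "The cellar door is locked from the inside.",
          "A spare key is taped under the innkeeper's desk.",
          "The merchant secretly works for the baron.",
      ]

      def ask_llm(prompt: str) -> str:
          resp = client.chat.completions.create(
              model="gpt-4o",  # placeholder model name
              messages=[{"role": "user", "content": prompt}],
          )
          return resp.choices[0].message.content

      def retrieve(action: str, facts: list[str], k: int = 2) -> list[str]:
          """Toy retrieval: rank hidden facts by word overlap with the player's action."""
          words = set(action.lower().split())
          overlap = lambda fact: len(words & set(fact.lower().split()))
          return [f for f in sorted(facts, key=overlap, reverse=True)[:k] if overlap(f)]

      def options_prompt(story: str, revealed: list[str]) -> str:
          # Only facts the player has already discovered go into this prompt, so the
          # generated options can't be leading questions about the rest.
          facts = "\n".join(revealed) or "(none discovered yet)"
          return (
              "You are running a choose-your-own-adventure game.\n"
              f"Story so far:\n{story}\n"
              f"Facts the player has discovered:\n{facts}\n"
              "Offer 3 numbered options for what the player could do next."
          )

      def narrate_prompt(story: str, revealed: list[str], action: str) -> str:
          facts = "\n".join(revealed) or "(none)"
          return (
              "You are running a choose-your-own-adventure game.\n"
              f"Story so far:\n{story}\n"
              f"Facts the player has discovered (you may use these):\n{facts}\n"
              f"Briefly narrate what happens when the player chooses: {action}"
          )

      story = "You arrive at a quiet roadside inn after dark."
      revealed: list[str] = []

      while True:
          print(ask_llm(options_prompt(story, revealed)))
          action = input("> ")
          # The retrieval step is what "discovers" hidden facts...
          for fact in retrieve(action, HIDDEN_FACTS):
              if fact not in revealed:
                  revealed.append(fact)
          # ...and only now are they revealed to the LLM for the narration.
          outcome = ask_llm(narrate_prompt(story, revealed, action))
          print(outcome)
          story += f"\nPlayer: {action}\n{outcome}"
      ```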

    • LLMs work much better on narrow tasks. They get more lost the more information you introduce. Models are introducing reasoning now, which tries to address this problem, and some models are getting really good at it, like o3 or reasoner.com. I have access to both, and it looks like we will soon have models that become more accurate when we introduce more complexity, which will be a huge breakthrough in AI.