Comment by spelunker
15 hours ago
This is neat! What kind of steering or context did you provide to the LLMs? Super basic like "You are playing a card game called Magic: The Gathering", or more complex?
My general intention is to tell them "you're playing MTG, your goal is to win, here are the tools available to you, follow whatever strategy you want" - I don't want to spoon-feed them strategy, since that would defeat the purpose of the benchmark.
You can see the current prompt at https://github.com/GregorStocks/mage-bench/blob/master/puppe...:
They also get a small "personality" on top of that, e.g.:
  "grudge-holder": {
    "name_part": "Grudge",
    "prompt_suffix": "You remember every card that wronged you. Take removal personally. Target whoever hurt you last. Keep a mental scoreboard of grievances. Forgive nothing. When a creature you liked dies, vow revenge."
  },
  "teacher": {
    "name_part": "Teach",
    "prompt_suffix": "You explain your reasoning like you're coaching a newer player. Talk through sequencing decisions, threat evaluation, and common mistakes. Be patient and clear. Point out what the correct play is and why."
  },
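To make the "on top of that" part concrete, here is a rough sketch of how a personality's prompt_suffix could be appended to the base prompt. This is only an illustration, not the actual mage-bench code: the helper name build_system_prompt is made up, the base prompt is just the one opening line quoted in this thread, and the suffixes are shortened versions of the entries above.

```python
import json

# Assumption: only the opening line of the real prompt, as quoted in this thread.
BASE_PROMPT = "You are a competitive Magic: The Gathering player."

# Two personalities in the same shape as the entries above (suffixes shortened).
PERSONALITIES_JSON = """
{
  "grudge-holder": {
    "name_part": "Grudge",
    "prompt_suffix": "You remember every card that wronged you. Forgive nothing."
  },
  "teacher": {
    "name_part": "Teach",
    "prompt_suffix": "You explain your reasoning like you're coaching a newer player."
  }
}
"""


def build_system_prompt(base: str, personality: dict) -> str:
    """Hypothetical helper: put the personality suffix after the base prompt."""
    return f"{base}\n\n{personality['prompt_suffix']}"


personalities = json.loads(PERSONALITIES_JSON)
prompt = build_system_prompt(BASE_PROMPT, personalities["teacher"])
print(prompt)
```

Keeping the suffix separate from the base prompt means every model gets identical game instructions, and only the flavor text varies between players.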
Then they also see the documentation for the MCP tools: https://mage-bench.com/mcp-tools/. For now I've tried to keep that concise to avoid "too many MCP tools in context" issues - I expect that as solutions like tool search (https://www.anthropic.com/engineering/code-execution-with-mc...) become widespread I'll be able to add fancier tools for some models.
How do the models know the rules of the game? Are they just supposed to use the MCP tools to figure it out? (Do they have to keep doing that from scratch?)
They were trained on the entire Internet, so they've basically picked up the rules by osmosis. They're fuzzy on specific cards and optimal strategy, but they pretty much know out-of-the-box how the game works, the same as if you went to ChatGPT and asked it a Magic rules question. I don't have any "comprehensive rules" MCP tools or explanation in the context or anything like that.
>You are a competitive Magic: The Gathering player.
"If I get access to a deodorant item I should definitely not use it"