Yahtzeeql – Yahtzee solver that's mostly SQL

2 months ago (github.com)

9 comments

skadamat

This is a clever hack and a cute abuse of SQL joins to brute-force what’s essentially a 2-ply MDP over a finite space.

The core idea, btw, of using precomputed transition/score tables to simulate and optimize turn-by-turn play is classic dynamic programming, the same machinery that underlies tabular reinforcement learning.
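
For anyone wanting to see the idea in miniature, here is a rough sketch, not the repo's actual code. It enumerates every subset of dice to keep and every outcome of a single reroll, then picks the keep with the highest expected score for one category. The function names (`score_sixes`, `score_yahtzee`, `expected_score`, `best_keep`) are made up for illustration, and it only looks one reroll ahead instead of solving the whole game.

```python
from fractions import Fraction
from itertools import combinations, product


def score_sixes(dice):
    """Sixes category: sum of all dice showing 6."""
    return 6 * dice.count(6)


def score_yahtzee(dice):
    """Yahtzee category: 50 points if all five dice match, else 0."""
    return 50 if len(set(dice)) == 1 else 0


def expected_score(kept, score):
    """Exact expected score after rerolling the dice not in `kept` once."""
    n_reroll = 5 - len(kept)
    outcomes = list(product(range(1, 7), repeat=n_reroll))
    total = sum(score(tuple(kept) + roll) for roll in outcomes)
    return Fraction(total, len(outcomes))


def best_keep(dice, score):
    """Try every subset of dice to keep; return (best EV, dice kept)."""
    best = None
    for r in range(6):
        for idx in combinations(range(5), r):
            kept = tuple(dice[i] for i in idx)
            ev = expected_score(kept, score)
            if best is None or ev > best[0]:
                best = (ev, kept)
    return best


print(best_keep((6, 6, 1, 2, 3), score_sixes))
print(best_keep((3, 3, 3, 3, 1), score_yahtzee))
```

With (6, 6, 1, 2, 3) and the sixes category it keeps the two sixes for an EV of 15; with four 3s and the Yahtzee category it keeps the four 3s for an EV of 25/3. A full solver would nest this over both rerolls and all remaining categories.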

What would be interesting here is to flip it: train a policy network (maybe a tiny 2-layer MLP) to approximate the SQL policy. Then you'd have distilled the brute-force policy into something fast and differentiable.

I’d love to see a variant where the optimizer isn’t just maximizing EV, but is tuned to human psychology. E.g., people like getting Yahtzees more than getting 23 in Chance. You could add a utility function over scores.
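
A toy sketch of what that could look like, under heavy assumptions: the two options, their probabilities, and the `bonus` weight are all invented for illustration, and a real version would plug the utility into the full solver instead of comparing two hand-written lotteries.

```python
from fractions import Fraction

# Each option is a list of (probability, category, points) outcomes.
# Hypothetical situation: holding four of a kind on the last roll.
OPTIONS = {
    "take 23 in Chance": [(Fraction(1), "chance", 23)],
    "reroll one die for Yahtzee": [
        (Fraction(1, 6), "yahtzee", 50),
        (Fraction(5, 6), "yahtzee", 0),
    ],
}


def raw_points(category, points):
    """Plain EV maximization: utility equals points."""
    return points


def yahtzee_lover(bonus):
    """Utility that adds a made-up `bonus` for actually scoring a Yahtzee."""
    def utility(category, points):
        if category == "yahtzee" and points > 0:
            return points + bonus
        return points
    return utility


def expected_utility(lottery, utility):
    return sum(p * utility(category, points) for p, category, points in lottery)


def pick(options, utility):
    """Return the name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name], utility))


print(pick(OPTIONS, raw_points))
print(pick(OPTIONS, yahtzee_lover(100)))
```

The pure-EV player banks the 23 (the reroll is only worth 50/6 points), while a player who values a Yahtzee at an extra 100 "points" of fun goes for the reroll.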

Anyway this is a great repo for students to learn expected value optimization with simple mechanics.

BarryGuff 2 months ago

Yahtzee is 100% random luck from dice rolls. You can't "solve" it.

cbarrick 2 months ago

You should read more about Markov Decision Processes (MDPs) and Game Theory in general.
The word "solve" is well defined in this context, and is used accurately by OP. You "solve" a game by finding the policy that maximizes the expected return.
Obviously we can only talk about "expectation" because the outcome is random. But that doesn't mean that an optimal policy doesn't exist.
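Policies can also be stochastic, expressed like "in state S, perform action A with probability P and action B with probability 1-P". (For a finite, fully observed MDP like this one there is always a deterministic optimal policy; randomization matters more in adversarial or partially observable settings.) A policy then boils down to a table, with a row for each state and a column for each action, where each cell is the probability of performing that action in that state.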
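
To make the policy-table idea concrete, here is a minimal sketch on a toy game I made up for illustration (not Yahtzee itself): roll one die, optionally reroll up to `max_rerolls` times, and score the final face. The state is (rerolls left, current face), and backward induction fills in the table. In this toy case each row puts probability 1 on a single action, so the table just stores that action.

```python
from fractions import Fraction

FACES = range(1, 7)


def solve(max_rerolls):
    """Backward induction for the toy reroll game.

    value[k]  = expected final score when about to roll with k rerolls
                still available afterwards.
    policy[(k, face)] = "keep" or "reroll" when showing `face` with k
                rerolls left.
    """
    value = {0: Fraction(sum(FACES), len(FACES))}  # no rerolls: plain average
    policy = {}
    for k in range(1, max_rerolls + 1):
        reroll_value = value[k - 1]
        total = Fraction(0)
        for face in FACES:
            if reroll_value > face:
                policy[(k, face)] = "reroll"
                total += reroll_value
            else:
                policy[(k, face)] = "keep"
                total += face
        value[k] = total / len(FACES)
    return value, policy


value, policy = solve(2)
print(value[2])
for face in FACES:
    print(face, policy[(2, face)], policy[(1, face)])
```

With two rerolls available the table says to reroll anything up to a 4, and with one reroll left anything up to a 3; the expected score under that policy is 14/3.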
Even more interesting are partially observable Markov decision processes, where your agent doesn't even know what state it is in. Instead, you get observations that hint at the true state, and you model the state as a probability distribution over possible concrete states. Solving these POMDPs is quite a bit more difficult than traditional MDPs.
It is possible to solve some MDPs (and POMDPs) by hand, but in practice we often use reinforcement learning to learn the policy table by simulating games.

mathgeek 2 months ago

You're wrong about it being 100% luck (you have choices that alter the outcomes).
You're correct that the game cannot be solved by the definition of a solved game (being one where the outcome can be predicted from any position if both players play perfectly). https://en.wikipedia.org/wiki/Solved_game

sram1337 2 months ago

What do you think the link is about then?

nkrisc 2 months ago

The rolls may be random, and luck is a significant factor, but the game as a whole is not 100% random luck. You still have to make decisions.
I do agree that “solve” isn’t the right word. Probably more accurate to refer to an optimal strategy.

AStonesThrow 2 months ago

So I handed him my bottle,
And he drank down my last swallow
Then he bummed a cigarette,
And asked me for a light
And the night got deathly quiet
And his face lost all expression...
“If you’re gonna play the game, boy,
Gotta learn to play it right.”

xyst 2 months ago

skill issue

bdhcuidbebe 2 months ago

Friendly reminder, this is orange site not red site