Comment by canvascritic

15 hours ago

This is a clever hack and a cute abuse of SQL joins to brute-force what’s essentially a 2-ply MDP over a finite space.

The core idea, btw, of using precomputed transition/score tables to simulate and optimize turn-by-turn play is classical dynamic programming (backward induction), the model-based planning that reinforcement learning builds on.
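To make the table-driven idea concrete, here is a minimal sketch on a toy game I made up (not the repo's Yahtzee tables): one fair die, a fixed number of rerolls, and the score is the face you keep. The `solve` function and its names are illustrative assumptions, not anything from the repo.

```python
from fractions import Fraction

FACES = range(1, 7)  # toy game (assumption): one fair die, score = face kept

def solve(max_rerolls):
    """Backward induction from the last roll to the first.

    Returns (ev, policy): ev is the expected score under optimal play with
    max_rerolls available; policy[r][face] is the best action with r rerolls left.
    """
    ev = Fraction(sum(FACES), 6)  # no rerolls left: you keep whatever you roll
    policy = {}
    for r in range(1, max_rerolls + 1):
        # At this point ev is the value of a fresh roll with r - 1 rerolls left.
        policy[r] = {f: "keep" if f >= ev else "reroll" for f in FACES}
        # Value of a fresh roll with r rerolls left: best of keep vs. reroll per face.
        ev = sum(max(Fraction(f), ev) for f in FACES) / 6
    return ev, policy

ev, policy = solve(2)
print(ev, policy[2])
```

With two rerolls the value comes out to 14/3 (about 4.67), and the policy rerolls a 4 on the first roll but keeps it on the second. The SQL version does the same max-over-actions of an expectation, just with joins over much bigger tables.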

What would be interesting here is to flip it: train a policy network (maybe a tiny 2-layer MLP) to approximate the SQL policy. That would distill the SQL brute-force policy into something fast and differentiable.
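A rough sketch of what that distillation could look like, in plain NumPy. Everything here is an assumption for illustration: the "teacher" is a hard-coded toy table (one die, reroll when the face is below the value of rerolling) standing in for the SQL output, and the features, layer sizes, and learning rate are arbitrary.

```python
import numpy as np

# Hypothetical teacher: a toy lookup table standing in for the SQL-derived policy.
# State = (rerolls_left, face of one die); action 1 = reroll, 0 = keep.
THRESHOLD = {1: 3.5, 2: 4.25}  # reroll when the face is below the EV of rerolling
states = [(r, f) for r in (1, 2) for f in range(1, 7)]
labels = np.array([[1.0 if f < THRESHOLD[r] else 0.0] for r, f in states])

def featurize(r, f):
    x = np.zeros(7)
    x[f - 1] = 1.0        # one-hot encoding of the face
    x[6] = float(r == 2)  # indicator: two rerolls left
    return x

X = np.stack([featurize(r, f) for r, f in states])

def train_student(X, y, hidden=16, lr=0.5, steps=5000, seed=0):
    """Fit a 2-layer MLP (tanh hidden layer, sigmoid output) to the teacher's actions."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        dz = (p - y) / len(X)             # gradient of mean cross-entropy w.r.t. logits
        dH = (dz @ W2.T) * (1.0 - H**2)   # backprop through tanh
        W2 -= lr * (H.T @ dz)
        b2 -= lr * dz.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    def predict(X_new):
        H = np.tanh(X_new @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        return (p > 0.5).astype(float)

    return predict

student = train_student(X, labels)
accuracy = float((student(X) == labels).mean())
print(f"agreement with teacher: {accuracy:.2f}")
```

For the real thing you would swap the toy table for rows exported from the SQL policy and evaluate on held-out states, since agreeing on the training table says nothing about generalization.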

I’d love to see a variant where the optimizer isn’t just maximizing EV but is tuned to human psychology, e.g., people like getting Yahtzees more than getting 23 in Chance. You could add a utility function over scores.
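Something like this, as a sketch. The two options, their probabilities, and the size of the Yahtzee bonus are all made-up numbers to show the mechanics, not measurements of anything:

```python
from typing import Callable, Dict, List, Tuple

Outcome = Tuple[float, str, int]  # (probability, category scored, points)

# Made-up decision: bank a sure 23 in Chance, or chase a Yahtzee.
OPTIONS: Dict[str, List[Outcome]] = {
    "take_chance_23": [(1.0, "chance", 23)],
    "chase_yahtzee": [(0.25, "yahtzee", 50), (0.75, "chance", 12)],
}

def raw_score(category: str, points: int) -> float:
    """Plain EV maximization: utility is just the points."""
    return float(points)

def human_utility(category: str, points: int, yahtzee_bonus: float = 20.0) -> float:
    """Assumed preference model: a Yahtzee feels better than its 50 points."""
    bonus = yahtzee_bonus if category == "yahtzee" and points > 0 else 0.0
    return points + bonus

def best_option(options: Dict[str, List[Outcome]],
                utility: Callable[[str, int], float]) -> str:
    """Pick the option with the highest expected utility."""
    def expected(name: str) -> float:
        return sum(p * utility(cat, pts) for p, cat, pts in options[name])
    return max(options, key=expected)

print(best_option(OPTIONS, raw_score))      # take_chance_23 (23.0 vs 21.5)
print(best_option(OPTIONS, human_utility))  # chase_yahtzee (26.5 vs 23.0)
```

The nice part is that the optimizer itself doesn't change: you only swap the score column for a utility column before taking the expectation.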

Anyway, this is a great repo for students to learn expected value optimization with simple mechanics.