Comment by canvascritic
15 hours ago
This is a clever hack and a cute abuse of SQL joins to brute-force what’s essentially a 2-ply MDP over a finite space.
The core idea btw of using precomputed transition/score tables to simulate and optimize turn-by-turn play is a classical reinforcement learning method
What would be interesting here is to flip it: train a policy network (maybe tiny, 2-layer MLP) to approximate the SQL policy. then you could distill the SQL brute-force policy into something fast and differentiable.
i’d love to see a variant where the optimizer isn’t just maximizing EV, but is tuned to human psychology. e.g., people like getting Yahtzees more than getting 23 in chance. could add a utility function over scores.
Anyway this is a great repo for students to learn expected value optimization with simple mechanics.
No comments yet
Contribute on Hacker News ↗