Comment by cshimmin

10 hours ago

It's on the page if you click the little info icon in the upper-right. Here's the text, but there are some nice graphics there too:

  Snake Game, training entirely in the browser. Built on tinygrad: the rollout / targets / train graphs are TinyJits authored in Python, then compiled once to WGSL and replayed here under WebGPU.

  Observation: flat 10×10 board (100) + 4-dim prev-action one-hot = 104 dims. fc_pi.weight is zero-init so the opening policy is uniform over the legal actions; fc_v uses tinygrad's default Kaiming init.
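That observation layout is simple enough to sketch in plain numpy (this is not the demo's code; `encode_obs` and the board representation are my own illustration of the 100 + 4 = 104-dim vector described above):

```python
import numpy as np

def encode_obs(board: np.ndarray, prev_action: int, n_actions: int = 4) -> np.ndarray:
    # Flatten the 10x10 board into 100 values, then append a one-hot
    # of the previous action (4 values) for a 104-dim observation.
    one_hot = np.zeros(n_actions, dtype=np.float32)
    one_hot[prev_action] = 1.0
    return np.concatenate([board.astype(np.float32).ravel(), one_hot])

obs = encode_obs(np.zeros((10, 10)), prev_action=2)
assert obs.shape == (104,)
```

With `fc_pi.weight` zero-initialized, every 104-dim input maps to identical logits, which is what makes the opening policy uniform.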

  Per rollout: T=24 × N=384 parallel snakes (9,216 transitions), then K=3 epochs × 4 mini-batches of PPO updates. GAE γ=0.99, λ=0.95; AdamW wd=0.01; ratio clip ε=0.1; grad-norm 0.5; Huber value β=1, val_coef=1; entropy bonus 1/120 ≈ 0.00833.
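For anyone unfamiliar with GAE, here's a minimal backward-pass sketch with the γ=0.99, λ=0.95 defaults above (plain numpy, single environment for clarity; the demo batches this over N=384 snakes inside a TinyJit):

```python
import numpy as np

def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation, computed backwards over a
    # T-step rollout:
    #   delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    #   A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    T = len(rewards)
    adv = np.zeros(T, dtype=np.float32)
    next_adv, next_value = 0.0, last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        adv[t] = next_adv
        next_value = values[t]
    returns = adv + values  # TD(lambda) targets for the value head
    return adv, returns

adv, ret = gae(np.ones(3), np.zeros(3), 0.0, np.zeros(3), gamma=1.0, lam=1.0)
```

With γ=λ=1 and zero values this degenerates to reward-to-go, which is a handy sanity check.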

  Action mask + value clip + KL early stop. The 4-dim prev_a obs tail lets fc_pi zero the U-turn logit (the env silently overrides same-axis reversals anyway). Value loss is max(huber(v_new−td), huber(v_clip−td)) at ε=0.2. Approx-KL is sampled after each epoch and breaks the loop at 1.5·kl_target.
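The value-clip and KL-early-stop rules described there can be written out in a few lines. This is my own numpy paraphrase of the stated formulas, not the demo's source; function names are hypothetical:

```python
import numpy as np

def huber(x, beta=1.0):
    # Huber (smooth-L1) loss with threshold beta.
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * x**2 / beta, ax - 0.5 * beta)

def clipped_value_loss(v_new, v_old, td_target, eps=0.2, beta=1.0):
    # Pessimistic clipped value loss: clip the new value to within
    # eps of the rollout-time value, then take the worse (larger) of
    # the clipped and unclipped Huber losses.
    v_clip = v_old + np.clip(v_new - v_old, -eps, eps)
    return np.maximum(huber(v_new - td_target, beta),
                      huber(v_clip - td_target, beta)).mean()

def should_stop(logp_old, logp_new, kl_target, factor=1.5):
    # Sampled approx-KL: mean of (logp_old - logp_new) over the batch;
    # break the epoch loop once it exceeds factor * kl_target.
    approx_kl = float(np.mean(logp_old - logp_new))
    return approx_kl > factor * kl_target
```

The max() makes the value update conservative in the same spirit as PPO's ratio clip: a large jump in `v_new` can't reduce the loss past what the clipped prediction would give.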