Comment by soohamr

1 day ago

I thought that applying AI to 1v1 competitive Pokémon would be a fun and educational way to learn about POMDPs and to try out reward-free models on a problem that would classically be treated as an RL problem. This was only possible thanks to a lot of foundational work from the open-source community and last year's competitive Pokémon NeurIPS track https://pokeagent.github.io/track1.html, which laid out the plugins to connect policy models to Pokémon Showdown for live play and evaluation.

I have already finished training the standard discriminative auto-regressive architectures by imitation learning on player actions and compared them with the previous baselines set in the study. I want to match or exceed the best prior model, Kakuna @ 142M params, but on a limited budget. JEPA-style world models are showing promise when conditioned on actions [1], and frontier research on JEPA with trajectory straightening [2] shows that improved planning is a natural outcome of improved representations.

If any good research ideas come out of this exploration then even better!

Current fork with my models: https://github.com/sooham/metamon (under checkpoints). Original source for pokeagents: https://github.com/metamon/metamon

[1] https://arxiv.org/abs/2603.19312

[2] https://arxiv.org/html/2603.12231v1

A good primer on world models from Welch Labs, one of my favourite ML YouTubers: https://www.youtube.com/watch?v=kYkIdXwW2AE