Comment by tl
6 days ago
Per BalatroBench, gemini-3-pro-preview makes it to round (not ante) 19.3 ± 6.8 on the lowest difficulty on the deck aimed at new players. Round 24 is ante 8's final round. Per BalatroBench, this includes giving the LLM a strategy guide, which first-time players do not have. Gemini isn't even emitting legal moves 100% of the time.
It beats ante 8 in 9 of 15 attempts. I consider a 60% win rate very good for a first-time player.
The average is only 19.3 rounds because there is a bugged run where Gemini beats round 6 but the game bugs out when it attempts to sell Invisible Joker (a valid move)[0]. That being said, Gemini made a big mistake in round 6 that would have cost it the run at a higher difficulty.
[0]: given the existence of bugs like this, perhaps all the LLMs' performances are underestimated.
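For anyone who wants to check the arithmetic in this subthread (round 24 being ante 8's final round, and 9 wins in 15 attempts being 60%), here is a minimal sketch. It assumes three rounds per ante (small blind, big blind, boss blind) with no blinds skipped. The function names are made up for illustration and are not part of BalatroBench.

```python
# Assumption: each ante has 3 rounds (small, big, boss blind) and none are skipped.
ROUNDS_PER_ANTE = 3


def ante_for_round(round_number: int) -> int:
    """Return the ante that a 1-indexed round number falls in."""
    if round_number < 1:
        raise ValueError("round_number must be >= 1")
    return (round_number - 1) // ROUNDS_PER_ANTE + 1


def final_round_of_ante(ante: int) -> int:
    """Return the round number of an ante's last (boss) round."""
    if ante < 1:
        raise ValueError("ante must be >= 1")
    return ante * ROUNDS_PER_ANTE


def win_rate(wins: int, attempts: int) -> float:
    """Return the fraction of attempts that were wins."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return wins / attempts


if __name__ == "__main__":
    print(final_round_of_ante(8))   # last round of ante 8
    print(ante_for_round(19))       # ante containing round 19
    print(f"{win_rate(9, 15):.0%}")  # 9 wins out of 15 attempts
```

Under those assumptions, a mean of round 19.3 lands in ante 7, one ante short of the win condition, even though most individual runs win.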
Are there benchmarks if we allow the LLM to practice and study the game?
You can make one; BalatroBench is open source. But I'm quite sure it'd be crazily expensive for a hobby project. At the end of the day, an LLM can't actually 'practice and learn.'
Why not include a description of the bugs to avoid in the strategy guide?
https://balatrobench.com/