Comment by tl

6 days ago

Per BalatroBench, gemini-3-pro-preview makes it to round (not ante) 19.3 ± 6.8 on the lowest difficulty on the deck aimed at new players. Round 24 is ante 8's final round. Per BalatroBench, this includes giving the LLM a strategy guide, which first-time players do not have. Gemini isn't even emitting legal moves 100% of the time.
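
To make the round-versus-ante arithmetic concrete, here is a rough sketch. It assumes three rounds per ante (small blind, big blind, boss blind) and that no blinds are skipped; the function names are made up for illustration and are not part of BalatroBench.

```python
import math

ROUNDS_PER_ANTE = 3  # assumption: small, big, and boss blind are all played


def ante_of(round_number: float) -> int:
    """Ante containing a given round number (fractional averages allowed)."""
    return math.ceil(round_number / ROUNDS_PER_ANTE)


def final_round_of(ante: int) -> int:
    """Last round (the boss blind) of a given ante."""
    return ante * ROUNDS_PER_ANTE


print(final_round_of(8))  # 24
print(ante_of(19.3))      # 7
```

Under those assumptions, an average of round 19.3 means the typical run ends somewhere in ante 7, one ante short of a win.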

It beats ante 8 in 9 out of 15 attempts. I do consider a 60% win rate very good for a first-time player.

The average is only 19.3 rounds because there is a bugged run where Gemini beats round 6 but the game bugs out when it attempts to sell Invisible Joker (a valid move)[0]. That being said, Gemini made a big mistake in round 6 that would have cost it the run at a higher difficulty.

[0]: given the existence of bugs like this, perhaps all the LLMs' performances are underestimated.

  • Are there benchmarks if we allow the LLM to practice and study the game?

    • You can make one; BalatroBench is open source. But I'm quite sure it'd be crazily expensive for a hobby project. At the end of the day, an LLM can't actually 'practice and learn.'

  • Why not include a description of the bugs to avoid in the strategy guide?