Comment by vunderba
14 days ago
Interesting but frustratingly vague on details. How exactly are the models playing? Is it using some kind of PGN equivalent in Tetris that represents a on-going game, passing an ASCII representation, encoding as a JSON structure, or just directly sending screenshots of the game to the various LLMs?
It has to be turn-based. Even with Flash's speed, the inference latency would kill you in a real-time loop. They're likely pausing the game state after every tick to wait for the API response before resuming.
answered this in a comment above! It's not turn or visual layout based since LLMs are not trained that way. The representation is a JSON structure, but LLMs plug in algorithms and keeps optimizing it as the game state evolves
Thanks for the clarification! Kind of reminds me of the Brian Moore's AI clocks which uses several LLMs to continuously generate HTML/CSS to create an analog clock for comparisons.
https://clocks.brianmoore.com
Wow this is incredible!!
Curious how the token economics compare here to a standard agent loop. It seems like if you're using the LLM as a JIT to optimize the algorithm as the game evolves, the context accumulation would get expensive fast even with Flash pricing.
I suppose you could argue about whether it's an LLM at that point but vision is a huge part of frontier models now, no?