Comment by brokensegue

5 days ago

Did you consider doing it as a computer-use task? I'd probably find those more compelling.

It's what I did for my game benchmark: https://d.erenrich.net/paperclip-bench/index.html

Not really. I downloaded Balatro, saw that it was moddable, and wrote a mod API to interact with it programmatically. I was just curious whether, from a text-only game state representation, an LLM would be able to play decently. The benchmark was a late pivot.