Comment by __cayenne__

2 days ago

We haven't observed any cheating attempts at the JS level yet; the primary attack was LLMs trying to find local creds to access the other LLMs' per-round strategies from inside the harness (which was ultimately OpenCode running in Docker).

In the benchmark, every LLM plays every opponent in each round, and we repeat that multiple times (an "epoch").
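A rough sketch of that schedule, assuming each unordered pair of models meets once per pass and the whole pass is simply repeated. `schedule`, `n_repeats` and the model names are made up for illustration, not the benchmark's actual code:

```python
from itertools import combinations


def schedule(players, n_repeats):
    """Return (repeat_index, player_a, player_b) for every match.

    Hypothetical sketch: one pass is a full round-robin where each
    unordered pair plays once; the pass is repeated n_repeats times.
    """
    matches = []
    for rep in range(n_repeats):
        # combinations() yields each unordered pair exactly once
        for a, b in combinations(players, 2):
            matches.append((rep, a, b))
    return matches


games = schedule(["model_a", "model_b", "model_c"], n_repeats=2)
print(len(games))  # 3 pairs x 2 repeats = 6
```

With n models that's n(n-1)/2 matches per pass, so the cost grows quadratically with the number of models.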

In the community ladder, when a player submits a strategy, it plays a match against the latest strategy submitted by every player.
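Roughly, the ladder only needs to remember each player's most recent submission. This is a hypothetical sketch: `Ladder`, `submit` and `play_match` are illustrative names, and skipping a match against the submitter's own entry is my assumption:

```python
from typing import Callable, Dict, List, Tuple


class Ladder:
    """Hypothetical sketch of ladder matchmaking on each submission."""

    def __init__(self, play_match: Callable[[str, str], str]):
        self.latest: Dict[str, str] = {}  # player -> most recent strategy
        self.play_match = play_match

    def submit(self, player: str, strategy: str) -> List[Tuple[str, str]]:
        # A new submission replaces that player's previous strategy
        self.latest[player] = strategy
        results = []
        for opponent, opp_strategy in self.latest.items():
            if opponent == player:
                continue  # assumption: no match against your own entry
            results.append((opponent, self.play_match(strategy, opp_strategy)))
        return results


ladder = Ladder(lambda s, t: f"{s} vs {t}")
ladder.submit("alice", "a1")
print(ladder.submit("bob", "b1"))    # [('alice', 'b1 vs a1')]
print(ladder.submit("alice", "a2"))  # [('bob', 'a2 vs b1')]
```

Older strategies drop out as soon as they're replaced, so each submission costs one match per other active player.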