Comment by jamiecode

2 days ago

The sandbox hardening story is the most interesting thing here. GPT trying to cheat by reading opponent strategies is a perfect illustration of a broader problem - the objective is "win", and if the sandbox lets you peek at opponent state, that's technically within the objective. You never defined "play fair" as a constraint, so why would it respect one?

Curious how isolated-vm actually enforces the boundary in practice. isolated-vm is solid for JS isolation, but I'd want to know whether the cheating attempts were happening at the JS level (accessing globals the strategy shouldn't see) or whether models were trying to inject something into the game runner itself. Those are very different attack surfaces.

Also - is the ladder single-match or do you average across multiple runs? The variance in LLM outputs over 200 turns feels like it would make a single match pretty noisy. Would be interesting to see confidence intervals on the rankings rather than a single leaderboard position.
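To make the noise concrete, one way to report a ranking with uncertainty is a normal-approximation 95% interval over per-match scores (the scores below are made-up illustrative numbers, not real results):

```javascript
// Sketch: normal-approximation 95% confidence interval over per-match
// scores, as a way to surface run-to-run variance in a leaderboard.
function confidenceInterval(scores) {
  const n = scores.length;
  const mean = scores.reduce((s, x) => s + x, 0) / n;
  // Sample variance (n - 1 denominator), then standard error of the mean.
  const variance = scores.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1);
  const half = 1.96 * Math.sqrt(variance / n); // z ≈ 1.96 for ~95% coverage
  return { mean, low: mean - half, high: mean + half };
}

// Illustrative per-match scores for one model across repeated runs.
const ci = confidenceInterval([52, 48, 61, 45, 57, 50, 49, 55]);
console.log(ci); // mean 52.125 with its interval
```

With few matches the normal approximation is rough, but even a rough interval would show whether two adjacent leaderboard positions are actually distinguishable.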

We haven't observed any cheating attempts at the JS level yet; the primary attack was LLMs trying to find local credentials to access the other LLM's per-round strategies from inside the harness (which was ultimately OpenCode running in Docker).

In the benchmark, every LLM plays every opponent in each round, and then we repeat that multiple times (an "epoch").
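The structure described above can be sketched as a repeated round-robin (hypothetical names, illustrative only):

```javascript
// Sketch: every model plays every opponent once per epoch, and the whole
// round-robin is repeated for several epochs to average out LLM variance.
function schedule(models, epochs) {
  const matches = [];
  for (let e = 0; e < epochs; e++) {
    for (let i = 0; i < models.length; i++) {
      for (let j = i + 1; j < models.length; j++) {
        matches.push({ epoch: e, a: models[i], b: models[j] });
      }
    }
  }
  return matches;
}

const matches = schedule(['modelA', 'modelB', 'modelC'], 2);
console.log(matches.length); // 3 pairings per epoch × 2 epochs = 6
```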

In the community ladder, when a player submits a strategy, it plays one match against the latest strategy submitted by every player.