Comment by pushedx
25 days ago
As described in the readme of your repo (did you read it?) your agent found the Knuth paper located one directory level above its working directory.
So, you didn't produce a replication in 47 minutes, it just took around 30 minutes for your agent to find that you had the answer in a PDF in a nearby directory.
I wonder how common of a problem this will be in the future. The experiment will fail due to improper setup, the human will at best glance over the logs and declare victory, and everyone just believes.
Yes, I read it and specifically pointed it out (that's why there are 3 hours of interactive logs). There are 4 other runs pushed now so you can see what actual clean room runs for 5.2 xhigh, 5.3-Codex xhigh, 5.4 xhigh, and Opus 4.6 ultrathink look like: https://github.com/lhl/claudecycles-revisited/blob/main/COMP... as well as the baseline.