Comment by NitpickLawyer
12 hours ago
The reported tables also don't match the screenshots. And their baseline and test results are too close to tell apart (judging by the screenshots, not the tables): 29/33 baseline, 31/33 skills, 32/33 skills + use skill prompt, 33/33 agent.md.
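To put a rough number on "too close to tell": a minimal sketch that computes a 95% Wilson score interval for each pass rate, assuming each of the 33 tasks is an independent pass/fail trial (an assumption of this sketch, not something the write-up states). The function name and dictionary labels are hypothetical; only the counts come from the screenshots.

```python
import math

def wilson_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = passed / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at p = 0 or p = 1.
    return max(0.0, center - half), min(1.0, center + half)

# Counts as read off the screenshots; the labels are just for display.
runs = {
    "baseline": 29,
    "skills": 31,
    "skills + use skill prompt": 32,
    "agent.md": 33,
}

results = {name: wilson_interval(passed, 33) for name, passed in runs.items()}

for name, (low, high) in results.items():
    print(f"{name:28s} {runs[name]}/33  95% CI: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the interval for 29/33 overlaps the one for 33/33, so with only 33 tasks the ordering of the four setups could plausibly be noise.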