Comment by NitpickLawyer
9 days ago
The reported tables also don't match the screenshots. And their baseline and test results are too close to tell apart (judging by the screenshots, not the tables): 29/33 baseline, 31/33 skills, 32/33 skills + use skill prompt, 33/33 agent.md.
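A rough sketch of why scores this close are hard to separate on 33 tasks. It assumes each task is an independent pass/fail trial, which is an assumption made for illustration and not something the post states, and it uses a Wilson score interval as one reasonable choice of 95% interval. The names `RESULTS` and `wilson_interval` are hypothetical.

```python
import math

# Pass counts as quoted above (read from the screenshots), out of 33 tasks.
RESULTS = {
    "baseline": 29,
    "skills": 31,
    "skills + use skill prompt": 32,
    "agent.md": 33,
}
TOTAL = 33


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95%.

    Assumes independent pass/fail trials (an illustrative assumption).
    """
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# One interval per configuration, keyed by the labels from the comment.
intervals = {name: wilson_interval(k, TOTAL) for name, k in RESULTS.items()}

if __name__ == "__main__":
    for name, (low, high) in intervals.items():
        rate = RESULTS[name] / TOTAL
        print(f"{name}: {RESULTS[name]}/{TOTAL} = {rate:.1%}, "
              f"95% interval {low:.1%} to {high:.1%}")
```

Under these assumptions the interval for 29/33 overlaps the one for 33/33, so a single run of 33 tasks does not clearly separate even the lowest and highest configurations.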