Comment by NitpickLawyer
18 hours ago
The reported tables also don't match the screenshots. And their baseline and test results are too close to tell apart (judging by the screenshots, not the tables): 29/33 baseline, 31/33 skills, 32/33 skills + use skill prompt, 33/33 agent.md.
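To put a rough number on "too close to tell", here is a minimal sketch that runs a two-sided Fisher exact test on each configuration against the baseline. It assumes each run is 33 independent pass/fail tasks, which the source doesn't confirm; the function name `fisher_two_sided` and the `scores` dict are mine, purely for illustration.

```python
from math import comb

def fisher_two_sided(pass_a, pass_b, n=33):
    """Two-sided Fisher exact test for two runs of n pass/fail tasks each."""
    fails_a = n - pass_a
    total_fails = fails_a + (n - pass_b)
    denom = comb(2 * n, total_fails)

    def prob(k):
        # Hypergeometric probability that run A holds exactly k of the failures.
        return comb(n, k) * comb(n, total_fails - k) / denom

    observed = prob(fails_a)
    lo, hi = max(0, total_fails - n), min(n, total_fails)
    # Sum every outcome that is at least as unlikely as the observed one.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)

# Pass counts out of 33, as read from the screenshots (assumed, not from the tables).
scores = {
    "baseline": 29,
    "skills": 31,
    "skills + use skill prompt": 32,
    "agent.md": 33,
}

p_values = {
    name: fisher_two_sided(scores["baseline"], passed)
    for name, passed in scores.items()
    if name != "baseline"
}

for name, p in p_values.items():
    print(f"baseline vs {name}: p = {p:.3f}")
```

Even the largest gap (29/33 vs 33/33) comes out around p ≈ 0.11 under these assumptions, so none of the differences clear the usual 0.05 bar with only 33 tasks.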