Slacker News

Comment by gertlabs

2 hours ago

If you are referring to the parent post, then yes, it is hard to draw conclusions from such a small sample size.

In our own testing, we use hundreds of different environments across disciplines, covering coding, agentic coding, and non-coding reasoning, and the results seem to line up with subjective experience better than other benchmarks do.
