Comment by djfergus 5 days ago
We need a benchmark that tests a model's ability to do LLM research.