Comment by lupusreal
7 days ago
What you propose isn't a meaningful benchmark because these models already excel at spinning bullshit; they'd ace the benchmark every time.