Comment by OsrsNeedsf2P
3 days ago
I love obscure benchmarks, and I feel like I can trust their results a lot more - afterall, they (probably) weren't benchmaxxed. RuneBench[0] is another good example (how well LLMs can play Runescape)
3 days ago
I love obscure benchmarks, and I feel like I can trust their results a lot more - afterall, they (probably) weren't benchmaxxed. RuneBench[0] is another good example (how well LLMs can play Runescape)
No comments yet
Contribute on Hacker News ↗