Comment by XCSme
6 days ago
I just started creating my own benchmarks: questions that are very simple for humans but tricky for AI, along the lines of "how many r's are in strawberry" (still a WIP).
Qwen3.5 is doing ok on my limited tests: https://aibenchy.com