Comment by SomaticPirate
9 hours ago
This seems to be testing the models on LeetCode-style prompts that also require the model to implement TCP calls to send the results. Interesting, but probably not an apples-to-apples comparison. The fact that only Grok qualified for the first one seems suspect.