Comment by andrepd
20 days ago
This chatbot has several C compilers in its training data. How is this possibly a useful benchmark for anything? LLMs routinely output code from their training data, verbatim or with only trivial changes, and present it as their own (very useful for license laundering, too).