Comment by andrepd

20 days ago

This chatbot has several C compilers in its training data. How is this possibly a useful benchmark for anything? LLMs routinely reproduce code from their training data, verbatim or with trivial changes, and present it as their own (very useful for license laundering, too).
