Comment by operatingthetan
10 hours ago
A more interesting benchmark would probably be one that scores the LLM on finding exploits in the benchmark itself.