Comment by operatingthetan
14 hours ago
A more interesting benchmark would probably be one that scores the LLM on finding exploits in the benchmark itself.