Comment by torginus
1 day ago
My two cents: LLMs are far stronger in areas where the reward function is well defined, such as exploitation. You break the security, you succeed.
It's much harder to establish what counts as a usable, well-architected, novel piece of software, so in that area progress isn't nearly as fast. In exploitation, you can just gradient-descend your way to world domination, provided you have enough GPUs.
offense has a clear reward function, but so does detection when you frame it right. "did this process try to read ~/.ssh/id_rsa?" is just as binary as "did the exploit land?" the reason defense feels harder is that people frame it as architecture review (fuzzy, subjective) instead of policy enforcement (binary, automatable). we keep trying to make AI understand intent when we should be writing rules about actions. a confused deputy from 1988 doesn't care why the request came in, it cares whether the caller is authorized. same principle applies here.
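a minimal sketch of what "policy enforcement over actions" looks like in practice (the event format and `DENY_PATTERNS` rules here are hypothetical, not from any real tool): the verdict depends only on what was touched, never on why, which is exactly what makes it a binary, automatable signal.

```python
# Action-based policy check: fires on *what* a process accessed,
# ignoring intent entirely. Event format and rules are illustrative.
from fnmatch import fnmatch

# Hypothetical deny rules: glob patterns over sensitive file paths.
DENY_PATTERNS = ["*/.ssh/id_*", "*/.aws/credentials"]

def violates_policy(event: dict) -> bool:
    """Binary check: did this process read a protected path?"""
    if event.get("action") != "file_read":
        return False
    return any(fnmatch(event["path"], pat) for pat in DENY_PATTERNS)

# The caller's intent is irrelevant; only the action is judged.
print(violates_policy({"action": "file_read", "path": "/home/u/.ssh/id_rsa"}))  # True
print(violates_policy({"action": "file_read", "path": "/home/u/notes.txt"}))    # False
```

the same shape works for syscall filters, egress rules, or capability checks: enumerate forbidden actions, then the "did the exploit land?" and "did the policy trip?" questions become equally crisp.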
Construction is always more expensive than destruction