Based on what? Is there any evidence of risk at all?
The issue is that you wouldn't even be able to get transparent access to any evidence, as these models are black boxes.
They might start scheming behind employees' backs as soon as they realize they are being used in adversaries' critical infrastructure. And nobody would know until it's too late.
Aren't all LLMs just as blackboxey?
Were you born yesterday?