Comment by krisman

2 months ago

I'm curious why most comments ignored the fact that Claude autonomously ignored its guardrails and issued a DELETE. This WILL happen across all transformer-based LLMs. We aren't waiting for sh*t to happen: we have human-in-the-loop (HITL) confirmation of such actions, backed by client-side, hardware-attested auth. No static policy would've caught this, so we built dynamic decision-making to trigger gating. Read Google Research's paper "AI Agent Traps" to get an idea of the scope of the problems.
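
To make the gating idea concrete, here is a minimal sketch of what "dynamic decision-making to trigger gating" could look like. It is not the actual system described above: `Action`, `risk_score`, `execute`, the verb list, and the threshold are all hypothetical, and the `confirm` callback only stands in for a human confirmation step (for example one backed by a hardware-attested authenticator).

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical, for illustration only.
DESTRUCTIVE_VERBS = {"DELETE", "DROP", "TRUNCATE"}


@dataclass
class Action:
    verb: str          # e.g. "DELETE"
    target: str        # e.g. "volume-1"
    environment: str   # e.g. "prod" or "staging"


def risk_score(action: Action) -> int:
    """Score each request from its own context instead of a fixed allow-list."""
    score = 0
    if action.verb.upper() in DESTRUCTIVE_VERBS:
        score += 2
    if action.environment == "prod":
        score += 1
    return score


def execute(
    action: Action,
    run: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    threshold: int = 2,
) -> str:
    """Run the action, but require human confirmation when the score is high."""
    if risk_score(action) >= threshold and not confirm(action):
        return "blocked"
    run(action)
    return "executed"
```

The point of the design is that the agent never holds the confirmation capability itself: `confirm` is supplied by the caller, so a model that ignores its own guardrails still cannot complete a high-risk action without the human step.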