
Comment by hatthew

6 days ago

A human might see a gap in guardrails and avoid it, or, upon seeing unexpected behavior, might be able to tell that a guardrail was breached and have some intuition of where. An LLM will happily burst through a gap in the guardrails, claim it has solved the problem, and require just as much human effort to fix, plus more long-term maintenance because of reduced code familiarity.

An LLM, which has deep advantages over humans in rote pattern recognition, will see gaps in guardrails that humans miss. In the end, it's a wash. What's not a wash is all the rest of the instrumentation and formal methods stuff an LLM agent enables, all of which is available to every dev team today but doesn't get used because type safety, semgrep rules, advanced linters, and TLA+ specs are too time-consuming for humans to deploy. Not a problem for an LLM agent.
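To make the "too time-consuming for humans" point concrete, here is a minimal sketch of the kind of narrow, project-specific lint rule an agent could cheaply generate and maintain. This is an illustrative example, not anything from the comment above: a hypothetical AST check that flags bare `except:` clauses, the sort of silent-failure gap a guardrail like semgrep would also catch.

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` handlers in `source`.

    A bare except swallows every exception, including KeyboardInterrupt,
    so most teams want it flagged -- but few bother to write the check.
    """
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # ExceptHandler.type is None exactly when the handler is bare.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

snippet = """\
try:
    risky()
except:
    pass
"""

print(find_bare_excepts(snippet))  # → [3]
```

A whole suite of checks like this (or their semgrep-rule equivalents) costs an agent minutes to write; the human cost has historically been the reason they never get written.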