Comment by swiftcoder
12 hours ago
> Isn’t “bad output” already worst case?
Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your hard drive along with the private key".
> Pre-LLMs, correct output was table stakes
We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults. Correctness isn't even on the table outside of a few (mostly academic) contexts.
> Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
Hence my interest in sandboxes!
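
Even without a full container or VM, a sandbox for agent-run commands can start with a few cheap guardrails. A minimal sketch (POSIX-only; `run_sandboxed` is an illustrative name, not an existing API), using a scratch working directory, a stripped environment, resource limits, and a timeout:

```python
import resource
import subprocess
import tempfile

def run_sandboxed(cmd, timeout=5):
    """Run an untrusted command with a few cheap guardrails.
    A real sandbox would add namespaces/seccomp/containers on top."""
    def limit_resources():
        # Cap CPU time and the size of any file the process can write.
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))            # 2s of CPU
        resource.setrlimit(resource.RLIMIT_FSIZE, (1 << 20, 1 << 20))  # 1 MiB files
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            cmd,
            cwd=scratch,                     # work in a throwaway directory
            env={"PATH": "/usr/bin:/bin"},   # no inherited secrets or tokens
            preexec_fn=limit_resources,      # apply rlimits in the child
            capture_output=True,
            text=True,
            timeout=timeout,                 # wall-clock kill switch
        )

result = run_sandboxed(["echo", "hello"])
print(result.stdout.strip())  # → hello
```

None of this stops a determined attacker (the child still shares the network and the kernel), but it does shrink the blast radius of "agent runs an unexpected command" from "your whole home directory" to "a temp dir and a capped process".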
> drained your bank account to buy bitcoin and then deleted your hard drive
These are exactly what I meant by correct output: the software does what you expect it to.
> We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults
This isn't really an output issue, IMO; it's a failure to handle an edge case.
LLMs are moving the industry away from writing software that handles all possible edge cases gracefully, and towards software that is developed very quickly and behaves correctly on the happy path more often than not.