
Comment by DocTomoe

1 day ago

I'm more concerned that someone prompts 'analyze these logfiles, then clean up', and the LLM decides the best way to 'clean up' is 'rm -rf /' - not on the first run, but on the 27th.

That kind of failure mode is fundamentally different from traditional scripting: the system passes tests, builds trust, and then fails catastrophically once the model's implicit interpretation of the prompt shifts.
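To make that concrete: you would at minimum want something between the model and the shell. A minimal sketch (names and patterns here are purely illustrative, and a denylist like this is inherently incomplete - that's part of the point):

    import re
    import shlex

    # Illustrative patterns for commands that should never be auto-executed.
    DESTRUCTIVE_PATTERNS = [
        r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f",   # rm -rf and friends
        r"\bmkfs\b",
        r"\bdd\s+if=",
        r">\s*/dev/sd",
    ]

    def requires_human_review(command: str) -> bool:
        """True if an LLM-proposed command matches a destructive pattern."""
        return any(re.search(p, command) for p in DESTRUCTIVE_PATTERNS)

    proposed = "rm -rf /"  # what the model decided 'clean up' means on run 27
    if requires_human_review(proposed):
        print("Refusing to auto-run:", shlex.quote(proposed))
    else:
        pass  # execute it - still risky, since any denylist has gaps

The problem is that the people most at risk won't write even that much, and the long tail of 'creative' destructive commands can't be enumerated anyway.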

In short: it's nice that this works for the engineer who knows exactly what (s)he is doing - but those folks usually don't need LLMs; they just write the code. The people this appeals to - who may not even think about the side effects of innocent-sounding prompts - are being handed a foot machine gun that acts like a genie, with hilarious unintended consequences.