Comment by ben_w

7 months ago

> Since the risks are blindingly obvious

Blindingly obvious to thee and me.

Without test results like in the o1 report, we get more real-life failures like this Canadian lawyer: https://www.theguardian.com/world/2024/feb/29/canada-lawyer-...

And these New York lawyers: https://www.reuters.com/legal/new-york-lawyers-sanctioned-us...

And those happened despite the GPT-4 report and the warning shown whenever you used it, some variant (I forget exactly how it was initially phrased and presented) of "this may make stuff up".

I have no doubt there are similar issues with people actually running buggy code, some fully automated version of "rm -rf /". The only reason I'm not seeing headlines about it is that "production database goes offline" or "small company fined for GDPR violation" isn't as newsworthy.