Comment by yosefk

6 months ago

The examples are from the latest versions of ChatGPT, Claude, Grok, and Google AI Overview. I did not bother to list the full conversations because (A) LLMs are very verbose and (B) nothing ever reproduces, so in any case any failure is "abnormally bad." I guess dismissing failures and focusing on successes is a natural continuation of our industry's trend to ship software with bugs which allegedly don't matter because they're rare, except with "AI" the MTBF is orders of magnitude shorter

0 comments

yosefk

No comments yet

Contribute on Hacker News ↗