Comment by yosefk
5 months ago
The examples are from the latest versions of ChatGPT, Claude, Grok, and Google AI Overview. I did not bother to list the full conversations because (A) LLMs are very verbose and (B) nothing ever reproduces, so in any case any failure is "abnormally bad." I guess dismissing failures and focusing on successes is a natural continuation of our industry's trend of shipping software with bugs that allegedly don't matter because they're rare, except that with "AI" the MTBF is orders of magnitude shorter.