
Comment by JohnLeitch

5 hours ago

The problem is hallucinations. It's incredibly frustrating to have an LLM describe an API or piece of functionality that fulfills all requirements perfectly, only to find it was a hallucination. They are impressive sometimes, though. Recently I had an issue with a regression in some of our test capabilities after a pivot to Microsoft Orleans. After trying everything I could think of, I asked Sonnet 4.5, and it came up with a solution to a problem I could not even find described on the internet, let alone solved. That was quite impressive, but I almost gave up on it because it hallucinated wildly both before and after producing the workable solution.

The same stuff happens when summarizing documentation. In that regard, I would say that, at best, modern LLMs are only good for finding an entry point into the docs.

While my reply was snarky, I am prepared to take a reasonable bet with a reasonable test case. And pay out.

Why I think I'd win the bet is that I'm proficient with tcpdump and wireshark, and I'm reasonably confident that running to a frontier model and dealing with any hallucinations is faster and more efficient than recalling the incantations and parsing the output myself.
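To be concrete about the sort of incantation-plus-parsing I mean, here's a rough sketch of the kind of thing I'd otherwise have to recall and cobble together by hand: shell out to tcpdump with a BPF filter and tally the busiest destination hosts from its output. The interface, filter, and packet count below are just placeholder examples, not anything from an actual case.

  import subprocess
  from collections import Counter

  def top_talkers(interface="eth0", bpf="tcp port 443", packets=200):
      # tcpdump flags: -i interface, -nn no name/port resolution,
      # -l line-buffered stdout, -c stop after N packets.
      # Usually needs root or CAP_NET_RAW to capture.
      proc = subprocess.run(
          ["tcpdump", "-i", interface, "-nn", "-l", "-c", str(packets), bpf],
          capture_output=True, text=True, check=True,
      )
      counts = Counter()
      for line in proc.stdout.splitlines():
          # typical line: "12:00:00.000000 IP 10.0.0.5.51234 > 93.184.216.34.443: Flags ..."
          parts = line.split()
          if len(parts) > 4 and parts[1] == "IP":
              dst = parts[4].rstrip(":")            # "93.184.216.34.443"
              counts[dst.rsplit(".", 1)[0]] += 1    # drop the trailing port
      return counts.most_common(10)

  if __name__ == "__main__":
      for host, hits in top_talkers():
          print(hits, host)

Nothing exotic, but exactly the kind of flag soup and output format I'd rather not have to keep in my head.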