Comment by pdimitar

4 hours ago

Eh, tearing down a straw man is not an impressive argument from you either.

As a counter-point, LLMs still hallucinate an embarrassing amount, some of it quite hilarious. When that is gone and they start doing web searches -- or have any mechanism that mimics actual research when they don't know something -- then agents will be much closer to whatever most people imagine AGI to be.

Have LLMs learned to say "I don't know" yet?

> Have LLMs learned to say "I don't know" yet?

Can they, fundamentally, do that? That is, given the current technology.

Architecturally, they don't have a concept of "not knowing." They can say "I don't know," but that simply means those words were the most likely continuation given the training data.
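
To make that concrete, here is a toy sketch of greedy next-token decoding (made-up numbers, not any real model's internals): the model emits whatever continuation scores highest, so "I don't know" is just another high-probability string, not a report of genuine uncertainty.

    # Toy illustration of greedy decoding; logits are invented for the example.
    import math

    def softmax(logits):
        m = max(logits.values())
        exps = {tok: math.exp(v - m) for tok, v in logits.items()}
        total = sum(exps.values())
        return {tok: e / total for tok, e in exps.items()}

    # Hypothetical next-token scores after a question the training data rarely answers.
    next_token_logits = {"I don't know": 2.1, "Paris": 1.9, "42": 0.3}

    probs = softmax(next_token_logits)
    print(max(probs, key=probs.get))  # -> "I don't know", simply the argmax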

A perfect example: an LLM citing chess rules and still making an illegal move: https://garymarcus.substack.com/p/generative-ais-crippling-a...

Heck, it can even say the move would have been illegal. And it would still make it.

> When that is gone and it starts doing web searches -- or it has any mechanisms that mimic actual research when it does not know something

ChatGPT and Gemini (and maybe others) can already perform and cite web searches, which vastly improves their performance. ChatGPT is particularly impressive at multi-step web research. I have also seen them say "I can't find the information you want" instead of hallucinating.

It's not perfect yet, but it's definitely climbing human percentiles in terms of reliability.

I think a lot of LLM detractors are still thinking of 2023-era ChatGPT. If everyone tried the most recent pro-level models with all the bells and whistles, I think there would be a lot less disagreement.

  • Well, please don't include me in some group of Luddites or something.

    I use the mainstream LLMs and I've noticed them improving. They still have a ways to go.

    I was objecting to my parent poster's implication that we have AGI. However muddy that definition is, I don't feel that we do.