
Comment by plaidfuji

2 years ago

These LLMs do not have a concept of factual correctness and are not trained/optimized as such. I find it laughable that people expect these things to act like quiz bots - this misunderstands the nature of a generative LLM entirely.

It simply spits out whatever output sequence it feels is most likely to occur after your input sequence. How it defines “most likely” is the subject of much research, but to optimize for factual correctness is a completely different endeavor. In some cases (like coding problems) it can sound smart enough, because for certain prompts, the approximate consensus of all available text on the internet is pretty much true and is unpolluted by garbage content from laypeople. It is also good at generating generic fluffy “content”, although the value of this feature escapes me.

In the end, the quality of the information it gives back to you is no better than that of a thorough Google search; it will just get you a more concise and well-formatted answer faster.

> because for certain prompts, the approximate consensus of all available text on the internet is pretty much true

I think you're slightly mischaracterising things here. It has the potential to be at least slightly, and possibly much, better than that. This is evidenced by the fact that it is much better than chance at answering "novel" questions that don't have a direct source in the training data. It can do this because, at a certain point, the least complex strategy for solving the optimisation problem of "what word comes next" is to start modelling principles of logic and the facts connecting them. It does not do this in any systematic or reliable way, so you can never guarantee when or how well it will apply these principles, but it is absolutely learning higher-order patterns than simple text/pattern matching, and it is absolutely able to generalise them across topics.

  • You’re absolutely right, and I’m sure that something resembling higher-level pattern matching is present in the architecture and weights of the model. I’m just saying that I’m not aware of “logical thought” being explicitly optimized or designed for - it’s more of a sometimes-emergent feature of a machine that tries to approximate the content of the internet, which for some topics is dominated by mostly logical thought. I’m also unaware of a ground truth against which “correct facts” could even be trained.

    • > I’m also unaware of a ground truth against which “correct facts” could even be trained.

      Seems like there are quite a few obvious possibilities here off the top of my head. Ground truth for correct facts could be:

      1) Wikidata

      2) Mathematical ground truth, including physics (problems can be generated and their results validated automatically)

      3) Programming ground truth (can be validated by defining inputs/outputs and running the code; see the sketch after this list)

      4) Chess

      5) Human labelled images and video

      6) Map data

      7) Depending on your viewpoint, peer-reviewed journals, as long as they are cited with sources.
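
      As a rough illustration of point 3: a minimal sketch (Python), assuming the model is asked to produce a function named solve(a, b) and that the hard-coded candidate string below merely stands in for whatever a model call actually returned; the input/output pairs are the ground truth:

          # Ground truth: input/output pairs the generated code must reproduce.
          test_cases = [((2, 3), 5), ((10, -4), 6), ((0, 0), 0)]

          def validate(candidate_source: str) -> bool:
              # Run the generated definition in a scratch namespace...
              namespace = {}
              exec(candidate_source, namespace)
              solve = namespace["solve"]  # assumes the prompt asked for solve(a, b)
              # ...and accept it only if every known case checks out.
              return all(solve(*args) == expected for args, expected in test_cases)

          # Pretend this string is what the model returned.
          candidate = "def solve(a, b):\n    return a + b"
          print("accepted" if validate(candidate) else "rejected")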

The first question I always ask myself in such cases: how much of the input data contains a simple "I don't know"? Not knowing something is clearly a concept that has to be learned in order to be expressed in the output.

  • What stops you from asking the same question multiple times and seeing whether the answers are consistent? I am sure the capital of France is always going to come out as Paris, but the name of a river passing a small village might be hallucinated differently each time. Even better - use two different models; if they agree, it's probably true. And probably best of all - provide the data to the model in context, if you have a good source. Don't use the model as a fact knowledge base; use RAG.
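
    A rough sketch (Python) of that kind of consistency check, assuming a caller-supplied ask(model, question) function that wraps whatever API is in use; the model names and call_my_llm below are placeholders, not real endpoints:

        from collections import Counter

        def consistent_answer(ask, question, models=("model_a", "model_b"), samples=3):
            # ask(model, question) -> str is whatever wrapper the caller provides.
            answers = Counter()
            for model in models:
                for _ in range(samples):
                    answers[ask(model, question).strip().lower()] += 1
            best, count = answers.most_common(1)[0]
            # Only trust an answer that dominates across models and repeated samples.
            return best if count >= 0.8 * sum(answers.values()) else None

        # e.g. consistent_answer(call_my_llm, "Which river passes through the village of X?")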

    • Can’t speak for other people, but I find it more time-consuming to get ChatGPT to correct its mistakes than to do the work myself.

  • Ha, probably an insignificant amount. The internet is nothing if not confidently-stated positive results, no matter how wrong they might be. No wonder this is how LLMs act.

> In the end, the quality of the information it gives back to you is no better than that of a thorough Google search; it will just get you a more concise and well-formatted answer faster.

I would say it’s worse than Google search. Google tells you when it can’t find what you are looking for. LLMs “guess” a bullshit answer.

> It simply spits out whatever output sequence it feels is most likely to occur after your input sequence... but to optimize for factual correctness is a completely different endeavor

What if the input sequence says "the following is truth:"? Assuming it skillfully predicts the following text, that would amount to telling the most likely truth according to its training data.