
Comment by plaidfuji

2 years ago

These LLMs do not have a concept of factual correctness and are not trained/optimized as such. I find it laughable that people expect these things to act like quiz bots - this misunderstands the nature of a generative LLM entirely.

It simply spits out whatever output sequence it feels is most likely to occur after your input sequence. How it defines “most likely” is the subject of much research, but to optimize for factual correctness is a completely different endeavor. In some cases (like coding problems) it can sound smart enough, because for certain prompts, the approximate consensus of all available text on the internet is pretty much true and is unpolluted by garbage content from laypeople. It is also good at generating generic fluffy “content”, although the value of this feature escapes me.

In the end, the quality of the information it gives back to you is no better than that of a thorough Google search; it will just get you a more concise and well-formatted answer faster.

> because for certain prompts, the approximate consensus of all available text on the internet is pretty much true

I think you're slightly mischaracterising things here. It has the potential to be at least slightly, and possibly much, better than that. This is evidenced by the fact that it is much better than chance at answering "novel" questions that don't have a direct source in the training data. It can do this because, at a certain point, the least complex strategy for solving the optimisation problem of "what word comes next" is to start modelling principles of logic and the facts connecting them. It does not do this in any systematic or reliable way, so you can never guarantee when or how well it will apply these principles, but it is absolutely learning higher-order patterns than simple text/pattern matching, and it is absolutely able to generalise them across topics.

  • You’re absolutely right, and I’m sure that something resembling higher-level pattern matching is present in the architecture and weights of the model. I’m just saying that I’m not aware of “logical thought” being explicitly optimized or designed for - it’s more of a sometimes-emergent feature of a machine that tries to approximate the content of the internet, which for some topics is dominated by mostly logical thought. I’m also unaware of a ground truth against which “correct facts” could even be trained.

    • > I’m also unaware of a ground truth against which “correct facts” could even be trained.

      Seems like there are quite a few obvious possibilities here off the top of my head. Ground truth for correct facts could be:

      1) Wikidata

      2) Mathematical ground truth, including physics (problems can be generated and their results validated automatically)

      3) Programming ground truth (can be validated by defining inputs/outputs and running the code; see the sketch after this list)

      4) Chess

      5) Human labelled images and video

      6) Map data

      7) Depending on your viewpoint, peer-reviewed journals, as long as they are cited with sources.
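
      As a rough illustration of point 3: a minimal sketch (Python), assuming the model is asked to produce a function named solve(a, b) and that the hard-coded candidate string below merely stands in for whatever a model call actually returned; the input/output pairs are the ground truth:

          # Ground truth: input/output pairs the generated code must reproduce.
          test_cases = [((2, 3), 5), ((10, -4), 6), ((0, 0), 0)]

          def validate(candidate_source: str) -> bool:
              # Run the generated definition in a scratch namespace...
              namespace = {}
              exec(candidate_source, namespace)
              solve = namespace["solve"]  # assumes the prompt asked for solve(a, b)
              # ...and accept it only if every known case checks out.
              return all(solve(*args) == expected for args, expected in test_cases)

          # Pretend this string is what the model returned.
          candidate = "def solve(a, b):\n    return a + b"
          print("accepted" if validate(candidate) else "rejected")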

The first question I always ask myself in such cases: how much of the input data contains a simple "I don't know"? Not knowing something is clearly a concept that has to be learned in order to be expressed in the output.

  • What stops you from asking the same question multiple times and seeing whether the answers are consistent? I am sure the capital of France is always going to come out as Paris, but the name of a river passing a small village might be hallucinated differently each time. Even better - use two different models; if they agree, it's probably true. And probably best of all - provide the data to the model in context, if you have a good source. Don't use the model as a fact knowledge base; use RAG.
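
    A rough sketch (Python) of that kind of consistency check, assuming a caller-supplied ask(model, question) function that wraps whatever API is in use; the model names and call_my_llm below are placeholders, not real endpoints:

        from collections import Counter

        def consistent_answer(ask, question, models=("model_a", "model_b"), samples=3):
            # ask(model, question) -> str is whatever wrapper the caller provides.
            answers = Counter()
            for model in models:
                for _ in range(samples):
                    answers[ask(model, question).strip().lower()] += 1
            best, count = answers.most_common(1)[0]
            # Only trust an answer that dominates across models and repeated samples.
            return best if count >= 0.8 * sum(answers.values()) else None

        # e.g. consistent_answer(call_my_llm, "Which river passes through the village of X?")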

    • Can’t speak for other people, but I find it more time-consuming to get ChatGPT to correct its mistakes than to do the work myself.

  • Ha, probably an insignificant amount. The internet is nothing if not confidently-stated positive results, no matter how wrong they might be. No wonder this is how LLMs act.

> In the end, the quality of the information it gives back to you is no better than that of a thorough Google search; it will just get you a more concise and well-formatted answer faster.

I would say it’s worse than Google search. Google tells you when it can’t find what you are looking for. LLMs “guess” a bullshit answer.

> It simply spits out whatever output sequence it feels is most likely to occur after your input sequence... but to optimize for factual correctness is a completely different endeavor

What if the input sequence says "the following is truth:"? Assuming it skillfully predicts the following text, that would amount to telling the most likely truth according to its training data.