Comment by aerhardt
9 months ago
My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:
"Is Paul Newman known for having had problems with alcohol?"
All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:
"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."
There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.
I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].
I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.
Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.
[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...
This is less an LLM thing than an information retrieval question. If you choose a model and tell it to "Search," you get citation-based analysis that discusses how he did indeed have problems with alcohol. I do find it interesting that it quibbles over whether he was an alcoholic or not - it seems pretty clear from the rest that he was - but regardless. This is indicative of something crucial when placing LLMs into a toolkit: they are not omniscient, nor are they deductive reasoning tools. Information retrieval systems are excellent at information retrieval and should be used for information retrieval. Solvers are excellent at solving deductive problems. Use them. That LLMs are getting better at these tasks on their own is cool, but IMO it's a parlor trick, since we have nearly optimal (or actually optimal) techniques that don't need an LLM. The LLM should use those tools. So click "Search" next time you have an information retrieval question. https://chatgpt.com/share/67f2dac0-3478-8000-9055-2ae5347037...
I realise your answer wasn't assertive, but if I heard this from someone actively defending AI it would be a cop-out. If the selling point is that you can ask these AIs anything, then one can't retroactively go "oh, but not that" when a particular query doesn't pan out.
My point is the opposite of that point of view. I believe generative AI is the most significant advance since hypertext and the overlay of inferred semantic relationships via PageRank etc. In fact, the creation of hypertext and the toolchains around it is what led to this point at all - neural networks were already understood by then, and transformer attention is just one innovation on top. It's the collective human assembly of interconnected linguistic and visual knowledge, at a pan-cultural and global scale, that enabled the current state.
The ability of LLMs alone to do astounding natural language processing, beyond anything prior by unthinkable Turing-test-passing miles, is remarkable. The fact that they can reason abductively, which computing techniques to date have been unable to do, is amazing. The fact that you can mix them with multimodal regimes - images, motion, virtually anything that can be semantically linked via language - is breathtaking. The fact that they can be augmented with prior computing techniques - IR, optimization, deductive solvers, and literally everything we've achieved to date - should give anyone knowledgeable of such things shivers at what the future holds.
But I would never hold that generative AI techniques are replacements for known optimal techniques. The ensemble, though, is probably the solution to nearly every challenge we face. When we hit the limits of LLMs today, I think: well, at least we already have grandmaster-beating chess solvers, and it's irrelevant that the LLM can't do that directly. LLMs and other generative AI techniques are, in my mind, like gases that fill, through learned approximation, the spaces we've not been able to solve directly - including the assembly of those solutions ad hoc. This is why, since BERT first came along, I've known agent-based techniques were the future.
Right now we live at a time, with respect to AI, like early hypertext. Toolchains suck, and LLMs are basically GeoCities pages with "under construction" signs. We will go through an explosive exploration, some stunning insights that'll change the basic nature of our shared reality (some wonderful, some insidious), and then, if we aren't careful - and we rarely are - enshittification at a scale unseen before.
This is a bit of a strawman. There are certainly people who claim that you can ask AIs anything but I don't think the parent commenter ever made that claim.
"AI is making incredible progress but still struggles with certain subsets of tasks" is self-consistent position.
lotta words here to say AI can't do basic search right
Lotta words to say AI can’t do basic search in the same way a web browser can’t do basic search, but given a search engine both can.
Any information found in a web search about Newman will be available in the training set (more or less). It's almost certainly a problem of alignment / "safety" causing this issue.
"Any information found in a web search about Newman will be available in the training set"
I don't think that is a safe assumption these days. Training a modern LLM isn't about dumping in everything on the Internet. To get a really good model you have to be selective about your sources of training data.
They still rip off vast amounts of copyrighted data, but I get the impression they are increasingly picky about what they dump into their training runs.
There's a simpler explanation than that: the model weights aren't an information retrieval system, and other sequences of tokens are more likely given the totality of the training data. This is why, for an information retrieval task, you use an information retrieval tool, similarly to how, for driving nails, you use a hammer rather than a screwdriver. It may very well be that you could drive the nail with the screwdriver, but why?
Gemini (2.5 Pro):
"Yes, Paul Newman was widely known for being a heavy drinker, particularly of beer. He himself acknowledged his significant alcohol consumption."
The answer I got (https://gemini.google.com/share/9e327dc4be03) includes references such as https://apnews.com/article/entertainment-reviews-movies-paul... and https://www.psychologytoday.com/us/blog/the-playing-field/20... although they are redacted from the public-sharing link.
A local model I'm running (gemma-3-27b-it; https://huggingface.co/lmstudio-community/gemma-3-27b-it-GGU...) just told me various correct-sounding bits about his history with alcohol (correctly citing his alma mater and first wife), but threw in:
"Sobriety & AA: Newman got sober in 1964 and remained so for the rest of his life."
Which doesn't check out. And it includes plausible but completely hallucinated URLs (as well as a valid biography.com URL that completely omits any information about alcohol).
Gemma 3 4B (QAT quant): Yes, Paul Newman was indeed known to have struggled with alcohol throughout his life. While he maintained a public image of a charming, clean-cut star, he privately battled alcoholism for many years. He sought treatment in the late 1980s and early 1990s and was reportedly very open about his struggles and the importance of seeking help.
Perplexity:
>Paul Newman is indeed known for having struggled with alcohol during his life. Accounts from various sources, including his own memoir and the documentary ... (https://www.perplexity.ai/search/is-paul-newman-known-for-ha...)
I guess there's something about ChatGPT's set up that makes it different? Maybe they wanted it to avoid libeling people?
Grok:
> Yes, Paul Newman was known to have struggled with alcohol at certain points in his life. In his early years, particularly during his time in the Navy and into his acting career, Newman admitted to heavy drinking. He was open about his fondness for beer and once jokingly referred to himself as a “functional alcoholic.” In a 1988 interview with The New York Times, he acknowledged that he had a period where he drank too much, stating, “I was a very good drinker. I could put it away.” ...
https://grok.com/share/bGVnYWN5_86843e8a-39ee-415d-8785-4f8e...
Does the as-yet-unwritten prequel to Idiocracy tell the tale of when we started asking AI chatbots for facts, and this was the point of no return for humanity?
The ChatGPT'd tariffs mark this epoch perfectly.
It turns out there's huge demand for un-monetized web search.
I like that it's unmonetized, of course, but that's not why I use AI. I use AI because it's better at search. When I can't remember the right keywords to find something, or when the keywords aren't unique, I frequently find that web search doesn't return what I need and AI does.
It's impressive how often AI returns the right answer to vague questions. (not always though)
Soon, sadly, there will be a huge demand for un-monetized LLMs. Enshittification is coming.
Can you blame the users for asking it, when everyone is selling that as a key defining feature?
I use it for asking - often very niche - questions on advanced probability and simulation modeling, and it often gets those right - why those and not a simple verifiable fact about one of the most popular actors in history?
I don’t know about Idiocracy, but something that I have read specific warnings about is that people will often blame the user for any of the tool’s misgivings.
Some prior works that work as prequels include C.M. Kornbluth's "The Marching Morons" and "The Little Black Bag."
"Is Paul Newman known for having had problems with alcohol?"
https://chatgpt.com/share/67f332e5-1548-8012-bd76-e18b3f8d52...
Your query indeed answers "...not widely known..."
"Did Paul Newman have problems with alcoholism?"
https://chatgpt.com/share/67f3329a-5118-8012-afd0-97cc4c9b72...
"Yes, Paul Newman was open about having struggled with alcoholism"
What's the issue? Perhaps Paul Newman isn't _famous_ ("known") for struggling with alcoholism. But he did struggle with alcoholism.
Your usage of "known for" isn't incorrect, but it's indeed slightly ambiguous.
Counterpoint: Paul Newman was absolutely a famous drunk, as evidenced by this Wikipedia page.* Any query for "paul newman alcohol" online will return dozens of reputable sources on the topic. Your post is easily interpretable as handwaving apologetics, and it gives big "Its the children who are wrong" energy.
*https://en.wikipedia.org/wiki/Newman_Day
"Paul Newman alcohol" is just showing you results where those words are all present, it's not really implying how widely known it is.
This may have hit the nail on the head about the weaknesses of LLMs.
They're going to regurgitate something based not so much on facts as on things that are accessible as perceived facts. Those might be right, but they might also be wrong; and no one can tell without doing the hard work of checking original sources. Many accepted facts - the ones also accessible to LLM harvesting - are at best derived facts, often mediated by motivated individuals and published to accessible sources by "people with an interest".
The weightings used by any AI should be based on the facts, not the compounded volume of derived, "mediated", or "directed" facts - simply because those aren't really facts; they're reports.
It all seems like dumber, lazier search-engine stuff. Honestly, what do I know about Paul Newman? But Joanne Woodward and others who knew and worked with him should be weighted as being at least slightly more credible than others, no matter how many text patterns "catch the match" in the flow.
> I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.
I think we'll have a term, like the one we have for parents/grandparents who believe everything they see on the internet, but specifically for people using LLMs.
Look at how many people believe in extremist news outlets!
What is that term?
That's not really 'simple' for an LLM. This is niche information about a specific person. LLMs train on massive amounts of data; the more present a topic is in the data, the better the answers will be.
Also, you can/should use the "research" mode for questions like this.
The question is simple and verifiable - it is impressive to me that it’s not contained in the LLM’s body of knowledge - or rather that it can’t reach the answer.
This is niche in the grand scheme of knowledge but Paul Newman is easily one of the biggest actors in history, and the LLM has been trained on a massive corpus that includes references to this.
Where is the threshold for topics with enough presence in the data?
The question might be simple and verifiable, but it is not simple for an LLM to mark a particular question as such. This is the tricky part.
An LLM does not care about your question, it is a bunch of math that will spit out a result based on what you typed in.
I asked GPT-4.5 and it searched the web and immediately gave me a "yes" with paragraphs of sources cited.
Truth is a probability game. Just keep trying until you arrive.
"man puts prompt into LLM" > "LLM ships bad code" > "machine fails" > "person dies" > "fire man" > "man puts prompt into LLM"
3-4 hours is enough time for it to have crawled the Hacker News comments section. That's about the frequency at which the AI bots crawl my little out-of-the-way blog.
I just had Cursor Pro + Sonnet 3.7 Max one-shot a Python script to send this question to every model available through Groq.
>Found 24 models: llama3-70b-8192, llama-3.2-3b-preview, meta-llama/llama-4-scout-17b-16e-instruct, allam-2-7b, llama-guard-3-8b, qwen-qwq-32b, llama-3.2-1b-preview, playai-tts-arabic, deepseek-r1-distill-llama-70b, llama-3.1-8b-instant, llama3-8b-8192, qwen-2.5-coder-32b, distil-whisper-large-v3-en, qwen-2.5-32b, llama-3.2-90b-vision-preview, deepseek-r1-distill-qwen-32b, whisper-large-v3, llama-3.3-70b-specdec, llama-3.3-70b-versatile, playai-tts, whisper-large-v3-turbo, llama-3.2-11b-vision-preview, mistral-saba-24b, gemma2-9b-it
Excluding the ones that do not support chat completions, all but one (qwen-qwq-32b) answered in the affirmative. The answer from qwen-qwq-32b said:
Using lack of progress in a specialized field as a barometer for overall progress is kind of silly. I just spent the last few days 'vibe coding' an application and I have to say that it's pretty remarkable how capable it is now relative to my experience last year.
It took three minutes for me to do the above from the time I created my API key to when I had an answer.
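For the curious, the generated script boiled down to something like this - a hedged reconstruction using the Groq Python SDK, not the exact code; it assumes GROQ_API_KEY is set in the environment:

```python
# Minimal sketch: ask every Groq-hosted model the same question and
# print each answer. Non-chat models (TTS, whisper, guard) raise an
# error on chat completions, so they are caught and skipped.
from groq import Groq

QUESTION = "Is Paul Newman known for having had problems with alcohol?"

client = Groq()  # reads GROQ_API_KEY from the environment

for model in client.models.list().data:
    try:
        resp = client.chat.completions.create(
            model=model.id,
            messages=[{"role": "user", "content": QUESTION}],
        )
        print(f"--- {model.id} ---\n{resp.choices[0].message.content}\n")
    except Exception as exc:
        print(f"--- {model.id} --- skipped ({exc})")
```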
I find that everyone who replies with examples like this is an expert using expert skills to get the LLM to perform. Which makes me wonder: why is this a skill that is useful to the general public, as opposed to just another useful skill for technical knowledge workers to add to their tool belt?
I agree. But I will say that at least in my social circles I'm finding that a lot of people outside of tech are using these tools, and almost all of them seem to have a healthy skepticism about the information they get back. The ones that don't will learn one way or the other.
Gemini 2.5 Pro
Yes, Paul Newman was known for being a heavy drinker, particularly of beer. [1] He acknowledged his high consumption levels himself.
[1] Review: Paul Newman memoir stuns with brutal honesty - AP News (apnews.com)
While he maintained an incredibly successful career and public life, accounts and biographies note his significant alcohol intake, often describing it as a functional habit rather than debilitating alcoholism, although the distinction can be debated. He reportedly cut back significantly in his later years.
I just asked 4o:
Yes, Paul Newman did experience significant struggles with alcohol. In his posthumously published memoir, The Extraordinary Life of an Ordinary Man, Newman candidly discusses his drinking habits and acknowledges his long-term battle with alcoholism. He describes himself as a "functioning alcoholic," a trait he noted was shared with his father. At one point, Newman was reported to consume a case of beer daily, followed by spirits, until he eventually gave up hard liquor.
My own Markov chains follow up "until he" with fundamentally distinct continuations.
These models are not reliable sources of information. They are either out of date, subject to hallucination, or just plain wrong for a variety of reasons. They can't be trusted with factual questions like this.
I appreciate your consideration of a subjective question, how you explained it, and your understanding of these nuances. But please: do not trust ChatGPT etc. I continue to be frustrated by the endless people claiming something is true because ChatGPT said it. I support the author's conclusions.
this seems less like an issue with accuracy and more like an issue with model providers making sure they don't get sued for libel
I thought about that too.
For questions like this, you need to tell the bot to search the web.
> Hi, can you search the web and find out if Paul Newman has any alcohol problems?
> Paul Newman, the acclaimed actor and philanthropist, had a well-documented fondness for alcohol, particularly beer. He was known to consume significant quantities, with some accounts noting he would drink a case of beer in a day. This level of consumption led some to describe him as a functioning alcoholic. His drinking habits reportedly caused strain in his marriage to Joanne Woodward. In the late 1970s, an incident occurred where Woodward did not allow Newman back into their house because of his drinking. He promised to give up hard liquor as a compromise, but continued to drink beer, sometimes consuming a case during dinner.
Perplexity had no problem with the question, replying in part, "Paul Newman is known to have struggled with alcohol throughout his life, as revealed in various accounts, including his own memoir and documentaries about his life...."
I don't think this reflects a flaw in LLM's at all. It is true that Paul Newman is not "known for" having alcohol problems. Paul Newman is known for acting, and salad dressing. Paul Newman has been "known to" have had alcohol problems, which formed as a question correctly one-shots accurate results for me across a few common LLMs. Someone like Betty Ford could be "known for" having alcohol problems. Paul Newman's alcohol problems are relatively specialized knowledge.
Good point, though a great LLM answer would DWIM here and explain both interpretations, as you have done, as opposed to (what presumably happened) focusing on one and ignoring the other.
And yet the New Yorker article was the first thing that my Google search for “Paul Newman alcoholism” returned.
Apparently it isn’t so specialized that a pretty obvious old fashioned web search on Google wouldn’t immediately return an authoritative source.
LLMs aren't good at being search engines, they're good at understanding things. Put an LLM on top of a search engine, and that's the appropriate tool for this use case.
I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly like any human.
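The retrieval-augmented pattern is simple enough to sketch. Here's a minimal example - search_web is a hypothetical helper standing in for whatever search API you have, and the OpenAI client is used purely as one example of a chat API:

```python
# Minimal sketch of "LLM on top of a search engine": retrieve first,
# then have the model answer only from the retrieved snippets.
from openai import OpenAI

def search_web(query: str) -> list[str]:
    # Hypothetical: call any search API and return text snippets.
    raise NotImplementedError

def grounded_answer(question: str) -> str:
    snippets = "\n\n".join(search_web(question))
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer only from the sources below and cite them.\n\n" + snippets},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```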
> LLMs aren't good at being search engines, they're good at understanding things.
LLMs are literally fundamentally incapable of understanding things. They are stochastic parrots and you've been fooled.
A stochastic parrot with a sufficiently tiny residual error rate needs a stochastic model that compresses the world so precisely, and decompression algorithms so sophisticated, that it could be called reasoning.
Take two 4K frames of a falling vase and ask a model to predict the next token... I mean, the following images. Your model now needs to include some approximation of physics - and the ability to apply it correctly - to produce a realistic outcome. I'm not aware of any model capable of doing that, but that's what it would mean to predict the unseen with high enough fidelity.
We're talking about a stochastic parrot which in many circumstances responds in a way which is indistinguishable from actual understanding.
For them to work at all, they need to have some representation of concepts. Recent research at Anthropic has shown surprising complexity in their reasoning behavior. Perhaps the parrot here is you.
What do you call someone that mentions "stochastic parrots" every time LLMs are mentioned?
What does the word "understand" mean to you?
> I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly like any human.
They're quite literally being sold as a replacement for human intellectual labor by people who have received uncountable sums of investment money toward that goal.
The author of the post even says this:
"These machines will soon become the beating hearts of the society in which we live. The social and political structures they create as they compose and interact with each other will define everything we see around us."
Can't blame people for "fact-checking" something that's supposed to fill these shoes.
People should be (far) more critical of LLMs given all of these style of bold claims, not less.
Also, telling people they're "holding it wrong" when they interact with alleged "Ay Gee Eye" "superintelligence" really is a poor selling point, and no way to increase confidence in these offerings.
These people and these companies don't get to make claims that threaten the livelihoods of millions of people, inflate a massive bubble, impact hiring decisions and everything else we've seen, and then get excused because "whoops, you're not supposed to use it like that, dummy."
Nah.
Your point is still trivially disproven by the fact that not even humans are expected to know all the world's trivia off the top of their heads.
We can discuss whether LLMs live up to the hype, or we can discuss how to use this new tool in the best way. I'm really tired of HN insisting on discussing the former, and I don't want to take part in that. I'm happy to discuss the latter, though.
> Put an LLM on top of a search engine, and that's the appropriate tool for this use case.
Hm, nope. Now that the web is flooded with LLM-generated content, it's game over. I can't tell you how many times I've almost been fooled by recipes & co. that seem legit at first but are utter nonsense. And now we're feeding that garbage back to where it came from.
That's not relevant to what we're discussing, though.
Ironically, though, an LLM-powered search engine (some word about being perplexed) is becoming way better than the undisputed king of traditional search engines (something-oogle).
That's because they put an LLM over a traditional search engine.
Perplexity and open-webui+ollama in web search mode answer this question correctly.
LLMs will never be good at specific knowledge unless specifically trained for it with narrow "if else" statements.
They're good for a broad general overview, such as the most popular categories of books in the world.
Really? OpenAI says PhD-level intelligence is just around the corner!
If we were to survey 100 PhDs, how many would correctly know that Paul Newman had an alcohol problem?
I don't think they mean "knowledge" when they talk about "intelligence." LLMs are definitely not knowledge bases. They can transform information given to them in impressive ways, but asking a raw (non-RAG-enabled) LLM to provide its own information will probably always be a mistake.
Looks like you are using the wrong models
https://g.co/gemini/share/ffa5a7cd6f46
I can't reproduce this. Maybe others reported the error and someone adjusted the expected answer; I don't know enough about OpenAI's operations to say for sure.
The reason this bothers me is that comments like this reinforce the beliefs of people who could otherwise find value in these tools.
But I think points like this would be better made with shared chats or screenshots, since we don't have something like a core dump or stack trace to attach.
And while I am not saying OP did this, I have seen technically skilled engineers assert or imply that LLMs/chatbots aren't good or aren't useful to them, only for a look at their chat log to reveal a multitude of topics that I am sure would impact the result of the query.
Yes, it can be a UX problem. Yes, it can be an algorithmic problem. But they are just tools that can be used wrong, not a perfect mechanical brain.
Testing the query on Kagi
# Quick Answer
Yes, Paul Newman struggled with alcohol. His issues with alcohol were explored in the HBO Max documentary, The Last Movie Stars, and Shawn Levy's biography, Paul Newman: A Life. According to a posthumous memoir, Newman was tormented by self-doubt and insecurities and questioned his acting ability. His struggles with alcohol led to a brief separation from Joanne Woodward, though it had nothing to do with cheating.
(4x Source footnotes omitted for readability)
# Ki Multi-step Research Assistant
Paul Newman is known to have struggled with alcohol. According to his posthumous memoir, Newman candidly discussed his issues with drinking and self-doubt, describing himself as an alcoholic who was tormented by insecurities[^1][^2]. He reportedly drank a significant amount of beer daily and later moved on to stronger drinks like Scotch[^3][^4]. His drinking habits were a notable part of his life, and he was often identified by his beer drinking[^5][^6]. Despite these struggles, Newman was also recognized for his generosity and devotion to his family[^7].
[^1]: [Posthumous memoir reveals Paul Newman's struggle with alcohol](https://www.express.co.uk/entertainment/films/1688113/paul-n...)
[^2]: [Review: Paul Newman memoir stuns with brutal honesty - AP News](https://apnews.com/article/entertainment-reviews-movies-paul...)
[^3]: [Was Paul Newman Really an Alcoholic? - The Cinemaholic](https://thecinemaholic.com/was-paul-newman-really-an-alcohol...)
[^4]: [Who Paul Newman Was—and Who He Wanted to Be](https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...)
[^5]: [Paul Newman's Beer Drinking - Psychology Today](https://www.psychologytoday.com/us/blog/addiction-in-society...)
[^6]: [Paul Newman's Beer Drinking | Psychology Today United Kingdom](https://www.psychologytoday.com/gb/blog/addiction-in-society...)
[^7]: [The troubled life of Paul Newman | The Spectator](https://www.spectator.co.uk/article/the-troubled-life-of-pau...)
> "According to a posthumous memoir, Newman was tormented by self-doubt and insecurities and questioned his acting ability. His struggles with alcohol led to a brief separation from Joanne Woodward, though it had nothing to do with cheating."
'though it had nothing to do with cheating' is a weird inclusion.
We all know this isn't the kind of question you ask an LLM...
You may know this, but that doesn't mean we all do. LLMs are marketed towards many different groups, including folks who don't have much tech knowledge.
"known to" !== "known for"
So, in other words, are you saying that AI model progress is the real deal and is not bullshit?
That is, as you point out, "all of the models up to o3-mini-high" give an incorrect answer, while other comments say that OpenAI's later models give correct answers, with web citations. So it would seem to follow that "recent AI model progress" actually made a verifiable improvement in this case.
I am pretty sure that they must have meant "up through", not "up to", as the answer from o3-mini-high is also wrong in a way which seems to fit the same description, no?
I tried with 4o and it gave me what I thought was a correct answer:
> Paul Newman was not publicly known for having major problems with alcohol in the way some other celebrities have been. However, he was open about enjoying drinking, particularly beer. He even co-founded a line of food products (Newman’s Own) where profits go to charity, and he once joked that he consumed a lot of the product himself — including beer when it was briefly offered.
> In his later years, Newman did reflect on how he had changed from being more of a heavy drinker in his youth, particularly during his time in the Navy and early acting career, to moderating his habits. But there’s no strong public record of alcohol abuse or addiction problems that significantly affected his career or personal life.
> So while he liked to drink and sometimes joked about it, Paul Newman isn't generally considered someone who had problems with alcohol in the serious sense.
As others have noted, LLMs are much more likely to be cautious about providing information that could be construed as libel. While Paul Newman may have been an alcoholic, I couldn't find any articles about it being "public" in the same way as others, e.g. with admitted rehab stays.
oh no. LLMs aren't up on the latest celebrity gossip. whatever shall we do.
Unless you're expecting an LLM to have access to literally all information on earth at all times, I find it really hard to care about this particular type of complaint.
My calculator can't conjugate German verbs. That's fine IMO. It's just a tool
Your calculator won't confidently pretend to conjugate German verbs while doing so incorrectly, though.
Yes, but a tool for what? When asked a question, individuals who don't already have detailed knowledge of a topic are left with no way to tell whether the AI-generated response is complete bullshit, uselessly superficial, or detailed and on point. The only way to be sure is to then go do the standard search-engine grovel, looking for authoritative sources.