Comment by pbalau

3 months ago

> what romanian football player won the premier league

> The only Romanian football player to have won the English Premier League (as of 2025) is Florin Andone, but wait — actually, that’s incorrect; he never won the league.

> ...

> No Romanian footballer has ever won the Premier League (as of 2025).

Yes, this is what we needed, a more "conversational" ChatGPT... Never mind the fact that the answer is wrong.

My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.

Most of the time, I suspect, people are using it like Wikipedia, but with a shortcut that cuts straight through to the real question they want answered; and unfortunately they don't know whether the answer is right or wrong, they just want to be told how bright they were for asking, and then handed the answer.

OpenAI then get caught in a revenue maximising hell-hole of garbage.

God, I hope I am wrong.

  • LLMs only really make sense for tasks where verifying the solution (which you have to do!) is significantly easier than solving the problem: translation where you know the target and source languages, agentic coding with automated tests, some forms of drafting or copy editing, etc.

    General search is not one of those! Sure, the machine can give you its sources but it won't tell you about sources it ignored. And verifying the sources requires reading them, so you don't save any time.

    • I agree a lot with the first part. The only time I actually feel productive with them is when I can have a short feedback cycle with 100% proof of whether the result is correct or not; as soon as "manual human verification" is needed, things spiral out of control quickly.

      > Sure, the machine can give you its sources but it won't tell you about sources it ignored.

      You can prompt for that though: include something like "Include all the sources you came across, and explain why you think each was irrelevant" and, unsurprisingly, it'll include those. I've also added a "verify_claim" tool which the model is instructed to use for every claim before sharing a final response; it checks each claim inside a brand-new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent with access to search tools.
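      Roughly, the shape of such a tool, as a minimal sketch against an OpenAI-compatible local endpoint (the endpoint URL, the web_search() helper, and the exact schema are my assumptions, not the commenter's actual setup):

        # Sketch of a "verify_claim" tool for a local, OpenAI-compatible server
        # (e.g. one serving GPT-OSS-120b). web_search() is a stand-in for
        # whatever search tool the agent already has access to.
        from openai import OpenAI

        client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

        VERIFY_CLAIM_TOOL = {
            "type": "function",
            "function": {
                "name": "verify_claim",
                "description": "Check one factual claim before it goes into the final answer.",
                "parameters": {
                    "type": "object",
                    "properties": {"claim": {"type": "string"}},
                    "required": ["claim"],
                },
            },
        }

        def web_search(query: str) -> str:
            """Placeholder for the agent's existing search tool."""
            raise NotImplementedError

        def verify_claim(claim: str) -> str:
            """Check one claim in a brand-new context: no chat history, one call per claim."""
            evidence = web_search(claim)
            resp = client.chat.completions.create(
                model="gpt-oss-120b",  # assumed local model name
                messages=[
                    {"role": "system", "content": "Judge the claim strictly against the evidence. "
                     "Reply SUPPORTED, CONTRADICTED, or NOT FOUND, with a one-line reason."},
                    {"role": "user", "content": f"Claim: {claim}\n\nEvidence:\n{evidence}"},
                ],
            )
            return resp.choices[0].message.content

      The main loop just executes any verify_claim tool calls with that handler and feeds the verdicts back in before the model writes its final response.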

      4 replies →

    • One of the dangers of automated tests is that if you use an LLM to generate the tests, it can easily start testing the implemented behavior rather than the desired behavior (toy example below). Tell it to loop until tests pass, and it will do exactly that if unsupervised.

      And you can’t even treat implementation as a black box, even using different LLMs, when all the frontier models are trained to have similar biases towards confidence and obsequiousness in making assumptions about the spec!

      Verifying the solution in agentic coding is not nearly as easy as it sounds.
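      A toy illustration of that failure mode (the function, the spec, and the tests are all hypothetical):

        # Spec: apply a 10% discount.
        def discounted_price(price: float) -> float:
            return price * 0.8  # bug: this actually applies a 20% discount

        # A test derived from reading the implementation: it passes and cements the bug.
        def test_discount_matches_implementation():
            assert discounted_price(100.0) == 80.0

        # A test written from the spec: the one that would actually catch the bug.
        def test_discount_matches_spec():
            assert discounted_price(100.0) == 90.0

      Told to "loop until tests pass" unsupervised, an agent is at least as likely to keep the first test as to fix the function.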

      1 reply →

    • I've often found it helpful in search. Specifically, when the topic is well documented and you can describe it clearly but lack the right words or terminology, it can help you find the right question to ask, if not answer it. Remember when we used to laugh at people typing literal questions into the Google search bar? Those are exactly the kinds of queries an LLM is equipped to answer. As for the "improvements" in GPT 5.1, it seems to me like another case of pushing Clippy on people who want Anton: https://www.latent.space/p/clippy-v-anton

    • That's a major use case, especially if the definition is broad enough to include "take my expertise, my knowledge, and perhaps a written document, and transmute it into other forms": slides, illustrations, flash cards, quizzes, podcasts, scripts for an inbound call center.

      But there seem to be uses where a verified solution is irrelevant. Creative work generally: an image, a poem, the description of an NPC in a roleplaying game, the visuals for a music video never have to be "true", just evocative. I suppose persuasive rhetoric doesn't have to be true either, just plausible or engaging.

      As for general search, I don't know that "classic search" can be meaningfully said to tell you about the sources it ignored. I will agree that using OpenAI or Perplexity for search is kind of meh, but Google's AI Mode does a reasonable job of informing you about the links it provides, and you can easily tab over to a classic search if you want. It's almost like having a depth of expertise in doing search helps in building a search product that incorporates an LLM...

      But, yeah, if one is really uninterested in looking at sources, just chatting with a typical LLM seems a rather dubious way to get an accurate or reasonably comprehensive answer.

  • I’m of two minds about this.

    The ass-licking is dangerous to our already too-tight information bubbles; that part is clear. But that aside, I think I prefer a conversational, buddy-like interaction to an encyclopedic tone.

    Intuitively I think it is easier to make the connection that this random buddy might be wrong, rather than thinking the encyclopedia is wrong. Casualness might serve to reduce the tendency to think of the output as actual truth.

    • Sam Altman probably can’t handle any GPT models that don’t ass lick to an extreme degree so they likely get nerfed before they reach the public.

  • It's very frustrating that it can't be relied upon. I asked Gemini this morning whether Uncharted 1, 2 and 3 have remastered versions for the PS5. It said no. Then five minutes later, on the PSN store, I saw the three remastered versions for sale.

  • People have been using "It's what the [insert Blazing Saddles clip here] want!" for years to describe platform changes that dumb down features and make it harder to use tools productively. As always, it's a lie; the real reason is "The new way makes us more money," usually by way of a dark pattern.

    Stop giving them the benefit of the doubt. Be overly suspicious and let them walk you back to trust (that's their job).

  • > My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.

    That tracks; it's what's expected of human customer service, too. Call a large company for support and you'll get the same sort of tone.

Which model did you use? With 5.1 Thinking, I get:

"Costel Pantilimon is the Romanian footballer who won the English Premier League.

"He did it twice with Manchester City, in the 2011–12 and 2013–14 seasons, earning a winner’s medal as a backup goalkeeper. ([Wikipedia][1])

URLs:

* [https://en.wikipedia.org/wiki/Costel_Pantilimon]

* [https://www.transfermarkt.com/costel-pantilimon/erfolge/spie...]

* [https://thefootballfaithful.com/worst-players-win-premier-le...]

[1]: https://en.wikipedia.org/wiki/Costel_Pantilimon?utm_source=c... "Costel Pantilimon""

  • I just asked ChatGPT 5.1 auto (not instant) on a Teams account, and its first response was...

    I could not find a Romanian football player who has won the Premier League title.

    If you like, I can check deeper records to verify whether any Romanian has been part of a title-winning squad (even if as a non-regular player) and report back.

    Then I followed up with an 'ok' and it found the right player.

    • Just to rule out a random error, I asked the same question two more times in separate chats to GPT 5.1 auto; below are the responses...

      #2: One Romanian footballer who did not win the Premier League but played in it is Dan Petrescu.

      If you meant actually won the Premier League title (as opposed to just playing), I couldn’t find a Romanian player who is a verified Premier League champion.

      Would you like me to check more deeply (perhaps look at medal-winners lists) to see if there is a Romanian player who earned a title medal?

      #3: The Romanian football player who won the Premier League is Costel Pantilimon.

      He was part of Manchester City when they won the Premier League in 2011-12 and again in 2013-14. Wikipedia +1

  • The beauty of nondeterminism. I get:

    The Romanian football player who won the Premier League is Gheorghe Hagi. He played for Galatasaray in Turkey but had a brief spell in the Premier League with Wimbledon in the 1990s, although he didn't win the Premier League with them.

    However, Marius Lăcătuș won the Premier League with Arsenal in the late 1990s, being a key member of their squad.

  • Same:

    Yes — the Romanian player is Costel Pantilimon. He won the Premier League with Manchester City in the 2011-12 and 2013-14 seasons.

    If you meant another Romanian player (perhaps one who featured more prominently rather than as a backup), I can check.

  • Same here, but with the default 5.1 auto and no extra settings. Every time someone posts one of these I just imagine they must have misunderstood the UI settings or cluttered their context somehow.

Why is this the top comment? This isn't a question you ask an LLM. But I know, that's how people are using them, and that's the narrative being sold to us...

  • You see people (often business people who are enthusiastic about tech) claiming that these bots are the new Google and Wikipedia, and that you're behind the times if you do what amounts to looking up information yourself.

    We’re preaching to the choir by being insistent here that you prompt these things to get a “vibe” about a topic rather than accurate information, but it bears repeating.

    • They are only the new Google when they are told to process and summarize web searches. When using trained knowledge they're about as reliable as a smart but stubborn uncle.

      Pretty much only search-specific modes (perplexity, deep research toggles) do that right now...

    • Out of curiosity, is this a question you think Google is well-suited to answer^? How many Wikipedia pages will you need to open to determine the answer?

      When folks are frustrated because they see a bizarre question that is an extreme outlier being touted as "the model still can't do _", part of it is because you've set the goalposts so far beyond what traditional Google search or Wikipedia are useful for.

      ^ I spent about five minutes looking for the answer via Google, and the only way I got it was their AI summary. Thus, I would still need to confirm the fact.

      2 replies →

  • It's not how I use LLMs. I have a family member who often feels the need to ask ChatGPT almost any question that comes up in a group conversation (even ones like this that could easily be searched without needing an LLM) though, and I imagine he's not the only one who does this. When you give someone a hammer, sometimes they'll try to have a conversation with it.

  • What do you ask them then?

    • I'll respond to this bait in the hope that it clicks for someone how _not_ to use an LLM...

      Asking "them"... your perspective is already warped. It's not your fault, all the text we've previously ever seen is associated with a human being.

      Language models are mathematical, statistical beasts. The beast generally doesn't do well with open-ended questions (known as "zero-shot"). It shines when you give it something to work off of ("one-shot").

      Some may complain about the precision of my use of "zero-shot" and "one-shot" here, but I use them merely to contrast open-ended questions with prompts that provide some context and work to be done.

      Some examples...

      - summarize the following

      - given this code, break down each part

      - give alternatives of this code and trade-offs

      - given this error, how to fix or begin troubleshooting
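      Concretely, the contrast in API terms might look something like this (a minimal sketch; the model name and the deploy.sh file are placeholders):

        # Open-ended question vs. a prompt that carries its own material to work on.
        from openai import OpenAI

        client = OpenAI()

        # Open-ended: nothing for the model to anchor on, nothing for you to verify against.
        open_ended = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What is the best way to handle errors?"}],
        )

        # Grounded: the prompt supplies the code to work off of, so the answer is checkable.
        script = open("deploy.sh").read()
        grounded = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Given this script, break down each part and point out anything fragile:\n\n{script}",
            }],
        )
        print(grounded.choices[0].message.content)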

      I mainly use them for technical things I can then verify myself.

      While extremely useful, I consider them extremely dangerous. They provide a false sense of "knowing things"/"learning"/"productivity". It's too easy to begin to rely on them as a crutch.

      When learning new programming languages, I go back to writing by hand and compiling in my head. I need that mechanical muscle memory, same as trying to learn calculus or physics, chemistry, etc.

      6 replies →

    • You either give them the option to search the web for facts or you ask them things where the utility/validity of the answer is defined by you (e.g. 'summarize the following text...') instead of the external world.

I really only use LLMs for coding and IT-related questions. I've had Claude correct itself several times about what the more idiomatic way to do something might be, after it had already started giving me the answer. For example, I'll ask how to set something up in a startup script, and I've had it start by giving me strict POSIX syntax, then self-correct once it "realizes" that I am using zsh.

I find it amusing, but also I wonder what causes the LLM to behave this way.

  • > I find it amusing, but also I wonder what causes the LLM to behave this way.

    Forum threads etc. presumably contain writers changing their minds in response to feedback, which might have this effect.

    • Some people are guilty of writing their comments as they go along, as well. You could even say they're "thinking out loud", forming the idea and the conclusion as they write rather than knowing it from the beginning. Then later, when they have some realization, like "thinking out loud isn't entirely accurate, but...", they keep the entire comment as-is rather than continuously iterating on it like a diffusion model would. So the post becomes a chronological archive of what the author thought and/or did, rather than just the conclusion.

We need to turn this into the new "pelican on bike" LLM test.

Let's call it "Florin Andone on Premier League" :-)))

Meanwhile on duck.ai

ChatGPT 4o-mini, 5 mini and OSS 120B gave me wrong answers.

Llama 4 Scout completely broke down.

Claude Haiku 3.5 and Mistral Small 3 gave the correct answer.

Why are you asking about facts?

Okay, as a benchmark, we can try that. But it probably will never work, unless it does a web or db query.

  • Okay, so, should I not ask it about facts?

    Because, one way or another, we will need to do that for LLMs to be useful. Whether the facts come from the training data or from context knowledge (RAG-provided) is irrelevant. And besides, we are supposed to trust that these things have "world knowledge" and "emergent capabilities" precisely because their training data contain, well, facts.

The best thing is that all of this counts against your token usage, so they have a perverse incentive :D

  • Non-thinking, non-agentic models must one-shot the answer, so every token they output is part of the response, even if it's wrong.

    This is why people are getting different results with thinking models; without a thinking phase, it's as if you could be asked ANY question and had to give the correct answer all at once, in one stream of consciousness.

    Yes there are perverse incentives, but I wonder why these sorts of models are available at all tbh.

"Ah-- that's a classic confusion about football players. Your intuition is almost right-- let me break it down"