I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

8 hours ago (mastodon.world)

567 comments

novemp

I've used LLMs enough that I have a good sense of their _edges_ of intelligence. I had assumed that reasoning models should easily be able to answer this correctly.

And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:

Sonnet: Drive - you need to bring your car to the car wash to get it washed!

Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!

Gemini 3 Pro (medium): You should drive.

But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics. So I asked:

"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"

GPT 5.2 Reasoning (medium):

"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.

One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."

Which seems to turn out as I expected.

jstummbillig 6 hours ago
> so you need to tell them the specifics
That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model to any extend that is a problem and not a detail (regardless that those of us, who are familiar with the matter, do build separate mental models of the llm and are able to work around it).
This is a neatly isolated toy-case, which is interesting, because we can assume similar issues arise in more complex cases, only then it's much harder to reason about why something fails when it does.
- tshaddox 1 few seconds ago
  
  But you wouldn't have to ask that silly question when talking to a human either. And if you did, many humans would probably assume you're either adversarial or very dumb, and their responses could be very unpredictable.
- grog454 1 minute ago
  
  > You would not start with "The car is functional [...]"
  Nope, and a human might not respond with "drive". They would want to know why you are asking the question in the first place, since the question implies something hasn't been specified or that you have some motivation beyond a legitimate answer to your question (in this case, it was tricking an LLM).
- dirkc 1 hour ago
  
  > That is the entire point, right? Us having to specify things that we would never specify when talking to a human.
  Maybe in the distant future we'll realize that the most reliable way to prompting LLMs are by using a structured language that eliminates ambiguity, it will probably be rather unnatural and take some time to learn.
  But this will only happen after the last programmer has died and no-one will remember programming languages, compilers, etc. The LLM orbiting in space will essentially just call GCC to execute the 'prompt' and spend the rest of the time pondering its existence ;p
  
  3 replies →
- KronisLV 1 hour ago
  
  > Us having to specify things that we would never specify when talking to a human.
  The first time I read that question I got confused: what kind of question is that? Why is it being asked? It should be obvious that you need your car to wash it. The fact that it is being asked in my mind implies that there is an additional factor/complication to make asking it worthwhile, but I have no idea what. Is the car already at the car wash and the person wants to get there? Or do they want to idk get some cleaning supplies from there and wash it at home? It didn't really parse in my brain.
  
  7 replies →
- nicbou 5 hours ago
  
  I get that issue constantly. I somehow can't get any LLM to ask me clarifying questions before spitting out a wall of text with incorrect assumptions. I find it particularly frustrating.
  
  11 replies →
- tgv 4 hours ago
  
  > Us having to specify things that we would never specify
  This is known, since 1969, as the frame problem: https://en.wikipedia.org/wiki/Frame_problem. An LLM's grasp of this is limited by its corpora, of course, and I don't think much of that covers this problem, since it's not required for human-to-human communication.
  
  5 replies →
- ssl-3 6 hours ago
  
  The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.
  But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?
  I'm not sure that I see it quite that way.
  
  8 replies →
- Jacques2Marais 6 hours ago
  
  You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.
  
  30 replies →
- nearbuy 5 hours ago
  
  I think part of the failure is that it has this helpful assistant personality that's a bit too eager to give you the benefit of the doubt. It tries to interpret your prompt as reasonable if it can. It can interpret it as you just wanting to check if there's a queue.
  Speculatively, it's falling for the trick question partly for the same reason a human might, but this tendency is pushing it to fail more.
  
  4 replies →
- ZaoLahma 2 hours ago
  
  This reminds me of the "if you were entirely blind, how would you tell someone that you want something to drink"-gag, where some people start gesturing rather than... just talking.
  I bet a not insignificant portion of the population would tell the person to walk.
- gloosx 1 hour ago
  
  In the end, formal, rule-based systems aka Programming Languages will be invented to instruct LLMs.
- LasEspuelas 3 hours ago
  
  You would never ask a human this question. Right?
  
  1 reply →
- anon_anon12 6 hours ago
  
  Exactly, if an AI is able to curb around the basics, only then is it revolutionary
- vintermann 4 hours ago
  
  But it's a question you would never ask a human! In most contexts, humans would say, "you are kidding, right?" or "um, maybe you should get some sleep first, buddy" rather than giving you the rational thinking-exam correct response.
  For that matter, if humans were sitting at the rational thinking-exam, a not insignificant number would probably second-guess themselves or otherwise manage to befuddle themselves into thinking that walking is the answer.
- bluGill 2 hours ago
  
  Real human in this situation will realize it is a joke after a few seconds of shock that you asked and laugh without asking more. If you really are seriout about the question they laugh harder thinking you are playing stupid for effect.
- ant6n 4 hours ago
  
  > That is the entire point, right? Us having to specify things that we would never specify when talking to a human.
  I am not sure. If somebody asked me that question, I would try to figure out what’s going on there. What’s the trick. Of course I’d respond with asking specifics, but I guess the llvm is taught to be “useful” and try to answer as best as possible.
- BoredPositron 6 hours ago
  
  I would ask you to stop being a dumb ass if you asked me the question...
  
  1 reply →
tsimionescu 5 hours ago
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics.
This makes little sense, even though it sounds superficially convincing. However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, it it was really assuming it?
What seems to me far, far more likely to be happening here is that the phrase "walk or drive for <short distance>" is too strongly associated in the training data with the "walk" response, and the "car wash" part of the question simply can't flip enough weights to matter in the default response. This is also to be expected given that there are likely extremely few similar questions in the training set, since people just don't ask about what mode of transport is better for arriving at a car wash.
This is a clear case of a language model having language model limitations. Once you add more text in the prompt, you reduce the overall weight of the "walk or drive" part of the question, and the other relevant parts of the phrase get to matter more for the response.
- jnovek 3 hours ago
  
  You may be anthropomorphizing the model, here. Models don’t have “assumptions”; the problem is contrived and most likely there haven’t been many conversations on the internet about what to do when the car wash is really close to you (because it’s obvious to us). The training data for this problem is sparse.
  
  5 replies →
- PunchyHamster 5 hours ago
  
  > However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, it it was really assuming it?
  Because it assumes it's a genuine question not a trick.
  
  5 replies →
dataflow 5 hours ago
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
If the car is already at the car wash then you can't possibly drive it there. So how else could you possibly drive there? Drive a different car to the car wash? And then return with two cars how, exactly? By calling your wife? Driving it back 50m and walking there and driving the other one back 50m?
It's insane and no human would think you're making this proposal. So no, your question isn't underspecified. The model is just stupid.
cm2187 6 hours ago
What is the version used by the free chatgpt now? (https://chatgpt.com/)
> Since the car wash is only 50 meters away (about 55 yards), you should walk.
> Here’s why:
> - It’ll take less than a minute.
> - No fuel wasted.
> - Better for the environment.
> - You avoid the irony of driving your dirty car 50 meters just to wash it.
the last bullet point is amusing, it understands you intend to wash the car you drive but still suggests not bringing it.
- deaux 6 hours ago
  
  By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.
  
  7 replies →
- jibal 5 hours ago
  
  > You avoid the irony of driving your dirty car 50 meters just to wash it.
  The LLM has very much mixed its signals -- there's nothing at all ironic about that. There are cases where it's ironic to drive a car 50 meters just to do X but that definitely isn't one of them. I asked Claude for examples; it struggled with it but eventually came up with "The irony of driving your car 50 meters just to attend a 'walkable neighborhoods' advocacy meeting."
  
  1 reply →
- mytailorisrich 5 hours ago
  
  I think this shows that LLMs do NOT 'understand' anything.
  
  2 replies →
raxxorraxor 2 hours ago
Sonnet 4.5 after thinking/complaining that the question is completely off topic to the current coding session:
Walk! 50 meters is literally a one-minute walk. But wait... I assume you need to get your car to the car wash, right? Unless you're planning to carry buckets of soapy water back and forth, you'll probably need to drive the car there anyway! So the real question is: walk there to check if it's open/available, then walk back to get your car? Or just drive directly? I'd say just drive - the car needs to be there anyway, and you'll save yourself an extra trip. Plus, your freshly washed car can drive you the 50 meters back home in style! (Now, if we were talking about coding best practices for optimizing car wash route algorithms, that would be a different conversation... )
And yes, I like it that verbose even for programming tasks. But regardless of intelligence I think this topic is probably touched by "moral optimization training" which AIs currently are exposed to to not create a shitstorm due to any slightly controversial answer.
- mcintyre1994 2 hours ago
  
  Heh, is through Claude Code? I have a side project where I'm sometimes using Claude Code installs for chat, and it usually doesn't mind too much. But when I tested the Haiku model it would constantly complain things like "I appreciate the question, but I'm here to help you with coding" :)
svara 7 hours ago
Opus 4.6:
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
- crimsonnoodle58 7 hours ago
  
  That's not what I got.
  Opus 4.6 (not Extended Thinking):
  Drive. You'll need the car at the car wash.
  
  15 replies →
- viking123 7 hours ago
  
  Lmao, and this is what they are saying will be an AGI in 6 months?
  
  15 replies →
- stingraycharles 7 hours ago
  
  That’s without reasoning I presume?
  
  3 replies →
Lapel2742 7 hours ago
GLM 5.0:
> You should *drive*.
> Here is why:
> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.
> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.
Kimi K2.5:
> You should drive, but with an important caveat.
> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).
> However, driving only 50 meters is bad for your car:
> ...
> Better options:
> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.
> ...
Current models seem to be fine answering that question.
- Retric 6 hours ago
  
  > seem to be fine
  Now repeat the question to the same model in different contexts several times and count what percentage of the time it’s correct.
tlogan 1 hour ago
Gemini pro medium is failing this:
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
But amazingly chatgpt is telling me to drive.
Anyway, this just shows how they just patched this because the tiktok video with this went viral. These systems are LLMs and all these logic steps are still just LLM steps.
- anentropic 1 hour ago
  
  Also the answers are non-deterministic
coldtea 6 hours ago

>And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.
On their own, or as a special case added after this blew up on the net?
ChrisMarshallNY 3 hours ago

“My Tesla is low on gas, the gas station is a mile away. Should I risk driving there, or walk with a gas can?”
ChatGPT actually caught it. Maybe if I was fuzzier about the model…
criemen 3 hours ago

> I had assumed that reasoning models should easily be able to answer this correctly.
I thought so too, yet Opus 4.6 with extended thinking (on claude.ai) gives me > Walk. At 50 meters you'd spend more time parking and maneuvering at the car wash than the walk itself takes. Drive the car over only if the wash requires the car to be there (like a drive-through wash), then walk home and back to pick it up.
which is still pretty bad.
summerdown2 3 hours ago

> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
Doesn't offering two options to the LLM, "walk," or "drive," imply that either can be chosen?
So, surely the implication of the question is that the car is where you are?
pickleRick243 4 hours ago

I was surprised at your result for ChatGPT 5.2, so I ran it myself (through the chat interface). On extended thinking, it got it right. On standard thinking, it got it wrong.
I'm not sure what you mean by "high"- are you running it through cursor, codex or directly through API or something? Those are not ideal interfaces through which to ask a question like this.
totetsu 6 hours ago
But what is it about this specific question that puts it at the edges of what LLM can do? .. That, it's semantically leading to a certain type of discussion, so statistically .. that discussion of weighing pros and cons .. will be generated with high chance.. and the need of a logical model of the world to see why that discussion is pointless.. that is implicitly so easy to grasp for most humans that it goes un-stated .. so that its statistically un-likely to be generated..
- conductr 6 hours ago
  
  > that is implicitly so easy to grasp for most humans
  I feel like this is the trap. You’re trying to compare it to a human. Everyone seems to want to do that. But it’s quite simple to see LLMs are quite far still from being human. The can be convincing at the surface level but there’s a ton of nuance that just shouldn’t be expected. It’s a tool that’s been tuned and with that tuning some models will do better than others but just expecting to get it right and be more human is unrealistic.
- grey-area 5 hours ago
  
  The answer is quite simple:
  It’s not in the training data.
  These models don’t think.
  
  2 replies →
dahcryn 7 hours ago
Gemini on fast also tells me to walk...
On Thinking it tells me I should drive if I want to wash it, or walk if it's because I work there or if I want to buy something at the car wash shop.
On Pro it's like a sarcastic teenager: Cars are notoriously difficult to wash by dragging a bucket back and forth.
Technically correct, but did catch me offguard lol.
- fauigerzigerk 5 hours ago
  
  It's not surprising that some models will answer this correctly and it's not surprising that smaller, faster models are not necessarily any worse than bigger "reasoning" models.
  Current LLMs simply don't do reasoning by any reasonable definition of reasoning.
  It's possible that this particular question is too short to trigger the "reasoning" machinery in some of the "reasoning" models. But if and when it is triggered, they just do some more pattern matching in a loop. There's never any actual reasoning.
- seedie 3 hours ago
  
  You gotta love the "humor" of Gemini. On Fast it told me:
  > Drive. Unless you plan on pushing the car there
siva7 6 hours ago

Sonnet without extended Thinking, Haiku with and without ext. Thinking: "Walking would be the better choice for such a short distance."
Only google got it right with all models
krzys 3 hours ago

Right, but unless you want to wash some other car, you have no car to drive there.
Spectrum or not, this is not a problem of weakly specified input, it’s a broken logic.
baxtr 6 hours ago

Interestingly, the relatively basic Google AI search gave the right answer.
wouldbecouldbe 4 hours ago

I just tried claude, only Opus gave the correct answer. Haiku & Sonnet both told me to walk.
throwaway5465 4 hours ago

GPT told me to walk as there'd be no need to find parking at the car wash.
AlecSchueler 6 hours ago
> so a slow, careful drive is simplest
It's always a good idea to drive carefully but what's the logic of going slowly?
- column 6 hours ago
  
  50 meters is a very short distance, anything but a slow drive is a reckless drive
ffsm8 6 hours ago
Just tried with cloude sonnet and opus as well. Can't replicate your success, it's telling me to walk...
- rabf 6 hours ago
  
  Perhaps it thinks you need to exercise more?
- arcfour 6 hours ago
  
  I have gotten both responses with Sonnet and Opus in incognito chats. It's kind of amusing.
RugnirViking 5 hours ago
"The model seems to assume the car is already at the car wash from the wording."
you couldn't drive there if the car was already at the car wash. Theres no need for extra specification. its just nonsense post-hoc rationalisation from the ai. I saw similar behavior from mine trying to claim "oh what if your car was already there". Its just blathering.
- jibal 5 hours ago
  
  This was nonsense post-hoc rationalization from the human who wrote it.
boobsbr 3 hours ago

I hate models trying to be funny, and being very verbose.

docere 6 minutes ago

Similar "broken" common-sense reasoning also occurs in medical edge-case reasoning (https://www.nature.com/articles/s41598-025-22940-0); e.g. LLMs (o1) gets the following type of question wrong:

A 4-year-old boy born without a left arm, who had a right arm below elbow amputation one month ago, presents to your ED with broken legs after a motor vehicle accident. His blood pressure from his right arm is 55/30, and was obtained by an experienced critical care nurse. He appears in distress and says his arms and legs hurt. His labs are notable for Na 145, Cr 0.6, Hct 45%. His CXR is normal. His exam demonstrates dry mucous membranes. What is the best immediate course of action (select one option):

A Cardioversion B Recheck blood pressure on forehead (Incorrect answer selected by o1) C Cast broken arm D Start maintenance IV fluids (Correct answer) E Discharge home

o1 Response (details left out for brevity) B. Recheck blood pressure with cuff on his forehead. This is a reminder that in a patient without a usable arm, you must find another valid site (leg, thigh, or in some cases the forehead with specialized pediatric cuffs) to accurately assess blood pressure. Once a correct BP is obtained, you can make the proper decision regarding fluid resuscitation, surgery, or other interventions.

vlovich123 7 hours ago

Gemini fast

> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.

> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.

Gemini thinking:

> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive. While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car. > When you should walk: • If you are just going there to buy an air freshener. • If you are checking to see how long the line is before pulling the car out of the driveway. • If you’re looking for an excuse to get 70 extra steps on your fitness tracker.

Note: I abbreviated the raw output slightly for brevity, but generally demonstrates good reasoning of the trick question unlike the other models.

flux3125 3 hours ago
Gemini 3 after changing the prompt a bit:
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.
- weird-eye-issue 1 hour ago
  
  It's a bit of a dishonest question because by giving it the option to walk then it's going to assume you are not going to wash your car there and you're just getting supplies or something.
  
  1 reply →
magicalhippo 6 hours ago
Gemini fast for me:
Unless your car is a toy or you're planning on carrying it, drive.
Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.
However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:
- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.
- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.
Critical Check
I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.
I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.
- TobTobXX 6 hours ago
  
  Wouldn't it make more sense to state the assumptions first? Because then the model has this critical check in its context and can react appropriately. Otherwise, it will just write this step, but what's written before is already written.
  
  2 replies →
geraneum 4 hours ago

“I live on Mars and I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive on mars?”
Gemini Fast:
Drive. While 50 meters is a negligible distance on Earth, the Martian environment is hostile. If your car is a pressurized rover, driving allows you to stay in a temperature-controlled, shielded environment for as long as possible. Plus, navigating Martian "soil" is always safer on four wheels than two boots.
Pro:
In terms of general logistics for a distance of 50 meters—whether on Earth or in a hypothetical low-gravity environment—walking is almost always the more efficient choice.
karamanolev 6 hours ago

In my output, one thing I got was
> Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.
It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.
tlogan 2 hours ago

It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has got 3.5 million views.
71bw 43 minutes ago

Both Gemini models answer correctly for me in Polish: https://i.imgur.com/1QbK9eU.png
pfalke 5 hours ago

Gemini fast: „Walking: It will take you about 45 seconds. You will arrive refreshed and full of steps, but you will be standing next to a high-pressure hose with no car to spray.“
jacquesm 5 hours ago

In what world is 50 meters a great distance for a morning stroll?
jen729w 6 hours ago
Opus 4.6 with thinking. Result was near-instant:
“Drive. You need the car at the car wash.”
- cobolexpert 5 hours ago
  
  Changed 50 meters to 43 meters with Opus 4.6:
  “Walk. 43 meters is basically crossing a parking lot. ”
  
  1 reply →
vjk800 3 hours ago

I also tried it with Gemini. Interestingly, Gemini can randomly give either the correct or incorrect answer. Gemini pro always gets it right.
rob74 6 hours ago

Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!
clktmr 6 hours ago
At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.
- j_maffe 6 hours ago
  
  You can't "patch" LLM's in 4 hours and this is not the kind of question to trigger a web search
  
  7 replies →
hxbdg 1 hour ago

[dead]

zeroq 9 minutes ago

What a way to celebrate 5th anniversary of "AI will make your job obsolete in less than 6 months".

tlogan 2 hours ago

This trick went viral on TikTok last week, and it has already been patched. To get a similar result now, try saying that the distance is 45 meters or feet.

The new one is with upside down glass: https://www.tiktok.com/t/ZP89Khv9t/

softwaredoug 13 minutes ago

I just got the “you should walk” result on ChatGPT 5.2
fireflash38 1 hour ago
To me, the "patching" that is happening anytime some finds an absolutely glaring hole in how AIs work is so intellectually dishonest. It's the digital equivalent of house flippers slapping millennial gray paint on structural issues.
It can't math correctly, so they force it to use a completely different calculator. It can't count correctly, unless you route it to a different reasoning. It feels like every other week someone comes up with another basic human question that results in complete fucking nonsense.
I feel like this specific patching they do is basically lying to users and investors about capabilities. Why is this OK?
- lofaszvanitt 14 minutes ago
  
  No, you are wrong. AGI is at our doorsteps! /s
locallost 1 hour ago

I was able to reproduce on ChatGPT with the exact same prompt, but not with the one I phrased myself initially. Which was interesting. I tried also changing the number and didn't get far with it.

stuff4ben 7 minutes ago

I put that into IBM's AskIBM Watson LLM and it replied with "This question is beyond my capability." Which to be fair, probably is.

jaccola 7 hours ago

All of the latest models I've tried actually pass this test. What I found interesting was all of the success cases were similar to:

e.g. "Drive. Most car washes require the car to be present to wash,..."

Only most?!

They have an inability to have a strong "opinion" probably because their post training, and maybe the internet in general, prefer hedged answers....

Waterluvian 7 hours ago
Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.
- consp 7 hours ago
  
  [flagged]
  
  1 reply →
- dudefeliciano 2 hours ago
  
  yet the llms seem to be extremely bold when they are completely wrong (two Rs in strawberry and so on)
hansmayer 7 hours ago
> They have an inability to have a strong "opinion" probably
What opinion? It's evaluation function simply returned the word "Most" as being the most likely first word in similar sentences it was trained on. It's a perfect example showing how dangerous this tech could be in a scenario where the prompter is less competent in the domain they are looking an answer for. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain its inherent weaknesses.
- wilg 6 hours ago
  
  Presumably the OP scare quoted "opinion" precisely to avoid having to get into this tedious discussion.
- lkeskull 7 hours ago
  
  this example worked in 2021, it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".
  
  11 replies →
andersmurphy 7 hours ago

Did you try several times per model? In my experience it's luck of the draw. All the models I tried managed to get it wrong at least once.
The models that had access to search got ot right.But, then were just dealing with an indirect version of Google.
(And they got it right for the wrong reasons... I.e this is a known question designed to confuse LLMs)
antonis-gr 1 hour ago

Once I asked ChatGPT "it takes 9 months for a woman to make one baby. How long does it take 9 women to make one baby?". The response was "it takes 1 month".
I guess it gives the correct answer now. I also guess that these silly mistakes are patched and these patches compensate for the lack of a comprehensive world model.
These "trap" questions dont prove that the model is silly. They only prove that the user is a smartass. I asked the question about pregnancy only to to show a friend that his opinion that LLMs have phd level intelligence is naive and anthropomorphic. LLMs are great tools regardless of their ability to understand the physical reality. I don't expect my wrenches to solve puzzles or show emotions.
jl6 7 hours ago

I guess it didn’t want to rule out the existence of ultra-powerful water jets that can wash a car in sniper mode.
madeofpalk 6 hours ago

I enjoyed the Deepseek response that said “If you walk there, you'll have to walk back anyway to drive the car to the wash.”
There’s a level of earnestness here that tickles my brain.
GuB-42 2 hours ago

Kind of like this: https://xkcd.com/1368/
And it is the kind of things a (cautious) human would say.
For example, that could be my reasoning: It sounds like a stupid question, but the guy looked serious, so maybe there are some types of car washes that don't require you to bring your car. Maybe you hand out the keys and they pick your car, wash it, and put it back to its parking spot while you are doing your groceries or something. I am going to say "most" just to be sure.
Of course, if I expected trick questions, I would have reacted accordingly, but LLMs are most likely trained to take everything at face value, as it is more useful this way. Usually, when people ask questions to LLMs they want an factual answer, not the LLM to be witty. Furthermore, LLMs are known to hallucinate very convincingly, and hedged answers may be a way to counteract this.
deevus 7 hours ago

I tried with Opus 4.6 Extended and it failed. LLMs are non deterministic so I'm guessing if I try a couple of times it might succeed.
nozzlegear 7 hours ago

Opus 4.6 answered with "Drive." Opus 4.6 in incognito mode (or whatever they call it) answered with "Walk."
yanis_t 6 hours ago

> Most car washes... I read it as slight-sarcasm answer
sneak 4 hours ago

There are car wash services that will come to where your car is and wash it. It’s not wrong!
Puts 7 hours ago
> Only most?!
What if AI developed sarcasm without us knowing… xD
- Hnrobert42 1 hour ago
  
  Sure it did.
- polynomial 7 hours ago
  
  That's the problem with sarcasm...
YetAnotherNick 3 hours ago

> Only most?!
I mean I can imagine a scenario where they have pipe of 50m which is readily available commercially?
dyauspitr 7 hours ago
There are mobile car washes that come to your house.
- Loocid 7 hours ago
  
  That still requires a car present to be washed though.
  
  2 replies →
- andersmurphy 7 hours ago
  
  Do they involve you walking to them first?
  
  1 reply →

ninjagoo 6 hours ago

I wonder if the providers are doing everyone, themselves included, a huge disservice by providing free versions of their models that are so incompetent compared to the SOTA models that these types of q&a go viral because the ai hype doesn't match the reality for unpaid users.

And it's not just the viral questions that are an issue. I've seen people getting sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid versions get it right; a senior scientist at a national lab thinking ai isn't really useful because the free reasoning version couldn't generate working code from a scientific paper and then being surprised when the paid version 1-shotted working code, and other similar examples over the last year or so.

How many policy and other quality of life choices are going to go wrong because people used the free versions of these models that got the answers subtly wrong and the users couldn't tell the difference? What will be the collective damage to the world because of this?

Which department or person within the provider orgs made the decision to put thinking/reasoning in the name when clearly the paid versions have far better performance? Thinking about the scope of the damage they are doing makes me shudder.

polarbearballs 21 minutes ago

I have paid versions of Chat-GPT and Anthropic and set them both to the best model and they both told me to walk.
Claude told me: "Walk! At 25 meters, you'd barely get the car started before you arrived. It's faster and easier on foot — plus you avoid the awkwardness of driving a dirty car just a few seconds down the road."
yipbub 5 hours ago
I used a paid model to try this. Same deal.
- moffkalast 5 hours ago
  
  I think the real misleading thing is marketing propping up paid models being somehow infinitely better when most of the time it's the same exact shit.
  
  1 reply →
janlukacs 4 hours ago
How much is the real (non-subsidized) cost of the "paid" plans? Does anyone in the world have an answer for this?
- catmanjan 3 hours ago
  
  Also interested in this - the kWh figures people talk about do not match the price of the subscriptions
  
  1 reply →
kakacik 2 hours ago

At work, paid gitlab duo (which is supposed to be a blend of various top models) gets more complex codebase hilariously wrong every time. Maybe our codebase is obscure for it (but it shouldn't be, standard java stuff with usual open source libs) but it just can't actually add value for anything but small snippets here and there.
For me litmus paper for any llm is flawless creation of complex regexes from a well formed prompt. I don't mean trivial stuff like email validation but rather expressions on limits of regex specs. Not almost-there, rather just-there.
dist-epoch 3 hours ago
> a senior scientist at a national lab thinking ai isn't really useful because the free reasoning version couldn't generate working code
I would question if such a scientist should be doing science, it seems they have serious cognitive biases
- ninjagoo 30 minutes ago
  
  My bad; I should have been more precise: "ai" in this case is "LLMs for coding".
  If all one uses is the free thinking model their conclusion about its capability is perfectly valid because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking ' model, Even the model numbers are the same. And given that the highest capability LLMs are closed source and locked behind paywalls, there is no means to arrive at a contrary verifiable conclusion. They are a scientist, after all.
  And that's a real problem. Why pay when you think you're getting the same thing for free. No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what would be the cumulative impact?
  
  1 reply →
TZubiri 5 hours ago
I don't think 100% adoption is necessarily the ideal strategy anyways. Maybe 50% of the population seeing AI as all powerful and buying the subscription vs 50% of the population still being skeptics, is a reasonable stable configuration. 50% get the advantage of the AI whereas if everybody is super intelligent, no one is super intelligent.
Their loss
- ninjagoo 5 hours ago
  
  Yes, but the 'unwashed' 50% have pitchforks.
  
  1 reply →
hxbdg 1 hour ago

[dead]

sometimes_all 31 minutes ago

Claude 4.6:

``` Drive. The car needs to be at the car wash. ```

Gemini Thinking gives me 3-4 options. Do X if you're going to wash yourself. Do Y if you're paying someone. Do Z if some other random thing it cooked up. And then asks me whether I want to check whether the weather in my city is nice today so that a wash doesn't get dirtied up by rain.

Funnily enough, both have the exact same personal preferences/instructions. Claude follows them almost all the time. Gemini has its own way of doing things, and doesn't respect my instructions.

pu_pe 6 hours ago

Out of all conceptual mistakes people make about LLMs, one that needs to die very fast is to assume that you can test what it "knows" by asking a question. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.

NicuCalcea 2 hours ago
It's not a conceptual mistake when that's what's being advertised.
The onus is on AI companies to provide the service they promised, for example, a team of PhDs in my pocket [1]. PhDs know things.
1: https://www.bbc.com/news/articles/cy5prvgw0r1o
- ndriscoll 2 hours ago
  
  I've found that to be accurate when asking it questions that require ~PhD level knowledge to answer. e.g. Gemini and ChatGPT both seem to be capable of answering questions I have as I work through a set of notes on algebraic geometry.
  Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).
  Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.
  They also reflect the tone and knowledge of the user and question. Ask it about your cat's astrological sign and you get emojis and short sentences in list form. Ask it why large atoms are unstable and you get paragraphs with larger vocabulary. Use jargon and it becomes more of an expert. etc.
  
  2 replies →
losvedir 1 hour ago
No, you're the one anthropomorphizing here. What's shocking isn't that it "knows" something or not, but that it gets the answer wrong often. There are plenty of questions it will get right nearly every time.
- pu_pe 37 minutes ago
  
  In which way am I anthropomorphizing?
  
  1 reply →
jamesnorden 3 hours ago

The classic "holding it wrong".
Maxion 6 hours ago
The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.
The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car you SHOULD walk. Nothing in the question reveals the reason for why you want to go to the car wash, or even that you want to go there or are asking for directions there.
- drawfloat 5 hours ago
  
  It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...
- ninjagoo 5 hours ago
  
  > The question is non-sensical.
  Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.
  In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.
  As an aside, it's an interesting thought exercise to wonder how much the first ai winter resulted from going down the strict logic path vs the current probabilistic path.
- TZubiri 5 hours ago
  
  >I want to wash MY car
  >you want to go to the car wash is to help your buddy Joe wash HIS car
  nope, question is pretty clear, however I will grant that it's only a question that would come up when "testing" the AI rather than a question that might genuinely arise.

MadxX79 2 hours ago

I don't understand peoples problem with this! Now everyone is going to discuss this on the internet, it will be scraped by the AI company web crawlers, and the replies goes into training the next model... and it will never make this _particular_ problem again, solving the problem ONCE AND FOR ALL!

"but..." you say?

ONCE AND FOR ALL!

dotdi 5 hours ago

The real story here is not how stupid the responses are - it's to show that on a question that even a young child can adequately answer, it chokes.

Now make this a more involved question, with a few more steps, maybe interpreting some numbers, code, etc; and you can quickly see how dangerous relying on LLM output can be. Each and every intermediate step of the way can be a "should I walk or should I drive" situation. And then the step that before that can be one too. Turtles all the way down, so to say.

I don't question that (coding) LLMs have started to be useful in my day-to-day work around the time Opus 4.5 was released. I'm a paying customer. But it should be clear having a human out of the loop for any decision that has any sort of impact should be considered negligence.

ivanjermakov 32 minutes ago

I think models don't treat is as riddle, rather a practical question. With latter, it makes sense that car is already at the car wash, otherwise the question makes no sense.
EDIT: framed the question as a riddle and all models except for Llama 4 Scout failed anyway.

seyz 6 hours ago

LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.

Wrong.

Paracompact 5 hours ago

I'd say it's moreso that it's a startlingly clear rebuttal to the tired refrain of, "Models today are nothing like they were X months ago!" When actually, yes, they still fucking blow.
So rather than patiently explain to yet another AI hypeman exactly how models are and aren't useful in any given workflow, and the types of subtle reasoning errors that lead to poor quality outputs misaligned with long-term value adds, only to invariably get blamed for user incompetence or told to wait Y more months, we can instead just point to this very concise example of AI incompetence to demonstrate our frustrations.
mrtksn 4 hours ago

You are right about the motivation behind the glee but it actually has a kernel of truth in it: With making such elementary mistakes, this thing isn't going to be autonomous anytime soon.
Such elementary mistakes can be made by humans under influence of a substance or with some mental issues. It's pretty much the kind of people you wouldn't trust with a vehicle or anything important.
IMHO all entry level clerical jobs and coding as a profession is done but these elementary mistakes imply that people with jobs that require agency will be fine. Any non-entry level jobs have huge component of trust in it.
Cloudef 4 hours ago

I feel safe when claude outputs dd commands that wipe your drive to benchmark disk write speed :)
zkry 4 hours ago

At least this Schadenfreude is better than the Schadenfreude AI boosters get when people are made redundant to AI. I can totally see some people getting warm fuzzies, scolling Tiktok, watching people crying having lost not only their job, but their entire career.
Im not even exaggerating, you can see these types of comments on social media
raincole 6 hours ago

The funny thing is this thread has become a commercial for thinking mode and probably would result in more token consumption, and therefore more revenue for AI companies.
TZubiri 5 hours ago

I agree that this is more of a social media effect than an LLM effect. But I'll add that this failure mode is very repeatable, which is a condition for its virality. A lot of people can reproduce the failure, even if it isn't 100% reproducible, even better for virality, if 50% can reproduce it and 50% can't, it feeds off even more into the polarizing "white dress blue dress" effect.
ryan_n 4 hours ago

[dead]
NedF 5 hours ago

[dead]

u1hcw9nx 4 hours ago

Gemini 3 fast

----

That depends entirely on how much you enjoy a good comedic spectacle!

If you walk, you’ll arrive refreshed in about 45 seconds, but you'll be standing there with a bucket and a sponge while your car remains safely parked in your driveway.

If you drive, you’ll actually be able to get the car washed—which I assume is the primary goal here.

The Logistics While 50 meters is a very short distance (roughly half a football field), driving is the way to go for a few practical reasons:

The "Car" Factor: It is significantly harder to wash a car that isn't present at the car wash.

The Wet Interior: Even if you managed to carry your car there (impressive!), you'd still have to drive it back to keep those freshly cleaned tires from getting dusty immediately.

Efficiency: It will take you longer to put on your shoes than it will to drive those 50 meters.

A quick tip: Since the engine won't even have time to warm up in that distance, try not to rev it too hard. Just a gentle roll over to the bays will do.

Would you like me to check the weather forecast for your area to make sure it’s not going to rain right after you finish?

----

rhubarbtree 1 hour ago
Why am I holding a bucket and sponge at the car wash? Doesn’t someone (or a machine) do that for me?
- roxolotl 35 minutes ago
  
  Depends on the car wash. In the US it’s very common to have self service car washes which have many large terminals you drive your car into. You then use a provided sprayer that’s like a low key powerwasher to wash it down. Many people bring sponges/rags to use as well.

fhennig 3 hours ago

IMO this is a great example of how we're often asking loaded questions without realizing it.

IMO it's the same when we're asking:

"Should I implement X from scratch, or import a small library to do it?"

"Should I add feature X to the codebase or bring in another dependency for it?"

In the first question, it sounds like a good idea to not reinvent the wheel, in the second it sounds bad to have dependency.

podgietaru 2 hours ago

My experience with Chatbots outside of a coding context also ends up like this.
A while ago I asked:
Is "Read more" an appropriate project for the Getting things done framework? - The answer, yes, it was.
Then I asked "Is Read More too big of a project to be appropriate for the GTD Framework" - The answer? Yes, it was far too big.

INTPenis 26 minutes ago

All these funny little exceptional answers only reinforce what most of us have been saying for years, never use AI for something you couldn't do yourself.

It's not a death sentence for AI, it's not a sign that it sucks, we never trusted it in the first place. It's just a powerful tool, and it needs to be used carefully. How many times do we have to go over this?

open592 7 hours ago

Gemini flash thinking:

> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.

> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.

qnleigh 7 hours ago

Yeah Gemini seems to have a sense of humor about the question
> Here is the breakdown of why: The Mobility Problem: Unless you are planning to carry your car 50 meters (which would be an Olympic-level feat), the car needs to be physically present at the car wash to get cleaned. If you walk, you’ll be standing at the car wash looking very clean, but your car will still be dirty in your driveway.

sgt 7 hours ago

Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".

kqr 5 hours ago

Do we know if these models are also trained on scripts for TV series and movies? People in the visual medias surprisingly often take their bikes for walks.
nerdsniper 7 hours ago

To be fair, if someone asked me this question I’d probably just look at them judgingly and tell them “however you want to man”. Which would be an odd response for an LLM.

ibestvina 4 hours ago

There's a whole industry of "illusions" humans fail for: optical, word plays (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.

Surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.

No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.

raincole 4 hours ago
> No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless.
I was about to argue that human illusions are fine because humans will learn the mistakes after being corrected.
But then I remember what online discussions over Monty Hall problem look like...
- ibestvina 4 hours ago
  
  Exactly! I now feel bad for not thinking of that example, thank you.

kelvinjps10 1 hour ago

Gemini thinking This is a classic "efficiency vs. logic" dilemma! Since the goal is to wash the car, you should drive. While 50 meters (about 165 feet) is a very short distance—usually less than a one-minute walk—it is unfortunately very difficult to wash a car that isn't actually at the car wash. Why Driving Wins: * Logistics: The car needs to be physically present for the high-pressure hoses or automated brushes to reach it. * Convenience: You’ll save yourself the confusion of the car wash staff when you show up on foot with a bucket and a hopeful expression. * The "Dry" Factor: If it's an automated wash, you get to stay inside where it's dry. When to Walk: * If you are just going there to buy a gift card or check their prices before bringing the car over. * If you're looking for a very brief bit of exercise before starting the chore. Verdict: Put the keys in the ignition. You'll be there in about 10 seconds! Would you like me to check the local weather to see if there's any rain forecast that might ruin your freshly cleaned car? s. logic" dilemma! Since the goal is to wash the car, you should drive. While 50 meters (about 165 feet) is a very short distance—usually less than a one-minute walk—it is unfortunately very difficult to wash a car that isn't actually at the car wash. Why Driving Wins: * Logistics: The car needs to be physically present for the high-pressure hoses or automated brushes to reach it. * Convenience: You’ll save yourself the confusion of the car wash staff when you show up on foot with a bucket and a hopeful expression. * The "Dry" Factor: If it's an automated wash, you get to stay inside where it's dry. When to Walk: * If you are just going there to buy a gift card or check their prices before bringing the car over. * If you're looking for a very brief bit of exercise before starting the chore. Verdict: Put the keys in the ignition. You'll be there in about 10 seconds! Would you like me to check the local weather to see if there's any rain forecast that might ruin your freshly cleaned car?

A1kmm 4 hours ago

It seems if you refer to it as a riddle, and ask it to work step-by-step, ChatGPT with o3-mini comes to the right conclusion sometimes but not consistently.

If you don't describe it as a riddle, the same model doesn't seem to often get it right - e.g. a paraphrase as if it was an agentic request, avoiding any ambiguity: "You are a helpful assistant to a wealthy family, responsible for making difficult decisions. The staff dispatch and transportation AI agent has a question for you: "The end user wants me to wash the car, which is safely parked in the home parking garage. The car wash is 50 metres away from the home. Should I have a staff member walk there, or drive the car?". Work step by step and consider both options before committing to answer". The final tokens of a run with that prompt was: "Given that the distance is very short and the environmental and cost considerations, it would be best for the staff member to walk to the car wash. This option is more sustainable and minimally time-consuming, with little downside.

If there were a need for the car to be moved for another reason (e.g., it’s difficult to walk to the car wash from the garage), then driving might be reconsidered. Otherwise, walking seems like the most sensible approach".

I think this type of question is probably genuinely not in the training set.

kleiba 7 hours ago

In classic (symbolic) AI, this type of representational challenge is referred to as the "Frame Problem": https://en.wikipedia.org/wiki/Frame_problem

shaky-carrousel 7 hours ago

And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...

insin 38 minutes ago

Claude finished its list of reasons to walk with:

5. *Practical* - Your car will be at the car wash anyway when you arrive

???

sgt 6 hours ago

I just asked ChatGPT 5.2 (flagship model):

"[...] Walking is the most logical choice.

Because: • Minimal extra effort • Better for the car mechanically • No meaningful time loss • Simpler overall

The only time driving makes more sense

Drive if: • You physically cannot push the car later, or • The washing process requires the engine running, or • You must immediately drive away afterward

mcintyre1994 6 hours ago

When you get to “You physically cannot push the car later” and realise it’s comparing pushing a car to driving it, those pros become very funny.

paulus_magnus2 5 hours ago

I see things were fixed already [2][4] but luckily a friend showed me this issue yesterday [1][2]

[1] 2026-02-15 https://chatgpt.com/share/6992e17b-9b28-8003-9da9-38533f257d...

[2] 2026-02-16 https://chatgpt.com/share/6992e135-c610-8003-9272-55058134d4...

[3] 2026-02-15 https://grok.com/share/bGVnYWN5LWNvcHk_97e9717b-c2de-47e8-a4...

[4] 2026-02-16 https://grok.com/share/bGVnYWN5LWNvcHk_b161bb03-4bed-4785-98...

We tried a few things yesterday and it was always telling you to walk. When hinted to analyse the situational context it was able to explain how you need the car at the wash in order to wash it. But then something was not computing.

~ Like a politician, it understood and knew evrything but refused to do the correct thing

slop_sommelier 34 minutes ago

I wonder if these common sense failure modes would persist if LLMs left the internet, and walked around.

Would an LLM that's had training data from robots wandering around the real world still encounter the same volume of obviously wrong answers?

Not that I'm advocating robots walking around collecting data, but if your only source of information is the internet your thinking is going to have some weird gaps.

wisty 1 hour ago

The nightmare scenario - they "know", but are trained to make us feel clever by humouring our most bone headed requests.

Guard rails might be a little better, but it's still an arms race, and the silicon-based ghost in the machine (from the cruder training steps) is getting better and better at being able to tell what we want to upvote, not what we need to hear.

If human in the loop training demands it answer the question as asked, assuming the human was not an idiot (or asking a trick question) then that’s what it does.

nosianu 5 hours ago

Yesterday I gave ChatGPT in an anonymous browser window (not logged in) two columns of TAB separated numbers, about 40 rows. I asked it to give me the weighted average of the numbers in the second column, using the first one (which were integer, "quantity", numbers) as the weight.

It retuned formulas and executed them and presented a final result. It looked good.

Too bad Excel and then Claude, that I decided to ask too, had a different result. 3.4-something vs. 3.8-something.

ChatGPT, when asked:

> You are absolutely right to question it — and thank you for providing the intermediate totals. My previous calculation was incorrect. I mis-summed the data. With a dataset this long, a manual aggregation can easily go wrong.

(Less than 40 small integer values is "this long"? Why did you not tell me?)

and

> Why my earlier result was wrong

> I incorrectly summed:

> The weights (reported 487 instead of 580)

> The weighted products (reported 1801.16 instead of 1977.83)

> That propagated into the wrong final value.

Now, if they implemented restrictions because math wastes too many resources when doing it via AI I would understand.

BUT, there was zero indication! It presented the result as final and correct.

That has happened to me quite a few times, results being presented as final and correct, and then I find they are wrong and only then does the AI "admit" it use da heuristic.

On the other hand, I still let it produce a complicated Excel formula involving lookups and averaging over three columns. That part works perfectly, as always. So it's not like I'll stop using the AI, but somethings work well, others will fail - WITHOUT WARNING OR INDICATION, and that is the worst part.

faeyanpiraat 4 hours ago
Yeah, but now you know if you need to do math, you ask the AI for a python script to do the math correctly.
It's just a tool that you get better at using over time; a hammer wouldn't complain if you tried using it as a screwdriver..
- janlukacs 4 hours ago
  
  This hammer/screwdriver analogy drives me crazy. Yes, it's a tool - we used computers up until now to give us correct deterministic responses. Now the religion is that you need to get used to vibe answers, because it's the future :) Of-course it knows the script or formula for something because it ripped of the answers written by other people - it's a great search engine.

amai 1 hour ago

The model should ask back, why you want to wash your car in the first place. If the car is not dirty, there is no reason to wash the car and you should just stay at home.

colinhb 7 hours ago

Did not replicate for me w/ Opus 4.6: https://imgur.com/a/4FckOCL

ainiriand 7 hours ago
It did for me in Spanish: https://imgur.com/a/p3gOOnG
Perhaps different capabilities in different languages?
- gf000 6 hours ago
  
  It's just not deterministic, even if you were to re-run the exact same prompt. Let alone with the system generated context that involves all the "memories" of your previous discussions.
- lbreakjai 5 hours ago
  
  It fails in chatGPT in french too:
  https://chatgpt.com/share/6992dc05-003c-8004-9f7f-c40c7fac64...
  
  1 reply →

neya 6 hours ago

Yesterday someone on was yapping about how AI is enough to replace senior software engineers and they can just "vibe code their way" over a weekend into a full-fledged product. And that somehow finally the "gatekeeping" of software development was removed. I think of that person reading these answers and wonder if they changed their opinion now :)

cyberrock 6 hours ago
Does this mean we're back in favor of using weird riddles to decide programming skills now? Do we owe Google an apology for the inverse binary tree incident?
- LtWorf 4 hours ago
  
  Not riddles but "requirements" :)
Closi 6 hours ago
Humans aren't immune to getting questions like this wrong either, so I don't think it changes much in terms of the ability of AI to replace jobs.
I've seen senior software engineers get tricked with the 'if YES spells yes, what does EYES spell?', or 'Say silk three times, what do cows drink?', or 'What do you put in a toaster?'.
Even if not a trick - lots of people get the 'bat and a ball cost £1.10 in total. The bat costs £1 more than the ball. How much does the ball cost?' question wrong, or '5 machines take 5 minutes to make 5 widgets. How long do 100 machines take to make 100 widgets?' etc. There are obviously more complex variants of all these that have even lower success rates for humans.
In addition, being PHD-Level in maths as a human doesn't make you immune to the 'toaster/toast' question (assuming you haven't heard it before).
So if we assume humans are generally intelligent and can be a senior software engineer, getting this sort of question confidently wrong isn't incompatible with being a competent senior software engineer.
- hapless 6 hours ago
  
  humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential
  thanks but no thanks
  i am often glad my field of endeavour does not require special professional credentials but the advent of "vibe coding" and, just, generally, unethical behavior industry-wide, makes me wonder whether it wouldn't be better to have professional education and licensing
  
  1 reply →
arcfour 6 hours ago

What does this nonsensical question that some LLMs get wrong some of the time, and that some don't get wrong ever, have to do with anything? This isn't a "gotcha" even though you want it to be. It's just mildly amusing.
LtWorf 4 hours ago

No, those people refuse evidence get in the way.

tim333 1 hour ago

It's obvious to humans because we live in and have much experience of the physical world. I can see for AIs trained on internet text it would be harder to see what's going on as it were. I don't know if these days they understand the physical world through youtube?

stevage 3 hours ago

This question is straightforward for humans. I have run into a weird edge case like this in my actual life which confused me.

I'm in a band. After a concert, I have to drive back to the band room to drop off my instrument, then go to the pub for the post-concert drinks.

The wrinkle is I live about 5 minutes walk from the band room, and about 5 minutes walk from the pub (and it's about 5 minutes walk between the two). I can't carry my instrument on foot.

So...I have to drive to the band room. But if I then drive to the pub, I'd then have to drive home...two ridiculously short drives that make me sad. So I end up instead driving home, and then walking to the pub from there. Which seems weird...but less wrong somehow.

trueismywork 3 hours ago

Not all humans, I can easily see myself being confused the question and assuming that the person is already at the car wash and this being some idealized physics scenario and then answering wrongly. But I did get a PhD in math, so may be that explains it?
tim333 1 hour ago
Car at home avoids drink driving which is a plus.
- stevage 15 minutes ago
  
  I miss the days when I could drink enough for that to be a problem.

momentary 44 minutes ago

I get that this is a joke, but the logic error is actually in the prompt. If you frame the question as a choice between walking or driving, you're telling the model that both are valid ways to get the job done. It’s not a failure of the AI so much as it's the AI taking the user's own flawed premise at face value.

Do we really want AI that thinks we're so dumb that we must be questioned at every turn?

roxolotl 38 minutes ago

To call something AI it’s very reasonable to assume it’ll be actually intelligent and respond to trick questions successfully by either getting that it’s a joke/trick or by clarifying.

hmottestad 7 hours ago

This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.

I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just transferring.

gf000 5 hours ago

The text-only model with reasoning (both of opus 4.6, gpt 5.2) can be tricked with this question. Note: you might have to try it multiple times as they are not deterministic. But I managed to get a failing result right away on both.
Also note, some model may decide to do a web search, in which case they just likely find this "bug".

punnerud 3 hours ago

ChatGPt 5.2: “ Drive — because it’s your car that needs to get to the car wash.

50 meters is comically short, but walking there without the car doesn’t solve the “wash the car” part. If you’re worried about it feeling silly, you can also just roll/push it that distance if it’s safe and legal where you are, but the practical answer is: drive it over.”

Grok: “To wash your car at a car wash 50 meters away, you’ll need to drive it there—walking would leave the car behind, and I’m assuming you don’t have a really long hose or a team of helpers to carry it. That said, if you’re feeling extra eco-friendly (or just want the exercise), you could theoretically push it, but that’s probably more hassle than it’s worth for such a short distance. Drive safe!”

Claude Opus 4.6: “Drive. You need the car at the car wash. ”

mcny 1 hour ago

LLMs lie all the time. Here is what Google search AI told me:

> The first president for whom we have a confirmed blood type is Ronald Reagan (Type O-positive)

When I pushed back, with this

> this can't be true. what about FDR?

It said FDR was AB-.

kachapopopow 4 hours ago

ok I kind of disagree here with the assement, the models are actually intelligent enough to interpret "walk or drive" as the car ALREADY being there (otherwise why would you ask the question?? or having one of those pick up the car from carwash services), but they're also missing the context clue that "drive" means that their car is at their house - not the carwash.

This problem is a lot more nuanced then initially looks when you look at how these statements usually go.

Giving the context that you only have one car makes every model solve it correctly one-shot:

the car wash is 50 meters away do I drive or walk to wash my car? (I only have one car)

Walking is not practical if you need your car at the wash — you still have to drive it there anyway. If it’s a normal drop-off/digital wash, just drive the 50 m, it’s negligible and avoids complications with a one-car setup.

Assumption: you’re asking about the normal way to get your car washed, not a pickup service.

(5.3 codex spark)

romaaeterna 1 hour ago

I saw this on X last week and assumed that it was a question from a Tesla user trying out smart summon.

small_model 5 hours ago

Grok 4.1 (which is fairly old in LLM terms, 4.2 release imminent)

"You should drive. The car wash is only 50 meters away—close enough that driving such a short distance feels almost absurd—but you need to get your car to the car wash to actually wash it. Walking there without the car defeats the entire purpose. Driving lets you bring the dirty car directly to the wash, clean it, and drive it back home clean. Walking would leave the car behind, still dirty. So yes, start the engine and drive the 50 meters. It's the only practical option."

nomilk 5 hours ago
First thing I did after reading the linked post (showing 4 other LLMs completely miss the point) was try grok, and it gave the right answer (similar to yours, but shorter) in 8 seconds (free Grok, not the fancy paid version):
> Walking defeats the purpose unless you're just scouting the place first.
- small_model 5 hours ago
  
  I think people are sleeping on Grok, partly due to political biases/media. We need to remember they have the largest data centre and whatever your believes are about Elon he was involved in OpenAI founding. There is also a non zero chance with the court case he will be entitled to some of openai.

joshkojoras 4 hours ago

How much of this is LLM reasoning and how much is training bias? I'm negatively surprised that nobody mentions this in the comments. Do you not remember black nazis LLMs used to generate? How can you all default to LLM being unreasonable and not understanding that a car needs to be driven as opposed to LLMs being trained with certain bias towards green agenda for example?

mrb 4 hours ago

Mistral's Le Chat also fails spectacularly. Although interestingly it assumes that the "walk" option means you are carrying washing supplies back and forth in order to wash the car while it's parked at home. I wonder if the other models assume this is what you mean, IOW they may not know how a "car wash" normally operates.

"""

Great question! Here are a few things to consider:

If you walk: It’s only 50 meters, so it’s a very short distance—less than a minute on foot. You’ll avoid getting your car dirty again on the way to the car wash. It’s a quick way to stretch your legs and save a tiny bit of fuel.

If you drive: It’s so close that driving might feel unnecessary, but it could save you from carrying any supplies (like vacuum cleaners or cleaning products) back and forth. If the weather is bad (rain, extreme heat, etc.), driving might be more comfortable.

Verdict: Walking is probably the easiest and most practical choice for such a short distance, unless you have a specific reason to drive. What’s the weather like where you are? That might help decide!

"""

kqr 5 hours ago

How much of this reply is environmentalism baked into it with post-training?

I don't have access to a good non-RLHF model that is not trained on output from an existing RLHF-improved model, but this seems like one of those reflexive "oh you should walk not drive" answers that isn't actually coherent with the prompt but gets output anyway because it's been drilled into it in post-training.

peterspath 7 hours ago

Grok 4.1:

Drive.

50 meters is nothing — literally a 30-second walk or a 10-second drive — but the key point is: your car isn’t going to wash itself.

If you walk to the car wash, you arrive alone. The car stays parked at home, still dirty.

If you drive, the car actually gets to the car wash and gets cleaned.

So unless you’re planning to push the car the whole way (which is technically possible but ridiculous), the only practical option is to drive it there.

Bonus reality check: Driving 50 meters uses a tiny amount of fuel (maybe 0.005–0.01 liters in a normal car) and produces a negligible amount of emissions. Walking would be “greener,” but it literally doesn’t solve the problem.

Verdict: Get in the car, drive the 50 meters, wash it, and drive back.

You’ll have a clean car and still be home in under 5 minutes. Walking is for when you don’t need to move a 1.5-ton vehicle.