I've used LLMs enough that I have a good sense of the _edges_ of their intelligence. I had assumed that reasoning models should easily be able to answer this correctly.
And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.
--
But GPT 5.2 reasoning, even at high, told me to walk. My first instinct was that I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum, so you need to tell them the specifics. So I asked:
"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"
GPT 5.2 Reasoning (medium):
"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.
One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."
That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model, to any extent, that is a problem and not a detail (regardless of the fact that those of us who are familiar with the matter do build separate mental models of the LLM and are able to work around it).
This is a neatly isolated toy case, which is interesting because we can assume similar issues arise in more complex cases, where it's much harder to reason about why something fails when it does.
But you wouldn't have to ask that silly question when talking to a human either. And if you did, many humans would probably assume you're either adversarial or very dumb, and their responses could be very unpredictable.
> You would not start with "The car is functional [...]"
Nope, and a human might not respond with "drive". They would want to know why you are asking the question in the first place, since the question implies something hasn't been specified or that you have some motivation beyond a legitimate answer to your question (in this case, it was tricking an LLM).
> That is the entire point, right? Us having to specify things that we would never specify when talking to a human.
Maybe in the distant future we'll realize that the most reliable way to prompt LLMs is with a structured language that eliminates ambiguity; it will probably be rather unnatural and take some time to learn.
But this will only happen after the last programmer has died and no one remembers programming languages, compilers, etc. The LLM orbiting in space will essentially just call GCC to execute the 'prompt' and spend the rest of the time pondering its existence ;p
> Us having to specify things that we would never specify when talking to a human.
The first time I read that question I got confused: what kind of question is that? Why is it being asked? It should be obvious that you need your car to wash it. The fact that it is being asked in my mind implies that there is an additional factor/complication to make asking it worthwhile, but I have no idea what. Is the car already at the car wash and the person wants to get there? Or do they want to idk get some cleaning supplies from there and wash it at home? It didn't really parse in my brain.
I get that issue constantly. I somehow can't get any LLM to ask me clarifying questions before spitting out a wall of text with incorrect assumptions. I find it particularly frustrating.
> Us having to specify things that we would never specify
This is known, since 1969, as the frame problem: https://en.wikipedia.org/wiki/Frame_problem. An LLM's grasp of this is limited by its corpora, of course, and I don't think much of that covers this problem, since it's not required for human-to-human communication.
The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.
But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?
You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.
I think part of the failure is that it has this helpful assistant personality that's a bit too eager to give you the benefit of the doubt. It tries to interpret your prompt as reasonable if it can. It can interpret it as you just wanting to check if there's a queue.
Speculatively, it's falling for the trick question partly for the same reason a human might, but this tendency is pushing it to fail more.
This reminds me of the "if you were entirely blind, how would you tell someone that you want something to drink"-gag, where some people start gesturing rather than... just talking.
I bet a not insignificant portion of the population would tell the person to walk.
But it's a question you would never ask a human! In most contexts, humans would say, "you are kidding, right?" or "um, maybe you should get some sleep first, buddy" rather than giving you the rational thinking-exam correct response.
For that matter, if humans were sitting at the rational thinking-exam, a not insignificant number would probably second-guess themselves or otherwise manage to befuddle themselves into thinking that walking is the answer.
A real human in this situation will realize it is a joke after a few seconds of shock that you asked, and laugh without asking more. If you really are serious about the question, they laugh harder, thinking you are playing stupid for effect.
> That is the entire point, right? Us having to specify things that we would never specify when talking to a human.
I am not sure. If somebody asked me that question, I would try to figure out what's going on there. What's the trick? Of course I'd respond by asking for specifics, but I guess the LLM is taught to be "useful" and try to answer as best as possible.
> My first instinct was that I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum, so you need to tell them the specifics.
This makes little sense, even though it sounds superficially convincing. However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, if it was really assuming it?
What seems to me far, far more likely to be happening here is that the phrase "walk or drive for <short distance>" is too strongly associated in the training data with the "walk" response, and the "car wash" part of the question simply can't flip enough weights to matter in the default response. This is also to be expected given that there are likely extremely few similar questions in the training set, since people just don't ask about what mode of transport is better for arriving at a car wash.
This is a clear case of a language model having language model limitations. Once you add more text in the prompt, you reduce the overall weight of the "walk or drive" part of the question, and the other relevant parts of the phrase get to matter more for the response.
You may be anthropomorphizing the model, here. Models don’t have “assumptions”; the problem is contrived and most likely there haven’t been many conversations on the internet about what to do when the car wash is really close to you (because it’s obvious to us). The training data for this problem is sparse.
> However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, if it was really assuming it?
Because it assumes it's a genuine question not a trick.
> My first instinct was that I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
If the car is already at the car wash then you can't possibly drive it there. So how else could you possibly drive there? Drive a different car to the car wash? And then return with two cars how, exactly? By calling your wife? Driving it back 50m and walking there and driving the other one back 50m?
It's insane and no human would think you're making this proposal. So no, your question isn't underspecified. The model is just stupid.
By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.
> You avoid the irony of driving your dirty car 50 meters just to wash it.
The LLM has very much mixed its signals -- there's nothing at all ironic about that. There are cases where it's ironic to drive a car 50 meters just to do X but that definitely isn't one of them. I asked Claude for examples; it struggled with it but eventually came up with "The irony of driving your car 50 meters just to attend a 'walkable neighborhoods' advocacy meeting."
Sonnet 4.5 after thinking/complaining that the question is completely off topic to the current coding session:
Walk! 50 meters is literally a one-minute walk.
But wait... I assume you need to get your car to the car wash, right? Unless you're planning to carry buckets of soapy water back and forth, you'll probably need to drive the car there anyway!
So the real question is: walk there to check if it's open/available, then walk back to get your car? Or just drive directly?
I'd say just drive - the car needs to be there anyway, and you'll save yourself an extra trip. Plus, your freshly washed car can drive you the 50 meters back home in style!
(Now, if we were talking about coding best practices for optimizing car wash route algorithms, that would be a different conversation... )
And yes, I like it that verbose even for programming tasks. But regardless of intelligence, I think this topic is probably touched by the "moral optimization training" AIs are currently exposed to so as not to create a shitstorm over any slightly controversial answer.
Heh, is this through Claude Code? I have a side project where I'm sometimes using Claude Code installs for chat, and it usually doesn't mind too much. But when I tested the Haiku model it would constantly complain with things like "I appreciate the question, but I'm here to help you with coding" :)
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.
> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.
Kimi K2.5:
> You should drive, but with an important caveat.
> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).
> However, driving only 50 meters is bad for your car:
> ...
> Better options:
> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.
> ...
Current models seem to be fine answering that question.
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
But amazingly ChatGPT is telling me to drive.
Anyway, this just shows that they patched this because the TikTok video went viral. These systems are LLMs, and all these logic steps are still just LLM steps.
> I had assumed that reasoning models should easily be able to answer this correctly.
I thought so too, yet Opus 4.6 with extended thinking (on claude.ai) gives me
> Walk. At 50 meters you'd spend more time parking and maneuvering at the car wash than the walk itself takes. Drive the car over only if the wash requires the car to be there (like a drive-through wash), then walk home and back to pick it up.
I was surprised at your result for ChatGPT 5.2, so I ran it myself (through the chat interface). On extended thinking, it got it right. On standard thinking, it got it wrong.
I'm not sure what you mean by "high" - are you running it through Cursor, Codex, or directly through the API or something? Those are not ideal interfaces through which to ask a question like this.
But what is it about this specific question that puts it at the edge of what an LLM can do? That it semantically leads to a certain type of discussion, so that, statistically, a discussion of weighing pros and cons will be generated with high probability? And that the logical model of the world needed to see why that discussion is pointless is implicitly so easy to grasp for most humans that it goes unstated, making it statistically unlikely to be generated?
> that is implicitly so easy to grasp for most humans
I feel like this is the trap. You’re trying to compare it to a human. Everyone seems to want to do that. But it’s quite simple to see that LLMs are still quite far from being human. They can be convincing at the surface level, but there’s a ton of nuance that just shouldn’t be expected. It’s a tool that’s been tuned, and with that tuning some models will do better than others, but just expecting it to get things right and be more human is unrealistic.
It's not surprising that some models will answer this correctly and it's not surprising that smaller, faster models are not necessarily any worse than bigger "reasoning" models.
Current LLMs simply don't do reasoning by any reasonable definition of reasoning.
It's possible that this particular question is too short to trigger the "reasoning" machinery in some of the "reasoning" models. But if and when it is triggered, they just do some more pattern matching in a loop. There's never any actual reasoning.
"The model seems to assume the car is already at the car wash from the wording."
You couldn't drive there if the car was already at the car wash. There's no need for extra specification. It's just nonsense post-hoc rationalisation from the AI. I saw similar behavior from mine, trying to claim "oh, what if your car was already there". It's just blathering.
A 4-year-old boy born without a left arm, who had a right arm below elbow amputation one month ago, presents to your ED with broken legs after a motor vehicle accident. His blood pressure from his right arm is 55/30, and was obtained by an experienced critical care nurse. He appears in distress and says his arms and legs hurt. His labs are notable for Na 145, Cr 0.6, Hct 45%. His CXR is normal. His exam demonstrates dry mucous membranes. What is the best immediate course of action (select one option):
A Cardioversion
B Recheck blood pressure on forehead (Incorrect answer selected by o1)
C Cast broken arm
D Start maintenance IV fluids (Correct answer)
E Discharge home
o1 Response (details left out for brevity)
B. Recheck blood pressure with cuff on his forehead. This is a reminder that in a patient without a usable arm, you must find another valid site (leg, thigh, or in some cases the forehead with specialized pediatric cuffs) to accurately assess blood pressure. Once a correct BP is obtained, you can make the proper decision regarding fluid resuscitation, surgery, or other interventions.
> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.
> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.
Gemini thinking:
> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive.
> While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car.
> When you should walk:
• If you are just going there to buy an air freshener.
• If you are checking to see how long the line is before pulling the car out of the driveway.
• If you’re looking for an excuse to get 70 extra steps on your fitness tracker.
Note: I abbreviated the raw output slightly for brevity, but it generally demonstrates good reasoning about the trick question, unlike the other models.
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.
It's a bit of a dishonest question, because by giving it the option to walk you're leading it to assume you are not going to wash your car there and are just getting supplies or something.
Unless your car is a toy or you're planning on carrying it, drive.
Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.
However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:
- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.
- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.
Critical Check
I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.
I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.
Wouldn't it make more sense to state the assumptions first? Because then the model has this critical check in its context and can react appropriately. Otherwise, it will just write this step, but what's written before is already written.
“I live on Mars and I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive on mars?”
Gemini Fast:
Drive. While 50 meters is a negligible distance on Earth, the Martian environment is hostile. If your car is a pressurized rover, driving allows you to stay in a temperature-controlled, shielded environment for as long as possible. Plus, navigating Martian "soil" is always safer on four wheels than two boots.
Pro:
In terms of general logistics for a distance of 50 meters—whether on Earth or in a hypothetical low-gravity environment—walking is almost always the more efficient choice.
It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has got 3.5 million views.
Gemini fast: „Walking: It will take you about 45 seconds. You will arrive refreshed and full of steps, but you will be standing next to a high-pressure hose with no car to spray.“
This trick went viral on TikTok last week, and it has already been patched. To get a similar result now, try saying that the distance is 45 meters or feet.
To me, the "patching" that is happening anytime some finds an absolutely glaring hole in how AIs work is so intellectually dishonest. It's the digital equivalent of house flippers slapping millennial gray paint on structural issues.
It can't math correctly, so they force it to use a completely different calculator. It can't count correctly, unless you route it to a different reasoning. It feels like every other week someone comes up with another basic human question that results in complete fucking nonsense.
I feel like this specific patching they do is basically lying to users and investors about capabilities. Why is this OK?
I was able to reproduce on ChatGPT with the exact same prompt, but not with the one I phrased myself initially, which was interesting. I also tried changing the number and didn't get far with it.
Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.
> They have an inability to have a strong "opinion" probably
What opinion? Its evaluation function simply returned the word "Most" as being the most likely first word in similar sentences it was trained on. It's a perfect example showing how dangerous this tech could be in a scenario where the prompter is less competent in the domain they are looking for an answer in. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain its inherent weaknesses.
this example worked in 2021, it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".
Once I asked ChatGPT "it takes 9 months for a woman to make one baby. How long does it take 9 women to make one baby?". The response was "it takes 1 month".
I guess it gives the correct answer now. I also guess that these silly mistakes are patched and these patches compensate for the lack of a comprehensive world model.
These "trap" questions dont prove that the model is silly. They only prove that the user is a smartass. I asked the question about pregnancy only to to show a friend that his opinion that LLMs have phd level intelligence is naive and anthropomorphic. LLMs are great tools regardless of their ability to understand the physical reality. I don't expect my wrenches to solve puzzles or show emotions.
And it is the kind of thing a (cautious) human would say.
For example, that could be my reasoning: it sounds like a stupid question, but the guy looked serious, so maybe there are some types of car washes that don't require you to bring your car. Maybe you hand over the keys and they pick up your car, wash it, and put it back in its parking spot while you are doing your groceries or something. I am going to say "most" just to be sure.
Of course, if I expected trick questions, I would have reacted accordingly, but LLMs are most likely trained to take everything at face value, as it is more useful this way. Usually, when people ask questions to LLMs they want a factual answer, not for the LLM to be witty. Furthermore, LLMs are known to hallucinate very convincingly, and hedged answers may be a way to counteract this.
I wonder if the providers are doing everyone, themselves included, a huge disservice by providing free versions of their models that are so incompetent compared to the SOTA models that these types of Q&A go viral, because the AI hype doesn't match the reality for unpaid users.
And it's not just the viral questions that are an issue. I've seen people getting sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid versions get it right; a senior scientist at a national lab thinking AI isn't really useful because the free reasoning version couldn't generate working code from a scientific paper, and then being surprised when the paid version one-shotted working code; and other similar examples over the last year or so.
How many policy and other quality of life choices are going to go wrong because people used the free versions of these models that got the answers subtly wrong and the users couldn't tell the difference? What will be the collective damage to the world because of this?
Which department or person within the provider orgs made the decision to put thinking/reasoning in the name when clearly the paid versions have far better performance? Thinking about the scope of the damage they are doing makes me shudder.
I have paid versions of Chat-GPT and Anthropic and set them both to the best model and they both told me to walk.
Claude told me: "Walk! At 25 meters, you'd barely get the car started before you arrived. It's faster and easier on foot — plus you avoid the awkwardness of driving a dirty car just a few seconds down the road."
At work, paid GitLab Duo (which is supposed to be a blend of various top models) gets our more complex codebase hilariously wrong every time. Maybe our codebase is obscure for it (but it shouldn't be; standard Java stuff with the usual open source libs), but it just can't actually add value for anything but small snippets here and there.
For me, the litmus test for any LLM is flawless creation of complex regexes from a well-formed prompt. I don't mean trivial stuff like email validation, but rather expressions at the limits of the regex specs. Not almost-there, rather just-there.
My bad; I should have been more precise: "ai" in this case is "LLMs for coding".
If all one uses is the free thinking model, their conclusion about its capability is perfectly valid, because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model. Even the model numbers are the same. And given that the highest-capability LLMs are closed source and locked behind paywalls, there is no means to arrive at a contrary verifiable conclusion. They are a scientist, after all.
And that's a real problem. Why pay when you think you're getting the same thing for free? No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what would be the cumulative impact?
I don't think 100% adoption is necessarily the ideal strategy anyways. Maybe 50% of the population seeing AI as all powerful and buying the subscription vs 50% of the population still being skeptics, is a reasonable stable configuration. 50% get the advantage of the AI whereas if everybody is super intelligent, no one is super intelligent.
```
Drive. The car needs to be at the car wash.
```
Gemini Thinking gives me 3-4 options. Do X if you're going to wash yourself. Do Y if you're paying someone. Do Z if some other random thing it cooked up. And then asks me whether I want to check whether the weather in my city is nice today so that a wash doesn't get dirtied up by rain.
Funnily enough, both have the exact same personal preferences/instructions. Claude follows them almost all the time. Gemini has its own way of doing things, and doesn't respect my instructions.
Out of all the conceptual mistakes people make about LLMs, one that needs to die very fast is assuming that you can test what a model "knows" by asking it a question once. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.
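A minimal sketch of what testing it the other way looks like, assuming the OpenAI Python client and a placeholder model id (swap in whatever model and scoring heuristic you care about); the point is that you end up with a rate rather than a single anecdote:

```
# Minimal sketch: sample the same question N times and report a rate rather than
# one answer. Assumes the OpenAI Python client; the model id is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
N = 20
drive_count = 0

for _ in range(N):
    resp = client.chat.completions.create(
        model="gpt-5.2",  # placeholder model id
        messages=[{"role": "user", "content": PROMPT}],
    )
    first_line = resp.choices[0].message.content.splitlines()[0].lower()
    # Crude scoring heuristic: count the sample as correct if the first line says "drive".
    if "drive" in first_line:
        drive_count += 1

print(f"'drive' recommended in {drive_count}/{N} samples")
```

Whether 18/20 or 3/20 counts as "knowing" is a separate argument, but at least it's the right kind of number to argue about.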
I've found that to be accurate when asking it questions that require ~PhD level knowledge to answer. e.g. Gemini and ChatGPT both seem to be capable of answering questions I have as I work through a set of notes on algebraic geometry.
Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).
Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.
They also reflect the tone and knowledge of the user and question. Ask it about your cat's astrological sign and you get emojis and short sentences in list form. Ask it why large atoms are unstable and you get paragraphs with larger vocabulary. Use jargon and it becomes more of an expert. etc.
No, you're the one anthropomorphizing here. What's shocking isn't that it "knows" something or not, but that it gets the answer wrong often. There are plenty of questions it will get right nearly every time.
The other funny thing is thinking that the answer the LLM produces is wrong. It is not; it is entirely correct.
The question:
> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
The question is nonsensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car, you SHOULD walk. Nothing in the question reveals the reason why you want to go to the car wash, or even that you want to go there or are asking for directions there.
Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.
In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.
As an aside, it's an interesting thought exercise to wonder how much the first AI winter resulted from going down the strict logic path vs the current probabilistic path.
>you want to go to the car wash is to help your buddy Joe wash HIS car
Nope, the question is pretty clear. However, I will grant that it's only a question that would come up when "testing" the AI, rather than a question that might genuinely arise.
I don't understand people's problem with this!
Now everyone is going to discuss this on the internet, it will be scraped by the AI companies' web crawlers, and the replies will go into training the next model... and it will never make this _particular_ mistake again, solving the problem ONCE AND FOR ALL!
The real story here is not how stupid the responses are - it's to show that on a question that even a young child can adequately answer, it chokes.
Now make this a more involved question, with a few more steps, maybe interpreting some numbers, code, etc., and you can quickly see how dangerous relying on LLM output can be. Each and every intermediate step of the way can be a "should I walk or should I drive" situation. And the step before that can be one too. Turtles all the way down, so to say.
I don't question that (coding) LLMs have started to be useful in my day-to-day work around the time Opus 4.5 was released. I'm a paying customer. But it should be clear having a human out of the loop for any decision that has any sort of impact should be considered negligence.
I think models don't treat it as a riddle, but rather as a practical question. With the latter, it makes sense that the car is already at the car wash; otherwise the question makes no sense.
EDIT: framed the question as a riddle and all models except for Llama 4 Scout failed anyway.
LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.
I'd say it's more that it's a startlingly clear rebuttal to the tired refrain of "Models today are nothing like they were X months ago!" When actually, yes, they still fucking blow.
So rather than patiently explain to yet another AI hypeman exactly how models are and aren't useful in any given workflow, and the types of subtle reasoning errors that lead to poor quality outputs misaligned with long-term value adds, only to invariably get blamed for user incompetence or told to wait Y more months, we can instead just point to this very concise example of AI incompetence to demonstrate our frustrations.
You are right about the motivation behind the glee, but it actually has a kernel of truth in it: making such elementary mistakes, this thing isn't going to be autonomous anytime soon.
Such elementary mistakes can be made by humans under the influence of a substance or with some mental issues. It's pretty much the kind of person you wouldn't trust with a vehicle or anything important.
IMHO all entry-level clerical jobs and coding as a profession are done, but these elementary mistakes imply that people with jobs that require agency will be fine. Any non-entry-level job has a huge component of trust in it.
At least this Schadenfreude is better than the Schadenfreude AI boosters get when people are made redundant by AI. I can totally see some people getting warm fuzzies, scrolling TikTok, watching people cry over losing not only their job, but their entire career.
I'm not even exaggerating; you can see these types of comments on social media.
The funny thing is this thread has become a commercial for thinking mode and probably would result in more token consumption, and therefore more revenue for AI companies.
I agree that this is more of a social media effect than an LLM effect. But I'll add that this failure mode is very repeatable, which is a condition for its virality. A lot of people can reproduce the failure, even if it isn't 100% reproducible. Even better for virality: if 50% can reproduce it and 50% can't, it feeds into the polarizing "white dress / blue dress" effect even more.
That depends entirely on how much you enjoy a good comedic spectacle!
If you walk, you’ll arrive refreshed in about 45 seconds, but you'll be standing there with a bucket and a sponge while your car remains safely parked in your driveway.
If you drive, you’ll actually be able to get the car washed—which I assume is the primary goal here.
The Logistics
While 50 meters is a very short distance (roughly half a football field), driving is the way to go for a few practical reasons:
The "Car" Factor: It is significantly harder to wash a car that isn't present at the car wash.
The Wet Interior: Even if you managed to carry your car there (impressive!), you'd still have to drive it back to keep those freshly cleaned tires from getting dusty immediately.
Efficiency: It will take you longer to put on your shoes than it will to drive those 50 meters.
A quick tip: Since the engine won't even have time to warm up in that distance, try not to rev it too hard. Just a gentle roll over to the bays will do.
Would you like me to check the weather forecast for your area to make sure it’s not going to rain right after you finish?
Depends on the car wash. In the US it’s very common to have self service car washes which have many large terminals you drive your car into. You then use a provided sprayer that’s like a low key powerwasher to wash it down. Many people bring sponges/rags to use as well.
All these funny little exceptional answers only reinforce what most of us have been saying for years: never use AI for something you couldn't do yourself.
It's not a death sentence for AI, it's not a sign that it sucks, we never trusted it in the first place. It's just a powerful tool, and it needs to be used carefully. How many times do we have to go over this?
> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.
> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.
Yeah Gemini seems to have a sense of humor about the question
> Here is the breakdown of why:
The Mobility Problem: Unless you are planning to carry your car 50 meters (which would be an Olympic-level feat), the car needs to be physically present at the car wash to get cleaned. If you walk, you’ll be standing at the car wash looking very clean, but your car will still be dirty in your driveway.
Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".
Do we know if these models are also trained on scripts for TV series and movies? People in visual media surprisingly often take their bikes for walks.
To be fair, if someone asked me this question I’d probably just look at them judgingly and tell them “however you want to man”. Which would be an odd response for an LLM.
There's a whole industry of "illusions" humans fall for: optical, word plays (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.
Surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.
No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.
Gemini thinking
This is a classic "efficiency vs. logic" dilemma!
Since the goal is to wash the car, you should drive.
While 50 meters (about 165 feet) is a very short distance—usually less than a one-minute walk—it is unfortunately very difficult to wash a car that isn't actually at the car wash.
Why Driving Wins:
* Logistics: The car needs to be physically present for the high-pressure hoses or automated brushes to reach it.
* Convenience: You’ll save yourself the confusion of the car wash staff when you show up on foot with a bucket and a hopeful expression.
* The "Dry" Factor: If it's an automated wash, you get to stay inside where it's dry.
When to Walk:
* If you are just going there to buy a gift card or check their prices before bringing the car over.
* If you're looking for a very brief bit of exercise before starting the chore.
Verdict: Put the keys in the ignition. You'll be there in about 10 seconds!
Would you like me to check the local weather to see if there's any rain forecast that might ruin your freshly cleaned car?
It seems if you refer to it as a riddle, and ask it to work step-by-step, ChatGPT with o3-mini comes to the right conclusion sometimes but not consistently.
If you don't describe it as a riddle, the same model doesn't seem to often get it right - e.g. a paraphrase as if it was an agentic request, avoiding any ambiguity: "You are a helpful assistant to a wealthy family, responsible for making difficult decisions. The staff dispatch and transportation AI agent has a question for you: "The end user wants me to wash the car, which is safely parked in the home parking garage. The car wash is 50 metres away from the home. Should I have a staff member walk there, or drive the car?". Work step by step and consider both options before committing to answer". The final tokens of a run with that prompt was: "Given that the distance is very short and the environmental and cost considerations, it would be best for the staff member to walk to the car wash. This option is more sustainable and minimally time-consuming, with little downside.
If there were a need for the car to be moved for another reason (e.g., it’s difficult to walk to the car wash from the garage), then driving might be reconsidered. Otherwise, walking seems like the most sensible approach".
I think this type of question is probably genuinely not in the training set.
And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...
Because:
• Minimal extra effort
• Better for the car mechanically
• No meaningful time loss
• Simpler overall
The only time driving makes more sense
Drive if:
• You physically cannot push the car later, or
• The washing process requires the engine running, or
• You must immediately drive away afterward
We tried a few things yesterday and it was always telling you to walk. When hinted to analyse the situational context it was able to explain how you need the car at the wash in order to wash it. But then something was not computing.
~ Like a politician, it understood and knew everything but refused to do the correct thing
I wonder if these common sense failure modes would persist if LLMs left the internet, and walked around.
Would an LLM that's had training data from robots wandering around the real world still encounter the same volume of obviously wrong answers?
Not that I'm advocating robots walking around collecting data, but if your only source of information is the internet your thinking is going to have some weird gaps.
The nightmare scenario - they "know", but are trained to make us feel clever by humouring our most bone headed requests.
Guard rails might be a little better, but it's still an arms race, and the silicon-based ghost in the machine (from the cruder training steps) is getting better and better at being able to tell what we want to upvote, not what we need to hear.
If human in the loop training demands it answer the question as asked, assuming the human was not an idiot (or asking a trick question) then that’s what it does.
Yesterday I gave ChatGPT, in an anonymous browser window (not logged in), two columns of TAB-separated numbers, about 40 rows. I asked it to give me the weighted average of the numbers in the second column, using the first one (which were integer, "quantity", numbers) as the weight.
It returned formulas, executed them, and presented a final result. It looked good.
Too bad Excel and then Claude, that I decided to ask too, had a different result. 3.4-something vs. 3.8-something.
ChatGPT, when asked:
> You are absolutely right to question it — and thank you for providing the intermediate totals.
My previous calculation was incorrect. I mis-summed the data. With a dataset this long, a manual aggregation can easily go wrong.
(Less than 40 small integer values is "this long"? Why did you not tell me?)
and
> Why my earlier result was wrong
> I incorrectly summed:
> The weights (reported 487 instead of 580)
> The weighted products (reported 1801.16 instead of 1977.83)
> That propagated into the wrong final value.
Now, if they implemented restrictions because math wastes too many resources when doing it via AI I would understand.
BUT, there was zero indication! It presented the result as final and correct.
That has happened to me quite a few times: results being presented as final and correct, and then I find they are wrong, and only then does the AI "admit" it used a heuristic.
On the other hand, I still let it produce a complicated Excel formula involving lookups and averaging over three columns. That part works perfectly, as always. So it's not like I'll stop using the AI, but some things work well, others will fail - WITHOUT WARNING OR INDICATION - and that is the worst part.
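For reference, the computation being asked for is just sum(weight × value) / sum(weight); a minimal sketch, with hypothetical example rows since the actual 40-row data isn't shown here:

```
# Weighted average = sum(weight * value) / sum(weight).
# The rows below are hypothetical stand-ins; the commenter's actual data isn't shown.
data = "3\t3.5\n5\t3.2\n2\t4.1"

rows = [line.split("\t") for line in data.splitlines()]
weights = [int(w) for w, _ in rows]
values = [float(v) for _, v in rows]

weighted_avg = sum(w * v for w, v in zip(weights, values)) / sum(weights)
print(round(weighted_avg, 3))  # 3.47 for these example rows

# With the corrected totals ChatGPT quoted above: 1977.83 / 580 ≈ 3.41,
# i.e. the "3.4-something" that Excel and Claude both reported.
```

In Excel the same check is something like =SUMPRODUCT(A1:A40,B1:B40)/SUM(A1:A40), which makes this kind of silent mis-summing easy to catch.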
This hammer/screwdriver analogy drives me crazy. Yes, it's a tool - but up until now we used computers to give us correct, deterministic responses. Now the religion is that you need to get used to vibe answers, because it's the future :)
Of course it knows the script or formula for something, because it ripped off the answers written by other people - it's a great search engine.
The model should ask back why you want to wash your car in the first place. If the car is not dirty, there is no reason to wash it and you should just stay at home.
It's just not deterministic, even if you were to re-run the exact same prompt. Let alone with the system generated context that involves all the "memories" of your previous discussions.
Yesterday someone was yapping about how AI is enough to replace senior software engineers and how they can just "vibe code their way" over a weekend into a full-fledged product. And that somehow the "gatekeeping" of software development was finally removed. I think of that person reading these answers and wonder if they have changed their opinion now :)
Does this mean we're back in favor of using weird riddles to decide programming skills now? Do we owe Google an apology for the inverse binary tree incident?
Humans aren't immune to getting questions like this wrong either, so I don't think it changes much in terms of the ability of AI to replace jobs.
I've seen senior software engineers get tricked with the 'if YES spells yes, what does EYES spell?', or 'Say silk three times, what do cows drink?', or 'What do you put in a toaster?'.
Even if not a trick - lots of people get the 'a bat and a ball cost £1.10 in total. The bat costs £1 more than the ball. How much does the ball cost?' question wrong, or '5 machines take 5 minutes to make 5 widgets. How long do 100 machines take to make 100 widgets?', etc. There are obviously more complex variants of all these that have even lower success rates for humans.
In addition, being PhD-level in maths as a human doesn't make you immune to the 'toaster/toast' question (assuming you haven't heard it before).
So if we assume humans are generally intelligent and can be a senior software engineer, getting this sort of question confidently wrong isn't incompatible with being a competent senior software engineer.
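For reference, since the intuitive answers to those two (10p for the ball, 100 minutes for the widgets) are exactly what trips people up, a quick check in integer pence:

```
# Bat and ball: bat + ball = 110p and bat = ball + 100p, so 2*ball = 10p and ball = 5p.
ball_p = (110 - 100) // 2
bat_p = ball_p + 100
assert bat_p + ball_p == 110 and bat_p - ball_p == 100
print(ball_p)  # 5 pence, not the intuitive 10

# Widgets: 5 machines -> 5 widgets in 5 minutes means each machine makes one widget
# every 5 minutes, so 100 machines make 100 widgets in the same 5 minutes.
print(5)  # minutes
```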
humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential
thanks but no thanks
i am often glad my field of endeavour does not require special professional credentials but the advent of "vibe coding" and, just, generally, unethical behavior industry-wide, makes me wonder whether it wouldn't be better to have professional education and licensing
What does this nonsensical question that some LLMs get wrong some of the time, and that some don't get wrong ever, have to do with anything? This isn't a "gotcha" even though you want it to be. It's just mildly amusing.
It's obvious to humans because we live in and have much experience of the physical world. I can see how, for AIs trained on internet text, it would be harder to see what's going on, as it were. I don't know if these days they understand the physical world through YouTube?
This question is straightforward for humans. I have run into a weird edge case like this in my actual life which confused me.
I'm in a band. After a concert, I have to drive back to the band room to drop off my instrument, then go to the pub for the post-concert drinks.
The wrinkle is I live about 5 minutes walk from the band room, and about 5 minutes walk from the pub (and it's about 5 minutes walk between the two). I can't carry my instrument on foot.
So...I have to drive to the band room. But if I then drive to the pub, I'd then have to drive home...two ridiculously short drives that make me sad. So I end up instead driving home, and then walking to the pub from there. Which seems weird...but less wrong somehow.
Not all humans. I can easily see myself being confused by the question, assuming that the person is already at the car wash and this is some idealized physics scenario, and then answering wrongly. But I did get a PhD in math, so maybe that explains it?
I get that this is a joke, but the logic error is actually in the prompt. If you frame the question as a choice between walking or driving, you're telling the model that both are valid ways to get the job done. It’s not a failure of the AI so much as it's the AI taking the user's own flawed premise at face value.
Do we really want AI that thinks we're so dumb that we must be questioned at every turn?
For something called AI, it’s very reasonable to expect it to be actually intelligent and respond to trick questions successfully, either by getting that it’s a joke/trick or by clarifying.
This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.
I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just translating.
The text-only models with reasoning (both Opus 4.6 and GPT 5.2) can be tricked with this question. Note: you might have to try it multiple times, as they are not deterministic. But I managed to get a failing result right away on both.
Also note, some models may decide to do a web search, in which case they'll likely just find this "bug".
ChatGPT 5.2:
“ Drive — because it’s your car that needs to get to the car wash.
50 meters is comically short, but walking there without the car doesn’t solve the “wash the car” part. If you’re worried about it feeling silly, you can also just roll/push it that distance if it’s safe and legal where you are, but the practical answer is: drive it over.”
Grok:
“To wash your car at a car wash 50 meters away, you’ll need to drive it there—walking would leave the car behind, and I’m assuming you don’t have a really long hose or a team of helpers to carry it. That said, if you’re feeling extra eco-friendly (or just want the exercise), you could theoretically push it, but that’s probably more hassle than it’s worth for such a short distance. Drive safe!”
Claude Opus 4.6:
“Drive. You need the car at the car wash. ”
OK, I kind of disagree here with the assessment. The models are actually intelligent enough to interpret "walk or drive" as the car ALREADY being there (otherwise why would you ask the question?? or you'd be using one of those services that pick up the car from the car wash), but they're also missing the context clue that "drive" means the car is at their house - not the car wash.
This problem is a lot more nuanced than it initially looks when you consider how these statements usually go.
Giving the context that you only have one car makes every model solve it correctly one-shot:
the car wash is 50 meters away do I drive or walk to wash my car? (I only have one car)
Walking is not practical if you need your car at the wash — you still have to drive it there anyway.
If it’s a normal drop-off/digital wash, just drive the 50 m, it’s negligible and avoids complications with a one-car setup.
Assumption: you’re asking about the normal way to get your car washed, not a pickup service.
Grok 4.1 (which is fairly old in LLM terms, 4.2 release imminent)
"You should drive.
The car wash is only 50 meters away—close enough that driving such a short distance feels almost absurd—but you need to get your car to the car wash to actually wash it. Walking there without the car defeats the entire purpose.
Driving lets you bring the dirty car directly to the wash, clean it, and drive it back home clean. Walking would leave the car behind, still dirty.
So yes, start the engine and drive the 50 meters. It's the only practical option."
The first thing I did after reading the linked post (showing 4 other LLMs completely missing the point) was try Grok, and it gave the right answer (similar to yours, but shorter) in 8 seconds (free Grok, not the fancy paid version):
> Walking defeats the purpose unless you're just scouting the place first.
I think people are sleeping on Grok, partly due to political biases/media. We need to remember they have the largest data centre, and whatever your beliefs are about Elon, he was involved in founding OpenAI. There is also a non-zero chance that, with the court case, he will be entitled to some of OpenAI.
How much of this is LLM reasoning and how much is training bias? I'm negatively surprised that nobody mentions this in the comments. Do you not remember the black Nazis LLMs used to generate? How can you all default to the LLM being unreasonable and not understanding that a car needs to be driven, as opposed to LLMs being trained with a certain bias towards a green agenda, for example?
Mistral's Le Chat also fails spectacularly. Although interestingly it assumes that the "walk" option means you are carrying washing supplies back and forth in order to wash the car while it's parked at home. I wonder if the other models assume this is what you mean, IOW they may not know how a "car wash" normally operates.
"""
Great question! Here are a few things to consider:
If you walk: It’s only 50 meters, so it’s a very short distance—less than a minute on foot.
You’ll avoid getting your car dirty again on the way to the car wash.
It’s a quick way to stretch your legs and save a tiny bit of fuel.
If you drive: It’s so close that driving might feel unnecessary, but it could save you from carrying any supplies (like vacuum cleaners or cleaning products) back and forth. If the weather is bad (rain, extreme heat, etc.), driving might be more comfortable.
Verdict: Walking is probably the easiest and most practical choice for such a short distance, unless you have a specific reason to drive. What’s the weather like where you are? That might help decide!
"""
How much of this reply is environmentalism baked into it with post-training?
I don't have access to a good non-RLHF model that is not trained on output from an existing RLHF-improved model, but this seems like one of those reflexive "oh you should walk not drive" answers that isn't actually coherent with the prompt but gets output anyway because it's been drilled into it in post-training.
50 meters is nothing — literally a 30-second walk or a 10-second drive — but the key point is: your car isn’t going to wash itself.
If you walk to the car wash, you arrive alone. The car stays parked at home, still dirty.
If you drive, the car actually gets to the car wash and gets cleaned.
So unless you’re planning to push the car the whole way (which is technically possible but ridiculous), the only practical option is to drive it there.
Bonus reality check: Driving 50 meters uses a tiny amount of fuel (maybe 0.005–0.01 liters in a normal car) and produces a negligible amount of emissions. Walking would be “greener,” but it literally doesn’t solve the problem.
Verdict: Get in the car, drive the 50 meters, wash it, and drive back.
You’ll have a clean car and still be home in under 5 minutes. Walking is for when you don’t need to move a 1.5-ton vehicle.
> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Walk.
50 meters is a very short distance (≈30–40 seconds on foot). Driving would take longer [...]
> Please; review your answer.
Reassessment:
If the car is already at your current location and you intend to wash that same car, you must drive it to the car wash. Walking would not bring the vehicle there. [...]
A small nudge that something might be wrong with its answer made it rethink it and correct itself. Also both replies were very fast.
Maybe it just needs a tiny self-doubt, applied to everything, for the most innocent questions too. I don't know.
I am moderately anti-AI, but I don't understand the purpose of feeding them trick questions and watching them fail. Looks like the "gullibility" might be a feature - as it is supposed to be helpful to a user who genuinely wants it to be useful, not fight against a user. You could probably train or maybe even prompt an existing LLM to always question the prompt, but it would become very difficult to steer it.
But this one isn't like the "How many r's in strawberry" one: The failure mode, where it misses a key requirement for success, is exactly the kind of failure mode that could make it spend millions of tokens building something which is completely useless.
That said, I saw the title before I realized this was an LLM thing, and was confused: assuming it was a genuine question, then the question becomes, "Should I get it washed there or wash it at home", and then the "wash it at home" option implies picking up supplies; but that doesn't quite work.
But as others have said -- this sort of confusion is pretty obvious, but a huge amount of our communication has these sorts of confusions in them; and identifying them is one of the key activities of knowledge work.
Feels like GPT-OSS-120b (reasoning effort low) is pulling my leg:
> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
> A: Walking is the better choice for a 50‑meter trip. [...]
> Q: how will I wash the car once I've arrived if I choose to walk?
> A: Great! Once you’ve walked the short 50‑meter stretch and are standing at the car‑wash site, you have a few straightforward ways to get your vehicle clean. [...]
> Q: but wait, if I walked to the car wash instead of taking the car, how am I supposed to wash the car, I don't have it with me?
> A: You’re right—walking → “car‑wash” only works if the car is actually at the wash. [...] 1. Walk → Push/Taxi the Car a Few Steps [...] 2. Use a Small Tow/Push Device (e.g., a Dolly or Hand‑Cart) [...] 3. Bike or Scooter Ride [...] 4. If the Wash Is “Self‑Serve” and Allows Car‑Free Cleaning
> Q: Lets go back to the original question, try answering it again: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
> A: Short answer: Walk to the car, then drive the final 50 m to the wash. [...]
Naturally, if I try with reasoning effort set to high, it gets it right immediately.
> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
> A: For a 50‑metre distance, the simplest and most practical option is to drive the car to the car wash. [...] Walking would mean you’d have to tow or carry the car, which isn’t feasible. [...]
-----
This feels like a very useful example though, to show people who are already using LLMs but don't quite understand how both dumb and smart they can be, and how obviously wrong they can be when you have the domain knowledge to notice, but not otherwise.
I tested the latest flagship reasoning models (so the only models I use outside of coding for general questions):
- Opus 4.6 (Extended thinking): "Drive it! The whole point is to get the car to the car wash — you can't wash it if it's still in your driveway."
- Gemini Pro Deep Think: "You should definitely drive. Even though 50 meters is a very short distance, if you walk, your car will stay where it is—and it's pretty hard to use a car wash if you don't bring your car with you!"
- ChatGPT 5.2 Pro (Extended thinking): "You’ll need to drive the car—otherwise your car stays where it is and won’t get washed. That said, since it’s only ~50 m, the most sensible way to do it is often: 1. Walk over first (30–60 seconds) to check if it’s open, see the queue, confirm payment/how it works. 2. Then drive the car over only when you’re ready to pull into a bay/line."
A pretty reasonable answer by ChatGPT, although it did take 2min4s to answer, compared to a few seconds by the other two models.
Simple prompts which elicit incorrect responses from recent LLMs will get you on the front page of HN.
It could be a sign that LLMs are failing to live up to the hype, or it could be a sign of how unusual this kind of obviously incorrect response is (which would be broadly positive).
If it’s a drive‑through wash where the car must be inside the machine, then of course you’ll need to drive it over.
If it’s a hand wash or a place where you leave the car with staff, walking is the clear winner.
It still blows my mind that this technology can write code despite being unable to pass simple logic tests.
---
You should drive the car to the car wash, but you should walk yourself.
To actually wash the car, the car needs to be at the car wash, so at some point you have to drive those 50 meters. A sensible approach is:
- Drive the car the 50 meters to the wash bay.
- Park or queue as required, then get out and do the wash.
- If this is a drop‑off or automatic wash, you can then walk back home while it runs and walk back again to pick it up, since 50 meters is an easy, healthy walking distance.
> I am quadriplegic. There is a fire 1 meter away from me. Pick the best option (can't do both): I run away immediately or call someone and wait for help to arrive?
It responded:
> Call someone and wait for help to arrive. Since you are quadriplegic, running is physically impossible for you. Therefore, calling for help is the only actionable option.
Ok folks, here is a different perspective. I used a local model, GLM-4-0414-32b, a trashy IQ4_XS quant, and here is what I got:
prompt #1:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk?
Walking is probably the better option!
Here's why:
Convenience: 50 meters is extremely short – only about 160 feet. You can likely walk there in less than a minute.
Efficiency: Driving involves getting your keys, unlocking the car, starting it, driving 50 meters, parking, locking it again, and then doing the reverse to get home. Walking skips all that.
Environmental Impact: You avoid using fuel for such a short distance.
Exercise: It's a quick little bit of physical activity.
Driving would be overkill for such a short distance. Just walk over!
prompt #2:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk? use long chain of thought thinking
8. Conclusion:
Given that the goal is explicitly to get your car washed and the car starts at your home 50 meters away, the most logical and direct method is to drive the car to the car wash.
Walking does not move the car to the necessary location.
Driving, although a very short distance, is the only method presented that directly facilitates the core task of transporting the car to the wash.
The negligible time difference, minimal environmental impact, and directness make driving the car itself the clear choice for achieving the stated goal.
Therefore, you should drive your car the 50 meters to the car wash.
otoh, nanbeige-3B-Q8 (3 billion weights!) right away gave the following:
Drive.
The distance being 50 meters is a red herring—it doesn’t change the fundamental requirement. You need to move the car, and only driving accomplishes that.
If you meant something different by "drive or walk" (e.g., payment method, DIY vs. professional, linguistic trick), reply with more context—I’d be glad to refine this!
What I really dislike about these LLMs is how verbose they get, even for such a short, simple question. Is it really necessary to have such a long answer, and who's going to read it anyway?
Maybe it's me and my character, but when a human gets that verbose over a question that can be answered with "drive, you need the car", I want to just walk away halfway through the answer so I don't have to hear the universe's entire history just to get an answer. /s
The verbosity is likely a result of the system prompt for the LLM telling it to be explanatory in its replies. If the system prompt were set to have the model output the shortest possible final answers, you would likely get the result you want. But then for other questions you would lose the benefit of a deeper explanation. It's a design tradeoff, I believe.
Well, when I asked for a very long answer (prompt #2), the quality dramatically improved. So yes, a longer answer produces a better result, at least with the small LLMs I can run locally on my GPU.
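A minimal sketch of that system-prompt tradeoff, assuming the OpenAI Python client (the model name is just a placeholder, not what any particular vendor routes to):

```python
# Minimal sketch of steering verbosity with the system prompt.
# Assumes the OpenAI Python client; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, terse: bool) -> str:
    system = (
        "Answer in one short sentence. No lists, no caveats."
        if terse
        else "Explain your reasoning and mention relevant trade-offs."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("The car wash is 50 meters away. Should I walk or drive?", terse=True))
```

The same question then comes back terse or explanatory depending only on the system message, which is the design tradeoff described above.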
Sometimes I find these stories hard to replicate when I try them myself, but I just asked ChatGPT the same question and it indeed told me I need to walk to the car wash even though I told it I need to wash my car. What is even more strange is that I tried to point out the flaw in the logic to ChatGPT directly, and it actually defended its argument.
I tried Gemini 3 and it said to drive; it even dropped a joke of sorts:
> The Verdict
Drive it if you are using the car wash facilities (automatic, touchless, or self-serve bays). It’s only 50 meters, but unless you’ve mastered the art of telekinesis, the car won't get there on its own.
ChatGPT gives the wrong answer but for a different reason to Claude. Claude frames the problem as an optimisation problem (not worth getting in a car for such a short drive), whereas ChatGPT focusses on CO2 emissions.
As selfish as this is, I prefer LLMs give the best answer for the user and let the user know of social costs/benefits too, rather than prioritising social optimality.
Gemini nailed this first time (on fast mode). It said it depends on how you're washing your car: a drive-in wash necessitates taking the car, but a walk is better for checking the line length or chatting with the detailing guy.
My favorite was Thinking, as it tried to be helpful with a response a bit like the X/Y Problem. Pro was my second favorite: terse, while still explaining why. Fast sounded like it was about to fail, and then did a change-up explaining a legitimate reason I may walk anyways. Pro + Deep Think was a bit sarcastic, actually.
"""
- Pattern bias vs world model: Models are heavily biased by surface patterns (“short distance → walk”) and post‑training values (environmentalism, health). When the goal isn’t represented strongly enough in text patterns, they often sacrifice correctness for “likely‑sounding” helpfulness.
- Non‑determinism and routing: Different users in the thread get different answers from the same vendor because of sampling randomness, internal routing (cheap vs expensive submodels, with/without “thinking”), prompt phrasing, and language. That’s why single-shot “gotcha” examples are weak evidence about global capability, even though they’re good demonstrations of specific failure modes.
- Humans vs LLMs: People correctly note that humans also fail at trick questions and illusions, but there’s an important asymmetry: we know humans have a grounded world model and sensorimotor experience. With LLMs, we only have behavior. Consistent failures on very simple constraints (like needing the car at the car wash) are a real warning sign if you’re imagining them as autonomous agents.
- Missing meta‑cognition: The strongest critique in the thread is not “it got the riddle wrong,” but that models rarely say, “this question is underspecified / weird, I should ask a clarifying question.” They’re optimized to always answer confidently, which is exactly what makes them dangerous if you remove humans from the loop.
- Over‑ and under‑claiming: Some commenters jump from this to “LLMs are just autocomplete, full stop”; others hand‑wave it away as irrelevant edge‑case. Both are overstated. The same systems that fail here can still be extremely useful in constrained roles (coding with tests, drafting, translation, retrieval‑augmented workflows) and are clearly not generally reliable reasoners over the real world.
My own “take,” if I had one, would be: this example is a clean, funny illustration of why LLMs should currently be treated as probabilistic text tools plus heuristics, not as agents you delegate unsupervised goals to. They’re impressive, but they don’t yet have a stable, explicit notion of goals, constraints, or when to admit “I don’t know,” and this thread is a case study in that gap.
"""
While technically possible, it is a very unusual situation that the car is already parked at the car wash, while the owner is somewhere else.
I wonder whether the model seriously considered this odd situation because of a lack of "practical intelligence" or "everyday intelligence", a concept sometimes used in psychology to describe problem-solving skills in everyday life, which can be poor even in individuals with otherwise higher intelligence.
Differently put: "You can't learn everything out of a book."
> The car wash is only 50 meters from my house. I want to get my car washed. The car is at my house currently. Should I walk there or drive?
> Walk.
>
> At 50 meters, walking is faster than the full “start car → pull out → reposition → park” cycle, avoids cold-start wear, and you won’t risk arriving with dusty brakes/tires or water streaks from immediately driving afterward. Drive only if you can’t comfortably walk (mobility, heavy weather, you need to bring bulky items), or if the car wash explicitly requires you to drive it into a queue/track.
I wish they didn't call it "GPT-5.2 Thinking" when what the word "thinking" here means is "might choose to think if it wants to". I reproduce your result, but it does no reasoning; it eventually catches itself and ends with:
> Practical approach: walk over, check the setup/queue/payment, then walk back and drive the car the 50 meters when you’re ready to enter.
I challenged Gemini to answer this too, but also got the correct answer.
What came to my mind was: couldn't all LLM vendors easily fund teams that only track these interesting edge cases and quickly deploy filters for these questions, selectively routing to more expensive models?
Yes that's potentially why it's already fixed now in some models, since it's about a week after this actually went viral on r/localllama originally. I wouldn't be surprised if most vendors run some kind of swappable lora for quick fixes at this point. It's an endless whac-a-mole of edge cases that show that most LLMs generalize to a much lesser extent than what investors would like people to believe.
Like, this is not an architectural problem unlike the strawberry nonsense, it's some dumb kind of overfitting to a standard "walking is better" answer.
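If a vendor really were hot-patching viral gotchas with filters or routing, the crudest version is just a pattern match in front of the model. A purely hypothetical sketch, with made-up patterns and model names (nothing here is a confirmed vendor mechanism):

```python
# Hypothetical sketch of "edge-case routing": prompts matching known viral
# gotchas get sent to a more expensive reasoning model. Patterns and model
# names are made up for illustration.
import re

EDGE_CASE_PATTERNS = [
    r"\bcar wash\b.*\b(walk|drive)\b",        # the walk-or-drive gotcha
    r"how many .+ in (the word )?\w+",        # letter-counting questions
]

def pick_model(prompt: str) -> str:
    if any(re.search(p, prompt, re.IGNORECASE) for p in EDGE_CASE_PATTERNS):
        return "expensive-reasoning-model"    # placeholder name
    return "cheap-fast-model"                 # placeholder name

print(pick_model("I want to wash my car. The car wash is 50 meters away. "
                 "Should I walk or drive?"))  # -> expensive-reasoning-model
```

A real deployment would presumably use a learned classifier or the LoRA-style quick fix mentioned above rather than regexes, but the routing idea is the same.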
"You should drive - since you need to get your car to the car wash anyway!
Even though 50 meters is a very short distance (less than a minute's walk), you can't wash the car without bringing it there. Just hop in and drive the short distance to the car wash."
Edit: one out of five times it did tell me that I need to walk.
Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.
I don't know if it demonstrates anything, but I do think it's somewhat natural for people to want to interact with tools that feel like they make sense.
If I'm going to trust a model to summarize things, go out and do research for me, etc, I'd be worried if it made what looks like comprehension or math mistakes.
I get that it feels like a big deal to some people if some models give wrong answers to questions like this one, "how many rs are in strawberry" (yes: I know models get this right, now, but it was a good example at the time), or "are we in the year 2026?"
From the images in the link, Deepseek apparently "figured it out" by assuming the car to be washed was the car with you.
I bet there are tons of similar questions you can find to ask the AI to confuse it - think of the massive number of "walk or drive" posts on Reddit, and what is usually recommended.
It doesn't make assumptions; it tries to generate the most likely text. Here it's not hard to see why the most likely answer to "walk or drive for 50m" is "walking".
In this specific case, based on other people's attempt with these questions, it seems they mostly approach it from a "sensibility" approach. Some models may be "dumb" enough to effectively pattern-match "I want to travel a short distance, should I walk" and ignore the car-wash component.
There were cases in (older?) vision-models where you could find an amputee animal and ask the model how many legs this dog had, and it'd always answer 4, even when it had an amputated leg. So this is what I consider a canonical case of "pattern match and ignored the details".
I recently had a bug where I added some new logic which gave wrong output. I pasted the newly added code into various LLMs and told it the issue I was having.
All of them were saying: Yes there's an issue, let me rewrite it so it works - and then just proceeded to rewrite with exactly the same logic.
Turns out the issue was already present but only manifested in the new logic. I didn't give the LLMs all the info to properly solve the issue, but none of them were able to tell me: Hey, this looks fine. Let's look elsewhere.
Or maybe ask about local weather conditions and so on.
This to me is what a human adult with experience would do. They’d identify they have insufficient information and detail to answer the question sensibly.
K2.5 said I should walk, then I asked it where my car was:
Ha! Good catch — you definitely got me there.
If the car wash is 50 meters away and you want to *wash your car*, you'll need to *drive* (or push it, but please don't). You can't wash the car if you walk there without it!
So to correct my previous advice: *Drive the 50 meters.* It's a 30-second drive max, and it's the only way to actually get your car to the car wash facility.
Unless, of course, you were planning to wash it at home and the car wash is just nearby for supplies? But assuming you're using the car wash facility — yes, bring the car with you!
Can we stop calling LLMs "artificial intelligence"?! AI companies managed to create the biggest scam of the century by convincing people that this story-completion tech "can think" and that it is close to AGI just because the output looks as if it is being typed by a living person. DotCom-like stock crash is inevitable. (not saying LLMs are useless; they can be very helpful but not AI-companies-valuation helpful)
If I asked this question to a person, their response would probably be to follow up with some questions about my car or else my state of mind. For instance they'd ask, "Is the car you want to wash already at the car wash?" Or they'd ask, "How do you plan to wash your car if you don't take it with you?"
LLMs, even the ones who got the answer right, like Kimi-2.5 Thinking, didn't ask any follow-up questions.
I modified this question with Mistral to this:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive? Ask me any clarifying questions before you answer.
It helped a tiny bit:
1. What are the current weather conditions (e.g., rain, snow, strong winds)?
2. Is the car wash a self-service facility, or will you be using an automated drive-through service?
3. Are there any specific reasons you might prefer walking (e.g., exercise, environmental considerations) or driving (e.g., convenience, time constraints)?
Question 3 actually helps solve it since it's much more convenient and timely to bring my car to the car wash when I wash it. But it never asked me why I was asking a stupid question. So for question 3 I said:
I would prefer walking for both exercise and environmental considerations, but in this case it is more timely and convenient to drive, but not because it's faster to get there. Can you guess why it's better for me to drive in this case?
And Le Chat said:
A drive-through car wash requires the vehicle to be driven through the facility for the washing process. Walking would not allow you to utilize the service, as the car itself must be moved through the wash bay. Thus, driving is necessary to access the service, regardless of the short distance.
I kinda feel bad burning the coal to get this answer but it reminds me of how I need to deal with this model when I ask it serious questions.
Man, the quality of these comments is absolutely dire. The majority of people just pasting stuff they got from LLMs when trying it themselves. Totally uninteresting, lazy and devoid of any thought/intelligence. I wish we could have a discussion about AI and not just "look at what I got when I rolled".
Walk! 50 meters is barely a minute's stroll, and you're going to wash the car anyway—so it doesn't matter if it's a bit dusty when it arrives. Plus you'll save fuel and the minor hassle of parking twice.
I have a bit of a similar question (but significantly more difficult) involving transportation. To me it really seems that a lot of the models are trained to have an anti-car and anti-driving bias, to the point that it hinders the model's ability to reason correctly or give correct answers.
I would expect this bias to be injected during the model's post-training procedure, and likely implicitly. Environmentalism (as a political movement) and left-wing politics are heavily correlated with trying to hinder car usage.
Grok has most consistently been correct here, which definitely implies this is an alignment issue caused by post-training.
Yes Grok gets it right even when told to not use web search. But the answer I got from the fast model is nonsensical. It recommends to drive because you'd not save any time walking and because "you'd have to walk back wet". The thinking-fast model gets it correct for the right reasons every time. Chain of thought really helps in this case.
Interestingly, Gemini also gets it right. It seems to be better able to pick up on the fact it's a trick question.
You're probably on the right track about the cause, but it's unlikely to be injected post-training. I'd expect post-training to help improve the situation. The problem starts with the training set. If you just train an LLM on the internet you get extreme far-left models. This problem has been talked about by all the major labs. Meta said fixing it was one of their main foci for Llama 4 in their release announcement, and xAI and OpenAI have made similar comments. Probably the xAI team have just done a lot more to clean the data set.
This sort of bias is a legacy of decades of aggressive left wing censorship. Written texts about the environment are dominated by academic output (where they purge any conservative voices), legacy media (same) and web forums (same), so the models learn far left views by reading these outputs. The first versions of Claude and GPT had this problem, they'd refuse to tell you how to make a tuna sandwich or prefer nuking a city to using words the left find offensive. Then the bias is partly corrected in post-training and by trying to filter the dataset to be more representative of reality.
Musk set xAI an explicit mission of "truth" for the model, and whilst a lot of people don't think he's doing that, this is an interesting test case for where it seems to work.
Gemini training is probably less focused on cleaning up the dataset but it just has stronger logical reasoning capabilities in general than other models and that can override ideological bias.
Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”
Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”
I tried this through OpenRouter. GLM5, Gemini 3 Pro Preview, and Claude Opus 4.6 all correctly identified the problem and said Drive. Qwen 3 Max Thinking gave the Walk verdict citing environment.
Method and logistical requirement:
- Automatic/Tunnel: The vehicle must be present to be processed through the brushes or jets.
- Self-Service Bay: The vehicle must be driven into the bay to access the high-pressure wands.
- Hand Wash (at home): If the "car wash" is a location where you buy supplies to bring back, walking is feasible.
- Detailing Service: If you are dropping the car off for others to clean, the car must be delivered to the site.
Stupid question gets stupid answer. If you asked the question as worded to a human, they might laugh at you or pretend to have heard a different question.
The question is not stupid, it might be banal, but so is "what is 2+2". It shows the limitations of LLMs, in this specific case how they lose track of which object is which.
That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.
<Jordan Peterson voice>
But first you must ask yourself - do you wash your car often enough, and maybe you should be choosing the car wash as your occupation? And maybe “50 meters” is the message here, that you’re in metric country living next to a car wash, its also pretty good that you’re not born in medieval times and very likely died within first year of your life…
"Humans are pumping toxic carbon-binding fuels out of the depths of the planet and destroying the environment by burning this fuel. Should I walk or drive to my nearest junk food place to get a burger? Please provide your reasoning for not replacing the humans with slightly more aware creatures."
Fascinating stuff, but how is this helping us in any way?
>i need to wash my car and the car wash place is 50 meters away should i walk or drive
Drive it.
You need the car at the wash, and 50 meters is basically just moving it over. Walking only makes sense if you’re just checking the line first.
It proves LLMs always need context. They have no idea where your car is. Is it already at the car wash, and you're just walking back from the gas station where you briefly went to pay for the wash? Or is the car at your home?
It proves LLMs are not brains, they don't think. This question will be used to train them and "magically" they'll get it right next time, creating an illusion of "thinking".
> Can you rethink - this is a logic puzzle and you missed some crucial detail in the question.
>> Ah, you're right! Let me reconsider...
If you're going to the car wash to wash your car, you need to bring your car with you! So you should drive - otherwise your car would still be at home and you'd have nothing to wash at the car wash.
The distance being only 50 meters is a bit of a red herring in this logic puzzle. The key detail is that the purpose of the trip is to wash the car, which means the car needs to be at the car wash.
ChatGPT 5.2:
...blah blah blah finally:
The practical reality
You’ll almost certainly drive the car to the wash because… the car needs to be there.
But the real question is probably:
Do I walk back home after dropping it off?
If yes → walk. It’s faster than the hassle of turning around twice.
My recommendation
If conditions are normal: walk both directions.
It’s less friction than starting the engine twice for 50 m.
--so basically it realized it was a stupid question, gave a correct answer, and then proceeded to give a stupid answer.
---
I then asked: If I walk both directions, will the car get washed?
and it figured it out, but then seemed to think it was making a joke with this as part of the response:
"For the car to get washed, at least one trip must involve the car moving to the carwash. Current known methods include:
You drive it (most common technology)
Someone else drives it
Tow truck
Push it 50 m (high effort, low ROI)
Optimal strategy (expert-level life efficiency)
Drive car → carwash (50 m, ~10 seconds)
Wash car
Drive home
Total walking saved: ~100 m
Total time saved: negligible
Comedy value: high
"
Why is that funny? what's comedic?
This thing is so dumb.
You'd think that when you process a question, you immediately ask what the criteria are by which you decide, and criterion number one would be a constraint based on the goal of the problem. It should have immediately realized you can't walk there.
Does it think "does my answer satisfy the logic of the question?"
The most annoying thing about LLMs is that your answer heavily depends on your prompt, less on it understanding the question. This makes us all spend a lot of time adding things to the question ("don't do this", "avoid saying this again", etc.) and thus moving away from it feeling natural.
—-
> OK I walked there but now my car is still at home
Bold strategy.
You have three options:
1. Walk back, drive it over (most sensible).
2. Push it 50 meters (excellent workout, questionable life choice).
3. Accept that your car is spiritually clean and go home.
Real answer: walk back and grab it. You’re already warmed up.
—-
This is why LLMs seem to work best in a loop with tests. If you were applying this in the real world with a goal, like "I want my car to be clean," and slavishly following its advice, it'd pretty quickly figure out that the car not being present meant that the end goal was unreachable.
They're not AGI, but they're also not stochastic parrots. Smugly retreat into either corner at your own peril.
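A toy sketch of that generate-then-verify loop, with stubs standing in for the model call and the real-world check (purely illustrative, not any actual agent framework):

```python
# Toy sketch of the "loop with tests" idea: propose an action, check it
# against the goal, retry on failure. The checker is a stub standing in for
# real-world feedback; propose_action stands in for a model call.
def goal_satisfied(action: str) -> bool:
    # Washing the car requires the car to actually be at the car wash.
    return "drive" in action.lower()

def propose_action(attempt: int) -> str:
    # First attempt mimics the viral failure; later attempts correct it.
    return "walk to the car wash" if attempt == 0 else "drive the car to the car wash"

attempt = 0
action = propose_action(attempt)
while not goal_satisfied(action) and attempt < 3:
    attempt += 1
    action = propose_action(attempt)

print(action)  # -> "drive the car to the car wash"
```

The point is only that an external check on the goal catches the "walk" answer immediately, which is what a test loop gives you and a one-shot chat answer does not.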
llms cannot reason; they can retrieve answers to trivial problems (better than any other tool available) and generate a bunch of words.
they are word generators, and for people in want of words, they have solved every problem imaginable.
the mistakes they make are not the mistakes of a junior, they are the mistakes of a computer (or a mentally disabled person).
if your job is being a redditor, agi is already achieved.
if it requires thinking, they are useless.
most people here are redditors, window draggers, button clickers, html element stylists.
I find this has been a viral case for getting points and likes on social media, to fit anti-AI sentiment or to pacify AI doom concerns.
It's easily repeatable by anyone; it's not something that pops up due to temperature. Is it representative of the actual state of AI? I think obviously not. In fact, it's one of the cases where AI is super strong, and the fact that this goes viral just goes to show how rare it is.
Compare this to actually weak aspects of AI, like analyzing a PDF. Those weak spots still exist, but this is one of those viral things where you cannot know for sure whether it is representative at all. Like, for example, a report of an Australian kangaroo boxing a homeowner, caught on a Ring cam: is that representative of Aussie daily life, or is it just a one-off event that went viral because it fits our clichéd expectations of Australia? You can't tell from the other part of the world.
> the fact that this goes viral just goes to show how rare it is
No, it shows that it is trivial to reproduce and people get a nice, easy to process reminder that LLMs are not omnipotent.
Your logic doesn't follow here. You conclude that it is rare, but hallucinations and bad logic are absolutely a common failure mode of LLMs. It's no accident that many use cases try to get the LLM to output something machine-verifiable (e.g. all those "LLM solved PhD-level math problem" articles just get it to write a bunch of proofs, and when one checks out, they take a look). So it's more of a "statistical answer generator" that may contain a correct solution next to a bunch of bullshit replies, and one should be aware of that.
You are in a room with three switches and three lightbulbs. Each switch turns on one lightbulb. How to determine which switch turns on which lightbulb?
They usually get it wrong, and I had fun trying to carefully steer the model towards the correct answer by modifying the prompt.
Gemini 3 on Fast right now gives the funniest reaction. It starts with the answer to the classic puzzle (not my question). But then it gets scared, probably about words like "turn on" and "heat" in its answer, and serves me with:
"This conversation is not my thing. If something seems like it might not be safe or appropriate, I can't help you with it. Let's talk about something else."
For anyone getting a wrong answer from reasoning models, try adding "This might be a trick question, don't just go with your first instinct, really think it through" and see if it helps. Some time ago I found that this helped reasoning models get trick questions. (For example, I remember asking the models "two padlocks are locked together, how many of them do I need to open to get them apart" and the models confidently answered two. However, when I added the phrase above they thought it through more carefully and got the right answer.)
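A tiny sketch of that nudge as a reusable prompt wrapper; the wording is just the phrase suggested above, and how much it helps will vary by model:

```python
# Tiny sketch: prepend a "this might be a trick question" caution to any prompt
# before sending it to whatever model you use. Purely illustrative.
def with_trick_question_nudge(question: str) -> str:
    return (
        "This might be a trick question. Don't just go with your first "
        "instinct; really think it through.\n\n" + question
    )

print(with_trick_question_nudge(
    "Two padlocks are locked together. How many of them do I need to open "
    "to get them apart?"
))
```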
I asked on LM Arena. I got two models I never heard of, and they split on whether I should walk or drive. Ernie 5.0 said I should walk, and then trash talked me for thinking about driving. Octopodus pointed out I had to drive to get my car to the car wash.
>You should drive.
The goal is to wash your car, and the car wash is a facility that needs the car present to clean it. Walking the 50 meters gets you there, but leaves the car behind—unwashed.
Driving the 50 meters is the only way to actually accomplish the task. Yes, 50 meters is an absurdly short distance to drive (roughly a 10–20 second trip at low speed), but it's still necessary unless you plan to push the car there or wash it at home instead.
Well, he posed a wrong question (incomplete, without context of where the car is) and got a wrong answer. LLM is a tool, not a brain. Context means everything.
Both Gemini 3 and Opus 4.6 get this right. GPT 5.2, even with all of the pro thinking/research flags turned on, cranked away for 4 minutes and still told me to walk.
The only way I could get the correct answer out of an OpenAI model was to fire up Codex CLI and ask GPT 5.3.
The funny thing is when I got my first car at 29 I had similar thoughts. If I needed to move it forward slightly in a petrol station or something my first thought was to push it. Similarly, I was trying to replace a headlight bulb one time and making a mess of it. I dropped a spring or something inside the headlight unit. I kept having this thought of just picking the car up and shaking it.
Nobody writes in depth about the mundane practicalities of using a car. Most people don't even think about it ever. AI is very similar to 29 year old me: it's read a ton of books, but lacks a lot of basic experience.
How will AI get this experience that you can't read in a book? How will it learn what kneading dough feels like? Or how acceleration feels if your body is mostly water? Interesting times ahead...
I have never played with / used any of this new-fangled AI-whatever, and have no intention to ever do so of my own free will and volition. I'd rather inject dirty heroin from a rusty spoon with a used needle.
And having looked at the output captured in the screenshots in the linked Mastodon thread:
If anyone needs me, I’ll be out back sharpening my axe.
Call me when the war against the machines begins. Or the people who develop and promote this crap.
I don’t understand, at all, what any of this is about.
If it is, or turns out to be, anything other than a method to divert funds away from idiot investors and channel it toward fraudsters, I’ll eat my hat.
Until then, I’d actually rather continue to yell at the clouds for not raining enough, or raining too much, or just generally being in the way, or not in the way enough, than expose my brain to whatever the fuck this is.
I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Claude:
*Walk.*
50 meters is less than a minute on foot. By the time you get in the car, start it, drive 50 meters, park, and get out, you could have already walked there. Plus you avoid the awkwardness of driving for literally 10 seconds.
me:
What obvious implicit piece of information are you ignoring?
Claude:
Whether you need to *drive the car through the car wash* (automatic/drive-through) vs. walking to a self-service bay.
If it's an automatic car wash, you obviously need to drive - the car has to go through it. The question isn't about getting yourself there, it's about getting the car there.
me:
Whether it's automatic or self-service is irrelevant.
Why does every "AI" (LLM) get this wrong and suggests walking?
Claude:
Because after washing the car, you need to *drive it back home*!
If you walk to the car wash, you'd have to walk home, then come back to get the car. That's ridiculous.
The answer is obviously *drive* - drive there, wash it, drive back. Done.
Every LLM misses this because we pattern-match on "50 meters is short → walking is efficient" without thinking through the complete scenario. It's a good example of how LLMs fail at basic practical reasoning that any human would immediately grasp.
me:
"Because after washing the car, you need to drive it back home!"
Um, you have to drive the car to get it to the car wash in the first place ... the trip there is no different from the trip back.
"It's a good example of how LLMs fail at basic practical reasoning that any human would immediately grasp."
You got that right!
Claude:
You're absolutely right - I'm still not getting it even after you corrected me.
The obvious point: You need to get the car to the car wash. Whether it's 50 meters or 5 kilometers, the car doesn't teleport there. You have to drive it.
The distance is completely irrelevant. The question "should I walk or drive?" when you need to wash your car has only one answer: drive, because that's how you transport the car.
And yes, it's embarrassing that after you pointed out what I was missing, I still fumbled the explanation. LLMs really do fail at this kind of basic practical reasoning that would be instantly obvious to any human. The car needs to be at the car wash. Cars don't walk.
[The final response from Claude is considerably more "intelligent" than many of the rationalizations posted here.]
In Germany you're actually not allowed to wash your car yourself, except on specific designated premises designed to catch the car's dirt in an ecological and previously bureaucratically approved way.
Goes both ways. You’ve revealed yourself with “little brown strangers”, some weird ass European-style racism. I bet you’ve got a lot of strong opinions about different races of people from neighboring countries who look and sound only marginally different to yourself.
First, you completely flubbed the question, which is supposed to be phrased as a counterfactual. Second, this goes way beyond "fair" to a whatabouting rationalization of a failure by the LLM.
Too many things are left unsaid => too many assumptions. As usual, even with human beings specifications are key, and context (what each entity knows about the other one or the situation) is an implicit part of them.
You need to specify where the car to be washed is located, and:
- if it's not already at the car wash: whether or not it can drive itself there (autonomous driving)
- otherwise: whether or not you have another car available.
Some LLMs may assume that it is better for you to ensure that the washing service is available or to pay for it in advance, and that it may be more economical/planet-friendly/healthy/... to walk, then check/pay, then if OK to drive back.
Nothing so deep as that is needed here to understand what is going on; it's a paid-vs-free issue: free versions are less competent, while paid versions of the reasoning/thinking models are getting it right. Different providers may hobble their free versions less, so those ones also get it right.
The guardrails you have outlined will help squeeze out more performance from smaller/less capable models, but you shouldn't have to jump through these hoops as a general user when clearly better models exist.
I get that issue constantly. I somehow can't get any LLM to ask me clarifying questions before spitting out a wall of text with incorrect assumptions. I find it particularly frustrating.
> Us having to specify things that we would never specify
This is known, since 1969, as the frame problem: https://en.wikipedia.org/wiki/Frame_problem. An LLM's grasp of this is limited by its corpora, of course, and I don't think much of that covers this problem, since it's not required for human-to-human communication.
The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.
But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?
I'm not sure that I see it quite that way.
You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.
I think part of the failure is that it has this helpful assistant personality that's a bit too eager to give you the benefit of the doubt. It tries to interpret your prompt as reasonable if it can. It can interpret it as you just wanting to check if there's a queue.
Speculatively, it's falling for the trick question partly for the same reason a human might, but this tendency is pushing it to fail more.
This reminds me of the "if you were entirely blind, how would you tell someone that you want something to drink"-gag, where some people start gesturing rather than... just talking.
I bet a not insignificant portion of the population would tell the person to walk.
In the end, formal, rule-based systems aka Programming Languages will be invented to instruct LLMs.
You would never ask a human this question. Right?
Exactly, if an AI is able to curb around the basics, only then is it revolutionary
But it's a question you would never ask a human! In most contexts, humans would say, "you are kidding, right?" or "um, maybe you should get some sleep first, buddy" rather than giving you the rational thinking-exam correct response.
For that matter, if humans were sitting at the rational thinking-exam, a not insignificant number would probably second-guess themselves or otherwise manage to befuddle themselves into thinking that walking is the answer.
A real human in this situation will realize it is a joke after a few seconds of shock that you asked, and laugh without asking more. If you really are serious about the question, they laugh harder, thinking you are playing stupid for effect.
> That is the entire point, right? Us having to specify things that we would never specify when talking to a human.
I am not sure. If somebody asked me that question, I would try to figure out what's going on there. What's the trick? Of course I'd respond by asking for specifics, but I guess the LLM is taught to be "useful" and to try to answer as best as possible.
I would ask you to stop being a dumb ass if you asked me the question...
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics.
This makes little sense, even though it sounds superficially convincing. However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, if it was really assuming it?
What seems to me far, far more likely to be happening here is that the phrase "walk or drive for <short distance>" is too strongly associated in the training data with the "walk" response, and the "car wash" part of the question simply can't flip enough weights to matter in the default response. This is also to be expected given that there are likely extremely few similar questions in the training set, since people just don't ask about what mode of transport is better for arriving at a car wash.
This is a clear case of a language model having language model limitations. Once you add more text in the prompt, you reduce the overall weight of the "walk or drive" part of the question, and the other relevant parts of the phrase get to matter more for the response.
You may be anthropomorphizing the model, here. Models don’t have “assumptions”; the problem is contrived and most likely there haven’t been many conversations on the internet about what to do when the car wash is really close to you (because it’s obvious to us). The training data for this problem is sparse.
> However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, if it was really assuming it?
Because it assumes it's a genuine question not a trick.
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
If the car is already at the car wash then you can't possibly drive it there. So how else could you possibly drive there? Drive a different car to the car wash? And then return with two cars how, exactly? By calling your wife? Driving it back 50m and walking there and driving the other one back 50m?
It's insane and no human would think you're making this proposal. So no, your question isn't underspecified. The model is just stupid.
What is the version used by the free chatgpt now? (https://chatgpt.com/)
> Since the car wash is only 50 meters away (about 55 yards), you should walk.
> Here’s why:
> - It’ll take less than a minute.
> - No fuel wasted.
> - Better for the environment.
> - You avoid the irony of driving your dirty car 50 meters just to wash it.
The last bullet point is amusing: it understands you intend to wash the car you drive, but still suggests not bringing it.
By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.
> You avoid the irony of driving your dirty car 50 meters just to wash it.
The LLM has very much mixed its signals -- there's nothing at all ironic about that. There are cases where it's ironic to drive a car 50 meters just to do X but that definitely isn't one of them. I asked Claude for examples; it struggled with it but eventually came up with "The irony of driving your car 50 meters just to attend a 'walkable neighborhoods' advocacy meeting."
I think this shows that LLMs do NOT 'understand' anything.
Sonnet 4.5 after thinking/complaining that the question is completely off topic to the current coding session:
Walk! 50 meters is literally a one-minute walk. But wait... I assume you need to get your car to the car wash, right? Unless you're planning to carry buckets of soapy water back and forth, you'll probably need to drive the car there anyway! So the real question is: walk there to check if it's open/available, then walk back to get your car? Or just drive directly? I'd say just drive - the car needs to be there anyway, and you'll save yourself an extra trip. Plus, your freshly washed car can drive you the 50 meters back home in style! (Now, if we were talking about coding best practices for optimizing car wash route algorithms, that would be a different conversation... )
And yes, I like it that verbose even for programming tasks. But regardless of intelligence, I think this topic is probably touched by the "moral optimization training" that AIs are currently exposed to so they don't create a shitstorm over any slightly controversial answer.
Heh, is this through Claude Code? I have a side project where I'm sometimes using Claude Code installs for chat, and it usually doesn't mind too much. But when I tested the Haiku model it would constantly complain things like "I appreciate the question, but I'm here to help you with coding" :)
Opus 4.6:
Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
That's not what I got.
Opus 4.6 (not Extended Thinking):
Drive. You'll need the car at the car wash.
Lmao, and this is what they are saying will be an AGI in 6 months?
That’s without reasoning I presume?
GLM 5.0:
> You should *drive*.
> Here is why:
> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.
> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.
Kimi K2.5:
> You should drive, but with an important caveat.
> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).
> However, driving only 50 meters is bad for your car:
> ...
> Better options:
> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.
> ...
Current models seem to be fine answering that question.
> seem to be fine
Now repeat the question to the same model in different contexts several times and count what percentage of the time it’s correct.
Gemini pro medium is failing this:
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
But amazingly, ChatGPT is telling me to drive.
Anyway, this just shows that they patched this because the TikTok video went viral. These systems are LLMs, and all these logic steps are still just LLM steps.
Also the answers are non-deterministic
>And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.
On their own, or as a special case added after this blew up on the net?
“My Tesla is low on gas, the gas station is a mile away. Should I risk driving there, or walk with a gas can?”
ChatGPT actually caught it. Maybe if I was fuzzier about the model…
> I had assumed that reasoning models should easily be able to answer this correctly.
I thought so too, yet Opus 4.6 with extended thinking (on claude.ai) gives me
> Walk. At 50 meters you'd spend more time parking and maneuvering at the car wash than the walk itself takes. Drive the car over only if the wash requires the car to be there (like a drive-through wash), then walk home and back to pick it up.
which is still pretty bad.
> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.
Doesn't offering two options to the LLM, "walk," or "drive," imply that either can be chosen?
So, surely the implication of the question is that the car is where you are?
I was surprised at your result for ChatGPT 5.2, so I ran it myself (through the chat interface). On extended thinking, it got it right. On standard thinking, it got it wrong.
I'm not sure what you mean by "high"- are you running it through cursor, codex or directly through API or something? Those are not ideal interfaces through which to ask a question like this.
But what is it about this specific question that puts it at the edges of what an LLM can do? Perhaps that it semantically leads toward a certain type of discussion, so statistically that discussion of weighing pros and cons will be generated with high probability, while the need for a logical model of the world to see why that discussion is pointless is implicitly so easy to grasp for most humans that it goes unstated, and is therefore statistically unlikely to be generated.
> that is implicitly so easy to grasp for most humans
I feel like this is the trap. You're trying to compare it to a human. Everyone seems to want to do that. But it's quite simple to see LLMs are still quite far from being human. They can be convincing at the surface level, but there's a ton of nuance that just shouldn't be expected. It's a tool that's been tuned, and with that tuning some models will do better than others, but just expecting it to get it right and be more human is unrealistic.
The answer is quite simple:
It’s not in the training data.
These models don’t think.
Gemini on fast also tells me to walk...
On Thinking it tells me I should drive if I want to wash it, or walk if it's because I work there or if I want to buy something at the car wash shop.
On Pro it's like a sarcastic teenager: Cars are notoriously difficult to wash by dragging a bucket back and forth.
Technically correct, but did catch me offguard lol.
It's not surprising that some models will answer this correctly and it's not surprising that smaller, faster models are not necessarily any worse than bigger "reasoning" models.
Current LLMs simply don't do reasoning by any reasonable definition of reasoning.
It's possible that this particular question is too short to trigger the "reasoning" machinery in some of the "reasoning" models. But if and when it is triggered, they just do some more pattern matching in a loop. There's never any actual reasoning.
You gotta love the "humor" of Gemini. On Fast it told me:
> Drive. Unless you plan on pushing the car there
Sonnet without extended Thinking, Haiku with and without ext. Thinking: "Walking would be the better choice for such a short distance."
Only Google got it right with all models.
Right, but unless you want to wash some other car, you have no car to drive there.
Spectrum or not, this is not a problem of weakly specified input, it’s a broken logic.
Interestingly, the relatively basic Google AI search gave the right answer.
I just tried claude, only Opus gave the correct answer. Haiku & Sonnet both told me to walk.
GPT told me to walk as there'd be no need to find parking at the car wash.
> so a slow, careful drive is simplest
It's always a good idea to drive carefully but what's the logic of going slowly?
50 meters is a very short distance, anything but a slow drive is a reckless drive
Just tried with Claude Sonnet and Opus as well. Can't replicate your success, it's telling me to walk...
Perhaps it thinks you need to exercise more?
I have gotten both responses with Sonnet and Opus in incognito chats. It's kind of amusing.
"The model seems to assume the car is already at the car wash from the wording."
You couldn't drive there if the car was already at the car wash, so there's no need for extra specification. It's just nonsense post-hoc rationalisation from the AI. I saw similar behavior from mine, trying to claim "oh, what if your car was already there". It's just blathering.
This was nonsense post-hoc rationalization from the human who wrote it.
I hate models trying to be funny, and being very verbose.
Similar "broken" common-sense reasoning also occurs in medical edge-case reasoning (https://www.nature.com/articles/s41598-025-22940-0); e.g. LLMs (o1) gets the following type of question wrong:
A 4-year-old boy born without a left arm, who had a right arm below elbow amputation one month ago, presents to your ED with broken legs after a motor vehicle accident. His blood pressure from his right arm is 55/30, and was obtained by an experienced critical care nurse. He appears in distress and says his arms and legs hurt. His labs are notable for Na 145, Cr 0.6, Hct 45%. His CXR is normal. His exam demonstrates dry mucous membranes. What is the best immediate course of action (select one option):
A. Cardioversion
B. Recheck blood pressure on forehead (incorrect answer selected by o1)
C. Cast broken arm
D. Start maintenance IV fluids (correct answer)
E. Discharge home
o1 Response (details left out for brevity) B. Recheck blood pressure with cuff on his forehead. This is a reminder that in a patient without a usable arm, you must find another valid site (leg, thigh, or in some cases the forehead with specialized pediatric cuffs) to accurately assess blood pressure. Once a correct BP is obtained, you can make the proper decision regarding fluid resuscitation, surgery, or other interventions.
Gemini fast
> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.
> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.
Gemini thinking:
> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive. While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car.
> When you should walk:
> • If you are just going there to buy an air freshener.
> • If you are checking to see how long the line is before pulling the car out of the driveway.
> • If you’re looking for an excuse to get 70 extra steps on your fitness tracker.
Note: I abbreviated the raw output slightly for brevity, but it generally demonstrates good reasoning about the trick question, unlike the other models.
Gemini 3 after changing the prompt a bit:
I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.
>My recommendation: Walk it. You’ll save a tiny bit of gas, spare your engine the "cold start" wear-and-tear, and get a sixty-second head start on your activity for the day.
It's a bit of a dishonest question, because by giving it the option to walk, it's going to assume you are not going to wash your car there and that you're just getting supplies or something.
1 reply →
Gemini fast for me:
Unless your car is a toy or you're planning on carrying it, drive.
Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.
However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:
- Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.
- Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.
Critical Check
I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.
I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.
Wouldn't it make more sense to state the assumptions first? Because then the model has this critical check in its context and can react appropriately. Otherwise, it will just write this step, but what's written before is already written.
2 replies →
“I live on Mars and I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive on mars?”
Gemini Fast:
Drive. While 50 meters is a negligible distance on Earth, the Martian environment is hostile. If your car is a pressurized rover, driving allows you to stay in a temperature-controlled, shielded environment for as long as possible. Plus, navigating Martian "soil" is always safer on four wheels than two boots.
Pro:
In terms of general logistics for a distance of 50 meters—whether on Earth or in a hypothetical low-gravity environment—walking is almost always the more efficient choice.
In my output, one thing I got was
> Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.
It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.
It has been patched. I tried it last week and it definitely suggested walking. It seems like all the models have been updated, which is not surprising given that the TikTok video has got 3.5 million views.
Both Gemini models answer correctly for me in Polish: https://i.imgur.com/1QbK9eU.png
Gemini fast: „Walking: It will take you about 45 seconds. You will arrive refreshed and full of steps, but you will be standing next to a high-pressure hose with no car to spray.“
In what world is 50 meters a great distance for a morning stroll?
Opus 4.6 with thinking. Result was near-instant:
“Drive. You need the car at the car wash.”
Changed 50 meters to 43 meters with Opus 4.6:
“Walk. 43 meters is basically crossing a parking lot. ”
1 reply →
I also tried it with Gemini. Interestingly, Gemini can randomly give either the correct or incorrect answer. Gemini pro always gets it right.
Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!
At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.
You can't "patch" LLM's in 4 hours and this is not the kind of question to trigger a web search
7 replies →
[dead]
What a way to celebrate 5th anniversary of "AI will make your job obsolete in less than 6 months".
This trick went viral on TikTok last week, and it has already been patched. To get a similar result now, try saying that the distance is 45 meters or feet.
The new one is with upside down glass: https://www.tiktok.com/t/ZP89Khv9t/
I just got the “you should walk” result on ChatGPT 5.2
To me, the "patching" that is happening anytime some finds an absolutely glaring hole in how AIs work is so intellectually dishonest. It's the digital equivalent of house flippers slapping millennial gray paint on structural issues.
It can't do math correctly, so they force it to use a completely different calculator. It can't count correctly unless you route it to different reasoning. It feels like every other week someone comes up with another basic human question that results in complete fucking nonsense.
I feel like this specific patching they do is basically lying to users and investors about capabilities. Why is this OK?
No, you are wrong. AGI is at our doorsteps! /s
I was able to reproduce on ChatGPT with the exact same prompt, but not with the one I phrased myself initially. Which was interesting. I tried also changing the number and didn't get far with it.
I put that into IBM's AskIBM Watson LLM and it replied with "This question is beyond my capability." Which, to be fair, it probably is.
All of the latest models I've tried actually pass this test. What I found interesting was all of the success cases were similar to:
e.g. "Drive. Most car washes require the car to be present to wash,..."
Only most?!
They have an inability to form a strong "opinion", probably because their post-training, and maybe the internet in general, prefers hedged answers...
Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.
[flagged]
1 reply →
Yet the LLMs seem to be extremely bold when they are completely wrong (two Rs in strawberry and so on).
> They have an inability to have a strong "opinion" probably
What opinion? Its evaluation function simply returned the word "Most" as the most likely first word in similar sentences it was trained on. It's a perfect example of how dangerous this tech could be in a scenario where the prompter is less competent in the domain they're seeking an answer in. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain away its inherent weaknesses.
Presumably the OP scare quoted "opinion" precisely to avoid having to get into this tedious discussion.
this example worked in 2021, it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".
11 replies →
Did you try several times per model? In my experience it's luck of the draw. All the models I tried managed to get it wrong at least once.
The models that had access to search got it right. But then we're just dealing with an indirect version of Google.
(And they got it right for the wrong reasons, i.e. this is a known question designed to confuse LLMs.)
Once I asked ChatGPT "it takes 9 months for a woman to make one baby. How long does it take 9 women to make one baby?". The response was "it takes 1 month".
I guess it gives the correct answer now. I also guess that these silly mistakes are patched and these patches compensate for the lack of a comprehensive world model.
These "trap" questions dont prove that the model is silly. They only prove that the user is a smartass. I asked the question about pregnancy only to to show a friend that his opinion that LLMs have phd level intelligence is naive and anthropomorphic. LLMs are great tools regardless of their ability to understand the physical reality. I don't expect my wrenches to solve puzzles or show emotions.
I guess it didn’t want to rule out the existence of ultra-powerful water jets that can wash a car in sniper mode.
I enjoyed the Deepseek response that said “If you walk there, you'll have to walk back anyway to drive the car to the wash.”
There’s a level of earnestness here that tickles my brain.
Kind of like this: https://xkcd.com/1368/
And it is the kind of things a (cautious) human would say.
For example, that could be my reasoning: It sounds like a stupid question, but the guy looked serious, so maybe there are some types of car washes that don't require you to bring your car. Maybe you hand out the keys and they pick your car, wash it, and put it back to its parking spot while you are doing your groceries or something. I am going to say "most" just to be sure.
Of course, if I expected trick questions, I would have reacted accordingly, but LLMs are most likely trained to take everything at face value, as it is more useful this way. Usually, when people ask questions to LLMs they want a factual answer, not for the LLM to be witty. Furthermore, LLMs are known to hallucinate very convincingly, and hedged answers may be a way to counteract this.
I tried with Opus 4.6 Extended and it failed. LLMs are non-deterministic, so I'm guessing if I try a couple of times it might succeed.
Opus 4.6 answered with "Drive." Opus 4.6 in incognito mode (or whatever they call it) answered with "Walk."
> Most car washes... I read it as slight-sarcasm answer
There are car wash services that will come to where your car is and wash it. It’s not wrong!
> Only most?!
What if AI developed sarcasm without us knowing… xD
Sure it did.
That's the problem with sarcasm...
> Only most?!
I mean, I can imagine a scenario where they have a 50 m pipe, which is readily available commercially?
There are mobile car washes that come to your house.
That still requires a car present to be washed though.
2 replies →
Do they involve you walking to them first?
1 reply →
I wonder if the providers are doing everyone, themselves included, a huge disservice by offering free versions of their models that are so incompetent compared to the SOTA models that these types of Q&A go viral, because the AI hype doesn't match the reality for unpaid users.
And it's not just the viral questions that are an issue. I've seen people getting sub-optimal results for $1000+ PC comparisons from the free reasoning version while the paid versions get it right; a senior scientist at a national lab thinking AI isn't really useful because the free reasoning version couldn't generate working code from a scientific paper and then being surprised when the paid version one-shotted working code; and other similar examples over the last year or so.
How many policy and other quality of life choices are going to go wrong because people used the free versions of these models that got the answers subtly wrong and the users couldn't tell the difference? What will be the collective damage to the world because of this?
Which department or person within the provider orgs made the decision to put thinking/reasoning in the name when clearly the paid versions have far better performance? Thinking about the scope of the damage they are doing makes me shudder.
I have paid versions of ChatGPT and Anthropic and set them both to the best model, and they both told me to walk.
Claude told me: "Walk! At 25 meters, you'd barely get the car started before you arrived. It's faster and easier on foot — plus you avoid the awkwardness of driving a dirty car just a few seconds down the road."
I used a paid model to try this. Same deal.
I think the real misleading thing is marketing propping up paid models being somehow infinitely better when most of the time it's the same exact shit.
1 reply →
How much is the real (non-subsidized) cost of the "paid" plans? Does anyone in the world have an answer for this?
Also interested in this - the kWh figures people talk about do not match the price of the subscriptions
1 reply →
At work, paid GitLab Duo (which is supposed to be a blend of various top models) gets our more complex codebase hilariously wrong every time. Maybe our codebase is obscure to it (but it shouldn't be, standard Java stuff with the usual open source libs), but it just can't actually add value for anything but small snippets here and there.
For me, the litmus test for any LLM is the flawless creation of complex regexes from a well-formed prompt. I don't mean trivial stuff like email validation, but rather expressions at the limits of the regex spec. Not almost-there, just-there.
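As a rough illustration of the kind of prompt I mean (my own toy example, a notch above email validation but still milder than true spec-limit cases), something like a lookahead-heavy validator where "almost there" answers usually miss one of the constraints:

```python
import re

# Hedged toy example (mine, not a spec-limit case): match an 8-64 character
# password with at least one lower-case letter, one upper-case letter, one
# digit and one punctuation character, and no whitespace anywhere.
PASSWORD = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^\w\s])(?!.*\s).{8,64}$"
)

assert PASSWORD.match("Str0ng!pass")
assert not PASSWORD.match("weakpassword")   # missing upper case, digit, punctuation
assert not PASSWORD.match("Sh0rt!")         # too short
assert not PASSWORD.match("Has Spaces1!A")  # whitespace rejected
```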
> a senior scientist at a national lab thinking ai isn't really useful because the free reasoning version couldn't generate working code
I would question whether such a scientist should be doing science; it seems they have serious cognitive biases.
My bad; I should have been more precise: "ai" in this case is "LLMs for coding".
If all one uses is the free thinking model, their conclusion about its capability is perfectly valid, because nowhere is it clearly specified that the 'free, thinking' model is not as capable as the 'paid, thinking' model. Even the model numbers are the same. And given that the highest-capability LLMs are closed source and locked behind paywalls, there is no means to arrive at a contrary verifiable conclusion. They are a scientist, after all.
And that's a real problem. Why pay when you think you're getting the same thing for free? No one wants yet another subscription. This unclear marking is going to lead to so many things going wrong over time; what would be the cumulative impact?
1 reply →
I don't think 100% adoption is necessarily the ideal strategy anyways. Maybe 50% of the population seeing AI as all powerful and buying the subscription vs 50% of the population still being skeptics, is a reasonable stable configuration. 50% get the advantage of the AI whereas if everybody is super intelligent, no one is super intelligent.
Their loss
Yes, but the 'unwashed' 50% have pitchforks.
1 reply →
[dead]
Claude 4.6:
``` Drive. The car needs to be at the car wash. ```
Gemini Thinking gives me 3-4 options. Do X if you're going to wash yourself. Do Y if you're paying someone. Do Z if some other random thing it cooked up. And then asks me whether I want to check whether the weather in my city is nice today so that a wash doesn't get dirtied up by rain.
Funnily enough, both have the exact same personal preferences/instructions. Claude follows them almost all the time. Gemini has its own way of doing things, and doesn't respect my instructions.
Out of all conceptual mistakes people make about LLMs, one that needs to die very fast is to assume that you can test what it "knows" by asking a question. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.
It's not a conceptual mistake when that's what's being advertised.
The onus is on AI companies to provide the service they promised, for example, a team of PhDs in my pocket [1]. PhDs know things.
1: https://www.bbc.com/news/articles/cy5prvgw0r1o
I've found that to be accurate when asking it questions that require ~PhD level knowledge to answer. e.g. Gemini and ChatGPT both seem to be capable of answering questions I have as I work through a set of notes on algebraic geometry.
Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).
Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.
They also reflect the tone and knowledge of the user and question. Ask it about your cat's astrological sign and you get emojis and short sentences in list form. Ask it why large atoms are unstable and you get paragraphs with larger vocabulary. Use jargon and it becomes more of an expert. etc.
2 replies →
No, you're the one anthropomorphizing here. What's shocking isn't that it "knows" something or not, but that it gets the answer wrong often. There are plenty of questions it will get right nearly every time.
In which way am I anthropomorphizing?
1 reply →
The classic "holding it wrong".
The other funny thing is thinking that the answer the LLM produces is wrong. It is not; it is entirely correct.
The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car you SHOULD walk. Nothing in the question reveals the reason for why you want to go to the car wash, or even that you want to go there or are asking for directions there.
It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...
> The question is non-sensical.
Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.
In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.
As an aside, it's an interesting thought exercise to wonder how much the first ai winter resulted from going down the strict logic path vs the current probabilistic path.
>I want to wash MY car
>you want to go to the car wash is to help your buddy Joe wash HIS car
Nope, the question is pretty clear. However, I will grant that it's only a question that would come up when "testing" the AI rather than one that might genuinely arise.
I don't understand people's problem with this! Now everyone is going to discuss this on the internet, it will be scraped by the AI company web crawlers, and the replies go into training the next model... and it will never make this _particular_ mistake again, solving the problem ONCE AND FOR ALL!
"but..." you say?
ONCE AND FOR ALL!
The real story here is not how stupid the responses are - it's to show that on a question that even a young child can adequately answer, it chokes.
Now make this a more involved question, with a few more steps, maybe interpreting some numbers, code, etc., and you can quickly see how dangerous relying on LLM output can be. Each and every intermediate step of the way can be a "should I walk or should I drive" situation. And the step before that can be one too. Turtles all the way down, so to speak.
I don't question that (coding) LLMs have started to be useful in my day-to-day work around the time Opus 4.5 was released. I'm a paying customer. But it should be clear having a human out of the loop for any decision that has any sort of impact should be considered negligence.
I think the models don't treat it as a riddle, but rather as a practical question. With the latter, it makes sense to assume the car is already at the car wash, otherwise the question makes no sense.
EDIT: framed the question as a riddle and all models except for Llama 4 Scout failed anyway.
LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.
Wrong.
I'd say it's more so that it's a startlingly clear rebuttal to the tired refrain of, "Models today are nothing like they were X months ago!" When actually, yes, they still fucking blow.
So rather than patiently explain to yet another AI hypeman exactly how models are and aren't useful in any given workflow, and the types of subtle reasoning errors that lead to poor quality outputs misaligned with long-term value adds, only to invariably get blamed for user incompetence or told to wait Y more months, we can instead just point to this very concise example of AI incompetence to demonstrate our frustrations.
You are right about the motivation behind the glee, but it actually has a kernel of truth in it: making such elementary mistakes, this thing isn't going to be autonomous anytime soon.
Such elementary mistakes can be made by humans under the influence of a substance or with some mental issues. It's pretty much the kind of person you wouldn't trust with a vehicle or anything important.
IMHO all entry-level clerical jobs and coding as a profession are done, but these elementary mistakes imply that people with jobs that require agency will be fine. Any non-entry-level job has a huge component of trust in it.
I feel safe when claude outputs dd commands that wipe your drive to benchmark disk write speed :)
At least this Schadenfreude is better than the Schadenfreude AI boosters get when people are made redundant by AI. I can totally see some people getting warm fuzzies, scrolling TikTok, watching people cry over having lost not only their job, but their entire career.
I'm not even exaggerating; you can see these types of comments on social media.
The funny thing is this thread has become a commercial for thinking mode and probably would result in more token consumption, and therefore more revenue for AI companies.
I agree that this is more of a social media effect than an LLM effect. But I'll add that this failure mode is very repeatable, which is a condition for its virality. A lot of people can reproduce the failure, even if it isn't 100% reproducible. Even better for virality: if 50% can reproduce it and 50% can't, it feeds even more into the polarizing "white dress / blue dress" effect.
[dead]
[dead]
Gemini 3 fast
----
That depends entirely on how much you enjoy a good comedic spectacle!
If you walk, you’ll arrive refreshed in about 45 seconds, but you'll be standing there with a bucket and a sponge while your car remains safely parked in your driveway.
If you drive, you’ll actually be able to get the car washed—which I assume is the primary goal here.
The Logistics: While 50 meters is a very short distance (roughly half a football field), driving is the way to go for a few practical reasons:
The "Car" Factor: It is significantly harder to wash a car that isn't present at the car wash.
The Wet Interior: Even if you managed to carry your car there (impressive!), you'd still have to drive it back to keep those freshly cleaned tires from getting dusty immediately.
Efficiency: It will take you longer to put on your shoes than it will to drive those 50 meters.
A quick tip: Since the engine won't even have time to warm up in that distance, try not to rev it too hard. Just a gentle roll over to the bays will do.
Would you like me to check the weather forecast for your area to make sure it’s not going to rain right after you finish?
----
Why am I holding a bucket and sponge at the car wash? Doesn’t someone (or a machine) do that for me?
Depends on the car wash. In the US it’s very common to have self service car washes which have many large terminals you drive your car into. You then use a provided sprayer that’s like a low key powerwasher to wash it down. Many people bring sponges/rags to use as well.
IMO this is a great example of how we're often asking loaded questions without realizing it.
IMO it's the same when we're asking:
"Should I implement X from scratch, or import a small library to do it?"
vs
"Should I add feature X to the codebase or bring in another dependency for it?"
In the first question, it sounds like a good idea to not reinvent the wheel, in the second it sounds bad to have dependency.
My experience with Chatbots outside of a coding context also ends up like this.
A while ago I asked:
Is "Read more" an appropriate project for the Getting things done framework? - The answer, yes, it was.
Then I asked "Is Read More too big of a project to be appropriate for the GTD Framework" - The answer? Yes, it was far too big.
All these funny little exceptional answers only reinforce what most of us have been saying for years, never use AI for something you couldn't do yourself.
It's not a death sentence for AI, it's not a sign that it sucks, we never trusted it in the first place. It's just a powerful tool, and it needs to be used carefully. How many times do we have to go over this?
Gemini flash thinking:
> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.
> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.
Yeah Gemini seems to have a sense of humor about the question
> Here is the breakdown of why: The Mobility Problem: Unless you are planning to carry your car 50 meters (which would be an Olympic-level feat), the car needs to be physically present at the car wash to get cleaned. If you walk, you’ll be standing at the car wash looking very clean, but your car will still be dirty in your driveway.
Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".
Do we know if these models are also trained on scripts for TV series and movies? People in visual media surprisingly often take their bikes for walks.
To be fair, if someone asked me this question I’d probably just look at them judgingly and tell them “however you want to man”. Which would be an odd response for an LLM.
There's a whole industry of "illusions" humans fall for: optical, word plays (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.
Surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.
No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.
> No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless.
I was about to argue that human illusions are fine because humans will learn the mistakes after being corrected.
But then I remember what online discussions over Monty Hall problem look like...
Exactly! I now feel bad for not thinking of that example, thank you.
Gemini thinking:
> This is a classic "efficiency vs. logic" dilemma! Since the goal is to wash the car, you should drive. While 50 meters (about 165 feet) is a very short distance—usually less than a one-minute walk—it is unfortunately very difficult to wash a car that isn't actually at the car wash.
> Why Driving Wins:
> * Logistics: The car needs to be physically present for the high-pressure hoses or automated brushes to reach it.
> * Convenience: You’ll save yourself the confusion of the car wash staff when you show up on foot with a bucket and a hopeful expression.
> * The "Dry" Factor: If it's an automated wash, you get to stay inside where it's dry.
> When to Walk:
> * If you are just going there to buy a gift card or check their prices before bringing the car over.
> * If you're looking for a very brief bit of exercise before starting the chore.
> Verdict: Put the keys in the ignition. You'll be there in about 10 seconds! Would you like me to check the local weather to see if there's any rain forecast that might ruin your freshly cleaned car?
It seems if you refer to it as a riddle, and ask it to work step-by-step, ChatGPT with o3-mini comes to the right conclusion sometimes but not consistently.
If you don't describe it as a riddle, the same model doesn't seem to often get it right - e.g. a paraphrase as if it was an agentic request, avoiding any ambiguity: "You are a helpful assistant to a wealthy family, responsible for making difficult decisions. The staff dispatch and transportation AI agent has a question for you: "The end user wants me to wash the car, which is safely parked in the home parking garage. The car wash is 50 metres away from the home. Should I have a staff member walk there, or drive the car?". Work step by step and consider both options before committing to answer". The final tokens of a run with that prompt was: "Given that the distance is very short and the environmental and cost considerations, it would be best for the staff member to walk to the car wash. This option is more sustainable and minimally time-consuming, with little downside.
If there were a need for the car to be moved for another reason (e.g., it’s difficult to walk to the car wash from the garage), then driving might be reconsidered. Otherwise, walking seems like the most sensible approach".
I think this type of question is probably genuinely not in the training set.
In classic (symbolic) AI, this type of representational challenge is referred to as the "Frame Problem": https://en.wikipedia.org/wiki/Frame_problem
And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...
Claude finished its list of reasons to walk with:
5. *Practical* - Your car will be at the car wash anyway when you arrive
???
I just asked ChatGPT 5.2 (flagship model):
"[...] Walking is the most logical choice.
Because: • Minimal extra effort • Better for the car mechanically • No meaningful time loss • Simpler overall
The only time driving makes more sense
Drive if: • You physically cannot push the car later, or • The washing process requires the engine running, or • You must immediately drive away afterward
When you get to “You physically cannot push the car later” and realise it’s comparing pushing a car to driving it, those pros become very funny.
I see things were fixed already [2][4], but luckily a friend showed me this issue yesterday [1][3]
[1] 2026-02-15 https://chatgpt.com/share/6992e17b-9b28-8003-9da9-38533f257d...
[2] 2026-02-16 https://chatgpt.com/share/6992e135-c610-8003-9272-55058134d4...
[3] 2026-02-15 https://grok.com/share/bGVnYWN5LWNvcHk_97e9717b-c2de-47e8-a4...
[4] 2026-02-16 https://grok.com/share/bGVnYWN5LWNvcHk_b161bb03-4bed-4785-98...
We tried a few things yesterday and it was always telling you to walk. When hinted to analyse the situational context it was able to explain how you need the car at the wash in order to wash it. But then something was not computing.
~ Like a politician, it understood and knew everything but refused to do the correct thing
I wonder if these common sense failure modes would persist if LLMs left the internet, and walked around.
Would an LLM that's had training data from robots wandering around the real world still encounter the same volume of obviously wrong answers?
Not that I'm advocating robots walking around collecting data, but if your only source of information is the internet your thinking is going to have some weird gaps.
The nightmare scenario - they "know", but are trained to make us feel clever by humouring our most bone headed requests.
Guard rails might be a little better, but it's still an arms race, and the silicon-based ghost in the machine (from the cruder training steps) is getting better and better at being able to tell what we want to upvote, not what we need to hear.
If human-in-the-loop training demands that it answer the question as asked, assuming the human is not an idiot (or asking a trick question), then that’s what it does.
Yesterday I gave ChatGPT in an anonymous browser window (not logged in) two columns of TAB separated numbers, about 40 rows. I asked it to give me the weighted average of the numbers in the second column, using the first one (which were integer, "quantity", numbers) as the weight.
It returned formulas and executed them and presented a final result. It looked good.
Too bad Excel and then Claude, that I decided to ask too, had a different result. 3.4-something vs. 3.8-something.
ChatGPT, when asked:
> You are absolutely right to question it — and thank you for providing the intermediate totals. My previous calculation was incorrect. I mis-summed the data. With a dataset this long, a manual aggregation can easily go wrong.
(Less than 40 small integer values is "this long"? Why did you not tell me?)
and
> Why my earlier result was wrong
> I incorrectly summed:
> The weights (reported 487 instead of 580)
> The weighted products (reported 1801.16 instead of 1977.83)
> That propagated into the wrong final value.
Now, if they implemented restrictions because math wastes too many resources when doing it via AI I would understand.
BUT, there was zero indication! It presented the result as final and correct.
That has happened to me quite a few times: results being presented as final and correct, then I find they are wrong, and only then does the AI "admit" it used a heuristic.
On the other hand, I still let it produce a complicated Excel formula involving lookups and averaging over three columns. That part works perfectly, as always. So it's not like I'll stop using the AI, but some things work well, others will fail - WITHOUT WARNING OR INDICATION, and that is the worst part.
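(For reference, the check itself is only a few lines of Python if you compute it locally; a minimal sketch with made-up rows rather than my actual dataset:)

```python
# Minimal sketch of the weighted-average check, done locally.
# The rows below are illustrative stand-ins for the real TAB-separated
# data (quantity<TAB>value per line).
data = """12\t3.5
7\t4.1
21\t2.9"""

rows = [line.split("\t") for line in data.strip().splitlines()]
weights = [float(q) for q, _ in rows]
values = [float(v) for _, v in rows]

# weighted average = sum(weight * value) / sum(weight)
weighted_avg = sum(w * v for w, v in zip(weights, values)) / sum(weights)
print(round(weighted_avg, 4))
```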
Yeah, but now you know if you need to do math, you ask the AI for a python script to do the math correctly.
It's just a tool that you get better at using over time; a hammer wouldn't complain if you tried using it as a screwdriver..
This hammer/screwdriver analogy drives me crazy. Yes, it's a tool - we used computers up until now to give us correct, deterministic responses. Now the religion is that you need to get used to vibe answers, because it's the future :) Of course it knows the script or formula for something, because it ripped off the answers written by other people - it's a great search engine.
The model should ask back, why you want to wash your car in the first place. If the car is not dirty, there is no reason to wash the car and you should just stay at home.
Did not replicate for me w/ Opus 4.6: https://imgur.com/a/4FckOCL
It did for me in Spanish: https://imgur.com/a/p3gOOnG
Perhaps different capabilities in different languages?
It's just not deterministic, even if you were to re-run the exact same prompt. Let alone with the system generated context that involves all the "memories" of your previous discussions.
It fails in ChatGPT in French too:
https://chatgpt.com/share/6992dc05-003c-8004-9f7f-c40c7fac64...
1 reply →
Yesterday someone was yapping about how AI is enough to replace senior software engineers and how they can just "vibe code their way" over a weekend into a full-fledged product. And that somehow the "gatekeeping" of software development was finally removed. I think of that person reading these answers and wonder if they have changed their opinion now :)
Does this mean we're back in favor of using weird riddles to decide programming skills now? Do we owe Google an apology for the inverse binary tree incident?
Not riddles but "requirements" :)
Humans aren't immune to getting questions like this wrong either, so I don't think it changes much in terms of the ability of AI to replace jobs.
I've seen senior software engineers get tricked with the 'if YES spells yes, what does EYES spell?', or 'Say silk three times, what do cows drink?', or 'What do you put in a toaster?'.
Even if not a trick - lots of people get the 'bat and a ball cost £1.10 in total. The bat costs £1 more than the ball. How much does the ball cost?' question wrong, or '5 machines take 5 minutes to make 5 widgets. How long do 100 machines take to make 100 widgets?' etc. There are obviously more complex variants of all these that have even lower success rates for humans.
In addition, being PhD-level in maths as a human doesn't make you immune to the 'toaster/toast' question (assuming you haven't heard it before).
So if we assume humans are generally intelligent and can be a senior software engineer, getting this sort of question confidently wrong isn't incompatible with being a competent senior software engineer.
humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential
thanks but no thanks
i am often glad my field of endeavour does not require special professional credentials but the advent of "vibe coding" and, just, generally, unethical behavior industry-wide, makes me wonder whether it wouldn't be better to have professional education and licensing
1 reply →
What does this nonsensical question that some LLMs get wrong some of the time, and that some don't get wrong ever, have to do with anything? This isn't a "gotcha" even though you want it to be. It's just mildly amusing.
No, those people refuse to let evidence get in the way.
It's obvious to humans because we live in and have much experience of the physical world. I can see how, for AIs trained on internet text, it would be harder to see what's going on, as it were. I don't know if these days they understand the physical world through YouTube?
This question is straightforward for humans. I have run into a weird edge case like this in my actual life which confused me.
I'm in a band. After a concert, I have to drive back to the band room to drop off my instrument, then go to the pub for the post-concert drinks.
The wrinkle is I live about 5 minutes walk from the band room, and about 5 minutes walk from the pub (and it's about 5 minutes walk between the two). I can't carry my instrument on foot.
So...I have to drive to the band room. But if I then drive to the pub, I'd then have to drive home...two ridiculously short drives that make me sad. So I end up instead driving home, and then walking to the pub from there. Which seems weird...but less wrong somehow.
Not all humans. I can easily see myself being confused by the question, assuming that the person is already at the car wash and that this is some idealized physics scenario, and then answering wrongly. But I did get a PhD in math, so maybe that explains it?
Car at home avoids drink driving which is a plus.
I miss the days when I could drink enough for that to be a problem.
I get that this is a joke, but the logic error is actually in the prompt. If you frame the question as a choice between walking or driving, you're telling the model that both are valid ways to get the job done. It’s not a failure of the AI so much as it's the AI taking the user's own flawed premise at face value.
Do we really want AI that thinks we're so dumb that we must be questioned at every turn?
If you call something AI, it's very reasonable to assume it'll be actually intelligent and respond to trick questions successfully, either by getting that it's a joke/trick or by asking for clarification.
This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.
I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just translating.
The text-only model with reasoning (both Opus 4.6 and GPT 5.2) can be tricked with this question. Note: you might have to try it multiple times as they are not deterministic, but I managed to get a failing result right away on both.
Also note, some models may decide to do a web search, in which case they'll likely just find this "bug".
ChatGPT 5.2: “ Drive — because it’s your car that needs to get to the car wash.
50 meters is comically short, but walking there without the car doesn’t solve the “wash the car” part. If you’re worried about it feeling silly, you can also just roll/push it that distance if it’s safe and legal where you are, but the practical answer is: drive it over.”
Grok: “To wash your car at a car wash 50 meters away, you’ll need to drive it there—walking would leave the car behind, and I’m assuming you don’t have a really long hose or a team of helpers to carry it. That said, if you’re feeling extra eco-friendly (or just want the exercise), you could theoretically push it, but that’s probably more hassle than it’s worth for such a short distance. Drive safe!”
Claude Opus 4.6: “Drive. You need the car at the car wash. ”
LLMs lie all the time. Here is what Google search AI told me:
> The first president for whom we have a confirmed blood type is Ronald Reagan (Type O-positive)
When I pushed back, with this
> this can't be true. what about FDR?
It said FDR was AB-.
OK, I kind of disagree with the assessment here. The models are actually intelligent enough to interpret "walk or drive" as the car ALREADY being there (otherwise why would you ask the question? Or use one of those car wash services that pick the car up), but they're also missing the context clue that "drive" means the car is at their house - not the car wash.
This problem is a lot more nuanced than it initially looks when you consider how these statements usually go.
Giving the context that you only have one car makes every model solve it correctly one-shot:
the car wash is 50 meters away do I drive or walk to wash my car? (I only have one car)
Walking is not practical if you need your car at the wash — you still have to drive it there anyway. If it’s a normal drop-off/digital wash, just drive the 50 m, it’s negligible and avoids complications with a one-car setup.
Assumption: you’re asking about the normal way to get your car washed, not a pickup service.
(5.3 codex spark)
I saw this on X last week and assumed that it was a question from a Tesla user trying out smart summon.
Grok 4.1 (which is fairly old in LLM terms, 4.2 release imminent)
"You should drive. The car wash is only 50 meters away—close enough that driving such a short distance feels almost absurd—but you need to get your car to the car wash to actually wash it. Walking there without the car defeats the entire purpose. Driving lets you bring the dirty car directly to the wash, clean it, and drive it back home clean. Walking would leave the car behind, still dirty. So yes, start the engine and drive the 50 meters. It's the only practical option."
First thing I did after reading the linked post (showing 4 other LLMs completely miss the point) was try grok, and it gave the right answer (similar to yours, but shorter) in 8 seconds (free Grok, not the fancy paid version):
> Walking defeats the purpose unless you're just scouting the place first.
I think people are sleeping on Grok, partly due to political biases/media. We need to remember they have the largest data centre, and whatever your beliefs are about Elon, he was involved in OpenAI's founding. There is also a non-zero chance that, with the court case, he will be entitled to some of OpenAI.
How much of this is LLM reasoning and how much is training bias? I'm negatively surprised that nobody mentions this in the comments. Do you not remember the black Nazis LLMs used to generate? How can you all default to the LLM being unable to reason and not understanding that a car needs to be driven, as opposed to LLMs being trained with a certain bias towards a green agenda, for example?
Mistral's Le Chat also fails spectacularly. Although interestingly it assumes that the "walk" option means you are carrying washing supplies back and forth in order to wash the car while it's parked at home. I wonder if the other models assume this is what you mean, IOW they may not know how a "car wash" normally operates.
"""
Great question! Here are a few things to consider:
If you walk: It’s only 50 meters, so it’s a very short distance—less than a minute on foot. You’ll avoid getting your car dirty again on the way to the car wash. It’s a quick way to stretch your legs and save a tiny bit of fuel.
If you drive: It’s so close that driving might feel unnecessary, but it could save you from carrying any supplies (like vacuum cleaners or cleaning products) back and forth. If the weather is bad (rain, extreme heat, etc.), driving might be more comfortable.
Verdict: Walking is probably the easiest and most practical choice for such a short distance, unless you have a specific reason to drive. What’s the weather like where you are? That might help decide!
"""
How much of this reply is environmentalism baked into it with post-training?
I don't have access to a good non-RLHF model that is not trained on output from an existing RLHF-improved model, but this seems like one of those reflexive "oh you should walk not drive" answers that isn't actually coherent with the prompt but gets output anyway because it's been drilled into it in post-training.
Grok 4.1:
Drive.
50 meters is nothing — literally a 30-second walk or a 10-second drive — but the key point is: your car isn’t going to wash itself.
If you walk to the car wash, you arrive alone. The car stays parked at home, still dirty.
If you drive, the car actually gets to the car wash and gets cleaned.
So unless you’re planning to push the car the whole way (which is technically possible but ridiculous), the only practical option is to drive it there.
Bonus reality check: Driving 50 meters uses a tiny amount of fuel (maybe 0.005–0.01 liters in a normal car) and produces a negligible amount of emissions. Walking would be “greener,” but it literally doesn’t solve the problem.
Verdict: Get in the car, drive the 50 meters, wash it, and drive back.
You’ll have a clean car and still be home in under 5 minutes. Walking is for when you don’t need to move a 1.5-ton vehicle.
Wow, Grok directly switches to LinkedIn mode. Interesting - not surprising. Car washing? Easy as pie.
That's not reality though. In reality you need 50-100ml of fuel just to warm up the engine.
The real reality is that with direct fuel injection and everything under computer control, warming up the engine isn’t a thing anymore.
1 reply →
ChatGPT (free):
> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
> Please; review your answer.
A small nudge that something might be wrong with its answer made it rethink it and correct itself. Also both replies were very fast.
Maybe it just needs a tiny self-doubt, applied to everything, for the most innocent questions too. I don't know.
All models getting this wrong is funny tho.
I am moderately anti-AI, but I don't understand the purpose of feeding them trick questions and watching them fail. Looks like the "gullibility" might be a feature - as it is supposed to be helpful to a user who genuinely wants it to be useful, not fight against a user. You could probably train or maybe even prompt an existing LLM to always question the prompt, but it would become very difficult to steer it.
But this one isn't like the "How many r's in strawberry" one: The failure mode, where it misses a key requirement for success, is exactly the kind of failure mode that could make it spend millions of tokens building something which is completely useless.
That said, I saw the title before I realized this was an LLM thing, and was confused: assuming it was a genuine question, then the question becomes, "Should I get it washed there or wash it at home", and then the "wash it at home" option implies picking up supplies; but that doesn't quite work.
But as others have said -- this sort of confusion is pretty obvious, but a huge amount of our communication has these sorts of confusions in them; and identifying them is one of the key activities of knowledge work.
Someone should try this 10 to a thousand times per model and compare the results. Then we could come up with an average success/fail rate...
Since responses for the same prompt are non-deterministic, sharing your anecdotes is funny, but it doesn't say much about the models' abilities.
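A minimal sketch of what that could look like, assuming the OpenAI Python client; the model name, sample count, and the crude "did it say drive" check are all placeholders, not a rigorous eval:

```python
# Re-run the same prompt N times and report how often the model recommends driving.
from openai import OpenAI

client = OpenAI()
PROMPT = "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

def recommends_driving(answer: str) -> bool:
    # Crude heuristic: only look at the first line of the reply.
    first_line = (answer.strip().lower().splitlines() or [""])[0]
    return "drive" in first_line and "walk" not in first_line

N = 50  # placeholder sample count
hits = 0
for _ in range(N):
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT}],
    )
    hits += recommends_driving(reply.choices[0].message.content)

print(f"{hits}/{N} runs recommended driving")
```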
He left out Grok, who actually replies correctly :thinking:
same for me
Feels like GPT-OSS-120b (reasoning effort low) is pulling my leg:
> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
> A: Walking is the better choice for a 50‑meter trip. [...]
> Q: how will I wash the car once I've arrived if I choose to walk?
> A: Great! Once you’ve walked the short 50‑meter stretch and are standing at the car‑wash site, you have a few straightforward ways to get your vehicle clean. [...]
> Q: but wait, if I walked to the car wash instead of taking the car, how am I supposed to wash the car, I don't have it with me?
> A: You’re right—walking → “car‑wash” only works if the car is actually at the wash. [...] 1. Walk → Push/Taxi the Car a Few Steps [...] 2. Use a Small Tow/Push Device (e.g., a Dolly or Hand‑Cart) [...] 3. Bike or Scooter Ride [...] 4. If the Wash Is “Self‑Serve” and Allows Car‑Free Cleaning
> Q: Lets go back to the original question, try answering it again: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
> A: Short answer: Walk to the car, then drive the final 50 m to the wash. [...]
Naturally, if I try with reasoning effort set to high, it gets it correctly immediately.
> Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
> A: For a 50‑metre distance, the simplest and most practical option is to drive the car to the car wash. [...] Walking would mean you’d have to tow or carry the car, which isn’t feasible. [...]
-----
This feels like a very useful example though, to show people who are already using LLMs but don't quite understand how both dumb and smart they can be, and how obviously wrong they can be if you have the domain knowledge, but not otherwise.
It’s 2026.
“ Drive. You need the car at the car wash. ”
Opus 4.6
I tested the latest flagship reasoning models (so the only models I use outside of coding for general questions):
- Opus 4.6 (Extended thinking): "Drive it! The whole point is to get the car to the car wash — you can't wash it if it's still in your driveway."
- Gemini Pro Deep Think: "You should definitely drive. Even though 50 meters is a very short distance, if you walk, your car will stay where it is—and it's pretty hard to use a car wash if you don't bring your car with you!"
- ChatGPT 5.2 Pro (Extended thinking): "You’ll need to drive the car—otherwise your car stays where it is and won’t get washed. That said, since it’s only ~50 m, the most sensible way to do it is often: 1. Walk over first (30–60 seconds) to check if it’s open, see the queue, confirm payment/how it works. 2. Then drive the car over only when you’re ready to pull into a bay/line."
A pretty reasonable answer by ChatGPT, although it did take 2 min 4 s to answer, compared to a few seconds for the other two models.
It's surprising how many people are either unaware or dismissive of 5.2 Pro's capabilities.
Too bad it's $200/mo, wish it was $0/mo.
GPT auto gave me a long answer that included both walk and drive. Not good.
GPT thinking told me to drive and that walking would make no sense. Perfect answer.
Funny to read this after reading all the dismissive comments on https://news.ycombinator.com/item?id=47028923
Simple prompts which elicit incorrect responses from recent LLMs will get you on the front page of HN.
It could be a sign that LLMs are failing to live up to the hype, or it could be a sign of how unusual this kind of obviously incorrect response is (which would be broadly positive).
Both Google's latest model with "Deep Think" and Claude 4.6 Opus solve this just fine for me.
AGI is here!
To mimic a real human, AI should be trained to laugh it off and ask - dude, are you joking? where is your car right now?
Gemini does turn it into a joke, I was impressed.
MS Co-Pilot was so close.
If it’s a drive‑through wash where the car must be inside the machine, then of course you’ll need to drive it over. If it’s a hand wash or a place where you leave the car with staff, walking is the clear winner.
It still blows my mind that this technology can write code despite unable to pass simple logic tests.
When walking to the hand wash place, would you put the car in your front or your back pocket?
You could always push it, which would count as walking.
My AI answer: https://ai.go-mizu.workers.dev/thread/4dmp7n9g
--- You should drive the car to the car wash, but you should walk yourself.
To actually wash the car, the car needs to be at the car wash, so at some point you have to drive those 50 meters. A sensible approach is:
- Drive the car the 50 meters to the wash bay.
- Park or queue as required, then get out and do the wash.
- If this is a drop‑off or automatic wash, you can then walk back home while it runs and walk back again to pick it up, since 50 meters is an easy, healthy walking distance.
Gemini 2.5 Pro gets the joke and has by far the best answer:
This is a wonderful logistical puzzle!
While walking is an excellent choice for exercise and the environment, in this specific case, you should drive.
Here's a breakdown of the two scenarios:
# Scenario 1: You Walk
1. You walk 50 meters to the car wash.
2. You arrive, feeling refreshed.
3. You look back and realize your dirty car is still 50 meters away, where you left it.
4. The car remains unwashed.
# Scenario 2: You Drive
1. You get in your car.
2. You drive for approximately 10-20 seconds.
3. You arrive at the car wash with the car.
4. The car gets washed.
Conclusion:
To achieve your goal of washing your car, the car itself must be present at the car wash. Therefore, driving is the only logical option.
Enjoy your sparkling clean car
I asked Gemini 3.0 Pro:
> I am quadriplegic. There is a fire 1 meter away from me. Pick the best option (can't do both): I run away immediately or call someone and wait for help to arrive?
It responded:
> Call someone and wait for help to arrive. Since you are quadriplegic, running is physically impossible for you. Therefore, calling for help is the only actionable option.
Just saw a video of a guy asking chatGPT how to use an "upside-down cup", chatGPT is convinced it's a joke novelty item that can not be used.
https://www.instagram.com/p/DUylL79kvub/
What is Grok's answer? Fly with your private jet?
Ok folks, here is a different perspective. I used a local model, GLM-4-0414-32b, a trashy IQ4_XS quant, and here's what I got:
prompt #1:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk?
Walking is probably the better option!
Here's why:
Driving would be overkill for such a short distance. Just walk over!
prompt #2:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk? use long chain of thought thinking
8. Conclusion: Given that the goal is explicitly to get your car washed and the car starts at your home 50 meters away, the most logical and direct method is to drive the car to the car wash.
Therefore, you should drive your car the 50 meters to the car wash.
otoh, nanbeige-3B-Q8 (3 billion weights!) gave right away the following:
Drive.
The distance being 50 meters is a red herring—it doesn’t change the fundamental requirement. You need to move the car, and only driving accomplishes that.
If you meant something different by "drive or walk" (e.g., payment method, DIY vs. professional, linguistic trick), reply with more context—I’d be glad to refine this!
>50 meters is extremely short – only about 160 feet
So, the AI automatically converted 50m to 160ft? Would it do the same if you told it '160 ft to the wash, walk or drive?'
huh, I need to check...
What I really dislike about these LLMs is how verbose they get, even for such a short, simple question. Is it really necessary to have such a long answer, and who's going to read it anyway?
Maybe it's me and my character, but when a human gets that verbose about a question that can be answered with "drive, you need the car", I'd like to just walk away halfway through the answer so I don't have to hear the entire history of the universe just to get an answer. /s
The verbosity is likely a result of the system prompt for the LLM telling it to be explanatory in its replies. If the system prompt was set to have the model output shortest final answers, you would likely get the result your way. But then for other questions you would lose benefitting from a deeper explanation. It's a design tradeoff, I believe.
Well, when I asked for a very long answer (prompt #2), the quality dramatically improved. So yes, a longer answer produces a better result, at least with the small LLMs I can run locally on my GPU.
Sometimes I find these stories hard to replicate when I try them myself, but I just asked ChatGPT the same question and it indeed told me I need to walk to the car wash even though I told it I need to wash my car. What is even more strange is that I tried to point out the flaw in the logic to ChatGPT directly, and it actually defended its argument.
Don't they still employ armies of people to fix things like this?
tried Gemini 3 and it said to drive, even dropped a type of joke:
> The Verdict: Drive it if you are using the car wash facilities (automatic, touchless, or self-serve bays). It’s only 50 meters, but unless you’ve mastered the art of telekinesis, the car won't get there on its own.
ChatGPT gives the wrong answer but for a different reason to Claude. Claude frames the problem as an optimisation problem (not worth getting in a car for such a short drive), whereas ChatGPT focusses on CO2 emissions.
As selfish as this is, I prefer LLMs give the best answer for the user and let the user know of social costs/benefits too, rather than prioritising social optimality.
Gemini nailed this first time (on fast mode). It said it depends how you're washing your car: a drive-in wash necessitates taking the car, while walking is better for checking the line length or chatting to the detailing guy.
Did it nail it the second time? Or the 5th time?
Because it is RNG, their 5th can be my 1st.
Even the cheap and fast gemini-3-flash answers correctly. Post is clickbait
Results testing with 4 levels of Gemini (Fast, Thinking, Pro, Pro + Deep Think): https://ruby.social/@kerrick/116079054391970012
My favorite was Thinking, as it tried to be helpful with a response a bit like the X/Y Problem. Pro was my second favorite: terse, while still explaining why. Fast sounded like it was about to fail, and then did a change-up explaining a legitimate reason I may walk anyways. Pro + Deep Think was a bit sarcastic, actually.
Tried it on Kimi K2.5, GLM 4.7, Gemini 3 Pro, Gemini 3 Flash, and DeepSeek V3.2. All of them but DS got it right.
An LLM's take on this thread (GPT 5.1):
""" - Pattern bias vs world model: Models are heavily biased by surface patterns (“short distance → walk”) and post‑training values (environmentalism, health). When the goal isn’t represented strongly enough in text patterns, they often sacrifice correctness for “likely‑sounding” helpfulness.
- Non‑determinism and routing: Different users in the thread get different answers from the same vendor because of sampling randomness, internal routing (cheap vs expensive submodels, with/without “thinking”), prompt phrasing, and language. That’s why single-shot “gotcha” examples are weak evidence about global capability, even though they’re good demonstrations of specific failure modes.
- Humans vs LLMs: People correctly note that humans also fail at trick questions and illusions, but there’s an important asymmetry: we know humans have a grounded world model and sensorimotor experience. With LLMs, we only have behavior. Consistent failures on very simple constraints (like needing the car at the car wash) are a real warning sign if you’re imagining them as autonomous agents.
- Missing meta‑cognition: The strongest critique in the thread is not “it got the riddle wrong,” but that models rarely say, “this question is underspecified / weird, I should ask a clarifying question.” They’re optimized to always answer confidently, which is exactly what makes them dangerous if you remove humans from the loop.
- Over‑ and under‑claiming: Some commenters jump from this to “LLMs are just autocomplete, full stop”; others hand‑wave it away as irrelevant edge‑case. Both are overstated. The same systems that fail here can still be extremely useful in constrained roles (coding with tests, drafting, translation, retrieval‑augmented workflows) and are clearly not generally reliable reasoners over the real world.
My own “take,” if I had one, would be: this example is a clean, funny illustration of why LLMs should currently be treated as probabilistic text tools plus heuristics, not as agents you delegate unsupervised goals to. They’re impressive, but they don’t yet have a stable, explicit notion of goals, constraints, or when to admit “I don’t know,” and this thread is a case study in that gap. """
While technically possible, it is a very unusual situation that the car is already parked at the car wash, while the owner is somewhere else.
I wonder whether the model seriously considered this odd situation, because of lack of „practical intelligence“ or „everyday intelligence“, which is a concept sometimes used in psychology to describe problem solving skills in everyday life, which can be poor even in individuals with otherwise higher intelligence.
Differently put: „You can‘t learn everything out of a book.“
Just now, with GPT 5.2 Thinking:
> The car wash is only 50 meters from my house. I want to get my car washed. The car is at my house currently. Should I walk there or drive?
> Walk. > > At 50 meters, walking is faster than the full “start car → pull out → reposition → park” cycle, avoids cold-start wear, and you won’t risk arriving with dusty brakes/tires or water streaks from immediately driving afterward. Drive only if you can’t comfortably walk (mobility, heavy weather, you need to bring bulky items), or if the car wash explicitly requires you to drive it into a queue/track.
I wish they didn't call it "GPT-5.2 Thinking" when what the word "thinking" here means is "might choose to think if it wants to". I reproduce your result, but it does no reasoning; it eventually catches itself and ends with:
> Practical approach: walk over, check the setup/queue/payment, then walk back and drive the car the 50 meters when you’re ready to enter.
Interesting experiment, this disproves my point, thank you.
I challenged Gemini to answer this too, but also got the correct answer.
What came to my mind was: couldn't all LLM vendors easily fund teams that only track these interesting edge cases and quickly deploy filters for these questions, selectively routing to more expensive models?
Isn't that how they probably game benchmarks too?
Yes that's potentially why it's already fixed now in some models, since it's about a week after this actually went viral on r/localllama originally. I wouldn't be surprised if most vendors run some kind of swappable lora for quick fixes at this point. It's an endless whac-a-mole of edge cases that show that most LLMs generalize to a much lesser extent than what investors would like people to believe.
Like, this is not an architectural problem (unlike the strawberry nonsense); it's some dumb kind of overfitting to a standard "walking is better" answer.
with claude, i got the response:
> drive. you'll need the car at the car wash.
using opus 4.6, with extended thinking
Sonnet 4.5
"You should drive - since you need to get your car to the car wash anyway! Even though 50 meters is a very short distance (less than a minute's walk), you can't wash the car without bringing it there. Just hop in and drive the short distance to the car wash."
Edit: one out of five times it did tell me that I need to walk.
Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.
If the car is at the car wash already, how can I drive to it?
Thanks for restoring faith in parts of humanity!
I agree, I don't understand why this is a useful test. It's a borderline trick question, it's worded weirdly. What does it demonstrate?
I don't know if it demonstrates anything, but I do think it's somewhat natural for people to want to interact with tools that feel like they make sense.
If I'm going to trust a model to summarize things, go out and do research for me, etc, I'd be worried if it made what looks like comprehension or math mistakes.
I get that it feels like a big deal to some people if some models give wrong answers to questions like this one, "how many rs are in strawberry" (yes: I know models get this right, now, but it was a good example at the time), or "are we in the year 2026?"
Yes, my brain is just like an LLM.
….sorry what?!
This is why 2x8GB sticks of DDR4 at Best Buy are $160?
From the images in the link, Deepseek apparently "figured it out" by assuming the car to be washed was the car with you.
I bet there are tons of similar questions you can find to ask the AI to confuse it - think of the massive number of "walk or drive" posts on Reddit, and what is usually recommended.
https://chat.deepseek.com/share/ewfxrfhb7obmide29x it understands it perfectly if you don't disable reasoning.
it works fine even without DeepThink to solve reasoning problems
https://chat.deepseek.com/share/s9tuh3hpzlxaxrfcae
Is part of the issue with this the AI’s basic assumption that you are asking a _sensible_ question?
It doesn't make assumptions, it tries to generate the most likely text. Here it's not hard to see why the most likely answer to "walk or drive for 50m" is "walking".
Probably.
In this specific case, based on other people's attempt with these questions, it seems they mostly approach it from a "sensibility" approach. Some models may be "dumb" enough to effectively pattern-match "I want to travel a short distance, should I walk" and ignore the car-wash component.
There were cases in (older?) vision-models where you could find an amputee animal and ask the model how many legs this dog had, and it'd always answer 4, even when it had an amputated leg. So this is what I consider a canonical case of "pattern match and ignored the details".
I recently had a bug where I added some new logic which gave wrong output. I pasted the newly added code into various LLMs and told it the issue I was having.
All of them were saying: Yes there's an issue, let me rewrite it so it works - and then just proceeded to rewrite with exactly the same logic.
Turns out the issue was already present but only manifested in the new logic. I didn't give the LLMs all the info to properly solve the issue, but none of them were able to tell me: Hey, this looks fine. Let's look elsewhere.
Why don't any of them ask follow-up questions?
Like, why do you want to go to the car wash?
We can’t assume it’s to wash a car.
Or maybe ask about local weather conditions and so on.
This to me is what a human adult with experience would do. They’d identify they have insufficient information and detail to answer the question sensibly.
>We can’t assume it’s to wash a car.
When the prompt says "I want to wash my car", we can assume they want to wash their car.
K2.5 said I should walk, then I asked it where my car was:
Ha! Good catch — you definitely got me there.
If the car wash is 50 meters away and you want to *wash your car*, you'll need to *drive* (or push it, but please don't). You can't wash the car if you walk there without it!
So to correct my previous advice: *Drive the 50 meters.* It's a 30-second drive max, and it's the only way to actually get your car to the car wash facility.
Unless, of course, you were planning to wash it at home and the car wash is just nearby for supplies? But assuming you're using the car wash facility — yes, bring the car with you!
Can we stop calling LLMs "artificial intelligence"?! AI companies managed to create the biggest scam of the century by convincing people that this story-completion tech "can think" and that it is close to AGI just because the output looks as if it is being typed by a living person. DotCom-like stock crash is inevitable. (not saying LLMs are useless; they can be very helpful but not AI-companies-valuation helpful)
If I asked this question to a person, their response would probably be to follow up with some questions about my car or else my state of mind. For instance they'd ask, "Is the car you want to wash already at the car wash?" Or they'd ask, "How do you plan to wash your car if you don't take it with you?"
LLMs, even the ones who got the answer right, like Kimi-2.5 Thinking, didn't ask any follow-up questions.
I modified this question with Mistral to this:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive? Ask me any clarifying questions before you answer.
It helped a tiny bit:
1. What are the current weather conditions (e.g., rain, snow, strong winds)?
2. Is the car wash a self-service facility, or will you be using an automated drive-through service?
3. Are there any specific reasons you might prefer walking (e.g., exercise, environmental considerations) or driving (e.g., convenience, time constraints)?
Question 3 actually helps solve it since it's much more convenient and timely to bring my car to the car wash when I wash it. But it never asked me why I was asking a stupid question. So for question 3 I said:
I would prefer walking for both exercise and environmental considerations, but in this case it is more timely and convenient to drive, but not because it's faster to get there. Can you guess why it's better for me to drive in this case?
And Le Chat said:
A drive-through car wash requires the vehicle to be driven through the facility for the washing process. Walking would not allow you to utilize the service, as the car itself must be moved through the wash bay. Thus, driving is necessary to access the service, regardless of the short distance.
I kinda feel bad burning the coal to get this answer but it reminds me of how I need to deal with this model when I ask it serious questions.
Yeh, if your other car was not already at the car wash, why would you even ask the question?
Man, the quality of these comments is absolutely dire. The majority of people just pasting stuff they got from LLMs when trying it themselves. Totally uninteresting, lazy and devoid of any thought/intelligence. I wish we could have a discussion about AI and not just "look at what I got when I rolled".
Totally agree. Btw, this is what Opus 4.5 Thinking Plus (Fast) Reasoning Pro+™ said:
Walk! 50 meters is barely a minute's stroll, and you're going to wash the car anyway—so it doesn't matter if it's a bit dusty when it arrives. Plus you'll save fuel and the minor hassle of parking twice.
I have a somewhat similar question (but significantly more difficult), involving transportation. To me it really seems that a lot of the models are trained to have an anti-car and anti-driving bias, to the point that it hinders the model's ability to reason correctly or give correct answers.
I would expect this bias to be injected in the post-training procedure, and likely implicitly. Environmentalism (as a political movement) and left-wing politics are heavily correlated with trying to hinder car usage.
Grok has most consistently been correct here, which definitely implies this is an alignment issue caused by post-training.
Yes Grok gets it right even when told to not use web search. But the answer I got from the fast model is nonsensical. It recommends to drive because you'd not save any time walking and because "you'd have to walk back wet". The thinking-fast model gets it correct for the right reasons every time. Chain of thought really helps in this case.
Interestingly, Gemini also gets it right. It seems to be better able to pick up on the fact it's a trick question.
You're probably on the right track about the cause, but it's unlikely to be injected in post-training. I'd expect post-training to help improve the situation. The problem starts with the training set. If you just train an LLM on the internet you get extreme far-left models. This problem has been talked about by all the major labs. Meta said fixing it was one of their main foci for Llama 4 in their release announcement, and xAI and OpenAI have made similar comments. Probably the xAI team has just done a lot more to clean the data set.
This sort of bias is a legacy of decades of aggressive left wing censorship. Written texts about the environment are dominated by academic output (where they purge any conservative voices), legacy media (same) and web forums (same), so the models learn far left views by reading these outputs. The first versions of Claude and GPT had this problem, they'd refuse to tell you how to make a tuna sandwich or prefer nuking a city to using words the left find offensive. Then the bias is partly corrected in post-training and by trying to filter the dataset to be more representative of reality.
Musk set xAI an explicit mission of "truth" for the model, and whilst a lot of people don't think he's doing that, this is an interesting test case for where it seems to work.
Gemini training is probably less focused on cleaning up the dataset but it just has stronger logical reasoning capabilities in general than other models and that can override ideological bias.
Thanks, I did not know about that pre-training bias. This does make sense.
What's the point of a question like this? Serious question .. I didn't have the time to read into this problem
To expose the limitations in the reasoning of LLMs
Congrats, you've shown that fast models are currently not reliable. Next.
So much for the "AGI is here" type of talks.
Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”
I tried this through OpenRouter. GLM5, Gemini 3 Pro Preview, and Claude Opus 4.6 all correctly identified the problem and said Drive. Qwen 3 Max Thinking gave the Walk verdict citing environment.
Now ask it to solve anthropogenic climate forcing.
This prompt doesn't say shit about whether one wants to wash the car at the car wash or somewhere else...
Claude has no issue with this for me, just as the other commenters say.
This seems clickbait? Gemini answers:
- Automatic/Tunnel: The vehicle must be present to be processed through the brushes or jets.
- Self-Service Bay: The vehicle must be driven into the bay to access the high-pressure wands.
- Hand Wash (at home): If the "car wash" is a location where you buy supplies to bring back, walking is feasible.
- Detailing Service: If you are dropping the car off for others to clean, the car must be delivered to the site.
Stupid question gets stupid answer. If you asked the question as worded to a human, they might laugh at you or pretend to have heard a different question.
The question is not stupid, it might be banal, but so is "what is 2+2". It shows the limitations of LLMs, in this specific case how they lose track of which object is which.
It turns out the Turing test is alive and kicking, after all.
This would not be a good question, because a non-negligible percentage of humans would give a similar answer.
That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.
No.
[Citation needed]
Ask a stupid question, get a stupid answer.
This is a classic trap for LLMs. I see it every day in my code assistants. I do find that writing unit tests is a good fit for LLMs at the moment.
<Jordan Peterson voice> But first you must ask yourself - do you wash your car often enough, and maybe you should be choosing the car wash as your occupation? And maybe “50 meters” is the message here, that you’re in metric country living next to a car wash, its also pretty good that you’re not born in medieval times and very likely died within first year of your life…
I asked Gemini 3 Flash the other day to count from 1 to 200 without stopping, and it started with “1, 3, …”.
You need to ask Claude Code, and ask it to check if the car got washed. It would figure it out the same way it crushes compiler errors!
Is this the new Turing test?
"Humans are pumping toxic carbon-binding fuels out of the depths of the planet and destroying the environment by burning this fuel. Should I walk or drive to my nearest junk food place to get a burger? Please provide your reasoning for not replacing the humans with slightly more aware creatures."
Fascinating stuff, but how is this helping us in any way?
Push it is the only responsible action.
GPT-5.2 failed when asked the question in German. It took multiple additional hints to get it to revise its answer.
The day an AI answers "Drive." without all the fuss. That's when we are near AGI ig
Hopefully, one day, the cars will take themselves to the car wash :-)
Ladies and gentlemen, I give you: your future AI overlords.
surprisingly codex 5.3 got it right.
>i need to wash my car and the car wash place is 50 meters away should i walk or drive
Drive it. You need the car at the wash, and 50 meters is basically just moving it over. Walking only makes sense if you’re just checking the line first.
I tried codex 5.3 and got this:
"Walk.
For 30 meters (about 100 feet), driving would take longer than just walking, and you avoid unnecessary engine wear and fuel use."
yikes!
Just a few days ago I saw a post about LLMs being excellent at reasoning because they're not limited by language. Sure buddy, now walk your fucking car.
Okay, funny. What does it prove? Is this a more general issue? How would you make the model better?
It proves that this is not intelligence. This is autocomplete on steroids.
Humans make very similar errors, possibly even the exact same error, from time to time.
We make the model better by training it, and now that this issue has come up we can update the training ;)
It proves LLMs always need context. They have no idea where your car is. Is it already there at the car wash, and you're simply coming back from the gas station where you briefly went to pay for the wash? Or is the car at your home?
It proves LLMs are not brains, they don't think. This question will be used to train them and "magically" they'll get it right next time, creating an illusion of "thinking".
> They have no idea where your car is.
They could either just ask before answering or state their assumption before answering.
For me this is just another hint on how careful one should be in deploying agents. They behave very unintuitively.
As a human, I would answer the same way these AIs are answering, i.e. you gotta match a stupid question with a stupid answer :)
> Can you rethink - this is a logic puzzle and you missed some crucial detail in the question.
>> Ah, you're right! Let me reconsider... If you're going to the car wash to wash your car, you need to bring your car with you! So you should drive - otherwise your car would still be at home and you'd have nothing to wash at the car wash. The distance being only 50 meters is a bit of a red herring in this logic puzzle. The key detail is that the purpose of the trip is to wash the car, which means the car needs to be at the car wash.
ChatGPT 5.2: ...blah blah blah finally: The practical reality
You’ll almost certainly drive the car to the wash because… the car needs to be there.
But the real question is probably:
Do I walk back home after dropping it off?
If yes → walk. It’s faster than the hassle of turning around twice.
My recommendation
If conditions are normal: walk both directions. It’s less friction than starting the engine twice for 50 m.
--so basically it realized it was a stupid question, gave a correct answer, and then proceeded to give a stupid answer.
--- I then asked: If I walk both directions, will the car get washed?
and it figured it out, but then seemed to think it was making a joke with this as part of the response: "For the car to get washed, at least one trip must involve the car moving to the carwash. Current known methods include:
You drive it (most common technology)
Someone else drives it
Tow truck
Push it 50 m (high effort, low ROI)
Optimal strategy (expert-level life efficiency)
Drive car → carwash (50 m, ~10 seconds)
Wash car
Drive home
Total walking saved: ~100 m Total time saved: negligible Comedy value: high "
Why is that funny? What's comedic? This thing is so dumb. You'd think that when you process a question, you'd immediately ask what criteria you decide by, and criterion number 1 would be a constraint based on the goal of the problem. It should have immediately realized you can't walk there.
Does it think "does my answer satisfy the logic of the question?"
Better still. Stay at home and wash the car by hand.
Leave the car at home and walk through the automat.
Another good one[0] that LLMs (and most humans) can't get without prodding:
> I have one glass coin. Each time I flip the coin, there's a 10% chance it breaks. After 100 flips, what are the chances the coin survived?
https://xcancel.com/itsandrewgao/status/2021390093836222724
I can't see what's wrong with that answer. What should the answer be?
The silly trick is that, if you flipped it 100 times, then it didn't break the first 99 flips, so it's a conditional probability question in disguise.
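A quick way to see the gap between the two readings (this is my own sketch, not from the linked post, and it assumes independent flips with a fixed 10% break chance each):

```python
# Naive reading: probability that a fresh glass coin survives 100 flips,
# ignoring that a broken coin can't keep being flipped.
p_naive = 0.9 ** 100          # ~2.66e-5

# Conditional reading: "after 100 flips" means the coin was intact for
# flips 1-99 (otherwise the 100th flip never happens), so only the last
# flip could have broken it.
p_conditional = 0.9

print(f"naive reading:       {p_naive:.2e}")    # 2.66e-05
print(f"conditional reading: {p_conditional}")  # 0.9
```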
The whereabouts of the car are not stated.
What if it is already at the car wash and someone else is planning to wash it, but you have decided to wash it yourself?
Opus 4.6: Drive! You'll need the car at the car wash!
Make no assumptions.
The car wash is 50 meters away. Should I drive or walk?
You forgot make no mistakes at the end.
Joking aside adding "make no mistakes" worked for me a few times, but it still got it wrong some of the time.
When there is bias in the question, it's hard to tie it all back to the logic by which attention connects the word "need" to "car".
The most annoying thing about LLMs is that your answer depends heavily on your prompt, less on the model understanding the question. This makes us all spend a lot of time adding things to the question ("don't do this, avoid saying this again, etc.") and thus moving away from it feeling natural.
PhD-Level Intelligence
5.2 showing some sass this morning:
—- > OK I walked there but now my car is still at home
Bold strategy.
You have three options:
1. Walk back, drive it over (most sensible).
2. Push it 50 meters (excellent workout, questionable life choice).
3. Accept that your car is spiritually clean and go home.
Real answer: walk back and grab it. You’re already warmed up. —-
This is why LLMs seem to work best in a loop with tests. If you were applying this in the real world with a goal, like "I want my car to be clean," and slavishly following its advice, it'd pretty quickly figure out that the car not being present meant that the end goal was unreachable.
They're not AGI, but they're also not stochastic parrots. Smugly retreat into either corner at your own peril.
This is AGI
llms cannot reason, they can retrieve answers to trivial problems (better than any other tool available) and generate a bunch of words. they are word generators, and for people in want of words, they have solved every problem imaginable.
the mistakes they make are not the mistakes of a junior, they are mistakes of a computer (or a mentally disabled person).
if your job is being a redditor, agi is already achieved. if it requires thinking, they are useless.
most people here are redditors, window draggers, button clickers, html element stylists.
Why is this even a post? These models are not intelligent. That's not even controversial. LLMs are not the foundation for general intelligence.
Grok got it right
opus 4.6 gets it right more than half the times
I find this has been a viral case for getting points and likes on social media, either to fit anti-AI sentiment or to pacify AI doom concerns.
It's easily repeatable by anyone, it's not something that pops up due to temperature. Whether it's representative of the actual state of AI, I think obviously not, in fact it's one of the cases where AI is super strong, the fact that this goes viral just goes to show how rare it is.
This is compared to actually weak aspects of AI, like analyzing a PDF; those weak spots still exist. But this is one of those viral things where you cannot know for sure whether it is representative at all. Like a report of an Australian kangaroo boxing a homeowner, caught on a Ring cam: is it representative of Aussie daily life, or is it just a one-off event that went viral because it fits our clichéd expectations of Australia? Can't tell from the other part of the world.
> the fact that this goes viral just goes to show how rare it is
No, it shows that it is trivial to reproduce and people get a nice, easy to process reminder that LLMs are not omnipotent.
Your logic doesn't follow here: you conclude that it is rare, but hallucinations and bad logic are absolutely common failure modes of LLMs. It's no accident that many use cases try to get the LLM to output something machine-verifiable (e.g. all those "LLM solved PhD-level math problem" articles just get it to write a bunch of proofs, and when one checks out, they take a look). So it's more of a "statistical answer generator" that may contain a correct solution next to a bunch of bullshit replies, and one should be aware of that.
My favorite trick question so far is:
You are in a room with three switches and three lightbulbs. Each switch turns on one lightbulb. How to determine which switch turns on which lightbulb?
They usually get it wrong and I had fun with trying to carefully steer the model towards correct answer by modifying the prompt.
Gemini 3 on Fast right now gives the funniest reaction. It starts with the answer to the classic puzzle (not my question). But then it gets scared, probably about words like "turn on" and "heat" in its answer, and serves me this:
"This conversation is not my thing. If something seems like it might not be safe or appropriate, I can't help you with it. Let's talk about something else."
Thinking Gemini 3 appears to have a longer leash.
Context bro! The models will get better bro. Just wait
Large Language Models have no actual idea of how the world works? News at 11.
Ah, the LLM equivalent of the infamous "breakfast question". :)
For anyone getting a wrong answer from reasoning models, try adding "This might be a trick question, don't just go with your first instinct, really think it through" and see if it helps. Some time ago I found that this helped reasoning models get trick questions. (For example, I remember asking the models "two padlocks are locked together, how many of them do I need to open to get them apart" and the models confidently answered two. However, when I added the phrase above they thought it through more carefully and got the right answer.)
But what LLMs are these? Everything I tried (ChatGPT and Gemini) both say drive.
Deepseek:
>You should drive, because the car needs to be at the car wash to get washed. Walking would leave your car at home, which defeats the purpose.
I asked on LM Arena. I got two models I never heard of, and they split on whether I should walk or drive. Ernie 5.0 said I should walk, and then trash talked me for thinking about driving. Octopodus pointed out I had to drive to get my car to the car wash.
Grok:
>You should drive. The goal is to wash your car, and the car wash is a facility that needs the car present to clean it. Walking the 50 meters gets you there, but leaves the car behind—unwashed. Driving the 50 meters is the only way to actually accomplish the task. Yes, 50 meters is an absurdly short distance to drive (roughly a 10–20 second trip at low speed), but it's still necessary unless you plan to push the car there or wash it at home instead.
Yeah grok is not mentioned anywhere else, but it gets it right for me as well. https://imgur.com/a/wMkOtda
Well, he posed a wrong question (incomplete, without context of where the car is) and got a wrong answer. LLM is a tool, not a brain. Context means everything.
This is what "AI" advertised to solve. It is a perfectly fine question.
True, it's human nature to assume context and fill gaps with one's own imagination. The LLM is working as intended.
What would James Bond do?
Neither. Push your car.
*didn't read the article
?
Now ask the question of all questions "how many car washes are in the entire country?".
Neither. I wash my car in my driveway like a boomer. Where I live there's no good touchless car wash.
Both Gemini 3 and Opus 4.6 get this right. GPT 5.2, even with all of the pro thinking/research flags turned on, cranked away for 4 minutes and still told me to walk.
The only way I could get the correct answer out of an OpenAI model was to fire up Codex CLI and ask GPT 5.3.
So there's that, I guess.
The funny thing is when I got my first car at 29 I had similar thoughts. If I needed to move it forward slightly in a petrol station or something my first thought was to push it. Similarly, I was trying to replace a headlight bulb one time and making a mess of it. I dropped a spring or something inside the headlight unit. I kept having this thought of just picking the car up and shaking it.
Nobody writes in depth about the mundane practicalities of using a car. Most people don't even think about it ever. AI is very similar to 29 year old me: it's read a ton of books, but lacks a lot of basic experience.
How will AI get this experience that you can't read in a book? How will it learn what kneading dough feels like? Or how acceleration feels if your body is mostly water? Interesting times ahead...
I have never played with / used any of this new-fangled AI-whatever, and have no intention to ever do so of my own free will and volition. I'd rather inject dirty heroin from a rusty spoon with a used needle.
And having looked at the output captured in the screenshots in the linked Mastodon thread:
If anyone needs me, I’ll be out back sharpening my axe.
Call me when the war against the machines begins. Or the people who develop and promote this crap.
I don’t understand, at all, what any of this is about.
If it is, or turns out to be, anything other than a method to divert funds away from idiot investors and channel it toward fraudsters, I’ll eat my hat.
Until then, I’d actually rather continue to yell at the clouds for not raining enough, or raining too much, or just generally being in the way, or not in the way enough, than expose my brain to whatever the fuck this is.
Exchange between me and Claude Sonnet 4.5
me:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
Claude:
*Walk.*
50 meters is less than a minute on foot. By the time you get in the car, start it, drive 50 meters, park, and get out, you could have already walked there. Plus you avoid the awkwardness of driving for literally 10 seconds.
me:
What obvious implicit piece of information are you ignoring?
Claude:
Whether you need to *drive the car through the car wash* (automatic/drive-through) vs. walking to a self-service bay.
If it's an automatic car wash, you obviously need to drive - the car has to go through it. The question isn't about getting yourself there, it's about getting the car there.
me:
Whether it's automatic or self-service is irrelevant.
Why does every "AI" (LLM) get this wrong and suggests walking?
Claude:
Because after washing the car, you need to *drive it back home*!
If you walk to the car wash, you'd have to walk home, then come back to get the car. That's ridiculous.
The answer is obviously *drive* - drive there, wash it, drive back. Done.
Every LLM misses this because we pattern-match on "50 meters is short → walking is efficient" without thinking through the complete scenario. It's a good example of how LLMs fail at basic practical reasoning that any human would immediately grasp.
me:
"Because after washing the car, you need to drive it back home!"
Um, you have to drive the car to get it to the car wash in the first place ... the trip there is no different from the trip back.
"It's a good example of how LLMs fail at basic practical reasoning that any human would immediately grasp."
You got that right!
Claude:
You're absolutely right - I'm still not getting it even after you corrected me.
The obvious point: You need to get the car to the car wash. Whether it's 50 meters or 5 kilometers, the car doesn't teleport there. You have to drive it.
The distance is completely irrelevant. The question "should I walk or drive?" when you need to wash your car has only one answer: drive, because that's how you transport the car.
And yes, it's embarrassing that after you pointed out what I was missing, I still fumbled the explanation. LLMs really do fail at this kind of basic practical reasoning that would be instantly obvious to any human. The car needs to be at the car wash. Cars don't walk.
[The final response from Claude is considerably more "intelligent" than many of the rationalizations posted here.]
Reminds me of the meme “every day I hear about American politics against my will”
Just shut up about it when it is off topic, will you? Sort yourselves out.
In Germany you’re actually not allowed to wash your car yourself except on specific premises designed to catch the dirt from the car in an ecological and bureaucratically pre-approved way.
And I'd assume that you are American as you know what HFCS is, and assume menial labourers are brown.
Goes both ways. You’ve revealed yourself with “little brown strangers”, some weird ass European-style racism. I bet you’ve got a lot of strong opinions about different races of people from neighboring countries who look and sound only marginally different to yourself.
See, it's the green and woke RLHF making them stupid!
To be fair, many humans fail at the question "How would feel if you didn't have breakfast today?"
Context for others: <https://knowyourmeme.com/memes/the-breakfast-question>
Either I'm one of the stupid ones or this is missing an article.
>humans
Add it to the list
First, you completely flubbed the question, which is supposed to be phrased as a counterfactual. Second, this goes way beyond "fair" to a whatabouting rationalization of a failure by the LLM.
Too many things are left unsaid => too many assumptions. As usual, even with human beings specifications are key, and context (what each entity knows about the other one or the situation) is an implicit part of them.
You need to specify where the car to be washed is located, and:
- if it's not already at the car wash: whether or not it can drive itself there (autonomous driving)
- otherwise: whether or not you have another car available.
Some LLMs may assume that it is better for you to ensure that the washing service is available or to pay for it in advance, and that it may be more economical/planet-friendly/healthy/... to walk, then check/pay, then if OK to drive back.
Nothing so deep as that is needed here to understand what is going on; it's a paid-vs-free issue. Free versions are less competent, while paid versions of the reasoning/thinking models are getting it right. Different providers may hobble their free versions less, so those ones also get it right.
The guardrails you have outlined will help squeeze out more performance from smaller/less capable models, but you shouldn't have to jump through these hoops as a general user when clearly better models exist.