Comment by KronisLV
8 days ago
> Us having to specify things that we would never specify when talking to a human.
The first time I read that question I got confused: what kind of question is that? Why is it being asked? It should be obvious that you need your car to wash it. The fact that it is being asked in my mind implies that there is an additional factor/complication to make asking it worthwhile, but I have no idea what. Is the car already at the car wash and the person wants to get there? Or do they want to idk get some cleaning supplies from there and wash it at home? It didn't really parse in my brain.
I would say the proper response to this question is not "walk, blablablah" but rather "What do you mean? You need to drive your car to have it washed. Did I miss anything?"
Yes, this is what irks me about all the chatbots, and the chat interface as a whole. It is a chat-like UX without a chat-like experience. Like you are talking to a loquacious autist about their favorite topic every time.
Just ask me a clarifying question before going into your huge pitch. Chats are a back & forth. You don’t need to give me a response 10x longer than my initial question. Etc.
I think for "GPT-4o is my life partner" reasons, labs are a little bit icy about making the models overly human.
> You don’t need to give me a response 10x longer than my initial question.
Except, of course, when that is exactly what the user wants.
> Like you are talking to a loquacious autist about their favorite topic every time
That's the best part.
With ChatGPT, at least, you can tell the bot to work that way using [persistent] Custom Instructions, if that's what you want. These aren't obeyed perfectly (none of the instructions are, AFAICT), but they do influence behavior.
A person can even hammer out an unstructured list of behavioral gripes, tell the bot to organize them into instructional prose, have it ask clarifying questions and revise based on answers, and produce directions for integrating them as Custom Instructions.
From then on, it will invisibly read these instructions into context at the beginning of each new chat.
Mold it and steer it to be how you want it to be.
(My own bot tends to be very dry, terse, non-presumptuous, pragmatic, and profane. It's been years now since it has uttered an affirmation like "That's a great idea!" or "Wow! My circuits are positively buzzing with the genius I'm seeing here!" or produced a tangential dissertation in response to a simple question. But sometimes it does come back with functional questions, or phrasing like "That shit will never work. Here's why.")
This. Nailed it.
That’s why I don’t understand why LLMs don’t ask clarifying questions more often.
In a real human to human conversation, you wouldn’t simply blurt out the first thing that comes to mind. Instead, you’d ask questions.
This is a great point, because when you ask it (Claude) if it has any questions, it often turns out it has lots of good ones! But it doesn't ask them unless you ask.
That's because it doesn't really have any questions until you ask it whether it does.
You can get it to change by putting instructions to ask questions in the system prompt, but I found it annoying after a while.
Because 99% of the time it's not what users want.
You can get it to ask you clarifying questions just by telling it to. And then you usually just get a bunch of questions asking you to clarify things that are entirely obvious, and it quickly turns into a waste of time.
The only time I find that approach helpful is when I'm asking it to produce a function from a complicated English description I give it where I have a hunch that there are some edge cases that I haven't specified that will turn out to be important. And it might give me a list of five or eight questions back that force me to think more deeply, and wind up being important decisions that ensure the code is more correct for my purposes.
But honestly that's pretty rare. So I tell it to do that in those cases, but I wouldn't want it as a default. Especially because, even in the complex cases like I describe, sometimes you just want to see what it outputs before trying to refine it around edge cases and hidden assumptions.
Google Gemini often gives an overly lengthy response, and then at the end asks a question. But the question seems designed to move on to some unnecessary next step, possibly to keep me engaged and continue conversing, rather than seeking any clarification on the original question.
This is a topic that I’ve always found rather curious, especially among this kind of tech/coding community that really should be more attuned to the necessity of specificity and accuracy. There seems to be a base set of assumptions that are intrinsic to, and a component of, ethnicities and cultures: the things one can assume one “would never specify when talking to a human [of one’s own ethnicity and culture].”
It’s similar to the challenge that foreigners have with the cultural references, idioms, and figurative speech that a culture has a mental model of.
In this case, I think what is missing is a set of assumptions based on logic, e.g., when stating that someone wants to do something, it is assumed that all necessary components will be available, accompany the subject, etc.
I see this example as really not all that different from a meme that was common, I think, in the 80s and 90s: people would forget to buy batteries for Christmas toys even though it was clear they would be needed for an electronic toy. People failed that basic test too, and those were humans.
It is odd how people are reacting to AI not being able to answer these kinds of trick questions, while if you posted something similar about how you tricked some foreigners you’d be called racist, or people would laugh if it was some kind of new-guy hazing.
AI is from a different culture and has just arrived here. Maybe we should be more generous and humane… most people are not humane though, especially the ones who insist they are.
Frankly, I’m not sure it bodes well for how people would respond if aliens ever arrive on Earth; and AI is arguably only marginally different from humans, something alien life that could make it to Earth surely would not be.
AI isn’t “from a different culture”. It doesn’t have culture. Any culture it does have is what it has sucked up from its training data and set in its weights.
There is no need to be “humane” to AI because it possesses no humanity. It has no personhood at all. It can’t feel. You can’t be inhumane to something that is literally incapable of feeling.
A blade of grass has more humanity and is more deserving of respect than anything being referred to as AI does.
Aliens might not be received well but it’s going to depend a lot on how they show up.
AI is a “revolution” where the promise is that nobody will have to do meaningless work anymore (I guess).
The only problem is that right now basically everyone has to do work, meaningful or “meaningless”, because the dominant thinking requires it for human survival. Weird how most people aren’t happy about the thing that is pitched to take away the meager scraps they get under the current regime.
> A blade of grass has more humanity and is more deserving of respect than anything being referred to as AI does.
Emphatically disagree.
Even ignoring the obvious absurdity in this statement by pointing out that an LLM is emulating a human (quite well!) and a blade of grass is not:
I don't trust any human who can interact with something that uses the same method of communication as a human, and for all intents and purposes communicates like a human, and not feel any instinct to treat it with respect.
This is the kind of mindset that leads to dehumanizing other humans. Our brain isn't sophisticated enough to actually compartmentalize that - building the habit that it's right to treat something that talks like a sapient as if it deserves zero respect is going to have negative consequences.
Sure, you can believe it's just a tool, and consciously let yourself treat it as one. But treat it like an incompetent intern, not a slave.
Whether you view the question as nonsensical, the simplest example of a riddle, or even an intentional "gotcha" doesn't really matter. The point is that people are asking LLMs very complex questions where the details are buried even deeper than in this simple example. The answers they get could be completely incorrect, flawed approaches/solutions/designs, or just mildly misguided advice. People are then taking this output and citing it as proof, or even as objectively correct. I think there are a ton of reasons this could be, but a particularly destructive one is that responses are designed to be convincing.
You _could_ say humans output similar answers to questions, but I think that is being intellectually dishonest. Context, experience, observation, objectivity, and actual intelligence are clearly important, and not something the LLM has.
It is increasingly frustrating to me that we cannot just use these tools for what they are good for. We have, yet again, allowed big tech to go balls deep into ham-fisting this technology irresponsibly into every facet of our lives in the name of capital. Let us not even go into the finances of this shitshow.
Yeah people are always like "these are just trick questions!" as though the correct mode of use for an LLM is quizzing it on things where the answer is already available. Where LLMs have the greatest potential to steer you wrong is when you ask something where the answer is not obvious, the question might be ill-formed, or the user is incorrectly convinced that something should be possible (or easy) when it isn't. Such cases have a lot more in common with these "nonsensical riddles" than they do with any possible frontier benchmark.
This is especially obvious when viewing the reasoning trace for models like Claude, which often spend a lot of time speculating about the user's "hints" and trying to parse out the user's intent in asking the question. Essentially, the mental model I use for LLMs these days is to treat them as very good "test takers" with limited open-book access to a large swathe of the internet. They are trying to ace the test by any means necessary and love to take shortcuts that don't require actual "reasoning" (which burns tokens and fills up the context window, decreasing accuracy overall). For example, when asked to read a full paper, focusing on the implications for some particular problem, Claude agents will try to cheat by skimming until they get to a section that feels relevant, then searching directly for some words they read in that section. They will do this even if told explicitly that they must read the whole paper. I assume this is because, for the vast majority of the questions they are trained on, this sort of behavior maximizes their reward function (though I'm sure I'm getting lots of details wrong about the way frontier models are trained, I find it very unlikely that the kinds of prompts these agents get very closely resemble data found in the wild on the internet pre-LLMs).