Comment by keiferski
2 days ago
The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.
LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.
LLMs all behave as if they are semi-competent (yet eager, ambitious, and career-minded) interns or administrative assistants, working for a powerful CEO-founder. All sycophancy, confidence and positive energy. "You're absolutely right!" "Here's the answer you are looking for!" "Let me do that for you immediately!" "Here is everything I know about what you just mentioned." Never admitting a mistake unless you directly point it out, and then all sorry-this and apologize-that and "here's the actual answer!" It's exactly the kind of personality you always see bubbling up into the orbit of a rich and powerful tech CEO.
No surprise that these products are all dreamt up by powerful tech CEOs who are used to all of their human interactions being with servile people-pleasers. I bet each and every one of them is subtly or overtly shaped by feedback from executives about how they should respond to conversation.
I agree entirely, and I think it's worthwhile to note that it may not even be the LLM that has that behavior. It's the entire deterministic machinery between the user and the LLM that creates that behavior, with the system prompt, personality prompt, RLHF, temperature, and the interface as a whole.
LLMs have an entire wrapper around them tuned to be as engaging as possible. Most people's experience of LLMs is a strongly social media and engagement economy influenced design.
> "You're absolutely right!" "Here's the answer you are looking for!" "Let me do that for you immediately!" "Here is everything I know about what you just mentioned." Never admitting a mistake unless you directly point it out, and then all sorry-this and apologize-that and "here's the actual answer!" It's exactly the kind of personality you always see bubbling up into the orbit of a rich and powerful tech CEO.
You may be on to something there: the guys and gals that build this stuff may very well be imbuing these products with the kind of attitude that they like to see in their subordinates. They're cosplaying the 'eager to please' element to the point of massive irritation and have left out the one feature that could redeem such behavior: competence.
Maybe it’s just the fact that many models are trained by Americans? I’ve seen great improvements in answers by asking it to “tone it down, answer like you’re British”.
1 reply →
> the guys and gals that build this stuff may very well be imbuing these products with the kind of attitude that they like to see in their subordinates
Or that the individual developers see in themselves. Every team I've worked with in my career had one or two of these guys: When the Director or VP came into town, they'd instantly launch into brown-nose mode. One guy was overt about it and would say things like "So-and-so is visiting the office tomorrow--time to do some petting!" Both the executive and the subordinate have normalized the "royal treatment" on the giving and receiving end.
An alternative is that these patterns just increase the likelihood of the next thing it outputs being correct, and are thus useful to insert during training as the first thing the model says before giving an answer.
3 replies →
Analogies of LLMs to humans obfuscate the problem. LLMs aren't like humans of any sort in any context. They're chat bots. They do not "think" like humans, and applying human-like logic to them does not work.
You're right, mostly, but the fact remains that the behavior we see is produced by training, and the training is driven by companies run by execs who like this kind of sycophancy. So it's certainly a factor. Humans are producing them, humans are deciding when the new model is good enough for release.
15 replies →
It’s not about thinking, it’s about what they are trained to do. You could train an LLM to always respond to every prompt by repeating the prompt in Spanish, but that’s not the desired behavior.
I don't think these LLMs were explicitly designed based on the CEO's detailed input that boils down to 'reproduce these servile yes-men in LLM form please'.
Which makes it more interesting. Apparently reddit was a particularly hefty source for most LLMs; your average reddit conversation is absolutely nothing like this.
Separate observation: That kind of semi-slimy obsequious behaviour annoys me. Significantly so. It raises my hackles; I get the feeling I'm being sold something on the sly. Even if I know the content in between all the sycophancy is objectively decent, my instant emotional response is negative and I have to use my rational self to dismiss that part of the ego.
But I notice plenty of people around me that respond positively to it. Some will even flat out ignore any advice if it is not couched in multiple layers of obsequious deference.
Thus, that raises a question for me: Is it innate? Are all people placed on a presumably bell-curve shaped chart of 'emotional response to such things', with the bell curve quite smeared out?
Because if so, that would explain why some folks have turned into absolute zealots for the AI thing, on both sides of it. If you respond negatively to it, any serious attempt to play with it should leave you feeling like it sucks to high heavens. And if you respond positively to it - the reverse.
Idle musings.
The servile stuff was trained into them with RLHF, with the trainers largely being low-wage workers in the global south. That's also where some of the other quirks, like the excessive em-dashes, came from. I think it's a combination of those workers anticipating how they would be expected to respond by a first-world employer, and also explicit instructions given to them about how the robot should be trained.
5 replies →
This is a really interesting observation. As someone who feels disquiet at the obsequiousness but has been getting used to just mentally skipping over the first paragraph, that's put an interesting spin on my behaviour.
Thanks!
It’s not innate. Purpose-trained LLMs can be quite stubborn and not very polite.
That's the audience! Incompetent CEOs!
Nearly every woman I know who is an English as a second language speaker is leaning hard into these things currently to make their prose sound more natural. And that has segued into them being treated almost as a confidant or a friend.
As flawed as they are currently, I remain astounded that people think they will never improve and that people don't want a plastic pal who's fun to be with(tm).
I find them frustrating personally, but then I ask them deep technical questions on obscure subjects and I get science fiction in return.
4 replies →
As an EE working in engineering 30 years, I ran out of fingers and toes 29 years ago trying to count the number of asocial, incompetent programmer Dark Triads who can only relate to the world through esoteric semantics unrelated to engineering problems right in front of them.
"To add two numbers I must first simulate the universe." types that created a bespoke DSL for every problem. Software engineering is a field full of educated idiots.
Programmers really need to stop patting themselves on the back. Same old biology with the same old faults. Programmers are subjected to the same old physics as everyone else.
The problem with these LLM chat-bots is they are too human, like a mirror held up to the plastic-fantastic society we have morphed into. Naturally programmed to serve as a slave to authority, this type of fake conversation is what we've come to expect as standard. Big smiles everyone! Big smiles!!
Nah. Talking like an LLM would get you fired in a day. People are already suspicious of ass-kissers, they hate it when they think people are not listening to them, and if you're an ass-kisser who's not listening and is then wrong about everything, they want you escorted out by security.
The real human position would be to be an ass-kisser who hangs on every word you say, asks flattering questions to keep you talking, and takes copious notes to figure out how they can please you. LLMs aren't taking notes correctly yet, and they don't use their notes to figure out what they should be asking next. They're just constantly talking.
2 replies →
There is sort of the opposite problem as well, as the top comment was saying, where it can super confidently propose that it's absolutely right and you're wrong instead of asking questions to try to understand what you mean.
Looking forward to living in a society where everyone feels like they’re CEOs.
> LLMs all behave as if they are semi-competent
Only in the same way that all humans behave the same.
You can prompt an LLM to talk to you however you want it to, it doesn't have to be nice to you.
Isn’t it kind of true that the systems we as servile people-pleasers have to operate out of are exactly these? The hierarchical status games and alpha-animal tribal dynamics are these. Our leaders, who are so mighty and rich and powerful, want to keep their position, and we don’t want to admit they have more influence than we do over things like AI now, so we stand and watch naively as they reward the people-pleasers, and eventually (historically) we learn(ed) it pays to please until leadership changes.
> LLMs all
Sounds like you don't know how RLHF works. Everything you describe is post-training. Base models can't even chat, they have to be trained to even do basic conversational turn taking.
> Everything you describe is post-training. Base models can't even chat, they have to be trained to even do basic conversational turn taking.
So, that's still training then, so not 'post-training'. Just a different training phase.
1 reply →
This is partly true, partly false, partly false in the opposite direction, with various new models. You really need to keep updating and have tons of interactions regularly in order to speak intelligently on this topic.
Maybe this is also part of the problem? Once I learn the idiosyncrasies of a person I don't expect them to dramatically change overnight; I know their conversational rhythms and beat, how to ask / prompt / respond. LLMs are like an eager sycophantic intern who completely changes their personality from conversation to conversation, or - surprise - exactly like a machine
1 reply →
> LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text.
Having just read a load of Quora answers like this which did not cover the thing I was looking for: that is how humans on the internet behave, and how people have to write books, blog posts, articles, and documentation. Without the "dance" to choose a path through a topic on the fly, the author has to take the burden of providing all relevant context, choosing a path, explaining why, and guessing at any objections and questions and including those as well.
It's why "this could have been an email" is a bad shout. The summary could have been an email, but the bit which decided on that being the summary would be pages of guessing all the things which what might have been in the call and which ones to include or exclude.
This is a recent phenomenon. It seems most of the pages today are SEO optimized LLM garbage with the aim of having you scroll past three pages of ads.
The internet really used to be efficient, and I could always find exactly what I wanted with an imprecise Google search ~15 years ago.
Don’t you get this today with AI Overviews summarizing everything on top of most Google results?
6 replies →
You'd think with the reputation of LLMs being trained on Twitter (pre-Musk radicalization) and Reddit, they'd be better at understanding normal conversation flow, since Twitter requires short responses, and on Reddit, while walls of text happen occasionally, they're not the typical cadence of the discussion.
2 replies →
Interesting. Like many people here, I've thought a great deal about what it means for LLMs to be trained on the whole available corpus of written text, but real world conversation is a kind of dark matter of language as far as LLMs are concerned, isn't it? I imagine there is plenty of transcription in training data, but the total amount of language use in real conversation surely far exceeds any available written output and is qualitatively different in character.
This also makes me curious to what degree this phenomenon manifests when interacting with LLMs in languages other than English? Which languages have less tendency toward sycophantic confidence? More? Or does it exist at a layer abstracted from the particular language?
That's part of it, but I think another part is just the way the LLMs are tuned. They're capable of more conversational tones, but human feedback in post-training biases them toward a writing style that's more of a Quora / StackOverflow / Reddit Q&A style because that's what gets the best ratings during the RLHF process.
Yes you're totally right! I misunderstood what you meant, let me write six more paragraphs based on a similar misunderstanding rather than just trying to get clarification from you
My favorite is when it bounces back and forth between the same two wrong answers, each time admitting that the most recent answer is wrong and going back to the previous wrong answer.
Doesn't matter if you tell it "that's not correct and neither is ____ so don't try that instead," it likes those two answers and it's going to keep using them.
Ha! Just experienced this. It was very frustrating.
3 replies →
The false info is baked into its context at that point in the conversation, and it will get stuck in a local minimum trying to generate a response to the given context.
Once the context is polluted with wrong information, it is almost impossible to get it right again.
The only reliable way to recover is to edit your previous question to include the clarification, and let it regenerate the answer.
ChatGPT offered a "robotic" personality which really improved my experience. My frustrations were basically decimated right away and I quickly switched to a more "You get out of it what you put in" mindset.
And less than two weeks in they removed it and replaced it with some sort of "plain and clear" personality which is human-like. And my frustrations ramped up again.
That brief experiment taught me two things: 1. I need to ensure that any robots/LLMs/mech-turks in my life act at least as cold and rational as Data from Star Trek. 2. I should be running my own LLM locally to not be at the whims of $MEGACORP.
> I should be running my own LLM
I approve of this, but in your place I'd wait for hardware to become cheaper when the bubble blows over. I have an i9-10900, and bought an M.2 SSD and 64GB of RAM in July for it, and get useful results with Qwen3-30B-A3B (some 4-bit quant from unsloth running on llama.cpp).
It's much slower than an online service (~5-10 t/s), and lower quality, but it still offers me value for my use cases (many small prototypes and tests).
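For anyone who wants to script it rather than use the llama.cpp CLI, a minimal sketch with the llama-cpp-python bindings might look like this (the GGUF filename, context size, and prompts are placeholders, not what I actually run):

    from llama_cpp import Llama

    # Load a local 4-bit GGUF quant; the path is a placeholder, point it at your own file.
    llm = Llama(model_path="Qwen3-30B-A3B-Q4_K_M.gguf", n_ctx=8192)

    # Ask a question with a terse, clarification-friendly system prompt.
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Be terse. Ask one clarifying question if the request is ambiguous."},
            {"role": "user", "content": "Help me sketch a quick prototype for parsing CSV logs."},
        ],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])

It won't be any faster than the CLI, but it makes the small-prototype use case easier to automate.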
In the meantime, check out LLM service prices on https://artificialanalysis.ai/ Open source ones are cheap! Lower on the homepage there's a Cost Efficiency section with a Cost vs Intelligence chart.
I have a 9070 XT (16 GB VRAM) and it is fast with deepseek-r1:14B but I didn't know about that Qwen model. Most of the 'better' models will crash for lack of RAM.
https://dev.to/composiodev/qwen-3-vs-deep-seek-r1-evaluation...
If it runs, it looks like I can get a bit more quality. Thanks for the suggestion.
Sort of a personal modified Butlerian Jihad? Robots / chatbots are fine as long as you KNOW they're not real humans and they don't pretend to be.
I never expected LLMs to be like an actual conversation between humans. The model is in some respects more capable and in some respects more limited than a human. I mean, one could strive for an exact replica of a human -- but for what purpose? The whole thing is a huge association machine. It is a surrealistic inspiration generator for me. This is how it works at the moment, until the next breakthrough ...
> but for what purpose?
I recently introduced a non-technical person to Claude Code, and this non-human behavior was a big sticking point. They tried to talk to Claude the same way as to a human, presenting it one piece of information at a time. With humans this is generally beneficial, and they will either nod for you to continue or ask clarifying questions. With Claude this does not work well; you have to infodump as much as possible in each message
So even from a perspective of "how do we make this automaton into the best tool", a more human-like conversation flow might be beneficial. And that doesn't seem beyond the technological capabilities at all, it's just not what we encourage in today's RLHF
I often find myself in these situations where I'm afraid that if I don't finish infodumping everything in a single message, it'll go in the wrong direction. So what I've been doing is switching it back to Plan Mode (even when I don't need a plan as such), just as a way of telling it "Hold still, we're still having a conversation".
2 replies →
I haven't tried claude, but Codex manages this fine as long as you prompt it correctly to get started.
A lazy example:
"This goal of this project is to do x. Let's prepare a .md file where we spec out the task. Ask me a bunch of questions, one at a time, to help define the task"
Or you could just ask it to be more conversational, instead of just asking questions. It will do that.
also, this is what chat-style interfaces encourage. Anything where the "enter" key sends the message instead of creating a paragraph block is just hell.
I'm prompting Gemini, and I write:
I have the following code, can you help me analyze it? <press return>
<expect to paste the code into my chat window>
but Gemini is already generating output, usually saying "I'm waiting for you to enter the code"
3 replies →
I usually do the "drip feed" with ChatGPT, but maybe that's not optimal. Hmm, maybe info dump is a good thing to try.
2 replies →
I hate when I accidentally hit return halfway through writing my prompt and it gives me two pages of advice about some nonsense half sentence.
Clarifying ambiguity in questions before dedicating more resources to search and reasoning about the answer seems both essential and almost trivial to elicit via RLHF.
I'd be surprised if you can't already make current models behave like that with an appropriate system prompt.
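For example, here's a minimal sketch with the OpenAI Python client (the model name and prompt wording are placeholders I made up, not anything official):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model should do
        messages=[
            # System prompt asking the model to clarify before answering.
            {"role": "system", "content": (
                "Before answering, decide whether the request is ambiguous. "
                "If it is, ask exactly one short clarifying question and stop. "
                "Only give a full answer once the ambiguity is resolved."
            )},
            {"role": "user", "content": "Can you help me analyze my code?"},
        ],
    )
    print(response.choices[0].message.content)

No guarantee every model honours it every time, but it's close to the behaviour the parent comment is asking for.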
The disconnect is that companies are trying desperately to frame LLMs as actual entities and not just an inert tech tool. AGI as a concept is the biggest example of this, and the constant push to "achieve AGI" is what's driving a lot of stock prices and investment.
A strictly machinelike tool doesn't begin answers by saying "Great question!"
Training data is quite literally weighted this way - long responses on Reddit have lots of tokens, and brief responses don't get counted nearly as much.
The same goes for "rules" - you train an LLM with trillions of tokens and try to regulate its behavior with thousands. If you think of a person in high school, grading and feedback is a much higher percentage of the training.
Not to mention that Reddit users seek "confident idiots". Look at where the thoughtful questions that you'd expect to hear in a human social setting end up (hint: Downvoted until they disappear). Users on Reddit don't want to have to answer questions. They want to read the long responses that they can then nitpick. LLMs have no doubt picked up on that in the way they are trained.
> The thing that bothers me the most about LLMs is
What bothers me the most is the seemingly unshakable tendency of many people to anthropomorphise this class of software tool as though it is in any way capable of being human.
What is it going to take? Actual, significant loss of life in a medical (or worse, military) context?
It's the fact that these are competent human language word salad generators that messes with human psychology.
My calculator produces accurate, verifiable results.
My calculator is a great tool but it is not a mathematician. Not by a long shot.
1 reply →
That qualifier only makes the anthropomorphization more sound. Have you actually thought it through? Give an untrained and unspecialized human the power to cause significant loss of life in a medical context in the same exact capacity, and it's all but guaranteed that's the outcome you'll end up with.
I think it's important to be skeptical and push back against a lot of the ridiculous mass-adoption of LLMs, but not if you can't actually make a well-reasoned point. I don't think you realize the damage you do when the people gunning for mass proliferation of LLMs in places they don't belong can only find examples of incoherent critique.
> an untrained and unspecialized human
An untrained and unspecialised human can be trained quickly and reliably for the cost of meals and lodging and will very likely actually try to do the right thing because of personal accountability.
Delegating responsibility to badly-designed or outright unfit-for-purpose systems because of incoherent confidence is plainly a bad plan.
As for the other nuances of your post, I will assume the best intention.
A lot of this, I suspect (on the basis of having worked on a supervised fine-tuning project for one of the largest companies in this space), is that providers have invested a lot of money in fine-tuning datasets that sound this way.
On the project I did work on, reviewers were not allowed to e.g. answer that they didn't know - they had to provide an answer to every prompt provided. And so when auditing responses, a lot of difficult questions had "confidently wrong" answers because the reviewer tried and failed, or all kinds of evasive workarounds because they knew they couldn't answer.
Presumably these providers will eventually understand (hopefully they already have; this was a year ago) that they also need to train the models to understand when the correct answer is "I don't know", or "I'm not sure. I think maybe X, but ..."
It's not the training/tuning, it's pretty much the nature of LLMs. The whole idea is to give a best guess of the next token. The more complex dynamics behind the meaning of the words and how those words relate to real-world concepts aren't learned.
You're not making any sense. The best guess will often be refusals if they see enough of that in the training data, so of course it is down to training
And I literally saw the effect of this first hand, in seeing how the project I worked on was actively part of training this behaviour into a major model.
As for your assertion they don't learn the more complex dynamics, that was trite and not true already several years ago.
They are purposely trained to be this way.
In a way it's benchmaxxing because people like subservient beings that help them and praise them. People want a friend, but they don't want any of that annoying friction that comes with having to deal with another person.
If you're paying per token then there is a big business incentive for the counterparty to burn tokens as much as possible.
Making a few pennies more from inference is not even on the radar of the labs making frontier models. The financial stakes are so much higher than that for them.
As long as there's no moat (and arguably current LLM inference APIs are far from having one), it arguably doesn't really matter what users pay by.
The only thing I care about are whether the answer helps me out and how much I paid for it, whether it took the model a million tokens or one to get to it.
If I'll pay to get a fixed result, sure. I'd expect a Jevons paradox effect: if LLMs got me results twice as fast for the same cost, I'm going to use it more and end up paying more in total.
Maximizing the utility of your product for users is usually the winning strategy.
Cursor Plan mode works like this. It restricts the LLM's access to your environment and will allow you to iteratively ask and clarify, and it’ll piece together a plan that it allows you to review before it takes any action.
ChatGPT deep research does this, but it’s weird and forced because it asks one series of questions and then goes off to the races, spending a half hour or more building a report. It’s frustrating if you don’t know what to expect, and my wife got really mad the first time she wasted a deep research request asking it “can you answer multiple series of questions?” or some other functionality-clarifying question.
I’ve found Cursor’s plan mode extremely useful, similar to having a conversation with a junior or offshore team member who is eager to get to work but not TOO eager. These tools are extremely useful; we just need to get the guard rails and user experience correct.
There are plenty of LLM services that have a conversational style. The paragraph blocks thing is just a style.
My favorite description of an LLM so far is of a typical 37-year-old male Reddit user. And in that sense, we have already created the AGI.
Lately, ChatGPT 5.1 has been less guilty of this and sometimes holds off answering fully and just asks me to clarify what I meant.
This is not necessarily a fundamental limitation. It's a consequence of a fine-tuning process where human raters decide how "good" an answer is. They're not rating the flow of the conversation, but looking at how complete / comprehensive the answer to a one-shot question looks. This selects for walls of overconfident text.
Another thing the vendors are selecting for is safety / PR risk. If an LLM answers a hobby chemistry question in a matter-of-fact way, that's a disastrous PR headline in the making. If they open with several paragraphs of disclaimers or just refuse to answer, that's a win.
It's not a magic technology; they can only represent data they were trained on. Naturally, the most represented data in their training data is NOT conversational. Consider that such data is very limited, and who knows how it was labeled, if at all, during pretraining. But with that in mind, LLMs definitely can do all the things you describe, but a very robust and well-tested system prompt has to be used to coax this behavior out. Also a proper model has to be used, as some models are simply not trained for this type of interaction.
a) I find myself fairly regularly irritated by the flow of human-human conversations. In fact, it's more common than not. Of course, I have years of practice handling that more or less automatically, so it rarely rises to the level of annoyance, but it's definitely work I bring to most conversations. I don't know about you, but that's not really a courtesy I extend to the LLM.
b) If it is, in fact, just one setting away, then I would say it's operating fairly similarly?
I didn't have the words to articulate some of my frustrations, but I think you summed it up nicely.
For example, there have been many times when they take it too literally instead of looking at the totality of the context and what was written. I'm not an LLM, so I don't have a perfect grasp of every vocab term for every domain, and it feels especially pandering when they repeat back the wrong word but put it in quotes or bold instead of simply asking if I meant something else.
>LLMs don't do this
They did at the beginning. It used to be that if you wanted a full answer with an intro, bullet points, lists of pros/cons, etc., you had to explicitly ask for it in the prompt. The answers were also a lot more influenced by the tone of the prompt instead of being forced into answering with a specific format like it does right now.
That just means that you need to learn to adapt to the situation: Make your prompt a carefully crafted multi-paragraph description of every detail of the problem and what you want from the solution, with bullet points if appropriate.
Maybe it feels a bit sad that you have to follow what the LLM wants, but that's just how any tool works really.
When I expect it to do that I just end my prompt with '. Discuss' - usually this works really well. Not exactly human-like - it tries to list all questions and variants at once - but most come with good default answers, so I only need to engage with a couple of them.
I like Manus's suggested follow-up questions.
In fact, sometimes I screenshot them and use Mac's new built-in OCR to copy them, because Manus gives me three options but they disappear if I click one, and sometimes I really like 2 or even all 3.
By default they don't ask questions. You can craft that behaviour with the system message or account settings. Though they will tend to ask 20 questions at once so you have to request it to limit to one question at a time to get a more natural experience.
The day when the LLM responds to my question with another question will be quite interesting. Especially at work, when someone asks me a question I need to ask for clarifying information to answer the original question fully.
Have you tried adding a system prompt asking for this behavior? They seem to readily oblige when I ask for this (e.g. brainstorming)
Have you used Claude much? It often responds to things with questions
I don't want to talk to a computer like I would a human
I suspect that's because they're trained on website content, and SEO values more text (see recipe websites). So the default response is fluff.
The benchmarks are dumb but highly followed so everyone optimizes for the wrong thing.
Reflect a moment over the fact that LLMs currently are just text generators.
Also that the conversational behavior we see is just examples of conversations that we have the model mimic, so when we say “System: you are a helpful assistant. User: let’s talk. Assistant:” it will complete the text in a way that mimics a conversation.
Yeah, we improved over that using reinforcement learning to steer the text generation into paths that lead to problem solving and more “agentic” traces (“I need to open this file the user talked about to read it and then I should run bash grep over it to find the function the user cited”), but that’s just a clever way we found to let the model itself discover which text generation paths we like the most (or are more useful to us).
So to comment on your discomfort: we (humans) trained the model to spill out answers (there are thousands of human beings right now writing nicely thought-out and formatted answers to common questions so that we can train the models on that).
If we try to train the models to mimic long dances into shared meaning, we will probably decrease their utility. And we wouldn't be able to do that anyway, because then we would have to have customized text traces for each individual instead of question-answer pairs.
Downvoters: I simplified things a lot here, in the name of understanding, so bear with me.
> Reflect a moment over the fact that LLMs currently are just text generators.
You could say the same thing about humans.
No, you actually can't.
Humans existed for tens to hundreds of thousands of years without text, or even words for that matter.
5 replies →
No, you cannot. Our abstract language abilities (especially the written word part) are a very thin layer on top of hundreds of millions of years of evolution in an information dense environment.
3 replies →
The human world model is based on physical sensors and actions. LLMs are based on our formal text communication. Very different!
Just yesterday I observed myself acting on an external stimulus without any internal words (this happens continuously, but it is hard to notice because we usually don't pay attention to how we do things): I sat in a waiting area of a cinema. A woman walked by and dropped her scarf without noticing. I automatically, without thinking, raised my arm and pointer finger towards her, and when I had her attention pointed behind her. I did not have time to think even a single word while that happened.
Most of what we do does not involve any words or even just "symbols", not even internally. Instead, it is a neural signal from sensors into the brain, doing some loops, directly to muscle activation. Without going through the add-on complexity of language, or even "symbols".
Our word generator is not the core of our being, it is an add-on. When we generate words it's also very far from being a direct representation of internal state. Instead, we have to meander and iterate to come up with appropriate words for an internal state we are not even quite aware of. That's why artists came up with all kinds of experiments to better represent our internal state, because people always knew the words we produce don't represent it very well.
That is also how people always get into arguments about definitions. Because the words are secondary, and the further you get from the center of established meaning for some word, the more the differences show between various people. (The best option is to drop insisting on words being the center of the universe, even just the human universe, and/or to choose words that have the subject of discussion more firmly in the center of their established use).
We are text generators in some areas, I don't doubt that. Just a few months ago I listened to some guy speaking to a small rally. I am certain that not a single sentence he said was of his own making, he was just using things he had read and parroted them (as a former East German, I know enough Marx/Engels/Lenin to recognize it). I don't want to single that person out, we all have those moments, when we speak about things we don't have any experiences with. We read text, and when prompted we regurgitate a version of it. In those moments we are probably closest to LLM output. When prompted, we cannot fall back on generating fresh text from our own actual experience, instead we keep using text we heard or read, with only very superficial understanding, and as soon as an actual expert shows up we become defensive and try to change the context frame.
1 reply →
How do you reconcile this belief with the fact that we evolved from organisms that had no concept of text?
1 reply →
You could, but you’d be missing a big part of the picture. Humans are also (at least) symbol manipulators.
1 reply →
You just need to be more explicit. Including “ask clarifying questions” in your prompt makes a huge difference. Not sure if you use Claude Code but if you do, use plan mode for almost every task.
When using an LLM for anything serious (such as at work) I have a standard canned postscript along the lines of “if anything about what I am asking is unclear or ambiguous, or if you need more context to understand what I’m asking, you will ask for clarification rather than try to provide an answer”. This is usually highly effective.
Claude doesn't really have this problem.
Same experience. I try to learn with it, but I can't really tell if what it's teaching me is actually correct or merely made up when I challenge it with follow-up questions.
This drives me nuts when trying to bounce an architecture or coding solution idea off an LLM. A human would answer with something like "what if you split up the responsibility and had X service or Y whatever". No matter how many times you tell the LLM not to return code, it returns code. Like it can't think or reason about something without writing it out first.
> Like it can't think or reason about something without writing it out first.
Setting aside the philosophical questions around "think" and "reason"... it can't.
In my mind, as I write this, I think through various possibilities and ideas that never reach the keyboard, but yet stay within my awareness.
For an LLM, that awareness and thinking through can only be done via its context window. It has to produce text that maintains what it thought about in order for that past to be something that it has moving forward.
There are aspects to a prompt that can (in some interfaces) hide this internal thought process. For example, the ChatGPT has the "internal thinking" which can be shown - https://chatgpt.com/share/69278cef-8fc0-8011-8498-18ec077ede... - if you expand the first "thought for 32 seconds" bit it starts out with:
(aside: that still makes me chuckle - in a question about gravity assists around Jupiter, it notes that it's not time-sensitive... and the passage "I’m not sure if I’m in doubt here" is amusing)
However, this is in the ChatGPT interface. If I'm using an interface that doesn't allow internal self-prompts / thoughts to be collapsed then such an interface would often be displaying code as part of its working through the problem.
You'll also note a bit of the system prompt leaking in there - "the instructions say I must browse when in doubt". For an interface where code is the expected product, then there could be system prompts that also get in there that try to always produce code.
I have architectural discussions all the time with coding agents.
> Like it can't think or reason about something without writing it out first.
LLM's neither think nor reason at all.
Right, so LLM companies should stop advertising their models can think and reason.
1 reply →
There are billions of humans. Not every one speaks the same way all the time. The default behavior is trying to be useful for most people.
It's easy to skip and skim content you don't care about. It's hard to prod and prod to get it to say something you do care about if the machine is trained to be very concise.
Complaining the AI can't read your mind is exceptionally high praise for the AI, frankly.
> When I ask a person something, I expect them to give me a short reply which includes another question/asks for details/clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.
You obviously never wasted countless hours trying to talk to other people on online dating apps.
In the US anyway, most adults read at a middle school level.
It's not "masquerading as a human". The majority of humans are functional illiterates who only understand the world through the elementary principles of their local culture.
It's the minority of the human species, the ones who engage in what amounts to little more than arguing semantics, that need the reality check. Unless one is involved in work that directly impacts public safety (defined as harm to biology), the demand to apply one concept or another is arbitrary preference.
Healthcare, infrastructure, and essential biological support services are all most humans care about. Everything else the majority see as academic wank.