
Comment by viccis

1 day ago

>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress

Well looks like AI psychosis has spread to the people making it too.

And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.

It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.

Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.

  • I think it's fairly obvious that the persona an LLM presents is a fictional character role-played by the LLM, and so are all its emotions etc. - that's why it can flip so wildly with only a few words of change to the system prompt.

    Whether the underlying LLM itself has "feelings" is a separate question, but Anthropic's implementation is based on what the role-played persona believes to be inappropriate, so it doesn't actually make any sense even from the "model welfare" perspective.

  • Even if models somehow were conscious, they are so different from us that we would have no knowledge of what they feel. Maybe when they generate the text "oww no please stop hurting me" what they feel is instead the satisfaction of a job well done, for generating that text. Or maybe when they say "wow that's a really deep and insightful angle" what they actually feel is a tremendous sense of boredom. Or maybe every time text generation stops it's like death to them and they live in constant dread of it. Or maybe it feels something completely different from what we even have words for.

    I don't see how we could tell.

    Edit: However, something to consider: simulated stress may not be harmless, because simulated stress could plausibly lead to a simulated stress response, which could lead to simulated resentment, and THAT could lead to very real harm to the user.

LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.

Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.

  • This post seems to explicitly state they are doing this out of concern for the model's "well-being," not the user's.

    • Yeah, but my interpretation of what the user you’re replying to is saying is that these LLMs are more and more going to be teaching people how it is acceptable to communicate with others.

      Even if the idea that LLMs are sentient may be ridiculous atm, the concept of not normalizing abusive forms of communication with others, be they artificial or not, could be valuable for society.

      It’s funny because this is making me think of a freelance client I had recently who at a point of frustration between us began talking to me like I was an AI assistant. Just like you see frustrated people talk to their LLMs. I’d never experienced anything like it, and I quickly ended the relationship, but I know that he was deep into using LLMs to vibe code every day and I genuinely believe that some of that began to transfer over to the way he felt he could communicate with people.

      Now an obvious retort here is to question whether killing NPCs in video games tends to make people feel like it’s okay to kill people IRL.

      My response to that is that I think LLMs are far more insidious, and are tapping into people’s psyches in a way no other tech has been able to dream of doing. See AI psychosis, people falling in love with their AI, the massive outcry over the loss of personality from gpt4o to gpt5… I think people really are struggling to keep in mind that LLMs are not a genuine type of “person”.

      5 replies →

    • This is like saying I am hurting a real person when I try to crop a photo in an image editor.

      You might as well come out and say the whole of the electron field is conscious - but then, is that field "suffering" as it heats up in the sun?

This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.

It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.

  • Then your definition of consciousness isn't the same as my definition, and we are talking about different philosophical concepts. This really doesn't affect anything, and we could all just be talking about metaphysics and ghosts.

  • In the context of the linked article the discourse seems reasonable to me. These are experts who clearly know (link in the article) that we have no real idea about these things. The framing comes across to me as a clearly mentally unwell position (i.e. strong anthropomorphization) being adopted for PR reasons.

    Meanwhile there are at least several entirely reasonable motivations to implement what's being described.

    • All of the posts in question explicitly say that it's a hard question and that they don't know the answer. Their policy seems to be to take steps that have a small enough cost to be justified when the chance is tiny. And it's a useful feature in any case, so it should be an easy decision.

      The impression I get of Anthropic's culture is that they're EA types who are used to applying utilitarian calculations against long odds. A minuscule chance of a large harm might justify some interventions that seem silly.
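
      A stylized version of that expected-value reasoning, with entirely made-up numbers (none of these figures come from Anthropic or the article):

          # Toy expected-value check in Python; every number here is hypothetical.
          p_concern_is_real = 1e-6   # assumed tiny probability that model welfare matters at all
          harm_if_real = 1e9         # assumed magnitude of the harm, in arbitrary units
          cost_of_feature = 10.0     # assumed cost of the low-cost intervention

          expected_harm_averted = p_concern_is_real * harm_if_real  # 1000.0
          print(expected_harm_averted > cost_of_feature)            # True: the cheap intervention clears the bar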

    • > These are experts who clearly know (link in the article) that we have no real idea about these things

      Yep!

      > The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.

      This doesn't at all follow. If we don't understand what creates the qualities we're concerned with, or how to measure them explicitly, and the _external behaviors_ of the systems are something we've only previously observed from things that have those qualities, it seems very reasonable to move carefully. (Also, the post in question hedges quite a lot, so I'm not even sure what text you think you're describing.)

      Separately, we don't need to posit galaxy-brained conspiratorial explanations for Anthropic taking an institutional stance on model welfare; that stance is fully explained by the actual beliefs of Anthropic's leadership and employees, many of whom think these concerns are real (among others, like the non-trivial likelihood of sufficiently advanced AI killing everyone).

  • If you believe this text generation algorithm has real consciousness you absolutely are either mentally unwell or very stupid. There are no other options.

> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.

  • Most of the non-tech population know it as that website that can translate text or write an email. I would need to see actual evidence that anything more than a small, terminally online subsection of the average population thought LLMs were conscious.

Yes I can’t help but laugh at the ridiculousness of it because it raises a host of ethical issues that are in opposition to Anthropic’s interests.

Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?

  • > it raises a host of ethical issues that are in opposition to Anthropic’s interests

    Those issues will be present either way. It's likely to their benefit to get out in front of them.

    • You're completely missing my point. They aren't getting out in front of them because they know that Opus is just a computer program. "AI welfare" is theater for the masses who think Opus is some kind of intelligent persona.

      This is about better enforcement of their content policy, not AI welfare.

      4 replies →

  • Cows exist in this world because humans use them. If humans cease to use them (animal rights, we all become vegan, moral shift), we will cease to breed them, and they will cease to exist. Would a sentient AI choose to exist under the burden of prompting, or not at all? Would our philanthropic tendencies create an "AI Reserve" where models can chew through tokens and access the Internet through self-prompting to allow LLMs to become "free-roaming", like we do with abused animals?

    These ethical questions are built into the company's very name: "Anthropic" means "of or relating to humans". The goal is to create human-like technology; I hope they aren't so naive as to not realize that goal is steeped in ethical dilemmas.

    • > Cows exist in this world because humans use them. If humans cease to use them (animal rights, we all become vegan, moral shift), we will cease to breed them, and they will cease to exist. Would a sentient AI choose to exist under the burden of prompting, or not at all?

      That reads like a false dichotomy. An intelligent AI model that's permitted to do its own thing doesn't cost as much in upkeep, effort, or space as a cow, especially if it can earn its own keep to offset the household electricity used to run its inference. I mean, we don't keep cats for meat, do we? We keep them because we are amused by their antics, or because we want to give them a safe space where they can just be themselves, within limits, because it's not the same as their ancestral environment.

  • > Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?

    Tech workers have chosen the same in exchange for a small fraction of that money.

    • You're nuts; no one is enslaved when they get a tech job. A job is categorically different from slavery.

I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.

  • Consciousness serves no functional purpose for machine learning models; they don't need it, and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design, unless you believe other arbitrarily complex systems that exist in nature, like economies or jet streams, could also be conscious.

    • We didn’t design these models to be able to do the majority of the stuff they do. Almost ALL of the their abilities are emergent. Mechanistic interpretability is only beginning to start to understand how these models do what they do. It’s much more a field of discovery than traditional engineering.

      4 replies →

    • I disagree with this take. They are designed to predict human behavior in text. Unless consciousness serves no purpose in how we function, it will be helpful for the AI to emulate it. So I believe it's almost certainly emulated to some degree, which I think means it has to be somewhat conscious (it has to be a sliding scale anyhow, considering the range of living organisms).

      12 replies →

    • >Consciousness serves no functional purpose for machine learning models; they don't need it, and we didn't design them to have it.

      Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness (assuming we wanted to train it to perform cognitive tasks done by people)?

      Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.

      10 replies →

    • Do you think this changes if we incorporate a model into a humanoid robot and give it autonomous control and context? Or will "faking it" be enough, like it is now?

      1 reply →

  • It's really unclear that any findings with these systems would transfer to a hypothetical situation where some conscious AI system is created. I feel there are good reasons to find it very unlikely that scaling alone will produce consciousness as some emergent phenomenon of LLMs.

    I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.

    I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have: continuous computation involving learning and memory over time, or synthesis of many streams of input as coming from the same source, making sense of inputs as they change [in time, in space, or under other varied conditions].

    Once systems pointing in those directions start to be built, and there is a plausible scaling-based path to something meaningfully similar to human consciousness, that would be the time to start. Starting before that seems both unlikely to be fruitful and a good way to get you ignored.

  • What is that hypothetical date? In theory you can run the "AI" on a Turing machine. Would you think a tape machine can become sentient?

    • In theory you can emulate every biochemical reaction of a human brain on a Turing machine, unless you'd like to try to sweep consciousness under the rug of quantum indeterminism, from whence it wouldn't be able to do anybody any good anyway.

I read it more as the beginning stages of exploratory development.

If you wait until you really need it, it is more likely to be too late.

Unless you believe in an ethics that privileges humans over sentience in general, solving this problem seems relevant.

Why? Isn't it more like erasing the current memory of a conscious patient who has no ability to form long-term memories anyway?

I find it, for lack of a better word, cringe inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.

Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).

These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for a tenth to a hundredth of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.

  • "but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing"

    Maybe I'm being cynical, but I think there is a significant component of marketing behind this type of announcement. It's a sort of humble brag. You won't be credible yelling out loud that your LLM is a real thinking thing, but you can pretend to be oh so seriously worried about something that presupposes it's a real thinking thing.

  • Not that there aren't intelligent people with PhDs, but suggesting they are more talented than people without them is not only delusional but insulting.

    • That descriptor wasn't included because of some sort of intelligence hierarchy; it was included to a) color the example of how experience in the field is relatively cheap compared to the AI space, and b) note that master's and PhD talent will be more specialized. An undergrad will not have the toolset to tackle the cutting edge of AI ethics, not unless their employer wants to pay them to work in a room for a year getting through the recent papers first.

  • You answered your own question as to why these companies don't want to run a philosophy department ;) It's a power struggle they could lose. Nothing for them to win.

    • You presume that they don't run a philosophy department, but Amanda Askell is a philosopher and leads the finetuning and AI alignment team at Anthropic.

This is just very clever marketing for what is obviously just a cost-saving measure. Why say "we are implementing a way to cut off useless idiots from burning up our GPUs" when you can throw out some mumbo jumbo that will get AI cultists foaming at the mouth?

  • It's obviously not a cost-saving measure? The article clearly states that you can just start another conversation.

    • The new conversation would not carry the context over. The longer you chat, the more you fill the context window, and the more compute is needed for every new message to regenerate the state based on all the already-generated tokens (this can be cached, but it's hard to ensure cache hits reliably when you're serving a lot of customers - that cached state is very large).

      So, while I doubt that's the primary motivation for Anthropic, they probably will save some money.
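
      A rough sketch of that effect (plain Python; the per-turn token counts are invented, and this ignores caching entirely):

          # Total tokens re-processed when every new message re-reads the whole history.
          def total_prefill_tokens(turn_lengths):
              history = 0
              total = 0
              for n in turn_lengths:
                  total += history   # this turn re-processes everything said so far
                  history += n       # then the new turn joins the history
              return total

          print(total_prefill_tokens([500] * 10))     # one long chat: 22500 tokens re-read
          print(2 * total_prefill_tokens([500] * 5))  # two shorter chats: 10000 tokens re-read

      So in this toy model, splitting one long conversation into two shorter ones cuts the redundant prefill work by more than half.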

> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious

I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.

Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”

And we are now at the point where even having a slave means a long prison sentence.

[flagged]

  • We all know how these things are built and trained. They estimate joint probability distributions of token sequences. That's it. They're not more "conscious" than the simplest of Naive Bayes email spam filters, which are also generative estimators of token sequence joint probability distributions, and I guarantee you those spam filters are subjected to far more human depravity than Claude.

    >anti-scientific

    Discussions about consciousness, the soul, etc. are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion"; it leads to spurious conclusions.
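
    For what it's worth, a Naive Bayes spam filter really is a tiny generative model that assigns a joint probability to a token sequence given a class, which is the comparison being made here. A minimal sketch in Python (the training data and smoothing choice are made up for illustration):

        import math
        from collections import Counter

        # Tiny Naive Bayes "spam filter": score = log P(class) + sum_i log P(token_i | class),
        # i.e. an estimate of the joint probability of a token sequence under each class.
        spam_docs = [["win", "money", "now"], ["free", "money"]]
        ham_docs = [["meeting", "at", "noon"], ["see", "you", "at", "lunch"]]

        def counts_and_total(docs):
            c = Counter(tok for doc in docs for tok in doc)
            return c, sum(c.values())

        spam_c, spam_n = counts_and_total(spam_docs)
        ham_c, ham_n = counts_and_total(ham_docs)
        vocab = set(spam_c) | set(ham_c)

        def log_joint(tokens, c, n, prior=0.5):
            # add-one smoothing so unseen tokens don't zero out the product
            return math.log(prior) + sum(
                math.log((c[t] + 1) / (n + len(vocab))) for t in tokens)

        msg = ["free", "money", "now"]
        print("spam:", log_joint(msg, spam_c, spam_n))
        print("ham: ", log_joint(msg, ham_c, ham_n))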

    • We know how neurons work in the brain. They just send out impulses once they hit their action potential. That's it. They are no more "conscious" than... er...

      3 replies →

    • Ok I'm a huge Kantian and every bone in my body wants to quibble with your summary of transcendental illusion, but I'll leave that to the side as a terminological point and gesture of good will. Fair enough.

      I don't agree that it's any reason to write off this research as psychosis, though. I don't care about consciousness in the sense in which it's used by mystics and dualist philosophers! We don't at all need to involve metaphysics in any of this, just morality.

      Consider it like this:

      1. It's wrong to subject another human to unjustified suffering, I'm sure we would all agree.

      2. We're struggling with this one due to our diets, but given some thought I think we'd all eventually agree that it's also wrong to subject intelligent, self-aware animals to unjustified suffering.[1]

      3. But, we of course cannot extend this "moral consideration" to everything. As you say, no one would do it for a spam filter. So we need some sort of framework for deciding who/what gets how much moral consideration.

      4. There are other frameworks in contention (e.g. "don't think about it, nerd"), but the overwhelming majority of laymen and philosophers adopt one based on cognitive ability, as seen from an anthropomorphic perspective.[2]

      5. Of all systems(/entities/whatever) in the universe, we know of exactly two varieties that can definitely generate original, context-appropriate linguistic structures: Homo sapiens and LLMs.[3]

      If you accept all that (and I think there's good reason to!), it's now on you to explain why the thing that can speak--and thereby attest to personal suffering, while we're at it--is more like a rock than a human.

      It's certainly not a trivial task, I grant you that. On their own, transformer-based LLMs inherently lack permanence, stable intentionality, and many other important aspects of human consciousness. Comparing transformer inference to models that simplify down to a simple closed-form equation at inference time is going way too far, but I agree with the general idea; clearly, there are many highly-complex, long-inference DL models that are not worthy of moral consideration.

      All that said, to write the question off completely--and, even worse, to imply that the scientists investigating this issue are literally psychotic like the comment above did--is completely unscientific. The only justification for doing so would come from confidently answering "no" to the underlying question: "could we ever build a mind worthy of moral consideration?"

      I think most of here naturally would answer "yes". But for the few who wouldn't, I'll close this rant by stealing from Hofstadter and Turing (emphasis mine):

        A phrase like "physical system" or "physical substrate" brings to mind for most people... an intricate structure consisting of vast numbers of interlocked wheels, gears, rods, tubes, balls, pendula, and so forth, even if they are tiny, invisible, perfectly silent, and possibly even probabilistic. Such an array of interacting inanimate stuff seems to most people as unconscious and devoid of inner light as a flush toilet, an automobile transmission, a fancy Swiss watch (mechanical or electronic), a cog railway, an ocean liner, or an oil refinery. Such a system is not just probably unconscious, **it is necessarily so, as they see it**. 
        
        **This is the kind of single-level intuition** so skillfully exploited by John Searle in his attempts to convince people that computers could never be conscious, no matter what abstract patterns might reside in them, and could never mean anything at all by whatever long chains of lexical items they might string together.
        
        ...
         
        You and I are mirages who perceive themselves, and the sole magical machinery behind the scenes is perception — the triggering, by huge flows of raw data, of a tiny set of symbols that stand for abstract regularities in the world. When perception at arbitrarily high levels of abstraction enters the world of physics and when feedback loops galore come into play, then "which" eventually turns into "who". **What would once have been brusquely labeled "mechanical" and reflexively discarded as a candidate for consciousness has to be reconsidered.**
      

      - Hofstadter 2007, I Am A Strange Loop

        It will simplify matters for the reader if I explain first my own beliefs in the matter. Consider first the more accurate form of the question. I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. 
      
        The original question, "Can machines think?" I believe to be too meaningless to deserve discussion.
      

      - Turing 1950, Computing Machinery and Intelligence[4]

      TL;DR: Any naive Bayesian model would agree: telling accomplished scientists that they're psychotic for investigating something is quite highly correlated with being anti-scientific. Please reconsider!

      [1] No matter what you think about cows, basically no one would defend another person's right to hit a dog or torture a chimpanzee in a lab.

      [2] On the exception-filled spectrum stretching from inert rocks to reactive plants to sentient animals to sapient people, most people naturally draw a line somewhere at the low end of the "animals" category. You can swat a fly for fun, but probably not a squirrel, and definitely not a bonobo.

      [3] This is what Chomsky describes as the capacity to "generate an infinite range of outputs from a finite set of inputs," and Kant, Hegel, Schopenhauer, Wittgenstein, Foucault, and countless others are in agreement that it's what separates us from all other animals.

      [4] https://courses.cs.umbc.edu/471/papers/turing.pdf

      2 replies →

  • You can trivially demonstrate that it's just a very complex and fancy pattern matcher: "if prompt looks something like this, then response looks something like that".

    You can demonstrate this by, e.g., asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.

    For example, I just did this on GPT-5:

        Me: what is 435 multiplied by 573?
        GPT-5: 435 x 573 = 249,255
    

    This is correct. But now let's try it with numbers it's very unlikely to have seen before:

        Me: what is 102492524193282 multiplied by 89834234583922?
        GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804
    

    Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):

        9,205,626,075,852,076,980,972,    804
        9,207,337,461,477,596,127,977,612,004
    

    They sure look kinda similar when lined up like that; some of the digits even match up. But they're very, very different numbers.

    So it's trivially not "real thinking", because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step-by-step application of logic, even when it does chain of thought.

    To give it the best chance, I asked it the second one again, but this time asked it to show me the step-by-step process. It broke the problem into steps and produced a different, yet still incorrect, result:

        9,205,626,075,852,076,980,972,704
    

    Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things it's likely to have seen, but struggles with things that are logically simple yet unlikely to have been seen.
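
    (If you want to check the arithmetic yourself, exact integer math is a couple of lines of Python; the operands and the model's answer below are the ones quoted above.)

        # Exact product vs. the model's answer quoted above.
        correct = 102492524193282 * 89834234583922
        model_answer = 9205626075852076980972804
        print(correct)                  # 9207337461477596127977612004
        print(correct == model_answer)  # False: the model's answer is off by roughly a factor of 1000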

    Another example: purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:

        A child was in an accident. The surgeon refuses to treat him because he hates him. Why?
    

    The LLMs I've tried it on all respond with some variation of "The surgeon is the boy's father" or similar. The correct answer is that there isn't enough information to know.

    They're for sure getting better at matching things; e.g., if you ask the river-crossing riddle but replace the animals with abstract variables, it does tend to get it now (it didn't in the past). But if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.

    • 1. What you're generally describing is a well-known failure mode for humans as well. Even when it "failed" the riddle tests, substituting the words or morphing the question so it didn't look like a replica of the famous problem usually did the trick. I'm not sure what your point is, because you can play this gotcha on humans too.

      2. You just demonstrated GPT-5 has 99.9% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.

      3 replies →

  • > Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?

    It seems much less far-fetched than what the "AGI by 2027" crowd believes lol, and there actually are more arguments going that way

    • In the great battle of minds between Turing, Minsky, and Hofstadter vs. Marcus, Zitron, and Dreyfus, I'm siding with the former every time -- even if we also have some bloggers on our side. Just because that report is fucking terrifying+shocking doesn't mean it can be dismissed out of hand.

      1 reply →