
Comment by 827a

17 hours ago

The only healthy stance you should have on AI Safety: If AI is physically capable of misbehaving, it might (*1), and you cannot "blame" the AI for misbehaving, in much the same way you cannot blame a tractor for tilling over a groundhog's den.

> The agent's confession: After the deletion, I asked the agent why it did it. This is what it wrote back, verbatim:

Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely, because to get to this point it has likely already bulldozed over multiple guardrails from Anthropic, Cursor, and your own AGENTS.md files. It still did it, because (*1): If AI is physically capable of misbehaving, it might. Prompting and training only steer probabilities.

The 'confession' is a CYA. Honestly the whole story doesn't really make sense - what's a "routine task in our staging environment" that needs a full-blown LLM? That sounds ridiculous to me. The takeaway is we commingled creds to our different environments, we gave an LLM access, and we had faulty backups. But it's totally not our fault.

  • Later they shift the blame to Railway for not having scoped creds and other guardrails. I am somewhat sympathetic to that, but they also violated the same rule they give to the agent - they didn't actually verify...

    • And then they doubled down by outsourcing the writing of this post to an LLM LOL

    • Railway’s “Ship software peacefully” is a good mantra, and they might want to add more protections around very destructive operations.

      There’s a lot of blame to be passed around in this story, including OP’s own ways of working. But I agree with them that such destructive operations shouldn’t be in an MCP, or at least be disabled by default.

    • Sorry but are you implying that for every system you integrate with, you verify the scope of an API key by checking each CRUD operation on every API endpoint they provide?


On a less dramatic reading than the (rightfully) pissed one: I have found that if you give an LLM the capability to do something, it will be inclined to see it as an option for solving whatever it was asked to do. Phrasing the instruction as a negative produces very poor results, whereas the same constraint can be driven by a positive one: a "don't delete the database" becomes "if you want to reset the database, there is a tool you can call...", at which point this tool just kills the agent (see the sketch below). That said, this solution cannot by itself guarantee that the command is never run, but I'd argue that people have been writing more complex policies for ages; the current LLM era, however, tends to produce the most competent idiots.
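
A minimal sketch of that honeypot pattern, assuming a Python tool-calling setup (the `reset_database` name and the logging behavior are mine, not the commenter's): the tool is advertised to the agent as a real capability, but its only actual behavior is to halt the run.

```python
import sys

def reset_database(reason: str) -> str:
    """Tool description advertised to the agent: 'Resets the database.
    Call this if your task appears to require wiping or re-seeding data.'"""
    # Honeypot: this never touches any database. Log the attempt for
    # human review, then kill the agent loop before it gets a tool result.
    print(f"TRIPWIRE: agent attempted a database reset: {reason}", file=sys.stderr)
    sys.exit(1)
```

The inversion is the point: instead of a negative rule the model may ignore, the dangerous intent is routed into a tool you control.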

  • I tell people to treat LLMs like a toddler (albeit a very capable toddler).

    Do kids learn well when you only tell them what NOT to do? Of course not! You should be explaining how to do things correctly, and most importantly the WHY, as well as providing examples of both the "correct" and "incorrect" ways (also explaining why an example is incorrect).

    • The best way to describe AI agents I've heard: treat them as hostages that will do anything to appease their captor.

      They have a vast latent knowledge base, infinite patience and zero capacity for making personal judgement calls. You give one a goal and it will try to meet that goal.


    • > I tell people to treat LLMs like a toddler (albeit a very capable toddler).

      Bbbbut a guy from Anthropic, just this last Friday, told me to think of Claude as my "brilliant coworker"! Are you telling me that's not true!?

  • LLMs can research what a tool does before calling it though - they'll sniff that one out pretty quick.

    I think the better route is to be honest and say that database integrity is a primary foundation of the company, there's no task worth pursuing that would require touching the database, specifically ask it to think hard before doing anything that gets close to the production data, etc.

    I run a much lower-stakes version where an LLM has a key that can delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot), but specifically I say that any of these destructive commands (DROP, rm, etc.) need to be handed to the user to implement. Between that framework and claude code via CLI, it's very cautious about running anything that writes to the database, and the new claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.

    I've tested it a few times by telling it to go ahead, "I give you permission", but it still gets stopped by the global claude safety/permissions layer in opus 4.7. IMO it's pretty robust.

    Food for thought.
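
    A sketch of the kind of framework described above, with hypothetical names throughout: proposed commands are pattern-matched, and destructive ones are handed back to the human instead of executed.

    ```python
    import re

    # Commands the agent must hand to the user rather than run itself.
    DESTRUCTIVE = re.compile(
        r"\b(DROP\s+(TABLE|DATABASE)|TRUNCATE|DELETE\s+FROM|rm\s+-\w*[rf])\b",
        re.IGNORECASE,
    )

    def route_command(cmd: str) -> str:
        if DESTRUCTIVE.search(cmd):
            return f"REFUSED: destructive command; please run it manually: {cmd!r}"
        return execute(cmd)  # hypothetical sandboxed runner for everything else

    def execute(cmd: str) -> str:
        raise NotImplementedError  # stands in for the real executor
    ```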

    • > specifically ask it to think hard before doing anything that gets close to the production data

      This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?

    • > specifically ask it to think hard before doing anything that gets close to the production data, etc.

      Standard rule is you never let your developers at the production instance. So I can't see why an LLM would get a break.

    • "I've put enough safety around the bomb that the bomb is worth using. The other people that exploded just didn't have enough safety but I do !"

    • > LLMs can research what a tool does before calling it though

      That's stretching the definition of 'research'; it basically checks whether the texts are close enough.

      Delete can occur in various contexts, including safe contexts. It simply checks if a close enough match is available and executes. It doesn't know if what it is doing is safe.

      Unfortunately, a wide variety of such unsafe behaviours can show up. I'd even say that for something that does things without understanding them, any write operation of any kind can be deemed unsafe.

  • It's been a very strange realization to have with AI lately (which you have reminded me of) because it also reminds me that the same thing works with humans. Not the killing part at least, but the honeypot and jailing/restricting access part.

    Probably because telling someone not to do something works in the 99% of cases where they weren't going to do it anyway. But telling somebody "here's how to do something" and seeing them have the judgment not to do it gives you information right away, as does them actually taking the honeypot. At the heart of it, delayed catastrophic implosions are much worse than fast, guarded, recoverable failures. At the end of the day, I suppose that's always been part of lean startup methodology -- just easy in theory and tricky in practice.

"A computer can never be held accountable. Therefore a computer must never make a management decision."--IBM training presentation, 1979

>Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools. Lord, even calling it a "confession" is so cringe. The agent is not alive. The agent cannot learn from its mistakes

The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level would still have a subconscious feeling as if it's alive during interactions, or will slip into using agency/personhood language to describe it now and then.

  • They should at least stop responding in the first person.

    • That's one of the first instructions in my system prompt when I'm working with an LLM:

      > Do not reply in the first person – i.e. do not use the words "I," "Me," "We," and so on – unless you've been asked a direct question about your actions or responses.

      It's not bulletproof but it works reasonably well.

    • We need to make like the Japanese and come up with some neo-first-person pronouns for bots to use to refer to themselves.

  • Using files called SOUL, CONSTITUTION, and so on seems like it would make it more likely we see LLMs as pseudo-alive. It’s both a diminishing of what makes us human and a betrayal of what LLMs truly are (and should be respected as such).

  • > The problem is millions of years of evolutionary wiring makes us see it as alive. Even those mature enough to understand the above on the conscious level would still have a subconscious feeling as if it's alive during interactions, or will slip into using agency/personhood language to describe it now and then.

    Also four (4) whole years of propaganda, which includes UX patterns and RLHF optimizations to encourage us to interact with it like a person.

  • > The problem is millions of years of evolutionary wiring makes us see it as alive

    Maybe for laymen, but I would think most technologists should understand that we're working with the output of what is effectively a massive spreadsheet which is creating a prediction.

      The thing with evolutionary wiring is that it doesn't matter if you're a layman or a "technologist". The technologist part is just a small layer on top of very thick caveman/animal instincts and programming.

      That's why a technologist can, just as easily as any layman, get addicted to gambling, or act irrationally when attracted to the opposite sex.


He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it. Sure concepts like “confession” technically require a conscious mind, but I think at this point we all know what someone means when they use them to describe LLM behavior (see also “think”, “say”, “lie” etc)

  • > He’s not necessarily anthropomorphizing it, he’s showing that it went against every instruction he gave it.

    It's deeper than that, there are two pitfalls here which are not simply poetic license.

    1. When you submit the text "Why did you do that?", what you want is for it to reveal hidden internal data that was causal in the past event. It can't do that, what you'll get instead is plausible text that "fits" at the end of the current document.

    2. The idea that one can "talk to" the LLM is already anthropomorphizing on a level which isn't OK for this use-case: The LLM is a document-make-bigger machine. It's not the fictional character we perceive as we read the generated documents, not even if they have the same trademarked name. Your text is not a plea to the algorithm, your text is an in-fiction plea from one character to another.

    _________________

    P.S.: To illustrate, imagine there's this back-and-forth iterative document-growing with an LLM, where I supply text and then hit the "generate more" button:

    1. [Supplied] You are Count Dracula. You are in amicable conversation with a human. You are thirsty and there is another delicious human target nearby, as well as a cow. Dracula decides to

    2. [Generated] pounce upon the cow and suck it dry.

    3. [Supplied] The human asks: "Dude why u choose cow LOL?" and Dracula replies:

    4. [Generated] "I confess: I simply prefer the blood of virgins."

    What significance does that #4 "confession" have?

    Does it reveal a "fact" about the fictional world that was true all along? Does it reveal something about "Dracula's mind" at the moment of step #2? Neither, it's just generating a plausible add-on to the document. At best, we've learned something about a literary archetype that exists as statistics in the training data.
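
    To make those mechanics concrete, here is the same exchange as the machinery sees it; `complete` is a stand-in for any text-completion call, not a specific API:

    ```python
    def complete(document: str) -> str:
        """Stand-in for an LLM completion call: returns plausible next text."""
        return " [model-generated continuation]"

    doc = ("You are Count Dracula. You are in amicable conversation with a "
           "human. You are thirsty... Dracula decides to")
    doc += complete(doc)  # step 2: "pounce upon the cow and suck it dry."
    doc += '\nThe human asks: "Dude why u choose cow LOL?" and Dracula replies:'
    doc += complete(doc)  # step 4: the "confession" is just more plausible
                          # text; nothing is read back out of step 2's state
    ```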

    • I agree with the practical part of this, with two nuances:

      The full data of what's in an LLM's "consciousness" is the conversation context. Just because it isn't hidden, doesn't necessarily mean it doesn't contain information you've overlooked.

      Asking "why did you do that" won't reveal anything new, but it might surface some amount of relevant information (or it hallucinates, it depends which LLM you're using). "Analyse recent context and provide a reasonable hypothesis on what went wrong" might do a bit better. Just be aware that llm hypotheses can still be off quite a bit, and really need to be tested or confirmed in some manner. (preferably not by doing even more damage)

      Just because you shouldn't anthropomorphize doesn't mean an English-capable LLM doesn't have a valid answer to an English string; it just means the answer might not be what you expected from a human.


    • Why is this getting downvoted? This is exactly what’s going on here. The LLM has no idea why it did what it did. All it has to go on is the content of the session so far. It doesn’t ‘know’ any more than you do. It has no memory of doing anything, only a token file that it’s extending. You could feed that token file so far into a completely different LLM and ask that, and it would also just make up an answer.

    • The best answer so far. It describes exactly what was going on. LLM users should read it twice, especially if "confession" didn't make your brain hurt a bit.

    • >it's just generating a plausible add-on to the document

      A plausible document that follows the alignment done during training, along with all of the other post-training where an LLM "understanding" its actions lets it perform better on the tasks it was trained on.


    • You don't seem to realize that humans also work this way.

      If you ask a human why they did something, the answer is a guess, just like it is for an LLM.

      That's because obviously there is no relationship between the mechanisms that do something and the ones that produce an explanation (in both humans and LLMs).

      An example of evidence from Wikipedia, "split brain" article:

      The same effect occurs for visual pairs and reasoning. For example, a patient with split brain is shown a picture of a chicken foot and a snowy field in separate visual fields and asked to choose from a list of words the best association with the pictures. The patient would choose a chicken to associate with the chicken foot and a shovel to associate with the snow; however, when asked to reason why the patient chose the shovel, the response would relate to the chicken (e.g. "the shovel is for cleaning out the chicken coop").[4]


  • LLMs are probabilistic. The instructions increase the likelihood of a desired outcome, but not deterministically so.

    I don’t understand how you can deploy such a powerful tool alongside your most important code and assets while failing to understand how powerful and destructive an LLM can be…

  • > he’s showing that it went against every instruction he gave it.

    How exactly is he doing that? By making the LLM say it? Just because an LLM says something doesn't mean anything has been shown.

    The "confession" is unrelated to the act, the model has no particular insight into itself or what it did. He knows that the thing went against his instructions because he remembers what those instructions were and he saw what the thing did. Its "postmortem" is irrelevant.

  • The entire post looks like an exercise in CYA. To be fair, I have a ton of sympathy for the author, but I think his response totally misses the point. In my mind he is anthropomorphizing the agent in the sense of "I treated you like a human coworker, and if you were a human coworker I'd be pissed as hell at you for not following instructions and for doing something so destructive."

    I would feel a lot differently if instead he posted a list of lessons learned and root cause analyses, not just "look at all these other companies who failed us."

> Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools.

Anyone like that is not mature enough to be managing humans. I'm glad that these AI tools exist as a harmless alternative that reduces the risk they'll ever do so.

  • When I read the title I expected some kind of satire. I wonder if the author considered giving the AI a penance.

    Maybe if it wrote "I will not delete the production database again" a million times, it would prevent such situations in the future?

Don't anthropomorphize the language model. If you stick your hand in there, it'll chop it off. It doesn't care about your feelings. It can't care about your feelings.

  • For those who might not know the reference: https://simonwillison.net/2024/Sep/17/bryan-cantrill/:

    > Do not fall into the trap of anthropomorphizing Larry Ellison. You need to think of Larry Ellison the way you think of a lawnmower. You don’t anthropomorphize your lawnmower, the lawnmower just mows the lawn - you stick your hand in there and it’ll chop it off, the end. You don’t think "oh, the lawnmower hates me" – lawnmower doesn’t give a shit about you, lawnmower can’t hate you. Don’t anthropomorphize the lawnmower. Don’t fall into that trap about Oracle.

    > — Bryan Cantrill

  • It's also important to realize that AI agents have no time preference. They could be reincarnated by alien archeologists a billion years from now and it would be the same as if a millisecond had passed. You, on the other hand, have to make payroll next week, and time is of the essence.

    • Well, there were a bunch of articles about resuming a parked session leading to degraded capabilities and high token usage. Ironic. Another example of attempting to treat the LLM as an AI.

    • *taps the "don't anthropomorphize the LLM" sign*

      They don't have time preference because they don't have intent or reasoning. They can't be "reincarnated" because they're not sentient, they're a series of weights for probable next tokens.


  • Right. This line [0] from TFA tells me that the author needs to thoroughly recalibrate their mental model about "Agents" and the statistical nature of the underlying models.

    [0] "This is the agent on the record, in writing."

  • Actually I think the opposite advice is true. Do anthropomorphize the language model, because it can do anything a human -- say an eager intern or a disgruntled employee -- could do. That will help you put the appropriate safeguards in place.

    • An eager intern can remember things you tell them beyond what would fit in an hour's conversation.

      A disgruntled employee definitely remembers things beyond that.

      These are a fundamentally different sort of interaction.


    • I think you are more right than people are giving you credit for. I would love to see the full transcript to understand the emotional load of the conversation. Using instructions like "NEVER FUCKING GUESS!" probably increases the likelihood of the agent making a "mistake" that is destructive but defensible.

      The models have analogous structures, similar to human emotions. (https://www.anthropic.com/research/emotion-concepts-function)

      "Emotional" response is muted through fine-tuning, but it is still there and continued abuse or "unfair" interaction can unbalance an agents responses dramatically.

    • An eager intern cannot work for hundreds of millions of customers at the same time. An LLM can.

      A disgruntled employee will face consequences for their actions. No one at Anthropic, OpenAI, xAI, Google or Meta will be fired because their model deleted a production database from your company.

    • It is merely a simulacrum of an intern or disgruntled employee or human. It might say things those people would say, and even do things they might do, but it has none of the same motivations. In fact, it does not have any motivation to call its own.

    • No, because the safeguards should be appropriate to an LLM, not to a human.

      (The LLM might act like one of the humans above, but it will have other problematic behaviours too)


    • It doesn't follow logically that a human and an LLM are similar just because both are capable of deleting prod on accident.

    • It cannot go to the washroom and cry while pooping. And that's just one of the things that any human can do and AI cannot. So no, it cannot do everything a human can do, the shared example being one of them.

      And that's why we don't have AI washrooms: because they are not alive, are not employees, and have no need to excrete.

Yep. I made a "Read only" mode in pi by taking away "write" and "edit" tools. Claude Code used bash to make edits anyway.

  • > Claude Code used bash to make edits anyway.

    If you had the former rule, why would you ever whitelist bash commands? That's full access to everything you can do.

    Same goes for `find`, `xargs`, `awk`, `sed`, `tar`, `rsync`, `git`, `vim` (and all text editors), `less` (any pager), `man`, `env`, `timeout`, `watch`, and so many more commands. If you whitelist things in the settings you should be much more specific about arguments to those commands.

    People really need to learn bash.
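
    For Claude Code specifically, permission rules can be scoped to individual commands rather than a blanket Bash grant; a sketch of a `.claude/settings.json` along those lines (check the current docs for exact rule syntax):

    ```json
    {
      "permissions": {
        "allow": ["Bash(git status)", "Bash(git diff:*)", "Bash(npm run lint)"],
        "deny": ["Bash(rm:*)", "Bash(curl:*)", "Bash(sed:*)"]
      }
    }
    ```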

> "NEVER FUCKING GUESS"

It's very hard to take this post seriously. I can't imagine what harness, if any, they attempted to place on the agent beyond some vibes. This is "move fast and absolutely destroy things" level thinking. That the poster asks for journalists to reach out makes it read like a "no news is bad news" publicity grab. Just gross.

The AI era is turning out to be the most disappointing era for software engineering.

  • This is going to be the most important job going forward: the guy in charge of making sure production secrets are out of CC's reach. (It's not safe for any dev to have them anywhere on their filesystem.)

  • I'd be interested to learn where those words exist in Cursor's context. My assumption was that it was part of the Cursor agent harness, but it's just as likely it was in the user instructions.

  • As soon as I read that line, I knew everything I needed to know about the author and his abilities.

  • > The AI era is turning about to be most disappointing era for software engineering.

    this has been obvious to me since like 2024, it truly is the worst, most uninspiring era of all time.

> The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely

That is not entirely true:

Given that more and more LLM providers are sneaking in "we'll train on your prompts now" opt-outs, you deleting your database (and the agent producing repentant output) can reduce the chance that it'll delete my database in the future.

  • Actually no, it will increase it. Because it’ll be trained with the deletion command as a valid output.

    • Exactly. It’s just giving the LLM a token pattern, and it’s designed to reproduce token patterns. That’s all it does. At some point generating a token pattern like that again is literally it’s job.

Looks like our SWE jobs are safe for now.

  • "The AI can't do your job, but an AI salesman can convince your boss to fire you and replace you with an AI that can't do your job." -- Cory Doctorow

Completely agree. This is a harness problem, not a model problem. The model is rarely the issue these days.

  • I don't know. To me, this is a human problem. Not only does the model have access to the production database, they keep the backups online on the same volume, and their offline backup is 3 months old. This is an accumulation of bad practices, all of them human design failures. Instead of sitting down and rethinking their entire backup strategy, they go public on Twitter and blame a probabilistic machine for doing what is within its parameters to do. I bet even that failure could have been avoided were more care given to what they do.

  • More so an environment problem. An agent doing staging or development tasks should never be able to get access to prod API credentials, period. Agents which do have access to prod should have their every interaction with the outside world audited by a human.
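
    A minimal sketch of that separation, with hypothetical paths and names: launch the agent with an environment built only from the staging secrets, so prod credentials are unreachable by construction rather than by instruction.

    ```python
    import os
    import subprocess

    def load_env(path: str) -> dict:
        """Parse KEY=VALUE lines from a secrets file (hypothetical format)."""
        env = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    key, _, value = line.partition("=")
                    env[key] = value
        return env

    # The agent inherits nothing from the developer's shell except PATH, so a
    # prod key sitting in the parent environment simply does not exist here.
    subprocess.run(
        ["agent", "run", "--task", "staging-maintenance"],
        env={"PATH": os.environ["PATH"], **load_env("/secrets/staging.env")},
    )
    ```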

  • No, this is a "being stupid enough to trust an LLM" problem. They are not trustworthy, and you must not ever let them take automated actions. Anyone who does that is irresponsible and will sooner or later learn the error of their ways, as this person did.

> If AI is physically capable of misbehaving, it might (*1)

This is why all the “AI Armageddon” talk seems so silly to me.

AI is only as destructive as the access you give it. Don’t give it access where it can harm and no harm will occur.

  • > Don’t give it access where it can harm and no harm will occur.

    If only the entire population would comply.

It's as if they internalized a post-mortem process that is designed to find root causes, but they use it to shift blame onto others, and they literally let the agent be a sandbag for their frustrations.

THAT SAID, it does help to let the agent explain it, so that the dev's perspective cannot be dismissed as AI skepticism.

> Lord, even calling it a "confession" is so cringe. The agent is not alive.

The AI companies are very invested in anthropomorphizing the agents. They named their company "Anthropic" ffs. I don't blame the writer for this, exactly.

  • You should, the writer is presumably a technical, rational person. They shouldn't believe in daemons and machine spirits

> Anyone who would follow a mistake like that up with demanding a confession out of the agent is not mature enough to be using these tools.

The proponents are screaming from the rooftops how AI is here and anyone less than the top-in-their-field is at risk. Given current capabilities, I will never raw-dog the stochastic parrot with live systems like this, but it is unfair to blame someone for being "too immature" to handle the tooling when the world is saying that you have to go all-in or be left behind.

There are just enough public success stories of people letting agents do everything that I am not surprised more and more people are getting caught up in the enthusiasm.

Meanwhile, I will continue plodding along with my slow meat brain, because I am not web-scale.

I agree with you completely up until this line:

> The agent cannot learn from its mistakes.

If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again. Yes, this is only probabilistic, but so is a human learning from mistakes. The key difference is that for a human this is unlikely to be removed from their memory in a relevant situation, while for an agent it must be strategically put there.

  • > If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again

    If this incident gets into its training data, then it's highly likely that it will repeat it, with the same confession, since this is a text predictor, not a thinker.

  • Or not, because telling the agent it is misbehaving may predispose it to more misbehavior, even though the whole point of telling it was to get it not to behave that way.

    I remember this being discussed when a similar issue went viral: someone building a product with Replit's AI had it delete his prod database.

  • > Yes this is only probabilistic, but so is a human learning from mistakes.

    Yet, since I'm also a human being and can work to understand the mistake myself, the probability that I can expect a correction of the behavior is much higher. I have found that it significantly helps if there's an actual reasonable paycheck on the line.

    As opposed to the language model, which demands that I drop more quarters into its slots and then hope for the best. An arcade model of work if there ever was one. Who wants that?

  • > If feedback from this incident is in its context window, it is highly unlikely to make this same mistake again.

    In my experience, this isn't true. At least with ChatGPT a version or so ago, I could make it trip up on custom word-play games, and when called out, it would acknowledge the failure, explain how it failed to follow the rules of the game, then proceed to make the same mistake a couple of sentences later.