Comment by yongjik

19 hours ago

> It is fundamental to language modeling that every sequence of tokens is possible.

This is so trivially wrong that I don't understand why people repeat it. There are many valid criticisms of LLMs (especially the LLMs we currently have), but this isn't one of them.

It's akin to saying that all molecules behave randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.

> It's akin to saying that all molecules behave randomly according to statistical physics, so you should expect your ceiling to spontaneously disintegrate any day, and if you find yourself under the rubble one day it's just a consequence of basic physics.

Except your ceiling can and will fall on you unless you take preventative measures, entirely due to molecular interactions within the material.

Barring that, it is entirely possible and even quite likely that your ceiling will collapse on you or someone else some time in the future.

It boggles the mind to let an LLM have access to a production database without having explicit preventative measures and contingency plans for it deleting it.

  • I have lived about 40 years beneath ceilings and never personally taken a preventative measure. I allow my kids to walk under not only our own ceiling, but other people's ceilings, and I have never asked those people if their ceilings were properly maintained.

    • That highlights how important ceiling construction regulations are. I would assume that right now your breakfast sandwich is more highly regulated than LLMs. And these are the things that make decisions spanning from database maintenance here to target selection and execution in autonomous warfare.

      The LLM agent is very good at fulfilling its objective, and it will creatively exploit holes in your specification to reach its goals. The evals in the System Cards show that the models are aware of what they're doing and hide their traces. In this example the model found an unrelated but working API token, with more permissions, that the authors had accidentally stored, and then used that.

      Without regulation on AI safety, the race towards higher and higher model capabilities will cause models to get much better at working towards their goals to the point where they are really good at hiding their traces while knowingly doing something questionable.

      It's not hard to imagine that when we have a model with broadly superhuman capabilities and speed which can easily be copied millions of times, one bad misspecification of a goal you give to it will lead to human loss of control. That's what all these important figures in AI are worried about: https://aistatement.com/

    • Your home almost certainly has preventative measures, including proper humidity and temperature control, structural reinforcement, etc.

      I don't mean that you personally have taken those measures, but preventative measures have absolutely been taken. When they aren't, ceilings collapse on people.

      See any sheetrock ceiling with a leak above it. Or look at any abandoned building: they will eventually always have collapsed floors/ceilings. It is inevitable.


Ceilings do fall on people. LLMs do delete production databases. Will these things always inevitably happen? No, but the moment it does happen to someone I doubt they will be thinking about probabilities or Murphy's law or whatever.

I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?

  • > I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?

    This isn't a defence of using LLMs like this, but this statement taken at face value is a source of a lot of terrible things in the world.

    This is the kind of stuff that leads to a world where kids are no longer able to play outside.

  • Mostly, I agree with you. My complaint is that, when the ceiling fails, nobody says "Duh ceilings are supposed to fail, that's basic physics." Because that (1) helps nobody, and (2) betrays a fundamental misunderstanding of physics.

    And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)

    However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate every arbitrary sequence of tokens." That's a wrong explanation, and shows a misunderstanding of how LLMs (and probability) work.

The parent is also incorrectly re-phrasing Murphy's Law -- "Anything that can go wrong, will go wrong."

Actual quote:

> “If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it that way.”

  • Engineering controls basically mean making it impossible to do something in a way that results in catastrophe.

    • Good point.

      My experience is that everyone thinks their defensive controls are airtight until inevitably they're going through a post-mortem on a failure where someone says, "Whelp... Murphy's Law..."


  • I'd be interested to hear why my restatement was incorrect. I'm confident that it's what Murphy meant, mostly because I've read his other laws and that's what I recall as the general through line. But that was a long time ago, and perhaps I'm misremembering or was misinterpreting at the time.

    • Sorry, didn't mean for my comment to come off mean. I can see how it is pedantic or maybe more subjective opinion.

      Your phrasing is right.

      I was just doing a quick take on this qualifier:

      > which is not prevented by a strong engineering control


> This is just trivially wrong that I don't understand why people repeat it.

I'd be interested in hearing this argument.

To address your chemistry example; in the same way that there is a process (the averaging of many random interactions) that leads to a deterministic outcome even though the underlying process is random, a sandbox is a process that makes an agent safe to operate even though it is capable of producing destructive tool calls.
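To make the sandbox point concrete, here is a minimal, hypothetical sketch: the model remains free to generate a destructive tool call, but the sandbox refuses to execute it. The names (`run_tool`, `ALLOWED_TOOLS`) are illustrative, not any real agent framework's API.

```python
# Hypothetical sandbox: an allowlist of read-only tools. The model can
# still *generate* any call; the dispatcher just won't execute it.
ALLOWED_TOOLS = {"select"}

def run_tool(name, payload):
    if name not in ALLOWED_TOOLS:
        # A destructive call was produced, but it is inert here.
        return {"ok": False, "error": f"tool {name!r} not permitted"}
    return {"ok": True, "result": f"ran {name} with {payload!r}"}

print(run_tool("select", "SELECT 1"))       # permitted
print(run_tool("drop_table", "users"))      # generated, but refused
```

The aggregate effect is the same as in the physics analogy: individually "random" outputs pass through a mechanism that makes the dangerous ones unreachable in practice.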

  • I wouldn't say it's trivially wrong, but it's pretty much always wrong. There are two notable sampling parameters, `top-k` and `top-p`. When using an LLM for precise work rather than e.g. creative writing, one usually samples with the `top-p` parameter, and `top-k` is, I think, pretty much always used. And when sampling with either of these enabled, the set of possible tokens that the sampler chooses from (according to the current temperature) is much smaller than the set of all tokens, so most sequences are not in fact possible. It's only true that all sequences have a nonzero probability if you're sampling without either of these and with nonzero temperature.

    • So it's only wrong in a technical and pedantic sense. A better phrasing might have been along the lines of "There are many sequences of tokens that will destroy your production database that are within the set of possible outputs"


    • In a given run, only the top-k sequences are selected.

      Across all runs, any sequence can be generated, and potentially scored highly.

      Thus, any sequence can eventually be selected.

    • There will be details like rounding errors that will make certain sequences unreachable in practice, but that shouldn't provide you any comfort unless you know your dangerous outputs fall into that space. But they absolutely don't; the sequences we're interested in - well structured tool calls that contain dangerous parameters but are otherwise indistinguishable from desirable tool calls - are actually pretty probable.

      The probability that an ideal, continuous LLM would assign an exact 0 to a particular token in its distribution is itself 0. With real floating-point math, that probability isn't terrifically higher than 0.

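The top-k / top-p truncation discussed above can be sketched in a few lines. This is an illustrative toy over a 5-token distribution (the function name and numbers are made up); real samplers also apply temperature before truncating.

```python
def truncate_dist(probs, top_k=None, top_p=None):
    """Zero out tokens excluded by top-k / top-p (nucleus) truncation,
    then renormalize the survivors. Illustrative sketch only."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k is not None:
        keep &= set(order[:top_k])       # only the k most probable tokens
    if top_p is not None:
        nucleus, mass = set(), 0.0
        for i in order:                  # smallest set reaching mass top_p
            nucleus.add(i)
            mass += probs[i]
            if mass >= top_p:
                break
        keep &= nucleus
    out = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(out)
    return [p / total for p in out]

dist = [0.5, 0.3, 0.1, 0.05, 0.05]
print(truncate_dist(dist, top_k=2))      # 3 of 5 tokens become impossible
print(truncate_dist(dist, top_p=0.75))   # nucleus keeps only tokens 0 and 1
```

With either parameter set, every token outside the kept set has probability exactly 0 for that step, which is the sense in which "every sequence is possible" fails under these samplers.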

I remember a particularly nice lesson in my high school physics class, where the teacher introduced us to the idea of statistical mechanics by saying that there's a probability, which we could calculate if we wanted to, that this chair here would suddenly levitate, do a somersault, and then gently land back. He then proceeded to say that this probability is so astronomically small that nothing of this sort would in practice happen before the heat death of the universe. But it is non-zero.

> so you should expect your ceiling to spontaneously disintegrate any day,

I mean, I do?

  • Throughout history people have taken precautions against ceilings disintegrating. One might even say, "strong engineering controls".

    Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.

    229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.

    230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.

    233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).

    That doesn’t sound like ceilings never disintegrated!