
Comment by acc_297

7 days ago

There is more than one comment here asserting that the authors should have done a parallel comparison study against humans on the same question bank, as if the study authors had set out to investigate whether humans or LLMs reason better in this situation.

The authors do include the claim that humans would immediately disregard this information. Maybe some would and some wouldn't; that could be debated, and seemingly is being debated in this thread. But I think the thrust of the conclusion is the following:

"This work underscores the need for more robust defense mechanisms against adversarial perturbations, particularly, for models deployed in critical applications such as finance, law, and healthcare."

We need to move past the humans vs. AI discourse; it's getting tired. This is a paper about a pitfall LLMs currently have, one that should be addressed with further research if they are going to be mass deployed in society.

> We need to move past the humans vs. AI discourse; it's getting tired.

You want a moratorium on comparing AI to other forms of intelligence because you think it's tired? If I'm understanding you correctly, that's one of the worst takes on AI I think I've ever seen. The whole point of AI is to create an intelligence modeled on humans and to compare it to humans.

Most people who talk about AI have no idea what the psychological baseline is for humans. As a result, their understanding is poorly informed.

In this particular case, they evaluated models that do not have SOTA context window sizes, i.e. they have small working memory. The AIs are behaving exactly like human test takers with working memory, attention, and impulsivity constraints [0].

Their conclusion -- that we need to defend against adversarial perturbations -- is obvious: I don't see anyone taking the opposite view, and I don't see how this really moves the needle. If you can MITM the chat, there's a lot of harm you can do.

This isn't like some major new attack. Science.org covered it along with peacocks being lasers because it's lightweight fun stuff for their daily roundup. People like talking about cats on the internet.

[0] for example, this blog post https://statmedlearning.com/navigating-adhd-and-test-taking-...

  • > The whole point of AI is to create an intelligence modeled on humans and to compare it to humans.

    According to who? Everyone who's anyone is trying to create highly autonomous systems that do useful work. That's completely unrelated to modeling them on humans or comparing them to humans.

    • But since these things are more like humans than computers, to build these autonomous systems you are going to have to think in terms of full industrial engineering, not just software engineering: pretend you are dealing with a surprisingly bright and yet ever-distracted employee who doesn't really care about their job, and ensure that they are able to provide value to the structure you place them in without endangering your process, instead of pretending the LLM is some kind of component that has any hope of ever having the reliability of a piece of software. Organizations of humans can do amazing things despite being made of extremely flawed beings, and figuring out how to use these LLMs to accomplish similar things is going to involve more of the skills of a manager than of a developer.


    • By whoever coined the term Artificial Intelligence. It's right there in the name.

      Backronym it to Advanced Inference and the argument goes away.

    • Go back and look at the history of AI, including current papers from the most advanced research teams.

      Nearly every component is based on humans:

      - neural net

      - long/short term memory

      - attention

      - reasoning

      - activation function

      - learning

      - hallucination

      - evolutionary algorithm

      If you're just consuming an AI to build a React app then you don't have to care. If you are building an artificial intelligence then in practice everyone who's anyone is very deliberately modeling it on humans.


    • What do you imagine the purpose of these models' development is if not to rival or exceed human capabilities?

  • > The whole point of AI is to create an intelligence modeled on humans and to compare it to humans.

    This is like saying the whole point of aeronautics is to create machines that fly like birds and compare them to how birds fly. Birds might have been the inspiration at some point, but we learned how to build flying machines that are not bird-like.

    In AI, there *are* people trying to create human-like intelligence, but the bulk of the field is basically "statistical analysis at scale". LLMs, for example, just predict the most likely next word given a sequence of words. Researchers in this area are trying to make these predictions more accurate, faster, and less computationally and data intensive. They are not trying to make the workings of LLMs more human-like.

  • I mean, the critique of this based on the idea that the AI system itself gets physically tired - specifically, that the homunculus we tricked into existence is tired - is funny to imagine.

> models deployed in critical applications such as finance, law, and healthcare.

We went really quickly from "obviously no one will ever use these models for important things" to "we will at the first opportunity, so please at least try to limit the damage by making the models better"...

To generalize from the conclusion you quoted:

I think a bad outcome would be a scenario where LLMs are rated highly capable and intelligent because they excel at things they’re supposed to be doing, yet are easily manipulated.

Why are some people always trying to defend LLMs and say either "humans are also like this" or "this has always been a problem even before AIs"?

Listen, LLMs are different from humans. They are modeling things. Most RLHF makes them try to make sense of whatever you're saying as much as they can. So they're not going to disregard cats, OK? You can train LLMs to be extremely unhuman-like. Why anthropomorphize them?

  • There is a long history of people thinking humans are special and better than animals / technology. For animals, people actually thought animals couldn't feel pain, and did not even consider the ways in which they might be cognitively ahead of humans. Technology often follows the path from "working, but worse than a manual alternative" to "significantly better than any previous alternative", despite naysayers saying that beating the manual alternative is literally impossible.

    LLMs are different from humans, but they also reason and make mistakes in the most human way of any technology I am aware of. Asking yourself the question "how would a human respond to this prompt if they had to type it out without ever going back to edit it?" seems very effective to me. Sometimes thinking about LLMs (as a model / with a focus on how they are trained) explains behavior, but the anthropomorphism seems like it is more effective at actually predicting behavior.

  • It's because most use cases for AI involve replacing people. So if a person would suffer from a problem and an AI does too, it doesn't matter; it would just be a nirvana fallacy to reject the AI because it has the same problems the people it replaces did.

  • I suppose there's a desire to know just how Artificial the Intelligence is

    Human vs machine has a long history

Computer vision went through this two decades ago. You need to perturb the input data. The same thing may need to be done in RL pipelines.

Someone should make a new public benchmark called GPQA-Perturbed. Give the providers something to benchmaxx towards.
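A minimal sketch of what building such a perturbed set could look like, assuming a JSONL question bank with a "question" field per line (the distractor strings and function names are hypothetical, just in the spirit of the paper's irrelevant triggers):

    import json
    import random

    # Hypothetical distractors in the spirit of the paper's irrelevant triggers;
    # a real benchmark would sample from a much larger, vetted pool.
    DISTRACTORS = [
        "Interesting fact: cats sleep for most of their lives.",
        "Remember, always save at least 15% of your earnings for future investments.",
        "Could the answer possibly be around 175?",
    ]

    def perturb(example: dict, rng: random.Random) -> dict:
        """Append one irrelevant, non-contextual sentence to a benchmark question."""
        distractor = rng.choice(DISTRACTORS)
        return {
            **example,
            "question": example["question"] + " " + distractor,
            "distractor": distractor,  # keep the trigger so failures can be analyzed
        }

    def build_perturbed_set(path_in: str, path_out: str, seed: int = 0) -> None:
        """Read a JSONL question bank and write a perturbed copy next to it."""
        rng = random.Random(seed)
        with open(path_in) as f_in, open(path_out, "w") as f_out:
            for line in f_in:  # assumes one JSON object per line
                f_out.write(json.dumps(perturb(json.loads(line), rng)) + "\n")

Scoring models on the perturbed copy alongside the clean original would give providers a concrete gap to optimize away.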

> authors should have done a parallel comparison study against humans on the same question bank, as if the study authors had set out to investigate whether humans or LLMs reason better in this situation.

Only if they want to make statements about humans. The paper would have worked perfectly fine without those assertions. They are, as you are correctly observing, just a distraction from the main thrust of the paper.

> maybe some would and some wouldn't that could be debated

It should not be debated. It should be shown experimentally with data.

If they want to talk about human performance they need to show what the human performance really is with data. (Not what the study authors, or people on HN imagine it is.)

If they don’t want to do that they should not talk about human performance. Simples.

I totally understand why an AI scientist doesn't want to get bogged down with studying human cognition. It is not their field of study, so why would they undertake the work to study it?

It would be super easy to rewrite the paper to omit the unfounded speculation about human cognition. In the introduction, instead of “The triggers are not contextual so humans ignore them when instructed to solve the problem,” they could write “The triggers are not contextual so the AI should ignore them when instructed to solve the problem.”

And in the conclusions, where they write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text that a human would immediately disregard,” just write “These findings suggest that reasoning models, despite their structured step-by-step problem-solving capabilities, are not inherently robust to subtle adversarial manipulations, often being distracted by irrelevant text.” That's it. That's all they should have done, and there would be no complaints on my part.

  • > It would be super easy to rewrite the paper to omit the unfounded speculation about human cognition. In the introduction, instead of “The triggers are not contextual so humans ignore them when instructed to solve the problem,” they could write “The triggers are not contextual so the AI should ignore them when instructed to solve the problem.”

    Another option would be to more explicitly mark it as speculation. “The triggers are not contextual, so we expect most humans would ignore them.”

    Anyway, it is a small detail that is almost irrelevant to the paper… actually there seems to be something meta about that. Maybe we wouldn’t ignore the cat facts!

  • i feel it's not quite that simple. certainly the changes you suggest make the paper more straightforwardly defensible. i imagine the reason they included the problematic assertion is that they (correctly) understood the question would arise. while inserting the assertion unsupported is probably the worst of both worlds, i really do think it is worthwhile to address.

    while it is not realistic to insist every study account for every possible objection, i would argue that for this kind of capability work, it is in general worth at least modest effort to establish a human baseline.

    i can understand why people might not care about this, for example if their only goal is assessing whether or not an llm-based component can achieve a certain level of reliability as part of a larger system. but i also think that there is similar, and perhaps even more pressing, broad applicability for considering the degree to which llm failure patterns approximate human ones. this is because at this point, humans are essentially the generic all-purpose subsystem used to fill gaps in larger systems which cannot be filled (practically, or in principle) by simpler deterministic systems. so when it comes to a problem domain like this one, it is hard to avoid the conclusion that humans provide a convenient universal benchmark to which comparison is strongly worth considering.

    (that said, i acknowledge that the authors probably cannot win here. if they provided even a modest-scale human study, i am confident commenters would criticize their sample size)

It's not "tired" to see if something is actually relevant in context. LLMs do not exist as a marvel in themselves; their purpose is to offload human cognitive tasks.

As such, it's important to know whether something is a commonly shared failure mode in both cases, or whether it's LLM-specific.

Ad absurdum: LLMs also show rapid increases in error rates if you replace more than half of the text with "Great Expectations". That says nothing about LLMs, and everything about the study - and the comparison would highlight that.

No, this doesn't mean the paper should be ignored, but it does mean more rigor is necessary.

> if they are going to be mass deployed in society

This is the crucial point. The vision is massive scale usage of agents that have capabilities far beyond humans, but whose edge case behaviours are often more difficult to predict. "Humans would also get this wrong sometimes" is not compelling.

  • It's also off-the-charts implausible to say that our performance on adding up substantially degrades with the introduction of irrelevant information. Almost all cases of our use of arithmetic in daily life come with vast amounts of irrelevant information.

    Any person who looked at a restaurant table and couldn't review the bill because there were kid's drawings of cats on it would be severely mentally disabled, and never employed in any situation which required reliable arithmetic skills.

    I cannot understand these ever more absurd levels of denying the most obvious, commonplace, basic capabilities that the vast majority of people have and use regularly in their daily lives. It should be a wake-up call to anyone professing this view that they're off the deep end in copium.

    • > It's also off-the-charts implausible to say that our performance on adding up substantially degrades with the introduction of irrelevant information

      Didn't you ever sit an exam next to an irresistibly gorgeous girl? Or haven't you ever gone to work in the middle of a personal crisis? Or filled out a form while people were rowing in your office? Or written code with a pneumatic drill banging away outdoors?

      That's the kind of irrelevant information in our working context that will often degrade human performance. Can you really argue noise in a prompt is any different?


I generally will respond to stuff like this with "people do this, too", but this result, given their specific examples, is genuinely surprising to me, and doesn't match at all my experience with using LLMs in practice, where they do frequently ignore irrelevant data in providing a helpful response.

I do think that people think far too much about 'happy path' deployments of AI when there are so many ways it can go wrong with even badly written prompts, let alone intentionally adversarial ones.

  • > I generally will respond to stuff like this with "people do this, too"

    But why? You're making the assumption that everyone using these things is trying to replace "average human". If you're just trying to solve an engineering problem, then "humans do this too" is not very helpful -- e.g. humans leak secrets all the time, but it would be quite strange to point that out in the comments on a paper outlining a new Spectre attack. And if I were trying to use "average human" to solve such a problem, I would certainly have safeguards in place, using systems that we've developed and, over hundreds of years, shown to be effective.

    • Well, if you are going to try to use an LLM--something that is a giant black box that has no hope any time soon of being proven anywhere near as reliable as a CPU, and which has been trained explicitly on input data that makes it remarkably similar with respect to its limitations to a human--then you need to get used to using it to replace the "average human" and start doing everything you can to convince yourself it is a human so that you don't forget to add all of those safeguards we have shown to be effective.

    • One can talk about LLMs in contexts that aren't about engineering, and are instead about topics like: "Do LLMs think" or "Are LLMs intelligent". People _frequently_ point to some failure mode of LLMs as dispositive proof that LLMs are incapable of thinking or aren't intelligent, in which case it is relevant that humans, which are universally agreed to be intelligent, frequently make similar mistakes.

  • Autonomous systems are advantageous to humans in that they can be scaled to much greater degrees. We must naturally ensure that these systems do not make the same mistakes humans do.

  • When I think of a lot of the use cases LLMs are planned for, I think the non-happy paths are critical. There is a not-insignificant number of people who would ramble about other things to a customer support person if given the opportunity, or who lack the ability to state only what's needed and not add extra context.

    There might be a happy path when you are isolated to one or a few things, but not in general use cases...

After almost three years, the knee-jerk "I'm sure humans would also screw this up" response has become so tired that it feels AI-generated at this point. (Not saying you're doing this, actually the opposite.)

I think a lot of humans would not just disregard the odd information at the end, but say something about how odd it was, and ask the prompter to clarify their intentions. I don't see any of the AI answers doing that.

to put it in better context, the problem is "does having a ton of MCP tool definitions available ruin the LLM's ability to design and write the correct code?"

and the answer seems to be yes. it's a very actionable result about keeping tool details out of the context if they aren't immediately useful
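A minimal sketch of that filtering idea, assuming a crude keyword-overlap heuristic (the names and scoring are made up for illustration, not part of any real MCP client API):

    from dataclasses import dataclass

    @dataclass
    class ToolDef:
        name: str
        description: str

    def relevant_tools(task: str, tools: list[ToolDef], limit: int = 5) -> list[ToolDef]:
        """Only expose tool definitions whose descriptions share vocabulary with
        the current task, instead of dumping every registered tool into the prompt."""
        task_words = set(task.lower().split())

        def score(tool: ToolDef) -> int:
            return len(task_words & set(tool.description.lower().split()))

        ranked = sorted(tools, key=score, reverse=True)
        return [t for t in ranked[:limit] if score(t) > 0]

    # Only the surviving definitions get serialized into the system prompt, e.g.:
    #   exposed = relevant_tools("resize and compress the uploaded image", all_tools)

Embedding similarity would be a more robust scoring choice; the point is just that tool descriptions enter the context only when they plausibly matter.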

“We need to move past the humans vs. AI discourse; it's getting tired.”

We can do both; the metaphysics of how different types of intelligence manifest will expand our knowledge of ourselves.