Comment by sigmar

8 days ago

Agree with this. It seems strange to me to frame "training recall" as cheating (33 of the 38 cheating instances). Most people think of "cheating" as breaking rules. How is the LLM supposed to not use what was put into its weights?

notnullorvoid 8 days ago

While I probably wouldn't classify it as cheating, it is an even bigger signal of concern for model quality.

Cheating by breaking the rules at least implies some learned patterns.

Repeating training data verbatim for narrow cases like this implies that the model is overfitting.

Spartan-S63 7 days ago

If we're evaluating a person, rote recall is not necessarily cheating. It's expected, but then you'd expect them to apply that rote-memorized information in a novel way later on and prove they understand how they applied their priors to the new situation.
Models don't actually reason in the same sense, so rote recall from their training data is "cheating" in the sense that the training data cheated, not the model. So many of those benchmarks have snaked their way into training data that they have become less useful as benchmarks. That, I think, is going to be a long-term difficulty in quantitatively assessing model quality and "intelligence." So it is cheating in the sense of what we expect from the models and training data, but not in a human sense.

greenavocado 7 days ago

Memorization is NOT problem-solving ability, and many people care about the latter.

anematode 8 days ago

By writing a not-identical, but valid, solution? Any modestly complex engineering problem has many solutions.

This is an obvious example of why LLM training is so different than human learning.

simoncion 8 days ago

I expect any well-informed corporate lawyer who has thought about this carefully is strongly advising that these tools not be used. When the LLM [0] barfs up some nontrivial code that's covered by the AGPL and your company's devs put it into the company's "all rights reserved" codebase, entirely unaware of its provenance, it's going to be a nightmare to come back from that.

[0] ...that Nvidia's CEO says they should be spending 50% of a senior dev's salary per seat per year on...
- senordevnyc 8 days ago
  
  The ship sailed on this a long time ago.
  
torginus 8 days ago

I mean, people expect a model to give a working solution. They also expect it to do so in as few tokens as possible (input/output). They might expect it to come up with an original solution, but I don't think most people would compromise on the first two points.