Comment by irthomasthomas
2 days ago
Naturally. That's how LLMs work. During training you measure the loss (the difference between the model's output and the ground truth) and try to minimize it. We prize models for their ability to learn. Here we can see that the large model does a great job of learning to draw SpongeBob, while the small model performs poorly.
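For anyone unfamiliar, here's a minimal sketch of that loop in illustrative PyTorch; `model`, `optimizer`, and `batch` are stand-ins rather than any particular lab's training code:

```python
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    # batch["input_ids"]: (B, T) token ids; the targets are the same
    # sequence shifted left by one position (next-token prediction).
    inputs = batch["input_ids"][:, :-1]
    targets = batch["input_ids"][:, 1:]

    logits = model(inputs)  # assumed shape: (B, T-1, vocab_size)

    # Cross-entropy between the model's output distribution and the
    # ground-truth tokens: this is "the loss" being minimized.
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Driving that loss to zero on a given sample is, by definition, memorizing it.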
We don't value LLMs for rote memorization, though. Perfect memorization is a long-solved task. We value LLMs for their generalization capabilities.
A scuffed but fully original ASCII SpongeBob is usually more valuable than a perfect recall of an existing one.
One major issue with highly sparse MoE is that it appears to advance memorization more than it advances generalization, which might be what we're seeing here.
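For context, "highly sparse" means each token is routed to only a few of many experts. A toy sketch of top-k routing (the dimensions, routing scheme, and names here are illustrative, not taken from any specific model):

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy top-k mixture-of-experts layer, for illustration only."""

    def __init__(self, d_model=512, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep top_k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        # Each token activates only top_k of n_experts, so most
        # parameters sit idle for any given token (that's the sparsity).
        for t in range(x.size(0)):
            for slot in range(self.top_k):
                expert = self.experts[int(idx[t, slot])]
                out[t] += weights[t, slot] * expert(x[t])
        return out
```

With 64 experts and top-2 routing, roughly 3% of the expert parameters touch any given token, which is one intuition for why sparse MoE might favor memorization: lots of capacity to store things, comparatively little pressure to share structure across experts.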
I'd argue that, actually, the smaller model is doing a better job at "learning": its ASCII image, while poor, still includes the key characteristics.
The larger model already has it in its training corpus, so this isn't a particularly good measure. I'd much rather see the capabilities of a model when it tries to represent in ASCII something that's unlikely to be in its training data.
Maybe a pelican riding a bike in ASCII, for both?
> That's how LLMs work
And that is also exactly how we want them not to work: we want them to be able to solve new problems. (Because Pandora's box is open, and they are not sold as mere flexible query machines.)
"Where was Napoleon born": easy. "How to resolve the conflict effectively": hard. Solved problems are interesting to students. Professionals have to deal with non trivial ones.
> how we want them not to work
Speak for yourself; I like solving problems, and I'd like to retire before physical labor becomes the only way to support yourself.
> they are not sold as a flexible query machine
yeah, SamA is a big fucking liar
I get your fear, d., but I am afraid we urgently need these tools, and we need them to work properly. At some point in time the gap between workforce and objectives forced us to adopt cranes; at this point in time I see that "the carbon" is not "competing" enough. An IQ boost in the toolbox, when we finally reach it, will be an enabler: for doom in the hands of fools, for the best in the hands of the wise. The proportions are worrisome, but the game is not decided.
Meanwhile, there is no turning back: now that the mockery of intelligence has been invented, the Real Thing must urgently be found.
Edit: I have just read the title "Amateurish plan exposed failing diplomacy". The list of giants includes McNamara, Kissinger, Brzezinski: if some say that their efforts have not been sufficient (and failures are very costly), what do we need?
Not really.
Typically less than 1% of training data is memorized.
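That kind of figure typically comes from prefix-completion tests for verbatim memorization. A rough sketch of the idea; the `greedy_generate` helper and the 50-token window are illustrative assumptions, not a specific paper's setup:

```python
def fraction_memorized(samples, tokenizer, greedy_generate,
                       prefix_len=50, suffix_len=50):
    """Estimate verbatim memorization: prompt with a training-set
    prefix and check whether greedy decoding reproduces the exact
    continuation. greedy_generate(prefix_ids, n_tokens) is a
    hypothetical decoding helper; the lengths and the exact-match
    criterion are illustrative choices."""
    hits, total = 0, 0
    for text in samples:
        ids = tokenizer.encode(text)
        if len(ids) < prefix_len + suffix_len:
            continue  # skip samples too short to test
        prefix = ids[:prefix_len]
        suffix = ids[prefix_len:prefix_len + suffix_len]
        if greedy_generate(prefix, suffix_len) == suffix:
            hits += 1
        total += 1
    return hits / max(total, 1)
```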