Comment by dev_hugepages

17 hours ago

memorized: https://www.asciiart.eu/cartoons/spongebob-squarepants

Naturally. That's how LLMs work. During training you measure the loss, the difference between the model's output and the ground truth, and try to minimize it. We prize models for their ability to learn. Here we can see that the large model does a great job of learning to draw SpongeBob, while the small model performs poorly.
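A minimal sketch of the loop described above, using next-token cross-entropy over a toy vocabulary. This is illustrative only: real LLM training backpropagates through millions of weights rather than updating logits directly, and the vocabulary size and learning rate here are arbitrary.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, target):
    # Loss = negative log-probability assigned to the ground-truth token.
    return -np.log(softmax(logits)[target])

rng = np.random.default_rng(0)
logits = rng.normal(size=5)   # model scores over a 5-token toy vocabulary
target = 2                    # index of the ground-truth next token

initial_loss = cross_entropy(logits, target)
for _ in range(100):
    probs = softmax(logits)
    grad = probs.copy()
    grad[target] -= 1.0       # d(cross-entropy)/d(logits) = probs - one_hot
    logits -= 0.5 * grad      # gradient descent step

print(initial_loss, cross_entropy(logits, target))
```

After a hundred steps the loss is driven close to zero, i.e. the "model" has memorized the target token, which is exactly the behavior the thread is debating.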

  • We don't value LLMs for rote memorization though. Perfect memorization is a long-solved task. We value LLMs for their generalization capabilities.

    A scuffed but fully original ASCII SpongeBob is usually more valuable than a perfect recall of an existing one.

    One major issue with highly sparse MoE is that it appears to advance memorization more than it advances generalization. Which might be what we're seeing here.

  • I'd argue that actually, the smaller model is doing a better job at "learning" - in that its ASCII image, while poor, includes the character's key features.

    The larger model already has it in its training corpus, so this isn't a particularly good measure. I'd much rather see a model's capabilities in trying to represent in ASCII something that's unlikely to be in its training data.

    Maybe a pelican riding a bike as ascii for both?

  • > That's how LLMs work

    And that is also exactly how we want them not to work: we want them to be able to solve new problems. (Because Pandora's box is open, and they are not sold as a flexible query machine.)

    "Where was Napoleon born": easy. "How to resolve the conflict effectively": hard. Solved problems are interesting to students. Professionals have to deal with non-trivial ones.

    • > how we want them not to work

      speak for yourself, I like solving problems and I'd like to retire before physical labor becomes the only way to support yourself

      > they are not sold as a flexible query machine

      yeah, SamA is a big fucking liar

For the model to have memorized the entire sequence of characters precisely, this must appear hundreds of times in the training data?

Conveniently removed the artist's signature though.

  • Yes - they all do that. Actually, most attempts start well but unravel toward the end.

      llm -m chutes/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
      "An ASCII of spongebob"
      Here's an ASCII art of SpongeBob SquarePants:
      
      ```
          .--..--..--..--..--..--.
        .' \  (`._   (_)     _   \
      .'    |  '._)         (_)  |
      \ _.')\      .----..--.   /
      |(_.'  |    /    .-\-.  \
      \     0|    |   ( O| O) |
       |  _  |  .--.____.'._.-.
       /.' )  | (_.' .-'"`-. _.-._.-.--.-.
      / .''.  |  .' `-. .-'-. .-'"`-.`-._)
       .'.' |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      ```

  • Certainly not defending LLMs here, don't mistake with that.

    Humans do it too. I have given up on my country's non-local information sources, because I kept recognizing the original sources they were deliberately omitting. There's a satirical webpage that is basically a Reddit scrape. Most users don't notice, and those who do don't seem to care.

    • Yes, the most likely reason the model omitted the signature is that humans reposted more copies of this image omitting the signature than ones that preserve it.