Comment by dev_hugepages

17 hours ago

memorized: https://www.asciiart.eu/cartoons/spongebob-squarepants

Naturally. That's how LLMs work. During training you measure the loss, the difference between the model's output and the ground truth, and try to minimize it. We prize models for their ability to learn. Here we can see that the large model does a great job of learning to draw SpongeBob, while the small model performs poorly.
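A minimal sketch of the loop described above, using next-token cross-entropy over a toy vocabulary. This is illustrative only: real LLM training backpropagates through millions of weights rather than updating logits directly, and the vocabulary size and learning rate here are arbitrary.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(logits, target):
    # Loss = negative log-probability assigned to the ground-truth token.
    return -np.log(softmax(logits)[target])

rng = np.random.default_rng(0)
logits = rng.normal(size=5)   # model scores over a 5-token toy vocabulary
target = 2                    # index of the ground-truth next token

initial_loss = cross_entropy(logits, target)
for _ in range(100):
    probs = softmax(logits)
    grad = probs.copy()
    grad[target] -= 1.0       # d(cross-entropy)/d(logits) = probs - one_hot
    logits -= 0.5 * grad      # gradient descent step

print(initial_loss, cross_entropy(logits, target))
```

After a hundred steps the loss is driven close to zero, i.e. the "model" has memorized the target token, which is exactly the behavior the thread is debating.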

  • We don't value LLMs for rote memorization though. Perfect memorization is a long-solved task. We value LLMs for their generalization capabilities.

    A scuffed but fully original ASCII SpongeBob is usually more valuable than a perfect recall of an existing one.

    One major issue with highly sparse MoE is that it appears to advance memorization more than it advances generalization. Which might be what we're seeing here.

  • I'd argue that actually, the smaller model is doing a better job at "learning" - in that its ASCII image, while poor, includes the character's key features.

    The larger model already has it in its training corpus, so this isn't a particularly good measure. I'd much rather see a model's capabilities in trying to represent in ASCII something that's unlikely to be in its training data.

    Maybe a pelican riding a bike as ascii for both?

  • > That's how LLMs work

    And that is also exactly how we want them not to work: we want them to be able to solve new problems. (Because Pandora's box is open, and they are not sold as a flexible query machine.)

    "Where was Napoleon born": easy. "How to resolve the conflict effectively": hard. Solved problems are interesting to students. Professionals have to deal with non-trivial ones.

    • > how we want them not to work

      speak for yourself, I like solving problems and I'd like to retire before physical labor becomes the only way to support yourself

      > they are not sold as a flexible query machine

      yeah, SamA is a big fucking liar

For the model to have memorized the entire sequence of characters precisely, this must appear hundreds of times in the training data?

Conveniently removed the artist's signature though.

  • Yes - they all do that. Actually, most attempts start well but unravel toward the end.

      llm -m chutes/Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
      "An ASCII of spongebob"
      Here's an ASCII art of SpongeBob SquarePants:
      
      ```
          .--..--..--..--..--..--.
        .' \  (`._   (_)     _   \
      .'    |  '._)         (_)  |
      \ _.')\      .----..--.   /
      |(_.'  |    /    .-\-.  \
      \     0|    |   ( O| O) |
       |  _  |  .--.____.'._.-.
       /.' )  | (_.' .-'"`-. _.-._.-.--.-.
      / .''.  |  .' `-. .-'-. .-'"`-.`-._)
       .'.' |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      .'.'   |  |   |  |  |  |  |  |  |  |
      ```

  • Certainly not defending LLMs here, don't mistake with that.

    Humans do it too. I have given up on my country's non-local information sources, because I kept recognizing the original sources they were deliberately omitting. There's a satirical webpage that is basically a Reddit scrape. Most users don't notice, and those who do don't seem to care.

    • Yes, the most likely reason the model omitted the signature is that humans reposted more copies of this image omitting the signature than ones that preserve it.