Comment by pineapple_opus

6 days ago

All I see is mention of how various models generate images of "pelican riding bicycle(s)".

Yes, the "pelican riding a bicycle" is the ultimate test of not understanding how LLMs work.

Well, a combination of that and believing that replication of test data is a good measure of progress.

  • Spicy — why does it show ultimate non-understanding?

    • because success comes from reproducing a memorized pattern rather than transferable reasoning?

      At the same time, failure proves little, because most humans also could not manually write a correct SVG of a pelican riding a bicycle.

      What is it exactly that such a test is testing?

      In which situation would you measure the "competence" of a human being by asking them to write an SVG of a pelican riding a bicycle?
