Comment by emil-lp

6 days ago

Yes, the "pelican riding a bicycle" is the ultimate test of not understanding how LLMs work.

Well, a combination of that and believing that replication of test data is a good measure of progress.

Spicy — why does it show ultimate non-understanding?

  • Because success comes from reproducing a memorized pattern rather than from transferable reasoning?

    At the same time, failure proves little, because most humans also could not manually create a correct SVG of a pelican riding a bicycle.

    What is it exactly that such a test is testing?

    In which situation would you measure the "competence" of a human being by asking them to write an SVG of a pelican riding a bicycle?

    • > most humans also could not manually create a correct SVG of a pelican riding a bicycle.

      Most humans absolutely can create this with a suitable vector graphics tool such as Inkscape or Illustrator.

      Surely you're not suggesting that a fair comparison would be using a text editor?

      If so, would you suggest that an equivalent raster-based task would only be fair if the human manually assigned RGB values to each pixel?