Comment by emil-lp

6 days ago

Yes, the "pelican riding a bicycle" is the ultimate test of not understanding how LLMs work.

Well, a combination of that and believing that replication of test data is a good measure of progress.

vessenes 6 days ago

Spicy! Why does it show ultimate non-understanding?

JohnKemeny 5 days ago
Because success comes from reproducing a memorized pattern rather than from transferable reasoning?
At the same time, failure proves little, because most humans also could not manually create a correct SVG of a pelican riding a bicycle.
What is it exactly that such a test is testing?
In which situation would you measure the "competence" of a human being by asking them to write an SVG of a pelican riding a bicycle?
- okamiueru 4 days ago
  
  > most humans also could not manually create a correct SVG of a pelican riding a bicycle.
  Most humans absolutely can create this with a suitable vector graphics tool such as Inkscape or Illustrator.
  Surely you're not suggesting that a fair comparison would require using a text editor?
  If so, would you say an equivalent raster-based task is only fair if the human manually assigns RGB values to each pixel?