Comment by emil-lp
6 days ago
Yes, the "pelican riding a bicycle" is the ultimate test of not understanding how LLMs work.
Well, a combination of that and believing that replication of test data is a good measure of progress.
Spicy. Why does it show ultimate non-understanding?
Because success comes from reproducing a memorized pattern rather than from transferable reasoning?
At the same time, failure proves little, because most humans also could not manually create a correct SVG of a pelican riding a bicycle.
What is it exactly that such a test is testing?
In which situation would you measure the "competence" of a human being by asking them to write an SVG of a pelican riding a bicycle?
> most humans also could not manually create a correct SVG of a pelican riding a bicycle.
Most humans absolutely can create this with a suitable vector graphics tool such as Inkscape or Illustrator.
Surely, you're not suggesting that a fair comparison would be using a text editor?
If so, would you suggest that an equivalent raster-based task would only be fair if the human had to manually assign RGB values to each pixel?