Comment by travisgriggs

1 day ago

That’s ok, once bicycle “riding” pelicans become normative, we can ask it for images of pelicans humping bicycles.

The number of subject-verb-object combinations is near infinite. All are imaginable, but most are not plausible. A plausibility machine (LLM) will struggle with the implausible until it can abstract well.

3 comments

zahlman 1 day ago

I can't fathom this working, simply because building a model that relates the word "ride" to "hump" seems like something that would be orders of magnitude easier for an LLM than visualizing the result of SVG rendering.

diggan 1 day ago

> The number of subject-verb-object combinations is near infinite. All are imaginable, but most are not plausible

Until there are enough unique/new subject-verb-object examples/benchmarks that the trained model actually generalizes the pattern, just like you did. (Public) benchmarks need to constantly evolve; otherwise they stop being useful.

demosthanos 1 day ago

To be fair, once it does generalize the pattern, the benchmark is actually measuring something useful for deciding whether the model will be able to produce a subject-verb-object SVG.