Comment by travisgriggs
1 day ago
That’s ok, once bicycle “riding” pelicans become normative, we can ask it for images of pelicans humping bicycles.
The number of subject-verb-objects are near infinite. All are imaginable, but most are not plausible. A plausibility machine (LLM) will struggle with the implausible, until it can abstract well.
I can't fathom this working, simply because building a model that relates the word "ride" to "hump" seems like something that would be orders of magnitude easier for an LLM than visualizing the result of SVG rendering.
> The number of subject-verb-objects are near infinite. All are imaginable, but most are not plausible
Until there is enough unique/new subject-verb-objects examples/benchmarks so the trained model actually generalized it just like you did. (Public) Benchmarks needs to constantly evolve, otherwise they stop being useful.
To be fair, once it does generalize the pattern then the benchmark is actually measuring something useful for deciding if the model will be able to product a subject-verb-object SVG.