Comment by Workaccount2

10 hours ago

It would be easy to out models that train on the bike pelican, because they would probably suck at the kayaking bumblebee.

So far though, the models good at bike pelican are also good at kayak bumblebee, or whatever other strange combo you can come up with.

So if they are trying to benchmaxx by making SVG generation stronger, that's not really a miss, is it?

That depends on if "SVG generation" is a particularly useful LLM/coding model skill outside of benchmarking. I.e., if they make that stronger with some params that otherwise may have been used for "rust type system awareness" or somesuch, it might be a net loss outside of the benchmarks.