← Back to context

Comment by recursive

8 days ago

No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here.

More likely you would just train for emitting svg for some description of a scene and create training data from raster images.

  • None of this works if the testers are collaborating with the trainers. The tests ostensibly need to be arms-length from the training. If the trainers ever start over-fitting to the test, the tester would come up with some new test secretly.

You can easily make a RLAIF loop.

- Take a list of n animals * m vehicule

- Ask a LLM to generate SVG for this n*m options

- Generate png from the svg

- Ask a Model with vision to grade the result

- Change your weight accordingly

No need to human to draw the dataset, no need of human to evaluate.