Comment by mooreds
19 hours ago
My biggest gripe is that he outsourced evaluation of the pelicans to another LLM.
I get that it was way easier to do, and that it cost pennies and took almost no time. But I would have loved it if he'd tried alternate methods of judging and seen what the results were.
Other ways:
* wisdom of the crowds (have people vote on it)
* wisdom of the experts (send the pelican images to a few dozen artists or ornithologists)
* wisdom of the LLMs (use more than one LLM)
Would have been neat to see what the human consensus was and whether it differed from the LLM consensus.
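For what it's worth, comparing the two consensuses could be pretty cheap too. Here's a rough sketch of one way to do it, not anything from the talk: average each group's rankings into a consensus order, then measure agreement with a Kendall rank correlation (1.0 means identical order, -1.0 means fully reversed). The function names, the model labels, and the votes are all made-up placeholders, and it assumes every judge ranks every image with no ties.

```python
from itertools import combinations


def consensus_ranking(orderings):
    """Combine several best-first orderings into one ranking (1 = best).

    Assumes every judge ranks every item. Items are ordered by total
    position across judges; ties are broken alphabetically so the result
    is deterministic.
    """
    totals = {}
    for ordering in orderings:
        for position, item in enumerate(ordering):
            totals[item] = totals.get(item, 0) + position
    ranked = sorted(totals, key=lambda item: (totals[item], item))
    return {item: rank for rank, item in enumerate(ranked, start=1)}


def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two tie-free rankings of the same items."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    if len(rank_a) < 2:
        raise ValueError("need at least two items to compare")
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Positive product: both rankings order this pair the same way.
        agreement = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs


# Placeholder votes only (hypothetical models and judges, not real results).
human_votes = [
    ["model_a", "model_b", "model_c", "model_d"],
    ["model_b", "model_a", "model_c", "model_d"],
    ["model_a", "model_b", "model_d", "model_c"],
]
llm_votes = [
    ["model_a", "model_c", "model_b", "model_d"],
    ["model_c", "model_a", "model_b", "model_d"],
]

human = consensus_ranking(human_votes)
llm = consensus_ranking(llm_votes)
print(f"human vs LLM agreement: {kendall_tau(human, llm):.2f}")
```

The same two functions would cover the "more than one LLM" idea as well: feed each LLM judge's ordering in as one vote and compare that consensus against the crowd or the experts.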
Anyway, great talk!
It would have been interesting to see if the LLM that Claude judged worst would have attempted to justify itself...