Comment by toraway
8 days ago
Huh? AI labs are routinely spending millions to billions on various 3rd-party contractors specializing in creating/labeling/verifying specialized content for pre/post-training.
This would just be one more checkbox buried in hundreds of pages of requests, and compared to plenty of other ethical grey areas like copyright laundering with actual legal implications, leaking that someone was asked to create a few dozen pelican images seems like it would be at the very bottom of the list of reputational risks.
Who do you think is in on that? Not only the pelicans, I mean, the whole thing. CEOs, top researchers, select mathematicians, congressmen? Does China participate in maintaining the bubble?
I, myself, prefer the universal approximation theorem and the empirical finding that stochastic gradient descent is good enough (and "no 'magic' in the brain", of course).
Well, since we're all talking about sourcing training material to "benchmaxx" for social proof, and not litigating the whole "AI bubble" debate, just the entire cottage industry of data curation firms:
https://scale.com/data-engine
https://www.appen.com/llm-training-data
https://www.cogitotech.com/generative-ai/
https://www.telusdigital.com/solutions/data-for-ai-training/...
https://www.nexdata.ai/industries/generative-ai
---
P.S. Google Comms would have been consulted re putting a pelican in the I/O keynote :-)
https://x.com/simonw/status/1924909405906338033
Cool. At least they are working across the board and benchmaxxing random things like theory of mind.