Comment by SequoiaHope
9 hours ago
Run out of training data? They’re going to put these things in humanoids (they are weirdly cheap now) and record high resolution video and other sensor data of real world tasks and train huge multimodal Vision Language Action models etc.
The world is more than just text. We can never run out of pixels if we point cameras at the real world and move them around.
I work in robotics and I don’t think people talking about this stuff appreciate that text and internet pictures is just the beginning. Robotics is poised to generate and consume TONS of data from the real world, not just the internet.
Yeah, another source of "unlimited data" is genetics. The human reference genome is about 6.5 GB, but these days, they're moving to pangenomes, wanting to map out not just the genome of one reference individual, but all the genetic variation in a clade. Depending on how ambitious they are about that "all", they can be humongous. And unlike say video data, this is arguably a language. We're completely swimming in unmapped, uninterpreted language data.