Comment by xcodevn

16 hours ago

For one, these models should be able to understand the physical world via images, audio, and video. I do agree that current models are quite good at coding, but that's mainly because coding is entirely text-based and easily verifiable. It's not obvious that this capability will transfer to other domains that aren't text-based and aren't as easily verifiable.