Comment by segmondy
3 months ago
I would argue that some would add time to that as well; a lot of our data are missing spatial and temporal information. But if we're able to take text2text models and add in audio/vision, then I suspect we can apply the same technique to add in spatial and temporal intelligence. However, unlike audio and visual data, the data for those are nonexistent.