Comment by aew2120
6 days ago
Interestingly, a small company called Ogma already did something very similar back in 2021 (on an embedded system, no less). This (https://ogma.ai/2021/07/unsupervised-behavioral-learning-ubl...) is a description/video of how they got a small RC car to predict the next frame of its video feed given the action it was about to take, and thereby to navigate to a given location when fed a still frame of that location (all of this with online learning and no backprop).
Instead of VICReg, they induced their latent state with sparse autoencoding. They also predicted in pixel space, as opposed to latent space. The white paper describing their tech is a little bit of a mess, but schematically, at least, the hierarchical architecture they describe bears a strong resemblance to the hierarchical JEPA models LeCun outlined in his big position paper from a few years ago. A notable difference, though, is that their system is essentially a reflex agent, as opposed to possessing a planning/optimization loop.
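To make the mechanism concrete, here's a rough toy sketch of the general idea (my own illustration, not Ogma's code): a sparse autoencoder provides the latent state, and a small predictor maps that latent plus the chosen action to the next frame in pixel space. Unlike Ogma's system, this toy trains with ordinary backprop, and every module name, dimension, and hyperparameter below is a made-up assumption.

    # Illustrative sketch only; not Ogma's implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseAutoencoder(nn.Module):
        """Encodes a frame into a sparse latent code via top-k sparsity."""
        def __init__(self, frame_dim=64 * 64, latent_dim=256, k=16):
            super().__init__()
            self.encoder = nn.Linear(frame_dim, latent_dim)
            self.decoder = nn.Linear(latent_dim, frame_dim)
            self.k = k

        def encode(self, x):
            z = self.encoder(x)
            # Keep only the k most active units per sample; zero the rest.
            topk = torch.topk(z, self.k, dim=-1)
            mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
            return z * mask

        def forward(self, x):
            z = self.encode(x)
            return self.decoder(z), z

    class ActionConditionedPredictor(nn.Module):
        """Predicts the next frame (in pixel space) from latent state + action."""
        def __init__(self, latent_dim=256, action_dim=2, frame_dim=64 * 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + action_dim, 512),
                nn.ReLU(),
                nn.Linear(512, frame_dim),
            )

        def forward(self, z, action):
            return self.net(torch.cat([z, action], dim=-1))

    # One online-style update: observe frame_t, take action_t, see frame_t1.
    ae = SparseAutoencoder()
    predictor = ActionConditionedPredictor()
    opt = torch.optim.SGD(list(ae.parameters()) + list(predictor.parameters()), lr=1e-2)

    frame_t = torch.rand(1, 64 * 64)   # current camera frame (flattened)
    action_t = torch.rand(1, 2)        # e.g. steering + throttle
    frame_t1 = torch.rand(1, 64 * 64)  # frame actually observed after acting

    recon, z = ae(frame_t)
    pred_next = predictor(z, action_t)
    loss = F.mse_loss(recon, frame_t) + F.mse_loss(pred_next, frame_t1)
    opt.zero_grad()
    loss.backward()
    opt.step()

Presumably the goal-seeking part of their demo then amounts to something like choosing, at each step, the action whose predicted next frame most closely matches the still frame of the goal, but that's my guess from the video rather than anything stated in their white paper.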
Just wanted to say thanks very much for sharing this.
Over the last few months I've been inventing almost exactly this approach in my head as a hobby, without consciously knowing it had already been done. I love their little RC car demo.
The ideas at Ogma are inspired by Numenta's work.