Comment by the8472
8 days ago
Many tasks are amenable to simulation training and synthetic data: math proofs, virtual game environments, programming.
And we haven't run out of all data. High-quality text data may be exhausted, but we have many, many life-years' worth of video. Being able to predict visual imagery means building a physical world model. Combine this passive observation with active experimentation in simulated and real environments and you get millions of hours of navigating and steering a causal world. DeepMind has been hooking up its models to real robots for a long time, letting them actively explore and generate interesting training data. There's more to DL than LLMs.
This is true, a lot of progress can still happen based on simulation and synthetic data. But I am considering the long-term game. In the long term we can't substitute simulation for reality. We can't even predict whether a 3-body system will eventually eject an object, or whether a piece of code will halt for all possible inputs. Questions about physical systems that can implement Turing machines are undecidable, and that includes even fluid flows. The core problem is that recursive processes create a knowledge gap, and we can't cross that gap unless we walk the full recursion; there is no way to predict the outcome from outside. The real world is such an undecidable recursive process. AI can still make progress, but not at exponential speed decoupled from the real world, and not in isolation.
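To make the "walk the full recursion" point concrete, here is a minimal sketch using the Collatz map as an illustration (my example, not the commenter's): a trivially simple recursive process where, as far as anyone knows, the only way to learn how many steps a given number takes to reach 1 (or whether it ever does) is to actually run the iteration.

```python
def collatz_steps(n, limit=10_000_000):
    """Iterate n -> n/2 (if even) or 3n+1 (if odd) until reaching 1.

    Returns the step count, or None if `limit` iterations pass without
    halting -- the honest answer when the outcome can't be predicted
    from outside the process.
    """
    steps = 0
    while n != 1:
        if steps >= limit:
            return None  # undecided within our compute budget
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    # No known closed-form shortcut: we just have to walk the recursion.
    for n in (27, 97, 871):
        print(n, "->", collatz_steps(n), "steps")
```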