Comment by ActorNightly
19 days ago
They form control policy from vision data directly, which is why they need to have a massive model generate simulation vision data.
System architecture aside, how else would you test end to end behavior other than starting from sensor inputs?
My other comment in the thread explains it.
Basically, the driving policy needs to be an MCTS search over a space that represents physical objects.
If I were to build a self driving system here is how I would do it:
* Define a 3D representation of the physical space around the car and how it evolves: essentially a very compressed simulator that takes initial conditions as input and then predicts the evolution of the scene. The big difference here is that you would be hand-coding this sim (i.e. not training it), because you would be defining rules for things like collisions. You can also conveniently integrate your car's controls into this sim, with motion based on the tire behavior that results when you turn the steering wheel.
* Build probabilistic behavior models of other objects (i.e. cars/pedestrians) from real-world driving data. Given a time span of driving, these essentially represent the probability of what a human pedestrian or a human driver would do next.
* On the sensor side, you would train models to take lidar/camera data and produce the initial conditions for the sim. Things like big trucks would map to big trucks with a lot of mass and inertia, obstructions on the road would map to "walls" that you cannot hit, and traffic control objects would map to "soft" boundaries.
* On the driver side, you would train something like MuZero to essentially play the driving game within the sim, building the prediction model at training time and running MCTS at inference time to choose the optimal policy. Scoring would be based on things like obeying traffic control signals, not hitting things, minimizing traffic disturbance, following the GPS route, and so on.
And this is how you would get superhuman driving. Just as a neural net that learns to play a particular game finds really unique strategies, you would see similar things here. For example, it would be able to avoid situations where you would get rear-ended: it would predict the collision, see that the emergency lane is open, and create a control plan to move the car out of the way. And from a product perspective, you can imagine how advantageous this would be in terms of development and improvement.
And to answer your question, you wouldn't really even need end-to-end tests to find bugs; you would just need to make sure your sensor model is accurate, which can be done simply by driving the car around and letting it observe the world. It's much simpler to validate than comparable systems because you don't care what an object is. You just care whether it's part of the terrain, and if it isn't, you only care about the space it takes up and its trajectory.