Suddenly all this focus on world models by DeepMind starts to make sense. I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
Google/Alphabet are so vertically integrated for AI when you think about it. Compare what they're doing: their own power generation, their own silicon, their own data centers; Search, Gmail, YouTube, Gemini, Workspace, Wallet; billions and billions of Android and Chromebook users; their ads everywhere, their browser everywhere; Waymo; probably buying back Boston Dynamics soon enough (they've recently partnered together); fusion research, drug discovery... and then look at ChatGPT's chatbot or Grok's porn. It pales in comparison.
Google has been doing more R&D and internal deployment of AI and less trying to sell it as a product. IMHO that difference in focus makes a huge difference. I used to think their early work on self-driving cars was primarily to support Street View in their maps.
There was a point in time when basically every well known AI researcher worked at Google. They have been at the forefront of AI research and investing heavily for longer than anybody.
It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
But they are in full gear now that there is real competition, and it’ll be cool to see what they release over the next few years.
Tesla built something like this for FSD training; they presented it many years ago. I never understood why they didn't productize it. It would have made a brilliant Maps alternative, which could automatically update from Tesla cars on the road. It could live-update with speed cameras and road conditions. Like many things, they've fallen behind.
Without lidar, and given the terrible quality of Tesla's onboard cameras, their Street View would look terrible. The biggest L of Elon's career is the weird commitment to no lidar. If you've ever driven a Tesla, it gives daily messages like "the left side camera is blocked"; cameras and weather don't mix either.
Not really, I think: they built a simulation engine for autonomous driving, tons of which exist out there, including ones from Nvidia and at least one open-source one. Using world models is different.
Maybe they were focusing on a real world use that basically requires AI, but not LLMs.
Tesla claimed that all their "real world" recording would give them a moat on FSD.
Waymo is showing that a) you need to be able to incorporate stuff that isn't "real" when training, and b) you get a lot more information from alternate sensors to visible spectrum only.
I just listened to a fantastic multi-hour Acquired (https://www.acquired.fm/) podcast episode on Google and AI that talks about the history of Google and AI and all the ways they have been using it since 2012. It's really fascinating. You can forgive them for not focusing on Reader or any of their other properties when you realize they were pulling in hundreds of billions of dollars of value by making big bets in AI and incorporating it into their core business.
They started working on humanoid robots because Musk always has to have the next moonshot, trillion-dollar idea to promise "in 3 years" to keep the stock price high.
As soon as Waymo's massive robotaxi lead became undeniable, he pivoted from robotaxis to humanoid robots.
Pretty much. They banked on "if we can solve FSD, we can partially solve humanoid robot autonomy, because both are robots operating in poorly structured real world environments".
The drop in demand for Tesla's clapped out model range would have meant embarrassing factory closures, so now they're being closed to start manufacturing a completely different product. Bait and switch for Tesla investors.
I wonder how long they'll be closed for "modifications" and whether the Optimus Prime robot factories will go into production before the "Trump Kennedy Center" is reopened after its "renovations".
So is this a model baked into the VLLM layer? Or a scaffold that the agent sits in for testing?
If the former then it’s relevant to the broader discourse on LLM generality. If the latter, then it seems less relevant to chatbots and business agents.
>> Suddenly all this focus on world models by Deep mind starts to make sense.
The apparent applicability to Waymo is incidental; more likely, millions were spent on Genie and they have to do something with it. DeepMind started to train "world models" because that's the current overhyped buzzword in the industry. First it was "natural language understanding" and "question answering" back in the days of old BERT, then it was "agentic", then "reasoning", now it's "world models"; next year it's going to be "emotions" or "social intelligence" or some other anthropomorphic, overdrawn neologism. If you follow a few AI accounts on social media you really can't miss when those things suddenly start trending, then pretty much die out, and only a few stragglers still try to publish papers on them because they failed to get the memo that we're now all running behind the Next Big Thing™.
Notice that all the buzzwords you list actually correspond to real advances in the field. All of them were improvements on something existing: not a big revolution, for sure, but definitely measurable improvements.
Practically ALL course introductory materials that regard robotics and AI that I've seen began with "you might imagine a talking bipedal humanoid when you hear the word `robot`, but perhaps the most commonplace robot that you have seen is a vending machine", with the illustration of a typical 80s-90s outdoor soda vendor with no apparent moving parts.
So "maybe cars are a bit of robots too" is more like 30-50 years behind the time.
Erm, a dishwasher, a washing machine, or an automated vacuum can be considered a robot. I'm confused by this obsession with the term; there are many robots that already exist. Robots have been involved in the production of cars for decades.
I think the (gray) line is the degree of autonomy. My washing machine makes very small, predictable decisions, while a Waymo has to manage uncertainty most of the time.
I know it’s gross, but I would not discount this. Remember why Blu-ray won over HDDVD? I know it won for many other technical reasons, but I think there are a few historical examples of sexual content being a big competitive advantage.
The vertical integration argument should apply to Grok. They have Tesla driving data (probably much more data than Waymo), Twitter data, plus Tesla/SpaceX manufacturing data. When/if Optimus starts on the production line, they'll have that data too. You could argue they haven't figured out how to take advantage of it, but the potential is definitely there.
Agreed. Should they achieve Google level integration, we will all make sure they are featured in our commentary. Their true potential is surely just around the corner...
"Tesla has more data than Waymo" is some of the lamest cope ever. Tesla does not have more video than Google! That's crazy! People who repeat this are crazy! If there was a massive flow of video from Tesla cars to Tesla HQ that would have observable side effects.
But somehow Google fails to execute. Gemini is useless for programming, and I don't even bother to use it as a chat app. Claude Code + GPT 5.2 xhigh for coding, and GPT as a chat app, are really the only ones that are worth it (price- and time-wise).
I've recently switched to Claude for chat. GPT 5.2 feels very engagement-maxxed for me, like I'm reading a bad LinkedIn post. Claude does a tiny bit of this too, but an order of magnitude less in my experience. I never thought I'd switch from ChatGPT, but there is only so much "here's the brutal truth, it's not x it's y" I can take.
Gemini is by far the best UI/UX designer model. Codex seems to be the worst: it'll build something awkward and ugly, then Gemini will take 30-60 seconds to make it look like something that would have won a design award a couple years ago.
It is a bit mind boggling how behind they were considering they invented transformers and were also sitting on the best set of training data in the world, but they've caught up quite a bit. They still lag behind in coding, but I've found Gemini to be pretty good at more general knowledge tasks. Flash 3 in particular is much better than anything of comparable price and speed from OpenAI or Anthropic.
Yesterday GPT 5.2 wrote a python function for me that had the import in the middle of the code, for no reason. (It was a simple import of requests module in a REST client...)
Claude, I agree, is a lot better for backend; Gemini is very good for frontend.
> The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation—showing how the Waymo Driver would see that exact scene.
Subtle brag that Waymo could drive in camera-only mode if they chose to. They've stated as much previously, but that doesn't seem widely known.
I think I'm misunderstanding - they're converting video into their representation which was bootstrapped with LIDAR, video and other sensors. I feel you're alluding to Tesla, but Tesla could never have this outcome since they never had a LIDAR phase.
(edit - I'm referring to deployed Tesla vehicles, I don't know what their research fleet comprises, but other commenters explain that this fleet does collect LIDAR)
I think what we are seeing is that they both converged on the correct approach, one of them decided to talk about it, and it triggered disclosure all around since nobody wants to be seen as lagging.
Tesla does collect LIDAR data (people have seen them doing it, it's just not on all of the cars) and they do generate depth maps from sensor data, but from the examples I've seen it is much lower resolution than these Waymo examples.
Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.
(Always worth noting: human depth perception is not just based on stereoscopic vision but also on focal distance, which is why so many people get simulator sickness from stereoscopic 3D VR.)
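For intuition on why stereo stops helping at distance: under the standard pinhole stereo model, depth is Z = f·B/d, so a fixed disparity error translates into a depth error that grows with Z². A quick sketch using the roughly 65 mm human interpupillary baseline; the focal length and disparity error values here are illustrative assumptions, not measured figures:

```python
# Standard pinhole stereo: depth Z = f * B / d, where f is focal length
# (pixels), B the baseline (m), d the disparity (pixels).
# Differentiating: dZ = (Z**2 / (f * B)) * dd, so depth uncertainty
# grows quadratically with distance for a fixed disparity error.

def depth_error(z, focal_px=1000.0, baseline_m=0.065, disparity_err_px=1.0):
    """Approximate depth uncertainty (m) at true depth z (m)."""
    return (z ** 2) / (focal_px * baseline_m) * disparity_err_px

for z in (1, 3, 10, 30):
    print(f"{z:>3} m -> +/-{depth_error(z):.2f} m")
```

With these assumed parameters the uncertainty at 1 m is about 2 cm but balloons past 1.5 m of error at 10 m, which is why the brain leans on motion and context cues beyond a few meters.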
> Humans do this, just in the sense of depth perception with both eyes.
Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still interpret where they would be based on my familiarity with the roads and my implicit knowledge of how roads work. We do similar things for heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your four-ways; lidar might genuinely give an advantage there.
I think there are two steps here: converting video to sensor data input, and using that sensor data to drive. Only the second step will be handled by cars on road, first one is purely for training.
It’s way easier to “jam” a camera with bright light than a lidar, which uses both narrow band optical filters and pulsed signals with filters to detect that temporal sequence. If I were an adversary, going after cameras is way way easier.
Autonomous cars need to be significantly better than humans to be fully accepted especially when an accident does happen. Hence limiting yourself to only cameras is futile.
Surely as soon as they're safer than humans they should be deployed as fast as possible to save some of the 3000 people who are killed by human drivers every day
I've always wondered... if Lidar + Cameras is always making the right decision, you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.
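That teacher-student setup is usually called distillation. A toy numeric sketch of the shape of the idea, with entirely synthetic data; the linear "student", the feature dimensions, and the noise levels are all invented for illustration and have nothing to do with any real perception stack:

```python
import numpy as np

# Toy cross-modal distillation: a "teacher" with a privileged sensor
# (stand-in for lidar) labels scenes with depth, and a camera-only
# "student" is fit to reproduce those labels.

rng = np.random.default_rng(0)

# Fake "camera features" (e.g. per-object image cues) and true depths.
features = rng.uniform(0, 1, size=(200, 3))
true_depth = features @ np.array([5.0, 2.0, -1.0]) + 10.0

# Teacher = privileged sensor: near-perfect depth, small noise.
teacher_labels = true_depth + rng.normal(0, 0.05, size=200)

# Student = camera-only model fit to the teacher's labels (least squares).
X = np.hstack([features, np.ones((200, 1))])  # add bias column
weights, *_ = np.linalg.lstsq(X, teacher_labels, rcond=None)

student_pred = X @ weights
err = np.abs(student_pred - true_depth).mean()
print(f"mean student depth error: {err:.3f} m")
```

The catch in practice is exactly what the replies below raise: the student can only be as good as the situations where the teacher's signal and the camera's signal agree.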
That's exactly what Tesla is doing with their validation vehicles, the ones with Lidar towers on top. They establish the "ground truth" from Lidar and use that to train and/or test the vision model. Presumably more "test", since they've most often been seen in Robotaxi service expansion areas shortly before fleet deployment.
No, I don't think that will be successful. Consider a day where the temperature and humidity is just right to make tail pipe exhaust form dense fog clouds. That will be opaque or nearly so to a camera, transparent to a radar, and I would assume something in between to a lidar. Multi-modal sensor fusion is always going to be more reliable at classifying some kinds of challenging scene segments. It doesn't take long to imagine many other scenarios where fusing the returns of multiple sensors is going to greatly increase classification accuracy.
> By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.
How do you know the generated outputs are correct? Especially for unusual circumstances?
Say the scenario is a patch of road is densely covered with 5 mm ball bearings. I'm sure the model will happily spit out numbers, but are they reasonable? How do we know they are reasonable? Even if the prediction is ok, how do we fundamentally know that the prediction for 4 mm ball bearings won't be completely wrong?
There seems to be a lot of critical information missing.
The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.
For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.
We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them, creating a virtuous cycle.
I don't think you say "ok now the car is ball bearing proof."
Think of it more like unit tests. "In this synthetic scenario does the car stop as expected, does it continue as expected." You might hit some false negatives but there isn't a downside to that.
If it turns out your model has a blind spot for albino cows in a snow storm eating marshmallows, you might be able to catch that synthetically and spend some extra effort to prevent it.
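The "unit tests" framing above can be made almost literal. A toy sketch, where `plan` is a hypothetical stand-in for a real planner and the scene fields and thresholds are invented; synthetic scenarios from a world model would be fed in the same way, just with far richer state:

```python
# Scenario-based regression tests for a toy driving policy.

def plan(scene):
    """Return 'stop' or 'proceed' for a minimal scene description."""
    if scene.get("obstacle_in_lane") or scene.get("signal") == "red":
        return "stop"
    if scene.get("visibility_m", 1000) < 50:
        return "stop"  # degrade conservatively in low visibility
    return "proceed"

# Each synthetic scenario is an assertion, like a unit test.
assert plan({"signal": "red"}) == "stop"
assert plan({"obstacle_in_lane": True}) == "stop"
assert plan({"visibility_m": 20}) == "stop"  # snowstorm-ish
assert plan({"signal": "green", "visibility_m": 500}) == "proceed"
print("all scenario checks passed")
```

A false negative here just means a wasted simulated mile; a caught blind spot means a real-world failure that never happens.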
The blackout circumstance happened because they escalate blinking/out-of-service traffic lights to a human-confirmed decision, and a spike in those requests bottlenecked on how thinly that team was staffed. The Waymo itself was fine and was prepared to make the correct decision; it just needed a human in the loop.
In the video from the parade... there's just... people in the road. Like, a lot of small children and actual people on this tiny, super narrow bridge. I think that erring on the side of "don't think you can make it but accidentally drag a small child instead" is probably the right call, though admittedly, these cases are a bit wonky.
Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).
In other words it is a gradient from "my current prediction" to "best prediction given my imperfect knowledge" to "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between 1&2 or shrinking the gap between 2&3 (or both)
seems like the obvious answer to that is you cover a patch of road with 5mm ball bearings, and send a waymo to drive across it. if the ball bearings behave the way the simulation says they would, and the car behaves the way the simulation said it would, then you've validated your simulation.
do that for enough different scenarios, and if the model is consistently accurate across every scenario you validate, then you can start believing that it will also be accurate for the scenarios you haven't (and can't) validate.
>> How do you know the generated outputs are correct? Especially for unusual circumstances?
You know the outputs are correct because the models have many billions of parameters and were trained on many years of video on many hectares of server farms. Of course they'll generate correct outputs!
I mean that's literally the justification. There aren't even any benchmarks that you can beat with video generation, not even any bollocks ones like for LLMs.
As someone who lives in the Bay Area we already have trains, and they're literally past the point of bankruptcy because they (1) don't actually charge enough to maintain the variable cost of operations, (2) don't actually make people pay at all, and (3) don't actually enforce any quality of life concerns short of breaking up literal fights. All of this creates negative synergies that push a huge, mostly silent segment of the potential ridership away from these systems.
So many people advocate for public transit, but are unwilling to deal with the current market tradeoffs and decisions people are making on the ground. As long as that keeps happening, expect modes of transit -- like Waymo -- that deliver the level of service that they promise to keep exceeding expectations.
I've spent my entire adult life advocating for transportation alternatives, and at every turn in America, the vast majority of other transit advocates just expect people to be okay with anti-social behavior going completely unenforced, and expect "good citizens" to keep paying when the expected value for any rational person is to engage in freeloading. Then they point to "enforcing the fare box" as a tradeoff between money to collect vs cost of enforcement, when the actual tradeoff is the signalling to every anti-social actor in the system that they can do whatever they want without any consequences.
I currently only see a future in bike-share, because it's the only system that actually delivers on what it promises.
> they (1) don't actually charge enough to maintain the variable cost of operations
Why do you expect them to make money? Roads don't make money and no one thinks to complain about that. One of the purposes of government is to make investment in things that have more nebulous returns. Moving more people to public transit makes better cities, healthier and happier citizens, stronger communities, and lets us save money on road infrastructure.
You're definitely right on (2) and (3). I've used many transit systems across the world (including TransMilenio in Bogota and other latam countries "renowned" for crime) and I have never felt as unsafe as I have using transit in the SFBA. Even standing at bus stops draws a lot of attention from people suffering with serious addiction/mental health problems.
1) is a bit simplistic though. I don't know of any European system that would cover even operating costs out of fare/commercial revenue. Potentially the London Underground, but not London buses. UK National Rail has had more success there.
The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit. Perhaps 30-40% of inner london could commute by car (or walk/bike), so the economic benefit of that variable transit cost is in the hundreds of billions a year (compared to a small subsidy).
It's not the same in SFBA so I guess it's far easier to just "write off" transit like that, it is theoretically possible (though you'd probably get some quite extreme additional congestion on the freeways as even that small % moving to cars would have an outsized impact on additional congestion).
As a fellow public transit fan, you're on the money. Even the shining stars of transit in the US --- NYC MTA subway and CTA --- have huge quality of life issues. I can't fault someone for not wanting to ride trains ever again when someone who hasn't showered in 41 years pulls up with a cart full of whatever the fuck and decides to squat the corner seat closest to the car door and be a living biological weapon during rush hour. Or "showtime."
That's before you consider how it takes 2-4x as long to get somewhere by public transit outside of peak hours and/or well-covered areas. A 20 minute trip from a bar in Queens to Brooklyn by car takes an hour by train after 2300, not including walking time. I made that trip many, many times, and hated it each time.
It's worth noting that, at least for BART, the reason it is facing bankruptcy is precisely because it was mostly rider-supported and profitable, not government-supported.
When ridership plummeted by >50% during the pandemic, fixed costs stayed the same but income dropped. Last time I checked, if BART ridership returned to 2019 levels, with no other changes, it would be profitable again.
over the long term, this is solved with a wealth tax, but undoing what rich ppl have done to society (i.e. making lots of poor people) will unfortunately take many, many years; so many years that it will never actually happen
Trains need well behaved people, otherwise they are shit.
I don't want to hear tiktok or full volume soap operas blasting at some deaf mouth breather.
I don't want to be near loud chewing of smelly leftovers.
I don't want to be begged for money, or interact with high or psychotic people.
The current culture doesn't allow enforcement of social behaviour: so public transport will always be a miserable containment vessel for the least functional, and everyone with sense avoids the whole thing.
Or the majority of the residents of New York City on their daily commute? I like to think I have sense, and I happily use public transport most days. I prefer it to sitting in traffic, isolated in a car. At least I can read a book. If you work too hard to insulate yourself from the world, the spaces you'll feel comfortable in will get more and more narrow. I think that's a bad thing.
I quite agree with the overall point but can we leave this kind of discourse on X, please? It doesn't add much, it just feels caustic for effect and engagement farming.
No matter what, people are going to keep using cars because they are an absolute advantage over public transportation for certain use cases. It is better that the existing status quo is improved to reduce death rates than to hope for a much larger-scale change in infrastructure (when we have already seen that attempts at infrastructure overhaul in the US, like high-speed rail, are just an infinitely deep money pit).
Even though the train system in Japan is 10x better than in the US as a whole, the per-capita vehicle ownership rate in Japan is not much lower than in the US (670 per 1,000 in Japan vs 779 in the US). It would be a pipe dream for American trains/subways to be as good as Japan's, but even a change that significant would reduce vehicle ownership by only about 13%.
I don't think individual vehicles can ever achieve the same environmental economies of scale as trains. Certainly they're far more convenient (especially for short-haul journeys), but I also think they're somewhat alienating, in that they're engineering humans out of the loop completely, which contributes to social atomization.
Trains only require subsidies in a world where human & robot cars are subsidized.
As soon as a mode of transport actually has to compete in a market for scarce & valuable land to operate on, trains and other forms of transit (publicly or privately owned) win every time.
Source? The biggest source of environmental issues from EVs, tire wear from a heavier vehicle, absolutely applies to AVs. VC subsidizing low prices only to hike them later isn't exactly "without subsidy" - we pay for it either way
Me too, but given our extensive car-brain culture, Waymo is an amazing step toward getting drivers & cars off the road, and toward further cementing that future generations won't ever need to drive or own cars.
Pretty much this. Wild that you can traverse most of China in affordable high speed trains, yet the Amtrak from Seattle to Portland barely crawls along and has to regularly stop for long periods of time because the tracks get too hot in the Summer.
Enough with the trains. I’m all for trains, but they're good for in-city or 1-3 hour journeys. Taking a train across the US would take a day even with high-speed trains.
I’d much rather have my own vehicle than share my space with a bunch of people.
This is the real story buried under the simulation angle. If you can generate reliable 3D LiDAR from 2D video, every dashcam on earth becomes training data. Every YouTube driving video, every GoPro clip, every security camera feed. Waymo's fleet is ~700 cars. The internet has millions of hours of driving footage. This technique turns the entire internet into a sensor suite. That's a bigger deal than the simulation itself.
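For context, the geometric core of this "pseudo-lidar" idea is just back-projecting a per-pixel depth map (e.g. from a monocular depth network) through the camera intrinsics into a 3D point cloud. A minimal sketch; the depth map is synthetic and the intrinsics (fx, fy, cx, cy) are illustrative, not from any real camera:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Unproject a (h, w) depth map into an (h*w, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((4, 6), 10.0)  # toy scene: a flat wall 10 m away
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
print(pts.shape)  # one 3D point per pixel
```

The hard part, of course, is getting a depth map accurate enough that the resulting cloud is worth training on, which is what makes the results in the post notable.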
It's not unheard of, there are a handful [0] of metric monodepth methods that output data that's not unlike a really inaccurate 3D lidar, though theirs certainly looks SOTA.
It’s impressive to see simulation training for floods, tornadoes, and wildfires. But it’s also kind of baffling that a city full of Waymos all seemed to fail simultaneously in San Francisco when the power went out on Dec 22.
A power outage feels like a baseline scenario—orders of magnitude more common than the disasters in this demo. If the system can’t degrade gracefully when traffic lights go dark, what exactly is all that simulation buying us?
All this simulation buys a single vehicle that drives better. That failure was a fleet-wide event (overloading the remote assistance humans).
That is, both are true: this high-fidelity simulation is valuable and it won't catch all failure modes. Or in other words, it's still on Waymo for failing during the power outage, but it's not uniquely on Waymo's simulation team.
We started with physics-based simulators for training policies. Then put them in the real world using modular perception/prediction/planning systems. Once enough data was collected, we went back to making simulators. This time, they're physics "informed" deep learning models.
That's a very interesting way of looking at it. Yes, you start with simulating something simpler than the real world. Then you use the real world. Then you need to go back to simulations for real-world things that are too rare in the real world to train with.
Seems like there ought to be a name for this, like so-and-so's law.
Regardless of the corporate structure DeepMind is a lot more than just another Alphabet subsidiary at this point considering Demis Hassabis is leading all of Google AI.
Finally I understand the use case for Genie 3. All the talk about "you can make any videogame or movie" seems to have been pure distraction from real uses like this: limited, time-boxed simulated footage.
IIUC, there's a confusion of meaning around "world model": Waymo/DeepMind's, which is something that can create a consistent world (used to train Waymo's Driver), vs Yann LeCun/Advanced Machine Intelligence (AMI)'s, which is something that can understand a world.
The "world model" is a convenient fiction. Whether we’re talking about a carbon-based brain or a silicon-based transformer, there is no miniature, objective map of reality tucked away inside. What we mistake for a "model" is actually just the layered residue of experience.
From the perspective of enactivism and radical empiricism, intelligence doesn't "represent" the world; it simply navigates it. A biological organism doesn't need a 3D CAD file of a tree to survive; it only needs a history of sensory-motor contingencies—the "if I move this way, I see that" patterns. It’s a synthesis of interactions, not a library of blueprints.
AI operates on the same logic, albeit through a different medium. It isn't simulating the physical laws of the universe or "understanding" gravity. Instead, it navigates the high-dimensional geometry of human data. It’s a sophisticated engine of association, performing a high-speed synthesis of the patterns we've left behind.
In this view, "knowing" isn't about matching an internal image to an external truth. It is the seamless flow of past inputs into future predictions. There is no world model—only the habit of being.
I'd like to see Waymo have a few of their Drivers do some sim racing training and then compete in some live events. It wouldn't matter much to me if they were fast at all, I'd like to see them go into the rookie classes in various games and see how they avoid crashes from inexperienced players. I believe that it would be the ultimate "shitty drivers vs. AI" test.
Racing and street driving are completely different. Racing involves detailed knowledge of vehicle dynamics and grip. Street driving is mainly obstacle recognition and avoidance. No waymo ever operates anywhere close to the limit of grip, which is where you are all the time when racing.
Interesting, but it feels like it's going to cope very poorly with actually safety-critical situations. Having a world model that's trained on successful driving data feels like it's going to "launder" a lot of implicit assumptions that would cause a car to get into a crash in real life (e.g. there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot). These types of subtle biases are going to make AI-simulated world models a poor fit for training safety systems where failure cannot be represented in the training data, since they basically give models free rein to do anything that couldn't be represented in world model training.
You're forgetting that they are also training with real data from the 100+ million miles they've driven on real roads with riders, and using that data to train the world model AI.
> there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot
While there most likely is going to be some bias in the training of those kinds of models, we can also hope that transfer learning from other non-driving videos will at least help generate something close enough to the very real but unusual situations you are mentioning. We could imagine an LLM serving as some kind of fuzzer to create a large variety of prompts for the world model, which as we can see in the article seems pretty capable at generating fictive scenarios when asked to.
As always tho the devil lies in the details: is an LLM based generation pipeline good enough? What even is the definition of "good enough"? Even with good prompts will the world model output something sufficiently close to reality so that it can be used as a good virtual driving environment for further training / testing of autonomous cars? Or do the kind of limitations you mentioned still mean subtle but dangerous imprecisions will slip through and cause too poor data distribution to be a truly viable approach?
My personal feeling is that this we will land somewhere in between: I think approaches like this one will be very useful, but I also don't think the current state of AI models mean we can have something 100% reliable with this.
The question is: is 100% reliability a realistic goal? Human drivers are definitely not 100% reliable. If we come up with a solution 10x more reliable than the best human drivers, that maybe has some also some hard proof that it cannot have certain classes of catastrophic failure modes (probably with verified code based approaches that for instance guarantees that even if the NN output is invalid the car doesn't try to make moves out of a verifiably safe envelope) then I feel like the public and regulators would be much more inclined to authorize full autonomy.
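To make the LLM-as-fuzzer idea above concrete: even without a language model, you can enumerate scenario prompts combinatorially and feed each one to the world model as a conditioning string. A toy sketch; the fragment lists are invented, and a real pipeline would presumably have an LLM write and mutate these rather than use fixed lists:

```python
import itertools
import random

# "Fuzzing" a world model with scenario prompts built from fragments.
actors = ["a stalled truck", "a cyclist", "loose cargo", "an elephant"]
weather = ["clear", "dense fog", "heavy rain", "snow"]
roads = ["a highway merge", "a narrow bridge", "a four-way stop"]

random.seed(0)
prompts = [
    f"{actor} at {road} in {wx} conditions"
    for actor, wx, road in itertools.product(actors, weather, roads)
]
random.shuffle(prompts)
for p in prompts[:3]:
    print(p)  # each prompt would condition one simulated rollout
print(f"{len(prompts)} scenarios total")
```

Coverage-style metrics over which prompts produce failures would then be one way to quantify "good enough" for the generated environments.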
As a Londoner who used to have to ride up Abbey Road at least once per week: there are people on that crossing pretty much all day, every day, reproducing that picture. So now that Waymo is in beta in London[1], they have only to drive up there and they'll get plenty of footage they could use for that.
[1] I've seen a couple of them but they're not available to hire yet and are still very rare.
The term "world model" seems almost meaningless. This is a world model in the same sense as ChatGPT is a world model. Both have some ability to model aspects of the real world.
Interesting, but I am very sceptical. I'd be interested in seeing actual verified results of how it handles a road with heavy snow, where the only lane references are the wheel tracks of other vehicles, and you can't tell where the road ends and the snow-filled ditch begins.
Very concerned with this direction of training.
“counterfactual events such as whether the Waymo Driver could have safely driven more confidently instead of yielding in a particular situation.” Seems dicey. This could lead toward a less safe Waymo. Since the counterfactual will be generated, I suspect the generations will be biased towards survivor situations, because most video footage in the training data will be from environments where people reacted well, not those that ended in tragedy. Emboldening Waymo on generated best-case data. THIS IS DANGEROUS!!!
Not at all. It's not the counter-factual they're generating, it's the "too rare to capture often enough to train a response to" they're generating.
They're implying that without the model having knowledge, even approximate, of a scene to react to, it simply doesn't react at all; it simply "yields" to the situation until it passes. In my experience taking Waymos almost daily, this holds.
I would rather not have the Waymo yield to a tornado, rising flood-waters, or charging elephant...
Driving is always a balance between speed and safety. If you want ultimate safety you just sit in the driveway. But obviously that isn't useful. So functionally one of the most important things a self-driving system will decide is "how fast is it safe to drive right now". Slower is not always better and it has to balance safety with productivity.
Not entering a roundabout when it's clearly safe to do so is a mark against you at a driving exam. So would be always driving at 5mph. It's just not that simple.
On that specific count, not really. There's a skate park at the north end of the Mission, and Stevenson St is a two-way road that borders it, but it's narrow enough that you need to drive up on the curb to get two vehicles side by side on the street. Waymos can't handle that on a regular basis. Being San Francisco and not London, you can just skip that road, but if you find yourself in a Waymo on that street and are unlucky enough to have other traffic on it, the Waymo will just have to back up the entire street. Hope there's no one behind you as well as in front of you!
Anyway, we'll see how the London rollout goes, but I get the impression London's got a lot more of those kinds of roads.
Another comment mentioned the Philippines as the manifest frontier. SF is not on the same plane of reality as PH in terms of density or narrow streets; I would argue it has neither to the same degree.
This is an alley in Coimbra, Portugal. A couple years ago I stayed at a hotel in this very street and took a cab from the train station. The driver could have stopped in the praça below and told me to walk 15m up. Instead the guy went all the way up, then curved through 5-10 alleys like that to drop me off right in front of my place. At significant speed as well. It was one of the craziest car rides I've ever experienced.
I live in such an area. The route to my house involves steep topography via small windy streets that are very narrow and effectively one-way due to parked cars.
Human drivers routinely do worse than Waymo, which I take 2 or 3 times a week. Is it perfect? No. Does it handle the situation better than most Lyft or Uber drivers? Yes.
As a bonus: unlike some of those drivers the Waymo doesn't get palpably angry at me for driving the route.
It's clear that lidar is superior to Tesla's cameras-only approach, but I wonder whether taking all that data and better correlating what is seen visually with what the lidar shows could narrow the gap.
That said, for autonomous driving I'd like to see as many sensor options as possible: lidar, radar, cameras, sonar. Belt and suspenders. I imagine as pricing continues to drop all will be embraced.
Neat! What happens when the simulated data is hallucinated/incorrect?
In the example videos, the snowy Golden Gate Bridge clip shows the bridge as one road with a total of 3 lanes. But in reality it's a split highway with a divider, so each side has 3 lanes, 6 in total.
What happens when the car “learns” to drive on the simulated incorrect 3 lane example? For example will next time it goes on the real GG bridge hug to the rightmost lane?
Ideally it would learn a relationship between the sensor input and the correct actions, even if the sensor input is not realistic for the GG in reality.
No human needs to have seen an elephant standing in the road before to know that you should not drive through an elephant standing in the road. These are not "long tail" events as Waymo calls them. It's a big object in the road. You have seen that hundreds of thousands of times. Calling that a long-tail event is an admission that your model has zero ability to generalize.
It is great being able to generate a much larger universe of possibilities than what they can gather from real world data collection, but I'd be curious to learn how they check that the generated data is a superset of the possibility-space seen in the real world (e.g. confirm that their models closely match what is seen in the real world too)
1. Still hard not to think that this is a huge waste of time as opposed to something that's a little more like a public transport train-ish thing, i.e. integrate with established infrastructure.
2. No seriously, is the Filipino driver thing confirmed? It really feels like they're trying to bury that.
"The Filipino driver thing" is simply that there's a manual override ability when this profoundly complex and marvelously novel technology gets trapped in edge cases.
(2) I really don't understand why people are surprised that Waymo has fallbacks? The fact that they had a team ready to take over as necessary was well known. I've seen a bunch of comments about this and it seems like people are confused.
They are not trying to "bury" remote assistance at all. They wrote a white paper about it in 2020 and a blog post about it in 2024.
Anyway you can think it's a waste but they're wasting their money, not yours. If you want a train in your town, go get one. Waymo has only spent, cumulatively, about 4 months of the budgets of American transit agencies. If you had all that money it wouldn't amount to anything.
I am very pro public transit. But there is still a place for cars (ideally mostly taxis). Going to more rural areas or when you need to carry more stuff. I think an ideal society would have both urban transit, inter-city transit and taxis for the other trips and going out into the country.
My view on Waymo and autonomous taxis in general is they will eventually make public transit obsolete. Once there is a robotaxi available to pick up and drop off every passenger directly from a to b, the whole system could be made to be super efficient. It will take time to get there though.
But eventually I think we will get there. Human drivers will be banned, the roads will be exclusively used by autonomous vehicles that are very efficient drivers (we could totally remove stoplights, for example. Only pedestrian crossing signs would be needed. Robo-vehicles could plug into a city-wide network that optimizes the routing of every vehicle.) At that point, public transit becomes subsidized robotaxi rides. Why take a subway when a car can take you door to door with an optimized route?
So in terms of why it isn’t a waste of time, it’s a step along the path towards this vision. We can’t flip a switch and make this tech exist, it will happen in gradual steps.
Automated taxis would still be stuck in traffic. Automation gets you a couple times more capacity, but the induced demand and the extra cars looking for rides and parking will still mean traffic.
Automation makes public transit better. There will be automated minibuses that are more flexible and frequent than today's buses. Automation also means that buses get a virtual bus lane. Taxis solve the last mile problem, by taking taxi to the station, riding train with thousands of people, and then taking more transit.
Also, we might discover the advantage of human powered transit. Ebikes are more efficient than cars and give health benefits. They will be much safer than automated cars. Could use the extra capacity for bike and bus lanes.
> Human drivers will be banned, the roads will be exclusively used by autonomous vehicles
I basically agree with your premise that public transit as it exists today will be rendered obsolete, but I think this point here is where your prediction hits a wall. I would be stunned if we agreed to eliminate human drivers from the road in my lifetime, or the lifetime of anyone alive today. Waymo is amazing, but still just at the beginning of the long tail.
> Once there is a robotaxi available to pick up and drop off every passenger directly from a to b, the whole system could be made to be super efficient.
Fundamentally impossible. You're moving some 2 tons of mass in a 2x5m box on polluting rubber tires to move a single 100kg human.
I can always take whatever efficiency gain you've thought up and simply make the vehicle bigger, decreasing the cost and space used per passenger, and maybe even put it on rails, making it less polluting, and more energy efficient.
You can't engineer your way out of the laws of physics.
In high density regions, vehicles on surface roads can’t meet the passenger demand required. Even if you banned human drivers, the other human users introduce too much variability and delay (passengers loading and unloading, errant objects, cyclists and pedestrians, etc). Roll a dumpster in the street, and have a couple of jaywalkers, and the entire system crawls to a stop.
Controlled access is required to get even medium-high throughput. But these systems already exist, they are called personal rapid transit systems.
So the solution to autonomous driving will be taking all the data from Google Street View, building a world model for training, and in the end you have an AI model that basically remembers every street on planet Earth?
It seems inevitable that they'll soon be used as the starting points for developing almost all video game environments.
Not for the rendering (that's still way too expensive), but for the initial world generation that gets iteratively refined and then still ultimately gets converted into textured triangles.
Very impressive work from Waymo. The driving-with-a-tornado-on-the-horizon example kind of struck my imagination; many people actually panic in such scenarios. I wonder, though, about the compute requirements of running these simulations and producing so many data points.
I don't get how this solves the problem of edge cases with self driving
Even if you can generate simulated training data, don't you still have the problem where you don't even know what the edge cases you need to simulate are in the first place?
Well, it certainly helps, doesn't it? This system is going to encounter more edge cases than a single human ever would. Hopefully the lessons from known unknowns generalise to unknown unknowns. And once they've been seen once, they too can become part of the corpus.
Similar to Kiva Systems which was Amazon's best acquisition, Waymo is simply Google's best acquisition. (We live in San Francisco and it feels much safer around these Waymo cars than average "drivers".)
It's easier to build trust for such a safety-critical service when you're more open about how it works and performs. For the complete opposite approach, see Tesla.
Given the announcement from a few days ago of google trying to get external investment, this is their follow up, showing what that investment is good for. Also, it’s pretty light on details that are of much use to competitors. “We made an accurate simulation system to test our system in before deployment” would be pretty mundane if you were talking about any other field of engineering.
Under the same circumstances (kid suddenly emerging between two parked cars and running out onto the street), it could be debated that the outcome could have been worse if a human were driving.
I don't know about the remote driver conspiracy, but waymo slowing down and that kid surviving a crash after jumping on the road from behind a tall vehicle was the best PR waymo could have asked for.
I would love to see more visibility into how this model’s simulation fidelity maps onto measurable safety improvements on public roads, especially in unusual edge conditions like partial sensor occlusion or atypical weather.
One interesting thing from this paper is how big a lidar shadow there is around the Waymo car, which suggests they rely on cameras for anything close (maybe they have radar too?). Seems lidar is only useful for distant objects.
How large is the controlled area around the car? And how high up do they look for objects? Like something falling from a bridge, a falling pole, or, more extreme, a falling plane.
I'm a little sad that they talk about counterfactuals in the simulations, but then don't show any examples of even a single sharknado or giant loop-de-loop.
Seems interesting, but why is it broken? Waymo repeatedly directed multiple automated vehicles into the private alley off of 5th near Brannan in SF, even after being told none of them have any business there, ever, period. If they can sense the weather and such, then maybe they could put out a virtual sign or fence noting that what appears to be a road is neither a through way nor open to the public. I'm really bullish on automated driving long term, but now that the vehicles are present for real, we need to start thinking seriously about finding some way to get them to comply with the same laws that limit what people can do.
I doubt Waymo would publicly talk about this if it did happen.
I also doubt the IP is worth that much. Most of the secret sauce to starting a competitor probably isn't an end model tuned for a specific configuration of a car but the ability to produce end models, which wouldn't be stealable from the car.
The amount to which supposedly engineering based rationales are applied to Elon's decisions is cult like behavior. Here's what's really going on:
That thing I called full self driving just to troll government regulators? Just make it work and don't complain about what I called it.
That rocket that's way too big for orbital payloads but can't go beyond orbit without sending 20 more rockets full of fuel? Just make it work. Occupy Mars!
The two car models that work and have a halo effect on the rest of the product line? I'mma cancel that! Just make the stupid truck we're keeping work!
We've all had a boss like that. I'm sure they salute each other.
been playing around with world models for sim-to-real transfer lately. the waymo approach looks solid, but curious how you're handling the distribution shift between generated scenes and real sensor data. any tricks for that besides the usual domain randomization?
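For readers unfamiliar with the "usual domain randomization" mentioned here, a minimal sketch: jitter the simulator's rendering/physics parameters before every episode so the policy can't overfit to any one synthetic appearance. The parameter names and ranges below are illustrative, not from Waymo's or anyone's real system.

```python
import random

# Illustrative randomization spec: each key is a simulator parameter,
# each value a (low, high) sampling range. Real systems randomize far
# more (textures, sensor noise models, actor behaviors, etc.).
RANDOMIZATION_SPEC = {
    "sun_elevation_deg":  (5.0, 85.0),
    "fog_density":        (0.0, 0.4),
    "camera_exposure_ev": (-1.0, 1.0),
    "tire_friction":      (0.6, 1.0),
    "lidar_dropout_rate": (0.0, 0.05),
}

def sample_domain(spec=RANDOMIZATION_SPEC):
    """Draw one randomized simulator configuration."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in spec.items()}

def training_stream(episodes):
    """Yield a fresh configuration before each simulated rollout."""
    for _ in range(episodes):
        yield sample_domain()

for cfg in training_stream(3):
    print(sorted(cfg))
```

Generated scenes add a wrinkle the question points at: the "sim" distribution is now itself learned from real data, so randomization alone doesn't guarantee coverage of what the generator systematically gets wrong.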
This is cool, but they are still not going about it the right way.
It's much easier to build everything into a compressed latent space of physical objects and how they move, and operate from there.
Everyone jumped on the end-to-end bandwagon, which locks you into the input to your driving model being vision, which means you have to have things like Genie to generate vision data, which is wasteful.
I posted this before, but I'll post it again - this is one of the few things I feel confident enough to say most people in the space are doing wrong. You can save my post and reference it when we actually get full self driving (i.e. you can take a nap in the back seat while your car drives you), because it's going to be implemented pretty much like this:
Humans don't drive well because we map vision policy to actions. We drive well (and in general, manipulate physical objects well) because we can run simulations inside our heads to predict what the outcome will be. We aren't burdened by our inability to recognize certain things - when something is in the road, no matter what it is, we automatically predict that we would likely collide with it, because we understand the concept of 3d space and moving within it, and we take appropriate action. Sure, there is some level of direct mapping, as many people can drive while "spaced out", but attentive driving mostly involves the above.
The self driving system that can actually self drive needs to do the same. When you have this, you will no longer need to do things like simulate driving conditions in a computationally expensive sim. You aren't going to be concerned with training the model on edge cases. All you would need to do is ensure that your sensor processing results in a 3d representation of the driving conditions, and the model will then be able to do what humans do: explore a latent space of things it can do, predict outcomes, and choose the best one.
You want proof? It exists in the form of MuZero, and it worked amazingly well. And driving can easily be reformulated as a game that the engine plays in a simulator that doesn't involve vision, learning both the available moves and the optimal policy.
The reason everyone is doing end-to-end today is that they are basically trying to catch up to Tesla, and from a business perspective nobody is willing to put up money and pay smart enough people to research this, especially because there is also a legal bridge to cross when it comes to proving that the system can self drive while you're napping. But nevertheless, if you ever want self driving, this is the right approach.
Meanwhile Google, who came up with MuZero, is now doing more advanced robotics work than anyone out there.
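The "simulate in your head" idea can be sketched as planning over imagined rollouts. In MuZero the dynamics model and value function are learned networks; here both are hand-coded toys (a 1-D road, a static obstacle, ego choosing a speed each tick) just to show the shape of the loop.

```python
from itertools import product

def step(state, action):
    """1-D world: ego advances by its chosen speed; the obstacle is static."""
    ego, obstacle = state
    return (ego + action, obstacle)

def collided(state):
    ego, obstacle = state
    return abs(ego - obstacle) < 1       # predicted collision

def rollout_value(state, seq):
    """Value of an action sequence: final progress, -inf on any collision."""
    s = state
    for a in seq:
        s = step(s, a)
        if collided(s):
            return float("-inf")
    return s[0]                           # reward distance covered

def plan(state, horizon=3, speeds=(0, 1, 2)):
    """Exhaustively simulate imagined futures; return the best first move."""
    best_move, best_val = None, float("-inf")
    for seq in product(speeds, repeat=horizon):
        val = rollout_value(state, seq)
        if val > best_val:
            best_val, best_move = val, seq[0]
    return best_move

# Obstacle at position 4: driving flat out (2,2,2) would pass through it
# (positions 2,4,6), so the planner slows first: (1,2,2) -> positions 1,3,5.
print(plan((0, 4)))  # → 1
```

Note the planner never needed a training example of this obstacle; it just needed a dynamics model, which is exactly the commenter's argument. The hard part glossed over here is learning that dynamics model reliably from sensors.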
Whenever something like this comes out, it's a good moment to find people with no critical thinking skills who can safely be ignored. Driving a Waymo like an RC car from the Philippines? You can barely talk over Zoom with someone in the Philippines without bitrate and lag issues.
I haven't read anything about this but I would also suppose long distance human intervention cannot be done for truly critical situations where you need a very quick reaction, whereas it would be more appropriate in situations where the car has stopped and is stuck not knowing what to do. Probably just stating the obvious here but indeed this seems like something very different from an RC car kind of situation.
The word "loop" here has multiple meanings. Only one is what you mean and the other person responding to you has understood another.
The first is the DDT control loop, what a human driver does. Waymo's remote assistants aren't involved in that. The computer always has responsibility for the safety of the vehicle and decisionmaking while operating, which is why Waymo's humans are remote assistants and not remote drivers. Their safety drivers do participate in the DDT loop, hence the name.
But there's also another "loop" of human involvement. Sometimes the vehicle doesn't understand the scene and asks humans for advice about the appropriate action to take. It's vaguely similar to captchas. The human will usually confirm the computer's proposed actions, but they can also suggest different actions. The computer uses the advice as a prior and continues operating instead of giving up the DDT responsibility. There's very likely a closely monitored SLA of between a few seconds and a few minutes on how long it takes humans to start looking at the scene.
If something causes the computer to believe the advice isn't safe, it will ignore it. There have been cases where Waymos have erroneously detected collisions and remote assistants were unable to override that decisionmaking. When that happens, a vehicle recovery team is physically sent out to the location. The SLA here is likely between tens of minutes and a couple hours.
Interesting question. If the Waymo was driving aggressively to remove us from the situation but relatively safely I might stay in it.
This does bring up something, though: Waymo has a "pull over" feature, but it's hidden behind a couple of touch screen actions involving small virtual buttons and it does not pull over immediately. Instead, it "finds a spot to pull over". I would very much like a big red STOP IMMEDIATELY button in these vehicles.
>it's hidden behind a couple of touch screen actions involving small virtual buttons and it does not pull over immediately
It was on the home screen when I've taken it, and when I tested it, it seemed to pull over at the first safe place. I don't trust the general public with a stop button.
Can you not just unlock and open the door? Wouldn't that cause it to immediately stop? Or can you not unlock the door manually? I'd be surprised if there was not an emergency door release.
The model generates camera and Lidar data. As if it was a Waymo car that drove through the simulated scenario with its cameras running. This synthetic training data can then be used to train the driving models.
Wonder how it'll do. The trees change shape (presumably the Lidar patterns do too). I get the premise/why but it seems odd to me (armchair) to use fake data. Real trees don't change shape (in real time) although it can be windy.
It probably doesn't matter though, "this general blob over there"
What if we put this mechanism of recording the world on people? We'd have mics capturing what people say to us and the noises we hear.
We'd also record body position, actuation, and self-speech. Then we put this on thousands of people to get as much data as Waymo gets.
I mean, that's roughly what we'd need to imitate AGI, right? I guess the only thing missing is the memory mechanism. We train everything as if it's an input-output function, without accounting for memory.
What's going to happen to all the millions of drivers who will lose their job overnight? In a country with 100 million guns, are we really sure we've thought this through?
Autonomous private cars is not the technological progress you think it is. We’ve had autonomous trains for decades, and while it provides us with a more efficient and cost effective public transit system, it didn’t open the doors for the next revolutionary technology.
Self driving cars is a dead end technology, that will introduce a whole host of new problems which are already solved with public transit, better urban planning, etc.
I don't think Uber goes out of business. There is probably a sweet spot for Waymo's steady state cars, and you STILL might want 'surge' capabilities for part time workers who can repurpose their cars to make a little extra money here and there.
> What's going to happen to all the millions of drivers who will lose their job overnight? In a country with 100 million guns, are we really sure we've thought this through?
People keep referencing history but this really is unprecedented. We are approaching singularity and many people will become obsolete in all areas. There are no new hypothetical jobs waiting on the horizon.
As to the revolt, America doesn't do that any more. Years of education have removed both the vim and vigor of our souls. People will complain. They will do a TikTok dance as protest. Some will go into the streets. No meaningful uprising will occur.
The poor and the affected will be told to go to the trades. That's the new learn to program. Our tech overlords will have their media tell us that everything is ok (packaging it appropriately for the specific side of the aisle).
Ultimately the US will go down hill to become a Belgium. Not terrible, but not a world dominating, hand cutting entity it once was.
> Ultimately the US will go down hill to become a Belgium.
I'm curious why you say this given you start by highlighting several characteristics that are not like Belgium (to wit, poor education, political media capture, effective oligarchy). I feel there are several other nations that may be better comparators, just want to understand your selection.
Can you explain? I lived in PH, and my guess is that you mean navigating and modeling the unending and constantly changing chaos of the street systems (and lack thereof) is going to be a monumental task which I completely agree with. It would be an impressive feat if possible.
Software doesn’t get confused - it fails. Referring to your software as autonomous when you have to staff a 24/7 response center of humans to control it is not just misleading, it’s a lie.
This is not false, but gives the wrong idea that foreigners are driving them in real time.
> After being pressed for a breakdown on where these overseas operators operate, Peña said he didn’t have those stats, explaining that some operators live in the US, but others live much further away, including in the Philippines.
> “They provide guidance,” he argued. “They do not remotely drive the vehicles. Waymo asks for guidance in certain situations and gets an input, but the Waymo vehicle is always in charge of the dynamic driving tasks, so that is just one additional input.”
“When the Waymo vehicle encounters a particular situation on the road, the autonomous driver can reach out to a human fleet response agent for additional information to contextualize its environment,” the post reads. “The Waymo Driver [software] does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.” [from Waymo's own blog https://waymo.com/blog/2024/05/fleet-response/]
In my opinion there's nothing wrong with it per se, but (a) it's still worth mentioning, because most people have the impression that Waymo cars are completely unassisted, and (b) it makes me wonder how feasible Waymo's operations would be if it weren't for global income inequality.
Have you read the article? The guys in the Philippines provide high-level executive indications; they don't remotely drive the car or have any low-level control of it.
My understanding is that support is basically playing an RTS (point and click), not a 1P driving game. Which makes sense, if they were directly controlling the vehicles they'd put support in central America for better latency, like the food delivery bot drivers
This isn't news, they've always acknowledged that they have remote navigators that tell the cars what to do when they get stuck or confused. It's just that they don't directly drive the car.
Yeah, I have some videos of these drivers in action. I think the sensors are an assistance but not the whole story; so yes, there are models, lidars, etc., but the human factor is there. Unfortunately this means we'll soon see many cobots teleoperated remotely from India, the Philippines, and the like, so these companies can satisfy their greed by paying peanuts to operate them.
It has always felt to me that the LLM chatbots were a surprise to Google, not LLMs, or machine learning in general.
Google and OpenAI are both taking very big gambles with AI, with an eye towards 2036 not 2026. As are many others, but them in particular.
It'll be interesting to see which pays off and which becomes Quibi
Using your own sh*t is one of the best ways to build excellent products.
No Lidar anymore on the 2026 Volvo models ES60 and EX60. See for example: https://www.jalopnik.com/2032555/volvo-ends-luminar-lidar-20...
Not really, I think: they built a simulation engine for autonomous driving, of which tons exist out there, including ones from Nvidia and at least one open-source option. Using world models is different.
> Suddenly all this focus on world models by Deep mind starts to make sense
Google's been thinking about world models since at least 2018: https://arxiv.org/abs/1803.10122
FWIW I understood GP to mean that it suddenly makes sense to them, not that there’s been a sudden focus shift at google.
Maybe they were focusing on a real world use that basically requires AI, but not LLMs.
Tesla claimed that all their "real world" recording would give them a moat on FSD.
Waymo is showing that a) you need to be able to incorporate stuff that isn't "real" when training, and b) you get a lot more information from alternate sensors to visible spectrum only.
I just listened to a fantastic multi-hour Acquired (https://www.acquired.fm/) podcast episode on Google and AI that talks about the history of Google and AI and all the ways they have been using it since 2012. It's really fascinating. You can forgive them for not focusing on Reader or any of their other properties when you realize they were pulling in hundreds of billions of dollars of value by making big bets in AI and incorporating it into their core business.
Grok/xAI is a joke at this point. A true money pit without any hopes for a serious revenue stream.
They should be bought by a rocket company. Then they would stand a chance.
I always understood this to be why Tesla started working on humanoid robots
They started working on humanoid robots because Musk always has to have the next moonshot, trillion-dollar idea to promise "in 3 years" to keep the stock price high.
As soon as Waymo's massive robotaxi lead became undeniable, he pivoted from robotaxis to humanoid robots.
Pretty much. They banked on "if we can solve FSD, we can partially solve humanoid robot autonomy, because both are robots operating in poorly structured real world environments".
I don't want a humanoid robot. I want a purpose built robot.
6 replies →
The drop in demand for Tesla's clapped out model range would have meant embarrassing factory closures, so now they're being closed to start manufacturing a completely different product. Bait and switch for Tesla investors.
I wonder how long they'll be closed for "modifications" and whether the Optimus Prime robot factories will go into production before the "Trump Kennedy Center" is reopened after its "renovations".
It's so they can stick a Tesla logo on a bunch of Chinese tech and call it innovation.
So is this a model baked into the VLLM layer? Or a scaffold that the agent sits in for testing?
If the former then it’s relevant to the broader discourse on LLM generality. If the latter, then it seems less relevant to chatbots and business agents.
Edit to add: this is not part of the model, it’s in a separate pillar (Simulator vs Driver). More at https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto....
>> Suddenly all this focus on world models by Deep mind starts to make sense.
The apparent applicability to Waymo is incidental, more likely because a few million dollars or more were spent on Genie and they have to do something with it. DeepMind started to train "world models" because that's the current overhyped buzzword in the industry. First it was "natural language understanding" and "question answering" back in the days of old BERT, then it was "agentic", then "reasoning", now it's "world models"; next year it's going to be "emotions" or "social intelligence" or some other anthropomorphic, overdrawn neologism. If you follow a few AI accounts on social media you really can't miss when those things suddenly start trending, then pretty much die out, with only a few stragglers still trying to publish papers on them because they failed to get the memo that we're now all running behind the Next Big Thing™.
Notice that all the buzzwords you list actually correspond to real advances in the field. All of them were improvements on something existing; not a big revolution, for sure, but definitely measurable improvements.
3 replies →
Also known as a monopoly. This should terrify us all.
No, it's known as vertical integration, which is legally permitted by default.
Monopolies are essentially 100% horizontal integration. Vertical integration is a completely different concept.
So for the record, with this realization you're 3+ years behind Tesla.
https://www.youtube.com/watch?v=ODSJsviD_SU&t=3594s
Practically all introductory course materials on robotics and AI that I've seen begin with "you might imagine a talking bipedal humanoid when you hear the word `robot`, but perhaps the most commonplace robot you've seen is a vending machine", with an illustration of a typical 80s-90s outdoor soda vendor with no apparent moving parts.
So "maybe cars are a bit robots too" is more like 30-50 years behind the times.
Aren't they still using safety drivers or safety follow cars and in fewer cities? Seems Tesla is pretty far behind.
10 replies →
What an upsetting comment. I'm glad you came around but what did you think was going to be effective before you came around to world models?
Which is why it's embarrassing how much worse Gemini is at searching the web for grounding information, and how incredibly bad gemini cli is.
Not my experience in either of those areas.
Internal firewalls and poor management means that the vast majority of integration opportunities are missed.
The flywheel is starting to spin......
> I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
I view Tesla also more as a robot company than anything else.
[dead]
"Waymo as a robot in the same way"
Erm, a dishwasher, a washing machine, or an automated vacuum can be considered a robot. I'm confused by this obsession with the term; there are many robots that already exist, and robots have been involved in the production of cars for decades.
......
I think the (gray) line is the degree of autonomy. My washing machine makes very small, predictable decisions, while a Waymo has to manage uncertainty most of the time.
5 replies →
It's a 3500lb robot that can kill you.
Boston Dynamics is working on a smaller robot that can kill you.
Anduril is working on even smaller robots that can kill you.
The future sucks.
and they're all controlled by (poorly compensated) humans anyway [1] [2]
[1] https://www.wsj.com/tech/personal-tech/i-tried-the-robot-tha...
[2] https://futurism.com/advanced-transport/waymos-controlled-wo...
2 replies →
>or grok's porn
I know it’s gross, but I would not discount this. Remember why Blu-ray won over HDDVD? I know it won for many other technical reasons, but I think there are a few historical examples of sexual content being a big competitive advantage.
The vertical integration argument should apply to Grok. They have Tesla driving data (probably much more data than Waymo), Twitter data, plus Tesla/SpaceX manufacturing data. When/if Optimus starts on the production line, they'll have that data too. You could argue they haven't figured out how to take advantage of it, but the potential is definitely there.
Agreed. Should they achieve Google level integration, we will all make sure they are featured in our commentary. Their true potential is surely just around the corner...
"Tesla has more data than Waymo" is some of the lamest cope ever. Tesla does not have more video than Google! That's crazy! People who repeat this are crazy! If there was a massive flow of video from Tesla cars to Tesla HQ that would have observable side effects.
1 reply →
But somehow Google fails to execute. Gemini is useless for programming, and I don't even bother to use it as a chat app. Claude Code + GPT 5.2 xhigh for coding, and GPT as a chat app, are really the only ones that are worth it (price- and time-wise).
I've recently switched to Claude for chat. GPT 5.2 feels very engagement-maxxed for me, like I'm reading a bad LinkedIn post. Claude does a tiny bit of this too, but an order of magnitude less in my experience. I never thought I'd switch from ChatGPT, but there is only so much "here's the brutal truth, it's not x it's y" I can take.
4 replies →
Gemini is by far the best UI/UX designer model. Codex seems to be the worst: it'll build something awkward and ugly, then Gemini will take 30-60 seconds to make it look like something that would have won a design award a couple years ago.
Gemini works well enough in Search and in Meet. And it's baked into the products so it's dead simple to use.
I don't think Google is targeting developers with their AI, they are targeting their product's users.
It is a bit mind boggling how behind they were considering they invented transformers and were also sitting on the best set of training data in the world, but they've caught up quite a bit. They still lag behind in coding, but I've found Gemini to be pretty good at more general knowledge tasks. Flash 3 in particular is much better than anything of comparable price and speed from OpenAI or Anthropic.
Yesterday GPT 5.2 wrote a Python function for me that had the import in the middle of the code, for no reason. (It was a simple import of the requests module in a REST client...) I agree Claude is a lot better for backend; Gemini is very good for frontend.
> The Waymo World Model can convert those kinds of videos, or any taken with a regular camera, into a multimodal simulation—showing how the Waymo Driver would see that exact scene.
Subtle brag that Waymo could drive in camera-only mode if they chose to. They've stated as much previously, but that doesn't seem widely known.
I think I'm misunderstanding - they're converting video into their representation which was bootstrapped with LIDAR, video and other sensors. I feel you're alluding to Tesla, but Tesla could never have this outcome since they never had a LIDAR phase.
(edit - I'm referring to deployed Tesla vehicles, I don't know what their research fleet comprises, but other commenters explain that this fleet does collect LIDAR)
They can and they do.
https://youtu.be/LFh9GAzHg1c?t=872
They've also built it into a full neural simulator.
https://youtu.be/LFh9GAzHg1c?t=1063
I think what we are seeing is that they both converged on the correct approach, one of them decided to talk about it, and it triggered disclosure all around since nobody wants to be seen as lagging.
4 replies →
Tesla does collect LIDAR data (people have seen them doing it, it's just not on all of the cars) and they do generate depth maps from sensor data, but from the examples I've seen it is much lower resolution than these Waymo examples.
2 replies →
The purpose of lidar is to provide error correction when you need it most, i.e. when camera accuracy degrades.
Humans do this, just in the sense of depth perception with both eyes.
Human depth perception uses stereo out to only about 2 or 3 meters, after which the distance between your eyes is not a useful baseline. Beyond 3m we use context clues and depth from motion when available.
11 replies →
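The geometric point in the comment above (stereo degrades quickly with distance) can be sketched with a small-angle stereo model: disparity ≈ baseline / depth, so depth uncertainty grows with the square of distance. The baseline and disparity-noise numbers below are illustrative assumptions, not measured human values:

```python
# Toy sketch: stereo depth uncertainty grows quadratically with distance.
# Small-angle model: disparity (radians) ~= baseline / depth,
# so depth error ~= depth**2 * disparity_noise / baseline.

def depth_error(depth_m, baseline_m, disparity_noise_rad):
    """Approximate depth uncertainty for a stereo rig (small-angle model)."""
    return depth_m ** 2 * disparity_noise_rad / baseline_m

EYE_BASELINE = 0.065    # ~human interpupillary distance in meters (assumption)
NOISE = 1e-4            # assumed disparity noise in radians (illustrative)

for z in (1, 3, 10, 30):
    print(f"{z} m -> ~{depth_error(z, EYE_BASELINE, NOISE):.3f} m error")
```

Whatever the exact cutoff for useful human stereo, the quadratic growth means a 10x increase in distance costs 100x in depth precision, which is why context cues and motion parallax have to take over at range.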
(Always worth noting, human depth perception is not just based on stereoscopic vision, but also with focal distance, which is why so many people get simulator sickness from stereoscopic 3d VR)
7 replies →
> Humans do this, just in the sense of depth perception with both eyes.
Humans do this with vibes and instincts, not just depth perception. When I can't see the lines on the road because there's too much snow, I can still interpret where they would be based on my familiarity with the roads and my implicit knowledge of how roads work. We do similar things for heavy rain or fog, although sometimes those situations truly necessitate pulling over, or slowing down and turning on your four-ways; lidar might genuinely give an advantage there.
3 replies →
Another way humans perceive depth is by moving our heads and perceiving parallax.
How expensive is their lidar system?
11 replies →
I think there are two steps here: converting video to sensor-data input, and using that sensor data to drive. Only the second step will run on cars on the road; the first is purely for training.
That is still important for safety reasons in case someone uses a LiDAR jamming system to try to force you into an accident.
It’s way easier to “jam” a camera with bright light than a lidar, which uses both narrow-band optical filters and pulsed signals with filters that detect the temporal sequence. If I were an adversary, going after cameras would be way, way easier.
1 reply →
If somebody wants to hurt you while you are traveling in a car, there are simpler ways.
Autonomous cars need to be significantly better than humans to be fully accepted especially when an accident does happen. Hence limiting yourself to only cameras is futile.
Surely as soon as they're safer than humans they should be deployed as fast as possible to save some of the 3000 people who are killed by human drivers every day
1 reply →
They may be trying to suggest that, but that claim does not follow from the quoted statement.
I've always wondered... if Lidar + Cameras is always making the right decision, you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.
That's exactly what Tesla is doing with their validation vehicles, the ones with Lidar towers on top. They establish the "ground truth" from Lidar and use that to train and/or test the vision model. Presumably more "test", since they've most often been seen in Robotaxi service expansion areas shortly before fleet deployment.
4 replies →
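The teacher-student idea in the two comments above (lidar-derived "ground truth" supervising a camera-only estimator) can be sketched in miniature. The pinhole model, the data, and the function names below are invented for illustration, not anything from Tesla's or Waymo's actual pipeline:

```python
# Toy sketch of lidar->camera distillation: fit a camera-only depth
# estimator (depth ~= k / apparent_height, a pinhole-camera assumption)
# against "teacher" depths that a lidar rig would have measured.

def fit_scale(apparent_heights_px, teacher_depths_m):
    """Closed-form least-squares fit of k in depth ~= k * (1 / height)."""
    xs = [1.0 / h for h in apparent_heights_px]
    num = sum(x * z for x, z in zip(xs, teacher_depths_m))
    den = sum(x * x for x in xs)
    return num / den

# Synthetic "teacher" labels: a 1.5 m-tall object at known depths,
# projected through an assumed 800 px focal length.
depths = [5.0, 10.0, 20.0, 40.0]
heights = [800 * 1.5 / z for z in depths]   # apparent height in pixels

k = fit_scale(heights, depths)              # recovers f * object_height = 1200
print(f"fitted scale: {k:.1f}")
```

The catch, as the replies note, is that this only transfers what the camera can in principle see; a scene where the lidar return carries information absent from the pixels (fog, glare, featureless surfaces) has no usable training signal for the student.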
> you should theoretically be able to take the output of the Lidar + Cameras model and use it as training data for a Camera only model.
Why should you be able to do that, exactly? Human vision is frequently tricked by its lack of depth data.
1 reply →
No, I don't think that will be successful. Consider a day where the temperature and humidity is just right to make tail pipe exhaust form dense fog clouds. That will be opaque or nearly so to a camera, transparent to a radar, and I would assume something in between to a lidar. Multi-modal sensor fusion is always going to be more reliable at classifying some kinds of challenging scene segments. It doesn't take long to imagine many other scenarios where fusing the returns of multiple sensors is going to greatly increase classification accuracy.
3 replies →
Sure, but those models would never have online access to information only provided in lidar data…
1 reply →
By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.
How do you know the generated outputs are correct? Especially for unusual circumstances?
Say the scenario is a patch of road is densely covered with 5 mm ball bearings. I'm sure the model will happily spit out numbers, but are they reasonable? How do we know they are reasonable? Even if the prediction is ok, how do we fundamentally know that the prediction for 4 mm ball bearings won't be completely wrong?
There seems to be a lot of critical information missing.
The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.
For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.
We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them; creating a virtuous cycle.
It's the Pareto principle.
You can get 80% of the way to "perfect" with 20% of the effort.
2 replies →
I don't think you say "ok now the car is ball bearing proof."
Think of it more like unit tests. "In this synthetic scenario does the car stop as expected, does it continue as expected." You might hit some false negatives but there isn't a downside to that.
If it turns out your model has a blind spot for albino cows in a snow storm eating marshmallows, you might be able to catch that synthetically and spend some extra effort to prevent it.
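The "unit tests for driving" framing above can be sketched as a tiny harness: each synthetic scenario pairs a generated scene with an expected behavior, and the policy is checked against all of them. The policy, scene format, and scenarios here are invented stand-ins, not any real Waymo interface:

```python
# Toy sketch of scenario "unit tests" for a driving policy: each entry
# pairs a synthetic scene with the behavior we expect from the policy.

def toy_policy(scene):
    """Stand-in policy: brake if any obstacle is within 20 m, else proceed."""
    return "stop" if any(d < 20.0 for d in scene["obstacles_m"]) else "go"

SCENARIOS = {
    "clear_road":        ({"obstacles_m": []},      "go"),
    "stalled_car_ahead": ({"obstacles_m": [12.0]},  "stop"),
    "distant_elephant":  ({"obstacles_m": [150.0]}, "go"),   # the rare event
}

results = {name: toy_policy(scene) == expected
           for name, (scene, expected) in SCENARIOS.items()}
print(results)
```

A world model slots in as the scenario generator: it manufactures the scenes (including the albino-cow-in-a-snowstorm cases), while the pass/fail check stays simple and explicit.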
Looks like they need to add blackouts and parades to that simulator...
https://www.yahoo.com/news/articles/waymo-paralyzed-parade-b...
The blackouts circumstance arose because they escalate blinking/out-of-service traffic lights to a human-confirmed decision, and they hit a spike in those requests that bottlenecked on how few staff they had. The Waymo itself was fine and was prepared to make the correct decision; it just needed a human in the loop.
In the video from the parade... there's just... people in the road. Like, a lot of small children and actual people on this tiny, super-narrow bridge. I think erring on the side of "I don't think I can make it" rather than "I think I can make it" and accidentally dragging a small child is probably the right call, though admittedly these cases are a bit wonky.
2 replies →
Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).
In other words it is a gradient from "my current prediction" to "best prediction given my imperfect knowledge" to "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between 1&2 or shrinking the gap between 2&3 (or both)
seems like the obvious answer to that is you cover a patch of road with 5mm ball bearings, and send a waymo to drive across it. if the ball bearings behave the way the simulation says they would, and the car behaves the way the simulation said it would, then you've validated your simulation.
do that for enough different scenarios, and if the model is consistently accurate across every scenario you validate, then you can start believing that it will also be accurate for the scenarios you haven't (and can't) validate.
> from a tornado to a casual encounter with an elephant
A Sims-style game with this technology would be pretty nice!
You could train it in simulation and then test it in reality.
Would it actually be a good idea to operate a car near an active tornado?
4 replies →
>> How do you know the generated outputs are correct? Especially for unusual circumstances?
You know the outputs are correct because the models have many billions of parameters and were trained on many years of video on many hectares of server farms. Of course they'll generate correct outputs!
I mean that's literally the justification. There aren't even any benchmarks that you can beat with video generation, not even any bollocks ones like for LLMs.
They probably just look at the results of the generation.
I mean would I like a in-depth tour of this? Yes.
But it's a marketing blog article, what do you expect?
> just look at the results of the generation
And? The entire hallucination problem with text generators is "plausible sounding yet incorrect", so how does a human eyeballing it help at all?
2 replies →
All this work is impressive, but I'd rather have better trains
As someone who lives in the Bay Area: we already have trains, and they're literally past the point of bankruptcy because they (1) don't charge enough to cover the variable cost of operations, (2) don't actually make people pay at all, and (3) don't enforce any quality-of-life standards short of breaking up literal fights. All of this creates negative synergies that push a huge, mostly silent segment of the potential ridership away from these systems.
So many people advocate for public transit, but are unwilling to deal with the current market tradeoffs and decisions people are making on the ground. As long as that keeps happening, expect modes of transit -- like Waymo -- that deliver the level of service that they promise to keep exceeding expectations.
I've spent my entire adult life advocating for transportation alternatives, and at every turn in America, the vast majority of other transit advocates just expect people to be okay with anti-social behavior going completely unaddressed, and expect "good citizens" to keep paying when the rational move for any individual is to freeload. Then they frame "enforcing the fare box" as a tradeoff between money collected and the cost of enforcement, when the actual tradeoff is the signal sent to every anti-social actor in the system that they can do whatever they want without any consequences.
I currently only see a future in bike-share, because it's the only system that actually delivers on what it promises.
> they (1) don't actually charge enough maintain the variable cost of operations
Why do you expect them to make money? Roads don't make money and no one thinks to complain about that. One of the purposes of government is to make investment in things that have more nebulous returns. Moving more people to public transit makes better cities, healthier and happier citizens, stronger communities, and lets us save money on road infrastructure.
9 replies →
You're definitely right on (2) and (3). I've used many transit systems across the world (including TransMilenio in Bogota and other latam countries "renowned" for crime) and I have never felt as unsafe as I have using transit in the SFBA. Even standing at bus stops draws a lot of attention from people suffering with serious addiction/mental health problems.
(1) is a bit simplistic, though. I don't know of any European system that would cover even operating costs out of fare/commercial revenue. Potentially the London Underground, but not London buses. UK National Rail has had higher cost recovery.
The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit. Perhaps 30-40% of inner london could commute by car (or walk/bike), so the economic benefit of that variable transit cost is in the hundreds of billions a year (compared to a small subsidy).
It's not the same in SFBA so I guess it's far easier to just "write off" transit like that, it is theoretically possible (though you'd probably get some quite extreme additional congestion on the freeways as even that small % moving to cars would have an outsized impact on additional congestion).
3 replies →
As a fellow public transit fan, you're on the money. Even the shining stars of transit in the US --- NYC MTA subway and CTA --- have huge quality-of-life issues. I can't fault someone for not wanting to ride trains ever again when someone who hasn't showered in 41 years pulls up with a cart full of whatever the fuck and decides to squat the corner seat closest to the car door and be a living biological weapon during rush hour. Or "showtime."
That's before you consider how it takes 2-4x as long to get somewhere by public transit outside of peak hours and/or well-covered areas. A 20 minute trip from a bar in Queens to Brooklyn by car takes an hour by train after 2300, not including walking time. I made that trip many, many times, and hated it each time.
4 replies →
Well then invest in those things, then. It would probably cost less than the amount they're spending to make a Waymo World Model.
12 replies →
It's worth noting that, at least for BART, the reason it is facing bankruptcy is precisely because it was mostly rider-supported and profitable, not government-supported.
When ridership plummeted by more than 50% during the pandemic, fixed costs stayed the same but income dropped. Last time I checked, if BART ridership returned to 2019 levels, with no other changes, it would be profitable again.
9 replies →
Maybe not BART, but the new Caltrain electrification program seems to be a success, and ridership and revenue are up.
1 reply →
over the long term, this is solved with a wealth tax, but undoing what rich ppl have done to society (i.e. making lots of poor people) will unfortunately take many, many years; so many years that it will never actually happen
2 replies →
Very few transit agencies have fares that cover their service costs. I know others have said this, but I wanted to add my take as well.
1 reply →
Trains work in every city in Europe and Asia.
Trains need well-behaved people; otherwise they are shit.
I don't want to hear tiktok or full volume soap operas blasting at some deaf mouth breather.
I don't want to be near loud chewing of smelly leftovers.
I don't want to be begged for money, or interact with high or psychotic people.
The current culture doesn't allow enforcement of social behaviour: so public transport will always be a miserable containment vessel for the least functional, and everyone with sense avoids the whole thing.
> everyone with sense avoids the whole thing
Or the majority of the residents of New York City on their daily commute? I like to think I have sense, and I happily use public transport most days. I prefer it to sitting in traffic, isolated in a car. At least I can read a book. If you work too hard to insulate yourself from the world, the spaces you'll feel comfortable in will get more and more narrow. I think that's a bad thing.
4 replies →
> some deaf mouth breather
I quite agree with the overall point but can we leave this kind of discourse on X, please? It doesn't add much, it just feels caustic for effect and engagement farming.
Roads (cars) need well-behaved people too. The only way cars filter some of them out is by price.
4 replies →
No matter what, people are going to still use cars because they are an absolute advantage over public transportation for certain use cases. It is better that the existing status quo is improved to reduce death rates, than hope for a much larger scale change in infrastructure (when we have already seen that attempts at infrastructure overhaul in the US, like high-speed rail, is just an infinitely deep money pit)
Even though the train system in Japan is 10x better than the US as a whole, the per-capita vehicle ownership rate in Japan is not much lower than the US (779 per 1000 vs 670 per 1000). It would be a pipe dream for American trains/subways to be as good as Japan, but even a change that significant would lead to a vehicle ownership share reduced by only about 13%.
Isn't a vehicle that goes from anywhere to anywhere on your own schedule, safely, privately, cleanly, and without billions in subsidies better?
I don't think individual vehicles can ever achieve the same environmental economies of scale as trains. Certainly they're far more convenient (especially for short-haul journeys), but I also think they're somewhat alienating, in that they engineer humans out of the loop completely, which contributes to social atomization.
2 replies →
Trains only require subsidies in a world where human & robot cars are subsidized.
As soon as a mode of transport actually has to compete in a market for scarce & valuable land to operate on, trains and other forms of transit (publicly or privately owned) win every time.
>cleanly
>without subsidies
Source? The biggest source of environmental issues from EVs, tire wear from a heavier vehicle, absolutely applies to AVs. VC subsidizing low prices only to hike them later isn't exactly "without subsidy" - we pay for it either way
Cars don't work in dense places.
3 replies →
>without billions in subsidies
Is there a magic road wand?
6 replies →
Not necessarily, and your premise is incorrect.
Billions in subsidies? I'm confused: are you talking about cars or trains?
7 replies →
better for the person vs better for the people
sure, a private vehicle is better for me, but a train is better for the world
[dead]
Me too, but given our extensive car-brain culture, Waymo is an amazing step toward getting drivers and cars off the road, and toward cementing that future generations never need to drive or own cars.
Ski lifts man, ski lifts all over the city
What a glorious utopia we could have
> Ski lifts man, ski lifts all over the city
Don't they have those somewhere in South America?
1 reply →
Pretty much this. It's wild that you can traverse most of China on affordable high-speed trains, yet the Amtrak from Seattle to Portland barely crawls along and regularly has to stop for long periods because the tracks get too hot in the summer.
I think future generations will resent us for bureaucratizing our way out of the California HSR.
I'd rather be able to go wherever I want.
Enough with the trains. I’m all for trains but theyre good for in city or 1-3 hour journeys. Taking a train across the US would take a day even with high speed trains.
I’d much rather have my own vehicle than share my space with a bunch of people.
The novel aspect here seems to be 3D LiDAR output from 2D video using post-training. As far as I'm aware, no other video world models can do this.
IMO, access to DeepMind and Google infra is a hugely understated advantage Waymo has that no other competitor can replicate.
This is the real story buried under the simulation angle. If you can generate reliable 3D LiDAR from 2D video, every dashcam on earth becomes training data. Every YouTube driving video, every GoPro clip, every security camera feed.
Waymo's fleet is ~700 cars. The internet has millions of hours of driving footage. This technique turns the entire internet into a sensor suite. That's a bigger deal than the simulation itself.
3d from moving 2d images has been a thing for decades.
This is 3D LiDAR output (multimodal) from 2D images.
1 reply →
It's not unheard of, there are a handful [0] of metric monodepth methods that output data that's not unlike a really inaccurate 3D lidar, though theirs certainly looks SOTA.
[0] https://github.com/YvanYin/Metric3D
It’s impressive to see simulation training for floods, tornadoes, and wildfires. But it’s also kind of baffling that a city full of Waymos all seemed to fail simultaneously in San Francisco when the power went out on Dec 22.
A power outage feels like a baseline scenario—orders of magnitude more common than the disasters in this demo. If the system can’t degrade gracefully when traffic lights go dark, what exactly is all that simulation buying us?
All this simulation buys is a single vehicle that drives better. That failure was a fleet-wide event (overloading the remote assistance humans).
That is, both are true: this high-fidelity simulation is valuable and it won't catch all failure modes. Or in other words, it's still on Waymo for failing during the power outage, but it's not uniquely on Waymo's simulation team.
[flagged]
They've also been seen driving directly into flood waters, with one driving through the middle of a flooded parking lot.
https://www.reddit.com/r/SelfDrivingCars/comments/1pem9ep/hm...
Curious what your takeaway from that is, given the announcement.
cue the bell curve meme for learning autonomy:
Seems like it, no?
We started with physics-based simulators for training policies. Then put them in the real world using modular perception/prediction/planning systems. Once enough data was collected, we went back to making simulators. This time, they're physics "informed" deep learning models.
That's a very interesting way of looking at it. Yes, you start with simulating something simpler than the real world. Then you use the real world. Then you need to go back to simulations for real-world things that are too rare in the real world to train with.
Seems like there ought to be a name for this, like so-and-so's law.
hazrmard's law
1 reply →
Deepmind's Project Genie under the hood (pun intended). Deepmind & Waymo both Alphabet(Google) subsidiaries obv.
https://news.ycombinator.com/item?id=46812933
Regardless of the corporate structure DeepMind is a lot more than just another Alphabet subsidiary at this point considering Demis Hassabis is leading all of Google AI.
Finally I understand the use case for Genie 3. All the talk about "you can make any videogame or movie" seems to have been pure distraction from real uses like this: limited, time-boxed simulated footage.
IIUC, there's a confusion of meaning around "world model": Waymo/DeepMind use it for something that can generate a consistent world (used to train Waymo's Driver), while Yann LeCun/Advanced Machine Intelligence (AMI) uses it for something that can understand a world.
I don't think there's a conflict. If you can predict the world you understand it.
The "world model" is a convenient fiction. Whether we’re talking about a carbon-based brain or a silicon-based transformer, there is no miniature, objective map of reality tucked away inside. What we mistake for a "model" is actually just the layered residue of experience.
From the perspective of enactivism and radical empiricism, intelligence doesn't "represent" the world; it simply navigates it. A biological organism doesn't need a 3D CAD file of a tree to survive; it only needs a history of sensory-motor contingencies—the "if I move this way, I see that" patterns. It’s a synthesis of interactions, not a library of blueprints.
AI operates on the same logic, albeit through a different medium. It isn't simulating the physical laws of the universe or "understanding" gravity. Instead, it navigates the high-dimensional geometry of human data. It’s a sophisticated engine of association, performing a high-speed synthesis of the patterns we've left behind.
In this view, "knowing" isn't about matching an internal image to an external truth. It is the seamless flow of past inputs into future predictions. There is no world model—only the habit of being.
I'd like to see Waymo have a few of their Drivers do some sim racing training and then compete in some live events. It wouldn't matter much to me if they were fast at all, I'd like to see them go into the rookie classes in various games and see how they avoid crashes from inexperienced players. I believe that it would be the ultimate "shitty drivers vs. AI" test.
Racing and street driving are completely different. Racing involves detailed knowledge of vehicle dynamics and grip. Street driving is mainly obstacle recognition and avoidance. No waymo ever operates anywhere close to the limit of grip, which is where you are all the time when racing.
Sure but accident avoidance in sim racing is basically the ultimate test for any driver.
I also said it wouldn’t matter if they’re fast, I don’t care about driving at the limit of grip here, just avoiding accidents.
Interesting, but it feels like it's going to cope very poorly with genuinely safety-critical situations. A world model trained on successful driving data feels like it's going to "launder" a lot of implicit assumptions that would cause a car to crash in real life (e.g. there are probably no examples in the training data where the car is behind a stopped car, the driver pulls over to another lane, and another car comes from behind and crashes into them because they didn't check their blind spot). These subtle biases make AI-simulated world models a poor fit for training safety systems where failure cannot be represented in the training data, since they basically give models "free rein" to do anything that couldn't be represented in world-model training.
You're forgetting that they are also training with real data from the 100+ million miles they've driven on real roads with riders, and using that data to train the world model AI.
> there's probably no examples in the training data where the car is behind a stopped car, and the driver pulls over to another lane and another car comes from behind and crashes into the driver because it didn't check its blindspot
This specific scenario is in the examples: https://videos.ctfassets.net/7ijaobx36mtm/3wK6IWWc8UmhFNUSyy...
It doesn't show the failure mode, it demonstrates the successful crash avoidance.
While there most likely is going to be some bias in the training of those kinds of models, we can also hope that transfer learning from other non-driving videos will at least help generate something close enough to the very real but unusual situations you are mentioning. We could imagine an LLM serving as some kind of fuzzer to create a large variety of prompts for the world model, which as we can see in the article seems pretty capable at generating fictive scenarios when asked to.
As always, though, the devil lies in the details: is an LLM-based generation pipeline good enough? What even is the definition of "good enough"? Even with good prompts, will the world model output something sufficiently close to reality that it can be used as a good virtual driving environment for further training/testing of autonomous cars? Or do the kinds of limitations you mentioned still mean subtle but dangerous imprecisions will slip through and make the data distribution too poor for this to be a truly viable approach?
My personal feeling is that we will land somewhere in between: I think approaches like this one will be very useful, but I also don't think the current state of AI models means we can get something 100% reliable this way.
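To make the fuzzer idea concrete, here's a toy sketch (not anything Waymo or DeepMind has described): the scenario axes, sampling function, and prompt format are all invented for illustration. A real pipeline would presumably have an LLM propose values rather than draw them from fixed lists.

```python
import itertools
import random

# Hypothetical scenario axes; a real pipeline might ask an LLM to
# propose these values instead of drawing from fixed lists.
WEATHER = ["clear", "heavy rain", "dense fog", "snow"]
HAZARD = ["stalled car", "jaywalker", "fallen tree", "wrong-way driver"]
ROAD = ["4-lane highway", "narrow alley", "roundabout", "bridge"]

def fuzz_scenarios(n, seed=0):
    """Sample n distinct scenario prompts to feed a world model."""
    rng = random.Random(seed)
    combos = list(itertools.product(WEATHER, HAZARD, ROAD))
    picks = rng.sample(combos, n)  # distinct combinations, deterministic per seed
    return [f"{w}, {h} on a {r}" for w, h, r in picks]

for prompt in fuzz_scenarios(5):
    print(prompt)
```

The interesting (and unanswered) question is the evaluation side: which of these generated scenes are close enough to reality to train against.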
The question is: is 100% reliability a realistic goal? Human drivers are definitely not 100% reliable. If we come up with a solution 10x more reliable than the best human drivers, one that maybe also has some hard proof that it cannot exhibit certain classes of catastrophic failure modes (probably via verified-code approaches that, for instance, guarantee that even if the NN output is invalid the car doesn't make moves outside a verifiably safe envelope), then I feel like the public and regulators would be much more inclined to authorize full autonomy.
I wonder if they can simulate the Beatles crossing the street at Abbey Road in the late '60s
As a Londoner who used to ride up Abbey Road at least once per week: there are people on that crossing pretty much all day, every day, reproducing that picture. Now that Waymo is in beta in London[1], they have only to drive up there and they'll get plenty of footage they could use for that.
[1] I've seen a couple of them but they're not available to hire yet and are still very rare.
Will Google finally fund Christopher Wren's post-Great Fire "wide streets" rebuild of the City?
2 replies →
The term "world model" seems almost meaningless. This is a world model in the same sense as ChatGPT is a world model. Both have some ability to model aspects of the real world.
It doesn't look like they're going to open-source this or anything, but I could imagine it would be great for city planning.
Or the most realistic game of SimCity you could imagine.
Interesting, but I am very sceptical. I'd be interested in seeing actual verified results of how it handles a road with heavy snow, where the only lane references are the wheel tracks of other vehicles, and you can't tell where the road ends and the snow-filled ditch begins.
Very concerned with this direction of training on "counterfactual events such as whether the Waymo Driver could have safely driven more confidently instead of yielding in a particular situation." Seems dicey. This could lead to a less safe Waymo. Since the counterfactuals will be generated, I suspect the generations will be biased towards survivor situations, where most video footage in the training data comes from environments where people reacted well, not those that ended in tragedy. Emboldening Waymo on generated best-case data. THIS IS DANGEROUS!!!
Not at all. It's not the counter-factual they're generating, it's the "too rare to capture often enough to train a response to" they're generating.
They're implying that without the model having knowledge, even approximate, of a scene to react to, it simply doesn't react at all; it simply "yields" to the situation until it passes. In my experience taking Waymos almost daily, this holds.
I would rather not have the Waymo yield to a tornado, rising flood-waters, or charging elephant...
Driving is always a balance between speed and safety. If you want ultimate safety you just sit in the driveway. But obviously that isn't useful. So functionally one of the most important things a self-driving system will decide is "how fast is it safe to drive right now". Slower is not always better and it has to balance safety with productivity.
Not entering a roundabout when it's clearly safe to do so is a mark against you at a driving exam. So would be always driving at 5mph. It's just not that simple.
Still needs to be trained on the final boss: dense cities with narrow streets.
San Francisco isn't uniformly dense and narrow, but it does have both, and it's run remarkably well so far.
On that specific count, not really. There's a skate park at the north end of the Mission, and Stevenson St is a two-way road that borders it, but it's narrow enough that you need to drive up on the curb to get two vehicles side by side on the street. Waymos can't handle that on a regular basis. Being San Francisco and not London, you can just skip that road, but if you find yourself in a Waymo on that street and are unlucky enough to have other traffic on it, the Waymo will just have to back up the entire street. Hope there's no one behind you as well as in front of you!
Anyway, we'll see how the London rollout goes, but I get the impression London's got a lot more of those kinds of roads.
2 replies →
Another comment mentioned the Philippines as the manifest frontier. SF is not on the same plane of reality as PH in terms of density or narrow streets; I would argue that, in comparison, it has neither.
This is the craziest I've seen, but it was 10 months ago which is ~10 years in AI years
https://www.youtube.com/watch?v=3DWz1TD-VZg
What would be an example city? Waymo just announced they're ramping up in Boston: https://waymo.com/blog/?modal=short-back-to-boston
"we’re excited to continue effectively adapting to Boston’s cobblestones, narrow alleyways, roundabouts and turnpikes."
Not grandparent but I was rather thinking of medieval city centers in Italy or Spain.
edit: Case in point:
https://maps.app.goo.gl/xxYQWHrzSMES8HPL8
This is an alley in Coimbra, Portugal. A couple of years ago I stayed at a hotel on this very street and took a cab from the train station. The driver could have stopped in the praça below and told me to walk 15m up. Instead the guy went all the way up and then curved through 5-10 alleys like that to drop me off right in front of my place, at significant speed as well. It was one of the craziest car rides I've ever experienced.
1 reply →
Any small city in Italy is going to be 10X more challenging than Boston
3 replies →
Various European cities come to mind: Narrow streets are something of a trope in certain movies/genres.
1 reply →
I live in such an area. The route to my house involves steep topography via small, winding streets that are very narrow and effectively one-way due to parked cars.
Human drivers routinely do worse than Waymo, which I take 2 or 3 times a week. Is it perfect? No. Does it handle the situation better than most Lyft or Uber drivers? Yes.
As a bonus: unlike some of those drivers the Waymo doesn't get palpably angry at me for driving the route.
Yes, something like Ho Chi Minh or Mumbai in a peak hour! With lots of bike riders, pedestrians, and livestock at the same roundabout.
Like London? https://www.youtube.com/watch?v=KvctCbVEvwQ
Does it, though? Maybe Dhaka will never get Waymo. The same way you can’t get advanced gene therapy there.
Waymo cars are driving around London right now.
Not taking paying passengers yet though!
They're being trialled in London right now.
Old Delhi is the final boss.
Napoli
Have been seeing Waymo test vehicles regularly around central London recently, operating at speed.
For shits and giggles, I did stop randomly while crossing the road and acted like a jerk.
The Waymo did, in fact, stop.
Kudos, Waymo
It's clear that lidar is superior to Tesla's cameras-only approach, but I wonder how much could be gained by taking all that data and better correlating what is being seen visually with what the lidar is showing.
That said, for autonomous driving I'd like to see as many sensor options as possible: lidar, radar, cameras, sonar. Belt and suspenders. I imagine as pricing continues to drop all will be embraced.
Neat! What happens when the simulated data is hallucinated/incorrect?
In the example videos, the Golden Gate Bridge with snow shows the bridge as one road with a total of 3 lanes. But in reality it's a split highway with a divider, so the two sides have 3 lanes each, 6 lanes total.
What happens when the car “learns” to drive on the simulated incorrect 3 lane example? For example will next time it goes on the real GG bridge hug to the rightmost lane?
Ideally it would learn a relationship between the sensor input and the correct actions, even if the sensor input is not realistic for the GG in reality.
No human needs to have seen an elephant standing in the road before to know that you should not drive through an elephant standing in the road. These are not "long tail" events as Waymo claims. It's a big object in the road. You have seen that hundreds of thousands of times. Calling that a long-tail event is an admission that your model has zero ability to generalize.
So when will multiple Waymo cars communicate input data to one another to avoid the blind spots?
This would give the ability to see things other cars cannot see as well.
It is great being able to generate a much larger universe of possibilities than what they can gather from real world data collection, but I'd be curious to learn how they check that the generated data is a superset of the possibility-space seen in the real world (e.g. confirm that their models closely match what is seen in the real world too)
1. Still hard not to think that this is a huge waste of time as opposed to something that's a little more like a public transport train-ish thing, i.e. integrate with established infrastructure.
2. No seriously, is the Filipino driver thing confirmed? It really feels like they're trying to bury that.
"The Filipino driver thing" is simply that there's a manual override ability when this profoundly complex and marvelously novel technology gets trapped in edge cases.
Once it gets unstuck, it runs autonomously.
(2) I really don't understand why people are surprised that Waymo has fallbacks? The fact that they had a team ready to take over as necessary was well known. I've seen a bunch of comments about this and it seems like people are confused.
I think they're surprised to learn it's being done by a bunch of people on the other side of the world because they don't want to pay American wages.
I think you sort of fundamentally misunderstand the whole "steak vs sizzle" thing in capitalism?
The technology "feels" way less cool knowing that there are human backups, which would absolutely make its perceived value go down.
As someone who half-learned to drive in Manila, the idea that they would use Filipino drivers as backups is ironic.
For context, my "driver's test" was going to the back of the office, and driving some old car backwards and forwards a few meters.
2. Yes, a Waymo exec described it in a Congressional hearing.
https://news.ycombinator.com/item?id=46918043
They are not trying to "bury" remote assistance at all. They wrote a white paper about it in 2020 and a blog post about it in 2024.
Anyway you can think it's a waste but they're wasting their money, not yours. If you want a train in your town, go get one. Waymo has only spent, cumulatively, about 4 months of the budgets of American transit agencies. If you had all that money it wouldn't amount to anything.
"At all?"
Oh come on -- of course they are. That's precisely why you put it in a "white paper" and not, you know, ads.
I am very pro public transit. But there is still a place for cars (ideally mostly taxis). Going to more rural areas or when you need to carry more stuff. I think an ideal society would have both urban transit, inter-city transit and taxis for the other trips and going out into the country.
Filipino driver is false. Filipino guidance person is true.
The difference being?
2 replies →
America is not Europe; how would public transport work for the last 1-2 miles?
Walking, bikes and scooters.
2 replies →
My view on Waymo and autonomous taxis in general is they will eventually make public transit obsolete. Once there is a robotaxi available to pick up and drop off every passenger directly from a to b, the whole system could be made to be super efficient. It will take time to get there though.
But eventually I think we will get there. Human drivers will be banned, the roads will be exclusively used by autonomous vehicles that are very efficient drivers (we could totally remove stoplights, for example. Only pedestrian crossing signs would be needed. Robo-vehicles could plug into a city-wide network that optimizes the routing of every vehicle.) At that point, public transit becomes subsidized robotaxi rides. Why take a subway when a car can take you door to door with an optimized route?
So in terms of why it isn’t a waste of time, it’s a step along the path towards this vision. We can’t flip a switch and make this tech exist, it will happen in gradual steps.
Automated taxis would still be stuck in traffic. Automation gains maybe a couple of multiples in capacity, but induced demand and extra cars circling for rides and parking will still mean traffic.
Automation makes public transit better. There will be automated minibuses that are more flexible and frequent than today's buses. Automation also means that buses get a virtual bus lane. Taxis solve the last-mile problem: take a taxi to the station, ride a train with thousands of people, then take more transit.
Also, we might rediscover the advantages of human-powered transit. E-bikes are more efficient than cars and give health benefits. They will be much safer than automated cars. We could use the extra capacity for bike and bus lanes.
2 replies →
If everyone in NYC tried to commute in a single-occupancy vehicle, there would be gridlock -- AVs or no.
> Human drivers will be banned, the roads will be exclusively used by autonomous vehicles
I basically agree with your premise that public transit as it exists today will be rendered obsolete, but I think this point here is where your prediction hits a wall. I would be stunned if we agreed to eliminate human drivers from the road in my lifetime, or the lifetime of anyone alive today. Waymo is amazing, but still just at the beginning of the long tail.
11 replies →
> Once there is a robotaxi available to pick up and drop off every passenger directly from a to b, the whole system could be made to be super efficient.
Fundamentally impossible. You're moving some 2 tons of mass in a 2x5m box on polluting rubber tires to move a single 100kg human.
I can always take whatever efficiency gain you've thought up and simply make the vehicle bigger, decreasing the cost and space used per passenger, and maybe even put it on rails, making it less polluting, and more energy efficient.
You can't engineer your way out of the laws of physics.
And don't even get me started on e-bikes.
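The per-passenger arithmetic behind this argument can be sketched quickly; the vehicle masses, footprints, and occupancy figures below are rough ballpark assumptions, not measured data.

```python
# Rough per-passenger mass and road footprint for a few modes.
# All numbers are ballpark assumptions for illustration only.
modes = {
    # name: (vehicle mass kg, road footprint m^2, typical passengers)
    "robotaxi": (2000, 10.0, 1.3),
    "bus":      (12000, 30.0, 40),
    "e-bike":   (25, 1.5, 1),
}

for name, (mass, area, pax) in modes.items():
    # Divide vehicle totals by typical occupancy to get per-passenger cost.
    print(f"{name:8s} {mass / pax:7.0f} kg/passenger {area / pax:5.1f} m^2/passenger")
```

Even with generous assumptions for the taxi, the bus and the e-bike come out an order of magnitude better per passenger, which is the parent's point about making the vehicle bigger.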
Only in lower density areas.
In high density regions, vehicles on surface roads can’t meet the passenger demand required. Even if you banned human drivers, the other human users introduce too much variability and delay (passengers loading and unloading, errant objects, cyclists and pedestrians, etc). Roll a dumpster in the street, and have a couple of jaywalkers, and the entire system crawls to a stop.
Controlled access is required to get even medium-high throughput. But these systems already exist, they are called personal rapid transit systems.
So the solution to autonomous driving will be taking all the data from Google Street View, building a world model for training, and ending up with an AI model that basically remembers every street on planet Earth?
This might be relevant to the timing here: https://eletric-vehicles.com/waymo/waymo-exec-admits-remote-...
Could these world models be used to build some sort of endless GranTurismo type street racing game?
It seems inevitable that they'll soon be used as the starting points for developing almost all video game environments.
Not for the rendering (that's still way too expensive), but for the initial world generation that gets iteratively refined and then still ultimately gets converted into textured triangles.
> It seems inevitable that they'll soon be used as the starting points for developing almost all video game environments.
Almost all video game environments? No way. That statement really needs to be qualified with the genres of games you're considering.
Very impressive work from Waymo. The example of driving with a tornado on the horizon struck my imagination; many people actually panic in such scenarios. I wonder, though, about the compute requirements to run these simulations and produce so many data points.
I don't get how this solves the problem of edge cases with self driving
Even if you can generate simulated training data, don't you still have the problem where you don't even know what the edge cases you need to simulate are in the first place?
Well, it certainly helps, doesn't it? This system is going to encounter more edge cases than a single human ever would. Hopefully the lessons from known unknowns generalise to unknown unknowns. And once they've been seen once, they too can become part of the corpus.
Right but does this just rely on some human "brainstorming" a bunch of edge cases?
It just strikes me as neverending edge-case whack-a-mole
A human doesn't need to see tons of examples of tornados and elephants to know to stop the car
Doesn't that indicate some fundamental difference between the model and a human driver?
2 replies →
I don’t think coming up with novel situations is all that hard. LLMs already do it in text form all the time.
How big is the space of possible things a car can encounter?
It’s practically infinite, your domain is 3D space and time
You can’t just generate every possible scenario
That would be a combinatorially insane amount of data
Similar to Kiva Systems which was Amazon's best acquisition, Waymo is simply Google's best acquisition. (We live in San Francisco and it feels much safer around these Waymo cars than average "drivers".)
those zoox cars though. watch out!
Dumb question - Why would Waymo disclose this much information to public and competitors?
It's easier to build trust for such a safety-critical service when you're more open about how it works and performs. For the complete opposite approach, see Tesla.
Given the announcement a few days ago of Google seeking external investment, this is their follow-up, showing what that investment is good for. Also, it's pretty light on details that are of much use to competitors. "We made an accurate simulation system to test our system in before deployment" would be pretty mundane in any other field of engineering.
Maybe to distract from the story that they use remote drivers after one of their cars hit a kid? [1]
[1] https://people.com/waymo-exec-reveals-company-uses-operators...
edit: fixed kill -> hit
The child did not die, and suffered only minor injuries: https://abc7.com/post/california-teamsters-call-suspension-w...
Under the same circumstances (kid suddenly emerging between two parked cars and running out onto the street), it could be debated that the outcome could have been worse if a human were driving.
It’s awful a child was hit, but they only suffered minor injuries [1]. Nowhere in your linked article does it say they were killed.
[1] https://people.com/waymo-car-hits-child-walking-to-school-du...
I don't know about the remote-driver conspiracy, but the Waymo slowing down and that kid surviving the crash after jumping onto the road from behind a tall vehicle was the best PR Waymo could have asked for.
I would love to see more visibility into how this model’s simulation fidelity maps onto measurable safety improvements on public roads, especially in unusual edge conditions like partial sensor occlusion or atypical weather.
One interesting thing from this paper is how big a lidar shadow there is around the Waymo car, which suggests they rely on cameras for anything close (maybe they have radar too?). Seems lidar is only useful for distant objects.
At least 6 radar units: https://support.google.com/waymo/answer/9190838?hl=en
So in order to create a self driving car, you must first create the universe
> Simulation of the Waymo Driver evading a vehicle going in the wrong direction.
It really looks like the Waymo is the one going in the wrong direction and driving dangerously to evade traffic in this simulation.
How large is the controlled area around the car? And how high up do they look for objects? Like something falling from a bridge, a falling pole, or, more extreme, a falling plane.
For whatever it's worth, world models are going to be the dominant computing structure of the future.
I started working heavily on realizing them in 2016, and it is unquestionably (finally) the future of AI.
I'm a little sad that they talk about counterfactuals in the simulations, but then don't show any examples of even a single sharknado or giant loop-de-loop.
Seems interesting, but why is it broken? Waymo repeatedly directed multiple automated vehicles into the private alley off of 5th near Brannan in SF, even after being told none of them have any business there, ever. If they can sense the weather and such, maybe they could put out a virtual sign or fence noting that what appears to be a road is neither a through way nor open to the public. I'm really bullish on automated driving long term, but now that the vehicles are actually present, we need to start thinking seriously about getting them to comply with the same laws that limit what people can do.
>> get them to comply with the same laws that limit what people can do
I think you meant, "Attempt" to limit what people can do.
Driving in SF (for example) provides many opportunities to see "free will" exerted in the most extreme ways -- laws be damned.
Have you ever seen a cop trying to pull over a Waymo? It isn't going well.
Have there been any reported instances of Waymo cars being stolen?
Disabled and then loaded into a lead-lined trailer or something.
I imagine the IP running locally on the cars is worth billions.
I doubt Waymo would publicly talk about this if it did happen.
I also doubt the IP is worth that much. Most of the secret sauce to starting a competitor probably isn't an end model tuned for a specific configuration of a car but the ability to produce end models, which wouldn't be stealable from the car.
Does this model include things like how slippery the road is, or headwind/crosswind? Or is it only a visual/depth model?
The amount to which supposedly engineering based rationales are applied to Elon's decisions is cult like behavior. Here's what's really going on:
That thing I called full self driving just to troll government regulators? Just make it work and don't complain about what I called it.
That rocket that's way too big for orbital payloads but can't go beyond orbit without sending 20 more rockets full of fuel? Just make it work. Occupy Mars!
The two car models that work and have a halo effect on the rest of the product line? I'mma cancel that! Just make the stupid truck we're keeping work!
We've all had a boss like that. I'm sure they salute each other.
Every time I'm in town I use a waymo. It's still a little weird to be a passenger with no driver
been playing around with world models for sim-to-real transfer lately. the waymo approach looks solid, but curious how you're handling the distribution shift between generated scenes and real sensor data. any tricks for that besides the usual domain randomization?
This is cool, but they are still not going about it the right way.
It's much easier to build everything on a compressed latent space of physical objects and how they move, and operate from there.
Everyone jumped on the end-to-end bandwagon, which locks you into vision being the input to your driving model, which means you have to have things like Genie to generate vision data, which is wasteful.
> This is cool, but they are still not going about it the right way.
This is legit hilarious to read from some random HN account.
I posted this before, but I'll post it again: this is one of the few things I feel confident enough to say most people in the space are doing wrong. You can save my post and reference it when we actually get full self-driving (i.e. you can take a nap in the back seat while your car drives you), because it's going to be implemented pretty much like this:
Humans don't drive well because we map vision directly to actions. We drive well (and, in general, manipulate physical objects well) because we can run simulations inside our heads to predict what the outcome will be. We aren't burdened by our inability to recognize certain things: when something is in the road, no matter what it is, we predict that we would likely collide with it, because we understand the concept of 3D space and moving within it, and we take appropriate action. Sure, there is some level of direct mapping, since many people can drive while "spaced out", but attentive driving is mostly the above.
The self-driving system that can actually self-drive needs to do the same. When you have this, you will no longer need to simulate driving conditions in a computationally expensive sim. You aren't going to be concerned with training the model on edge cases. All you need is to ensure that your sensor processing results in a 3D representation of the driving conditions, and the model will then do what humans do: explore a latent space of things it can do, predict outcomes, and choose the best one.
You want proof? It exists in the form of MuZero, and it worked amazingly well. And driving can easily be reformulated as a game the engine plays in a simulator that doesn't involve vision, learning both the available moves and the optimal policy.
The reason everyone is doing end-to-end today is that they are basically trying to catch up to Tesla, and from a business perspective nobody is willing to put up the money and pay smart enough people to research this, especially since there is also a legal bridge to cross when it comes to proving the system can self-drive while you're napping. But nevertheless, if you ever want self-driving, this is the right approach.
Meanwhile, Google, who came up with MuZero, is now doing more advanced robotics work than anyone out there.
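A toy sketch of what planning over a latent state looks like, loosely in the spirit of MuZero-style lookahead. The state encoding, dynamics, and reward below are hand-written stand-ins invented for illustration; in a real system the dynamics and value functions would be learned.

```python
import itertools

# Toy latent state: (lane_offset, speed). A learned dynamics model
# would replace `step`; here it's a hand-written stand-in.
ACTIONS = [-1, 0, 1]  # steer left, keep lane, steer right (lane units)

def step(state, action):
    offset, speed = state
    return (offset + action, speed)

def reward(state, obstacle_lane):
    offset, _speed = state
    # Heavily penalize occupying the obstacle's lane; mildly prefer center.
    return (-100 if offset == obstacle_lane else 0) - abs(offset)

def plan(state, obstacle_lane, horizon=3):
    """Exhaustive rollout over action sequences; return the best first action."""
    best, best_seq = float("-inf"), None
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, total = state, 0
        for a in seq:
            s = step(s, a)
            total += reward(s, obstacle_lane)
        if total > best:
            best, best_seq = total, seq
    return best_seq[0]

# Obstacle ahead in our lane (lane 0): the planner swerves.
print(plan((0, 30), obstacle_lane=0))  # → -1
```

The point of the toy: nothing here knows what the obstacle *is*; avoidance falls out of predicting collisions in the state space, which is the parent's argument about elephants.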
1 reply →
The article is about using the world model to generate simulations, not for controlling the vehicle.
They form control policy from vision data directly, which is why they need to have a massive model generate simulation vision data.
2 replies →
Interesting that this should come out right as lawmakers are beginning to understand that Waymos have overseas operators making major decisions.
[*] https://futurism.com/advanced-transport/waymos-controlled-wo...
Completely false: https://x.com/i/status/2019213765506670738
Listen to the statement.
The operators help when the Waymo is in a "difficult situation".
Car drives itself 99% of the time, long tail of issues not yet fixed have a human intervene.
Everyone is making out like it's an RC car, completely false.
Whenever something like this comes out, it's a good moment to find the people with no critical thinking skills who can safely be ignored. Driving a Waymo like an RC car from the Philippines? You can barely hold a Zoom call with someone in the Philippines without bitrate and lag issues.
4 replies →
I haven't read anything about this but I would also suppose long distance human intervention cannot be done for truly critical situations where you need a very quick reaction, whereas it would be more appropriate in situations where the car has stopped and is stuck not knowing what to do. Probably just stating the obvious here but indeed this seems like something very different from an RC car kind of situation.
1 reply →
Why is this relevant at all?
Having humans in the loop at some level is necessary for handling rare edge cases safely.
The word "loop" here has multiple meanings. Only one is what you mean and the other person responding to you has understood another.
The first is the DDT control loop, what a human driver does. Waymo's remote assistants aren't involved in that. The computer always has responsibility for the safety of the vehicle and decisionmaking while operating, which is why Waymo's humans are remote assistants and not remote drivers. Their safety drivers do participate in the DDT loop, hence the name.
But there's also another "loop" of human involvement. Sometimes the vehicle doesn't understand the scene and asks humans for advice about the appropriate action to take. It's vaguely similar to captchas. The human will usually confirm the computer's proposed actions, but they can also suggest different ones. The computer uses the advice as a prior to continue operating instead of giving up the DDT responsibility. There's very likely a closely monitored SLA, between a few seconds and a few minutes, on how long it takes humans to start looking at the scene.
If something causes the computer to believe the advice isn't safe, it will ignore it. There have been cases where Waymos have erroneously detected collisions and remote assistants were unable to override that decisionmaking. When that happens, a vehicle recovery team is physically sent out to the location. The SLA here is likely between tens of minutes and a couple hours.
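A rough sketch in Python of the "advice as a prior" pattern described above. All names and thresholds here are hypothetical, invented for illustration; this is not Waymo's actual API, just the shape of the logic: the human's advice is one input, and the computer retains final authority.

```python
def choose_action(scene, propose, is_safe, request_guidance, sla_seconds=30):
    """The vehicle proposes an action; if its confidence in the scene is low,
    it asks a remote assistant for guidance, but it can still reject advice
    that fails its own safety check."""
    action, confidence = propose(scene)
    if confidence < 0.5:  # made-up threshold, purely illustrative
        advice = request_guidance(scene, timeout=sla_seconds)
        # Advice only shifts the prior; it never bypasses the safety check.
        if advice is not None and is_safe(scene, advice):
            action = advice
    if not is_safe(scene, action):
        # No safe option: stop and wait, recovery team gets dispatched.
        action = "pull_over_and_wait"
    return action
```

The key property is that `is_safe` runs on every action, human-suggested or not, which matches the reported cases where assistants couldn't override an erroneous collision detection.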
If that’s true the system isn’t finished. That’s what reasoning is for.
1 reply →
Why don't they call it 'the Matrix' and should i prepare for the plugs?
I'm curious how they simulate equipment failures, like a flat tire or something
So the waymo driver is dreaming, testing out different scenarios, basically…
Meanwhile. https://eletric-vehicles.com/waymo/waymo-exec-admits-remote-...
Imagine driving in a Waymo 'out of a raging fire'.
Talk about edge cases.
But, what would you do? Trust the Waymo, or get out (or never get in) at the first sign of trouble?
Interesting question. If the Waymo was driving aggressively to remove us from the situation but relatively safely I might stay in it.
This does bring up something, though: Waymo has a "pull over" feature, but it's hidden behind a couple of touch screen actions involving small virtual buttons and it does not pull over immediately. Instead, it "finds a spot to pull over". I would very much like a big red STOP IMMEDIATELY button in these vehicles.
>it's hidden behind a couple of touch screen actions involving small virtual buttons and it does not pull over immediately
It was on the home screen when I've taken it, and when I tested it, it seemed to pull over at the first safe place. I don't trust the general public with a stop button.
I feel like this ends with drunk morons accidentally creating Waymo barricades and totally ruining Mardi Gras
Can you not just unlock and open the door? Wouldn't that cause it to immediately stop? Or can you not unlock the door manually? I'd be surprised if there was not an emergency door release.
Imagine how many drunk/careless passengers might press it. Stopping in the middle of the street or highway could be a serious safety hazard.
I can! If the Waymo got you into one on the way home because Google didn’t integrate with watch duty yet, that’s plausible
Do Waymo models really use side cameras at only like 4 FPS?
This page crashes my browser.
Vivaldi 7.8.3931.63 on iOS 26.2.1 iPhone 16 pro
Seems relevant: Waymo exec admits remote operators in Philippines help guide US Robotaxis
https://news.ycombinator.com/item?id=46918043
and I literally just saw the other headline "Waymo says its robotaxis get help from remote workers in the Philippines"
Instructions to load it on WAYMAX simulator?
so insane that this is the direction things are going, instead of just reducing our reliance on cars
What is the 5/3 tiles? Cameras?
The model generates camera and Lidar data. As if it was a Waymo car that drove through the simulated scenario with its cameras running. This synthetic training data can then be used to train the driving models.
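A minimal sketch of that idea: roll a learned world model forward and collect the predicted camera and lidar frames as synthetic training data. Everything here (function names, action set, frame format) is assumed for illustration, not Waymo's actual pipeline.

```python
import random

def rollout_world_model(step_fn, initial_state, n_steps=10, seed=0):
    """Roll a world model forward under random actions, collecting the
    (camera, lidar) observations it predicts at each step. The resulting
    frames can be fed back into driving-model training as synthetic data."""
    rng = random.Random(seed)
    state, frames = initial_state, []
    for _ in range(n_steps):
        action = rng.choice(["keep_lane", "brake", "nudge_left"])
        # step_fn is the world model: given state + action, it predicts the
        # next state and the sensor observations a real car would have seen.
        state, camera, lidar = step_fn(state, action)
        frames.append({"camera": camera, "lidar": lidar, "action": action})
    return frames
```

The point is that rare scenarios (the long tail) can be sampled far more densely in the simulator than they occur on real roads.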
Wonder how it'll do. The trees change shape (presumably the Lidar patterns do too). I get the premise/why but it seems odd to me (armchair) to use fake data. Real trees don't change shape (in real time) although it can be windy.
It probably doesn't matter though, "this general blob over there"
What if we put this mechanism of recording the world on people? We'd have mics listening to the people talking to us and the noises we hear.
We'd also record body position, actuation, and self-speech. Then we put this on thousands of people to get as much data as Waymo gets.
I mean that’s what we need to imitate agi right? I guess the only thing missing is the memory mechanism. We train everything as if it’s an input and output function without accounting for memory.
Nvidia has had this for years. What am I missing?
Ah the typical MDS npc comes out in the swarm, saying how FSD is no better than GM solutions.
Time to eat popcorn while FSD drives me for the next hour and I scroll hackernews
[dead]
[dead]
It feels intellectually negligent to me that models like those used by Openpilot have no auditable concept of what a car, or a sign, or a person is.
Just where lines are and when a car should accelerate or brake. The rest of the latent state is "based on pixels."
[dead]
[flagged]
What's going to happen to all the millions of drivers who will lose their job overnight? In a country with 100 million guns, are we really sure we've thought this through?
Yes, let's stop all progress and roll-back all automation to keep hypothetical angry people with guns happy.
Seems like a good description on current events.
Autonomous private cars are not the technological progress you think they are. We've had autonomous trains for decades, and while they provide a more efficient and cost-effective public transit system, they didn't open the door to the next revolutionary technology.
Self-driving cars are a dead-end technology that will introduce a whole host of new problems which are already solved by public transit, better urban planning, etc.
12 replies →
Waymo has been operating since 2004 (22 years ago), and replacing drivers on the road will take many more decades. Nothing is happening "overnight".
If Waymo's history is any guide, it's not going to happen overnight. Even in San Francisco, their market share is only 20-30%.
I don't think Uber goes out of business. There is probably a sweet spot for Waymo's steady state cars, and you STILL might want 'surge' capabilities for part time workers who can repurpose their cars to make a little extra money here and there.
> What's going to happen to all the millions of drivers who will lose their job overnight? In a country with 100 million guns, are we really sure we've thought this through?
Same was said about electricity, or the internet.
People keep referencing history but this really is unprecedented. We are approaching singularity and many people will become obsolete in all areas. There are no new hypothetical jobs waiting on the horizon.
Reminds me of the history of radio and the absolute uproar when someone played a record on the radio rather than a live performance!
same thing that happened during the industrial revolution, you pay enough of them to 'protect the law' vs the rest.
this sounds like a major benefit.
i dont want my uber driver bragging about how they're going to shoot me before i get out of the car
what did all the farmers do when the first tractors rolled into the fields?
They invented offices because the massive increase in productivity required more organisation than "I'll take it all to the market on Saturday"
UBI or war, or both
Those are rookie numbers. The US has 400 million guns. https://www.theglobalstatistics.com/united-states-gun-owners...
As to the revolt, America doesn't do that any more. Years of education have removed both the vim and vigor of our souls. People will complain. They will do a TikTok dance as protest. Some will go into the streets. No meaningful uprising will occur.
The poor and the affected will be told to go to the trades. That's the new learn to program. Our tech overlords will have their media tell us that everything is ok (packaging it appropriately for the specific side of the aisle).
Ultimately the US will go down hill to become a Belgium. Not terrible, but not a world dominating, hand cutting entity it once was.
> Ultimately the US will go down hill to become a Belgium.
Sharing one's opinion in a respectful way is possible. Less spectacle, so less eyeballs, but worth it. Try it.
3 replies →
> Ultimately the US will go down hill to become a Belgium.
I'm curious why you say this given you start by highlighting several characteristics that are not like Belgium (to wit, poor education, political media capture, effective oligarchy). I feel there are several other nations that may be better comparators, just want to understand your selection.
1 reply →
The new frontier is manifestly the Philippines.
Can you explain? I lived in PH, and my guess is that you mean navigating and modeling the unending and constantly changing chaos of the street systems (and lack thereof) is going to be a monumental task which I completely agree with. It would be an impressive feat if possible.
Edit: or are you talking about the allegations of workers in the Philippines controlling the Waymos: https://futurism.com/advanced-transport/waymos-controlled-wo... I guess both are valid.
How many Filipinos, who do not have US drivers licenses, does it take to drive this new model?
Wow, interesting timing for this PR blast considering the admission in the Senate Commerce Committee hearing. Not transparent at all!
What was the admission? That they use cheap labor to provide the waymo clarity when it is confused? That has been known for a long time.
Software doesn’t get confused - it fails. Referring to your software as autonomous when you have to staff a 24/7 response center of humans to control it is not just misleading, it’s a lie.
2 replies →
The Waymo driving model: hire some guys in Philippines: https://futurism.com/advanced-transport/waymos-controlled-wo...
This is not false, but it gives the wrong idea that foreigners are driving them in real time.
> After being pressed for a breakdown on where these overseas operators operate, Peña said he didn’t have those stats, explaining that some operators live in the US, but others live much further away, including in the Philippines.
> “They provide guidance,” he argued. “They do not remotely drive the vehicles. Waymo asks for guidance in certain situations and gets an input, but the Waymo vehicle is always in charge of the dynamic driving tasks, so that is just one additional input.”
This is quite misleading... From the article:
“When the Waymo vehicle encounters a particular situation on the road, the autonomous driver can reach out to a human fleet response agent for additional information to contextualize its environment,” the post reads. “The Waymo Driver [software] does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.” [from Waymo's own blog https://waymo.com/blog/2024/05/fleet-response/]
What's the problem with this?
In my opinion there's nothing wrong with it per se, but (a) it's still worth mentioning, because most people have the impression that Waymo cars are completely unassisted, and (b) it makes me wonder how feasible Waymo's operations would be if it weren't for global income inequality.
Have you read the article? The guys in the Philippines are providing high-level executive guidance; they don't remotely drive the car or have any low-level control of it.
Dig deep enough into any "AI" idea and you'll find the bottom end of the scam looks exactly like this.
We've simply relabeled the "Mechanical Turk" into "AI."
The rest is built on stolen copyrighted data.
The new corporate model: "just lie; the government clearly doesn't give a shit anymore."
"Autonomous"
https://cybernews.com/news/waymo-overseas-human-agents-robot...
My understanding is that support is basically playing an RTS (point and click), not a first-person driving game. Which makes sense: if they were directly controlling the vehicles, they'd put support in Central America for better latency, like the food delivery bot drivers.
Yeah. Waymo described how this works a couple of years ago:
https://waymo.com/blog/2024/05/fleet-response/
2 replies →
This isn't news, they've always acknowledged that they have remote navigators that tell the cars what to do when they get stuck or confused. It's just that they don't directly drive the car.
Yeah, I have some videos of these drivers in action. I think the sensors assist but aren't the whole story; yes, there are models, lidars, etc., but the human factor is there. Unfortunately this means we'll soon see a lot of cobotics teleoperated remotely from India, the Philippines, and the like, to satisfy the greed of these companies paying peanuts to operate them.