Comment by mattlondon
21 days ago
Suddenly all this focus on world models by DeepMind starts to make sense. I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
Google/Alphabet are so vertically integrated for AI when you think about it. Compare what they're doing: their own power generation, their own silicon, their own data centers, Search, Gmail, YouTube, Gemini, Workspace, Wallet, billions and billions of Android and Chromebook users, their ads everywhere, their browser everywhere, Waymo, probably buying back Boston Dynamics soon enough (they recently partnered), fusion research, drug discovery... and then look at ChatGPT's chatbot or Grok's porn. Pales in comparison.
Google has been doing more R&D and internal deployment of AI and less trying to sell it as a product. IMHO that difference in focus makes a huge difference. I used to think their early work on self-driving cars was primarily to support Street View in their maps.
There was a point in time when basically every well known AI researcher worked at Google. They have been at the forefront of AI research and investing heavily for longer than anybody.
It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
But they are in full gear now that there is real competition, and it’ll be cool to see what they release over the next few years.
>It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
Not really. If Google released all of this first instead of companies that have never made a profit and perhaps never will, the case law would simply be the copyright holders suing them for infringement and winning.
2 replies →
> It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
It’s not that crazy. Sometimes the rational move is to wait for a market to fully materialize before going after it. This isn’t a Xerox PARC situation, nor really the innovator’s dilemma, it’s about timing: turning research into profits when market conditions finally make it viable. Even mammoths like Google are limited in their ability to create entirely new markets.
1 reply →
I also think the presence of Sergey Brin has been making a difference in this.
36 replies →
> It’s kind of crazy that they have been slow to create real products and competitive large scale models from their research.
I always thought they deliberately tried to contain the genie in the bottle as long as they could
5 replies →
It has always felt to me that the LLM chatbots were a surprise to Google, not LLMs, or machine learning in general.
Not true at all. I interacted with Meena[1] while I was there, and the publication was almost three years before the release of ChatGPT. It was an unsettling experience, felt very science fiction.
[1]: https://research.google/blog/towards-a-conversational-agent-...
5 replies →
It was a surprise to OpenAI too. ChatGPT was essentially a demo app to showcase their API, it was not meant to be a mass consumer product. When you think about it, ChatGPT is a pretty awkward product name, but they had to stick with it.
Google and OpenAI are both taking very big gambles with AI, with an eye towards 2036 not 2026. As are many others, but them in particular.
It'll be interesting to see which pays off and which becomes Quibi
Quibi would be if someone came in 10 years from now and said "if we put a lot more money behind spitting out content using characters and settings from Hollywood IP than we'll obviously be way more popular than a tech company can be!"
1 reply →
Use your own sh*t is one of the best ways to build excellent products.
Tesla built something like this for FSD training; they presented it many years ago. I never understood why they didn't productize it. It would have made a brilliant Maps alternative, which could automatically update from Tesla cars on the road. It could live-update with speed cameras and road conditions. Like many things, they've fallen behind.
No Lidar anymore on the 2026 Volvo models ES60 and EX60. See for example: https://www.jalopnik.com/2032555/volvo-ends-luminar-lidar-20...
I love Volvo, am considering buying one in a couple weeks actually, but they're doing nothing interesting in terms of ADAS, as far as I can tell. It seems like they're limited to adaptive cruise control and lane keeping, both of which have been solved problems for more than a decade.
It sounds like they removed Lidar due to supplier issues and availability, not because they're trying to build self-driving cars and have determined they don't need it anymore.
10 replies →
Without Lidar, plus the terrible quality of Tesla's onboard cameras, Street View would look terrible. The biggest L of Elon's career is the weird commitment to no lidar. If you've ever driven a Tesla, it gives daily messages like "the left side camera is blocked", etc. Cameras and weather don't mix either.
At first I gave him the benefit of the doubt, like that weird decision of Steve Jobs banning Adobe Flash (which ran most of the fun parts of the Internet back then), a move that ended up spreading HTML5. Now I just think he refused LIDAR for purely aesthetic reasons. The cost is not even that significant compared to the overall cost of a Tesla.
28 replies →
Yeah, it's absurd. As a Tesla driver, I have to say the autopilot model really does feel like what someone who's never driven a car before thinks it's like.
Using vision only is so ignorant of what driving is all about: sound, vibration, vision, heat, cold... these are all clues to road condition. If the car isn't feeling all these things as part of the model, you're handicapping it. In a brilliant way, Lidar is the missing piece of information a car needs without relying on multiple sensors; it's probably superior to what a human can do, whereas vision only is clearly inferior.
40 replies →
From the perspective of viewing FSD as an engineering problem that needs solving I tend to think Elon is on to something with the camera-only approach – although I would agree the current hardware has problems with weather, etc.
The issue with lidar is that many of the difficult edge cases of FSD are all visible-light vision problems. Lidar might be able to tell you there's a car up front, but it can't tell you that the car has its hazard lights on and a flat tire. Lidar might see a human-shaped thing in the road, but it cannot tell whether it's a mannequin leaning against a bin or a human about to cross the road.
Lidar gets you most of the way there when it comes to spatial awareness on the road, but you need cameras for most of the edge-cases because cameras provide the color data needed to understand the world.
You could never have FSD with just lidar, but you could have FSD with just cameras if you can overcome all of the hardware and software challenges with accurate 3D perception.
Given Lidar adds cost and complexity, and most edge cases in FSD are camera problems, I think camera-only probably helps force engineers to focus their efforts in the right place rather than hitting bottlenecks from over-depending on Lidar data. This isn't an argument for camera-only FSD, but from Tesla's perspective it does keep down costs and allows them to continue to produce appealing cars – which is obviously important if you're coming at FSD from the perspective of an automaker trying to sell cars.
Finally, adding lidar as a redundancy once you've "solved" FSD with cameras isn't impossible. I personally suspect Tesla will eventually do this with their robotaxis.
That said, I have no real experience with self-driving cars. I've only worked on vision problems and while lidar is great if you need to measure distances and not hit things, it's the wrong tool if you need to comprehend the world around you.
11 replies →
>>The biggest L of elon's career is the weird commitment to no-lidar.
I thought it was the Nazi salutes on stage and backing neo-nazi groups everywhere around the world, but you know, I guess the lidar thing too.
1 reply →
I have HW3, but FSD reliably disengages at this time of year with sunrise and sunset during commute hours.
4 replies →
Not really, I think: they built a simulation engine for autonomous driving, of which tons already exist, including ones from Nvidia and at least one open-source one. Using world models is different.
> Suddenly all this focus on world models by DeepMind starts to make sense
Google's been thinking about world models since at least 2018: https://arxiv.org/abs/1803.10122
FWIW I understood GP to mean that it suddenly makes sense to them, not that there’s been a sudden focus shift at google.
Maybe they were focusing on a real world use that basically requires AI, but not LLMs.
Tesla claimed that all their "real world" recording would give them a moat on FSD.
Waymo is showing that a) you need to be able to incorporate stuff that isn't "real" when training, and b) you get a lot more information from alternate sensors to visible spectrum only.
I just listened to a fantastic multi-hour Acquired (https://www.acquired.fm/) podcast episode on Google and AI that talks about the history of Google and AI and all the ways they have been using it since 2012. It's really fascinating. You can forgive them for not focusing on Reader or any of their other properties when you realize they were pulling in hundreds of billions of dollars of value by making big bets in AI and incorporating it into their core business.
Grok/xAI is a joke at this point. A true money pit without any hopes for a serious revenue stream.
They should be bought by a rocket company. Then they would stand a chance.
I always understood this to be why Tesla started working on humanoid robots
They started working on humanoid robots because Musk always has to have the next moonshot, trillion-dollar idea to promise "in 3 years" to keep the stock price high.
As soon as Waymo's massive robotaxi lead became undeniable, he pivoted from robotaxis to humanoid robots.
Yeah, that and running Grok on a trillion GPUs in space lol
Pretty much. They banked on "if we can solve FSD, we can partially solve humanoid robot autonomy, because both are robots operating in poorly structured real world environments".
I don't want a humanoid robot. I want a purpose built robot.
Obviously both will exist and compete with each other on the margins. The thing to appreciate is that our physical world is already built like an API for adult humans. Swinging doors, stairs, cupboards, benchtops. If you want a robot to traverse the space and be useful for more than one task, the humanoid form makes sense.
The key question is whether general purpose robots can outcompete on sheer economies of scale alone.
It's called a dishwasher, washing machine, and dryer. Plus robomowers, vacuums, etc.
I mean, I would take a robot to handle all of my housework.
Purpose-built, that probably takes the form of a humanoid robot, since all of the tasks it needs to do were previously designed for humans.
3 replies →
The drop in demand for Tesla's clapped out model range would have meant embarrassing factory closures, so now they're being closed to start manufacturing a completely different product. Bait and switch for Tesla investors.
I wonder how long they'll be closed for "modifications" and whether the Optimus Prime robot factories will go into production before the "Trump Kennedy Center" is reopened after its "renovations".
It's so they can stick a Tesla logo on a bunch of chinese tech and call it innovation.
So is this a model baked into the VLLM layer? Or a scaffold that the agent sits in for testing?
If the former then it’s relevant to the broader discourse on LLM generality. If the latter, then it seems less relevant to chatbots and business agents.
Edit to add: this is not part of the model, it’s in a separate pillar (Simulator vs Driver). More at https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto....
>> Suddenly all this focus on world models by DeepMind starts to make sense.
The apparent applicability to Waymo is incidental, more likely because millions were spent on Genie and they have to do something with it. DeepMind started to train "world models" because that's the current overhyped buzzword in the industry. First it was "natural language understanding" and "question answering" back in the days of old BERT, then it was "agentic", then "reasoning", now it's "world models"; next year it's going to be "emotions" or "social intelligence" or some other anthropomorphic, overdrawn neologism. If you follow a few AI accounts on social media you really can't miss when those things suddenly start trending, then pretty much die out, with only a few stragglers still trying to publish papers on them because they failed to get the memo that we're all now running after the Next Big Thing™.
Notice that all these buzzwords you list actually correspond to real advances in the field. All of them were improvements on something existing; not a big revolution, for sure, but definitely measurable improvements.
Those are not "real advances in the field", which is why they are constantly abandoned for the next new buzzword.
Edit:
This just in:
https://news.ycombinator.com/item?id=46870514#46929215
The Next Big Thing™ is going to be "context learning", at least if Tencent have their way. And why do we need that?
>> Current language models do not handle context this way. They rely primarily on parametric knowledge—information compressed into their weights during massive pre-training runs. At inference time, they function largely by recalling this static, internal memory, rather than actively learning from new information provided in the moment.
>> This creates a structural mismatch. We have optimized models to excel at reasoning over what they already know yet users need them to solve tasks that depend on messy, constantly evolving context. We built models that rely on what they know from the past, but we need context learners that rely on what they can absorb from the environment in the moment.
Yep. Reasoning is so 2025.
2 replies →
Also known as a monopoly. This should terrify us all.
No, it's known as vertical integration, which is legally permitted by default.
Monopolies are essentially 100% horizontal integration. Vertical integration is a completely different concept.
So for the record, with this realization you're 3+ years behind Tesla.
https://www.youtube.com/watch?v=ODSJsviD_SU&t=3594s
Practically ALL introductory course materials on robotics and AI that I've seen began with "you might imagine a talking bipedal humanoid when you hear the word `robot`, but perhaps the most commonplace robot that you have seen is a vending machine", with the illustration of a typical 80s-90s outdoor soda vendor with no apparent moving parts.
So "maybe cars are a bit of robots too" is more like 30-50 years behind the time.
Aren't they still using safety drivers or safety follow cars and in fewer cities? Seems Tesla is pretty far behind.
What do you think I said that you're contradicting?
IMO the presence of safety chase vehicles is just a sensible "as low as reasonably achievable" measure during the early rollout. I'm not sure that can (fairly) be used as a point against them.
I'm comfortable with Tesla sparing no expense for safety, since I think we all (including Tesla) understand that this isn't the ultimate implementation. In fact, I think it would be a scandal if Tesla failed to do exactly that.
Damned if you do and damned if you don't, apparently.
9 replies →
What an upsetting comment. I'm glad you came around but what did you think was going to be effective before you came around to world models?
Which is why it's embarrassing how much worse Gemini is at searching the web for grounding information, and how incredibly bad gemini cli is.
Not my experience in either of those areas.
Internal firewalls and poor management means that the vast majority of integration opportunities are missed.
The flywheel is starting to spin......
> I've never really thought of Waymo as a robot in the same way as e.g. a Boston Dynamics humanoid, but of course it is a robot of sorts.
I view Tesla also more as a robot company than anything else.
[dead]
"Waymo as a robot in the same way"
Erm, a dishwasher, washing machine, or automated vacuum can be considered a robot. I'm confused by this obsession with the term; there are many robots that already exist. Robotics has been involved in the production of cars for decades.
......
I think the (gray) line is the degree of autonomy. My washing machine makes very small, predictable decisions, while a Waymo has to manage uncertainty most of the time.
Its irrelevant. A robot is a robot.
Dictionary def: "a machine controlled by a computer that is used to perform jobs automatically."
4 replies →
It's a 3500lb robot that can kill you.
Boston Dynamics is working on a smaller robot that can kill you.
Anduril is working on even smaller robots that can kill you.
The future sucks.
and they're all controlled by (poorly compensated) humans anyway [1] [2]
[1] https://www.wsj.com/tech/personal-tech/i-tried-the-robot-tha...
[2] https://futurism.com/advanced-transport/waymos-controlled-wo...
They couldn't even make burger flipping robots work and are paying fast food workers $20/hr in California.
If that doesn't make it obvious what they can and cannot do then I can't respect the tranche of "hackers" who blindly cheer on this unchecked corporate dystopian nightmare.
1 reply →
>or grok's porn
I know it’s gross, but I would not discount this. Remember why Blu-ray won over HD DVD? I know it won for many other technical reasons, but I think there are a few historical examples of sexual content being a big competitive advantage.
The vertical integration argument should apply to Grok. They have Tesla driving data (probably much more data than Waymo), Twitter data, plus Tesla/SpaceX manufacturing data. When/if Optimus starts on the production line, they'll have that data too. You could argue they haven't figured out how to take advantage of it, but the potential is definitely there.
Agreed. Should they achieve Google level integration, we will all make sure they are featured in our commentary. Their true potential is surely just around the corner...
"Tesla has more data than Waymo" is some of the lamest cope ever. Tesla does not have more video than Google! That's crazy! People who repeat this are crazy! If there was a massive flow of video from Tesla cars to Tesla HQ that would have observable side effects.
"More video" (gigabytes) is a straw man.
The key metric is more unusual situations. That scales with miles driven, not gigabytes. With onboard inference the car simply logs anything 'unusual' (low confidence) to selectively upload those needle-in-a-haystack rare events.
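The selective-upload idea can be sketched as a simple confidence filter (purely illustrative; the function name and threshold are made up, as Tesla's actual trigger logic isn't public):

```python
# Hypothetical sketch: onboard inference assigns a confidence score to each
# clip; only low-confidence ("unusual") clips are queued for upload.
CONF_THRESHOLD = 0.6  # assumed cutoff, not a real fleet parameter

def select_clips_for_upload(confidences, threshold=CONF_THRESHOLD):
    """Return indices of clips whose model confidence falls below the
    threshold -- the rare events worth uploading instead of raw video."""
    return [i for i, conf in enumerate(confidences) if conf < threshold]

# Confidences for six clips from a hypothetical drive:
print(select_clips_for_upload([0.98, 0.95, 0.41, 0.97, 0.30, 0.99]))  # -> [2, 4]
```

Under a scheme like this, upload volume scales with how often the model is surprised, not with miles of routine footage.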
But somehow Google fails to execute. Gemini is useless for programming, and I don't even bother to use it as a chat app. Claude Code + GPT 5.2 xhigh for coding, and GPT as a chat app, are really the only ones that are worth it (price- and time-wise).
I've recently switched to Claude for chat. GPT 5.2 feels very engagement-maxxed for me, like I'm reading a bad LinkedIn post. Claude does a tiny bit of this too, but an order of magnitude less in my experience. I never thought I'd switch from ChatGPT, but there is only so much "here's the brutal truth, it's not x it's y" I can take.
GPT likes to argue, and most of its arguments are straw man arguments, usually conflating priors. It's ... exhausting; akin to arguing on the internet. (What am I even saying, here!?) Claude's a lot less of that. I don't know if tracks discussion/conversation better; but, for damn sure, it's got way less verbal diarrhea than GPT.
1 reply →
Experiencing the same. It seems Anthropic’s human-focused design choices are becoming a differentiator.
To me ChatGPT seems smarter and knows more. That’s why I use it. Even Claude rates gpt better for knowledge answers. Not sure if that itself is any indication. Claude seems superficial unless you hammer it to generate a good answer.
Gemini is by far the best UI/UX designer model. Codex seems to be the worst: it'll build something awkward and ugly, then Gemini will take 30-60 seconds to make it look like something that would have won a design award a couple years ago.
Gemini works well enough in Search and in Meet. And it's baked into the products so it's dead simple to use.
I don't think Google is targeting developers with their AI, they are targeting their product's users.
It is a bit mind boggling how behind they were considering they invented transformers and were also sitting on the best set of training data in the world, but they've caught up quite a bit. They still lag behind in coding, but I've found Gemini to be pretty good at more general knowledge tasks. Flash 3 in particular is much better than anything of comparable price and speed from OpenAI or Anthropic.
Yesterday GPT 5.2 wrote a Python function for me that had the import in the middle of the code, for no reason. (It was a simple import of the requests module in a REST client...) Claude, I agree, is a lot better for backend; Gemini is very good for frontend.