Gemini Omni

6 hours ago (deepmind.google)

98 comments

meetpateltech

In my day job I program rigid body behaviour in real time, amongst other simulations. I think rigid body contact is hard to learn as it is inherently discontinuous, something you discover when trying to code a solver.

As such I always use this prompt as a test: "A video of a jenga brick tower falling over as a brick is removed. The physics of each brick must be realistic."

It gave me a video where bricks suddenly disappear or morph into others [1]. The linked video is after 2-3 iterations of me insisting on realistic physics. If you are just glancing at this, you would believe it is realistic.

That said, this is still very impressive and one more step towards... IDK what. But I am a bit reassured that at least my job won't be fully replaced with AI :)

[1] https://streamable.com/2em1r3
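
To make the "inherently discontinuous" point above concrete, here is a minimal sketch, assuming a 1D point mass dropped onto a floor at y = 0 with a simple restitution model. The function names and parameter values are illustrative only and are not taken from any real solver. During free fall the velocity changes by a tiny amount each step, but at contact it flips sign within a single step.

```python
def simulate_bounce(height=1.0, restitution=0.5, dt=1e-3, steps=2000, g=9.81):
    """Drop a point mass onto a floor at y = 0 (semi-implicit Euler).

    Returns the velocity recorded after every time step.
    """
    y, v = height, 0.0
    velocities = []
    for _ in range(steps):
        v -= g * dt   # smooth part: gravity nudges the velocity each step
        y += v * dt
        if y < 0.0 and v < 0.0:
            # Contact: clamp to the floor and reflect the velocity.
            # This is the discontinuity, a finite jump in one step.
            y = 0.0
            v = -restitution * v
        velocities.append(v)
    return velocities


def largest_jump(velocities):
    """Largest change in velocity between two consecutive steps."""
    return max(abs(b - a) for a, b in zip(velocities, velocities[1:]))


vel = simulate_bounce()
free_fall_step = 9.81 * 1e-3  # velocity change per step without contact
print("largest per-step velocity change:", largest_jump(vel))
print("ratio to a free-fall step:", largest_jump(vel) / free_fall_step)
```

The ratio printed at the end shows how far the contact step is from the smooth behaviour everywhere else, which is roughly what a model trained only on pixels would have to pick up on its own.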

E-Reverance 3 hours ago
> But I am a bit reassured that at least my job won't be fully replaced with AI :)
I honestly can't comment with certainty that training from videos alone and whatever tokenization scheme they're using will ever get perfect dynamics.
However it is worth noting that transformers can do a pretty good job at learning dynamics with the right pipeline (not video): https://arxiv.org/pdf/2605.15305 https://arxiv.org/pdf/2605.09196
My point here being that representationally, it might be possible to learn good dynamics without a radically different approach/arch. There are already models that extract 3D tracking points from videos, so they could possibly be leveraged for learning dynamics (which on its own gives precedent for end-to-end approaches also possibly working).
- manas96 5 minutes ago
  
  Thanks for the additional reading. I've often thought about LLMs and their ability to represent the physical world with its laws. And always concluded it is not really possible to do so with "just" text tokens and their relations in a latent space. It looks to me there are different approaches being taken to tackle this:
  * You could instruct your LLM to interact with a simulator to run experiments and infer behaviour
  * You could edit the transformer model and inject spatially relevant data rather than text, as is done in the above paper
  * You could change the architecture to be more conducive to representing a world state, e.g. LeCun's JEPA world model.
  * You could further enhance some of the above by using a differentiable physics engine (e.g. NVIDIA Newton) to calculate losses directly.
  But at the end of the day if a model has any hope to always produce realistic physics, it HAS to learn the laws of nature in some form or other. It looks to me that the next big leap could be achieved by combining the last two approaches.
  P.S.: I like discussing such topics. If anyone knows a forum or discord with like-minded people, please let me know :)
oceansweep 2 hours ago
Totally unrelated, but how feasible would you say it is to write software that simulates or replicates body movements during a martial arts technique?
I’ve often thought it would be very handy to have a proper simulator to identify inefficiencies in one’s technique, but I have no idea whether it would be feasible.
- jackling 1 hour ago
  
  Would be similar to the typical simulations of humanoids. If you need to model the deformations of the human body, or get a proper model of tendons that make up humans, it'll be more difficult, but possible.
  Proper simulators for those exist; you essentially need an engine with a compliant contact model. MuJoCo is the go-to here, see:
  https://mujoco.readthedocs.io/en/stable/modeling.html#muscle... https://mujoco.readthedocs.io/en/stable/computation/fluid.ht...
  These explicitly model biological muscles. IIRC it was originally created to model human hands (I could be misremembering though).
  Really depends on the fidelity you want.
  Edit: I also work in rigid body simulation for robotics.
nine_k 4 hours ago

Such videos are essentially dreams: how it feels that the planks should move, not what equations of rigid body physics would compute. And the feeling is realistic (even if overly dramatic in the end). If "stylistic transfer" works for static pictures spread out in space, why won't it work for the character of motion spread out in time?
darkwater 3 hours ago
I wonder what's the training data that makes it generate the final "explosion"...
- jddj 3 hours ago
  
  A little too much Michael Bay
  
- badsectoracula 1 hour ago
  
  The physics engine glitching is very realistic :-P
christoff12 3 hours ago
Thanks for the intro to Streamable.
- staindk 2 hours ago
  
  In my experience (from a couple of years ago), Streamable can be great but it's just worth checking what their current retention policy is like.
  We were sharing game clips with each other and after a while realised our old clips were just gone, being deleted after 30 or 90 days or something.
- manas96 2 hours ago
  
  it was the first link I got after googling free video hosting sites

adenta 5 hours ago

At first usage I'm not impressed. I've probably spent a couple grand on Seedance 2 to date, and I can't find anything Google Omni Flash does better than Seedance from running a handful of samples through the system. You can find some of the videos I've made in my HN bio link.

kamranjon 5 hours ago
Just curious - are you at all concerned about the legal implications of AI-generating property listing videos?
- layer8 4 hours ago
  
  The legal risk probably lies solely with those who are selling the properties. They are responsible if the video misrepresents anything.
  
red2awn 3 hours ago

I have exactly the same thought. Anyone who has used Seedance 2.0 a bit can tell Gemini is a bit behind, and Seedance 2.1 is on the horizon already.
CommanderData 3 hours ago
Seedance 2 is amazing, compared with anything else American tech is producing. It does struggle with consistency like all other models.
The other problem is Seedance is heavily censored because of copyright concerns.
- dotancohen 1 hour ago
  
  > The other problem is Seedance is heavily censored because of copyright concerns.
  Instead of censoring, wouldn't it make sense to simply not train on copyrighted materials?

torginus 2 hours ago

While at a cursory glance it looks as impressive as always, subtle spatial errors and geometry that changes as it goes out of sight and comes back again hint at the fact that Google has yet to solve the problem of deep spatial understanding.

Which, considering just how pretty and detailed this whole thing looks, imo points to a fundamental issue in how these things are trained. It's as if there's no structure to its knowledge and training. An artist learning to draw would first try to understand simple 2D composition, then perspective, then light and shadow, mastering each concept and gradually building up a hierarchical understanding; the model seems to be trying to learn everything at once.

I would rather see an AI model that I could give a floorplan of a building and it would generate an accurate flythrough on any path, even if it looked like butt.

I'm not just talking out of my arse: I did work for a while in data science/engineering, and one of the big lessons people needed to be reminded of is to clean/downsample the data. A dataset consisting of a million samples could very well take 1000x as long to process as the same data downsampled to just a couple of thousand samples, and we could reach the same conclusions with a fraction of the time and effort.
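
A rough sketch of the downsampling point, assuming synthetic Gaussian data and a plain uniform random subsample (the dataset, seeds, and helper names are made up for illustration). It only demonstrates the claim for a simple aggregate statistic; rare events or heavy-tailed data would likely need stratified or weighted sampling instead.

```python
import random
import statistics


def make_dataset(n, seed=0):
    """Synthetic stand-in for a large dataset: n draws from N(10, 2)."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def downsample(data, k, seed=1):
    """Uniform random subsample of k items, without replacement."""
    rng = random.Random(seed)
    return rng.sample(data, k)


full = make_dataset(1_000_000)
small = downsample(full, 2_000)

full_mean = statistics.fmean(full)
small_mean = statistics.fmean(small)

print(f"full:  n={len(full)}  mean={full_mean:.3f}")
print(f"small: n={len(small)}  mean={small_mean:.3f}")
print(f"difference: {abs(full_mean - small_mean):.3f}")
```

With 2,000 samples the standard error of the mean is about 2 / sqrt(2000) ≈ 0.045, so the two estimates should agree closely while the subsample is 500x smaller.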

I'm sure there's a similar logic in RL, that if you dump a trillion samples into the datacenter that consumes the same power as a city, what the model learns is what it could've learned with a much more curated training set and directed approaches.

enragedcacti 5 hours ago

> Prompt: Make it look like the weird shape of my hand hole super zooms and magnifies the ground it's looking at in sharper quality.

There's got to be a reason this is phrased so insanely, right?

bar94 3 hours ago
Even weirder:
> Prompt: A skeuomorphism stop motion explainer about how the brain hippocampus works with a compelling voiceover. Don’t add seahorses. No voice cuts at the end. Don’t add text
Seahorses???
- gfaure 3 hours ago
  
  The genus of the seahorse is _Hippocampus_.
  
nightpool 3 hours ago

Yes, if you watch the video closely you can see that the "lensing" effect only really covers a circular area—this prompt probably went through multiple iterations where the author was trying to improve it so that the shape of the hand was reflected more closely.
layer8 4 hours ago

Image-search for “hand hole” at your own peril.

raincole 5 hours ago

At the bottom there is a "Try in Youtube Shorts" button.

Oh god...

kordlessagain 4 hours ago

Pure artificial stupidity: https://www.youtube.com/watch?v=aRJH7HKuD2Y
entropicdrifter 4 hours ago

I mean if we're just blasting past our climate tipping points anyhow, why not just actively dump entire lakes' worth of water out for people to post slop for clout, right?
May as well power off the whole grid now and have the Amish start teaching us how to survive

baq 3 hours ago

We could be solving fusion power and instead we’re generating videos of birds in space or something. The market is a harsh mistress sometimes.

kenjackson 4 hours ago

I'm an AI optimist. But AI video is probably the one thing that does depress me. Seeing that we can make anything visually, there's nothing that impresses me visually. I watch a video that two years ago I would've thought was really cool, and now my first thought is, "Yawn, is this AI?".

Video, more than anything else, is the place where I really care if something is AI or not. If I could get a TikTok that had no AI usage -- I'd be in. Which is weird for me, because I'm typically the guy who is all-in on AI.

raincole 4 hours ago
It ruined the whole category of "cute animals acting goofy" content for sure.
- slfnflctd 2 hours ago
  
  Yeah, I'm kinda sad about that one. Most of my friends and family are aware many of these are fake now, but argue that it still evokes the same response in us so it's okay. For me, though, however intangible or irrational it may be, I do feel a sense of loss.
  Funny enough, this is actually one of the few things which has bothered me with the AI boom, and I'm mostly pro-acceleration. A lot of what's happening seems inevitable. But surprisingly, knowing that cat or dog or bird or lizard or butterfly or whatever has a strong chance of being generated really does take something out of it to my mind. And I say that also knowing the extreme amount of staging which has long gone on with traditional nature videography. Somehow, knowing the animal is real means something... I'm still trying to figure out how to better understand and express this.
nowittyusername 3 hours ago

You get back as much as you put in. Just like with all generative tools, the quality of the output depends on the quality of the input. Slapping a prompt together will only get you so far; if you want the models to generate something really striking and unique, you need to get your hands dirty. Gotta break out ComfyUI and build yourself a specific workflow. Once you dig deep and understand how things are put together, and why, you can make really amazing stuff with any generative model. But you have to pay for that experience in patience and knowledge.
criddell 4 hours ago
For a few weeks, YouTube thought I wanted to see videos of package thieves being surprised by a booby-trapped box that was actually a glitter bomb. Video after video, these were AI-created shorts of supposed doorbell camera footage showing a thief running away with a box that explodes into a giant pink cloud.
I eventually picked one and opened the comments and the top comment was something like "This is obviously an AI video. Who watches this?" and the reply was along the lines of "me because I like seeing thieves get what's coming to them".
So you, like me, aren't interested in AI videos but I think there's a lot of people who don't care if it's real or not.
Thankfully, YouTube eventually stopped showing those to me. Now it thinks I'm interested in road rage videos. My YouTube feed outside of the three or four channels I've subscribed to is terrible.
- r_lee 4 hours ago
  
  > and the reply was along the lines of "me because I like seeing thieves get what's coming to them".
  I really wish a subject matter expert would pitch in to tell us what this is about?
  like a totally made up thing that is fake, somehow gives a sense of justice and satisfaction?
  is it something about imagining it happening in reality, or what?
  for me, if I see that something is AI, it's like I just feel nothing. because there's nothing in it, it has nothing of real value? like it doesn't evoke anything in me, it doesn't make me think "this was a great find!" or make me want to send a link over to my friends, etc.
  
impulser_ 4 hours ago
I think the opposite. It allows more people to be creative. Similar to how the DAW allowed more people to become musicians. You can produce a hit song with just a laptop now.
Now you can have people producing videos without needing a crew of people.
- LetsGetTechnicl 4 hours ago
  
  You never needed a crew of people to make videos. This is just outsourcing people's creativity.
- criddell 4 hours ago
  
  The potential for harm is so much greater with video than creating an mp3. You can stoke hate and fear so easily.
  
sleno 4 hours ago
It's not all bad: https://www.tiktok.com/@openchub/video/7641631412407274782
- criddell 4 hours ago
  
  I tried to watch it, but TikTok kept throwing up a dialog over top asking me to slide a puzzle piece into place. I did three or four before just closing it.

throw03172019 5 hours ago

Browser crashes while scrolling because of all the auto-playing videos. Please use IntersectionObserver to pause each video when it is not in view.

SyneRyder 3 hours ago

Not to negate your experience, but seems fine on Firefox 150 on my Windows ThinkPad X1.
fuzzy2 3 hours ago

On my iPad Pro from 2017, none of the videos even play. Not sure what's better!
nicce 5 hours ago

Sounds like someone used an LLM to make it and not a single human reviewed it.
Foomf 4 hours ago
It keeps crashing my browser as well. I'm on Microsoft Edge.
- zarzavat 4 hours ago
  
  Same in Mobile Safari.
SoKamil 4 hours ago

Safari?

meetpateltech 6 hours ago

blog post: https://blog.google/innovation-and-ai/models-and-research/ge...

model card: https://deepmind.google/models/model-cards/gemini-omni-flash...

amelius 1 hour ago

What I'm hoping/waiting for is IMDB users creating alternative endings of movies.

It could make the comments section even more fun.

franze 5 hours ago

> I can create more videos as soon as your limit resets. Check your usage in Settings

I did not create any videos yet.

Google, building great AI that nobody can try out.

But thx for the press release.

andrewstuart 5 hours ago

Google often does this - they show it off and forget to give it to you.
tristanb 4 hours ago

Me too - awesome job.

clapthewind 6 hours ago

I think Hollywood is in for a rough era. The disruption is happening at breakneck speed.

franze 5 hours ago

At some point the only way to know if something is real or by a major US tech company will be nudity.
andrewstuart 5 hours ago
Hollywood is already in a rough era but it’s because they can’t create original human stories any more.
This tech won’t change anything.
- mrandish 4 hours ago
  
  Yeah, during most blockbuster movies lately all I can think is: "All pixels, no plot."
  
- wcxcv 2 hours ago
  
  There's a Steve Jobs quote about this.
mackeye 4 hours ago
you would watch a movie generated with the sterility of an LLM?
- nomel 4 hours ago
  
  AI is already in a bunch of creative workflows. Just look at modern Photoshop. Selecting and hitting delete has AI infill for the background replacement.
  Creators can use these video-gen AIs in various ways. There are some YouTube channels of people using these in creative workflows that are really impressive, from mocap replacement, character insertion, background replacement, changing camera angle in post, and animating/inserting characters from character boards, to animating between stills generated with traditional methods. It's not just "prompt and generate". It can be, because it's easy, but it also doesn't have to be. It's a tool.
  
- drusepth 2 hours ago
  
  Weirdly phrased, but yes, I would watch a movie generated with an LLM by a person passionate about the movie they're creating.
- raincole 4 hours ago
  
  I think Hollywood's obsession with unnecessary sex scenes[0] is the #1 reason I have been watching fewer and fewer movies. So yeah, probably.
  [0] e.g. Don't Look Up
- senko 2 hours ago
  
  Have you seen the past dozen or so Marvel movies?
  
- garciasn 4 hours ago
  
  Sure; why not? It has to be better than some of the absolute garbage that's out on the various streaming services today; right?
  
- yojo 4 hours ago
  
  Me? No. My kids? I think they already have. I don’t allow YouTube in our house, but they for sure watch slop with friends.
advisedwang 6 hours ago
At the moment the duration of each shot is a major limitation. When that limitation gets solved is when we'll see actual disruption.
- boredhedgehog 2 hours ago
  
  Average shot length is down to something like 3 seconds in modern cinema. That's a pretty low bar.

dwa3592 3 hours ago

Even though I don't have words to express how impressive this capability looks, I am genuinely scared of its harmful use cases.

dsign 5 hours ago

So it's really good, and we have reason never again to believe anything that happens in a video. Unless there's a super-product somewhere to authenticate footage?

svieira 4 hours ago

Now that they've broken the ability to trust video, they're looking to build it back, as long as you're allowed to use the tools:
https://blog.google/innovation-and-ai/products/identifying-a...
(and the previous SynthID: https://deepmind.google/blog/identifying-ai-generated-images...)
But it very much is "close the barn door after the horse has bolted and the barn has otherwise burned down".
spogbiper 4 hours ago

It seems like this super-product will have to be a thing soon, or we will have to just stop using video evidence in court and other critical applications.

andrewstuart 5 hours ago

Who is creative enough to drive this in any meaningful way?

Certainly not me - you have to be a great artist/designer to even imagine what to do with it.

mrandish 5 hours ago

Back in the 90s, during the first wave of the desktop video revolution when desktop editing became possible and consumer camcorders got pretty good, there was a popular marketing slogan: "Now your imagination is the only limit."
I used to joke that was the moment we discovered "for most people that's a pretty big limit."

uejfiweun 3 hours ago

Does anyone else feel like Google is just always a dollar short and a day late here? Maybe not a dollar short, but it's like they've consistently been focused on the wrong thing. First they missed chatbots, now they're missing coding agents while they double down on chatbots and video gen (which OpenAI has already basically abandoned). Maybe this strategy is actually genius and I'm too stupid to grasp it.