Comment by javier123454321
1 day ago
This is terrifying. With this and z-image-turbo, we've crossed a chasm. And a very deep one. We are currently protected by screens, we can, and should assume everything behind a screen is fake unless rigorously (and systematically, i.e. cryptographically) proven otherwise. We're sleepwalking into this, not enough people know about it.
That was my thought too. You’d have “loved ones” calling with their faces and voices asking for money in some emergency. But you’d also have plausible deniability as anything digital can be brushed off as “that’s not evidence, it could be AI generated”.
Only if you focus on the form instead of the content. For a long time my family has had secret words and phrases we use to identify ourselves to each other over secure, but unauthenticated, channels (i.e. the channel is encrypted, but the source is unknown). The military has had to deal with this for some time, and developed various forms of IFF that allies could use to identify themselves. E.g. for returning aircraft, a sequence of wing movements that identified you as a friend. I think for a small group (in this case, loved ones), this could be one mitigation of that risk. My parents did this with me as a kid, ostensibly as a defense against some other adult saying "My mom sent me to pick you up...". I never did hear of that happening, though.
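A digital analogue of the family code word is a shared-secret challenge-response: agree on a secret in person, then prove knowledge of it over the untrusted channel without ever saying it aloud. A minimal sketch in Python (the secret value and function names here are hypothetical, for illustration only):

```python
import hashlib
import hmac
import secrets

# Pre-shared secret, agreed in person (hypothetical placeholder value).
FAMILY_SECRET = b"phrase we agreed on at dinner"

def make_challenge() -> bytes:
    """Fresh random nonce, so a recorded old response can't be replayed."""
    return secrets.token_bytes(16)

def respond(challenge: bytes, secret: bytes = FAMILY_SECRET) -> str:
    """Prove knowledge of the secret without revealing it."""
    return hmac.new(secret, challenge, hashlib.sha256).hexdigest()

def verify(challenge: bytes, response: str, secret: bytes = FAMILY_SECRET) -> bool:
    """Constant-time comparison against the expected response."""
    return hmac.compare_digest(respond(challenge, secret), response)

challenge = make_challenge()
answer = respond(challenge)
assert verify(challenge, answer)            # the real family member passes
assert not verify(make_challenge(), answer) # a replayed answer fails
```

Of course, over a phone call the "challenge" is just an unexpected question and the "response" is the code word, but the replay problem is the same: a cloned voice repeating something it has heard before is exactly what the fresh nonce defends against.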
This was already possible with Chatterbox for a long while.
Yep, this has been the reality now for years. Scammers have already had access to it. I remember an article years ago about a grandma who wired her life savings to a scammer who claimed to have her granddaughter held hostage in a foreign country. Turns out they just cloned her voice from Facebook data and knew her schedule so timed it while she would be unreachable by phone.
or anyone who refuses to use hearing aids.
For now you could ask them to turn away from the camera while keeping their eyes open. If they are a Z-Image they will instantly snap their head to face you.
This scenario is oddly terrifying.
> as anything digital can be brushed off as “that’s not evidence, it could be AI generated”.
This won't change anything about Western-style courts, which have always required an unbroken chain of custody for evidence to be admissible.
Courts account for a vanishingly small proportion of most people's lives.
https://www.youtube.com/watch?v=diboERFAjkE pretty much this
That's a reupload of Cybergem's video. https://www.youtube.com/watch?v=-gGLvg0n-uY
Oh wow. Thank you for this. Amazing, terrifying, spot on, all of it.
I knew what it would be before I even opened it. The crazy thing is that video is like 3 years old.
I'd be a bit more worried once Z-Image Edit/Base is released. Flux.2 Klein is out and it's on par with Z-Image Turbo, and with some fine-tuning can just about hit Flux.2. Add Qwen Image Edit 2511 on top of that for additional refinement, and anything is possible. The folks at r/StableDiffusion are falling over the possible release of Z-Image-Omni-Base, a hold-me-over until the actual base model is out. I've heard it's equal to Flux.2. Crazy times.
> This is terrifying.
Far more terrifying is Big Tech having access to a closed version of the same models, in the hands of powerful people with a history of unethical behavior (i.e. Zuckerberg's "Dumb Fucks" comments). In fact it's a miracle and a bit ironic that the Chinese would be the ones to release a plethora of capable open source models, instead of the scraps like we've seen from Google, Meta, OpenAI, etc.
> Far more terrifying is Big Tech having access to a closed version of the same model
Agreed. The only thing worse than everyone having access to this tech is only governments, mega corps and highly-motivated bad actors having access. They've had it a while and there's no putting the genii back in the bottle. The best thing the rest of us can do is use it widely so everyone can adapt to this being the new normal.
I know genii is the plural of genie, but for a second I thought it was a typo of genai and I kind of like that better.
I do strongly agree. Though the societal impact is only mitigated by open models, not curtailed at all.
The really terrifying thing is the next logical step from the instinctual reaction. Eschew the miracle framing; eschew the cognitive bias of feeling warm and fuzzy toward the guy who gives it to you for free.
Socratic version: how can the Chinese companies afford to make them and give them out for free? Cui bono?
n.b. it's not because they're making money on the API; e.g. open OpenRouter and see how Moonshot or DeepSeek's first-party inference speed compares to literally any other provider. Note also that this disadvantage can't just be limited to LLMs, due to GPU export rules.
>Far more terrifying is Big Tech having access to a closed version of the same models, in the hands of powerful people with a history of unethical behavior (i.e. Zuckerberg's "Dumb Fucks" comments).
Lol what exactly do you think Zuck would do with your voice, drain your bank account??
Admittedly I haven't dug into it much, but I wonder if we might finally have a use case for NFTs and web3. We need some way to denote that items are person-generated, not AI. It would certainly be easier than trying to determine whether something is AI-generated.
That's the idea behind C2PA[1]: your camera and your tools put a signature on the media to prove its provenance. That doesn't make manipulation impossible (e.g. you could photograph an AI image off a screen), but it does give you a trail of where a photo came from, and thus an easier way to filter it or look up the original.
[1] https://c2pa.org/
How would NFTs/web3 help differentiate between something created by a human and something that a human created with AI and then tagged with their signature using those tools?
In a live conversation context you can mention the term NFTs/web3 and if the far end is human they'll wince a little.
> With this and z-image-turbo, we've crossed a chasm.
And most of all: they're both local models. The cat is out of the bag and it's never going back in. There's no censoring of this. No company that can pull the plug. Anyone with a semi-modern GPU can use these models.
We're going to be okay.
There are far more good and interesting use cases for this technology. Games will let users clone their voices and create virtual avatars and heroes. People will have access to creative tools that let them make movies and shows with their likeness. People that couldn't sing will make music.
Nothing was more scary than the invention of the nuclear weapon. And we're all still here.
Life will go on. And there will be incredible benefits that come out of this.
I'm not denigrating the tech. All I'm saying is that we've crossed into new territory and there will be consequences we don't understand. The same way that social media has been particularly detrimental to young people (especially women) in a way we were not ready for. This __smells__ like it could be worse, alongside (or regardless of) the benefits of both.
I simply think people don't really know that the new world requires a new set of rules of engagement for anything that exists behind a screen (for now).
We'll be okay eventually, when society adapts to this and becomes fully aware of the capabilities and the use cases for abuse. But, that may take some time. The parent is right to be concerned about the interim, at the very least.
That said, I am likewise looking forward to the cool things to come out of this.
> We're going to be okay.
> And there will be incredible benefits that come out of this.
Your username is echelon.
I just wanted to point that out.
Yeah. Not using voice, but...https://nymag.com/intelligencer/article/white-house-posts-fa...
> Nothing was more scary than the invention of the nuclear weapon. And we're all still here.
Except that building a nuclear weapon was not available to everyone, certainly not to dumb people whose brain have been feeded with social media content.
I usually don't correct typos and/or grammar, but you asked for it. Calling random people "dumb" while using an incorrect past tense is pretty funny. It is "fed", not "feeded"...
> People that couldn't sing will make music.
I was with you, until
But, yeah. Life will go on.
There are plenty of electronic artists who can't sing. Right now they have to hire someone else to do the singing for them, but I'd wager a lot of them would like to own their music end-to-end. I would.
I'm a filmmaker. I've done photons-on-glass production for fifteen years. Meisner trained, have performed every role from cast to crew. I'm elated that these tools are going to enable me to do more with a smaller budget. To have more autonomy and creative control.