I actually can’t wait for the future where I upgrade hardware in order to upgrade my ai as an alternative to an expensive subscription.
There are many problems I want to work on which require billions of tokens. These are completely inaccessible without corporate project sponsorship at the moment. An asic generation machine which can pump out a few 10s of thousands of tokens per second at opus4.6 quality is more than sufficient.
A company called Taalas is working on something like that. Not Opus4.6 quality, but I'm sure they're targeting larger models. Currently they're using a LLama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.
Right now - there's some heavily subsidized subscriptions that are more or less cheating. For instance, Github CoPilot at $39/month gives you claude opus 4.6. They're going to close that off, but right now it's like a freebie for those doing API agentic harnesses.
That said, if you are doing always on agents and you spend $3k-$4k on a GB10 or, $5+ k on Apple Silicon as your sunk cost, you will probably come out ahead.
I've got 5 agents running a purely experimental social experiment. AThey operate in an evennia mud (a familiar sounding city called "gothmud). I've built a channel, idle prompts, sleep schedule. I feed in real world news, weather. There's a character up in a clock tower that reads evennia's audit logs every 20 minutes to surveil the city, and a cast of people wandering around, investigating things, having coffee, repairing robots. This is all hitting qwen3.6-35-A3B on the Asus GB10, which cost me $3k.
Over the last 30 days, I've hit 394M input tokens, 1.6B output tokens. I would have spent between $1600 to $1700 if I was using openrouter. Not calculated - I also have comfyui running in the spare space, and the agents "take photos" of the rooms they're in, selfies, workshop photos, etc.
How much did I spend on electricity? I don't have a meter on my box. My total electric bill for the last 30 days was $220, so I know it's less than that. My rate to compare is 11.7/kwh, but it's closer to 15c/Kwh total. The Asus GX10 has a 240W power supply, and it's probably only pulling 180. I estimate $15-$20/month. But worst case red-lining. 240 Watts, 720 hours = 172KWH , and at $0.20, I come to $35
Here's the kicker thought - that github copilot subscription I mentioned? I have another agent running on that, reading all my other agent logs, managing my obsidian notes, doing research, sending briefings. And all by itself, it used almost the same amount of claude-opus tokens for that $39/month subscription. I was actually a bit shocked when I pulled a recent report and saw that. I'm working to migrate functionality away from copilot subscription to the local model. A lot of the initial setup might have needed it, but not the ongoing review style work it does.
For open models, usually not well. You get 5+ providers competing on cost, all with cheaper electricity and better hardware utilization than your local setup
The TL;DR though is that a 10-15b param model baked into an ASIC with the latest fab tech would take around 62W of power draw when active. At ~10k+ t/s though it likely would only be active for short bursts of time. It'd fit perfectly fine within the thermal envelope of a laptop.
The approach makes a lot of sense. Once you get to those speeds, latency of the network becomes one of the bigger bottlenecks, so local has a real advantage over a subscription.
"Design me a 3d printable rocket engine for a hobby rocket project. Verify it's design in a full simulation. Iterate until it works reliably in simulation based on a verified printable design on a consumer laser sintering device (or substitute contract manufacture for under 1000 dollars)."
This is a hobby version of a project, but you can imagine commercial versions of the same prompt for new databases, genomics studies, material analysis, operating systems etc.
Ok heres the thing you will nevwr be able to truly do this due to logic.
Logically five people pooling their resources beats one guy.
therefore datacenters will always win because they get higher time utilization.
so forget it.
I always wonder the same but i let logic tell me its a fantasy, on average you cant outspend a whole group of people making better use of the hardware.
you will get better hardware though, cutting edge will always be cloud
Laptops/desktops are cheaper per flop than any datacenter hardware by a good order of magnitude.
The problem is that expectations rise in datacenters, hardware/power/security/availability guarantees cost real money. Then the operator providing these guarantees expects some margin.
You can see this most clearly with "developer desktops", a gcp instance costs about 10x a hetzner instance which costs between 5 and 10x the same hardware sitting in the back of an office somewhere. While all of these premiums matter for 24/7 systems under active development, they don't really matter for ephemeral small scale workloads.
Twenty years ago, I don't think any of us were excited about a future internet where we couldn't trust whether what we were seeing or reading was genuine. I hope one day we'll be able to look back on this era as an aberration, like that scene in Mad Men where the Drapers fling their picnic rubbish onto the grass and drive away.
It seems to me the era of being able to trust pictures was an aberration. Before the camera images created in any form might depict something that really happened, an exaggeration, or a total fabrication. The camera represented a technological leap that made capturing reality significantly more easy than faking it, though faking it was never actually all that hard. Now technology has progressed again and we're back where we started. Any image might be real, edited, or totally fabricated, and we can no longer fool ourselves about "photographic evidence." Trust is and always will be about credibility of the claimant. Additional evidence is itself only as trustworthy as its providence. An attempt to destroy the ability to create images that resemble photographs is doomed to fail and wrongheaded to begin with. The only reason such an idea would occur to someone is they were born in an aberrant era where the culture had ingrained in them the semi-grounded belief that certain types of images are representative of reality. That wasn't the case historically and won't be again.
Twenty years ago my teachers were telling me not to use Wikipedia because you can't trust anything on the internet. You should never date someone you met through an app or website because they are 100% murderers. "The internet is for porn". Things have a way of improving over time, and people always overestimate societal risks of new tech in the beginning.
Young girls suicides, Brexit&Trump (post-truth politics, general democratic decline), demographic catastrophe, obesity crisis are often partially attributed to social media.
Not saying that tech is inherently evil, these could have been prevented, but to me it seems we have underestimated the social risks and failed to regulate accordingly.
You should reconsider the scope. Nobody is losing sleep over Airbnb pics.
We’re in an era now where every image and video (and for that matter audio) is potentially fake; where knowing what’s real and true is no longer possible.
> I think the inability to see the freedom AI gives people is one of the saddest things I've seen.
No one’s failing to see the good things, hypothetical or not. Most of us are aware just fine, we just don’t all agree that the negative trade-offs are worth it.
I enjoy the technology too, but the tradeoffs are pretty grim. It takes stepping outside of my bubble to see it in full force, but AI misinformation is already rampant.
I think it may turn out postive; That the less we are able to take images and video at face value the better.
Motivated actors have been able to doctor, fake, or spin media content since time immemorial. But peoples default mode was to trust what they saw. Now that fake imagery is ubiquitous, maybe we'll all get a bit more skeptical.
being skeptical and determining the truth takes a lot of work. I fear that we may just refuse to wade through all the lies and just accept a enforced willful ignorance.
You don't remember the discussion around Narrative Science (https://en.wikipedia.org/wiki/Narrative_Science) then. They were a university spin-out that could write plausible-sounding baseball news articles (and later finance) from the stats. Their software enabled local news websites to publish articles about every game, which was seen as a boon to sports fans and a key driver for web traffic. There was a lot of criticism about how it wasn't 'real' though.
For as long as we've had computers people have tried to make them sound human. It's not a new thing that people are concerned about knowing if they're talking to (or reading) a robot imitating a person.
> could write plausible-sounding baseball news articles (and later finance) from the stats
Back in the day, baseball commentators sometimes did this for live games they couldn't see based on very limited information they were being passed. One such commentator was .. Ronald Reagan.
Literally the first thing I wrote after OpenAI's chat completions API came out was a Python script that took in a JSON description of a football (soccer) game from an API and used gpt-3.5-turbo to generate an article about it.
I don't know what else you'd call the widespread and enthusiastic adoption of a technology that is designed to exploit people's trust in the veracity of images by mimicking reality as seamlessly as possible. I think it's both aberrant and abhorrent for the tech industry to be actively developing something that's permanently polluting our information environment.
Are there any examples where it has happened in tech? Maybe internet Pop-ups are closest, that are now automatically blocked. But seems unlikely that we would not use image generation. Just not trust any image by default.
There's always existed misinformation in text and in images. It's been possible to manipulate photographs for as long as photography has been around. It's becoming easier for sure, but it's not really a qualitative change. Trusting anything you saw on the internet twenty years ago would have been as ridiculous as it is today.
This is seriously underplaying it. It's become trivial to generate and inundate the internet with fake content (either for laughs, for internet points, or for more nefarious purposes). Manipulating photos required a lot of skill to make something plausible. We're reaching a point (if we're not there yet) where most content produced on the internet is fake.
I am pretty excited. The factuality of important events has been distorted for most of history. Moving to a low information trust society is something that I think will be positive.
I don’t see it leading anywhere but a flat earth. When no one can be trusted whoever can tell you want to hear will be who people listen to and snake oil salesmen will reign supreme. Even if he was CIA, Cronkite’s world was closer to the truth than Alex Jones’.
Low trust societies are poorer because everyone has to spend much more effort on verifying everything. People give up business opportunities because they can't trust their partners. It becomes more nepotistic because people trust family over strangers.
Low information trust societies get destroyed by pandemics of both physical viruses (due to anti-vax and medical distrust; we can see this happening again with Ebola) and destructive memetic lies (see 20th century fascism).
Ignore the naysayers, they are just jealous. You got it totally tight, not everyone get's it like we do. We are facing alot of backlash for our beliefs these days.
Listen, I'm hosting this Telegram channel for people like us, where we can exchange free information without media bias, share the real facts and plan coordinated activities against these poisoning mainstream scumbags.
I also have a 20% coupon code for Wamp® Wolf-Testosteron for you, Wamp® really helped me stay awake and alert in these dire times.
Got it to run on iPhone but was surprised to see they have some form of censorship and moderation on the input side on their client app. I thought a big part of local/offline AI was sovereignty, unfiltered, and censorship/bias resistance.
I saw '1-bit' and my mind first went to 1-bit dithered B&W image generation, not 1-bit model weights....
and so now I'm wondering how cool /fast / compressed a diffusion image generator could be if the images it was trained on / space it worked in was limited to 1 bit (Floyd-Steinberg / Atkinson / your favorite algo here) dithered images.
Training would surely be pretty quick and probably fit onto one modern GPU.
IME, the bottleneck when using diffusion models isn't storage space or memory, it's generation time. Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway. I also note that these models are marginally slower than the small FLUX.2 model they're based on.
Okay, maybe this allows running a local model on something that has a reasonably powerful GPU and limited memory, like an iPhone, but is that really a common requirement?
It's useful progress. Decent-fidelity local-scale inference means that you can create a product that generates throwaway images frequently without worrying about cost. Thus far every product I've seen that generates images is metered, which severely limits the value. I don't know if this is actually at the "decent fidelity" point yet.
We are in an era of extreme demand for GPU and limited supply. Every inference we push to the edge frees cloud resources for other tasks. Every efficiency gain increases what we can achieve with existing resources. If images can be rendered with half as much compute, we need half as many GPUs.
I think the value of it is currently more academic than useful in the real world. Everything at the frontier is still only marginally Good Enough (in image generation, most of it is shit even from the best models), so things far behind the frontier in terms of capability (as a tiny 1-bit model necessarily must be) are unusable.
But, getting remarkably higher density of capability per unit of compute is a big thing. It means the frontier can get better and cheaper to operate and less resource hungry, and it means what can be accomplished at the edge, on personal laptops or phones, becomes a broader spectrum of tasks.
And, for privacy, there are a lot of things that should run on-device and not everyone has big dedicated GPUs.
It solves part of the download issue if they actually delivers a 1-bit whole package (currently their download is around 3.5GiB, still not ideal since FLUX.2 [klein] 4B you can get a package including text encoder ~6 GiB).
For speed, no. Draw Things runs on iPhone just fine and generally faster than their implementation on the same model (FLUX.2 [klein] 4B).
Genuine question: doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?
Their 1-bit quantized Diffusion Transformer is just under 1 GB. You also need the text-encoder (4-bit quantized) and VAE (unquantized) for inference and their combined weight is ~3.42 GB.
Yeah, it's pretty incredible. And I guess that's mostly what's behind the question: whether this is more of an impressive research/technique demonstrator, or a real product advancement solving a need.
> doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?
I can make this into a 5-lines Python program. I’m not saying the images will match the description, but that isn’t part of your spec ;)
It’s like asking how did Memoji generation on iPhone solved a real problem?
It does not need to directly solve any particular problem to be overall good for consumers, by putting pressure to all those subscription based solutions… at least it’s private and does not require you to provide all your data…
> Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway.
Not the bottom end - most people are on laptops or mobile devices that are much lower GPU power than this.
Probably the bottom end an individual would want to consider using due to slow generation time.
Sure, you could theoretically take a model compressed in this manner and deploy it on an old netbook and run the calculations on the CPU, but each image would probably take an hour…
Yes, size and performance are not only problems for local LLMs, they are problems for frontier LLM companies like OpenAI and Anthropic. The latter still lose a ton of money on inference and advances in efficient, performant models helps their bottom line.
Yes its a huge deal because these are starting to get bound by memory bandwidth not compute. therefore one bit wirfhts stream way faster leading to substantially better results. At least thats what Id guess!
Not quite as I understand it. The ternary approach bonsai uses leverages a FP16 scaling factor that each value in the ternary maps to. You're still using 16 bit multiplication, it's just that the weights are far more compressed.
> To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.
This is wrong. But they worded it carefully to be not entirely wrong.
FLUX.2 [klein] 4B (the same parameter class, basically the same model) runs on iPhone through Draw Things app, with 8-bit or 6-bit quantization (hence not "directly", I guess, but that is the technicality that sounds fishy enough).
Couldn't try it because the demo app is iOS only and the web version just crashes my browser. The small model is impressive but if you front load a 1.8GB text encoder model, the savings aren't quite as useful.
I extracted the code from the web demo to add to make a web image generation node to my in browser ai workflow tool, and it’s pretty sweet. Waiting for xenova to add to transformersjs 4.3 and I’ll release as well. Couldn’t wait though to test.
can you describe your "in browser ai workflow tool"? I may or may not be working on something similar and am very interested in what others are building in the space.
I believe it's the way the HN algorithm works. In order to give new and obscure posts a shot, it will add them to peoples feeds in their front page and see how they measure. Otherwise new posts wouldn't get seen and the flywheel would never get started.
So everyone acts as a sort of beta tester for obscure posts.
On weekends, yes. During the week, that’s also true if they arrive within a short time frame, e.g., three minutes. Almost no one looks at “New”. That is the real issue.
The white paper says "mean-active memory pressure down to 1.95 GB for 1-bit Bonsai Image 4B and 2.38 GB for
Ternary Bonsai Image 4B". Storage is on the linked page, and is about half that.
That is very low, looks like it should run in base MacMini M4 with 16GB RAM. I understand it is not released yet? What sort of harness is necessary for this type of model? (I have only used coding agents through GH Copilot in VS Code, the JetBrains AI tool and Pi, this last one was sort of a pain to setup…)
Stuff like this is great - more promises of things that can run on phones please!
Sadly right now the expensive developer subscription means the few folks willing to hold a forever subscription make something that barely works then move on… or make something with so many ads it is an app. For example Google’s “Model Garden” app has no ads but still has major UX issues and isn’t suitable for daily use, even though the models are amazing.
Raising awareness of how capable today’s phone hardware is will make normal people demand to run what they choose on their phones. It’d be a much stronger way back to general purpose computing than via all legislation that has been tried so far..
I run a moderately popular image comparison benchmark site called GenAI Image Showdown [1]. You can click “View All Models” and filter the list down to just locally runnable options (Flux, Qwen, Hunyuan, etc.).
Except the two (GPT-Image-2 and Nano Banana Pro), anything displayed here can run on the 16 GiB MacBook (including the FLUX.2 [dev]): https://tests.drawthings.ai/generate
Just a side note, that this website is classified by Apple as an Adult website. I have Limit Adult Websites set in Content & Privacy Restrictions switched on.
Led me to wonder what happens if a domain gets a new owner, and they want to petition Apple to remove the block.
what trade off would one need to clear to justify the hardware and the work to get this running locally as part of a broader system? It’s a lot of work setting up and maintaining a production harness/system on a local device. I don’t personally repeatedly generate images at a scale where using a lab’s app somehow burns all my tokens. I like the ideas of local ai but I don’t see widespread adoption of it happening in commercial or customer situations anytime soon no matter how little/good enough they get. Even Uber- token burn whiplash but I doubt their answer will be “run some of it local”. IT nightmare, I’d imagine.
This is why I don't think the big AI companies and nvidia will dominate the market. AIs will just run locally, on whatever hardware you have. Perhaps that's why they worked on this yet-to-be-defined partnership with ARM.
Can't speak for browser demos, but I just got the ternary model working on my M5 generating images. The 1 bit didn't work, as it has a known bug with XCode 24.5 and I wasn't in the mood for installing 24.4 alongside.
The online demos require WebGPU so Firefox on mobilr and privacy enhanced browsers will break. WebGPU support on Linux and other open source systems is also trash, you can force it to work in Chrome but it won't be happy.
I actually can’t wait for the future where I upgrade hardware in order to upgrade my ai as an alternative to an expensive subscription.
There are many problems I want to work on which require billions of tokens. These are completely inaccessible without corporate project sponsorship at the moment. An asic generation machine which can pump out a few 10s of thousands of tokens per second at opus4.6 quality is more than sufficient.
A company called Taalas is working on something like that. Not Opus4.6 quality, but I'm sure they're targeting larger models. Currently they're using a LLama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.
I'm rooting for them HARD but they've been quiet since their last (and only) blog. X and LinkedIn are empty too. I really hope it wasn't a pipe dream.
It starts to be interesting when latency is better than average website.
2 replies →
That's cool, I just tested it out and it is fast but unfortunately its accuracy is not great.
1 reply →
Round Robin the free tier APIs, should be effectively free. Just say say “sike” if discussing sensitive issues so the LLM never flags you.
I'm curious how hardware and power cost would stack up to subscription cost
Right now - there's some heavily subsidized subscriptions that are more or less cheating. For instance, Github CoPilot at $39/month gives you claude opus 4.6. They're going to close that off, but right now it's like a freebie for those doing API agentic harnesses.
That said, if you are doing always on agents and you spend $3k-$4k on a GB10 or, $5+ k on Apple Silicon as your sunk cost, you will probably come out ahead.
I've got 5 agents running a purely experimental social experiment. AThey operate in an evennia mud (a familiar sounding city called "gothmud). I've built a channel, idle prompts, sleep schedule. I feed in real world news, weather. There's a character up in a clock tower that reads evennia's audit logs every 20 minutes to surveil the city, and a cast of people wandering around, investigating things, having coffee, repairing robots. This is all hitting qwen3.6-35-A3B on the Asus GB10, which cost me $3k.
Over the last 30 days, I've hit 394M input tokens, 1.6B output tokens. I would have spent between $1600 to $1700 if I was using openrouter. Not calculated - I also have comfyui running in the spare space, and the agents "take photos" of the rooms they're in, selfies, workshop photos, etc.
How much did I spend on electricity? I don't have a meter on my box. My total electric bill for the last 30 days was $220, so I know it's less than that. My rate to compare is 11.7/kwh, but it's closer to 15c/Kwh total. The Asus GX10 has a 240W power supply, and it's probably only pulling 180. I estimate $15-$20/month. But worst case red-lining. 240 Watts, 720 hours = 172KWH , and at $0.20, I come to $35
Here's the kicker thought - that github copilot subscription I mentioned? I have another agent running on that, reading all my other agent logs, managing my obsidian notes, doing research, sending briefings. And all by itself, it used almost the same amount of claude-opus tokens for that $39/month subscription. I was actually a bit shocked when I pulled a recent report and saw that. I'm working to migrate functionality away from copilot subscription to the local model. A lot of the initial setup might have needed it, but not the ongoing review style work it does.
6 replies →
For open models, usually not well. You get 5+ providers competing on cost, all with cheaper electricity and better hardware utilization than your local setup
I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912
The TL;DR though is that a 10-15b param model baked into an ASIC with the latest fab tech would take around 62W of power draw when active. At ~10k+ t/s though it likely would only be active for short bursts of time. It'd fit perfectly fine within the thermal envelope of a laptop.
The approach makes a lot of sense. Once you get to those speeds, latency of the network becomes one of the bigger bottlenecks, so local has a real advantage over a subscription.
4 replies →
Can you give an example of such a problem?
"Design me a 3d printable rocket engine for a hobby rocket project. Verify it's design in a full simulation. Iterate until it works reliably in simulation based on a verified printable design on a consumer laser sintering device (or substitute contract manufacture for under 1000 dollars)."
This is a hobby version of a project, but you can imagine commercial versions of the same prompt for new databases, genomics studies, material analysis, operating systems etc.
8 replies →
Decompiling a binary and recreating the source, doing a full line-by-line security audit, always-on agents monitoring state minute-by-minute, etc.
I would very easily find ways to hit that level of token usage if it was cheaper/faster.
Not OP but if I had a couple RTX 6000 I'd throw them at decompiling bloodborne to play on PC without emulation.
Ok heres the thing you will nevwr be able to truly do this due to logic.
Logically five people pooling their resources beats one guy.
therefore datacenters will always win because they get higher time utilization.
so forget it.
I always wonder the same but i let logic tell me its a fantasy, on average you cant outspend a whole group of people making better use of the hardware.
you will get better hardware though, cutting edge will always be cloud
Laptops/desktops are cheaper per flop than any datacenter hardware by a good order of magnitude.
The problem is that expectations rise in datacenters, hardware/power/security/availability guarantees cost real money. Then the operator providing these guarantees expects some margin.
You can see this most clearly with "developer desktops", a gcp instance costs about 10x a hetzner instance which costs between 5 and 10x the same hardware sitting in the back of an office somewhere. While all of these premiums matter for 24/7 systems under active development, they don't really matter for ephemeral small scale workloads.
2 replies →
Just like cloud is "cheaper" than colo/metal, right?
> cutting edge will always be cloud
Don't think anyone was refuting that?
And of course when you pool resources you have access to more resources.
1 reply →
> so forget it.
Which explains why you're using a dumb terminal to access compute services?
1 reply →
Where I think you're wrong is that everything in technology has been cyclical, it's just a matter of time.
Twenty years ago, I don't think any of us were excited about a future internet where we couldn't trust whether what we were seeing or reading was genuine. I hope one day we'll be able to look back on this era as an aberration, like that scene in Mad Men where the Drapers fling their picnic rubbish onto the grass and drive away.
It seems to me the era of being able to trust pictures was an aberration. Before the camera images created in any form might depict something that really happened, an exaggeration, or a total fabrication. The camera represented a technological leap that made capturing reality significantly more easy than faking it, though faking it was never actually all that hard. Now technology has progressed again and we're back where we started. Any image might be real, edited, or totally fabricated, and we can no longer fool ourselves about "photographic evidence." Trust is and always will be about credibility of the claimant. Additional evidence is itself only as trustworthy as its providence. An attempt to destroy the ability to create images that resemble photographs is doomed to fail and wrongheaded to begin with. The only reason such an idea would occur to someone is they were born in an aberrant era where the culture had ingrained in them the semi-grounded belief that certain types of images are representative of reality. That wasn't the case historically and won't be again.
Twenty years ago my teachers were telling me not to use Wikipedia because you can't trust anything on the internet. You should never date someone you met through an app or website because they are 100% murderers. "The internet is for porn". Things have a way of improving over time, and people always overestimate societal risks of new tech in the beginning.
Young girls suicides, Brexit&Trump (post-truth politics, general democratic decline), demographic catastrophe, obesity crisis are often partially attributed to social media.
Not saying that tech is inherently evil, these could have been prevented, but to me it seems we have underestimated the social risks and failed to regulate accordingly.
8 replies →
I LOVE being able to create images of things I could previously only imagine.
I think the inability to see the freedom AI gives people is one of the saddest things I've seen.
I remember when the internet was young people would complain about how it was becoming read-only.
Now we have a tool to let people express themselves and people complain the fact there are fake pics on AirBNB means the collapse of society. Please!
>I LOVE being able to create images of things I could previously only imagine. Did you only just get access to a pencil?
1 reply →
You should reconsider the scope. Nobody is losing sleep over Airbnb pics.
We’re in an era now where every image and video (and for that matter audio) is potentially fake; where knowing what’s real and true is no longer possible.
3 replies →
> I think the inability to see the freedom AI gives people is one of the saddest things I've seen.
No one’s failing to see the good things, hypothetical or not. Most of us are aware just fine, we just don’t all agree that the negative trade-offs are worth it.
I enjoy the technology too, but the tradeoffs are pretty grim. It takes stepping outside of my bubble to see it in full force, but AI misinformation is already rampant.
I think it may turn out postive; That the less we are able to take images and video at face value the better.
Motivated actors have been able to doctor, fake, or spin media content since time immemorial. But peoples default mode was to trust what they saw. Now that fake imagery is ubiquitous, maybe we'll all get a bit more skeptical.
The death of consensus reality is also the death of democratic politics. Too many people regard that as a positive.
3 replies →
being skeptical and determining the truth takes a lot of work. I fear that we may just refuse to wade through all the lies and just accept a enforced willful ignorance.
1 reply →
You don't remember the discussion around Narrative Science (https://en.wikipedia.org/wiki/Narrative_Science) then. They were a university spin-out that could write plausible-sounding baseball news articles (and later finance) from the stats. Their software enabled local news websites to publish articles about every game, which was seen as a boon to sports fans and a key driver for web traffic. There was a lot of criticism about how it wasn't 'real' though.
Slate published this about it in 2012: https://slate.com/technology/2012/03/narrative-science-robot...
For as long as we've had computers people have tried to make them sound human. It's not a new thing that people are concerned about knowing if they're talking to (or reading) a robot imitating a person.
> could write plausible-sounding baseball news articles (and later finance) from the stats
Back in the day, baseball commentators sometimes did this for live games they couldn't see based on very limited information they were being passed. One such commentator was .. Ronald Reagan.
Literally the first thing I wrote after OpenAI's chat completions API came out was a Python script that took in a JSON description of a football (soccer) game from an API and used gpt-3.5-turbo to generate an article about it.
I was surprised how well it worked, even then.
I didn't say there wasn't bot-generated content in the past. I said we weren't excited about a future where it was de rigeur.
>we couldn't trust whether what we were seeing or reading was genuine
You could never trust ANYTHING you read or see on the internet, this isn't new. There are thousands of old hoaxes that many people still believe.
Aberration? That seems like an extreme overreaction.
I don't know what else you'd call the widespread and enthusiastic adoption of a technology that is designed to exploit people's trust in the veracity of images by mimicking reality as seamlessly as possible. I think it's both aberrant and abhorrent for the tech industry to be actively developing something that's permanently polluting our information environment.
Here's a local story published after I made my comment, about tour operators using AI images to misrepresent destinations in the area: https://www.abc.net.au/news/2026-06-01/ai-videos-spark-conce...
Increasing the availability of fake image generators directly enables more harms like these.
1 reply →
The picnic scene: https://www.youtube.com/watch?v=FDIvzDGBLWU
Are there any examples where it has happened in tech? Maybe internet Pop-ups are closest, that are now automatically blocked. But seems unlikely that we would not use image generation. Just not trust any image by default.
There's always existed misinformation in text and in images. It's been possible to manipulate photographs for as long as photography has been around. It's becoming easier for sure, but it's not really a qualitative change. Trusting anything you saw on the internet twenty years ago would have been as ridiculous as it is today.
> It's becoming easier for sure
This is seriously underplaying it. It's become trivial to generate and inundate the internet with fake content (either for laughs, for internet points, or for more nefarious purposes). Manipulating photos required a lot of skill to make something plausible. We're reaching a point (if we're not there yet) where most content produced on the internet is fake.
I am pretty excited. The factuality of important events has been distorted for most of history. Moving to a low information trust society is something that I think will be positive.
I don’t see it leading anywhere but a flat earth. When no one can be trusted whoever can tell you want to hear will be who people listen to and snake oil salesmen will reign supreme. Even if he was CIA, Cronkite’s world was closer to the truth than Alex Jones’.
1 reply →
Curious about this take, how do you mean?
I understand the point of distorted facts, but what I’m not sure how things are improved by basically having no trust in any facts?
5 replies →
Low trust societies are poorer because everyone has to spend much more effort on verifying everything. People give up business opportunities because they can't trust their partners. It becomes more nepotistic because people trust family over strangers.
Low information trust societies get destroyed by pandemics of both physical viruses (due to anti-vax and medical distrust; we can see this happening again with Ebola) and destructive memetic lies (see 20th century fascism).
Ignore the naysayers, they are just jealous. You got it totally tight, not everyone get's it like we do. We are facing alot of backlash for our beliefs these days.
Listen, I'm hosting this Telegram channel for people like us, where we can exchange free information without media bias, share the real facts and plan coordinated activities against these poisoning mainstream scumbags.
I also have a 20% coupon code for Wamp® Wolf-Testosteron for you, Wamp® really helped me stay awake and alert in these dire times.
It's true: Wamp, it really whips the Llama's ass!
What could make it stop?
well we could [HN terms of service violation]
1 reply →
Got it to run on iPhone but was surprised to see they have some form of censorship and moderation on the input side on their client app. I thought a big part of local/offline AI was sovereignty, unfiltered, and censorship/bias resistance.
I saw '1-bit' and my mind first went to 1-bit dithered B&W image generation, not 1-bit model weights....
and so now I'm wondering how cool /fast / compressed a diffusion image generator could be if the images it was trained on / space it worked in was limited to 1 bit (Floyd-Steinberg / Atkinson / your favorite algo here) dithered images.
Training would surely be pretty quick and probably fit onto one modern GPU.
I think you'd still be better off training in greyscale and dithering after the fact.
This was exactly where my mind went as well and I think there would be some really cool ideas to explore here
Genuine question: is this solving a real problem?
IME, the bottleneck when using diffusion models isn't storage space or memory, it's generation time. Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway. I also note that these models are marginally slower than the small FLUX.2 model they're based on.
Okay, maybe this allows running a local model on something that has a reasonably powerful GPU and limited memory, like an iPhone, but is that really a common requirement?
It's useful progress. Decent-fidelity local-scale inference means that you can create a product that generates throwaway images frequently without worrying about cost. Thus far every product I've seen that generates images is metered, which severely limits the value. I don't know if this is actually at the "decent fidelity" point yet.
We are in an era of extreme demand for GPU and limited supply. Every inference we push to the edge frees cloud resources for other tasks. Every efficiency gain increases what we can achieve with existing resources. If images can be rendered with half as much compute, we need half as many GPUs.
… or generate twice as many images. Maybe not quite, but if we’ve seen anything with AI so far is that it fits Parkinson’s law pretty well.
I think the value of it is currently more academic than useful in the real world. Everything at the frontier is still only marginally Good Enough (in image generation, most of it is shit even from the best models), so things far behind the frontier in terms of capability (as a tiny 1-bit model necessarily must be) are unusable.
But, getting remarkably higher density of capability per unit of compute is a big thing. It means the frontier can get better and cheaper to operate and less resource hungry, and it means what can be accomplished at the edge, on personal laptops or phones, becomes a broader spectrum of tasks.
And, for privacy, there are a lot of things that should run on-device and not everyone has big dedicated GPUs.
It solves part of the download issue if they actually delivers a 1-bit whole package (currently their download is around 3.5GiB, still not ideal since FLUX.2 [klein] 4B you can get a package including text encoder ~6 GiB).
For speed, no. Draw Things runs on iPhone just fine and generally faster than their implementation on the same model (FLUX.2 [klein] 4B).
Genuine question: doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?
Where are you getting the 1 Gigabyte number from?
Their 1-bit quantized Diffusion Transformer is just under 1 GB. You also need the text-encoder (4-bit quantized) and VAE (unquantized) for inference and their combined weight is ~3.42 GB.
TBF, even at that size it's no less mind blowing.
2 replies →
Yeah, it's pretty incredible. And I guess that's mostly what's behind the question: whether this is more of an impressive research/technique demonstrator, or a real product advancement solving a need.
[dead]
> doesn't it blow your mind that there exists a 1 Gigabyte file/program that can generate any image you can think of just from a rough description of it?
I can make this into a 5-lines Python program. I’m not saying the images will match the description, but that isn’t part of your spec ;)
It’s like asking how did Memoji generation on iPhone solved a real problem?
It does not need to directly solve any particular problem to be overall good for consumers, by putting pressure to all those subscription based solutions… at least it’s private and does not require you to provide all your data…
> Lots of models will run on 8-12 GB 1080-generation GPUs onwards, or on Macs with similar memory, which are probably the bottom end from a GPU power perspective anyway.
Not the bottom end - most people are on laptops or mobile devices that are much lower GPU power than this.
Probably the bottom end an individual would want to consider using due to slow generation time.
Sure, you could theoretically take a model compressed in this manner and deploy it on an old netbook and run the calculations on the CPU, but each image would probably take an hour…
2 replies →
Yes, size and performance are not only problems for local LLMs, they are problems for frontier LLM companies like OpenAI and Anthropic. The latter still lose a ton of money on inference and advances in efficient, performant models helps their bottom line.
For free users, I guess local generation is going to be faster than waiting in a queue.
Yes its a huge deal because these are starting to get bound by memory bandwidth not compute. therefore one bit wirfhts stream way faster leading to substantially better results. At least thats what Id guess!
ideally if ternary models work, the math is extremely easy for computers (addition/subtraction vs 16 bit multiplication)
Not quite as I understand it. The ternary approach bonsai uses leverages a FP16 scaling factor that each value in the ternary maps to. You're still using 16 bit multiplication, it's just that the weights are far more compressed.
1 reply →
they have a webGPU demo [0] at 4 steps it takes 7 seconds to generate an image on my M4
https://huggingface.co/spaces/webml-community/bonsai-image-w...
> To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.
This is wrong. But they worded it carefully to be not entirely wrong.
FLUX.2 [klein] 4B (the same parameter class, basically the same model) runs on iPhone through Draw Things app, with 8-bit or 6-bit quantization (hence not "directly", I guess, but that is the technicality that sounds fishy enough).
They call it a diffusion model, but it's based on Flux.2 which is a rectified flow model.
Personally I think it's fine to use "diffusion" to refer to the whole family of models
https://github.com/kordless/bonsai-docker if you want to run without fiddling with the local filesystem.
Within a day, someone will have trained a LoRA for this 1-bit model that enables hentai content generation on your Apple Watch.
Great.
To our knowledge, Bonsai Image 4B is the first image model in its parameter class to run directly on an iPhone.
Isn't SD XL 3.5B? And the refiner model is even larger. Those can run on an iPhone 13 Pro.
Couldn't try it because the demo app is iOS only and the web version just crashes my browser. The small model is impressive but if you front load a 1.8GB text encoder model, the savings aren't quite as useful.
I do wonder how these compare to existing image generation models. I've tried https://github.com/alichherawalla/off-grid-mobile-ai for a while but I find the image generation models rather lacking.
I extracted the code from the web demo to add to make a web image generation node to my in browser ai workflow tool, and it’s pretty sweet. Waiting for xenova to add to transformersjs 4.3 and I’ll release as well. Couldn’t wait though to test.
can you describe your "in browser ai workflow tool"? I may or may not be working on something similar and am very interested in what others are building in the space.
Lately I've noticed posts with barely 10 points getting to HN frontpage. Was it always like this?
I believe it's the way the HN algorithm works. In order to give new and obscure posts a shot, it will add them to peoples feeds in their front page and see how they measure. Otherwise new posts wouldn't get seen and the flywheel would never get started.
So everyone acts as a sort of beta tester for obscure posts.
On weekends, yes. During the week, that’s also true if they arrive within a short time frame, e.g., three minutes. Almost no one looks at “New”. That is the real issue.
Maybe the algorithm has some kind of "momentum" to it, taking into consideration the velocity of upvotes.
Not as much competition on the weekend?
If you are looking to see the "true" HN frontpage (i.e. most upvoted posts), I'd recommend using https://hckrnews.com
If you want a list of posts simply ordered by upvotes: https://news.ycombinator.com/best
I just assume bots
Bots doing what? How would the poster being a bot influence why the post itself makes it to the front page with just 10 points?
1 reply →
Anyone could pickup the minimal hardware requirements for this? Like both RAM and Storage?
The white paper says "mean-active memory pressure down to 1.95 GB for 1-bit Bonsai Image 4B and 2.38 GB for Ternary Bonsai Image 4B". Storage is on the linked page, and is about half that.
That is very low, looks like it should run in base MacMini M4 with 16GB RAM. I understand it is not released yet? What sort of harness is necessary for this type of model? (I have only used coding agents through GH Copilot in VS Code, the JetBrains AI tool and Pi, this last one was sort of a pain to setup…)
1 reply →
For ternary mlx, size on disk is 3.8GB. 512x512 peak memory use is ~3.7
Stuff like this is great - more promises of things that can run on phones please!
Sadly right now the expensive developer subscription means the few folks willing to hold a forever subscription make something that barely works then move on… or make something with so many ads it is an app. For example Google’s “Model Garden” app has no ads but still has major UX issues and isn’t suitable for daily use, even though the models are amazing.
Raising awareness of how capable today’s phone hardware is will make normal people demand to run what they choose on their phones. It’d be a much stronger way back to general purpose computing than via all legislation that has been tried so far..
Is there a benchmark of local image generation models? Local = can run on a 16 GB MacBook or 8 GB+ NVIDIA card.
I run a moderately popular image comparison benchmark site called GenAI Image Showdown [1]. You can click “View All Models” and filter the list down to just locally runnable options (Flux, Qwen, Hunyuan, etc.).
https://genai-showdown.specr.net
Except the two (GPT-Image-2 and Nano Banana Pro), anything displayed here can run on the 16 GiB MacBook (including the FLUX.2 [dev]): https://tests.drawthings.ai/generate
I wonder why they didn't use a Bonsai model as the text encoder
I've tested this and it's not as good as Flux in my opinion.
Can anyone think of any negative externalities of making generative photorealistic images illegal?
I can think of a lot of positives. The negatives amount to a convoluted argument about the limits of free speech.
Prisoner 1: so, what are you in for?
Prisoner 2: I made a picture of a nice sunset over the ocean
If it were illegal it wouldn’t be readily available. You’d have to seek it out. People seeking it out wouldn’t be using it to generate a sunset.
1 reply →
Odd… UK visitor and I get:
Website Not Allowed “prismml.com” is a restricted website.
Just a side note, that this website is classified by Apple as an Adult website. I have Limit Adult Websites set in Content & Privacy Restrictions switched on.
Led me to wonder what happens if a domain gets a new owner, and they want to petition Apple to remove the block.
what trade off would one need to clear to justify the hardware and the work to get this running locally as part of a broader system? It’s a lot of work setting up and maintaining a production harness/system on a local device. I don’t personally repeatedly generate images at a scale where using a lab’s app somehow burns all my tokens. I like the ideas of local ai but I don’t see widespread adoption of it happening in commercial or customer situations anytime soon no matter how little/good enough they get. Even Uber- token burn whiplash but I doubt their answer will be “run some of it local”. IT nightmare, I’d imagine.
This is cool and all but is there a real use case for these? One that actually creates value?
A few implementations listed on LM Studio. Any recommendations for which one to use?
Very interested to see where this kind of work goes for on-device video generation!
I was expecting to see images of Bonsai trees when I clicked this
I expected a small tree in black and white pixel art.
Is there a way to run it on Vulkan?
No. Sadly, NVIDIA killed any kind of compute via Vulkan.
I took few minutes to try to make it work on ROCm (AMD's alternative to CUDA), landed in python dependency hell.
> NVIDIA killed any kind of compute via Vulkan
What do you mean? They are the ones introducing the matmul extensions to Vulkan, which makes compute like this possible
The text encoder is still 4-bit quantized.
This is why I don't think the big AI companies and nvidia will dominate the market. AIs will just run locally, on whatever hardware you have. Perhaps that's why they worked on this yet-to-be-defined partnership with ARM.
Using the demo and typing in "A sign that says xxxx" where xxxx is any text, it gets it wrong almost 100% of the time.
Does anyone ever get their stuff to actually work. Like actually load?
Can't speak for browser demos, but I just got the ternary model working on my M5 generating images. The 1 bit didn't work, as it has a known bug with XCode 24.5 and I wasn't in the mood for installing 24.4 alongside.
Here's a generation in your honor: https://peterc.org/img/johndoe.png
The online demos require WebGPU so Firefox on mobilr and privacy enhanced browsers will break. WebGPU support on Linux and other open source systems is also trash, you can force it to work in Chrome but it won't be happy.
Yeah worked fine in browser.
NVIDIA Card Firefox wayland
Question,
Is it compatible with Ollama, ComfyUI or are those providers unneeded, compatible with low-end hardware?
Also, where does "./setup.sh/ drop the components in Linux?
Thank you, Sol
impressive, combines a couple techniques that I always wanted the frontier models to have
having trouble loading the webgl browser demo on my phone but no biggy
[flagged]
[dead]
[dead]