I find this very curious. I don’t think agents care about UI; humans do. So in the end the UI is not required. As soon as AI can get into the physical world, the whole IT world is done for. All of this will be automated away. IT and CS only ever existed to make us more productive and more connected, and to improve our physical well-being. When we don’t need to touch computers anymore, there is no need for …
Vision language models have been trained on how to operate human UIs though, so at least for a while, computer use will be an interesting area to explore. I think debugging web apps and building UIs is a particularly fruitful area for this
Trying to do something similar but using kasm[0] as the backend.
[0] https://kasm.com
Fascinating, wanna compare notes on a call some time?
I’m curious how far we are from giving coding agents access to these desktop agents, so that when we’re using, say, Claude Code to build a native desktop app, the coding agent can actually see and act on the desktop UI it is building
This is a great point. Not that far. We also snapshot the desktop for "slow" non-streaming updates to the UI. We could push these into Claude itself to act on or describe or whatever.
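A sketch of what that slow-snapshot path might look like. All names here are hypothetical, not Helix's actual API: the idea is just to hash each captured frame and only forward it to the model when something changed, so identical frames don't waste tokens.

```python
import hashlib
import time


def capture_desktop() -> bytes:
    """Hypothetical stand-in for a real screen grab (e.g. via the compositor)."""
    raise NotImplementedError


def push_to_model(frame: bytes) -> None:
    """Hypothetical stand-in for sending the frame to Claude to describe/act on."""
    raise NotImplementedError


def snapshot_loop(grab=capture_desktop, push=push_to_model,
                  interval=5.0, max_frames=None):
    """Poll the desktop at a slow interval, pushing only changed frames.

    Returns the number of frames actually pushed."""
    last_digest = None
    pushed = 0
    seen = 0
    while max_frames is None or seen < max_frames:
        frame = grab()
        seen += 1
        digest = hashlib.sha256(frame).hexdigest()
        if digest != last_digest:  # unchanged frame: skip it
            push(frame)
            pushed += 1
            last_digest = digest
        time.sleep(interval)
    return pushed
```

Content-hashing is a cheap way to get "only on change" semantics without any diffing of pixels; a real implementation might also throttle on token budget.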
For web apps, I'd guess that many of us already do that via Playwright or other MCPs. I'd bet there are people doing something similar with desktop apps too.
That's the next move :-D
Author of Helix Code here. Here's a demo of the full system working: https://youtu.be/vVmnpcnLDGM?si=b6LxW6lmM7843LY0
We're opening the private beta: we provide a hosted environment for testing, or you can install the latest Helix release and run the installer with --code to try it on your own GPUs.
https://github.com/helixml/helix/releases/tag/2.5.3
Join our discord for the beta https://discord.com/invite/VJftd844GE
Moonlight-Web? I guess it's https://github.com/MrCreativ3001/moonlight-web-stream but there's no information in the article
> Moonlight expects: Each client connects to start their own private game session
Nope, it's a Wolf design choice; e.g. Sunshine allows users to concurrently connect to the same instance/game
Wolf now supports multiple clients connecting to the same session via the wolf-ui branch that landed recently. After lots of stability work we are now running that mode in production (and in the latest release) https://github.com/helixml/helix/releases/tag/2.5.3
> The Wolf maintainer has done heroic work ...
I commend the fact they acknowledge the maintainer's work, but seeing the singular 'maintainer', I can't help but notice the weight on that one person's shoulders.
I should have said creator. He seems to have a healthy community backing him, but we should ask him!
IMHO, the goal is not to have to watch what agents do, but to let them do the work.
I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Yeah that's what we did :) https://youtu.be/vVmnpcnLDGM?si=b6LxW6lmM7843LY0
>I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Seems difficult to research better autonomy without extensive monitoring. You need specific data on before/after effects of changes, for instance.
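Concretely, even a minimal harness along these lines (all names hypothetical) produces the before/after data: log every agent action with its outcome and duration, then compare summary statistics across two runs of the same task.

```python
import statistics
import time


class RunLog:
    """Append-only log of agent actions; enough to compare two runs."""

    def __init__(self):
        self.events = []

    def record(self, action: str, ok: bool, duration_s: float) -> None:
        self.events.append({"t": time.time(), "action": action,
                            "ok": ok, "duration_s": duration_s})

    def summary(self) -> dict:
        oks = [e["ok"] for e in self.events]
        durs = [e["duration_s"] for e in self.events]
        return {
            "actions": len(self.events),
            "success_rate": sum(oks) / len(oks) if oks else 0.0,
            "median_duration_s": statistics.median(durs) if durs else 0.0,
        }


def compare(before: RunLog, after: RunLog) -> dict:
    """Delta in success rate between a 'before' and 'after' run."""
    b, a = before.summary(), after.summary()
    return {"success_rate_delta": a["success_rate"] - b["success_rate"]}
```

The point isn't the specific metrics; it's that autonomy changes are only measurable if every run is instrumented this way, which is exactly the monitoring the parent comment argues against building.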
xpra has video streaming and allows for sharing
Interesting, thanks!
> because we’re all going to become managers of coding agents whether we like it or not
I will join the woodworking people before that happens, thanks.
A career change left me as a recent graduate in a decimated marketplace, missing the bottom ten rungs on the ladder and with no interest in getting back into the software world, and it led me to advanced manufacturing as a metal worker. I code a little; move heavy steel pieces periodically, which is a nice way to break up the standing/sitting, though not nearly as much as a general laborer; solve lots of problems; keep my trigonometry muscles toned; am forced to take breaks; and get paid for my overtime. There’s a union that the company ownership is totally willing to work with, and when I’m not at work, work isn’t with me. There’s something very satisfying about leaving work with exercised muscles, smelling slightly of cutting oil. The money sucks comparatively so early in my career, but the rate increases more for performance than seniority, so it’s rising quickly; the benefits are good; the career trajectory is pointing upwards; and longevity-wise, it’s certainly a whole lot better than gig work.
There’s a huge crisis in US manufacturing: we’re bleeding craft knowledge because off-shoring let companies hire existing experienced workers for decades, so they never had to train a new generation of tradespeople. Now all those folks are dying and retiring and they need people to pick up that deep knowledge quickly. Codifying and automating is going to kill jobs either way, but one factory employing a few people making things for other factories with local materials is better than everything perpetually shifting to the cheap labor market du jour. I’m feeling much more optimistic about the future of this than the future of tech careers.
I think over the next few years, a very large percentage of folks in tech will find themselves on the other side of the fence, quickly realize that their existing expertise doesn’t qualify them for any other white collar jobs where vibe coding experience is a bullet point in the pluses section, that tech consulting is declining even faster than salaried jobs, and that they’re vastly less qualified than the competition for blue collar jobs. Gonna be a rough road for a lot of folks. I wouldn’t invest in SF real estate any time soon.
There are 1,000 established industries that don’t offer the rapid growth and payouts of the modern tech ecosystem. I’m excited to see some of the current industrial backwaters soak up technical talent freed up by the SV AI brain drain.
To think we’ve handsomely paid our best and brightest for the last few decades in pursuit of… advertising?
You will always be able to produce artisanal hand-set code, same as how artisanal woodworking exists alongside industrial manufacturing. There will be a lot less demand for it, and compensation will align accordingly, but it won't go away.
Either the crap truly works and nobody is needed or it does not work. Where is this half-arsed human-agent hybrid vision coming from? The land of plateaued LLM gains?
I’ve also independently concluded Moonlight was the best way to go, after trying my hand at a very similar task. I didn’t want to dig through Moonlight’s source, but I’m sure if you’re dedicated enough it would pay dividends later on; it basically does everything you’d need for realtime control when simulating human input.
Another question regarding Helix: it's being built as a platform for private, air-gap-ready AI agents that can work against private LLM models.
Are there appliances or easy-to-deploy hardware that allow one to run these private models on-premises rather than in the cloud?
Hey! Yeah, we are working with partners on a fully integrated hardware+software stack for this. We particularly like the RTX 6000 Pro Blackwell GPUs.
What’s the most intricate system that’s been written with this?
Itself, more recently
Whilst it’s impressive to “bend a protocol to your will”, why not just take Moonlight and build on top of it, making your own?
No shoehorns needed. Just take what you like and build what you need.
It's nice for unmodified Moonlight clients to be able to connect: there are tons of them, and you can even run one on a Nintendo DS
But is the ability to run it on the DS a feature? I highly doubt it.
I’m not trashing anything, I’m just saying that if they focused on what their market is, it would be clear no one is going to be coding/working on a Nintendo DS.
Hi, quite an interesting project, but I have a hard time understanding why you would stream a desktop.
From my (ignorant) understanding, the important part is the LLM's context for the task. In some conversations you need visuals; in others you don't. What's the advantage of streaming a full desktop instead of using integrations?
There's also value in being able to run multiple agents in parallel with their own isolated filesystems and runtimes. One agent won't tread on the toes of another, whatever they do. You can let it loose, and it doesn't matter if it breaks something; you can just spin up another one
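The isolation idea can be approximated even without containers: give each agent its own throwaway scratch directory and run them concurrently. A stdlib-only sketch under that assumption (a real setup like Helix's would use containers or VMs, and the "agent" here is just a subprocess writing a file):

```python
import concurrent.futures
import subprocess
import sys
import tempfile
from pathlib import Path


def run_agent(task_id: int) -> str:
    """Run one 'agent' in its own throwaway workspace.

    If it wrecks its workspace, only that directory is affected,
    and a fresh one costs nothing to create."""
    with tempfile.TemporaryDirectory(prefix=f"agent-{task_id}-") as workdir:
        out = Path(workdir) / "result.txt"
        # Stand-in for a coding agent pointed at workdir.
        subprocess.run(
            [sys.executable, "-c",
             f"open(r'{out}', 'w').write('task {task_id} done')"],
            check=True, cwd=workdir)
        return out.read_text()


def run_all(n: int) -> list:
    """Run n agents in parallel, each fully isolated from the others."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_agent, range(n)))
```

Because each workspace is created and destroyed per run, "spin up another one" really is just calling `run_agent` again.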
Mainly so you can give the agent access to the desktop as well. Then it can debug your web app in Chrome DevTools, and you can also pair with it over streaming that's so good it feels local
This is beautiful madness.