Comment by _pdp_
3 months ago
IMHO, the goal is to not have to watch what agents do, and to just let them do the work.
I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Yeah that's what we did :) https://youtu.be/vVmnpcnLDGM?si=b6LxW6lmM7843LY0
>I would personally invest in making agents more autonomous (yes, a hard problem today) rather than building a desktop video session protocol to watch them do the work.
Seems difficult to research better autonomy without extensive monitoring. You need specific data on before/after effects of changes, for instance.
You're right. Monitoring is critical for this. You need to know exactly what changed and why.
This is actually where streaming, and the architecture behind it, becomes valuable beyond just "letting users watch." In Helix, every agent interaction is persisted with timing data (DurationMs), tool calls, error states, and even user feedback (thumbs up/down). Sessions track the full conversation history in both directions, so you can reconstruct exactly what the agent saw and did at each step.
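To make the before/after idea concrete, here is a rough Python sketch of how persisted interactions could be summarized and compared across a change. This is not Helix's actual schema or code: the `Interaction` class, its field names (apart from the duration field, which mirrors the DurationMs mentioned above), and the `summarize` helper are all hypothetical, and the sample records are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Interaction:
    """One persisted agent step (hypothetical schema, not Helix's real one)."""
    duration_ms: int                       # mirrors the DurationMs timing field
    tool_calls: list[str] = field(default_factory=list)
    error: Optional[str] = None            # None means the step succeeded
    feedback: Optional[bool] = None        # True = thumbs up, False = thumbs down


def summarize(interactions: list[Interaction]) -> dict[str, float]:
    """Aggregate the metrics to compare before and after a change.

    Assumes a non-empty list; statistics.mean raises on empty input.
    """
    rated = [i for i in interactions if i.feedback is not None]
    return {
        "mean_duration_ms": mean(i.duration_ms for i in interactions),
        "error_rate": sum(i.error is not None for i in interactions) / len(interactions),
        # Share of rated steps that got a thumbs up; 0.0 if nothing was rated.
        "thumbs_up_rate": sum(i.feedback for i in rated) / len(rated) if rated else 0.0,
    }


# Made-up sample data: sessions recorded before and after an agent change.
before = [
    Interaction(1200, ["shell"], error="timeout"),
    Interaction(800, ["shell"], feedback=True),
]
after = [
    Interaction(600, ["shell"], feedback=True),
    Interaction(400, ["editor"], feedback=True),
]

print("before:", summarize(before))
print("after: ", summarize(after))
```

The point is just that once each step is stored with timing, errors, and feedback, "did this change help?" becomes a query over recorded sessions rather than something you have to watch live.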