Comment by frabonacci
9 hours ago
UI detection is a big focus: we combine visual grounding with structured observations (icons, OCR text, app metadata, window state) so the agent can reason more like a user would. It's surprisingly robust even under layout shifts or new themes.
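As a rough illustration of what such a "structured observation" might look like, here is a minimal Python sketch. All names (`UIElement`, `Observation`, the fields) are hypothetical, not the project's actual API; the point is just that the agent receives typed metadata alongside the screenshot rather than raw pixels alone.

```python
from dataclasses import dataclass, field, asdict

# Hypothetical shapes, for illustration only -- not the project's real types.

@dataclass
class UIElement:
    kind: str     # e.g. "icon", "button", "text"
    label: str    # OCR or accessibility label
    bbox: tuple   # (x, y, width, height) in screen pixels

@dataclass
class Observation:
    app_name: str
    window_title: str
    window_state: str                 # e.g. "focused", "minimized"
    elements: list = field(default_factory=list)

    def to_prompt_dict(self) -> dict:
        # Flatten into a JSON-serializable dict the agent can reason over;
        # asdict() recurses into the nested UIElement dataclasses.
        return asdict(self)

obs = Observation(
    app_name="TextEdit",
    window_title="Untitled",
    window_state="focused",
    elements=[UIElement("button", "Save", (10, 20, 80, 30))],
)
print(obs.to_prompt_dict()["elements"][0]["label"])  # prints Save
```

Because the observation carries labels and geometry rather than only pixels, a theme change that repaints a button does not break the agent's view of it, which is consistent with the robustness claim above.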