Comment by torginus

10 hours ago

Moral implications aside, it's funny to see that MS (and the AI companies) see the future of agentic AI as ChatGPT taking screenshots and clicking and scrolling around the UI.

There are tools like MS Active Accessibility and UI Automation, which are designed to help impaired people use the computer and are also very useful for testing.

UI Automation in particular is designed for semantic understanding rather than for representing the UI as the runtime control hierarchy, and can do things like query offscreen elements or check what's in a combo box without having to open it.
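To make the offscreen/combo box point concrete, here's a toy Python model of querying a semantic tree. To be clear, this is not the real UI Automation API: `Element`, `find_all` and `combo_items` are made-up names, and the window is invented. The control type strings and the `is_offscreen` flag only loosely mirror UIA's control types and its IsOffscreen property.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Element:
    """Hypothetical stand-in for a node in an accessibility tree."""
    control_type: str
    name: str
    is_offscreen: bool = False
    children: List["Element"] = field(default_factory=list)

    def descendants(self) -> Iterator["Element"]:
        """Yield every node below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


def find_all(root: Element, control_type: str) -> List[Element]:
    """Return all descendants with the given control type, visible or not."""
    return [e for e in root.descendants() if e.control_type == control_type]


def combo_items(combo: Element) -> List[str]:
    """Read a combo box's choices from the tree; nothing is clicked or opened."""
    return [e.name for e in combo.children if e.control_type == "ListItem"]


# A made-up window: a collapsed combo box and a button scrolled out of view.
window = Element("Window", "Settings", children=[
    Element("ComboBox", "Language", children=[
        Element("ListItem", "English", is_offscreen=True),
        Element("ListItem", "German", is_offscreen=True),
    ]),
    Element("Button", "Apply", is_offscreen=True),
])

language = find_all(window, "ComboBox")[0]
print(combo_items(language))  # ['English', 'German']
print([b.name for b in find_all(window, "Button") if b.is_offscreen])  # ['Apply']
```

A screenshot-driven agent would have to scroll to find the button and click the combo box open to learn the same things; here both are plain tree lookups.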

Credit where it's due: Microsoft used to invest heavily in making Windows accessible to the blind and impaired. I've had blind acquaintances praise them for being able to use the computer fairly well. (My friend's grandma was a math teacher, super smart, but sadly she went blind in old age. It's hard to overstate how much being able to use the computer meant to her.)

Not sure how well it works nowadays, with most apps not being Windows-native.

I'd have recommended checking out UISpy, a neat little tool that let you inspect your apps in a semantic way, but it turns out it was folded into Power Automate, which in turn was made part of Office 365. I see Microsoft is still working tirelessly to undo all the goodwill they have rightfully earned.

The optimistic view would be that the people who wrote the agents just weren't familiar with accessibility technologies so they made it work how they are used to working.

But the more likely reason is that they realized that accessibility is usually poorly done and unreliable. Using vision and the mouse lands them in the "happy path" of basically every website and avoids accessibility gaps and bugs.

I don't think any company actually sees a future there, at least not with agentic AI as it currently is. Agentic AI is just in a sweet legal gray area at the moment, where companies use their free pass to scrape all the user data they'll ever need. That's my own interpretation of why it's being shoved into every existing product out there as fast as humanly possible, at least.