Comment by tuukkao

1 month ago

Can you elaborate on how a user interface based on conversation is even remotely as efficient as a keyboard-operated screen reader? With a screen reader I can get information out of a web page much quicker than the time it takes me to think about how to "ask" for it. The only advantage I could see with this approach (assuming there would be no hallucinating, etc.) is that AI can extract things out of an inaccessible or unfamiliar interface. In all other respects, however, this approach would effectively lock blind people into using only the capabilities the AI happens to have. As a blind software developer, this idea of a supposedly viable user interface sounds patronising more than anything.

Not to mention that this seems to completely ignore all the other things we might use computers for. Browsing websites is only one of the things I do, and many of them would, I think, be extraordinarily clunky through natural language. I also just do not feel comfortable talking to my computer out loud, especially anywhere with other people around, or, I don't know, while playing games with friends on voice chat. It seems to be common for people to assume that a fix is very easy and simple: LLMs, OCR for screen readers, etc. If it really were as simple as slapping OCR on everything, it would already have happened. I also value my privacy and would prefer my computing not to happen entirely through OpenAI, Anthropic or Google, and whether or not someone can use computers well, we shouldn't force them to do it that exact way. At least in my opinion. And that doesn't even get into the costs associated with all of that LLM usage.

I agree with you that someone who is good with a screen reader can efficiently move through web interfaces. A good screen reader user is faster than the typical user.

However, not all blind people are good with screen readers, and for them an AI assistant would be genuinely useful. Even proficient screen reader users could benefit from one at times.

An example: Yesterday, I needed to buy new valve caps for my car's tires. The screen reader path would be something like walmart -> jump to search field, type "valve cap car tire" and submit -> jump to results section -> iterate through a few results to make sure I'm getting the right thing at a good price -> go to the result I want -> checkout flow. Alternatively, the AI flow would be telling my AI assistant that I need new car tire valve caps. The assistant could then simultaneously search many provider options, select one based on criteria it inferred, and order it by itself.

The AI path, in other words, gets a better result (looking through more providers makes it likelier to find a better price, faster delivery, whatever), and it is also much easier and faster. That holds, of course, not only for screen reader users but for everyone.

If a screen reader already works well for you, then the problem was solved 30 years ago, and you can continue to use it indefinitely.

No one will force a blind person to use a computer that converses in natural English. But even sighted people are likely to move away from dense, visually heavy UIs towards conversational interfaces with digital systems. If that comes to fruition (unlike us nerds, regular folks hate visually dense clutter), I suspect young blind people won't even perceive much of an impediment in that area of life.

This isn't far off from the CLI vs. GUI debate: CLIs are way faster and more efficient, but regular people overwhelmingly despise them and use GUIs. Ease over efficiency is the goal for them.