Comment by jauntywundrkind
12 hours ago
The JSON here is meant to ease the machine's ability to generate UI, but reciprocally it feels like this could also be a useful render tree that AI could read and fire actions on.
There's some early exploration of using accessibility APIs to empower LLMs. This feels like a similarly simple and direct intermediate format, one that could maybe serve as a more direct app model for LLMs.
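As a rough illustration of the idea (the node shapes and action names here are hypothetical, not from any real spec): the same JSON tree that drives UI generation can be walked by an agent to discover which actions it can fire.

```python
import json

# Hypothetical render tree: the JSON that a machine would render as UI
# doubles as a surface an agent can read and act on.
view = json.loads("""
{
  "type": "form",
  "children": [
    {"type": "text", "value": "Subscribe to updates"},
    {"type": "input", "id": "email", "label": "Email"},
    {"type": "button", "id": "submit", "label": "Subscribe", "action": "submit_form"}
  ]
}
""")

def find_actions(node, found=None):
    """Walk the tree and collect every node that exposes an action."""
    if found is None:
        found = []
    if "action" in node:
        found.append((node["id"], node["action"]))
    for child in node.get("children", []):
        find_actions(child, found)
    return found

print(find_actions(view))  # [('submit', 'submit_form')]
```

An agent could pair this with a dispatcher that maps action ids back to tool calls, which is roughly what "views plus tool calling" amounts to.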
More broadly it feels like we have a small forming crisis of computing having too many forms. We had CLI tools and Unix. Then we made GUIs, which are yet another way to invoke tools (and more). Then webapps, where the page is what expresses tools (and more). Then React virtualized the page and supplanted the DOM. Now we have JSON that expresses views and tool calling. Tools also now need to be expressed as MCP for AI to use them. Serverless and HTTP and endless tRPC and Cap'n Proto and protobuf ways to call functions/invoke tools. We keep making new ways to execute! Is there real value in each one being distinct, in each having its own specific channel of execution?
MCPs are a dead end. CLIs are just better: they already do all the things MCPs struggle with, and they're human-usable. Plus you can use bash or nushell to do all sorts of fun things with command output.
> There's some early exploration of using accessibility APIs to empower LLMs.
any examples come to mind?
The popular Playwright MCP uses the Chrome accessibility tree to help agents navigate websites: https://github.com/microsoft/playwright/blob/ed176022a63add8...
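To sketch the idea without pulling in Playwright itself (this is a toy structure and flattening, not Playwright's actual snapshot format): an accessibility tree of roles and names can be flattened into compact indented text that an agent reads instead of raw HTML.

```python
# Toy accessibility-style tree; real trees come from the browser,
# but the shape (role, name, children) is the same idea.
tree = {
    "role": "main",
    "children": [
        {"role": "heading", "name": "Checkout"},
        {"role": "button", "name": "Pay now"},
    ],
}

def flatten(node, depth=0):
    """Render the tree as indented 'role "name"' lines an agent can read."""
    name = f' "{node["name"]}"' if node.get("name") else ""
    lines = [f'{"  " * depth}- {node["role"]}{name}']
    for child in node.get("children", []):
        lines.extend(flatten(child, depth + 1))
    return lines

print("\n".join(flatten(tree)))
```

The agent then refers to nodes by role and name ("click the 'Pay now' button") rather than by CSS selector, which is what makes the accessibility tree a useful intermediate format.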
I tried to have Cursor use the Playwright MCP to click a few buttons on my project as a test, and while it did do what I asked successfully, it burned through something like 150 premium requests in 5 minutes.
I guess if you're totally insensitive to cost you can use this.
Chris Shank & Orion Reed's work is always excellent. https://bsky.app/profile/chrisshank.com/post/3m3q23xpzkc2u