Comment by gdudeman

5 hours ago

Computer use is a great idea. It gets the job done when nothing else will.

If you're a person trying to get their job done at a big company, but half your job is in 1-2 proprietary tools or is stuck behind an API you can't program against, computer use can allow you, a non-techie, to do your job more efficiently.

I think it's an awesome way to circumvent gate keepers and the IT department to let people accomplish their goals.

I think there's a sweet spot, though: a lot of the time you're probably better off with "reverse engineer this web page and build me an API or personalized Chrome extension to meet my needs".

I have an agent doing price checks for an item on a certain website. Instead of blasting through a zillion tokens processing the DOM over and over, it loaded the page once and figured out how to download a JSON file with the price.
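For anyone wondering what that looks like once the agent has found the endpoint, here's a rough sketch, not what the agent actually produced. The URL and the JSON shape (`item.price`) are made up for illustration; the real site will differ.

```python
import json
import urllib.request
from typing import Any

# Hypothetical endpoint: stands in for whatever JSON URL the page loads.
PRICE_URL = "https://example.com/api/items/12345.json"


def extract_price(payload: dict[str, Any]) -> float:
    """Pull the price out of an already-decoded JSON payload.

    Assumed shape (an illustration, not the real site's schema):
    {"item": {"price": "19.99"}}
    """
    return float(payload["item"]["price"])


def fetch_price(url: str = PRICE_URL, timeout: float = 10.0) -> float:
    """Download the JSON document once and return the price from it."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return extract_price(payload)


if __name__ == "__main__":
    print(fetch_price())
```

The point is that each check becomes one small HTTP request plus a dictionary lookup, with no model in the loop until the endpoint or its format changes.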

How are folks using "computer use" to click things on intranet portals that are behind SSO? Even the OP's example just visits a URL and enters a search term… that is sort of useless.

How can I automate things behind an SSO wall? Even if it means I manually authorize it once and then watch it do things on its own.

  • I've never used Gemini computer use, but I assume it's the same:

    Claude computer use takes control of your whole computer's inputs (mouse and keyboard) plus screenshots. You just log in, tell Claude you're logged in, and let it get to work. It'll use the browser you're logged in with.

    The Chrome extension is a little better because it only takes control of its own Chrome tabs (again: you just log in).

  • Take manual control once, save the login info to a password manager, and teach the model to log in with it.

That is an incredibly niche use case and comes with a boatload of footguns.

Even then, an AI writing AHK scripts likely outperforms.

Yeah, it's not that computer use is the theoretically optimal paradigm, but there's a reasonable case that, given the constraints of modern software systems and how they're built, it's the most realistically optimal one.