Comment by woeirua

5 days ago

Just reading the comments here, it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already do all of this. Codex isn't pioneering these features, it's mostly just catching up.

I don't think Claude has this part yet:

> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

  • >background computer use

    How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising it via command+click, so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

    I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.

  • They acquired Vercept, and their older agent Vy did have a background agent. IIRC the recent computer-use agent in Claude is based on Vy, so I'm kinda surprised that feature didn't carry over to the Claude desktop app.

  • Imagine where we’d be if the restrictive iOS model was dominant in all computing. We’d never get anything like this.

Yeah, it’s probably very similar to my experience: I tried Codex because I had a ChatGPT subscription, found it to be quite powerful, and then, because I was used to it, just ended up getting the Pro subscription. So I am guessing folks like me have never really used Claude.

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an Electron app.

  • At least when I tried it last, Claude Cowork tried to spin up an entire virtual machine to sandbox itself properly - and not only is that sandboxing slow to start up, it also makes it difficult to actually interact freely across your filesystem. (Perhaps a feature, not a bug.)

    Claude Code, on the other hand, has no such issues, if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).

  • Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.

    • Codex CLI is a TUI app, but Codex App is an actual desktop GUI app. If you actually look at the TFA, you'll see that all of the videos are of the desktop app.

    • > Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.

      I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.

    • Codex is both a macOS app and a CLI/TUI app.

      Their naming is not very clear. The Codex desktop app is somewhat of a frontend for the Codex CLI.

      By the look and feel of it I would guess it is written with Electron.

IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

I mean table-stakes stuff: why isn't an agent going through all my Slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had which to-do items assigned. Basic stuff that is already possible but that no one is doing.

I swear none of the AI companies have any sense of human-centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

  • THANK YOU. I keep thinking this as well. I'm rolling my own skills to actually make my job easier, which is all about gathering, surfacing, and synthesizing information so I can make quick informed decisions. I feel like nobody is thinking this way and it's bizarre.

    • The value prop is tenuous and most people still think agents aren't capable of doing this type of work reliably yet (which is... kind of true). You won't get punished too much by users for false positives when summarizing tasks, but you will get absolutely eviscerated for false negatives (e.g. dropping a critical task from the summary). Can you guarantee that your agent won't forget to tell you about something super important?

    • I am completely convinced this is because of a gap in the intersection of knowledge. Somehow the people making the best agents are focused on extending the capabilities of the models, while the people who could best make an application layer just think of LLMs as a chat prompt.

      We need a product person, maybe with a turtleneck sweater and a horrid work-life attitude, to fix this up, instead of a weirdly philosophic, basilisk-fearing idealist.

  • This makes a lot of sense, but I can't see anyone paying for this because at its simplest layer it's just a Neo4j install + some skills + a local cron job for Claude Desktop. How long will it take for Anthropic to just bake this into Claude Desktop or OpenAI into Codex? Probably not that long.

    I keep coming up with good ideas for how to use agents and keep walking away from them because there just is no defensible moat. Everything software related is just going to get totally consumed over the next year.

  • Disclaimer: I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls. Then a separate one that runs at the end of every week to give me feedback.

    • In the full blog post I actually go into more detail about automatically creating a knowledge graph of what is being worked on throughout the whole company. There are some really powerful transformative efforts that can be accomplished right now, but that no one is doing.

      Basic things, from detecting common pain points to automatically figuring out who is the SME for a topic. AIs are really good at categorization and tagging; heck, even before modern LLMs this is something ML could do.

      But instead we have AI driven code reviews.

      Code reviews are rarely the blocker for productivity! As an industry, we need to stop automating the easy stuff and start helping people accomplish the hard stuff!

  • You should check out https://pieces.app/. I've been using it for months and I am surprised I have never seen anyone talk about it.

    It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.

  • Agreed. It is ironic that in the AI race, the real differentiation may not come from how smart the model is, but from who builds the best application layer on top of it. And that application layer is built with the same kind of software these models are supposed to commoditize.

    • This feels like *nix.

      Developers built themselves really good OSes for doing developer things. Actually using it to do things was secondary.

      Want to run a web server? Awesome choice. Want to write networking code? Great. Set up a reliable DB with automated backups? Easy peasy.

      Want a stable desktop environment? Well, after almost 30 years we just about have one. Kind of. It isn't consistent and I need to have a Post-it note on my monitor with the command to restart plasma shell, but things kind of work.

      Current AI tools are so damn focused on building developer experiences that everything else is secondary. I get it, developers know how to fix developer pain points, and it monetizes well.

      But holy shit. Other things are possible. Someone please do them. Or hell, give me 20 or 30 million and I'll do it.

      But just.... The obvious is sitting out there for anyone who has spent 10 minutes not being just a developer.

It mostly feels like they’re just converging on each other. The latest Claude Mac app release pushed a new UI that looks almost exactly like Codex’s.

Codex has better UX/UI, but Claude is still way ahead in sheer schizophrenia: https://i.imgur.com/jYawPDY.png

Opus 4.6 has had so many "oops you're right!" gaffes and other annoyances that I let my Claude subscription expire yesterday.

Codex has been more consistent and helpful, but it too is still not quite at the point where you can blindly trust it without verifying the output.

It's not like Claude is pioneering those. All of that was done before any of them by some random startup.

Antigravity off in the corner feeling sad about itself rn.

  • I love poor forgotten Antigravity. For one, you can use your Gemini account to churn Opus credits until they run out, then switch to Gemini 3.1 to finish off.

I think you're making assumptions without reading the entire thread and processing the general theme. This isn't about catching up or who's better. It really comes down to two things: first, how far your money goes, and second, which political narrative you subscribe to. Up until they started their beef with the U.S. government I was a subscriber. Between that and how fast my tokens depleted, I switched to Codex. Best decision of my life, and now I never run out of tokens.

It was the perfect storm and I would have never switched since the first AI I started with was Claude.

  • You want to use the model that is potentially giving your data to the government vs the one that’s openly rejecting that partnership?

    • At this point you gotta pick and choose your morality. Claude is screwing people on credits and tokens; OpenAI is selling the three molecules left of your privacy to the government. Are those three molecules worth fighting for when your budget is really tight or you are unemployed? Everyone has different priorities.

The first time I tried Anthropic's version it burned up all its tokens in like 10 minutes and left me stuck in a broken state. So I uninstalled it.

Clicking UI elements can also be done in GitHub Copilot for VS Code, and in Cursor.