
Comment by swyx

2 days ago

i think it's very interesting how openai basically owns/leads in every single vector you* listed. have they missed/been behind on something?

*i would have come up with a similar list but i don't trust my own judgment here. maybe i'd sub in claude code vs codex but the jury is still a bit out on that

I think OpenAI is the first 100% AI-focused company to throw this many engineers (over 1,000 at this point?) at every part of the agentic workflow. I think it's a tremendous amount of discovery work. My theory would be that once we see what really works, other companies can catch up rather quickly, using far fewer resources to do so.

Google seems to be making a lot of progress on agentic workflows too, not only with Mariner but also with Project Astra, Call For Me, and their Agent2Agent protocol. There's probably much more to come here.

Oh, and OpenAI is clearly willing to spend a lot of money to push this technology a bit further. If you look at Codex's logs, it appears to use a very strong (read: expensive) reasoning model to essentially brute-force the use of a VM. If you ask a follow-up question in a Codex task, they just casually throw away the old VM and spin up a new one, re-running the entire setup. Compared to, e.g., Cursor, I'd wager Codex costs 5-10x more to perform a similarly sized task, though it's hard to tell for sure.

  • Why aren’t they using gvisor for something like this?

    • They probably are, or at least will! For now, though, it seems like the first version that works end to end, and it certainly feels like a standard VM spinning up a Docker image. There are more specialized solutions out there, like the CodeSandbox SDK, which lets you prepare a devcontainer that can fork and spin up for a new PR in less than a second. So if it's not Codex, _someone_ will nail this experience. Cursor's new background agents could be it, though I don't enjoy them currently. And I get the feeling they too spin up cloud VMs "the old school way".
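For what it's worth, a minimal sketch of what the gVisor route could look like, assuming Docker is the container runtime and gVisor's `runsc` binary is already installed at `/usr/local/bin/runsc` (the daemon.json snippet and the image name are illustrative, not anything Codex is confirmed to do):

```shell
# Register gVisor's runsc as an additional Docker runtime
# (roughly what `sudo runsc install` writes for you):
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "runtimes": {
    "runsc": { "path": "/usr/local/bin/runsc" }
  }
}
EOF
sudo systemctl restart docker

# Run the agent's sandbox container under gVisor instead of a full VM:
# syscalls are intercepted by runsc's user-space kernel, so startup is
# container-fast while the workload stays isolated from the host kernel.
docker run --rm --runtime=runsc ubuntu:22.04 uname -a
```

The appeal over "a fresh VM per follow-up question" is that a gVisor-backed container starts in roughly container time rather than VM-boot time, which is the kind of latency win the CodeSandbox-style forking approach is chasing.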