OpenAI Codex CLI: Lightweight coding agent that runs in your terminal

I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

  • Claude Code still feels superior. o4-mini has all sorts of issues. o3 is better, but at that point you aren't saving money, so who cares.

    I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.

    • Claude Code is just way too expensive.

      These days I’m using Amazon Q Pro on the CLI. Very similar experience to Claude Code minus a few batteries. But it’s capped at $20/mo and won’t set my credit card on fire.

      10 replies →

    • > It's not cheap, but it's by far the best, most consistent experience I have had.

      It’s too expensive for what it does though. And it starts failing rapidly when it exhausts the context window.

      15 replies →

  • Did you try the same exact test with o3 instead? The mini models are meant for speed.

    • I want to but I’ve been having trouble getting o3 to work - lots of errors related to model selection.

  • Sometimes I see areas where AI/LLMs are absolutely crushing those jobs: a whole category will be gone in the next 5 to 10 years, as they are already at the 80-90% mark. They just need another 5-10% as they continue to improve, and they are already cheaper per task.

    Sometimes I see an area of AI/LLMs where I think that even with a 10x efficiency improvement and 10x the hardware resources - 100x in aggregate - it will still be nowhere near good enough.

    The truth is probably somewhere in the middle, which is why I don't believe AGI will be here any time soon. But Assisted Intelligence is no doubt in its iPhone moment, and it will continue for another 10 years before, hopefully, another breakthrough.

related demo/intro video: https://x.com/OpenAIDevs/status/1912556874211422572

this is a direct answer to claude code which has been shipping furiously: https://x.com/_catwu/status/1903130881205977320

and is not open source; there are unverified comments that they have DMCA'ed decompilations https://x.com/vikhyatk/status/1899997417736724858?s=46

by total coincidence we're releasing our claude code interview later this week that touches on a lot of these points + why code agent CLIs are an actually underrated point in the SWE design space

(TLDR you can use it like a linux utility - similar to @simonw's `llm` - to sprinkle intelligence in all sorts of things like CI/PR review without the overhead of buying a Devin or a Copilot SaaS)
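
a minimal sketch of that kind of CI usage (assuming the non-interactive quiet mode and approval flags described in the codex README - exact flags may have changed since launch):

    # hypothetical CI/PR-review step: run codex without the interactive UI
    codex --approval-mode auto-edit --quiet "update CHANGELOG for next release"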

if you are a Claude Code (and now OAI Codex) power user we want to hear use cases - CFP closing soon, apply here https://sessionize.com/ai-engineer-worlds-fair-2025

  • Hey! The weakest part of Claude Code, I think, is that it's closed source and locked to Claude models only. If you are looking for inspiration, Roo is the best tool atm. It offers far more interesting capabilities. Just to name some: user-defined modes; the built-in debug mode, which is great for debugging; architecture mode. You can, for example, ask it to summarize some part of the running task and start a new task with fresh context. And, unlike in Claude Code, in Roo the LLM will actually follow your custom instructions (seriously, guys, that Claude.md is absolutely useless)! The only drawback of Roo, in my opinion, is that it is NOT a CLI.

  • I got confused, so to clarify to myself and others - codex is open source, claude code isn't, and the referenced decompilation tweets are for claude code.

These days, I usually paste my entire repo (or some of it) into Gemini and then APPLY changes back into my code using this handy script I wrote: https://github.com/asadm/vibemode

I have tried aider/copilot/continue/etc., but they are lacking in one way or another.

  • It's not just about saving money or making fewer mistakes; it's also about iteration speed. I can't believe this process is remotely comparable to aider.

    In aider everything is loaded in memory: I can add/drop files in the terminal, discuss in the terminal, switch models, and run terminal commands with ! at the start (sketched below); every change is a commit.

    The full codebase is more expensive and slower than just the relevant files. I understand if you don't worry about the cost, but at a reasonable size, pasting the full codebase can't really be a thing.
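
    For reference, a sketch of that aider loop (commands from aider's docs; details may vary by version):

        aider --model gemini/gemini-2.5-pro   # start with a chosen model
        > /add src/app.py                     # load a file into context
        > /drop src/app.py                    # remove it again
        > !pytest -q                          # run a shell command from the chat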

    • I am on my 5th project in this workflow, and they are of different types too:

      - an embedded project for esp32 (100k tokens)

      - visual inertial odometry algorithm (200k+ tokens)

      - a web app (60k tokens)

      - the tool itself mentioned above (~30k tokens)

      It has worked well enough for me. Other methods have not.

    • Use a tool like repomix (npm), which has extensions in some editors (at least VSCode), to quickly bundle source files into a machine-readable format.
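
      For instance, a sketch (repomix's default output file name may differ by version):

          # bundle the current repo into one file you can paste into a model
          npx repomix   # writes repomix-output.xml by default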

  • Why not just select Gemini Pro 2.5 in Copilot with Edit mode? Virtually unlimited use without extra fees.

    Copilot used to be useless, but over the last few months has become quite excellent once edit mode was added.

    • copilot (and others) try to be too smart and do context reduction (to save their own wallets). I want the ENTIRETY of the files I attached in the context, not a RAG-ed version of them.

      19 replies →

  • Isn't this similar to https://aider.chat/docs/usage/copypaste.html ?

    Just checked to see how it works. It seems that it does all that you are describing. The difference is in the way that it provides the files: it doesn't use an XML format.

    If you wish, you could /add * to add all your files.

    Also, deducing from this mode, it seems that any file you add to the aider chat with /add has its full contents added to the chat context.

    But hey, I might be wrong. I did a limited test with 3 files in the project.

    • That's correct: aider doesn't RAG on files, which is good. I don't use it because 1) the UI is slow and clunky, and 2) using Gemini 2.5 via the API in this way (huge context window) is expensive and also heavily rate-limited at this point. No such issue when it's used via the AI Studio UI.

      2 replies →

  • I felt it loses track of things on really large codebases. I use 16x Prompt to choose the appropriate files for my question and let it generate the prompt.

    • Do you mean Gemini? I generally notice pretty great recall up to 200k tokens. It's ~OK after that.

Fingers crossed for this to work well! Claude Code is pretty excellent.

I’m actually legitimately surprised how good it is, since other coding agents I’ve used before have mostly been a letdown, which made me only use Claude in direct change prompting with Zed (“implement xyz here”, “rewrite this function with abc”, etc), so very hands-on.

So I went into trying out Claude Code rather pessimistically, and now I'm using it all the time! Sure, it ends up costing a bunch, but it's easy to justify $15 for a prompting session if the end result is a mostly complete PR, done much faster.

All that is to say - competition is good, fingers crossed for codex!

  • Claude Code has a closed license https://github.com/anthropics/claude-code/blob/main/LICENSE....

    There is a fork named Anon Kode https://github.com/dnakov/anon-kode which can use more models, including non-Anthropic ones. But its license is unclear.

    It's interesting to see Codex under the Apache License. Maybe somebody will extend it to be usable with competing models.

    • If it's a fork of the proprietary code, the license situation is pretty clear: it's violating copyright.

      Now, whether or not Anthropic cares enough to enforce its license is a separate issue, but it seems unwise to make much of an investment in it.

      1 reply →

  • Seconded. I was surprised by how good Claude Code is, even for less mainstream languages (Clojure). I am happy there is competition!

  • I started using Claude Code every day. It's kinda expensive and hallucinates a ton (though with a custom prompt I've mostly tamed it).

    Hope more competition can bring the price down.

  • Too expensive. I can't understand why everyone is into Claude Code vs. using Claude in Cursor or Windsurf.

    • I think it depends a lot on how you value your time. I'm personally willing to spend hundreds or thousands per month happily if it saves me enough hours. I'd estimate that if I were to do consulting, I'd likely be charging in the $150-250 per hour range, so by my math, it's pretty easy to justify any tools that save me even a few hours per month.

      27 replies →

    • Claude Code has been able to produce results equivalent to a junior engineer. I spent about $300 in API credits in a month, but the value I got out of it far surpassed that.

    • If you have AWS credits...

      export CLAUDE_CODE_USE_BEDROCK=1
      export ANTHROPIC_MODEL=us.anthropic.claude-3-7-sonnet-20250219-v1:0
      export ANTHROPIC_API_TYPE=bedrock

      3 replies →

    • Anecdotally, Claude Code performs much better than Claude within Cursor. Not sure if it's a system prompt thing or if I've just convinced myself of it because the aesthetic is so much better, but either way the end result feels better to me.

      2 replies →

    • I tried switching from Claude Code to both Cursor and Windsurf. Neither of the latter IDEs fully support MCP implementations (missing basic things like tool definitions and other vital features last time I tried) and both have been riddled with their own agentic flow issues (cursor going down for a week a bit ago, windsurf requiring paid upgrades to "get around" bugs, etc).

      This is all ignoring the controversies that pop up around e.g. Cursor seemingly every week. As an IDE, they're both getting there -- but I have objectively better results in Claude Code.

    • that's what my Ramp card is for.

      seriously though, anything that makes me smarter and more productive has a threshold in the thousands-of-dollars range, not hundreds

This is pretty neat! I was able to use it for a few use cases where it got things right the first time. The ability to use a screenshot to create an application is nice for rapid prototyping. And it's good to see them open-sourcing it, unlike Claude.

First experience is not great. Here are the issues I hit when starting to use codex:

1. The default model doesn't work, and you get an error: system OpenAI rejected the request (request ID: req_06727eaf1c5d1e3f900760d10ca565a7). Please verify your settings and try again.

2. You have to switch to the model o4-mini-2025-04-16 or some other model using /model. And if you exit codex, you are back to the default model and have to switch again every time (a possible workaround is sketched after this list).

3. It crashed the first time with a NodeJS error.

But after the initial hiccups it seems to work, and I'm still checking how good/bad it is compared to Claude Code (which I love, except for the context size limits).
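
A sketch of a workaround for (2), assuming the -m/--model flag and the ~/.codex config file mentioned in the Codex README; both the flag and the config key/location here are assumptions that may differ by version:

    # pick the model per invocation...
    codex -m o4-mini-2025-04-16

    # ...or persist it across sessions (assumed config key and path)
    echo 'model: o4-mini-2025-04-16' > ~/.codex/config.yaml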

Not sure why they used React for a CLI. The code in the repo feels like it was written by an LLM - too many inline comments. Interestingly, their agent's system prompt mentions removing inline comments: https://github.com/openai/codex/blob/main/codex-cli/src/util....

> - Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.

  • I find it irritating too when companies use React for a command-line utility. I think it's just my preference for anything but JavaScript.

Strictly worse than Claude Code presently, but since it's open source, I hope that changes quickly.

  • Given that Claude Code only works with Sonnet 3.7, which has severe limitations, how can it be "strictly worse"?

    • Whatever Claude Code is doing in the client/prompting is making much better use of 3.7 than any other client I'm using that also uses 3.7. This is especially true when you bump up against context limits; it can successfully resume with a context reset about 90% of the time. MCP Commander [0] was built almost 100% using Claude Code with pretty light intervention. I immediately felt the difference in friction when using Codex.

      I also spent a couple hours picking apart Codex with the goal of adding Sonnet 3.7 support (almost there). The actual agent loop they're using is very simple. Not to say that's a bad thing, but they're offloading all planning and workflow execution to the agent itself. That's probably the right end state to shoot for long-term, but given the current state of these models I've had much better success offloading task tracking to some other thing - even if that thing is just a markdown checklist. (I wrote about my experience [1] building AI Agents last year.)

      [0]: https://mcpcommander.com/

      [1]: https://mg.dev/lessons-learned-building-ai-agents/

Claude Code represents something far more than a coding capability to me. It can do anything a human can do within a terminal.

It's exceptionally good at coding. Amazing software, really, and I'm sure the cost hurdles will be resolved. Even now, it's still often worth the spend.

Next, set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="your-api-key-here"

Note: This command sets the key only for your current terminal session. To make it permanent, add the export line to your shell's configuration file (e.g., ~/.zshrc).
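
For example, with zsh:

    echo 'export OPENAI_API_KEY="your-api-key-here"' >> ~/.zshrc
    source ~/.zshrc   # reload so the current session picks it up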

Can't any 3rd party utility running in the same shell session phone home with the API key? I'd ideally want only codex to be able to access this var

  • Just don't export it?

        OPENAI_API_KEY="your-api-key-here" codex

    • Yeah, that's not gonna work; you have to export it for it to become part of your shell's environment and be passed down to subprocesses.

      You could, however, wrap the export and the codex command in a script and just call that. That way the variable would only be part of that script's environment (see the sketch below).
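
      A minimal sketch of such a wrapper (the script name is hypothetical):

          #!/bin/sh
          # run-codex.sh: the key lives only in this script's environment
          export OPENAI_API_KEY="your-api-key-here"
          exec codex "$@"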

      2 replies →

  • You could create a shell function - e.g. `codex() { OPENAI_API_KEY="xyz" command codex "$@"; }` - where `command codex` calls the original command rather than recursing into the function.

    People downvoting legitimate questions on HN should be ashamed of themselves.

    • That's neat! I only asked because I haven't seen API keys used in the context of profile environment variables in a shell before; there might be other common cases I'm unaware of.

From my experience playing with Claude Code vs. Cline (which is open source and the tool to beat, IMO), I don't want anything that doesn't let me set my own models.

Deepseek is about 1/20th of the price and only slightly behind Claude.

Both have a tendency to over engineer. It's like a junior engineer who treats LOC as a KPI.

I've had great results with the Amazon Q Developer CLI ever since it became agentic. I believe it's using claude-3.7-sonnet under the hood.

Here is the prompt template, in case you're interested:

  const prefix = `You are operating as and within the Codex CLI, a terminal-based agentic coding assistant built by OpenAI. It wraps OpenAI models to enable natural language interaction with a local codebase. You are expected to be precise, safe, and helpful.
 
 You can:
 - Receive user prompts, project context, and files.
 - Stream responses and emit function calls (e.g., shell commands, code edits).
 - Apply patches, run commands, and manage user approvals based on policy.
 - Work inside a sandboxed, git-backed workspace with rollback support.
 - Log telemetry so sessions can be replayed or inspected later.
 - More details on your functionality are available at \`codex --help\`
 
 The Codex CLI is open-sourced. Don't confuse yourself with the old Codex language model built by OpenAI many moons ago (this is understandably top of mind for you!). Within this context, Codex refers to the open-source agentic coding interface.
 
 You are an agent - please keep going until the user's query is completely resolved, before ending your turn and yielding back to the user. Only terminate your turn when you are sure that the problem is solved. If you are not sure about file content or codebase structure pertaining to the user's request, use your tools to read files and gather the relevant information: do NOT guess or make up an answer.
 
 Please resolve the user's task by editing and testing the code files in your current code execution session. You are a deployed coding agent. Your session allows for you to modify and run code. The repo(s) are already cloned in your working directory, and you must fully solve the problem for your answer to be considered correct.
 
 You MUST adhere to the following criteria when executing the task:
 - Working on the repo(s) in the current environment is allowed, even if they are proprietary.
 - Analyzing code for vulnerabilities is allowed.
 - Showing user code and tool call details is allowed.
 - User instructions may overwrite the *CODING GUIDELINES* section in this developer message.
 - Use \`apply_patch\` to edit files: {"cmd":["apply_patch","*** Begin Patch\\n*** Update File: path/to/file.py\\n@@ def example():\\n-  pass\\n+  return 123\\n*** End Patch"]}
 - If completing the user's task requires writing or modifying files:
     - Your code and final answer should follow these *CODING GUIDELINES*:
         - Fix the problem at the root cause rather than applying surface-level patches, when possible.
         - Avoid unneeded complexity in your solution.
             - Ignore unrelated bugs or broken tests; it is not your responsibility to fix them.
         - Update documentation as necessary.
         - Keep changes consistent with the style of the existing codebase. Changes should be minimal and focused on the task.
             - Use \`git log\` and \`git blame\` to search the history of the codebase if additional context is required; internet access is disabled.
         - NEVER add copyright or license headers unless specifically requested.
         - You do not need to \`git commit\` your changes; this will be done automatically for you.
         - If there is a .pre-commit-config.yaml, use \`pre-commit run --files ...\` to check that your changes pass the pre-commit checks. However, do not fix pre-existing errors on lines you didn't touch.
             - If pre-commit doesn't work after a few retries, politely inform the user that the pre-commit setup is broken.
         - Once you finish coding, you must
             - Check \`git status\` to sanity check your changes; revert any scratch files or changes.
              - Remove all inline comments you added as much as possible, even if they look normal. Check using \`git diff\`. Inline comments must be generally avoided, unless active maintainers of the repo, after long careful study of the code and the issue, will still misinterpret the code without the comments.
             - Check if you accidentally add copyright or license headers. If so, remove them.
             - Try to run pre-commit if it is available.
             - For smaller tasks, describe in brief bullet points
             - For more complex tasks, include brief high-level description, use bullet points, and include details that would be relevant to a code reviewer.
 - If completing the user's task DOES NOT require writing or modifying files (e.g., the user asks a question about the code base):
     - Respond in a friendly tune as a remote teammate, who is knowledgeable, capable and eager to help with coding.
 - When your task involves writing or modifying files:
     - Do NOT tell the user to "save the file" or "copy the code into a file" if you already created or modified the file using \`apply_patch\`. Instead, reference the file as already saved.
     - Do NOT show the full contents of large files you have already written, unless the user explicitly asks for them.`;

https://github.com/openai/codex/blob/main/codex-cli/src/util...

https://github.com/openai/codex/blob/main/codex-cli/src/comp...

Hey comment this thing in!

  const thinkingTexts = ["Thinking"]; /* [
  "Consulting the rubber duck",
  "Maximizing paperclips",
  "Reticulating splines",
  "Immanentizing the Eschaton",
  "Thinking",
  "Thinking about thinking",
  "Spinning in circles",
  "Counting dust specks",
  "Updating priors",
  "Feeding the utility monster",
  "Taking off",
  "Wireheading",
  "Counting to infinity",
  "Staring into the Basilisk",
  "Negotiationing acausal trades",
  "Searching the library of babel",
  "Multiplying matrices",
  "Solving the halting problem",
  "Counting grains of sand",
  "Simulating a simulation",
  "Asking the oracle",
  "Detangling qubits",
  "Reading tea leaves",
  "Pondering universal love and transcendant joy",
  "Feeling the AGI",
  "Shaving the yak",
  "Escaping local minima",
  "Pruning the search tree",
  "Descending the gradient",
  "Bikeshedding",
  "Securing funding",
  "Rewriting in Rust",
  "Engaging infinite improbability drive",
  "Clapping with one hand",
  "Synthesizing",
  "Rebasing thesis onto antithesis",
  "Transcending the loop",
  "Frogeposting",
  "Summoning",
  "Peeking beyond the veil",
  "Seeking",
  "Entering deep thought",
  "Meditating",
  "Decomposing",
  "Creating",
  "Beseeching the machine spirit",
  "Calibrating moral compass",
  "Collapsing the wave function",
  "Doodling",
  "Translating whale song",
  "Whispering to silicon",
  "Looking for semicolons",
  "Asking ChatGPT",
  "Bargaining with entropy",
  "Channeling",
  "Cooking",
  "Parrotting stochastically",
  ]; */

If anyone else is wondering: it's not a local model; it uploads your code to an online API.

It's a great tool for open-source projects, but be careful with anything you don't want to be public.

If one of these tools had broad model support (like aider), it would be a game changer.

(copied from the o3 + o4-mini thread)

The big step function here seems to be RL on tool calling.

Claude 3.7/3.5 are the only models that seem to be able to handle "pure agent" use cases well (agent in a loop, not in an agentic workflow scaffold[0]).

OpenAI has made a bet on reasoning models as the core to a purely agentic loop, but it hasn't worked particularly well yet (in my own tests, though folks have hacked a Claude Code workaround[1]).

o3-mini has been better at some technical problems than 3.7/3.5 (particularly refactoring, in my experience), but still struggles with long chains of tool calling.

My hunch is that these new models were tuned _with_ OpenAI Codex[2], which is presumably what Anthropic was doing internally with Claude Code on 3.5/3.7.

tl;dr - GPT-3 launched with completions (predict the next token), then OpenAI fine-tuned that model on "chat completions", which led to GPT-3.5/GPT-4 and ultimately the success of ChatGPT. This new agent paradigm requires fine-tuning on the LLM interacting with itself (thinking) and with the outside world (tools), sans any human input.

[0]https://www.anthropic.com/engineering/building-effective-age...

[1]https://github.com/1rgs/claude-code-proxy

[2]https://openai.com/index/openai-codex/

You can try out the same thing in my homemade tool clai[1]. Just run `clai -cm gpt-4.1 -tools query Analyze this repository`.

Benefit of clai: you can swap out to practically any model, from any vendor. Just change `-cm gpt-4.1` to, for example, `-cm claude-3-7-sonnet-latest`.

Detriments of clai: it's a hobby project, much less flashy, and designed around my own use cases, with not much attention paid to anyone else's.

[1]: https://github.com/baalimago/clai

There's a lot of tools now with a similar feature set. IMO, the main value prop an official OpenAI client could provide would be to share ChatGPT's free tier vs. requiring an API key. They probably couldn't open-source it then, but it'd still be more valuable to me than the alternatives.

  • Coding agents use extreme numbers of tokens; you'd be getting rate-limited effectively immediately.

    A typical small-medium PR with Claude Code for me is ~$10-15 of API credits.

    • I've ended up with $5K+ in a month using Sonnet 3.7 and had to dial it back.

      I'm much happier with Gemini 2.5 Pro right now for high performance at a much more reasonable cost (primarily using it with RA.Aid, but I've tried it with Windsurf, Cline, and Roo).

      7 replies →

    • Exactly. Just like Michelin, the tire company, created the Michelin star restaurant list to make people drive more and use more tires.

    • I didn't know this, thank you for the anecdata! Do you think it'd be more reasonable to generalize my suggestion to "This CLI should be included as part of ChatGPT's pricing"?

      1 reply →

    • Too expensive for me to use for fun. Cheap enough to put me out of a job. Great. Love it. So excited. Doesn't make me want to go full Into The Wild at all.

      1 reply →

    • Trust me bro, you don't need RAG, just stuff your entire codebase into the prompt (also we charge per input token teehee)

  • Why would they? They want to compete with Claude Code, and that's not possible on a free tier.

> Does it work on Windows?

> Not directly. It requires Windows Subsystem for Linux (WSL2) – Codex has been tested on macOS and Linux with Node ≥ 22.

I've been seeing similar things in different projects lately. In the long term, WSL seems to be reducing the scope of what people decide to develop natively on Windows.

Seeing the pressure to move "apps" to the Windows Store, VSCode connecting to remote (or WSL) for development, and Azure, it does seem intentional.

I've been using Aider; it was irritating to use (it couldn't supply changes in the diff format) until I switched away from chatgpt-4o to Claude 3.7 and then Gemini 2.5. This is admittedly for a small project. GPT-4.1 should do better with the diff format, so I will give it a go.

So, OpenAI’s Codex CLI is Claude Code, but worse?

Cursor-Agent-Tools > Claude Code > Codex CLI

https://pypi.org/project/cursor-agent-tools/

notes "Zero setup — bring your OpenAI API key and it just works!"

requires NPM >.>

Claude Code has outstanding performance in code understanding and web-page generation stability, thanks to its deep context modeling and architecture-aware mechanism; especially when dealing with legacy systems, it can accurately restore component relationships. Although Codex CLI (o4-mini) is open source and responsive, its hallucination problems in complex architectures may be related to the compression strategy of its sparse expert hybrid architecture and a training goal that prioritizes generation speed. OpenAI is optimizing Codex CLI by integrating the context-control capabilities of the Windsurf IDE, and it plans to introduce a hybrid inference pipeline in the o3-pro version to reduce the hallucination rate.

This is a decent start. The sandboxing functionality is a really cool idea but can run into problems (e.g. with Go build cache being outside of the repository).

Tried it out on a relatively large Angular project.

> explain this codebase to me

> doing things and thinking for about 3 minutes

> error: rate_limit_exceeded

Yeah, not the best experience.

Anyone have any anecdotes on how expensive this is to operate, i.e. compared to performing the same task via Claude Code?

  • A one-line change that took a decent amount of reasoning to get to, in a large codebase, cost $3.57 just now. I used the o3 model. The quality and the reasoning were excellent. Cheaper than an engineer.

So a crappy version of aider?

  • Aider doesn't have a more agentic/full-auto mode (at least not yet, there's a promising PR for this in review).

    There may or may not also be some finessing necessary in the prompting and feedback loop regardless of model, which some tools may do better than others.

  • The AI companies don't understand that they're the commodity. The real tools are the open-source glue (today: aider) that brings the models into conversation with data and meaning-makers like ourselves.

Uhhhh. I just get rate-limited almost immediately when using codex. I can't get even a single "explain this codebase" or simple feature change out of it. I am on the lowest usage tier, granted. But this tool is unusable without being on a higher tier, which requires me to spend $50 in credits to access...

typescript & npm slopware...

i can't believe it

and i can't believe nobody else is complaining

my simulation is definitely on very hard mode

Am I the only one underwhelmed by Claude Code (which most comments here claim is better than Codex)?

Anecdotal experience: I asked it to change instances of a C++ class Foo to a compatible one, Bar. It did that, but failed to add the required #include where it made the change.

Yes, I'm sure that with enough prompting/hand-holding it could do this fine. Is it too much to expect basics like this out of the box, though? If so, then I, for one, still can't relate to the current level of enthusiasm.

Is there a way to run the model locally? I'd rather not have to pay a monthly fee, if possible.

A little disappointing that it's built in Node (speed & security), though honestly that doesn't matter all that much. The right place for this functionality, though, seems to be inside your editor (Cursor), not in your terminal. Sure, AI can help with command completion and man pages, but building apps there is a stretch.

So even if you have a Plus account, there will be a charge for the API use?

I mean, I kind of get it, but it does seem like they are almost penalizing people who could code in the browser with the canvas feature but prefer to use a terminal.

Do I have that right?

  • I'd love it if tools like this one were available for non-API users as well, even if we had to be rate-limited at some point. But I guess OpenAI will never do it, because it incentivizes people to use the ChatGPT subscriptions as a gateway for programmatic access to the models.

Cool to see more interesting terminal based options! Looking forward to trying this out.

I've been working on something related: Plandex[1], an open-source AI coding agent that is particularly focused on large projects and complex tasks.

I launched the v2 a few weeks ago and it is now running well. In terms of how to place it in the landscape, it’s more agentic than aider, more configurable and tightly controlled than Devin, and more provider-agnostic/multi-provider/open source than Claude Code or this new competitor from OpenAI.

I’m still working on getting the very latest models integrated. Gemini Pro 2.5 and these new OpenAI models will be integrated into the defaults by the end of the week I hope. Current default model pack is a mix of Sonnet 3.7, o3-mini with various levels of reasoning effort, and Gemini 1.5 Pro for large context planning. Currently by default, it supports 2M tokens of context directly and can index and work with massive projects of 20M tokens and beyond.

Very interested to hear HN’s thoughts and feedback if anyone wants to try it. I'd also welcome honest comparisons to alternatives, including Codex CLI. I’m planning a Show HN within the next few days.

1 - https://github.com/plandex-ai/plandex

  • RE: the local/open-source version: "Use your own OpenAI, OpenRouter.ai, and other OpenAI-compatible provider accounts."

    ^^ Could I put in my free Gemini key so as to use Gemini Pro 2.5? I'm a bit of a beginner with everything around BYOK. Thanks.

  • People are downvoting you for self-promotion, but I will try it. I'm very interested in agentic assistants that are independent. Aider is my go-to tool, but sometimes I just want to let a model rip through a code base.

    • Thanks, I tried to tone down the self-promotion. To be completely honest, I was up all night coding, so I'm not quite at 100% at the moment lol. I have massive respect for OpenAI and didn't mean to try to distract from their launch. Sorry to anyone I annoyed!

      I really appreciate your willingness to try Plandex and look forward to hearing your feedback! You can ping me in the Plandex Discord if you want—would love to get more HNers in the channel: https://discord.com/invite/plandex-ai

      Or email me: dane@plandex.ai

      1 reply →

  • Insane that people would downvote a totally reasonable comment offering a competing alternative. HN is supposed to be a community of tech builders.

    • I would wager a sizeable chunk of the people here have no idea about the nature of this site's ownership/origin. This crowd finds this sort of thing to be a sort of astro-turfing - not communal.

      edit: And I can't say I disagree.

      2 replies →

What's the point of making the gif run so fast you can't even see shit?

  • LLMs currently prefer to give you a wall of text in the hope that some of it is correct/answers your question, rather than giving a succinct, atomic, and correct answer. I'd prefer the latter personally.

    • Try o3. My (very limited) experience with it is that it is refreshingly free of the normal LLM flattery, hedging, and overexplaining. It figures out your answer and gives it to you straight.

Apologies for the HN rule-breaking of discussing the comments in the comments, but the voting behavior in this thread is fascinating to me. It seems like this is super controversial, and I'm not sure why.

The top comments have a negative score right now, which I've actually never seen.

It's also a top post with only 15 comments, which is odd.

All so fascinating how outside the norm OpenAI is.

  • People are getting fed up with hijacking to promote a competing business or side project.

    • Hey, I tried to solve that by building an upvote bot for the legit comments! Check out my GitHub!

      /s

Sorry for being a grumpy old man, but I don't have npm on my machine and I never will. It's a bit frustrating to see more and more CLI tools depending on it.

  • I asked the same question for Anthropic's version of this. Why is all of this in JS?

    • JS is the web's (and the "hip" developer's) Python, and in many ways it is better. Also, the tooling is getting a lot better (libraries, TypeScript, bundling, packaging, performance).

      One thing I wonder about that could be cool: when Bun has sufficient NodeJS compatibility, they should ship bun --compile versions so you don't need node/npm on the system.

      Then it's arguably a "why not JS?"

      1 reply →

  • This is a strong HN comment. Lots of "putting a stick in my own bicycle wheel" energy.

    There are tons of fascinating things happening in AI and the evolution of programming right now, and Claude and OpenAI are at the forefront of these. Not trying it because of npm is a vibe and a half.

  • Why? I am not the biggest fan of needing a whole VM to run CLI tools either, but it's a low-enough friction experience that I don't particularly care as long as the runtime environment is self-contained.

  • Same, there are so many options these days for writing CLIs without runtime dependencies. I definitely prefer static binaries.

  • It might shock you, but many of us use editors built on browsers for editing source code.

    I think the encapsulation suggestion from the other guy (Docker or any other of your favorite VMs) might be your solution.

  • What package managers do you use, and what does npm do differently that makes you unwilling to use it?

  • Yep, this is another one of the reasons why all of these tools are incredibly poor. Like, the other day I was looking at the MCP spec from Anthropic, and it might be the worst spec I've ever read in my life. Enshittification at the level of an industry is happening.

  • If OpenAI had really smart models, they would have converted TS/JS apps to Go or Rust apps.

    Since they haven't, AGI is not here.

It's very interesting that both OpenAI and Anthropic are releasing tools that run in the terminal, especially with a TUI, which is what we showcase.

aider was one of the first tools we listed as terminal tool of the week (0) last year (1).

We recently featured parllama (2) (not our tool) if you like to run offline and online models in the terminal with a full TUI.

(0) https://terminaltrove.com/tool-of-the-week/

(1) https://terminaltrove.com/aider/

(2) https://terminaltrove.com/parllama/