
Comment by discordance

2 days ago

Sounds like you are using ChatGPT to spit out a script in the chat? If so, you should give 5.2 codex or Claude Code with Opus 4.5 a try; it's night and day.

> 5.2 codex or Claude Code with Opus 4.5 a try

Is using these same models through GitHub Copilot or Replit as capable as using the respective first-party CLIs?

  • I don’t think so. My favorite tool is Codex with the 5.2-codex model. I use GitHub Copilot and Codex at work, and Codex and Cursor at home. Codex is better for harder and bigger tasks. I’ll use Copilot or Cursor for small, easy things. I think Codex is better than Claude Code as well.

    • Are you using the same models and thinking levels for each?

      I too have found Codex better than Copilot, even for simple tasks. But I don't have the same models available, since my work limits the models in Copilot to the stupid ones.

  • I have GH Copilot from work and a personal Claude Code Max subscription, and I have noticed a difference in quality when I feed the same input (prompts, requirements, spec, rules.md) to the Claude Code CLI and GH Copilot, both using Opus 4.5: the Claude Code CLI gives better results.

    Maybe there's more going on at inference time with the Claude Code CLI?

    • It is likely because GH Copilot aggressively (over-)manages context and token spend, probably to hit their desired margins on their plans. But it actively cripples the tool for more complex work IMO. I've had many times where context was obviously being aggressively compacted, and also where it would simply truncate the data it reads once it reaches some limit.

      I do think it is not as bad as it was 4-6 months ago. Still not as good as CC for agentic workflows.

I find this really frustrating and confusing about all of the coding models. These models are all ostensibly similar in their underpinnings and their basic methods of operation, right?

So why does it all feel so fragile, like a gacha game?

You're holding it wrong.

  • In this case they probably are prompting it "wrong", or at least less well than codex/copilot/claude code/etc. That's not a criticism of the user; it's an indication that people have put a lot of work into the special case of using these particular tools and making sure they are prompted well, with context etc. When you just type something into chat, you would need to replicate that effort yourself in your own prompt.