
Comment by avazhi

2 days ago

I mean, I'm the exact opposite. Ask ChatGPT to write a simple (but novel) script for AutoHotkey, for example, and it can't do it. Gemini can do it perfectly on the first try.

ChatGPT has been atrocious for me over the past year, as in its actual performance has deteriorated. Gemini has improved with time. As for the comment about lacking wit, I mean, sure, I guess, but I use AI either to help me write code and save time or to give me information. I expect wit out of actual humans. That shit just annoys me with AI, and neither ChatGPT nor Gemini is good at not being obnoxious with metaphors and flowery speech.

Sounds like you are using ChatGPT to spit out a script in the chat? If so, you should give 5.2 codex or Claude Code with Opus 4.5 a try. It's night and day.

  • > 5.2 codex or Claude Code with Opus 4.5 a try

    Is using these same models through GitHub Copilot or Replit comparable to using the respective first-party CLIs?

    • I don’t think so. My favorite tool is Codex with the 5.2-codex model. I use GitHub Copilot and Codex at work and Codex and Cursor at home. Codex is better for harder and bigger tasks. I’ll use Copilot or Cursor for small, easy things. I think Codex is better than Claude Code as well.


    • I have GH Copilot from work and a personal Claude Code Max subscription. When I feed the same input (prompts, requirements, spec, rules.md) to the Claude Code CLI and GH Copilot, both using Opus 4.5, the Claude Code CLI gives noticeably better results.

      Maybe there's more going on at inference time with the Claude Code CLI?


  • I find this really frustrating and confusing about all of the coding models. These models are all ostensibly similar in their underpinnings and their basic methods of operation, right?

    So why does it all feel so fragile, like a gacha game?

  • You're holding it wrong.

    • In this case they probably are prompting it "wrong", or at least less well than codex/copilot/claude code/etc. do. That's not a criticism of the user. It reflects the fact that people have put a lot of work into the special case of using these particular tools and making sure they are prompted well, with context etc. When you just type something into chat, you would need to replicate that effort yourself in your own prompt.
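
To make "replicate that effort yourself" a bit more concrete, here is a rough sketch of what hand-assembling context for a plain chat prompt might look like. This is only an illustration: `build_prompt`, the section labels, and the file name are all made up for the example, and none of it is claimed to match how Codex, Copilot, or Claude Code actually build their prompts.

```python
def build_prompt(task: str, files: dict[str, str], rules: list[str]) -> str:
    """Assemble one chat prompt: rules first, then file context, then the task.

    Hypothetical helper for illustration only; real coding tools do far more
    (tool calls, repo search, iteration) than simple string assembly.
    """
    parts = []
    if rules:
        # Project conventions the model should follow.
        parts.append("Rules:\n" + "\n".join(f"- {r}" for r in rules))
    for name, content in files.items():
        # Each relevant file is pasted in with a clear delimiter.
        parts.append(f"File: {name}\n---\n{content.strip()}\n---")
    # The actual request goes last, after all the context.
    parts.append("Task:\n" + task.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        task="Add a hotkey that toggles the window on top.",
        files={"main.ahk": "; existing script contents go here\n"},
        rules=["Target AutoHotkey v2 syntax", "Return the full script"],
    )
    print(prompt)
```

The point is just that the rules, the relevant files, and the task all have to be pasted in by hand, which is the part the dedicated tools do for you.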