Comment by whalesalad
5 days ago
I find that even with opus 4.6, copilot feels like it’s handicapped. I’m not sure if it’s related to memory or what but if I give two tasks to opus4.6 one in CC and one in Copilot, CC is substantially better.
I’ve been really enjoying Codex CLI recently though. It seems to do just as well as Opus 4.6, but using the standard GPT 5.4.
I have the same experience with Antigravity and Gemini CLI, both using Gemini 3 Pro. The CLI works on the problem with more effort and time. Meanwhile, Antigravity writes shitty Python scripts for a few seconds and calls it a day. The agent harness matters a lot.
Copilot feels like being a caveman, Claude code feels like modern times comparatively.
I think this shows that the model alone isn't the complete story, and that these "harnesses" (as people seem to be calling them) shape a lot of the experienced behavior of these tools.
My analogy is that the model is the engine and the harness is the driver and chassis.
You can have the biggest monster of an engine ever, but if you put it in a tricycle and a grandma is driving, you won't get good results.
Opus 4.6 has a 200k context limit in Copilot. Could be the issue.
As a matter of interest, are you using the Copilot CLI?
Yeah: Copilot CLI using Opus 4.6 vs Claude Code using Opus 4.6.
If you could share, I’d be really interested in hearing a concrete example of the two behaving differently. I work at Microsoft (not on Copilot, though I’m a heavy user, and I use Claude Code in a personal capacity) and would be quite happy to repro and report back to the Copilot CLI team, who are responsive.