Comment by iamflimflam1

1 day ago

From reading the article, they offered their developers both Claude Code and Copilot.

What they wanted was for them to use both and give feedback on which was better.

The developers voted with their feet and didn’t use Copilot.

What Microsoft were hoping was that the opposite would happen...

For months, employees had the option to choose Claude Code or Copilot. Now they don't.

Underlying model choice still has no restrictions. Opus 4.6 is by far the most popular. There are still big $$$ bills going Anthropic's way.

  • Curious if anyone around here stayed on 4.6 (despite having the choice to use 4.7)

    • I went to 4.7, didn't have a choice, found it unsatisfactory, then Claude quietly added in the option to use 4.6, so I'm back on 4.6, and I'm not the only one in my company.

      I had far more hallucinations with 4.7 than 4.6.

      I'll try it again after a few more months for them to get it right, but 4.6 is what changed my mind on LLMs as a tool, and 4.7 felt like a step backwards, so for now I'm sticking with something that has delivered me value, instead of arguing with a model ostensibly better that was making shit up 1 - 2 times a day. It was really disappointing.

      I can give examples if needed, I screenshotted the most aggravating ones, but what worries me is which ones I didn't recognise.

    • I have stuck with 4.6. I fully believe 4.7 can be smarter for truly complex and long running agentic use. But I prefer the more direct, literal mechanistic style and 4.6 seems to be peak Opus for that.

    • Stay with 4.6 if you can; it is disabled (afaik) in the VS Code Claude Code extension.

      4.7 IMO is around 10-20% worse at understanding the intention behind your prompt. You need to put more effort into explaining your intention clearly so it doesn't go off track.

    • 4.7 turned out to be a disaster in multilingual settings, so I have stuck with 4.6 so far. 4.7 seemed to be optimized for (a very specific slice of) coding at the expense of everything else.

    • I still use 4.6 if I need Opus. It's mostly GPT-5.5 for me. Only when I know the model must not do something, like push without running the tests (because AGENTS.md says so), do I switch to 4.6.

      Although GPT's been acting weird since Thursday...

    • I’ve stayed on 4.6. Was thinking of trying 4.7 though just today. Still, I did not jump on it day one.

    • Switched back when 4.7 had an issue last week and it was wayyy faster. I assume mostly because a lot of people have moved off but might consider using it more just for the speed boost.

    • I don't want to change from 4.6 because I'm finding it so good (I could change).

      I've spent the last couple of days building Swift bindings to a monster CPP lib and I've actually had fun.

    • i use 4.6 and i've configured the advisor to be on 4.7, so when something's more complex the advisor can help. at least that's how i do it with claude code, not sure if the others have implemented the concept of advisors.

  • Wouldn't they be forced into API pricing instead of per-seat like that though? That would potentially be a massive cost increase. But I've discovered through talking to colleagues some companies are already doing that. I can't understand why you'd ever do that when you can get VC subsidized pricing for now. At least for all initial in-plan usage. I doubt many developers go past the limit anyway and for those you switch just the extra usage to on demand anyway.

    • Teams is the only one with seat pricing. Teams has a user cap of 150. Enterprise is usage based pricing only now (with a £20/user service charge)

  • I use copilot cli and I can pick Anthropic models. The Microsoft interface seems fine to me, and equivalent. Not sure what the big deal is.

    • Funny I had the opposite experience. The Claude models seemed equivalent to GPT-5.4/5 in a generic harness like Copilot CLI or Opencode or Pi, but Claude Code the app/harness is so much better than all the others that I switched at work, even though I'd much prefer to use a non-proprietary harness (and eventually I do want to get Pi set up to be comparable).

    • Harness makes a difference. Also in copilot you have smaller context for Claude models.

      And you've been on token-based pricing since June 1.

    • Anthropic's Claude harness is much better than Copilot's, i.e. the tools and instructions in each harness are different. Anthropic's is just that much better (for Claude models, likely thanks to some amount of co-development).

      Personally, I looked into Copilot's prompt and saw things that made me put it down immediately to start working on my own. I'm now using OpenCode for reasons and I like it better than any Big AI tool. Using OC with Qwen3.6-MoE (for context) and generally happy with the results.

> The developers voted with their feet and didn’t use Copilot.

This was true in January -- since then, the Copilot CLI team has spent countless hours with engineering leaders and the biggest Claude Code users at the company to understand Copilot's shortcomings, define evals to properly test them head-to-head, and close the gap between the products.

The result? Claude Code usage was organically decreasing and Copilot CLI usage was organically increasing -- when this announcement was made, internal Copilot CLI usage had been greater than Claude Code usage for weeks!

Most of us never had the option for work to pay for Claude Code -- some internal orgs did this. That being said I had a personal Claude Code subscription for a bit.

Honestly I find GitHub Copilot CLI (and now also the new GitHub Copilot app) quite decent. I mostly use it with Opus 4.7, or rarely with GPT-5.5. The VSCode extension is ok, but CLI or app are the better experience IMO.

I wish I could understand the appeal of using Claude Code inside VScode rather than Copilot. I feel like I'm missing something obvious.

  • I'm with you there. I can't stand the CLI that wants to take you away from the mostly bad code it writes. Give me the structure, let me finesse it - to do that I need to actually see it no matter how much Anthropic pretends that it's perfect.

    • I run Claude Code inside an emacs vterm for moderately long-lived work streams, and an ever-shifting set of tmuxes for quick small features or bug fixes. The way I ensure I read the code at least a bit is the same as for wholly hand-written code: I never do git add . but only add one file at a time, and I git diff each file just prior to adding it (except sometimes for code-genned files). I also mostly arrange to do incremental dev, sort of agile where I am the client and Claude is the dev team, and I check the utility of each feature one by one, so what I end up with delights me. It does tend to do more than is needed, so I will mostly delete code it has written rather than fix things. Like, really, not every module's tunable constants need to be overridable from env vars. I am happy with the resulting systems; they have not collapsed into unmaintainable messes yet. Claude in a vterm in emacs is a nice UX: I can think, run shell commands, and look at code or git history while having a longer-running discussion.

    • I just have git diff open in another terminal. Everything I do is in the terminal.
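
For anyone who wants to try the one-file-at-a-time habit described above, here is a minimal sketch in Python. The function names (`review_commands`, `stage_one_by_one`) and the `is_generated` / `confirm` callbacks are made up for illustration, not part of any tool mentioned in the thread; the only real commands assumed are plain `git diff -- <path>` and `git add -- <path>`.

```python
import subprocess
from typing import Callable, Iterable, List


def review_commands(path: str, generated: bool = False) -> List[List[str]]:
    """Return the git commands for one file: show its diff, then stage it."""
    commands: List[List[str]] = []
    if not generated:
        # Look at this one file's unstaged changes before staging it.
        # (Brand-new untracked files show no diff here; read those directly.)
        commands.append(["git", "diff", "--", path])
    # Stage exactly this path, never "git add .".
    commands.append(["git", "add", "--", path])
    return commands


def stage_one_by_one(
    paths: Iterable[str],
    is_generated: Callable[[str], bool] = lambda p: False,
    confirm: Callable[[str], bool] = lambda p: True,
    run=subprocess.run,
) -> List[str]:
    """Diff each file, then stage it only if `confirm` approves it."""
    staged: List[str] = []
    for path in paths:
        *review, add = review_commands(path, is_generated(path))
        for cmd in review:
            run(cmd, check=True)
        if confirm(path):
            run(add, check=True)
            staged.append(path)
    return staged
```

In practice `confirm` would be something like a y/n prompt, and `is_generated` a check against whatever directories hold generated code. The `run` parameter is only there so the loop can be exercised without touching a real repository.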

  • Slightly related (me not understanding) is why Copilot in VS Code is essentially just a CLI interface. Why can't it use the IDE tools (search, LSP, ...)? All it ever does is try to execute grep.

    • Claude’s prompt heavily pushes it towards grep. We have an internal cross-repo semantic search MCP, and to get Claude to consistently use it, a skill and prompting were not enough. A pre-tool-use hook is the answer. Claude will even write one for you if you describe the problem to it :)

    • Someone mentioned here the other day that when you try and give Claude those tools through an MCP or skill it tends to go a bit loopy.

      At the moment it seems like the way it's been trained has been tightly coupled with grep.

      It does feel bizarre though that it doesn't use the symbol servers.

    • Because it’s far far easier to make a text-generation machine generate text that has decades of how-to explanations on the Internet than to correctly work an internal editor API that changes often and isn’t as well-documented.

      Especially if you want effective results.
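
A rough sketch of the pre-tool-use hook idea mentioned a few replies up, in Python. Treat the details as assumptions to check against the Claude Code hooks documentation: that the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields, and that exiting with code 2 blocks the call and passes stderr back to the model. The tool name `mcp__semantic_search__query` is purely hypothetical, standing in for whatever the internal search MCP is called.

```python
import json
import shlex
import sys
from typing import Tuple

# Hypothetical MCP tool name; substitute the real one.
SEARCH_TOOL = "mcp__semantic_search__query"
GREP_LIKE = {"grep", "rg", "ag"}

REDIRECT = (
    f"Use {SEARCH_TOOL} for code search first; "
    "fall back to grep only if it returns nothing useful."
)


def decide(event: dict) -> Tuple[int, str]:
    """Map one hook event to (exit_code, message for stderr)."""
    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input") or {}
    if tool == "Grep":
        return 2, REDIRECT
    if tool == "Bash":
        command = tool_input.get("command", "")
        try:
            words = shlex.split(command)
        except ValueError:
            # Unbalanced quotes etc.: fall back to a naive split.
            words = command.split()
        # Simplification: only looks at the first word, so pipelines
        # like "cd src && grep foo" slip through.
        if words and words[0] in GREP_LIKE:
            return 2, REDIRECT
    return 0, ""


def main() -> None:
    """Entry point when run as a hook: read the event, exit with the verdict."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

To use it as an actual hook you would call `main()` at the bottom of the script and register the script as a PreToolUse command in the Claude Code settings. Blocking every grep outright is probably too blunt for real use; a gentler version might only block the first grep in a session.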

  • > I wish I could understand the appeal of using Claude Code inside VScode rather than Copilot

    MS thinks CoPilot is the Clark Griswold of LLMs when it's really Cousin Eddie...

  • Same, with regard to TUIs in general. The VS Code Copilot Chat extension has really nice integration for 'human in the loop' style agentic development. I built some tooling (https://www.agentkanban.io) to integrate a task board and git worktrees with Copilot Chat.

  • Claude Code will write the whole thing for you. Whereas doesn’t Copilot require input along the way while coding? I.e., it doesn’t do all the programming for you.

    • It can code the whole thing for you. Copilot in VS Code is simply better; people just never tried it.

  • I'm a little the opposite, what's the point of using an IDE with AI? I genuinely don't get it?

    These days I just use Claude Code Desktop or Claude Code in PowerShell. Standalone, not inside an IDE. Honestly, I'm using Desktop more and more as it gets more features.

    The IDE is for me. No AI in it at all. If I want to get Claude to do something specific to a file I just @ the file.

    • Productivity. You generate the skeleton of the code with Codex/Claude Code/et al. and refactor it manually. It's kind of unlikely that an AI agent will be able to one-shot every bit of code in the exact way you want, even with a fat AGENTS.md file. A smart AI-native IDE like Zed will quickly pick up what manual changes you intend to make without you fully typing anything out, especially if they're repetitive. This helps enormously when you're debugging or profiling your code.

    • the obvious answer is because it's easier, faster, and more efficient to flip a true to false right in front of you than it is to prompt an llm.

      if your response is "my prompts don't produce code that needs values flipped, ever." then I would wager you're only touching very simple things with an LLM.

      for me I don't care about the token cost and prompt writing so much as the fact that it's just faster to change 0 to 1, and it leaves me twiddling my thumbs waiting for llm output less.

    • That’s like asking why anyone would use IDE autoformatting, linting, or build tools rather than constantly swapping to a terminal to run their command line versions. As in, why use tool integration in an integrated development environment? Because that’s the entire point. Classic IDE refactoring and code generation tools are limited to explicitly programmed operations, but a well-integrated LLM can do much more and smarter manipulations without you having to context switch and explain the context of what you want done.

    • > what's the point

      Tab completion.

      A smart model can dramatically cut down the time to write complex firewall YAML, relying both on the existing file and the ugly draft (e.g. comma-delimited details of the rules I need) I put out. It becomes 5 minutes of lead time and 20 presses of tab instead of writing a shell/Python script full of edge cases or just copying existing rules as a template and laboriously editing them -- the smart model knows what the specific firewall needs.

      But I'm not a developer, so I use both - haiku via github for tab completion and CC for cli.

    • For Windsurf at least, it makes it easier to control context. I can simply drag and drop a file from the IDE into the chat.

      I can also click on a file referenced by the AI and have it open immediately in the IDE so that I can inspect it.

      Finally, it is a pain to write long, multi-line prompts in a CLI where you can't easily click around to edit different parts.

      The primary weakness I've found in IDE based UI is that it struggles to get through the corporate security in order to run commands.

    • For me, I need to compare the generated code before committing. I also need to read the generated markdown plans for review before committing to execution. The VSCode CC extension also generates clickable links directly to a file if the query has something to do with it.

      All of these are valid use cases for the VSCode CC extension for me.

Microsoft have historically tended to dogfood their own products.

Obviously you want to be aware of what else is on the market, and use the right tool for the job -- but equally if you have a directly competing product, you'd prefer your org's telemetry and suggestions are directed towards improving your own software rather than your competitors'.

  • This was always a little weird to me because Microsoft internally is actively hostile to cross-org collaboration. If you worked in most of Azure you basically have 0 lanes of communication with someone from the Windows team and vice versa. Triply so for stuff like Kusto or Teams which you'd be dogfooding daily. I guess if there's a horrible stop-the-world bug it'd get surfaced through telemetry, but normal user feedback is not a thing.

    Compared to working at other big techs, where I was able to direct msg the engineers on the team for internal protobuf or datalake services in addition to user groups that were generally responsive, it was just strange. Also, Microsoft doesn't have a monorepo, so you can't just commit patches to another team's service because you don't have access to their repos, which is something I pretty regularly do elsewhere.

    • > Microsoft internally is actively hostile to cross-org collaboration

      The Copilot CLI has ushered in the beginning of a change in this dogma -- I've helped dozens of Microsoft engineers get access to GitHub source code so they can contribute to Copilot CLI! It's fun to subvert expectations when a Microsoft IC pitches an improvement and I can respond with "submit a PR!"

Maybe it's just Microsoft moving to more model-agnostic tech within their Copilot. I recently started using Microsoft 365 Copilot because corporate added Cowork, which runs on Opus 4.7 and was better than the alternative we had available. Unlike the "real" Claude Code or Cowork, this only has access to files in a specific OneDrive folder in your personal SharePoint container, so it's much more compliant with things like NIS2.

Technically we're using Copilot and we're paying for it through Microsoft licenses, but it's using Opus 4.7. Even before this, most of our custom agents within M365 Copilot were one of the GPT models.

Or maybe you're right and they want their developers to use the copilot models.

  • Copilot Cowork seems to be the best part of M365 Copilot by a huge margin.

    • I really dislike that I can't customize it with permanent config files, similar to how I can configure a regular GPT model agent. I guess it's probably because it's in the fancy word they use for "beta".

      I haven't really used any other Copilot product in a while since they were so bad compared to our other corporate options, but I'm rather impressed with Cowork inside it. Exactly because we can actually use it without breaking any EU laws.

Copilot was great when folks were semi-attempting to write their own code and needed autocomplete.

There's a large (and growing!) contingent of people who don't write code these days. (Many don't even use the keyboard.)

Wonder if Amazon will do the same with CC and Kiro now that we internally have access to both.

I think Kiro might have some “first mover” advantage internally, but CC feels better to use.

  • I never understand why Amazon even bothers to build their own coding agent.

    GitHub Copilot is in a somewhat similar place as Microsoft's toy but still different -- it was more or less the first coding agent/assistant, and GitHub/VSCode/Microsoft has enough user base and impact to influence individual users and enterprises' choices.

    For Amazon's coding agent -- I just never see anyone outside Amazon even mention Kiro or Amazon Q. Maybe a little bit when Kiro was offering tons of free credits. But I don't think it's even remotely relevant these days. I don't see news about companies adopting Kiro.

    To me, it's just a matter of time before they are sunset, like Chime or a bunch of AWS products.

    • In fairness, Chime had tons of internal use and I quite liked it.

      For Kiro, I agree with you, it seems like wasted effort and Anthropic / OpenAI are miles ahead in their tooling.

    • Is there any proprietary Amazon end-dev/ops facing service that's worth using? I've never had a good experience with any I've tried - CodeBuild, Cloud9, Q, SageMaker, WorkMail, WorkDocs, Chime, OpsWorks,...

      I love AWS at the infrastructure level, but their PaaS tends to be meh, and their end-user directed stuff is usually atrocious.
