
Comment by taurath

21 days ago

> Copilot excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring, and improving documentation.

Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has tests that were themselves built through AI, it's probably not going to work.

I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding, is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.

We've been using Copilot coding agent internally at GitHub, and more widely across Microsoft, for nearly three months. That dogfooding has been hugely valuable, generating tonnes of feedback (and bug bashing!) that has helped us get the agent ready to launch today.

So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.

In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)

(Source: I'm the product lead at GitHub for Copilot coding agent.)

  • > we've merged almost 1,000 pull requests contributed by Copilot

    I'm curious to know how many Copilot PRs were not merged and/or required human takeovers.

  • > In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)

    Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html

    • Nice idea! We're going to try to put together a blog post in the next couple of weeks on how we're using Copilot coding agent at GitHub - including to build Copilot coding agent ;) - and having some live stats would be pretty sweet too.

  • > In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)

    That's a fun stat! Are humans in the #1-4 slots? It's hard to know what processes are automated (300 repos sounds like a lot of repos!).

    Thank you for sharing the numbers you can. Every time a product launch is announced, I feel like it's a gleeful announcement of a decrease in my usefulness. I've got imposter syndrome enough; perhaps Microsoft might want to speak to the developer community and let us know what they see happening? Right now it's mostly the pink slips that are doing the speaking.

    • Humans are indeed in slots #1-4.

      After hearing feedback from the community, we’re planning to share more on the GitHub Blog about how we’re using Copilot coding agent at GitHub. Watch this space!


  • How strong was the push from leadership to use the agents internally?

    As part of the dogfooding, I could see them pushing really hard to have agents make and merge PRs, at which point the data is tainted: you don't know whether the 1,000 PRs were created or merged to meet demand, or because devs genuinely found the agent useful and accurate.

  • > 1,000 pull requests contributed by Copilot

    I'd like a breakdown of this phrase: how much was human work vs. Copilot, and in what form, autocomplete vs. agent? It's not specified, so it seems more like marketing trickery than real data.

    • The "1,000 pull requests contributed by Copilot" datapoint is specifically referring to Copilot coding agent over the past 2.5 months.

      Pretty much every developer at GitHub is using Copilot in their day-to-day work, so its influence touches virtually every code change we make ;)


  • So I need to ask: what is the overall goal of your project? What will you do, say, 5 years from now?

    • What I'm most excited about is allowing developers to spend more of their time on the work they enjoy, and less of it on mundane, boring, or annoying tasks.

      Most developers don't love writing tests, updating documentation, or working on tricky dependency updates - and I really think we're heading toward a world where AI can take that load off and free us up to work on the most interesting and complex problems.


  • > In the repo where we're building the agent, the agent itself is actually the #5 contributor

    How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?

    • Copilot only does work that has been assigned to it by a developer, and all the code that the agent writes has to go through a pull request before it can be merged. In fact, Copilot has no write access to GitHub at all, except to push to its own branch.

      That ensures that all of Copilot's code goes through our normal review process, which requires a review from an independent human.
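
      To make the control concrete, here's a minimal sketch of that kind of guardrail using GitHub's public branch protection REST API. The endpoint is real, but the repo names and token below are placeholders, and this illustrates the review requirement rather than our actual internal configuration:

      import requests

      OWNER, REPO, BRANCH = "example-org", "example-repo", "main"  # hypothetical names
      headers = {
          "Authorization": "Bearer <YOUR_TOKEN>",  # placeholder token
          "Accept": "application/vnd.github+json",
      }

      # Require at least one approving human review before anything merges to
      # BRANCH, with no admin bypass. Pushes to the agent's own branch are
      # unaffected; only merges into the protected branch are gated.
      protection = {
          "required_status_checks": None,  # no CI gate in this sketch
          "enforce_admins": True,
          "required_pull_request_reviews": {"required_approving_review_count": 1},
          "restrictions": None,
      }

      resp = requests.put(
          f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
          headers=headers,
          json=protection,
      )
      resp.raise_for_status()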


  • What's the motivation for restricting to Pro+ if billing is via premium requests? I have a (free, via open source work) Pro subscription, which I occasionally use. I would have been interested in trying out the coding agent, but how do I know if it's worth $40 for me without trying it ;).

    • Great question!

      We started with Pro+ and Enterprise first because of the higher number of premium requests included with the monthly subscription.

      Whilst we've seen great results within GitHub, we know that Copilot won't get it right every time, and a higher allowance of free usage means that a user can play around and experiment, rather than running out of credits quickly and getting discouraged.

      We do expect to open this up to Pro and Business subscribers - and we're also looking at how we can extend access to open source maintainers like yourself.

  • Question you may have a very informed perspective on:

    where are we wrt the agent surveying open issues (say, via JIRA), evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?

    Or, contrariwise, how far are we from having product management agents that track and assign work?

    • Check out this idea: https://news.ycombinator.com/item?id=44030394

      The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.

      If you regularly guide the LLM to “consult a user experience designer”, “adopt the multiple perspectives of a marketing agency”, etc., it will make rather decent suggestions.

      I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.


  • Is Copilot _enforced_ as the only option for an AI coding agent? Or can devs pick and choose whatever tool they prefer?

    I'm interested in the [vague] ratio of {internallyDevelopedTool} vs alternatives - essentially the "preference" score for internal tools (accounting for the natural bias towards one's own agent for testing/QA/data purposes). Any data, however vague, would be great.

    (and if anybody has similar data for _any_ company developing their own agent, please shout out).

  • 400 GitHub employees are using GitHub Copilot day in, day out, and it comes out as the #5 contributor? I wouldn't call that a success. If it were truly useful, I'd expect it to be the #1 contributor in every project: even if each developer wrote only 10% of their code with it, Copilot's pooled share across hundreds of developers would dwarf any single human's contributions.

  • Re: 300 of your repositories... so it sounds like y'all don't use a monorepo architecture. I've been wondering if that would be a blocker to using these agents most effectively. Expect some extra momentum to swing back to the multirepo approach accordingly.

  • What model does it use? gpt-4.1? Or can it use o3 sometimes? Or the new Codex model?

    • At the moment, we're using Claude 3.7 Sonnet, but we're keeping our options open to change the model down the line, and potentially even to introduce a model picker like we have for Copilot Chat and Agent Mode.


  • When I repeated to other tech people from about 2012 to 2020 that the technological singularity was very close, no one believed me. Coding is just the easiest work to automate away into near oblivion. And too many non-technical people drank the Flavor Aid for the fallacy that it can be "abolished" completely soon. It will gradually come for all sorts of knowledge-work specialists, including electrical and mechanical engineers, and probably doctors too. And, of course, office work. Some iota of specialists will remain to tune the bots, and some will remain in the field where expertise is absolutely required, but the roles that offered potential upward mobility into the middle class are being destroyed and replaced with nothing. There won't be "retraining" or hand-waved other opportunities for the "basket of labor", just competition among many uniquely, far-overqualified people for ever-dwindling opportunities.

    It is difficult to get a man to understand something when his salary depends upon his not understanding it. - Upton Sinclair

    • I don't think it was unreasonable to be very skeptical at the time. We generally believed that automation would get rid of repetitive work that didn't require a lot of thought, and in many ways programming was seen as near the top of the heap: intellectually demanding, and requiring high levels of precision and rigor.

      Who would've thought (except you) that this would be one of the things AI would be especially suited for? I don't know what this progression means in the long run. Will good engineers just become 1000x more productive as they manage X number of agents building increasingly complex code (with other agents constantly testing, debugging, refactoring, and documenting it), or will we move to a world with far fewer engineers because there is only a need for so much code?


    • Do you have any textual evidence from this 8-year stretch of your life in which you see yourself as having been perpetually correct? Do you mean that you were very specifically predicting flexible natural-language chatbots, or vaguely alluding to some sort of technological singularity?

      We absolutely have not reached anything resembling anyone's definition of a singularity, so you are very much still not proven correct in this. Unless there are weaker definitions of that than I realised?

      I think you'll be proven wrong about the economy too, but only time will tell there.

  • TBF, you could hardly be more biased on this, so I definitely take your opinion with a whole bottle of salt.

    Without data, a comprehensive study, and peer review, it's a hell no. Would GitHub be willing to submit to academic scrutiny to prove it?

  • > In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)

    Ah yes, the takeoff.

From talking to colleagues at Microsoft, it's a very management-driven push, not a developer-driven one. A friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it, or using it only occasionally. Allegedly it's pretty terrible at C# and PowerShell, which limits its usefulness at MS.

  • [flagged]

    • That's exactly what senior executives who aren't coding are saying everywhere.

      Meanwhile, engineers are using it for code completion and as a Google search alternative.

      I don't see much difference here at all; the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is that each comment is timestamped and there are alternative takes, corrections, and caveats in the comments.


    • What does this have to do with my comment? Did you mean to reply to someone else?

      I don't understand what this has to do with AI adoption at MS (and Google/AWS, while we're at it) being management-driven.

    • There's a large group of people who claim that AI tools are no good, and I can't tell if they're in some niche where the tools truly aren't useful, they don't care to put any effort into learning them, or they're simply in denial.


    • It's just tooling. It costs nothing to wait for it to get better. It's not like you're going to miss out on AGI. The cost of actually testing every slop code generator is non-trivial.

> I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success

Have they tried dogfooding their dogshit little tool called Teams in the last few years? Cause if that's what their "famed" dogfooding gets us, I'm terrified to see what lies in wait with Copilot.

I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]

In any case, I think this is the best use case for AI in programming: as a force multiplier for the developer. It's to the benefit of both AI and humanity for AI to avoid diminishing the creativity, agency, and critical thinking skills of its human operators. AI should be task-oriented, but high-level decision-making and planning should always be a human task.

So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should be about enriching humans' capabilities rather than churning out features for profit, though there are obvious limits to that.

[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...

  • You might want to study the history of technology: how rapidly compute efficiency has increased, as well as how quickly the models are improving.

    In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.

    • Reality check time for you: people were saying this exact thing 3 years ago. You cannot extrapolate like that.

"I want to know that Microsoft, the company famous for dog-fooding is using this day in and day out, with success."

They just cut down their workforce, letting some of their AI people go. So, I assume there isn't that much success.

> Microsoft, the company famous for dog-fooding

This was true up until around 15 years ago. It hasn't been the case since.

Whatever the true stats for mistakes or blunders are now, remember that this is the worst it's ever going to be. And there is no clear ceiling in sight that would prevent it from quickly getting better and better, especially given the current levels of investment.

  • That sounds reasonable enough, but the pace or end result is by no means guaranteed.

    We have invested plenty of money and time into nuclear fusion with little progress. The list of key achievements from CERN[1] is also meager in comparison to the investment put in, especially if you consider their ultimate goal to be applying research to more than just theory.

    [1] https://home.cern/about/key-achievements