Comment by overfeed
21 days ago
> we've merged almost 1,000 pull requests contributed by Copilot
I'm curious to know how many Copilot PRs were not merged and/or required human take-overs.
textbook survivorship bias https://en.wikipedia.org/wiki/Survivorship_bias
Every bullet hole in that plane is one of the 1k PRs contributed by Copilot. The missing dots, and whole missing planes, are unaccounted for, i.e., "AI ruined my morning".
It's not survivorship bias. Survivorship bias would be if you made any conclusions from the 1000 merged PRs (eg. "90% of all merged PRs did not get reverted"). But simply stating the number of PRs is not that.
As with all good marketing, the conclusions are omitted and implied, no?
Given that GitHub is continuing with the product and marketing to us, it feels sufficient to count that as a conclusion.
If they measured that too it would make it harder to justify a MSFT P/E ratio of 29.6.
I'm curious how many were much more than Dependabot changes.
I see number of PRs as modern LOC, something that doesn't tell me anything about quality.
"We need to get 1000 PRs merged from Copilot" "But that'll take more time" "Doesn't matter"
I do agree that some scepticism is due here but how can we tell if we're treading into "moving the goal posts" territory?
I'd love to know where you think the starting position of the goal posts was.
Everyone who has used AI coding tools interactively or as agents knows they're unpredictably hit or miss. The old, non-agent Copilot has a dashboard that shows org-wide rejection rates for paying customers. I'm curious to learn what the equivalent rejection rate for the agent is for the people who make the thing.
I think the implied promise of the technology, that it is capable of fundamentally changing organizations' relationships with code and software engineering, deserves deep understanding. Companies will be making multimillion-dollar decisions based on their belief in its efficacy.
When someone says that the number given is not high enough. I wouldn't consider trying to get an understanding of PR acceptance rate before and after Copilot to be moving the goal posts. Using raw numbers instead of percentages is often done to emphasize a narrative rather than simply inform (e.g. "Dow plummets x points" rather than "Dow lost 1.5%").
I feel the same about automated dependency updates, but if your tests and verifications are good, these become trivial.
Sometimes a paradigm shift in a dependency gets past your current tests. So it’s always good to read the changelog and plan the update accordingly.
Strong automated tests and verifications seem to be nearly as rare as unicorns, at least if you go by most developers' feelings on this.
It seems places don't prioritize it, so you don't see it very often. Some developers are outright dismissive of the practice.
Unfortunately, AI seemingly won't help with that.