Comment by bee_rider
6 days ago
I’ve only ever gotten, like, slight wording suggestions from reviewers. I wish they would write things like this instead—it is possibly meaningful and eminently do-able (doesn’t even require new data!).
Taking a slightly closer look at the paper, you've got K repositories and create a set of test cases within each repository, totaling 130-ish tests. There may be some 'repository-level' effects, i.e., tasks may be easier in some repos than others.
Modeling the overall success rate then requires some hierarchical modeling. You can consider each repository as a weighted coin, and each test within a repository as a flip of that particular coin. You want to estimate the overall probability of getting heads when choosing a coin at random and then flipping it.
Here are some Gemini hints on how to proceed with getting the confidence interval using hierarchical Bayes: https://gemini.google.com/corp/app/e9de6a12becc57f6
(Still no need for further data!)
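For what it's worth, here is a rough sketch of the weighted-coin model in plain Python. It is only one way to do it, not what the paper or the linked hints do. Assumptions on my part: each repo's success probability is drawn from a Beta(a, b), the quantity of interest is the mean mu = a / (a + b), the prior is flat on mu and flat on log(a + b) over a bounded range, and the posterior is approximated on a grid rather than with MCMC. The per-repo counts are made up for illustration, and the function names are hypothetical.

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_betabinom(k, n, a, b):
    # Beta-binomial log-likelihood of k passes in n tests, dropping the
    # binomial coefficient (it does not depend on a or b).
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def posterior_of_mean(successes, totals, grid=100):
    """Grid posterior over mu = a / (a + b), marginalized over kappa = a + b."""
    mus = [(i + 0.5) / grid for i in range(grid)]
    # Assumed prior range for kappa: 0.1 to 1000, evenly spaced in log10.
    kappas = [10 ** (-1 + 4 * j / (grid - 1)) for j in range(grid)]
    log_w = []
    for mu in mus:
        row = []
        for kappa in kappas:
            a, b = mu * kappa, (1 - mu) * kappa
            row.append(sum(log_betabinom(k, n, a, b)
                           for k, n in zip(successes, totals)))
        log_w.append(row)
    peak = max(max(row) for row in log_w)  # subtract before exp to avoid underflow
    marginal = [sum(math.exp(v - peak) for v in row) for row in log_w]
    z = sum(marginal)
    return mus, [w / z for w in marginal]

def credible_interval(mus, probs, level=0.95):
    """Equal-tailed interval read off the cumulative grid posterior."""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo, hi = mus[0], mus[-1]
    cum, found_lo = 0.0, False
    for mu, p in zip(mus, probs):
        cum += p
        if not found_lo and cum >= lo_q:
            lo, found_lo = mu, True
        if cum >= hi_q:
            hi = mu
            break
    return lo, hi

# Hypothetical data: 5 repos, 26 tests each (130 total), passes per repo.
successes = [9, 4, 12, 7, 10]
totals = [26, 26, 26, 26, 26]

mus, probs = posterior_of_mean(successes, totals)
lo, hi = credible_interval(mus, probs)
print(f"95% credible interval for the overall success rate: [{lo:.3f}, {hi:.3f}]")
```

If the repos really do differ, this interval should come out wider than the one you'd get by pooling all 130 tests into a single binomial, which is the whole point of treating the repos as separate coins. For a real analysis you'd probably want a proper sampler (PyMC, Stan, etc.) and some thought about the prior.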