Comment by dakshgupta

22 days ago

I agree that none perform _super_ well.

I would argue they go far beyond linters now, which was perhaps not true even nine months ago.

To the degree you consider this to be evidence, in the last 7 days, the authors of a PR have replied to a Greptile comment with "great catch", "good catch", etc. 9,078 times.

14 comments


onedognight 22 days ago

I fully agree. About 50% of Claude’s review comments have been useful, which is great. For comparison, I have almost never found a useful TeamScale comment (classic static analyzer). Even more important, half of Claude’s good finds are orthogonal to those found by the human reviewers on our team. I.e., it consistently points out things human reviewers miss, and vice versa.

Sharlin 22 days ago

TBH that sounds like TeamScale just has too verbose default settings. On the other hand, people generally find almost all of the lints in Clippy's [1] default set useful, but if you enable "pedantic" lints, the signal-to-noise ratio starts getting worse – those generally require a more fine-grained setup, disabling and enabling individual lints to suit your needs.
[1] https://doc.rust-lang.org/stable/clippy/

blibble 22 days ago

> To the degree you consider this to be evidence, in the last 7 days, the authors of a PR have replied to a Greptile comment with "great catch", "good catch", etc. 9,078 times.

do you have a bot to do this too?

BlackFly 21 days ago

For it to be evidence, you would need to know the number of Greptile comments made and how many of those comments were instead considered to be poor. You need to contrast the false positive rate with the true positive rate just to plot a single point along a classifier curve. You would then need to contrast that with a control group of experts or a static linter, which means you would need to vary the "conservativeness" of the classifier to produce multiple points along its ROC curve. Only then could you tell whether the classifier is better or worse than your control, by comparing the ROC curves.

Sample number of true positives says more or less nothing on its own.
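
To make the procedure above concrete, here is a minimal sketch of how ROC points could be computed from labeled review comments. Everything in it is an assumption for illustration: the `scores` stand in for a reviewer's confidence in each comment (nothing in this thread says Greptile exposes such a score), the `labels` stand in for a ground-truth judgment of whether each comment flagged a real issue, and the sample data is made up.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[bool]) -> List[Point]:
    """Return one (FPR, TPR) point per threshold, strictest threshold first.

    Lowering the threshold makes the reviewer less conservative: it posts
    more comments, catching more real issues but also raising more noise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one real issue and one non-issue")

    points: List[Point] = [(0.0, 0.0)]  # threshold so strict nothing is posted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: confidence per comment, and whether it was a real issue.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]

curve = roc_points(scores, labels)
print(curve)
print(f"AUC = {auc(curve):.3f}")
```

The same two functions would be run on the control (human experts or a static linter) and the resulting curves compared. A count of "great catch" replies only gives the numerator of one TPR value, which is why it cannot be placed on such a curve by itself.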

boredtofears 22 days ago

That sounds more like confirmation that Greptile is being included in a lot of agentic coding loops than anything

johnsillings 22 days ago

I like number of "great catches" as a measure of AI code review effectiveness

mulmboy 22 days ago

People more often say that to save face by implying the issue you identified would be reasonable for the author to miss because it's subtle or tricky or whatever. It's often a proxy for embarrassment

estimator7292 22 days ago

When mature, functional adults say it, the read is "wow, I would have missed that, good job, you did better than me".
Reading embarrassment into that is extremely childish and disrespectful.

mulmboy 22 days ago

What I'm saying is that a corporate or professional environment can make people communicate in weird ways due to various incentives. Reading into people's communication is an important skill in these kinds of environments, and looking superficially at their words can be misleading.

written-beyond 22 days ago

I mean, how far Rust's own Clippy lints went before any LLMs was actually insane.

Clippy + Rust's type system would basically ensure my software was working as close as possible to my spec before the first run. LLMs have greatly lowered the bar for bringing Clippy-quality linting to every language, but at the cost of determinism.

tadfisher 22 days ago

Not trying to sidetrack, but a figure like that is data, not evidence. At the very minimum you need context which allows for interpretation; 9,078 positive author comments would be less impressive if Greptile made 1,000,000 comments in that time period, for example.
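
As a rough illustration of why the denominator matters, the sketch below computes what share of all comments the 9,078 acknowledgements would represent under a few totals. Only the 9,078 figure comes from the thread; the totals are hypothetical (the 1,000,000 case is the example above), and `acknowledged_share` is a made-up helper name.

```python
ACKNOWLEDGED = 9_078  # "great catch"-style replies quoted in the thread


def acknowledged_share(acknowledged: int, total_comments: int) -> float:
    """Fraction of all review comments that drew a positive author reply."""
    if total_comments < acknowledged or total_comments <= 0:
        raise ValueError("total must be positive and at least the acknowledged count")
    return acknowledged / total_comments


# Hypothetical totals: the real number of comments is not given in the thread.
for total in (10_000, 100_000, 1_000_000):
    share = acknowledged_share(ACKNOWLEDGED, total)
    print(f"{total:>9,} comments -> {share:.2%} acknowledged")
```

The same numerator reads as a strong signal at one total and a weak one at another, and it still says nothing about how many of the unacknowledged comments were wrong.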

fragmede 22 days ago

Over 7 days does contextualize it some, though.
9,078 comments / 7 (days) / 8 (hours) = 162.107, so if this were a human, that person would be making 162 comments an hour, 8 hours a day, 7 days a week?

shimman 21 days ago

Bro stop trying to deflate the boosters, they got wares to sell and shares to dump.