Comment by beardedwizard

9 hours ago

I'm a bit disappointed to see "The critiques here are sharp", a Claude tell, in a response which (to me) is trying to subtly argue that HackerRank is not overly reliant on LLMs.

I'm not sure if your intent was to come across as having written this yourself, but it did nothing to dispel my impression that this approach is flawed.

I was also disappointed that you didn't address the variability in scores. I'm inferring that you believe the larger model takes care of the main observation in the post, but I don't really see you directly addressing the points.

Maybe it's just me.

There is variability in scores, and that's expected given that we are ultimately using an LLM to score. At least when I used it 7 months ago, the only way I could avoid it was by keeping the cutoff score low (as low as 10 or 20).

Reading this thread, I'm hoping to minimize the variability even further (even though I know it can't be fully removed).