Comment by wucke13

8 months ago

Tangentially relevant: Gernot's list of benchmarking crimes.

https://gernot-heiser.org/benchmarking-crimes.html

10 comments

wucke13

Ha, I thought this would be a really useful resource, but I think the people the author is complaining about do much better than most of the benchmarking I see in industry.

Almost all the benchmarking results I see are just a percentage difference between two arithmetic means, with no statistical analysis whatsoever.
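
For illustration, here is a minimal sketch of the kind of statistical detail that could accompany such a percentage: a percentile bootstrap confidence interval on the change in mean. The function name `bootstrap_pct_diff` and the sample values are hypothetical, made up for this example. The bootstrap is only one of several reasonable choices, not a method anyone in this thread prescribed.

```python
import random
import statistics


def bootstrap_pct_diff(baseline, candidate, n_resamples=10_000, alpha=0.05, seed=0):
    """Return (point, lo, hi): percent change in mean of candidate vs. baseline,
    with a percentile bootstrap confidence interval at level 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    point = 100.0 * (statistics.fmean(candidate) / statistics.fmean(baseline) - 1.0)

    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(100.0 * (statistics.fmean(c) / statistics.fmean(b) - 1.0))

    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return point, lo, hi


# Hypothetical latency samples in milliseconds (invented for illustration).
baseline = [102.1, 99.8, 101.5, 100.2, 98.9, 103.0, 100.7, 99.5]
candidate = [101.0, 100.9, 99.2, 102.3, 100.1, 98.7, 101.8, 100.4]

point, lo, hi = bootstrap_pct_diff(baseline, candidate)
print(f"change in mean: {point:+.2f}% (95% bootstrap CI {lo:+.2f}% to {hi:+.2f}%)")
if lo <= 0.0 <= hi:
    print("interval includes 0: the data do not show a real difference")
```

If the interval straddles zero, a bare "X% better" or "X% worse" cannot be told apart from run-to-run noise.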

Very common interaction: QA folks say "your change degraded some of our metrics and improved some others". I know they are full of shit because it's impossible that my change improved any perf metrics. I ask for statistical details and they don't have any. The meeting was a waste of time, and the next one will be too.

The fact that I get these reactions suggests that everyone else just lets each other get away with it.

porridgeraisin 8 months ago
Yep. The most recent example that's stuck in my head is actually much worse: they didn't even take the mean! One sample!
https://github.com/denoland/pm-benchmark
Check the run bench shell script (there's not much else in the repo anyways)
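
For contrast with a single sample, a rough sketch of collecting repeated measurements might look like the following. The helper `time_command` is hypothetical, and the command being timed (`python -c pass`) is only a placeholder. This is not what the linked repo does.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10, warmup=1):
    """Run argv repeatedly and return the wall-clock duration of each run in seconds."""
    for _ in range(warmup):
        # Warm-up runs are discarded so cold caches do not skew the samples.
        subprocess.run(argv, check=True, capture_output=True)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return samples


# Placeholder command: start the Python interpreter and exit immediately.
samples = time_command([sys.executable, "-c", "pass"], runs=5)
print(
    f"n={len(samples)} median={statistics.median(samples):.4f}s "
    f"min={min(samples):.4f}s stdev={statistics.stdev(samples):.4f}s"
)
```

Reporting the number of runs and the spread alongside the central value at least lets a reader judge whether a difference between two tools exceeds the noise.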
- genewitch 8 months ago
  
  Hey, that's perfectly valid for arguing with your friend about which one to deploy on our server, all else being equal.
  I do this sort of thing all the time to see which tools are faster: ripgrep, ag (The Silver Searcher), grep. MongoDB was one we were arguing about for a while recently.
  