Comment by wucke13

2 days ago

Tangentially relevant: Gernot's list of benchmarking crimes.

https://gernot-heiser.org/benchmarking-crimes.html

Ha, I thought this would be a really useful resource, but I think the people the author is complaining about do much better than most of the benchmarking I see in the industry.

Almost all the benchmarking results I see are just a percentage difference between two arithmetic means, with no statistical analysis whatsoever.

Very common interaction: QA folks say "your change degraded some of our metrics and improved some others". I know they are full of shit because it's impossible that my change improved any perf metrics. I ask for statistical details and they don't have any. The meeting was a waste of time, and the next one will be too.

The fact that I get these reactions suggests that everyone else just lets each other get away with it.
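For what it's worth, the "statistical details" don't have to be elaborate. Here is a minimal sketch, assuming you have several repeated timings per variant and that all timings are positive: it reports the same percentage difference between two means, plus a bootstrap percentile confidence interval around it. The function name `percent_change_ci`, its parameters, and the sample numbers are made up for illustration. The bootstrap is just one reasonable choice here, not the only valid analysis.

```python
import random
import statistics


def percent_change_ci(baseline, candidate, n_boot=10_000, alpha=0.05, seed=0):
    """Percent change of mean(candidate) relative to mean(baseline),
    with a bootstrap percentile confidence interval.

    Hypothetical helper for illustration. Assumes both inputs are
    non-empty sequences of positive measurements (e.g. seconds per run).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible

    def pct(b, c):
        return (statistics.fmean(c) / statistics.fmean(b) - 1.0) * 100.0

    point = pct(baseline, candidate)

    # Resample each group with replacement and recompute the statistic.
    boot = sorted(
        pct(
            rng.choices(baseline, k=len(baseline)),
            rng.choices(candidate, k=len(candidate)),
        )
        for _ in range(n_boot)
    )

    lo = boot[int((alpha / 2) * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


if __name__ == "__main__":
    # Made-up timings, purely to show the call shape.
    before = [100, 101, 99, 100, 102, 98, 100, 101]
    after = [110, 111, 109, 112, 108, 110, 111, 109]
    point, lo, hi = percent_change_ci(before, after)
    print(f"change: {point:+.2f}% (95% CI {lo:+.2f}% to {hi:+.2f}%)")
```

If the interval comfortably straddles zero, "degraded some metrics and improved others" is probably just noise.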

  • Yep. The most recent example that's stuck in my head is actually much worse: they didn't even take the mean! One sample!

    https://github.com/denoland/pm-benchmark

    Check the run bench shell script (there's not much else in the repo anyway).

    • Hey, that's perfectly valid for arguing with your friend about which one to deploy on your server, all else being equal.

      I do this sort of thing all the time to see which tools are faster: ripgrep, ag (the Silver Searcher), grep. MongoDB was one we were arguing about for a while recently.
