Comment by menaerus

15 days ago

No VM, no container. I could check the asm later on, but sqrtrec is likely "free" because it was optimized away. There are no fences in the code either, so this might be an artifact of different gcc versions being used on the two platforms.

As for sqrt, I don't think it is unusually slow compared with the results in the table above. It's definitely not an outlier: the recorded range is from 1ns to 15ns, and I recorded 8ns. Why that is so is not the question here.

A better question is: why are your results such a big outlier?

Are you sure they're outliers? Here's someone else with similar results:

https://arkanis.de/weblog/2017-01-05-measurements-of-system-...

Google also reported similar numbers in 2011, when publicizing their fiber work.

I can also get similar numbers (~68ns) on 9front, though a little higher.

  • The data suggests that they are, and so does common sense. Your point of reference is also a little problematic: there's no code attached, so it's hard for people to validate the measurements.

    Since you have been laser-focused on the "bad" sqrt performance and the obvious optimization of sqrtrec, while ignoring the rest of the results, maybe you can explain why there is such a large difference in your measurements between two platforms that seem very similar in terms of compute. After all, this is a pure compute problem.

    For example, why does a 4.9GHz CPU (AMD Ryzen™ 5 7545U) yield 2x to 4x worse results than a 5.5GHz CPU (AMD Ryzen™ 7 9700X)?

        AMD Ryzen 7 9700X Desktop:
        ----------------------------------------------------------------------------
        Benchmark                                  Time             CPU   Iterations
        ----------------------------------------------------------------------------
        bench_getuid                            38.6 ns         38.5 ns     18160546
        bench_getpid                            39.9 ns         39.9 ns     17703749
        bench_close                             45.2 ns         45.1 ns     15711379
        bench_syscall                           42.2 ns         42.1 ns     16638675
        bench_sched_yield                       81.7 ns         81.6 ns      8623522
        
        AMD Ryzen 5 PRO 7545U Laptop:
        ----------------------------------------------------------------------------
        Benchmark                                  Time             CPU   Iterations
        ----------------------------------------------------------------------------
        bench_getuid                             106 ns          106 ns      6581746
        bench_getpid                             111 ns          111 ns      6271878
        bench_close                              116 ns          116 ns      5944154
        bench_syscall                           85.9 ns         85.9 ns      7317584
        bench_sched_yield                        315 ns          315 ns      2249333
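To make the "2x to 4x" claim concrete, here is a small sketch that computes the per-benchmark slowdown from the two tables above and compares it with the ratio of the quoted clock speeds. The function names are made up for illustration, and the cycle estimate assumes both CPUs really ran at the quoted 5.5GHz and 4.9GHz during the run, which nothing in the thread confirms.

```python
# Mean wall-clock times in ns, copied from the two tables above.
DESKTOP_9700X_NS = {
    "getuid": 38.6, "getpid": 39.9, "close": 45.2,
    "syscall": 42.2, "sched_yield": 81.7,
}
LAPTOP_7545U_NS = {
    "getuid": 106.0, "getpid": 111.0, "close": 116.0,
    "syscall": 85.9, "sched_yield": 315.0,
}

# Clock speeds quoted in the comment, in GHz. Assumption: both cores
# actually ran at these clocks while benchmarking (not verified).
DESKTOP_GHZ = 5.5
LAPTOP_GHZ = 4.9


def slowdown(fast_ns, slow_ns):
    """Return the per-benchmark ratio slow/fast."""
    return {name: slow_ns[name] / fast_ns[name] for name in fast_ns}


def approx_cycles(times_ns, ghz):
    """Convert ns to cycles at a fixed clock: cycles = ns * GHz."""
    return {name: ns * ghz for name, ns in times_ns.items()}


if __name__ == "__main__":
    ratios = slowdown(DESKTOP_9700X_NS, LAPTOP_7545U_NS)
    desktop_cycles = approx_cycles(DESKTOP_9700X_NS, DESKTOP_GHZ)
    laptop_cycles = approx_cycles(LAPTOP_7545U_NS, LAPTOP_GHZ)
    print(f"clock ratio: {DESKTOP_GHZ / LAPTOP_GHZ:.2f}x")
    for name, ratio in ratios.items():
        print(f"{name:12s} {ratio:.2f}x slower  "
              f"~{desktop_cycles[name]:.0f} vs ~{laptop_cycles[name]:.0f} cycles")
```

The slowdowns range from about 2.0x (bench_syscall) to about 3.9x (bench_sched_yield), while the quoted clocks differ by only about 1.12x, so nominal clock speed alone cannot account for the gap.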

    • Because the low-power laptop part has rather different characteristics from the desktop part, according to CPUmark benchmarks. It's not surprising that the low-power part is slower; it's surprising when the newer/faster part is significantly slower for pure CPU operations. Different compilation flags, I guess.

      Edit: And, apparently, because regardless of what I do with `cpupower` and twiddling the governors, the CPU frequency on this machine is still getting scaled. I've run out of time to debug that; I'll update later.

      https://www.cpubenchmark.net/compare/6205vs6367vs4835/AMD-Ry...

      I'm not sure what's up with sched_yield.

      I can also replicate these numbers with `perf bench syscall basic`.
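One way to check the frequency scaling mentioned in the edit is to sample the Linux cpufreq sysfs files while the benchmark runs. The sketch below assumes a Linux machine that exposes `scaling_governor` and `scaling_cur_freq` under `/sys/devices/system/cpu/cpuN/cpufreq/`; the helper name `read_cpufreq` is made up for illustration, and CPUs without those files are simply skipped.

```python
from pathlib import Path


def read_cpufreq(root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} for CPUs exposing cpufreq."""
    result = {}
    for cpufreq_dir in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        try:
            governor = (cpufreq_dir / "scaling_governor").read_text().strip()
            # scaling_cur_freq is reported in kHz.
            cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text())
        except (OSError, ValueError):
            # File missing, unreadable, or not a number: skip this CPU.
            continue
        result[cpufreq_dir.parent.name] = (governor, cur_khz)
    return result


if __name__ == "__main__":
    for cpu, (governor, khz) in read_cpufreq().items():
        print(f"{cpu}: governor={governor} current={khz / 1e6:.2f} GHz")
```

Running this a few times during the benchmark would show whether the cores stay near the expected clock or drop well below it.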
