Comment by menaerus

16 days ago

Interesting, because I can reproduce the results on my machine. It's a pretty hefty 5.3GHz, recentish (Raptor Lake) Intel i7-13850HX CPU:

  ----------------------------------------------------------------------------
  Benchmark                                  Time             CPU   Iterations
  ----------------------------------------------------------------------------
  bench_getuid                             384 ns          384 ns      1822307
  bench_getpid                             382 ns          382 ns      1835289
  bench_close                              390 ns          390 ns      1796493
  bench_syscall                            374 ns          374 ns      1874165
  bench_sched_yield                        611 ns          611 ns      1143456
  bench_clock_gettime                     44.1 ns         44.1 ns     15872740
  bench_clock_gettime_tai                 44.1 ns         44.1 ns     15879915
  bench_clock_gettime_monotonic           44.1 ns         44.1 ns     15887383
  bench_clock_gettime_monotonic_raw       44.4 ns         44.4 ns     15755225
  bench_nanosleep0                       55617 ns         4647 ns       100000
  bench_nanosleep0_slack1                 7144 ns         4362 ns       160448
  bench_nanosleep1_slack1                 7159 ns         4369 ns       160645
  bench_pthread_cond_signal               7.38 ns         7.38 ns     94670062
  bench_assign                           0.523 ns        0.523 ns   1000000000
  bench_sqrt                              8.04 ns         8.04 ns     86998912
  bench_sqrtrec                           11.4 ns         11.4 ns     61428535
  bench_nothing                          0.000 ns        0.000 ns   1000000000

EDIT: also reproducible on my Skylake-X (Gold 6152) machine.

With turbo-boost enabled (@3.7GHz):

  ----------------------------------------------------------------------------
  Benchmark                                  Time             CPU   Iterations
  ----------------------------------------------------------------------------
  bench_getuid                             619 ns          616 ns      1153007
  bench_getpid                             632 ns          627 ns      1150829
  bench_close                              629 ns          626 ns      1110226
  bench_syscall                            617 ns          613 ns      1160239
  bench_sched_yield                        974 ns          969 ns       702773
  bench_clock_gettime                     17.9 ns         17.8 ns     39368735
  bench_clock_gettime_tai                 17.8 ns         17.7 ns     39109544
  bench_clock_gettime_monotonic           17.9 ns         17.8 ns     39591364
  bench_clock_gettime_monotonic_raw       19.0 ns         18.8 ns     38902038
  bench_nanosleep0                       63993 ns         4381 ns       100000
  bench_nanosleep0_slack1                 7445 ns         2115 ns       328474
  bench_nanosleep1_slack1                 7346 ns         2111 ns       334833
  bench_pthread_cond_signal               2.13 ns         2.12 ns    327903411
  bench_assign                           0.167 ns        0.166 ns   1000000000
  bench_sqrt                              1.87 ns         1.85 ns    374885774
  bench_sqrtrec                          0.000 ns        0.000 ns   1000000000
  bench_nothing                          0.000 ns        0.000 ns   1000000000

With turbo-boost disabled (@2.1GHz base frequency):

  ----------------------------------------------------------------------------
  Benchmark                                  Time             CPU   Iterations
  ----------------------------------------------------------------------------
  bench_getuid                            1019 ns         1012 ns       688965
  bench_getpid                            1057 ns         1048 ns       688020
  bench_close                             1039 ns         1029 ns       684537
  bench_syscall                           1010 ns         1003 ns       696919
  bench_sched_yield                       1653 ns         1642 ns       434212
  bench_clock_gettime                     30.7 ns         30.4 ns     22999055
  bench_clock_gettime_tai                 30.5 ns         30.2 ns     23716873
  bench_clock_gettime_monotonic           29.8 ns         29.6 ns     23643198
  bench_clock_gettime_monotonic_raw       30.5 ns         30.3 ns     23277717
  bench_nanosleep0                       65256 ns         5114 ns       100000
  bench_nanosleep0_slack1                11649 ns         3402 ns       197983
  bench_nanosleep1_slack1                11572 ns         3528 ns       209371
  bench_pthread_cond_signal               3.62 ns         3.60 ns    195696177
  bench_assign                           0.255 ns        0.253 ns   1000000000
  bench_sqrt                              3.13 ns         3.10 ns    225561559
  bench_sqrtrec                          0.000 ns        0.000 ns   1000000000
  bench_nothing                          0.000 ns        0.000 ns   1000000000

I wonder why your results are so different. Mine scale almost linearly with the core frequency.
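
A quick sanity check of that scaling claim on the Skylake-X numbers, as a rough Python sketch. It assumes the core really ran at the nominal 3.7GHz and 2.1GHz during the two runs, which the tables themselves don't confirm, and the dictionary and function names are made up for illustration:

```python
# Wall-clock times in ns, copied from the two Skylake-X tables above.
turbo_ns = {
    "getuid": 619, "getpid": 632, "close": 629, "syscall": 617,
    "sched_yield": 974, "clock_gettime": 17.9, "sqrt": 1.87,
}
base_ns = {
    "getuid": 1019, "getpid": 1057, "close": 1039, "syscall": 1010,
    "sched_yield": 1653, "clock_gettime": 30.7, "sqrt": 3.13,
}

# ASSUMPTION: nominal clocks, not measured ones.
TURBO_GHZ = 3.7
BASE_GHZ = 2.1


def slowdown(name):
    """How much slower a benchmark gets when turbo-boost is disabled."""
    return base_ns[name] / turbo_ns[name]


def cycles(ns, ghz):
    """Approximate cycle count: nanoseconds times cycles per nanosecond."""
    return ns * ghz


freq_ratio = TURBO_GHZ / BASE_GHZ  # perfectly linear scaling would match this

print(f"frequency ratio: {freq_ratio:.2f}x")
for name in turbo_ns:
    print(
        f"{name:14s} slowdown {slowdown(name):.2f}x  "
        f"cycles turbo/base: {cycles(turbo_ns[name], TURBO_GHZ):8.0f} / "
        f"{cycles(base_ns[name], BASE_GHZ):8.0f}"
    )
```

Under those assumptions the slowdowns land a bit under the frequency ratio of about 1.76x, which is what "almost linear" scaling would look like; the per-benchmark cycle counts stay roughly constant between the two runs.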

Something is definitely up. Is there a VM? Are you running in a container with seccomp?

Why are your calls to sqrt so slow on your newest machine? Why is sqrtrec free on the others?

  • No VM, no container. I could check the asm later on, but sqrtrec is likely "free" because it was optimized away. There are no fences in the code either, so this might be an artifact of different gcc versions being used on the two platforms.

    As for sqrt, I don't think it is unusually slow compared with the results in the table above. It's definitely not an outlier, since the recorded range is from 1ns to 15ns and I measured 8ns. Why that is so is not the question here.

    A better question is why your results are such a big outlier.