
Comment by dalvrosa

8 hours ago

Agreed. For benchmarking I used this <https://github.com/david-alvarez-rosa/CppPlayground/blob/mai...>, which relies on Google Benchmark and pins the producer/consumer threads to dedicated CPU cores.

What else could be improved? Would like to learn :)
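If you want to try the pinning part without the linked harness, here is a minimal sketch in Python on Linux. This is not the code from the linked repo, which is C++. `pin_current_thread` and `run_pinned_pair` are hypothetical helper names, and a `queue.Queue` stands in for whatever queue you're actually benchmarking. The sketch relies on `os.sched_setaffinity(0, ...)` affecting only the calling thread, since Linux treats pid 0 as the current thread. Python's GIL makes any timings meaningless here; the sketch only shows the affinity calls.

```python
import os
import queue
import threading


def pin_current_thread(cpu: int) -> None:
    """Restrict the calling thread to a single logical CPU (Linux only)."""
    # pid 0 means "the calling thread" on Linux, so other threads are unaffected.
    os.sched_setaffinity(0, {cpu})


def run_pinned_pair(producer_cpu: int, consumer_cpu: int, n_items: int = 1000):
    """Run a producer and a consumer thread, each pinned to its own CPU."""
    q: queue.Queue = queue.Queue(maxsize=64)
    affinity = {}  # role -> CPU set the thread actually ended up with
    received = []

    def producer() -> None:
        pin_current_thread(producer_cpu)
        affinity["producer"] = os.sched_getaffinity(0)
        for i in range(n_items):
            q.put(i)
        q.put(None)  # sentinel: tells the consumer to stop

    def consumer() -> None:
        pin_current_thread(consumer_cpu)
        affinity["consumer"] = os.sched_getaffinity(0)
        while (item := q.get()) is not None:
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return affinity, received


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Uses two different CPUs when available, otherwise both land on the same one.
    aff, items = run_pinned_pair(allowed[0], allowed[-1], n_items=10)
    print(aff, len(items))
```

Pinning alone doesn't stop the scheduler from putting other work on those cores, so in a real benchmark you'd usually also isolate them (e.g. with the `isolcpus=` boot parameter or cpusets).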

Maybe using huge pages?

Kernel tick rate is a pretty big one; most people don't bother and just use whatever their OS ships with.
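To see what your distro actually ships, here is a sketch that pulls the tick-related options out of the running kernel's build config. It assumes the config lives at `/boot/config-<release>`, which is common on Debian/Ubuntu/Fedora, though some systems expose `/proc/config.gz` instead. `parse_tick_config` and `read_running_kernel_config` are hypothetical helpers.

```python
import os
from typing import Dict, Optional

# Kernel config options that control the timer tick.
TICK_OPTIONS = ("CONFIG_HZ", "CONFIG_HZ_PERIODIC", "CONFIG_NO_HZ_IDLE", "CONFIG_NO_HZ_FULL")
NOT_SET = " is not set"


def parse_tick_config(config_text: str) -> Dict[str, str]:
    """Extract tick-related options from kernel .config-style text."""
    found: Dict[str, str] = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# ") and line.endswith(NOT_SET):
            # Disabled options appear as comments, e.g. "# CONFIG_NO_HZ_FULL is not set".
            name = line[2:-len(NOT_SET)]
            if name in TICK_OPTIONS:
                found[name] = "n"
            continue
        name, sep, value = line.partition("=")
        if sep and name in TICK_OPTIONS:
            found[name] = value
    return found


def read_running_kernel_config() -> Optional[str]:
    """Read /boot/config-<release> if the distro ships it (an assumption; many do)."""
    path = f"/boot/config-{os.uname().release}"
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    text = read_running_kernel_config()
    print(parse_tick_config(text) if text else "kernel config not found in /boot")
```

`CONFIG_HZ` is fixed at build time, so changing it means running a different kernel build. The `nohz_full=` boot parameter, which lets the listed CPUs skip the periodic tick while they run a single task, only works on kernels built with `CONFIG_NO_HZ_FULL=y`.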

Disabling C-states, pinning network interface interrupts to dedicated cores (and isolating your application from those cores), and `SCHED_FIFO` (`chrt -f 99 <prog>`) help a lot.
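The `chrt -f 99 <prog>` part can also be done from inside the process. Here is a minimal sketch using Python's `os` wrappers around `sched_setscheduler(2)`. `set_fifo_priority` is a hypothetical helper. It needs root, `CAP_SYS_NICE`, or a suitable `RLIMIT_RTPRIO`; without one of those it reports failure instead of raising.

```python
import os


def set_fifo_priority(priority: int = 99, pid: int = 0) -> bool:
    """Rough in-process equivalent of `chrt -f <priority>`; pid 0 = calling thread.

    Returns False instead of raising when the caller lacks the privilege.
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lo, min(priority, hi))  # clamp to the valid FIFO range (1..99 on Linux)
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True


if __name__ == "__main__":
    if set_fifo_priority(99):
        print("running under SCHED_FIFO")
    else:
        print("not permitted (need root or CAP_SYS_NICE)")
```

Be careful with priority 99 on a thread that busy-polls, because it can starve kernel threads on that core. By default Linux RT throttling (`/proc/sys/kernel/sched_rt_runtime_us`, 950000 µs out of every 1000000 µs) leaves a little time for everything else, but some low-latency setups turn that off.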

Transparent hugepages can increase latency without you being aware of when it happens, so I usually disable them.
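Checking and disabling THP is just a sysfs read/write. Here is a sketch that assumes the standard `/sys/kernel/mm/transparent_hugepage/enabled` file, where the kernel marks the active mode with brackets. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(sysfs_text: str) -> Optional[str]:
    """Return the active THP mode; sysfs brackets it, e.g. 'always [madvise] never'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode() -> Optional[str]:
    """Read the current mode, or None if the file is missing or unreadable."""
    try:
        return parse_thp_mode(THP_ENABLED.read_text())
    except OSError:  # not Linux, or a kernel built without THP
        return None


def disable_thp() -> None:
    """Same as `echo never > .../enabled`: needs root and lasts until reboot."""
    THP_ENABLED.write_text("never\n")


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

To make the change persist across reboots, use the `transparent_hugepage=never` boot parameter. `madvise` is a middle ground: only memory regions that explicitly request huge pages via `madvise(MADV_HUGEPAGE)` get them.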

Idk, there's a bunch, but they all depend on your use case. For example, I always disable hyperthreading because I care more about latency than processing power, and I don't want a sibling thread randomly stealing cache from my workload. But some people have more I/O-bound workloads, and in those situations hyperthreading is just a strict improvement.

  • Thanks. Do you happen to know why hyperthreading should be disabled?

    In prod, most trading companies do disable it; I'm not sure about best practices for generic benchmarks.

    • It eliminates cache contention between sibling threads, which would otherwise cause random latency increases.

    • There are some microarchitectural resources that are either statically divided between running threads or "cooperatively" fought over. If you don't need to hide cache-miss latency, which is the only thing hyperthreading is really good at, you're probably better off disabling the supernumerary threads.
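Before disabling SMT, it helps to see which logical CPUs are siblings on the same physical core, for example so the producer and consumer don't end up sharing one. Here is a sketch that reads the standard `thread_siblings_list` sysfs files. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Set, Tuple

CPU_ROOT = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> List[int]:
    """Parse the kernel's CPU list format, e.g. '0-3,8,10-11' (no stride syntax)."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sibling_groups() -> Set[Tuple[int, ...]]:
    """Group logical CPUs that share a physical core (SMT/hyperthread siblings)."""
    groups: Set[Tuple[int, ...]] = set()
    # Offline CPUs have no topology directory, so they are simply not matched.
    for path in CPU_ROOT.glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return groups


if __name__ == "__main__":
    for group in sorted(sibling_groups()):
        print("shared core:" if len(group) > 1 else "single thread:", group)
```

To actually turn SMT off, recent kernels accept `echo off > /sys/devices/system/cpu/smt/control` at runtime or the `nosmt` boot parameter; disabling it in the BIOS also works.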