Comment by Joker_vD

20 days ago

> I am not sure if there is a better way to find the fastest code path besides "measure on the target system", which of course comes with its own challenges.

Yeah, and it's incredibly frustrating because there is almost zero theory on how to write performant code. Will caching things in memory be faster than re-requesting them over network? Who knows! Sometimes it won't! But you can't predict what those times will be beforehand which turns this whole field into pure black magic instead of anything remotely similar to engineering or science, since theoretical knowledge has no relation to reality.

At my last job we had one of the weirdest "memory access is slo-o-o-ow" scenarios I've ever seen (and it would reproduce pretty reliably... after about 20 hours of the service's continuous execution): somehow, due to peculiarities of the GC and Linux physical memory manager, almost all of the working set of our application would end up in a single physical DDR stick, as opposed to being evenly spread across 4 stick the server actually has. Since a single memory stick literally can't cope with such high data throughput, the performance tanked. And it took quite some time us to figure out what the hell was going on because nothing came up on the perf graphs or metrics or whatever: it's just that almost everything in the the application's userspace became slower. No, the CPU is definitely not throttled, it's actually 20–30% idle. No, there is almost zero disk activity, and the network is fine. Huh?!

2 comments

Joker_vD

physicsguy 19 days ago

You do have NUMA to control memory placement, it's not that easy to use though: https://blog.rwth-aachen.de/itc-events/files/2021/02/13-open...

Joker_vD 19 days ago

Well, we didn't, for obvious reasons, patch JVM to manually finagle with physical memory allocation — which it probably wouldn't be able to do anyway, being run in a container.