
Comment by NohatCoder

2 years ago

We do actually have some methods of calculating an expected performance. For instance, we know that a Zen 4 CPU can do four 256-bit operations per clock, with some restrictions on what combinations are allowed. We are never going to hit 4 outright in real code, but 3.5 is a realistic target for well optimised code. We can use 1 instruction to detect newline characters within a 32-byte (256-bit) block, then a few more to find the exact location, then a couple to determine if the line is a result, and a few more to extract that result. Given a high density of newlines, this will mean something on the order of 10 instructions per 32 B block searched. Multiply the numbers (3.5 instructions per clock ÷ 10 instructions per block × 32 B per block) and we expect to process approximately 11 B per clock cycle. On a 5 GHz CPU that would mean we would expect to be done in 32 ms, give or take. Of course, the data would need to be in memory already for this time to be feasible, as loading it from disk takes appreciably longer.
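For anyone who wants to redo the arithmetic, here is a rough sketch in Python. The function and variable names are made up for illustration, and the 10 instructions per block is the ballpark from above rather than a measured figure. The input size isn't stated in this comment, so the sketch only works backwards from the 32 ms figure to the amount of data it implies.

```python
# Back-of-envelope throughput estimate. All names here are illustrative.

def bytes_per_clock(instr_per_clock: float, instr_per_block: float, block_bytes: int) -> float:
    """Bytes processed per clock cycle, given sustained instructions per clock
    and the instruction cost of handling one block."""
    return instr_per_clock / instr_per_block * block_bytes


def seconds_to_scan(total_bytes: float, clock_hz: float, bpc: float) -> float:
    """Time to scan total_bytes at bpc bytes per clock on a clock_hz CPU."""
    return total_bytes / (clock_hz * bpc)


# Figures from the estimate above.
bpc = bytes_per_clock(instr_per_clock=3.5, instr_per_block=10, block_bytes=32)
throughput = bpc * 5e9              # bytes per second at 5 GHz
implied_bytes = throughput * 0.032  # data size that a 32 ms scan corresponds to

print(f"{bpc:.1f} B/clock, {throughput / 1e9:.0f} GB/s, "
      f"{implied_bytes / 1e9:.2f} GB scanned in 32 ms")
```

Swapping in a different instruction count or clock speed shows how sensitive the estimate is: going from 10 to 15 instructions per block increases the time by half.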

Of course, you have to spend some effort to actually get code this fast, and that probably isn't worth it for a one-shot job. But jobs like compression, video codecs, cryptography and that newfangled AI stuff all have experts who write code in this manner, for generally good reasons, and they can all ballpark how a job like this can be solved in a close-to-optimal fashion.