Comment by adrian_b
15 hours ago
They have instructions for memcpy/memmove (i.e. rep movs), not for strcpy.
They also have instructions for strlen (i.e. repne scasb), so you could implement strcpy with very few instructions by finding the length and then copying the string.
Running strlen first, then validating the sizes, and then copying with memcpy if the string fits is actually the recommended way to implement a replacement for strcpy, including in the parent article.
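A minimal sketch of that strlen, validate, memcpy sequence in C. The name checked_strcpy and its return convention are hypothetical, chosen only for illustration; they do not come from the parent article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function: copies src into dst only if
   the string and its terminator fit in dst_size bytes.
   Returns 0 on success, -1 if the string does not fit (dst is left untouched). */
static int checked_strcpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);      /* step 1: find the length */
    if (len >= dst_size)           /* step 2: validate, leaving room for '\0' */
        return -1;
    memcpy(dst, src, len + 1);     /* step 3: copy, terminator included */
    return 0;
}
```

The string is traversed twice, but both passes go through library routines that are normally heavily optimized.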
On modern Intel/AMD CPUs, "rep movs" is usually the optimal way to implement memcpy above some threshold of data size, e.g. on older AMD Zen 3 CPUs the threshold was 2 kB. I have not tested more recent CPUs to see if the threshold has diminished.
On the old AMD Zen 3 there was also a certain size range above 2 kB, at sizes comparable to the L3 cache, where the "rep movs" implementation interacted badly with the cache and "non-temporal" vector register transfers outperformed it. Despite that performance bug for certain lengths, using "rep movs" for any size above 2 kB gave good enough performance.
More recent CPUs might be better than that.
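For illustration, a rough sketch of a size-based dispatch to "rep movsb". It assumes GCC or Clang extended inline assembly on x86 and falls back to memcpy elsewhere. The names copy_rep_movsb, copy_bytes and REP_MOVS_THRESHOLD are hypothetical, and the 2048-byte cutoff is just the Zen 3 figure quoted above, not a measured value for any other CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 2 kB Zen 3 figure from the comment; tune per CPU. */
#define REP_MOVS_THRESHOLD 2048

static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *ret = dst;
    /* (R/E)DI = destination, (R/E)SI = source, (R/E)CX = byte count.
       The calling convention guarantees the direction flag is clear. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    return memcpy(dst, src, n);    /* portable fallback */
#endif
}

/* Small copies go to the library memcpy, large ones to rep movsb. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return n >= REP_MOVS_THRESHOLD ? copy_rep_movsb(dst, src, n)
                                   : memcpy(dst, src, n);
}
```

In practice a libc memcpy may already make this kind of choice internally, so a wrapper like this is mostly useful for benchmarking the threshold on a given machine.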
Whoops, this proves I’m not really a userspace assembly programmer…
But you can indeed safely read past the end of a buffer if you don’t cross a page boundary and you aren’t bound by the rules of, say, C.
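The page-boundary condition can be written down as a small check. This is only a sketch: the 4 KiB page size is an assumption (real code would query it), and the name read_stays_in_page is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real size is platform dependent. */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns 1 if reading `width` bytes starting at p stays inside the page
   that contains p, 0 if the read would spill into the next page. */
static int read_stays_in_page(const void *p, size_t width)
{
    assert(width > 0 && width <= ASSUMED_PAGE_SIZE);
    uintptr_t offset = (uintptr_t)p & (ASSUMED_PAGE_SIZE - 1);
    return offset + width <= ASSUMED_PAGE_SIZE;
}
```

A 16-byte vector load at page offset 4080 passes the check, while one at offset 4081 would touch the next page, which may be unmapped.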
x86-64 has the REP prefix for string operations. When combined with the MOVS instruction, that is pretty much an instruction for strcpy.
No, it's an instruction for memcpy. You still need to compute the string length first, which means touching every byte individually because you can't use SIMD due to alignment assumptions (or lack thereof) and the potential to touch uninitialized or unmapped memory (when the string crosses a page boundary).
You do aligned reads, which can't crash.
Not even musl uses a scalar loop, if it can do aligned reads/writes: https://git.musl-libc.org/cgit/musl/tree/src/string/stpcpy.c
And you don't need to worry about C UB if you do it in ASM.
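A sketch of the aligned word-at-a-time idea, applied to strlen. It is written in the spirit of the musl file linked above but is not a copy of it; the name wordwise_strlen is hypothetical. An aligned word never straddles a page boundary, so the load cannot fault, though it may read bytes past the terminator, which strict ISO C does not permit. The may_alias typedef is a GCC/Clang extension, and other compilers get the plain byte loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  ((size_t)-1 / UCHAR_MAX)        /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))    /* 0x8080...80 */
/* Nonzero if and only if some byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

static size_t wordwise_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;
#ifdef __GNUC__
    typedef size_t __attribute__((__may_alias__)) word;

    /* Byte loop until p is word aligned. */
    for (; (uintptr_t)p % sizeof(word); p++)
        if (!*p)
            return (size_t)(p - s);

    /* Aligned word loads: skip whole words that contain no zero byte. */
    const word *w = (const void *)p;
    while (!HASZERO(*w))
        w++;
    p = (const void *)w;
#endif
    /* Locate the exact terminator inside the final word. */
    for (; *p; p++)
        ;
    return (size_t)(p - s);
}
```

Tools such as AddressSanitizer may flag the over-read when the terminator sits near the end of an allocation.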