For what's written in assembly, lack of portability is a given. The only exceptions would presumably be high level entry points called to from C, etc. If you wanted to support multiple targets, you have completely separate assembly modules for each architecture at least. You'd even need to bifurcate further for each simd generation (within x64 for example).
There indeed have been bugs caused by amd64 assembly code assuming unix calling convention being used for Windows builds and causing data corruption. You have to be careful.
For what's written in assembly, lack of portability is a given. The only exceptions would presumably be high level entry points called to from C, etc. If you wanted to support multiple targets, you have completely separate assembly modules for each architecture at least. You'd even need to bifurcate further for each simd generation (within x64 for example).
Yes, but on projects like that, ease of maintenance is a secondary priority when compared to performance or throughput.
There indeed have been bugs caused by amd64 assembly code assuming unix calling convention being used for Windows builds and causing data corruption. You have to be careful.
SIMD instructions are already architecture dependent