Comment by kccqzy
1 year ago
It's pretty cheap. You can easily find the latency and throughput numbers for different Intel architectures. Here's an example for movdqa, which is a basic aligned 128-bit load: https://www.intel.com/content/www/us/en/docs/intrinsics-guid... Even a 512-bit load isn't much more expensive: https://www.intel.com/content/www/us/en/docs/intrinsics-guid...
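For anyone who wants to see what such a load looks like in code, here is a minimal sketch in C using the SSE2 intrinsic `_mm_load_si128`, which compilers typically lower to movdqa (or fold into the memory operand of the following instruction). The function name `sum_u32_sse2` and the requirement that the length be a multiple of 4 are my own assumptions for illustration, not something from the linked guide.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n 32-bit values from a 16-byte-aligned buffer.
 * Assumes n is a multiple of 4 so every iteration is one full 128-bit load. */
uint32_t sum_u32_sse2(const uint32_t *data, size_t n) {
    assert(((uintptr_t)data % 16) == 0); /* aligned loads fault on misaligned addresses */
    assert(n % 4 == 0);

    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 4) {
        /* Aligned 128-bit load of four 32-bit lanes. */
        __m128i v = _mm_load_si128((const __m128i *)(data + i));
        acc = _mm_add_epi32(acc, v); /* lane-wise 32-bit add */
    }

    /* Spill the accumulator and add the four lanes together. */
    _Alignas(16) uint32_t lanes[4];
    _mm_store_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The 512-bit counterpart would be `_mm512_load_si512`, which needs AVX-512F support and 64-byte alignment; I left it out so the sketch builds on any x86-64 machine.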