Comment by namibj

8 years ago

Yes, but this is often a case where deploying finer-grained explicit parallelism works in your favor, because the second thread can hide much longer dependency chains. There are architectures not affected by this issue, mostly ones with explicit dependency tagging long enough to handle a dTLB fault, but you need close to double the registers, plus interleaving by the compiler, to get about the same performance as with SMT-2 (a.k.a. hyperthreading).
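
To make the "interleaving plus double the registers" point concrete, here is a rough sketch in C of what that looks like when done by hand. The `node` type and both function names are made up for illustration; this is not taken from any particular compiler or architecture. `sum_list` is a single dependency chain, where each pointer load has to wait for the previous one. `sum_two_lists` interleaves two independent chains in one instruction stream, so a stall on one chain can overlap with work on the other, at the cost of keeping two sets of live state in registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    uint64_t value;
};

/* One dependency chain: each iteration waits on the previous pointer load. */
uint64_t sum_list(const struct node *n) {
    uint64_t sum = 0;
    while (n != NULL) {
        sum += n->value;
        n = n->next;
    }
    return sum;
}

/* Two independent chains interleaved in a single instruction stream.
 * Note the doubled live state: a, sum_a, b, sum_b. */
uint64_t sum_two_lists(const struct node *a, const struct node *b) {
    uint64_t sum_a = 0, sum_b = 0;
    while (a != NULL && b != NULL) {
        sum_a += a->value;   /* chain A */
        sum_b += b->value;   /* chain B, independent of chain A */
        a = a->next;
        b = b->next;
    }
    /* Drain whichever list is longer. */
    for (; a != NULL; a = a->next) sum_a += a->value;
    for (; b != NULL; b = b->next) sum_b += b->value;
    return sum_a + sum_b;
}
```

With SMT-2 the hardware gets a similar overlap by running `sum_list` on two threads, each with its own architectural register set, which is roughly where the "double the registers" comparison comes from.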