Disable SMT/Hyperthreading in all Intel BIOSes

8 years ago (marc.info)

Does this mean AMD hyperthreading has a performance + security advantage over currently shipping Intel processors?

Edit: https://www.amd.com/en/corporate/security-updates

> 8/14/18 – Updated: As in the case with Meltdown, we believe our processors are not susceptible to these new speculative execution attack variants: L1 Terminal Fault – SGX (also known as Foreshadow) CVE 2018-3615, L1 Terminal Fault – OS/SMM (also known as Foreshadow-NG) CVE 2018-3620, and L1 Terminal Fault – VMM (also known as Foreshadow-NG) CVE 2018-3646, due to our hardware paging architecture protections. We are advising customers running AMD EPYC™ processors in their data centers, including in virtualized environments, to not implement Foreshadow-related software mitigations for their AMD platforms.

For those on AMD platforms, how do you disable software mitigations for Foreshadow? Is this automatically done by browsers, operating systems and hypervisors?

  • They haven't been susceptible to some of the more recent side-channel attacks, but others have affected everything from Intel to AMD to ARM.

    I haven't really read anything on this most recent set of vulnerabilities and AMD.

    • There is a lot of false equivalence going on about the vulnerability of Intel vs everyone else.

      Yes, Spectre-type issues affect many ranges of processors (including AMD); however, everything shown to date indicates that they're very difficult to exploit.

      There are a large number of Intel-only issues, like Meltdown and this L1 cache attack, that are much more severe and much easier to exploit.

  • With the newer AMD processors having as many real cores as they do, does the cost-benefit analysis of HT/SMT change? I read in a comment here a few weeks ago that turning it off on the newer AMD CPUs can yield better performance because of improved cache-coherency on some workloads (My memory of what I read might be totally wrong).

    • Yes, but it's often a case where deploying finer-grained explicit parallelism works in its favor, due to the much longer dependency chains that can be hidden by the second thread. There are architectures not impacted by this issue, mostly ones with explicit dependency tagging long enough to handle a dTLB fault, but you need closer to double the registers and interleaving by the compiler to get about the same performance as with SMT-2 (aka hyperthreading).

    • I'll defer to whatever the benchmarks say of course, but I don't see why HT would affect cache coherency for normal workloads. If you disable HT you'd still have the same number of threads/processes running on the system, so you still have to schedule the same amount of work and do the same number of context switches.

    • some of the heavily multithreaded applications I use at work see up to a ~27% loss in performance by disabling SMT, others don't see much of a loss at all (on AMD EPYC)

For anyone (understandably) confused about the attacks and mitigations related to L1TF, I've found the Linux kernel documentation on the mitigations[0] to be a great resource.

One interesting thing is that to mitigate L1TF, hyperthreads only need to be disabled if you are running VMs; the userspace mitigations are effective regardless of HT status. There's a catch, though: you can also leave hyperthreading enabled if you disable the Extended/Nested Page Table virtualization feature, but it is noted that this results in a significant performance impact.

[0] https://marc.info/?l=openbsd-cvs&m=152818076013158&w=2
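
On Linux, the kernel's verdict can be read straight from sysfs: `/sys/devices/system/cpu/vulnerabilities/l1tf` reports the active L1TF mitigation and whether SMT is still a concern. A minimal sketch; the status string below is sample text in the kernel's reporting format, not read from a live machine:

```shell
# On a live system you would read the status with:
#   cat /sys/devices/system/cpu/vulnerabilities/l1tf

# Helper: does a reported L1TF status still leave SMT exposed?
l1tf_smt_vulnerable() {
  case "$1" in
    *"SMT vulnerable"*) echo yes ;;
    *)                  echo no ;;
  esac
}

# Sample status as reported by a patched kernel with hyperthreading left on:
status="Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable"
l1tf_smt_vulnerable "$status"   # prints: yes
```

Unaffected AMD parts simply report "Not affected" in the same file, which answers the "how do I know" part of the question above.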

We have disabled Hyper-Threading (HT) on all public-facing servers (running OpenBSD). However, our compute nodes running the Linux kernel see roughly an 80% to near-100% boost for specific scientific workloads, so we run our INTERNAL-NETWORK-ONLY compute nodes with HT on. In places where security is not the primary concern, why not make use of HT for the extra efficiency?

Think and plan before you blanket disable HT on all servers running intel CPUs...

If an attacker is able to run any code on these private servers, I have bigger problems to deal with than HT as an attack vector...

  • If you fully trust the software you're running, I see no reason to disable HT. At this point, I don't think I'd have it running on anything publicly facing, though. That said, I still have it enabled on my work PC & home PC.

  • Yes, I think it is a more significant problem for multi-tenant cloud providers.

  • Agree. A personal computer user could probably even risk it, as long as they don't run untrusted JavaScript (which they shouldn't do anyway, or only under sandboxed/careful conditions).

Does that mean hyperthreading is effectively unpatchably insecure?

Cloud Providers are gonna have a bad time if this is true.

  • You can give both hyperthreads in a physical core to the same tenant, no?

    Scheduling different VMs to run on the same hyperthreaded core at once seems like it can't be good for either VM's performance, even if there were no security concerns. Hyperthreading is much more useful for running multiple threads of the same app, accessing similar instruction caches etc.

    (There's also a question of safety within the VM, but a huge number of cloud users are running effectively one user within their VM.)

    • Yes, you can isolate hyperthread siblings to the same VM but you also need to ensure no host code (userspace or kernel) runs on that core, or the untrusted guest may be able to read values stored in L1 by that code. This is harder to do and likely would result in large performance drops for some workloads (because you are essentially disabling the advantage of locality for data that needs to be accessed from both guest and host environment).
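
      One way to express that pinning with libvirt is `virsh vcpupin`. A sketch that generates the pin commands from the host's sibling lists ("guest0" is a hypothetical domain name; on a real host the pairs come from `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list`):

      ```shell
      # Emit "virsh vcpupin" commands that hand each physical core's two
      # hyperthreads to consecutive vCPUs of the same guest.
      # Input: one thread_siblings_list line per core, e.g. "0,4".
      emit_pins() {
        vcpu=0
        while IFS=, read -r t0 t1; do
          echo "virsh vcpupin guest0 $vcpu $t0"
          echo "virsh vcpupin guest0 $((vcpu + 1)) $t1"
          vcpu=$((vcpu + 2))
        done
      }

      # Sample topology (mirrors a 4-core/8-thread part):
      printf '0,4\n1,5\n' | emit_pins
      ```

      This only covers guest placement; as noted above, host threads still need to be kept off those cores (e.g. with isolcpus or cpusets) for the isolation to actually hold.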

    • Things other than full-on VMs require sandboxed multitenancy, too. Database queries against a "scale-free" database like BigQuery/Dynamo, for example, where two queries from different tenants might actually be touching the same data, with much the same operations, and therefore you'd (naively) want to schedule them onto the same CPU for cache-locality benefits.

    • The latter question is very important indeed. If, for instance, you render websites in your VM, they can (if I understand correctly) potentially read secrets from other processes, like DB credentials and other stuff...

      If the only real solution is to turn off HT/SMT then, seen positively, that should net us a lot faster VMs...

  • Probably in most cases.

    VMware's mitigation options come with disclaimers, meaning: use at your own risk. [1]

    I am still waiting on a comment from Linode [2]

    OpenStack has some knobs you can adjust, but it really depends on your workloads and what risk you are willing to accept. [3]

    AWS has its own custom hypervisor and is said to have worked around the issue. [4] Amazon had info on this before others. It appears they have a special relationship with Intel?

    I have not found any hardware or OS vendors that are willing to say that you can leave HT enabled. It is a very heated topic because folks will have to increase their VM infrastructure anywhere from 5% to 50% depending on their workload profiles. For public clouds, you can't predict workload profiles.

    Edit: Oops I left out the main site for L1TF [5]

    [1] - https://kb.vmware.com/s/article/55806

    [2] - https://blog.linode.com/2018/08/16/intels-l1tf-cpu-vulnerabi...

    [3] - https://access.redhat.com/articles/3569281

    [4] - https://aws.amazon.com/security/security-bulletins/AWS-2018-...

    [5] - https://foreshadowattack.eu/

  • Is it a feasible solution to enable hyperthreading only for threads or forks of the same process? Then they can use this ability, but other processes cannot do timing attacks on this process in this core... I think

    • >Is it a feasible solution to enable hyperthreading only for threads or forks of the same process?

      How does that work on Unix systems, where processes are all forked from one process? And even if you get past that issue, how do you handle less-privileged processes that rely on other security mechanisms (cgroups, pledge, SELinux, chroot, sandboxing)?

    • You could allow processes that have ptrace rights on each other to run simultaneously which would cover most issues, but you’d still run into trouble with JavaScript engines running untrusted code.

  • Thinking about this, they're probably gonna introduce "insecure but cheap" instances for customers that don't mind the chance of data leaks and takeovers...

    • Which is going to be everyone except customers who already have issues with cloud and need special instances because of regulations. Then we'll see the occasional "30,000 credit cards stolen" hack every three years because of this issue, and that'll be it.

      It's another situation like what happened with WEP WiFi encryption ten years ago.

    • That's always been a fundamental part of the proposition of multi-tenant VM hosting, though.

  • Question: If I rent a 4-core AWS instance, does that mean 4 physical cores or 4 hyperthreaded cores? Is there a standard for this definition of "cores" across GCP, DO, Linode, etc.? I don’t have the experience or knowledge about cloud computing but just have a DO instance running a web server. I’m curious.

    • A cloud "vCPU" is a hyperthread and in good providers (EC2/GCE) they are properly pinned to the hardware such that, for example, a 4-vCPU VM would be placed on two dedicated physical cores. This was probably done for performance originally but now it also has security benefits. You can get hints of this by running lstopo on VMs and similar bare metal servers.

      On second and third tier cloud providers, the vCPUs tend to be dynamically scheduled so that they may share cores with other VMs.

  • Hyperthreading is fine on its own, but yes in combination with other CPU features it is effectively impossible to secure.

    Turn it off or sell your Intel chips.

> SMT is fundamentally broken because it shares resources between the two cpu instances and those shared resources lack security differentiators.

I thought the root of one of the Foreshadow problems was that caches are shared across cores, and therefore even with hyperthreading disabled, you still gain information about a process on another core. Am I misinterpreting it?

It does seem like the paranoid thing to do is that each socket gets to be used by only a single user. (I half-jokingly suggested at work that we replace our internal cloud with a Beowulf cluster of Raspberry Pis...)

It also seems like you could design OSes in a way which is more robust to this, e.g., certain cores are only for the kernel and processes running as root, and system calls are inter-processor interrupts, so privileged kernel (or userspace root) data doesn't go into untrusted caches at all.

  • Foreshadow is caused by the L1 cache which is not shared across cores. It may be only a matter of time before L3 attacks are discovered but I don't know of any today.

    • Oh - I forgot the L1 cache isn't shared across cores. That makes sense, thanks.

  • There are cache partitioning implementations to isolate cores from each other, but mainly to prevent noisy neighbors from bumping you out of the higher level caches.

    https://danluu.com/intel-cat/

    Cache timing attacks are old hat in the timing side-channel business; the newer attacks are cooler because the memory maps are not checked and you can determine the caching status of memory not mapped into your process's address space. (AFAIK)

    • It looks like CAT only does allocation of the last-level cache (i.e., L3). The literature claims this could prevent timing attacks, but I don't see how it could. Isn't there enough difference in speed between L3 and L1 that one should be able to extract timing information?

Scary-looking headline on a discussion forum, with instructions to perform a task that the average user would not really understand, and no explanation of the attack vector or even the consequences for any user who doesn't want to take the time (and energy, frankly, at this point) to follow security news.

I'm pretty close to not caring anymore. I hope somebody figures out how to at least fix the security news infrastructure, if fixing security is still a ways off.

EDIT: Scratch that, I assume attack vector is a browser since they mentioned JavaScript.

  • This is not security news at all. This is Theo de Raadt's personal e-mail to the OpenBSD development mailing list, for system developers. It was never intended for consumption by the general public.

    • Hah, okay. I'm not familiar with OpenBSD so I didn't know who Theo was. Well, that would be good to know, let's say, on the HN headline.

      Thanks!

Maybe this is what finally gets me to upgrade from my ~2012 i7-3770. Not because of performance improvements, but to avoid performance degradation from all these security patches...

  • I'm not so sure I'd bother; more of these attacks keep coming out, and odds are you'll just buy a CPU that'll be vulnerable to the next one. Maybe AMD would save you from some.

  • I'm in this exact scenario. I am thinking I might just go with AMD this time around, even if it is mostly an illusory short-term strength over Intel. In the long run I will undoubtedly have to refresh my hardware as new exploits come out, but at least I can take solace in the fact that I'm only worried about a single machine and not a datacenter.

  • I feel the same; maybe I will switch to AMD next time I upgrade. On the other hand, I am happy with my budget dedicated server: its Intel Atom N2800 is so "rustic" that it has not been affected by any of the Intel vulns (yet).

FTA

>>> We are having to do research by reading other operating systems.

So Intel cooperates with business partners like Apple/Microsoft and not with open source. Does that mean Apple and Microsoft can claim to be more secure because they have access to the information needed to fix Intel's issues?

  • A lot of these bugs seem to be found by the Google teams, and since they are heavy Linux users, I'm sure in many cases they have a solution for Linux before Apple or Microsoft does.

  • > Does it mean that Apple and Windows can claim to be more secure because they have access to the information needed to fix Intel's issues ?

    They can claim it, but I would trust a Linux or FreeBSD box over macOS or Windows anytime, even if they get some security info before the open-source operating systems.

  • Intel cooperates with organizations that obey embargoes and don't badmouth their partners in public, like Red Hat, Canonical, and probably the Linux Foundation. Intel does not cooperate with OpenBSD.

    • > badmouth

      a.k.a. truth

      Linus has "badmouthed" Intel in much harsher and more explicit terms. Linux is just too big for them to get away with trying to smear and slander so they ignore him and move on.

There are some comments here talking about the possibility of getting better performance without HT. Here's an article from a test (on Intel only) of that theory: https://www.phoronix.com/scan.php?page=article&item=intel-ht...

In the end, the conclusion is: "Long story short, Hyper Threading is still very much relevant in 2018 with current-generation Intel CPUs. In the threaded workloads that could scale past a few threads, HT/SMT on this Core i7 8700K processor yielded about a 30% performance improvement in many of these real-world test cases."

Will be interesting to see what Apple does about this in their next software update. I can’t imagine many people will be happy if the next software update forcibly disables hyper threading.

(For those who aren’t familiar with Apple devices, Apple don’t expose settings like this to a user, which are usually available in the BIOS on a PC)

  • Will the loss of HT have apparent consequences on most Intel-based Apple hardware? Very few of them are servers under constant, throughput-oriented multithreaded load. I suppose almost all MacBooks and most Mac Pros will not visibly slow down.

    • I would expect disabling HT on dual-core systems to have a noticeable performance impact on the desktop.

It's not clear that he's using "SMT" to refer to AMD specifically as he goes on to talk about "Intel CPUs" and disabling it in "Intel BIOS". Does the Zen architecture have the same issue?

[ marc.info not responding for me, found post linked elsewhere. Edited due to markdown.

URL: http://openbsd-archive.7691.n7.nabble.com/Disable-SMT-Hypert...

Here it is: ]

---

Title: Disable SMT/Hyperthreading in all Intel BIOSes

Posted by Theo de Raadt-2 on Aug 23, 2018; 11:35am

Two recently disclosed hardware bugs affected Intel cpus:

- TLBleed

- L1TF (the name "Foreshadow" refers to 1 of 3 aspects of this bug, more aspects are surely on the way)

Solving these bugs requires new cpu microcode, a coding workaround, AND the disabling of SMT / Hyperthreading.

SMT is fundamentally broken because it shares resources between the two cpu instances and those shared resources lack security differentiators. Some of these side channel attacks aren't trivial, but we can expect most of them to eventually work and leak kernel or cross-VM memory in common usage circumstances, even such as javascript directly in a browser.

There will be more hardware bugs and artifacts disclosed. Due to the way SMT interacts with speculative execution on Intel cpus, I expect SMT to exacerbate most of the future problems.

A few months back, I urged people to disable hyperthreading on all Intel cpus. I need to repeat that:

DISABLE HYPERTHREADING ON ALL YOUR INTEL MACHINES IN THE BIOS.

Also, update your BIOS firmware, if you can.

OpenBSD -current (and therefore 6.4) will not use hyperthreading if it is enabled, and will update the cpu microcode if possible.

But what about 6.2 and 6.3?

The situation is very complex, continually evolving, and is taking too much manpower away from other tasks. Furthermore, Intel isn't telling us what is coming next, and are doing a terrible job by not publicly documenting what operating systems must do to resolve the problems. We are having to do research by reading other operating systems. There is no time left to backport the changes -- we will not be issuing a complete set of errata and syspatches against 6.2 and 6.3 because it is turning into a distraction.

Rather than working on every required patch for 6.2/6.3, we will re-focus manpower and make sure 6.4 contains the best solutions possible.

So please try to take responsibility for your own machines: Disable SMT in the BIOS menu, and upgrade your BIOS if you can.

I'm going to spend my money at a more trustworthy vendor in the future.

Does anyone know the best way to disable hyperthreading on Linux?

  • Okay, I found it. An SMT knob was added alongside the L1TF fixes.

        /sys/devices/system/cpu/smt
        /sys/devices/system/cpu/smt/active
        /sys/devices/system/cpu/smt/control

        active:  Tells whether SMT is active (enabled and siblings online)
        control: Read/write interface to control SMT. Possible values:

        "on"            SMT is enabled
        "off"           SMT is disabled
        "forceoff"      SMT is force disabled. Cannot be changed.
        "notsupported"  SMT is not supported by the CPU

        If control status is "forceoff" or "notsupported" writes are rejected.
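
    Assuming that interface, turning SMT off at runtime is a single write, and `nosmt` on the kernel command line makes it persistent. A sketch with a small validator for the writable values (the sysfs writes are shown as comments since they need root and a kernel with the L1TF fixes):

    ```shell
    # Runtime (root required):
    #   echo off > /sys/devices/system/cpu/smt/control
    # Persistent across reboots: add "nosmt" to the kernel command line.

    # Validator for values the control file accepts for writing:
    smt_control_writable() {
      case "$1" in
        on|off|forceoff) echo yes ;;   # documented writable values
        *)               echo no ;;    # "notsupported" is status-only
      esac
    }

    smt_control_writable off            # prints: yes
    smt_control_writable notsupported   # prints: no
    ```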

I would love to but how in the world do I do this when I'm using Windows and Lenovo doesn't give me an option in the BIOS?

  • One way to achieve something similar would be via a software tool which would set the process affinity to only run on real cores.

    Or you could run Chrome (untrusted JavaScript) only on cores 2 and 3, and run the app that has your secrets on cores 0 and 1. (It is my understanding that cores 2k are real, and 2k+1 is each one's matching "virtual" core.) This way you get both hyperthreading and security. I'm not a security expert though.

    https://bitsum.com/docs/pl/Using%20the%20GUI/using_the_gui.h...

    • I'm not sure it would be that easy since I believe e.g. I/O can go through the System process (or other processes even), which has full affinity. We'd likely have to set thread affinities for all processes/threads. But then it would clash with manually-set affinities, and I'm also not sure if it would have worse performance than actually disabling hyper-threading or not.

      Right now I'm looking at what making a UEFI application to disable HT before boot might involve... not sure if that's too late in the boot process or not.

    • > It is my understanding that 2k cores are real, and 2k+1 is their matching, "virtual" core

      I'm not sure that's true. For example, on a i7-4770 I get:

        $ cat /sys/devices/system/cpu/cpu[0-3]/topology/thread_siblings_list
        0,4
        1,5
        2,6
        3,7
      

      (Of course, that might just be Linux renumbering them)
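
      Either way, the sibling lists are the authoritative source. A sketch that derives which logical CPUs to offline from them, using sample data matching the output above (a live script would read the sysfs files and then write 0 to each sibling's `online` file):

      ```shell
      # Keep the first CPU of each siblings list; print the rest, which are
      # the logical siblings you would offline to disable SMT by hand.
      siblings_to_offline() {
        while IFS=, read -r first rest; do
          if [ -n "$rest" ]; then printf '%s\n' "$rest"; fi
        done
      }

      # Sample data mirroring the i7-4770 output above:
      printf '0,4\n1,5\n2,6\n3,7\n' | siblings_to_offline
      # prints 4, 5, 6, 7 -- one per line
      # Offlining would then be: echo 0 > /sys/devices/system/cpu/cpu4/online  (etc.)
      ```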

So, realistically, how much performance did your average OpenBSD server just lose from following this mitigation?

  • Performance is not a top priority for OpenBSD.

    If you read https://www.openbsd.org/goals.html, the word "performance" does not appear.

    • I think this may be missing the point of the grandparent comment. Rather than interpreting it as an accusation that OpenBSD is sabotaging its users' performance, I think we're all just curious about the relative importance of hyperthreading for real-world workloads, on any OS, in grim anticipation of the potential worst-case scenario where hyperthreading's security woes continue to worsen.

  • I don't think there's a single answer to that. It depends on how CPU-intensive the work that server was handling was.

    I think, though, that if you'd be particularly willing to knowingly allow these kinds of vulnerabilities in exchange for some performance, OpenBSD probably isn't a good fit for you in the first place.

    • > I think, though, that if you'd be particularly willing to knowingly allow these kinds of vulnerabilities in exchange for some performance, OpenBSD probably isn't a good fit for you in the first place.

      I disagree. You may have consciously picked OpenBSD because you believe that security is critical for your business. But if you're renting a server (shared or otherwise) to handle your website and paid for X number of cores, RAM, etc., you establish a baseline for what kind of performance you get out of that setup. If that performance suddenly nosedives 20% overnight because the new mitigation patches turned off hyperthreading, the rig you paid for may have gone from sustainably handling your workload to buckling, causing service degradation, outages, etc. I imagine it could be a real problem. It's not so much "Oh, we can't handle that performance hit, we'll run without it" so much as wanting to know the extent of the damage before they take the plunge.

Does anyone know if disabling SMT has had an effect on vmd(8) performance in -current?

THEO DE SMAASH

I'm curious why these problems with HT didn't get highlighted earlier?

Can I ask: is Intel going to sue us if we compare benchmarks with SMT to no SMT? Like they did with the microcode benchmarks. Intel is in trouble.

I've long felt that there's something less than half-baked about the multi-CPU architecture we're currently using. The hacky contortions HFT coders have come up with to avoid things like false sharing strike me as a big red flag.

https://mechanical-sympathy.blogspot.com/2011/07/false-shari...

How about an architecture more like Erlang's, where you have independent processes with their own CPU core, where each has their own memory, but where you have much faster communications supported at lower hardware levels? Why not have a multi-processor architecture designed for direct support of Hoare CSP-inspired languages?

Hypercube topology: http://web.eecs.umich.edu/~qstout/pap/IEEEM86.pdf

  • Something like this?

    Parallelism is inherent in most problems but due to current programming models and architectures which have evolved from a sequential paradigm, the parallelism exploited is restricted. We believe that the most efficient parallel execution is achieved when applications are represented as graphs of operations and data, which can then be mapped for execution on a modular and scalable processing-in-memory architecture. In this paper, we present PHOENIX, a general-purpose architecture composed of many Processing Elements (PEs) with memory storage and efficient computational logic units interconnected with a mesh network-on-chip. A preliminary design of PHOENIX shows it is possible to include 10,000 PEs with a storage capacity of 0.6GByte on a 1.5cm2 chip using 14nm technology. PHOENIX may achieve 6TFLOPS with a power consumption of up to 42W, which results in a peak energy efficiency of at least 143GFLOPS/W. A simple estimate shows that for a 4K FFT, PHOENIX achieves 117GFLOPS/W which is more than double of what is achieved by state-of-the-art systems.

    https://memsys.io/wp-content/uploads/2017/12/20171003-Memsys...

  • Something like this:

    1) https://www.sciencedirect.com/science/article/pii/S014193311...

    (PDF: https://science.raphael.poss.name/pub/poss.13.micpro.pdf )

    "The Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional “accelerator” approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications.

    2) https://ieeexplore.ieee.org/document/7300441/ (PDF: https://science.raphael.poss.name/pub/poss.15.tpds.pdf )

    "This article advocates the use of new architectural features commonly found in many-cores to replace the machine model underlying Unix-like operating systems. "

  • That's like the Cell processor. For every year that you program for Cell you need at least two years of therapy.

    • If you're going to change the substrate or paradigm, then you need to do a dynamite job of supporting your users. Sony did not do that.

  • No networking can touch silicon-level interconnect between cores or within cores on a single chip, at least for latency. Erlang's model of computation doesn't have much to say about physical implementation, and multi-socket/distributed systems are not performant for latency-critical user applications. For servers and high-performance computing, sure, I guess in theory we could use tons of simple single-core chips, but fabrication costs and energy efficiency would be significantly worsened.

      > No networking can touch silicon-level interconnect between cores or within cores on a single chip

      So how about silicon-level interconnect that looks like networking? As it is now, it seems almost designed to elicit badly non-optimal code.

      > multi-socket/distributed systems are not performant for latency-critical user applications... fabrication costs and energy efficiency would be significantly worsened

      I think there would be tremendous benefits if we started designing multi-socket/distributed systems that could perform in those situations. For one thing, Intel has currently painted itself into a corner with regard to large wafer yields, and AMD is kicking their butts by combining smaller dies.

      https://www.youtube.com/watch?v=ucMQermB9wQ