Intel Skylake/Kaby Lake processors: broken hyper-threading

9 years ago (lists.debian.org)

The problem description is short and scary:

Problem: Under complex micro-architectural conditions, short loops of less than 64 instructions that use AH, BH, CH or DH registers as well as their corresponding wider register (e.g. RAX, EAX or AX for AH) may cause unpredictable system behavior. This can only happen when both logical processors on the same physical processor are active.

I wonder how many users have experienced intermittent crashes etc. and just nonchalantly attributed it to something else like "buggy software" or even "cosmic ray", when it was actually a defect in the hardware. Or more importantly, how many engineers at Intel, working on these processors, saw this happen a few times and did the same.

More interestingly, I would love to read an actual detailed analysis of the problem. Was it a software-like bug in microcode, e.g. neglecting some edge case, or a hardware-level race condition related to marginal timing (that could be worked around by e.g. delaying one operation by a cycle or two)? It reminds me of the bugs collected at http://danluu.com/cpu-bugs/, and suggests to me that CPU manufacturers should do more regression testing, and far more of it. As test material I would recommend demoscene productions, cracktros, and even certain malware, since they tend to exercise the hardware in ways that more "mainstream" software wouldn't come close to. ;-)

(To those wondering about ARM and other "simpler" SoCs in embedded systems etc.: they have just as many hardware bugs as PCs, if not more. We don't hear about them often, since they are usually worked around in software that is customised exactly for the application and doesn't change much.)

  • A few past lives ago, I used to work on the AIX kernel at IBM. I once spent a few weeks poring through trace data trying to investigate a very mysterious cache-aligned memory corruption induced by a memory stress test. Our trace data was quite comprehensive, and was always turned on due to its very low overhead. It was concerning enough (and took me long enough) that it eventually sucked in the rest of my team to aid in the investigation. None of these other guys were noobs: a couple of them had (at the time) over 20 years of experience with this system, and with diagnosing similar memory corruption bugs beyond any doubt (many were due to errant DMAs from device drivers). I had too, though for much less than 20 years.

    After several full days of team-wide debugging, we had no better explanation based on the available evidence than cosmic rays, or a hardware bug. IBM's POWER processor designers worked across the street from us, so we tried to get them to help: first by asking nicely, then by escalating through management channels.

    Their reply was more or less: we've run our gamut of hardware tests for years, and your assertion that it's hardware related is vanishingly unlikely... we don't look into hardware bugs unless you can prove to us beyond a doubt it's hardware related. Cache-aligned memory corruption without any other circumstantial evidence isn't enough.

    With the crashed test system having sat in the kernel debugger for several weeks by then, there was no more circumstantial evidence to be had beyond the traces. By all accounts, a corruption like this was never seen again.

    If we were right and it was evidence of a hardware failure, this is one way such a problem can go undetected. I hope it was something else, or even a cosmic ray, but we'll never know for sure, I guess.

    • I understand that someone at Microsoft Research once found a bug in the Xbox 360's memory subsystem by model checking a TLA+ spec of it. The story goes that IBM initially refused to believe the bug report. A few weeks later they admitted that such a bug did indeed exist, had been missed by all their testing, and would have resulted in system crashes after about 4 hours of use.

    • Did you by chance see the paper a year back or so outlining that memory errors are more likely to occur near page boundaries? The author's premise was that a lot of 'cosmic rays' are just manufacturing flaws.


  • > short loops of less than 64 instructions that use AH, BH, CH or DH registers as well as their corresponding wider register (e.g. RAX, EAX or AX for AH)

    This is yet another of the many places where the complexity of the x86 ISA shows up and makes its hardware implementations more complicated: the x86 ISA has instructions which can modify the second-lowest byte of a register while keeping the rest of the register unmodified (but AFAIK no instructions which do the same for the third-lowest byte, showing its lack of orthogonality).

    For in-order implementations, like the ones which originated the x86 ISA, it's not much of a problem. But for out-of-order implementations, which do register renaming, partial register updates are harder to implement, since the output value depends on both the output of the instruction and the previous value of the register. The simplest implementation would be to make an instruction depending on the new value wait until it's committed to the physical register file or equivalent, and that's probably how writes to these partial registers were handled before Skylake.

    For Skylake, they probably optimized partial register writes to these four partial "high" registers (AH, BH, CH, DH), but the optimization was buggy in some hard-to-hit corner case. That corner case probably can only be reached when some part of the out-of-order pipeline is completely full, which is why it needs a short loop (so the decoder is not the bottleneck, AFAIK there's a small u-op cache after the decoder) and two threads in the same core (one thread is probably not enough to use up all the resources of a single core). The microcode fix is probably "just" flipping a few bits to disable that optimization.

    And this shows how an ISA is more than just the decoding stage; design decisions can affect every part of the core. In this case, if your ISA does not have partial register updates (usually by always zero-extending or sign-extending when writing to only part of a register, instead of preserving the non-overwritten parts of the previous value), you won't have the extra complexity which led to this bug. AMD partially avoided this when doing the 64-bit extension (a partial write to the lower 32 bits of a register clears the upper 32 bits), but they kept the legacy behavior for writes to the lower 16 bits, or to either of the 8-bit halves of the lower 16 bits.

    • The loop needs to be short because the loopback buffer is only active in loops of 64 or fewer entries (usually fewer real instructions, something like 40 or so). Moreover, Skylake introduced one loopback buffer per thread, instead of the previous loopback buffer shared between both threads.

      My guess is that is where the bug is; the behavior for partial register access stalls (inserting one extra uop to combine, e.g., AH with RAX) has been unchanged since Sandy Bridge.


    • It's also a problem with SMT[1]. The design cost is pretty small, it's a fairly straightforward extension of what an out of order CPU is already doing. But due to the concurrency issues debugging/verifying it is incredibly difficult.

      [1] Simultaneous multithreading, which Intel markets under the name Hyper-Threading when using two threads.

    • This is an amazing analysis, and seems entirely likely to be right to me. Thanks for writing it up.

    • You really don't know what you're talking about.

      ---

      > For Skylake, they probably optimized partial register writes to these four partial "high" registers (AH, BH, CH, DH), but the optimization was buggy in some hard-to-hit corner case.

      They did not do this.

      The high registers (AH/BH/CH/DH) are nearly written out of existence by the REX prefix in 64-bit mode. The manual(s) effectively call out not to use them, as they're now emulated and not supported directly in hardware.

      The 16-bit registers (AX/BX/CX/DX) are in a worse situation: it ends up costing additional cycles just to decode these instructions, as the main decoder can't handle them and you have to switch to the legacy decoder, losing alignment along the way. This costs ~4-6 cycles. Also, the performance counters to track this were only added in Haswell (and require ring 0 to use [2]).

      High registers and 16-bit registers are a huge wart that Intel seems to be trying desperately hard to get us to stop using.

      > That corner case probably can only be reached when some part of the out-of-order pipeline is completely full, which is why it needs a short loop (so the decoder is not the bottleneck, AFAIK there's a small u-op cache after the decoder)

      There is a 64-uop cache between the decoder and the L1i cache, called the loop stream detector. Normally it exists to do batched writes to the L1i cache.

      But in _some_ scenarios, when a loop fits completely within this cache, it will be given extreme priority. This is a way to max out the 5 uops per cycle Intel gives you [1]. It'll flush its register file to the L1 cache piecemeal as it continues to predict further and further ahead, speculatively executing EVERY PART OF IT in parallel. [3]

      In short, this scenario is extremely rare. Uops have stupidly weird alignment rules, which you can boil down to:

          Intel x64 processors are effectively 16-byte VLIW RISC processors
          that can pretend to be 1-15 byte AMD64 CISC processors at a minor performance
          cost. 
      

      ---

      The real issue here is whether, when Loop Stream mode ends, it properly reloads the register file and OoO state.

      This is likely just a small microcode fix. The 8-bit low/8-bit high/16-bit/32-bit/64-bit weirdness is likely because somebody wasn't doing alignment checks when flushing the register file.

      ---

      [1] On Skylake/Kaby Lake. Ivy Bridge, Sandy Bridge, Haswell, and Broadwell limited this to 4.

      [2] Volume 3, performance-counting registers. I think we're up to 12 now on Broadwell.

      [3] Volume 3 Chapter 3.4.1.7 (Page 107)


  • CPU manufacturers do do huge amounts of testing, and Intel does formal verification of some functional units. The reliability is far better than most software, in part because making a new release costs billions.

    • In my limited experience, their root cause analyses are really impressive as well, with lots of internal attention and resources. I'm not allowed to talk about any Intel issues, but we reported a very strange issue to Nvidia, sent a couple of dozen cards back and six months later got a truly fascinating report back with hundreds of pages of compute test result tables and electron microscope images and chemistry lab reports. Anything that hints of a manufacturing problem is taken incredibly seriously.


    • That's absolutely true. When it comes to CPU/memory, skilled software engineers always think, "it must be my bug, it always is".

      So in that super rare case of actually running into a CPU defect, it's a mindfuck, it'll drive you crazy. You'll be looking for the flaw in your algorithm which makes it fail once a week under production load. But you just can't find it, it makes no sense ...

      (When it comes to drivers for network/storage/graphics etc devices, it's a whole different story. Those things are piles of bugs that need work-arounds in drivers.)


    • It may be far better than average software quality but far more also relies on it. The question is whether the quality is adequate in light of what is at stake.


    • IIRC, weren't there some blog posts about how Intel has cut its verification/QA drastically since 2010? Is this the result?


    • This seems like precisely the sort of thing that a competent manufacturer should rule out formally. Formal verification of individual FUs isn't exactly ambitious...

      I think we're getting to levels of complexity where the process Intel uses, with lots of different QA and testing teams doing their best to look for bugs, just isn't going to cut it. We need formally verified models transformed step-by-verified-step all the way down to the silicon. It's already feasible, with free tools, to formally verify your high-level model (using e.g. LiquidHaskell) and then transform this to RTL (using e.g. Clash). With Intel's QA/testing budget, it's well within reach to A) verify the transformation steps and B) figure out how to close the performance gap between machine-generated (but maybe slower) and hand-rolled (faster, but evidently wrong) silicon.


  • > I wonder how many users have experienced intermittent crashes

    I wonder if it's exploitable ;) Maybe that's why they never release the details of these CPU bugs.

    > Was it a software-like bug in microcode e.g. neglecting some edge-case, or a hardware-level race condition related to marginal timing

    Not sure about microcode, these x86 cores execute many simple operations natively, by means of dedicated circuits. Microcode is only involved in emulation of complex x86 instructions.

    And hardware problem doesn't have to be marginal timing. It could simply be a logic bug, i.e. the circuit operates as designed but it was designed to do something else than it should be doing in some unforeseen circumstances.

  • I feel like a lot of the processors Intel has released recently have had problems like this. Intel's Bay Trail processors like the Celeron J1900 have a huge problem around power state management (https://bugzilla.kernel.org/show_bug.cgi?id=109051) that's unlikely to ever get resolved and makes those processors almost unusable under a lot of conditions (random hard hangs on systems without watchdog timers really kind of suck). I wonder if Intel has been more lax recently with how the systems get tested?

  • I've no knowledge about IC design, but it sounds to me that even the biggest name in CPU industry doesn't (or do they ever) do formal verification? Is the process like when I'm writing some mediocre code and say to myself: "hmm, it probably works", and throw the bunch into the version control (whereas they throw it to the wafer fab)?

  • > I would recommend demoscene productions, cracktros, and even certain malware

    Modern PC demoscene productions don't really do very funky things CPU-wise anymore. Most run mostly in shaders, actually. Amiga and C64 are a different story, but Intel isn't making that many Amiga CPUs :-)
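The partial-register rules discussed earlier in this thread can be modelled with ordinary 64-bit integer arithmetic: writing AH or AX merges the new bits into the old RAX value, while a 32-bit write to EAX zero-extends and discards the old upper half. Below is a minimal shell sketch of the architectural result only; it says nothing about how any particular core renames or merges registers, and the function names and the example register value are hypothetical.

```shell
#!/bin/sh
# Architectural effect of x86-64 partial-register writes, modelled with
# 64-bit shell arithmetic. Arguments are numbers (decimal or 0x-hex);
# results are printed in decimal, use hex64 to display them.

# write_ah OLD_RAX BYTE: replace bits 15..8, keep the other 56 bits.
# The result depends on OLD_RAX, i.e. the write needs a merge.
write_ah() {
    echo $(( ($1 & ~0xff00) | (($2 & 0xff) << 8) ))
}

# write_ax OLD_RAX WORD: replace bits 15..0, keep the upper 48 bits.
write_ax() {
    echo $(( ($1 & ~0xffff) | ($2 & 0xffff) ))
}

# write_eax OLD_RAX DWORD: a 32-bit write zero-extends into RAX, so the
# old value is ignored and no merge with the previous RAX is needed.
write_eax() {
    echo $(( $2 & 0xffffffff ))
}

# hex64 VALUE: print a non-negative value as 16 hex digits.
hex64() {
    printf '0x%016x\n' "$1"
}
```

For example, `hex64 "$(write_ah 0x1122334455667788 0xab)"` prints `0x112233445566ab88`, whereas `hex64 "$(write_eax 0x1122334455667788 0xdeadbeef)"` prints `0x00000000deadbeef`. Only the first two functions need the old register value as an input, which is the dependency an out-of-order core has to track.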

  The issue was being investigated by the OCaml community since
  2017-01-06, with reports of malfunctions going at least as far back as
  Q2 2016.  It was narrowed down to Skylake with hyper-threading, which is
  a strong indication of a processor defect.  Intel was contacted about
  it, but did not provide further feedback as far as we know.
 
  Fast-forward a few months, and Mark Shinwell noticed the mention of a
  possible fix for a microcode defect with unknown hit-ratio in the
  intel-microcode package changelog.  He matched it to the issues the
  OCaml community were observing, verified that the microcode fix indeed
  solved the OCaml issue, and contacted the Debian maintainer about it.
 
  Apparently, Intel had indeed found the issue, *documented it* (see
  below) and *fixed it*.  There was no direct feedback to the OCaml
  people, so they only found out about it later.

Inexcusable.

  • They forgot to follow up on a support ticket. As your quotation mentions, the issue was documented and fixed. Calling that "inexcusable" is a bit strong, don't you think?

    I'm not a particularly big fan of Intel's practices, but the reactions in this thread seem a bit too strong to me.

    • It wouldn't be a big deal if this was just software, but the fact that Intel allowed a PROCESSOR bug to be reported, tested, and fixed without telling anyone that the bug actually exists is honestly horrible. You can't just let CPU bugs slip under the radar, since they can throw the stability and reliability of the entire system into question. The people that reported it shouldn't have to dig through Intel microcode updates and test different fixes to see if the bug they found was fixed. Hardware manufacturers (especially processor manufacturers) need to be held to a higher standard when it comes to bug reporting, and this kind of behavior really has no excuse.


  • What exactly do you find inexcusable here?

    • I don't mind Intel keeping very quiet about fixed-in-microcode bugs that don't directly affect valid userspace programs, like

      « Instruction Fetch May Cause Machine Check if Page Size and Memory Type Was Changed Without Invalidation »

      or

      « Execution of VAESIMC or VAESKEYGENASSIST With An Illegal Value for VEX.vvvv May Produce a #NM Exception »

      but something like this should be announced clearly.

      (I keep my microcode packages up to date, but I don't normally bother rebooting when an update comes in.)


    • The contempt for users. I know what I do when a user files a real bug: respond to them, acknowledge it's a problem, tell them when it's fixed.

      The fact that Intel does not do that with a bug of this magnitude shows how much respect they have for their users.


  • We had been seeing this one on and off on some of our machines, and were already at least mentally pointing the finger to the LWT library. Turns out these machines were affected. That's one less worry.

The latest intel-microcode package from Ubuntu 16.04 does not fix the problem. I installed the same package from Ubuntu 17.10 [0] which fixes the problem. You can check your system with the script linked in the mailing list thread [1].

[0] https://packages.ubuntu.com/en/artful/amd64/intel-microcode/...

[1] https://lists.debian.org/debian-devel/2017/06/msg00309.html

  • Indeed, the latest Intel microcode published for Ubuntu 16.04 is the ancient 20151106 [1]. Later Ubuntu releases do have more recent microcode packages [2]. I cannot understand why they left out 16.04 there. So much for LTS, it seems.

    This recently came to my attention while debugging some increasingly frequent lockups, which took me a solid week of eliminating all seemingly more likely causes (VirtualBox, nVidia driver, faulty RAM, etc). In the end I found the culprit while digging into the Intel Specification updates: my Core i7-5820K (and most other Haswell-E and Broadwell processors) has a bug when leaving package C-states, and the only workaround is to disable C-states above level 1. Timely updated microcode, which applies this workaround, would have saved me my week.

    [1] https://launchpad.net/ubuntu/xenial/+source/intel-microcode

    [2] https://launchpad.net/ubuntu/+source/intel-microcode/+change...

    • > Indeed, the latest Intel microcode published for Ubuntu 16.04 is the ancient 20151106.

      By ancient, perhaps you mean the version that was current at the time 16.04 shipped?

      > I cannot understand why they left out 16.04 there. So much for LTS, it seems.

      See https://wiki.ubuntu.com/StableReleaseUpdates. The point of an LTS (or any stable release, for that matter) is that it doesn't change by default. For those who want to keep everything up-to-date, Ubuntu ships a new release on a six month cadence. If you choose not to use that, then you shouldn't be surprised when things aren't updated, since that's exactly what you opted in to.

      The microcode package may warrant an exception, however, and we have a bug to track that. It's tricky because without the source we cannot pick apart what changed, or determine whether any changes meet our update policy. We have to be careful. Sooner or later some user will inevitably come along to tell us that a microcode update broke things, and ask why we didn't fulfill our LTS promise by not changing it.


  • How do you check if the issue is actually fixed after installing the microcode fix?
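The Perl script from the Debian mailing-list thread is the thorough answer. As a rough illustration of what such a check involves, here is a minimal shell sketch that reads the CPU family, model and loaded microcode revision from /proc/cpuinfo and compares the revision with a minimum you supply. The function names are hypothetical; the model list (family 6, models 78 and 94 for Skylake, 142 and 158 for Kaby Lake) is an assumption rather than the advisory's exact list; and the minimum revision is deliberately left as a parameter, to be looked up in the Debian advisory for your model and stepping.

```shell
#!/bin/sh
# Sketch: is the loaded microcode at least MIN_REV on a Skylake/Kaby Lake
# client CPU? Reads /proc/cpuinfo-style text on stdin and uses the first
# value of each field, i.e. assumes all logical CPUs report the same.
# ASSUMPTIONS: the model list in is_skylake_or_kabylake, and MIN_REV,
# which the caller must take from the Debian advisory.

# cpuinfo_field NAME: print the first value of field NAME from stdin.
cpuinfo_field() {
    awk -F': *' -v key="$1" '
        { name = $1; sub(/[ \t]+$/, "", name) }
        name == key { print $2; exit }'
}

# is_skylake_or_kabylake FAMILY MODEL (decimal, as in /proc/cpuinfo)
is_skylake_or_kabylake() {
    [ "$1" = 6 ] || return 1
    case "$2" in
        78|94|142|158) return 0 ;;   # assumed list, see above
        *) return 1 ;;
    esac
}

# check_cpuinfo MIN_REV: prints not-listed, unknown-microcode,
# microcode-old or microcode-ok.
check_cpuinfo() {
    local text family model rev
    text=$(cat)
    family=$(printf '%s\n' "$text" | cpuinfo_field "cpu family")
    model=$(printf '%s\n' "$text" | cpuinfo_field "model")
    rev=$(printf '%s\n' "$text" | cpuinfo_field "microcode")
    if ! is_skylake_or_kabylake "$family" "$model"; then
        echo "not-listed"
    elif [ -z "$rev" ]; then
        echo "unknown-microcode"
    elif [ "$(( $rev ))" -ge "$(( $1 ))" ]; then
        echo "microcode-ok"
    else
        echo "microcode-old"
    fi
}
```

Usage: `check_cpuinfo "$MIN_REV" < /proc/cpuinfo`, where `$MIN_REV` is the revision the advisory gives for your CPU. The sketch does not look at whether hyper-threading is enabled, and it cannot tell you anything the advisory does not.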

Here's how to fix it on a Thinkpad on Linux. I've got a T460s and checked with the script[1] that it was indeed affected. The Debian instructions said to update your BIOS before updating the microcode package so I went to the model support page[2] to the BIOS/UEFI section and downloaded the "BIOS Update (Bootable CD)" one. The changelog included microcode updates so it looked promising[3]. To get the ISO onto a usb drive I did the following:

  $ geteltorito n1cur14w.iso > eltorito-bios.iso # provided by the genisoimage package on Ubuntu
  $ sudo dd if=eltorito-bios.iso of=/dev/sdXXX bs=4M conv=fsync # replace sdXXX with your USB drive; take care not to write over your system disk

I then had a bootable USB drive that I ran by rebooting the computer, pressing Enter and then F12 to get to the boot drive selection and selecting the USB. From then it's just following the options it gives you. It's basically pressing 2 to go into the update and then pressing Y and Enter a few times to tell it you really want to do it. After that just let it reboot a few times and the update is done. After booting again the same test script[1] now said I had an affected CPU but new enough microcode.

[1] https://lists.debian.org/debian-user/2017/06/msg01011.html

[2] http://pcsupport.lenovo.com/pt/en/products/laptops-and-netbo...

[3] https://download.lenovo.com/pccbbs/mobiles/n1cur14w.txt
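Because `dd` overwrites whatever device it is pointed at, it can be worth checking that the target of the `dd` step above really is a removable drive. A minimal shell sketch, assuming the Linux sysfs attribute /sys/block/<dev>/removable; the function name is hypothetical, and the check is only a heuristic (some USB enclosures report 0), so a refusal means "double-check", not "unsafe".

```shell
#!/bin/sh
# usb_target_ok DEV [SYSFS_ROOT]
# Heuristic guard before dd: succeed only if the whole-disk device DEV
# (e.g. "sdb" or "/dev/sdb", not a partition like "sdb1") is flagged
# removable in sysfs. SYSFS_ROOT defaults to /sys and exists only so the
# check can be tried against a fake directory tree.
# Exit status: 0 = removable, 1 = not removable, 2 = no such disk.
usb_target_ok() {
    local dev root flag
    dev=${1#/dev/}
    root=${2:-/sys}
    flag="$root/block/$dev/removable"
    if [ ! -r "$flag" ]; then
        echo "usb_target_ok: no whole-disk device named '$dev'" >&2
        return 2
    fi
    if [ "$(cat "$flag")" = 1 ]; then
        return 0
    fi
    echo "usb_target_ok: '$dev' is not flagged removable; refusing" >&2
    return 1
}
```

Usage: `usb_target_ok sdXXX && sudo dd if=eltorito-bios.iso of=/dev/sdXXX bs=4M conv=fsync`, so the write only happens when the guard passes.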

There's a perl script on the debian mailing list that digs a bit deeper and tells you if you're affected in the first place, if you're affected but patched already, affected but have HT disabled, etc.

https://lists.debian.org/debian-user/2017/06/msg01011.html

  • I ported this to bash since I have a chromebook w/o perl and (as for right now) the fs is read-only, so I just piped the script to it and sure enough my brand-new Samsung Chromebook Pro appears to be vulnerable, though apparently patchable.

    Details and if you want the script I link to it from here: https://forum.xda-developers.com/hardware-hacking/chromebook... - don't judge my shitty bash skills.


In my experience with parallel code written in Haskell, hyper-threading offers only a very mild speedup, perhaps 10%. It is essentially an illusion, a logical convenience. (How long does it take to complete a parallel task on a dedicated machine? Four cores with hyper-threading off has nearly the performance of eight virtual cores with hyper-threading on.)

Many people have neither the interest nor the hardware access to overclock, and these processors have less overclocking headroom than earlier designs. Nevertheless, the hyper-threading hardware itself generates heat, restricting the overclocking range for given cpu cooling hardware. In this case, turning off hyper-threading pays for itself, because one can then overclock further, overtaking any advantage to hyper-threading.

  • It depends on what resources your code uses on-chip. If all threads are contending on the same resources, then you won't see a speedup; if they're using different resources, hyperthreading can increase throughput significantly. I've seen hyperthreading give me the equivalent of 50% of another CPU, particularly when I'm running multiple CPU-bound processes concurrently (so they're not executing the same code at the same time in some kind of parallel operation, and certainly aren't bound on synchronization primitive overheads).

    • That makes sense. I'm a mathematician, and my experience is with pure computations, homogeneous across each (virtual) core.

  • >It is essentially an illusion, a logical convenience.

    Just checked numbers. That was my expectation as well until I came across code that experienced a bit over 80% speedup when HT was used.

  • It normally helps "some", and rarely hurts performance. So you might as well enable it.

    • I worry about hyperthreading hurting worst-case latency (since a thread might be assigned to run on a virtual core which does not work as fast as expected).

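To compare a workload with and without hyper-threading on the same Linux machine without touching firmware settings, one option is to pin it to one logical CPU per physical core, which roughly approximates running with HT off. A minimal shell sketch, assuming the sysfs files /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (contents such as "0,4" or "0-1"); the function name is hypothetical.

```shell
#!/bin/sh
# first_sibling_per_core: read thread_siblings_list lines on stdin
# (one per logical CPU, e.g. "0,4", "0-1" or "3") and print a
# comma-separated list holding the lowest-numbered CPU of each core.
first_sibling_per_core() {
    while IFS= read -r line || [ -n "$line" ]; do
        first=${line%%[,-]*}          # drop everything from the first ',' or '-'
        if [ -n "$first" ]; then
            echo "$first"
        fi
    done | sort -n -u | paste -s -d, -
}
```

Usage, assuming util-linux `taskset` is installed and `./job` stands for your own workload: `taskset -c "$(cat /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list | first_sibling_per_core)" ./job`, timed against a plain `./job` run that may use every logical CPU.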

It's painful to have to read text like « select Intel Pentium processor models ».

If Intel used marketing names that were more closely related to technical reality, then when something like this happens they wouldn't have so many customers finding themselves in the "maybe I'm affected by this horrid bug" box.

So will this be affecting most MacBook Pros of the past few years?

If so, there's a way to disable hyper-threading, but you need Xcode (Instruments).

Open Instruments. Go to Preferences. Choose 'CPU'. Uncheck "Hardware Multi-Threading". Rebooting will reset it.

  • This is kind of like cutting off your leg because of a hangnail. I've been running a Skylake MBP for more than 6 months for compilation workloads and haven't seen a single processor hang.

    I'm much more annoyed by the completely unpredictable desktop assignment on monitors when hotplugging DisplayPort connections on multiple displays. This one bothers me every day.

    • That's weird re monitor issues. I've been impressed with how consistent mine are.

      What I see:

      Same monitor/monitors plugged into same ports produce consistent configs.

      I get a unique config per monitor/port.

    • I agree. I've been fine myself. But if people feel the need to turn it off. ;)

  • From one of the 2016 MacBook Pros:

    > machdep.cpu.model: 78
    > ...
    > machdep.cpu.stepping: 3
    > ...
    > machdep.cpu.microcode_version: 174

    Can't find if 174 is the fixed version or not.

    So this is one of the models for which there exists a fix, as per the email.
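macOS reports `machdep.cpu.microcode_version` in decimal, while microcode revisions are usually quoted in hexadecimal (like the 0x74 and 0xba mentioned elsewhere in this thread), and Intel identifies processors by a CPUID signature that packs family, model and stepping into one number. A minimal shell sketch of both conversions; the function names are hypothetical, and only family 6 is handled. It does not say whether 0xae is new enough: that still has to be checked against the advisory for model 78, stepping 3.

```shell
#!/bin/sh
# to_hex DECIMAL: e.g. machdep.cpu.microcode_version 174 -> 0xae
to_hex() {
    printf '0x%x\n' "$1"
}

# cpuid_signature FAMILY MODEL STEPPING (decimal, as printed by sysctl or
# /proc/cpuinfo). Packs them the way CPUID leaf 1 EAX does: stepping in
# bits 3..0, low model nibble in 7..4, family in 11..8, extended model
# in 19..16. Only family 6 is handled, so the extended family stays 0.
cpuid_signature() {
    if [ "$1" != 6 ]; then
        echo "cpuid_signature: only family 6 is handled" >&2
        return 1
    fi
    printf '0x%x\n' $(( (($2 >> 4) << 16) | ($1 << 8) | (($2 & 15) << 4) | $3 ))
}
```

For the MacBook above, `cpuid_signature 6 78 3` prints `0x406e3` and `to_hex 174` prints `0xae`. On a live Intel Mac the inputs can come from `sysctl -n machdep.cpu.family`, `machdep.cpu.model`, `machdep.cpu.stepping` and `machdep.cpu.microcode_version`.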

Rule of thumb: On a desktop, if you have an i5 you do not have Hyperthreading. All i3s and i7s do have Hyperthreading, as do new Kaby Lake Pentiums (G4560, 4600, 4620).

On laptops, some i5s are not real quad cores but dual cores with Hyperthreading.

  • >Rule of thumb: On a desktop, if you have an i5 you do not have Hyperthreading. All i3s and i7s do have Hyperthreading, as do new Kaby Lake Pentiums (G4560, 4600, 4620).

    Hmm...either this statement is wrong or this desktop's /proc/cpuinfo is wrong:

        $ grep -E 'model|stepping|cpu cores' /proc/cpuinfo | sort -u
        cpu cores	    : 4
        model           : 94
        model name	    : Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
        stepping	    : 3
        $ grep -q '^flags.*[[:space:]]ht[[:space:]]' /proc/cpuinfo && echo "Hyper-threading is supported"
        Hyper-threading is supported
    

    Intel's product spec page[1] lists this CPU as not supporting Hyper-Threading so I'm a bit puzzled as to why the ht flag is present.

    [1]https://ark.intel.com/products/88188/Intel-Core-i5-6600-Proc...

    • Hmmm, checking for "ht" seems to be giving weird info. On an i5-750 here (a few years old), running Fedora 25:

          $ grep '^flags.*[[:space:]]ht[[:space:]]' /proc/cpuinfo
          flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
          dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts
          rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2
          ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid dtherm ida
      

      "ht" is being returned even though the CPU only has 4 cores and no hyperthreading:

      https://ark.intel.com/products/42915/Intel-Core-i5-750-Proce...

      dmidecode seems to give more accurate info for this:

          $ sudo dmidecode -t processor | grep 'Count:'
          	Core Count: 4
          	Thread Count: 4


    • To quote the Intel developer manual[1] on the HTT flag:

      >A value of 0 for HTT indicates there is only a single logical processor in the package and software should assume only a single APIC ID is reserved. A value of 1 for HTT indicates the value in CPUID.1.EBX[23:16] (the Maximum number of addressable IDs for logical processors in this package) is valid for the package.

      UPDATE: It appears these flags refer to each initial APIC ID, so it seems the HTT flag value should be 0 in all cases where the overall processor:thread ratio is 1, suggesting there might either be incorrect information in the CPUID instruction for some Intel CPUs or the kernel is not correctly evaluating CPUID.1.EBX[23:16].

      Hopefully, someone more versed in CPUs can correct me here.

      [1]https://www.intel.com/content/www/us/en/architecture-and-tec...
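The `ht` flag mirrors the CPUID HTT bit, which, per the manual text quoted above, only says that the logical-processor count field is valid; multi-core parts without hyper-threading set it too, as the i5-6600 and i5-750 examples show. A more direct check on Linux is to compare the `siblings` field (logical processors per package) with `cpu cores` (physical cores per package) in /proc/cpuinfo. A minimal shell sketch; the function name is hypothetical.

```shell
#!/bin/sh
# smt_active: read /proc/cpuinfo-style text on stdin and print
#   yes     if some CPU block has more "siblings" (logical CPUs per
#           package) than "cpu cores" (physical cores per package)
#   no      if siblings equals cpu cores everywhere
#   unknown if the fields are missing (e.g. non-x86 kernels)
smt_active() {
    awk -F': *' '
        function check() {
            if (s != "" && c != "") {
                seen = 1
                if (s + 0 > c + 0) smt = 1
            }
            s = ""; c = ""
        }
        { key = $1; sub(/[ \t]+$/, "", key) }
        key == "siblings"  { s = $2 }
        key == "cpu cores" { c = $2 }
        NF == 0            { check() }
        END {
            check()
            if (!seen) print "unknown"
            else if (smt) print "yes"
            else print "no"
        }'
}
```

Usage: `smt_active < /proc/cpuinfo`. Since the erratum can only trigger when both logical processors of a core are active, a "no" here means that condition cannot arise on this system; a "yes" only says hyper-threading is enabled.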

I would've expected at least an example of assembly code reproducing the bug. How was it not discovered before, and only with the OCaml compiler? They say "unexpected behavior"; does this mean that code compiled with this can give incorrect results? Can this have any security implications? How much code was compiled with similar patterns? Can the problem be reproduced with any JIT compiler? We need to know what can cause this; maybe compiled and working code already contains such patterns waiting to be abused...

Charming. I picked up a 5th Gen X1 Carbon configured with a Kaby Lake processor, and apparently there's no way to disable hyperthreading in the BIOS, and according to Intel's errata, no fix available yet.

Oh well... so far the machine (running Windows 10) has been stable minus one or two random lockups in 2 months of heavy usage which could be attributed to this. Guess I wait...

That's a really nicely done announcement. Simple, to the point, no drama, all the info you could want, scripts to figure out your processor, etc.

Well done Debian folks!

So if I understand correctly, some affected processors can be fixed by a microcode update, but there are some which cannot be fixed at all?

Also the advisory seems to imply that the OCaml compiler uses gcc for code generation, which it does not -- it generates assembly directly, only using gcc as a front end to the linker.

  • > advisory seems to imply that the OCaml compiler uses gcc for code generation, which it does not -- it generates assembly directly

    Yes, but that assembly code contains calls into the OCaml runtime, for garbage collection etc. If I understand correctly, the particular loop affected by this bug was somewhere in this memory management code. That code is written in C and compiled with a C compiler.

  • What is the percentage of non-fixable chips for Skylake and Kaby Lake? Sometimes those early steppings are not widely distributed.

    • According to the mail, the systems "cannot be fixed" because they lack HyperThreading in the first place so there is no fix to apply.

So, serious question: If the microcode "fix" for this ends up disabling HT, how does one get a refund not just for the CPU but for the $3k laptop I spec'd around it? Without needing to sue?

This isn't a hypothetical; what did Intel do when the only fix for broken functionality was to disable TSX entirely?

  • I remember the Pentium bug in the mid 90s; they actually shipped out replacement processors. Doubt that could be pulled off on laptops. Perhaps a microcode update can work around it.

    • Given my Surface Book got a 1/10 repairability score on iFixit... I don't think they'll just replace the chip :-P

  • The same for any part of your computer, I imagine.

    What happens when your laptop's display has frequently broken pixels?

    • > What happens when your laptop's display has frequently broken pixels?

      Well, they wait for 6 or more before they say it's out of spec.

If anyone on Windows wishes to update their CPU microcode without waiting for Microsoft to push it out via Windows Update, you can use this tool from VMware https://labs.vmware.com/flings/vmware-cpu-microcode-update-d... which can update microcode as well.

Windows stores its microcode in C:\Windows\System32\mcupdate_GenuineIntel.dll which is a proprietary binary file and you can't simply replace it with Intel's microcode.dat file (which is ASCII text), so you have to use a third-party tool such as VMware's one.

Simply:
1. Download and extract the zip file from the first paragraph.
2. Modify the install.bat file so that the line which reads `for %%i IN (microcode.dat microcode_amd.bin microcode_amd_fam15h.bin) DO (` only contains the microcode.dat parameter (since you obviously don't have an AMD CPU, and the tool is made for both).
3. Download and extract microcode.dat from Intel's website (https://downloadcenter.intel.com/download/26798/Linux-Proces...) and place it in the same directory as the VMware tool.
4. Run install.bat with admin privileges.
5. Hit cancel when it tells you that the AMD microcode files are missing, and you're done.

The CPU microcode will be updated immediately (yes, while Windows is running). The service will also run on each boot and update your CPU microcode, since microcode updates are only temporary and are lost each time you restart. You can check Event Viewer for entries from `cpumcupdate` to see what it has done. It's advisable to run a tool that shows the microcode version (such as HWiNFO64) before installing, so you can re-run it afterwards and confirm that the version has changed.

I have done this and it works as described. I went from 0x74 to 0xba as shown by the μCU field in HWiNFO64, and I have an i7-6700K.

Has anyone benchmarked one of these machines before and after applying this microcode update? The options in microcode are rather limited, and all are likely to have performance impacts. This is likely disabling functionality to avoid this case. I would hope the patch is smart enough not to apply if hyper-threading is not enabled, but who knows.

  • Would a performance hit go unnoticed?

    Usually (always?) it's not a ROM update, the encrypted microcode blob is loaded into the CPU by the OS on every boot via CONFIG_MICROCODE.

    some linkrot: http://imgur.com/a/z1uLv
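On Linux, the revision the kernel actually loaded at boot is reported per logical CPU in the `microcode` field of /proc/cpuinfo, which is the Linux counterpart of the μCU number HWiNFO64 shows on Windows. A minimal Python sketch for a before/after check, assuming that field is printed as `microcode\t: 0xba`. The threshold is a parameter: 0xba is only the value one commenter reported on an i7-6700K, not an official "fixed" revision.

```python
import re
from typing import Set

# Matches lines like "microcode\t: 0xba" (key padded with tabs/spaces).
MICROCODE_LINE = re.compile(r"microcode\s*:\s*(0x[0-9a-fA-F]+|\d+)")

def microcode_revisions(cpuinfo_text: str) -> Set[int]:
    """Collect the distinct microcode revisions listed in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        match = MICROCODE_LINE.match(line)
        if match:
            # Base 0 accepts both "0xba" and plain decimal.
            revisions.add(int(match.group(1), 0))
    return revisions

def is_at_least(cpuinfo_text: str, minimum: int) -> bool:
    """True only if every logical CPU reports a revision >= minimum."""
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) >= minimum
```

Usage: `microcode_revisions(Path("/proc/cpuinfo").read_text())` (with `from pathlib import Path`); differing values across CPUs would suggest the update was only partially applied.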

Just got the 2017 13-inch MacBook Pro without the Touch Bar, with the Kaby Lake i7. Should I be worried? Can I even disable HT on a Mac? And presumably the update will be provided, so the whole laptop is still OK?

I've been using the Thunderbolt 3 dock with two external monitors and occasionally get a little glitch, probably a loose cable I think.

I've downloaded the Bitcoin blockchain and done quite a bit of work in PyCharm + Chrome, with multiple projects and Flow and webpack running in the background, and haven't had any sort of crash though.

Holy cow. Definitely feel like I dodged a bullet by building an AMD Ryzen system this time around, which had its own set of issues (but they seem to be more or less ironed out now).

  • This is not a fair comment: Ryzen had a crash that can be triggered by compiling with GCC, and a memory compatibility issue where it cannot run some memory modules at their rated speed. Ryzen is a really young architecture; it has already had something like 6 stable microcode patches, and you can expect many more.

Does Windows have a patch for this too? Or is disabling HT the safest option?

  • Windows does have a microcode update driver, as you would expect, so it can fix this.

    However, looking at the microcode update driver on an updated Windows 10 as of right now, I don't see a recent enough microcode version to fix it. The latest updates appear to be from 2015.

  • It's safe to assume that Windows distributes the microcode fixes as well.

    • For some reason, Debian says you need BIOS updates for certain machines and Debian-packaged microcode for others. I wouldn't be surprised if it's the same for Windows.

  • If they don't fix it for Windows, they'd be abandoning 90% of their customers, which they won't do. So if it's not already out, it will be soon. It's Linux that gets fixes late most of the time.

When Intel had the floating-point division hardware bug, they recalled chips. https://en.wikipedia.org/wiki/Pentium_FDIV_bug

I wonder if Intel will do something like that again, or if the industry as a whole has become more tolerant of unreliable or buggy behavior and will just live with it. See, for example, Apple telling people that poor reception was their own fault, changing software to hide problems, etc.

I have a Skylake mobile CPU (i7-6700HQ) and it pretty much rocks with Ubuntu 17. The system is stable and fast, even under heavy load such as gaming. However, compiling a big (>10,000 modules) C++ project via ninja/cmake under Qt Creator reproducibly hangs the system after ~15 minutes. Now I wonder if this broken hyper-threading could be the cause.

When HT first started appearing on P4 chips I was looking after NetWare, Windows 2000 and XP boxes. They would freak out with HT enabled, with all kinds of oddities, I suspect mostly because the OSes didn't fully support it.

To this day I disable it by reflex on everything!

I do wonder, though: why didn't the Debian maintainers pick up the microcode updates when Intel made them available? Why did it take a nudge from the OCaml people for them to notice? Or am I missing something?

The late 2016 Razer Blade uses the i7-6700HQ which is specifically a Family 6, Model 94, Stepping 3 processor.
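On Linux, the family/model/stepping triple mentioned above appears as the `cpu family`, `model` and `stepping` fields of /proc/cpuinfo. A minimal Python sketch that reads them and checks against a set of signatures; only (6, 94, 3) comes from this comment, so `KNOWN_SIGNATURES` is an illustrative placeholder, and the authoritative list is the one in Intel's errata and the Debian advisory.

```python
from typing import Dict, Optional, Set, Tuple

Signature = Tuple[int, int, int]  # (family, model, stepping)

# Placeholder: only the i7-6700HQ signature from the comment above.
KNOWN_SIGNATURES: Set[Signature] = {(6, 94, 3)}

def cpu_signature(cpuinfo_text: str) -> Optional[Signature]:
    """Read family/model/stepping of the first CPU listed in /proc/cpuinfo text."""
    fields: Dict[str, int] = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()  # note: "model name" is a different key from "model"
        if sep and key in ("cpu family", "model", "stepping") and key not in fields:
            try:
                fields[key] = int(value.strip())
            except ValueError:  # e.g. "stepping : unknown" in some VMs
                pass
    if len(fields) < 3:
        return None
    return fields["cpu family"], fields["model"], fields["stepping"]

def matches_known_signature(cpuinfo_text: str,
                            signatures: Set[Signature] = KNOWN_SIGNATURES) -> bool:
    sig = cpu_signature(cpuinfo_text)
    return sig is not None and sig in signatures
```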

I wonder if a microcode update would solve some of the various issues I have in Windows.

My CPU (6th generation i5) died last week. RIP.

I installed Debian 9, VirtualBox and Vagrant, and set up a clean development machine for myself; everything took 4 hours to finish.

I rebooted the virtual machine, and boom, there was a kernel panic, which I sadly don't remember exactly and didn't take a picture of. After I rebooted the machine and opened a terminal, the system froze. The cursor wouldn't move. I rebooted again, and the motherboard had its CPU fail/undetected light on. I couldn't get it to boot after that.

I'm both sad that bad stuff like this exists and relieved that it's being patched to keep it from spreading.

I sincerely hope I'll get a replacement from Intel.

  • CPUs rarely die unless you're overclocking or the PSU went bad and took things out with it. I'm willing to bet your motherboard is the part that's bad.

What is the probability for this to happen? Or how could I estimate the time it takes for random code to hit this bug at least once with a probability of over 90%?
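Nobody outside Intel knows the real trigger probability, so any estimate has to start from an assumed rate. Under the simplifying assumption that each run is an independent trial with a fixed probability p (or that hits arrive as a Poisson process with rate λ per hour), the usual at-least-once calculation gives n ≥ ln(1 − 0.9) / ln(1 − p) runs, or t = −ln(1 − 0.9) / λ hours. A minimal Python sketch; p and λ are placeholders you would have to estimate yourself, e.g. from how often a known reproducer crashes.

```python
import math

def trials_needed(p: float, target: float = 0.9) -> int:
    """Smallest n with 1 - (1 - p)**n >= target, assuming independent trials."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))

def hours_needed(rate_per_hour: float, target: float = 0.9) -> float:
    """Time t with 1 - exp(-rate * t) == target, assuming a Poisson process."""
    if rate_per_hour <= 0 or not 0 < target < 1:
        raise ValueError("rate must be positive and target must be in (0, 1)")
    return -math.log(1 - target) / rate_per_hour
```

For example, an assumed 1% chance per hour-long build gives `trials_needed(0.01) == 230` builds, and an assumed rate of one hit per 100 hours gives `hours_needed(0.01)` of about 230 hours.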

A little off-topic, but does anybody know of any hacky ways to disable hyper-threading (on Haswell if it matters) if the firmware doesn't provide the option?
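On Linux, one firmware-independent hack is to offline all but one logical CPU per core through sysfs: `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` lists each CPU's hyper-thread siblings, and writing `0` to `/sys/devices/system/cpu/cpuN/online` takes that CPU offline (root required, undone by a reboot). A minimal Python sketch, assuming those files exist on your kernel. Whether an offlined sibling counts as "inactive" for this particular erratum is an assumption, not something Intel's text states.

```python
from pathlib import Path
from typing import Dict, List, Set

SYSFS_CPU = Path("/sys/devices/system/cpu")

def parse_cpu_list(text: str) -> Set[int]:
    """Parse kernel CPU list syntax such as "0,4" or "0-1"."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def secondary_siblings(siblings: Dict[int, str]) -> List[int]:
    """Given cpu -> thread_siblings_list text, return every CPU that is not
    the lowest-numbered thread of its core (i.e. the ones to offline)."""
    return sorted(cpu for cpu, text in siblings.items()
                  if cpu != min(parse_cpu_list(text)))

def read_siblings() -> Dict[int, str]:
    """Read thread_siblings_list for every CPU that exposes one."""
    result: Dict[int, str] = {}
    for topo in SYSFS_CPU.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(topo.parent.parent.name[3:])  # "cpu12" -> 12
        result[cpu] = topo.read_text()
    return result

def offline(cpus: List[int]) -> None:
    """Take the given CPUs offline (needs root; a reboot brings them back)."""
    for cpu in cpus:
        (SYSFS_CPU / f"cpu{cpu}" / "online").write_text("0")
```

Usage, as root: `offline(secondary_siblings(read_siblings()))`. Afterwards /proc/cpuinfo should list only one logical CPU per core.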

If I run `grep name /proc/cpuinfo | sort -u`, I get:

`model name : Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz`

Then with `cat /proc/cpuinfo | grep ht`, `ht` is definitely there under flags. Should I be concerned?

  • Your processor isn't affected. It would be i[357]-[67]xxx processors only.

    But a new computer may serve you well...
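Two notes on the exchange above. The `ht` flag reflects the CPUID HTT capability bit, which multi-core chips such as the Q6600 set even without Hyper-Threading, so its presence alone says nothing about this erratum. The reply's rule of thumb can be written as a regex over the `model name` line; a minimal Python sketch, keeping in mind that it is only that heuristic, not Intel's affected-parts list, and may miss other Skylake/Kaby Lake branded parts.

```python
import re

# Rule of thumb from the reply: Core i3/i5/i7 model numbers beginning with
# 6 (Skylake) or 7 (Kaby Lake). A heuristic, not Intel's official list.
AFFECTED_BRANDING = re.compile(r"\bi[357]-[67]\d{3}")

def looks_affected(model_name: str) -> bool:
    """Apply the i[357]-[67]xxx heuristic to a /proc/cpuinfo model name."""
    return AFFECTED_BRANDING.search(model_name) is not None
```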

Is it fixed or not? The beginning of the post says to disable hyper-threading, but then it goes on to say Intel fixed this with a microcode update.

This is just great! Just yesterday I got a new laptop with a Skylake processor. Now I wonder whether I hit that bug today, as DOSBox (on Debian) kicked me out for no apparent reason. I had to change the value of 'core' in the cpu section of the config file from 'automatic' to 'normal'. It could be something entirely different, but the timing is funny.

Well, at least Intel acknowledges, documents and finally fixes these CPU bugs (via microcode updates).

AMD on the other hand doesn't even acknowledge an issue when multiple customers report problems. See this Ryzen bug: https://community.amd.com/thread/215773

  • >at least Intel acknowledges

    Not in this case.

    "Apparently, Intel had indeed found the issue, documented it (see below) and fixed it. There was no direct feedback to the OCaml people, so they only found about it later."

    • It was acknowledged and documented in an errata bulletin, but not communicated back to the reporter, assuming it had not already been reported, discovered internally, or reported by another source. It seems more like sloppy follow-up on a bug report, which unfortunately happens in any large project.

Does this affect execution of the OCaml runtime, or only the OCaml compiler?

Has anyone affected by this bug tried using a kernel configured with hyper-threading support disabled? Would that work?

So what does this mean for the thousands of new MacBook Pro 2016/2017 owners out there?

  • Nothing, since they've been running their laptops without issues (it would have been all over the news if it was some widespread issue) for 2+ years.

    At some point in the near future Apple will package the microcode fix in an update, and that will be it.

    • I've run into an issue where switching users causes a crash on my 2016 MBP. Many more people are having this issue, according to the Google.

      Also, up until the end of 2016, Apple didn't use Skylake processors, so it wouldn't be 2+ years.

    • Probably already done since this was fixed in April/May by Intel. There has been at least one update since then.

The poor guys from OCaml who found the bug. Imagine how much debugging it takes to find such an issue and narrow it down to the precise register sequence. I guess since it’s a hyper threading bug it even depends on multiple threads doing certain things at the same time. Usually you trust your CPU to execute code properly.

Intel's communication is incredibly poor. Errata exist for all CPUs but this one is quite important and resulted in no proper public communication it seems.

One of many typical errata... nothing to see here, it was patched months ago. Most people are unlikely to ever encounter it even if unpatched.

Intel treats its customers like mushrooms: feeds them shit and keeps them in the dark.

Great: pay a premium for a top-of-the-line CPU to get more than 4 threads, then have to disable it...

  • Premium != perfect. You pay for the extra speed, cache, etc.

    And since a cheaper cpu can have the same or worse bugs, the point is moot.

    You don't pay i7 vs i5 etc for better quality control.

    For that you have to go to enterprise/server grade stuff that costs much more (Xeon). And even there, nothing is guaranteed to be perfect.

    In fact, with the complexity of modern CPUs/GPUs it's a minor miracle that anything works at all. We have it better than ever...