Comment by bri3d
14 hours ago
Linked in the Bugzilla thread is a really nice in-depth investigation of the same issue with high-byte register aliases in a similar algorithm (Huffman coding) but in an entirely different product: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in... .
It's concerning that Intel don't seem to have been responsive to anyone with respect to this issue, and it doesn't appear to have an official erratum yet. That said, Raptor Lake was the Intel CPU with voltage issues and basically random bit rot, so I suppose it's hard to tell whether this is a silicon-level erratum caused by bad design or the result of some kind of post-manufacturing damage. Raptor Lake in general causes enough non-reproducible noise that I believe Firefox gave up on automated crash reports from it ( https://bugzilla.mozilla.org/show_bug.cgi?id=1975808 ).
EDIT: I read that Oodle article (which is SO good!) again and realized that their customer-provided reproduction of the bug was directly linked to boost clock speeds (the customer said that overclocking by 5% made it happen entirely reliably), so this is definitely not a "the architecture has a 100% bug in it" but rather some deeper issue with clock propagation that appears at edge cases.
Read the Oodle article in full, fantastic investigation indeed!
It also looks like there's a slight difference in the unwanted effect both companies have reported, despite the bug being seemingly triggered the same way (mov touching the high byte):
- Oodle reports that a low byte is occasionally stored in the intended location instead of the high byte.
- Mozilla's fix suggests that a full 16-bit value is stored instead, corrupting an adjacent variable! This could have much more serious consequences.
Technically, this could still be the same exact bug. I found no mention of the order the output buffer was accessed in by the Huffman decoder debugged in the Oodle report, and, since it was a contiguous buffer, it's easy to mistake an occasional out-of-bounds copy there for a copy from a wrong location. But if both analyses are correct, the behavior of high byte accesses on Raptor Lake is way less predictable than those fixes suggest. Haven't managed to find an official erratum from Intel.
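To make the "could still be the same exact bug" point concrete, here is a rough toy model in Python. Everything in it is an assumption for illustration: the function names are made up, the two failure modes are just my reading of the two reports, and nothing here claims to describe what the hardware actually does. It only shows that, in a decoder that writes its output front to back, a stray 16-bit store and a "low byte instead of high byte" store leave the same bytes behind unless the glitch hits the last write.

```python
# Toy model only: the function names and failure modes are assumptions
# for illustration, not a description of real hardware behavior.

def store_high_byte(buf, i, reg16):
    """Intended behavior: store bits 8..15 of a 16-bit register at buf[i]."""
    buf[i] = (reg16 >> 8) & 0xFF

def store_low_byte_instead(buf, i, reg16):
    """Hypothetical failure A: the low byte lands in the intended location."""
    buf[i] = reg16 & 0xFF

def store_full_word_instead(buf, i, reg16):
    """Hypothetical failure B: a little-endian 16-bit store at the same address."""
    buf[i] = reg16 & 0xFF             # low byte at the intended location
    buf[i + 1] = (reg16 >> 8) & 0xFF  # high byte spills into the next byte

def decode(symbols, glitch=None, glitch_at=None):
    """Write one byte per symbol, front to back; one store may glitch."""
    buf = bytearray(len(symbols) + 1)  # spare byte stands in for an adjacent variable
    for i, reg16 in enumerate(symbols):
        store = glitch if i == glitch_at else store_high_byte
        store(buf, i, reg16)
    return bytes(buf)

symbols = [0x1122, 0xABCD, 0x3344]

good = decode(symbols)
mid_a = decode(symbols, store_low_byte_instead, glitch_at=1)
mid_b = decode(symbols, store_full_word_instead, glitch_at=1)
end_a = decode(symbols, store_low_byte_instead, glitch_at=2)
end_b = decode(symbols, store_full_word_instead, glitch_at=2)

for name, out in [("good", good), ("mid_a", mid_a), ("mid_b", mid_b),
                  ("end_a", end_a), ("end_b", end_b)]:
    print(f"{name}: {out.hex()}")
```

In this model the two mid-buffer glitches are indistinguishable because the next correct store overwrites the spilled byte, while a glitch on the final store clobbers the spare byte only in the 16-bit case. Whether the real decoders write in that order is exactly what the reports don't say.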
It's very interesting because my 13900K has worked like a dream from day one and still to this day. Never had any of the voltage issues, never had any abnormal crashes in Firefox or any other software. I was undervolting it for a long while, so I wonder if somehow that saved me from the voltage issues before they were fixed?
Undervolting would definitely help, and is the actual fix. The current Intel fixes mostly just address the symptoms. The main issue is high voltage and power draw when pushing high clocks, and they can't actually fix that, as it would mean lowering the advertised clocks the CPUs were sold with.
Sorry, but that understanding is dangerously incomplete. You're describing the first set of issues they uncovered, but there's also:
"Microcode and BIOS code requesting elevated core voltages which can cause Vmin shift especially during periods of idle and/or light activity" (emphasis mine)
https://community.intel.com/t5/Blogs/Tech-Innovation/Client/...
Recall also that "Vmin shift" means "the minimum voltage the processor needs to run correctly goes up", so if the issue isn't addressed, that level of undervolt may stop working.
I remember Puget Systems pointed to this same thing when they analyzed the issues back when it was blowing up.
https://www.pugetsystems.com/blog/2024/08/02/puget-systems-p...
My 1360P and 13400 seem fine too. I applied the microcode and firmware updates when they came out... but I'm guessing it didn't affect all SKUs equally, for whatever magical reason.
My 13900K was affected by the widespread voltage issue and had to be replaced, but since then I have had zero problems with it.