Comment by mtlmtlmtlmtl
11 hours ago
It's very interesting because my 13900K has worked like a dream from day one and still does to this day. Never had any of the voltage issues, never had any abnormal crashes in Firefox or any other software. I was undervolting it for a long while, so I wonder if somehow that saved me from the voltage issues before they were fixed?
Undervolting would definitely help, and is the actual fix. The current Intel fixes mostly just address the symptoms: the main issue is high voltage and power when pushing high clocks, but they can't actually fix that, as it would mean downgrading the advertised clocks the CPUs were sold with.
Sorry, but that understanding is dangerously incomplete. You're describing the first set of issues they uncovered, but there's also:
"Microcode and BIOS code requesting elevated core voltages which can cause Vmin shift especially during periods of idle and/or light activity" (emphasis mine)
https://community.intel.com/t5/Blogs/Tech-Innovation/Client/...
Recall also that "Vmin shift" means "the minimum voltage the processor needs to run correctly goes up", so if the issue isn't addressed, that level of undervolt may stop working.
Not sure what's supposed to be wrong with that? The clock tree degrades at high voltage. Some theories I've seen involve the CPU requesting significantly higher voltages when clocks alternate during a short lull in load, e.g. from a pipeline stall. There also doesn't seem to be a dense enough sensor network in the right places for the CPU to react to this, so it just gradually "burns" itself down. Assuming these theories are true, the actual fixes from Intel would be either to relax boost clocks to levels that are universally safe, opening themselves up to a lawsuit from everyone who bought the high-end SKUs, or to do a new stepping, which is extremely expensive for a finished design.
As the CPU degrades, it naturally needs higher voltages to stay stable, until it breaks completely and no amount of voltage can help it. But if your CPU doesn't degrade, because it hasn't been overdoing it on voltage, then Vmin has no reason to shift.
Anecdotally, someone I know who runs these in production for game servers has been limiting them to 80°C, 1.4-1.45 V, and 400 A, which has kept them alive for years under 24/7 loads. You might go a bit lower on the voltage to be safe over the longer term; they're fine with just mass RMAing these. There's also a large variation in silicon quality between samples: one can run cool and completely fine even at the old stock settings, while another has to pull, say, 1.5x the power for the same load and clocks, and degrades as a result.
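To make the limits above concrete, here is a minimal sketch, assuming you already have some way to read temperature, core voltage and current from your platform. The `Sample`, `Limits`, `violations` and `power_ceiling_w` names are hypothetical and not part of any real tool, and the check only flags readings; the actual limits would still be set in firmware. It also shows the rough power ceiling the voltage and current caps imply (P = V × I, so 1.45 V × 400 A ≈ 580 W at most).

```python
from dataclasses import dataclass


# Limits quoted in the comment; uses the 1.45 V upper end of the 1.4-1.45 V range.
@dataclass(frozen=True)
class Limits:
    max_temp_c: float = 80.0
    max_vcore_v: float = 1.45
    max_current_a: float = 400.0


# Hypothetical telemetry sample; how you obtain these readings is platform-specific.
@dataclass(frozen=True)
class Sample:
    temp_c: float
    vcore_v: float
    current_a: float


def violations(sample: Sample, limits: Limits = Limits()) -> list[str]:
    """Return the names of the limits this sample exceeds, in a fixed order."""
    out = []
    if sample.temp_c > limits.max_temp_c:
        out.append("temp")
    if sample.vcore_v > limits.max_vcore_v:
        out.append("vcore")
    if sample.current_a > limits.max_current_a:
        out.append("current")
    return out


def power_ceiling_w(limits: Limits = Limits()) -> float:
    """Upper bound on core power implied by the voltage and current caps (P = V * I)."""
    return limits.max_vcore_v * limits.max_current_a


if __name__ == "__main__":
    print(violations(Sample(temp_c=85.0, vcore_v=1.40, current_a=350.0)))
    print(round(power_ceiling_w()))
```

A sample at 85°C, 1.40 V and 350 A is flagged only for temperature. The same thresholds could be set lower for the longer-term margin mentioned above.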
I remember Puget Systems pointed to this same thing when they analyzed the issues back when it was blowing up.
https://www.pugetsystems.com/blog/2024/08/02/puget-systems-p...
My 1360P and 13400 seem fine too. I applied the microcode and firmware updates when they came out... but I'm guessing it didn't affect all SKUs equally for whatever magical reason.
My 13900K was affected by the widespread voltage issue and had to be replaced, but since then I have had zero problems with it.