← Back to context

Comment by Numerlor

23 minutes ago

Not sure what's supposed to be wrong with that? The clock tree degrades at high voltage. Some theories I've seen were on the CPU requesting significantly higher voltages during alternating clocks when there's a short lull in load from e.g. a pipeline stall. Then there doesn't seem to be a good enough of a sensor net in the correct places for the CPU to react to this, so it just "burns" itself down gradually. Assuming these are true, actual fixes from intel would be relaxing boost clocks to ones that are universally safe and open themselves to a lawsuit from everyone that bought the high end SKUs, or do a new stepping which is extremely expensive for a done design.

When you degrade the CPU naturally needs higher voltages to be stable, until the point where it just breaks completely and no amount of voltage it help it. But if your CPU doesn't degrade because it hasn't been overdoing it on voltages then there'll be no issues for Vmin to shift.

As an anecdotal experience from someone I know that runs these in prod for game servers, limiting the CPU to 80°C and 1.4V-1.45V, 400A has been keeping them alive for years doing 24/7 loads. Maybe a bit lower on the voltage if one wants to be sure longer term, as they are fine with just mass RMAing these. There's also large amount of differences in the silicon quality between samples that can make one run cool and completely fine even at the old stock settings, and an another sample that'll have to pull say 1.5x the power for the same load and clocks having it degrade.