← Back to context

Comment by gmokki

3 years ago

I worked at a customer that upgraded to major new MySQL version and boosted the hw to 4x more performant (ram+cores). Result: their average transaction time climbed from 0.1ms to 0.2ms and all support personnel started complaining since the their software started taking 4 seconds to load a new screen instead of 2 seconds. The support systems operated on raw data with no cache to show the actual state.

Managed to fix the problem and restore performance by suggesting we disable the MySQL query cache which was slowing down the system with more cores.

And this was nothing to do with banking or high frequency anything. Just code that did lots of small queries to fill a screen full of data to support persons.

> Managed to fix the problem and restore performance by suggesting we disable the MySQL query cache which was slowing down the system with more cores.

This approach for a fix sounds very unintuitive.

Did you manage to understand why this was like that? The explanation is likely interesting!

What I could think of: Context switches across cores constantly invalidated data in the CPU caches. In such a case CPU pinning would help maybe. (Just speculating! I'm not an expert on such things. But I know that the CPU cache, and memory I/O in general, is the single most important topic when dealing with usual performance issues on modern CPUs. Today's CPUs are only fast when they have their data available in their local cache(s). Fetching form RAM is by now like fetching from spinning rust a decade ago; it will kill your performance no mater how fast your CPU is).

Tangentially related: https://news.ycombinator.com/item?id=14888360

  • The query cache was behind a single lock. Thus while fast, it serialized all DB operations. And the more CPUs you have the slower the performance. Especially since any write has to go through the cache and invalidate all affected queries.

    It was meant to accelerate PHP code in the 90s with single core CPUs. It was never designed to be scalable and in-development (back then) versions had already disabled it, which gave me the idea that it was something to try.