Comment by mystifyingpoi
1 day ago
> If your primary load exceeds 30% (CPU util), consider adding read replicas.
I'm not an expert, but isn't this excessive? In theory you could triple the load and still have slack. I'd actually try to scale down, not up.
If most of your users are concentrated in the same (or nearby) time zones, your traffic can easily vary by 5–10x over a 24-hour period. In that case, 30% average CPU utilization doesn't mean you have 70% headroom at peak... it may already imply you're close to saturation during busy hours.
For example, if 30% is your daily average and your peak-to-average ratio is ~5x, you're effectively hitting 150% of capacity at peak. Obviously the system can't sustain that, so you'll see queueing, latency spikes, or throttling.
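A rough way to sanity-check that arithmetic is sketched below. It assumes CPU demand scales linearly with traffic, which is a simplification. The function names are made up for illustration.

```python
def peak_utilization(avg_util: float, peak_to_avg: float) -> float:
    """Demand at peak as a fraction of capacity (may exceed 1.0)."""
    # Assumption: CPU demand is proportional to traffic.
    return avg_util * peak_to_avg


def max_average_utilization(peak_to_avg: float, peak_target: float = 1.0) -> float:
    """Highest daily-average utilization that keeps the peak at or below peak_target."""
    return peak_target / peak_to_avg


if __name__ == "__main__":
    for ratio in (2, 3, 5, 10):
        peak = peak_utilization(0.30, ratio)
        ceiling = max_average_utilization(ratio)
        print(f"{ratio:>2}x peak-to-average: 30% avg -> {peak:.0%} at peak; "
              f"avg must stay <= {ceiling:.0%} to avoid saturation")
```

With a 5x ratio this gives 150% demand at peak, and the average would have to stay at or below 20% just to avoid saturating, before leaving any latency margin.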
The 30% guideline makes sense if you care about strict SLAs and predictable latency under peak load. If you're more tolerant of temporary slowdowns, you could probably run closer to 60–70% average utilization, but you're explicitly trading off peak performance and tail latency to do so.
Load is highly bursty. You can autoscale application services quickly, but scaling up a database is slower.