Comment by p0w3n3d
2 years ago
How can your company guarantee p99.9 if there is only one instance? Is there any log shipping/duplication etc? Is consistency maintained on one server fault?
2 years ago
p99.9 refers to latency, not uptime. That said, we also run a weekly test of how quickly we recover from a catastrophic crash, which takes roughly 6 minutes (the time it takes for the autoscaling group to spin up a new host, Litestream to restore the database from S3, and the server to start up again).
Honestly, 99.9% uptime is pretty generous - we can fit in quite a few catastrophes per year and still have 99.9% uptime. In the 2 years this service has been running, we've had 100% uptime via zero-downtime deployments, anyway.
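To put a rough number on "quite a few", here is a back-of-the-envelope sketch. It assumes a 365-day year and that every outage costs exactly the 6-minute recovery time quoted above (so it ignores detection time and partial degradation). The function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(uptime_target: float) -> float:
    """Minutes of downtime per year allowed by an uptime target such as 0.999."""
    return MINUTES_PER_YEAR * (1 - uptime_target)


def outages_within_budget(uptime_target: float, recovery_minutes: float) -> int:
    """Number of whole outages of a given length that fit in the yearly budget."""
    return int(downtime_budget_minutes(uptime_target) // recovery_minutes)


budget = downtime_budget_minutes(0.999)
print(f"{budget:.1f} minutes of downtime allowed per year")
print(outages_within_budget(0.999, 6), "six-minute recoveries fit in that budget")
```

Under those assumptions the budget works out to about 525.6 minutes (roughly 8.8 hours) per year, or 87 full crash-and-restore cycles.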
In terms of monitoring, traces and error logs are shipped to our observability solution, yes.