Comment by dilyevsky

5 years ago

What happens when your single-server solution is disconnected from the network for whatever reason, or you outgrow the current configuration? Bad times, I bet. Cost of components isn't the only factor here.

This may seem trite, but if you can get 20 servers' worth of performance out of one, you can afford to run two active-active and still reap a 10x capex/opex saving. The technology to build simple but reliable systems has been around for decades. You also can't assume the cloud is never going to fail, so you always have to defend against failure, whether that means running two servers or two availability zones.

  • Also, a lot of the time you're not trying to achieve (and can't anyway) uninterrupted uptime - you just need rapid recovery from infrequent outages.

  • Apples to oranges - you're not going to get a 10x saving by running active-active in a strong-consistency mode

    • I don’t see how this follows. If you have one single-threaded server doing the job of 20 similarly specified servers running the distributed system, you could run every job twice, on two completely independent servers on two completely independent networks, and still be 10x as efficient unless both failed catastrophically during the same job. Or you could run three completely independent copies of the same job and still be at nearly 7x efficiency, etc. There is no need for any sort of “consistency mode” here. This is just brute force, without any synchronisation between servers or resumption of aborted jobs at all.

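The arithmetic in the parent comment can be sketched directly. This is just an illustration of the brute-force argument, with assumed numbers: one server replaces a 20-node cluster (the figure from the thread), and each independent copy of a job fails with some small probability `p` (a value I picked for illustration).

```python
# Illustrative arithmetic for the brute-force redundancy argument above.
# Assumption: one server does the work of 20 distributed nodes (figure
# from the comment); p is an assumed per-job failure probability.

DISTRIBUTED_NODES = 20  # cluster size the single server replaces

def efficiency(copies: int) -> float:
    """Hardware efficiency vs the cluster when every job is simply run
    `copies` times on fully independent servers, with no coordination."""
    return DISTRIBUTED_NODES / copies

def job_loss_probability(p: float, copies: int) -> float:
    """A job is only lost if *every* independent copy fails during it."""
    return p ** copies

print(efficiency(2))                   # 10.0 -> the 10x figure
print(round(efficiency(3), 1))         # 6.7  -> "nearly 7x"
print(job_loss_probability(0.001, 2))  # failures must coincide: p squared
```

The point being that with independent copies there is no consistency protocol at all, so the per-server cost stays flat while the combined failure probability shrinks geometrically.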

That’s only if there is no contingency plan in place and no backup server. Usually these things are thought out beforehand. And it’s not like you’re excused from this type of problem when you run your solution in the cloud: cloud outages, configuration nightmares, data inconsistencies, not knowing what is happening because pinpointing a fault in a complex infrastructure takes more time, and so on. Sometimes the cloud way is the way to go, but other times it isn’t justified.

We had a replicated HA solution, but it was "warm", and customers didn't really want to run another server that did nothing. I think we also charged a lot for it.

In the area we were in, customers could tolerate some outage. Restoring from a backup didn't take that long, and I believe we were known for good support that helped people get back up quickly.

But yeah, what you're saying covers all the arguments that kept coming back to us. People were used to having a few servers for the database, different front ends, and layering things like that.

Isn't this a solved problem? Just use a failover, right?

  • If your consistency model allows you to use their async replication and potentially lose data in a surprise switchover - sure. Otherwise you're looking at similarly worse performance, and worse reliability too
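The data-loss risk mentioned above can be shown with a toy model. This is a sketch of asynchronous replication in general, not of any particular database: the primary acknowledges each write to the client immediately and ships it to the replica later, so a surprise switchover promotes a replica that is missing the most recent acknowledged writes. All names here are hypothetical.

```python
# Toy model (assumed, not any real database) of async replication:
# the primary acks writes before the replica has received them.

class Primary:
    def __init__(self):
        self.log = []     # records durably written on the primary
        self.shipped = 0  # how far replication has caught up

    def write(self, record: str) -> str:
        self.log.append(record)
        return "ack"      # client sees success before replication happens

    def replicate(self, replica: "Replica", batch: int) -> None:
        # Ship the next `batch` unshipped records to the replica.
        for record in self.log[self.shipped:self.shipped + batch]:
            replica.log.append(record)
        self.shipped = min(self.shipped + batch, len(self.log))

class Replica:
    def __init__(self):
        self.log = []

primary, replica = Primary(), Replica()
for i in range(5):
    primary.write(f"txn-{i}")          # all five writes acked to clients
primary.replicate(replica, batch=3)    # but replication lags behind

# Primary dies here; the replica is promoted in a surprise switchover.
lost = len(primary.log) - len(replica.log)
print(lost)  # acknowledged transactions that the new primary never saw
```

Closing that window means the primary must wait for the replica before acking - which is exactly the "similarly worse performance" trade-off of the synchronous, strongly consistent mode.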