Comment by hansvm
2 days ago
Jumping away from the article slightly, where would you go if you needed a ton of vanilla disk space, the ability to do compute near to that disk, and for it to be accessible with a reasonable pipe to the public internet?
As far as I can tell, you'd almost have to do something with a colo if you didn't want to pay 10x or more for the storage. Are there other options?
You'd colo if you didn't want to use public cloud. The savings are good over time, but the upfront cost can be huge depending on how much disk space you need. The setup/config can also be rough (many 4U boxes packed with tons of disks each, then Ceph: CephFS for file access and RADOS Gateway for the S3 API). You also want a few configured nodes on cold standby, plus many spare disks. Because of this, I've seen some companies take compute off the cloud but leave bulk data storage on the cloud.
Well, is "a ton" a level that can fit in one dedicated server, or many? Just looking at https://serversearcher.com, you've got 72 TB of storage on a 10 Gbps pipe for $360 or so a month: 128 GB RAM, 6c/12t E-2276G, 2× 512 GB NVMe + 4× 18 TB HDD, 300 TB transfer / 10 Gbps.
Just a few PB, and sharding it across servers is fine.
It looks like all the disk-optimized examples on that site (still much more expensive than paying for raw disk: barely 5x cheaper than S3, when a disk-optimized colo solution carries only ~3% overhead over the disks themselves) are through some no-name provider, "HostKey". I suppose beggars can't be choosers, but in the context of storage (where systemic failures should be accounted for in the model), are you aware of more than one provider with reasonably priced storage?
I mean, on that site you also have Clouvider, who are cheaper: $423.79/mo for 72 TB including bandwidth, compute, and two extra boot SSDs. I just searched by a minimum disk size of 12 TB.
Colo will be cheaper, I'm sure, but it's fundamentally a different comparison: you have to pay for drive failures, networking, bandwidth, remote hands, network switches, and so on.
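As a rough sanity check on the per-TB numbers thrown around in this thread, here's a minimal sketch. The S3 figure is an assumption (roughly S3 Standard list price); the dedicated-server figures are the two offers quoted above.

```python
# Rough $/TB-month comparison for the options discussed in this thread.
# ASSUMPTION: S3 Standard at ~$23/TB-month (list price, no egress/requests).
# The dedicated-server prices are the ones quoted upthread.

options = {
    "S3 Standard (assumed)": 23.0,
    "HostKey dedicated ($360/mo, 72 TB)": 360 / 72,        # $5.00/TB-mo
    "Clouvider dedicated ($423.79/mo, 72 TB)": 423.79 / 72,  # ~$5.89/TB-mo
}

for name, per_tb in options.items():
    print(f"{name}: ${per_tb:.2f}/TB-month")
```

At these assumed prices the dedicated boxes come out at roughly $5–6/TB-month versus ~$23 for S3, consistent with the "barely 5x cheaper than S3" claim above; a colo build amortizing raw disks would sit well below either, but with all the operational costs just listed.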