Comment by jodrellblank

10 days ago

I'm not posting to convince people they should use it, just to say that it's a really cool piece of open source infrastructure that I think is less well known, and I respect it. It is very configurable and tunable, has a lot of features, command lines, and things to learn, and that does need people with skills and time.

That said, it doesn't need constant management; it's excellent at staying up even while damaged. As long as the cluster has enough free space, it will rebuild around any hardware failure without human intervention. It doesn't need hot spares, and if you plan it carefully it has no single point of failure. (The original creator introduces the design choice of 'placement groups' and its tradeoffs in this video [1].)
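To make the "enough free space" condition concrete, here is a rough back-of-envelope sketch in Python. It is not how Ceph itself decides anything. The function name, the 0.85 threshold, and the assumption that data spreads perfectly evenly over the surviving disks are all mine, for illustration only.

```python
def can_rebuild_after_failure(disk_sizes_tb, raw_used_tb, failed_index, max_fill=0.85):
    """Rough check: could the surviving disks hold all the raw data?

    disk_sizes_tb: capacity of each disk in TB (hypothetical cluster layout)
    raw_used_tb:   raw space in use across the cluster, replicas included
    failed_index:  index of the disk assumed to have failed
    max_fill:      illustrative fill threshold (an assumption, not read
                   from any real cluster's configuration)
    """
    # Capacity left once the failed disk is gone.
    surviving_tb = sum(
        size for i, size in enumerate(disk_sizes_tb) if i != failed_index
    )
    # After rebuilding, the same amount of raw data (all replicas restored)
    # has to fit on the survivors while staying under the threshold.
    return raw_used_tb <= surviving_tb * max_fill


if __name__ == "__main__":
    disks = [4.0, 4.0, 4.0, 4.0]  # four 4 TB disks, 16 TB raw
    print(can_rebuild_after_failure(disks, raw_used_tb=9.0, failed_index=0))
    print(can_rebuild_after_failure(disks, raw_used_tb=11.0, failed_index=0))
```

With four 4 TB disks, losing one leaves 12 TB, so under these assumptions 9 TB of raw data still fits below the threshold and 11 TB does not. A real cluster distributes data unevenly, so you would want more headroom than this arithmetic suggests.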

Most of the management time I've spent has been on ageing hardware flaking out without actually failing: old disks erroring on read, controllers failing and dropping all the disks temporarily, causing tens of seconds of read latency which had knock-on effects, or the time we filled it too full and it went read-only. Other management work has been learning my way around it, upgrades, changing the way we use it for different projects, and onboarding and offboarding services that use it, all of which will vary with what you actually do with it.

I've spent less time with VMware VSAN, but VSAN does a lot less: it takes your disks and gives you a VMFS datastore and maybe an iSCSI target. There can't be many alternatives which do what Ceph does, require less skill and effort, and don't involve paying a vendor to manage it for you and give you a web interface?

[1] https://www.youtube.com/watch?v=PmLPbrf-x9g