Comment by kyyol
8 days ago
I run Ceph in my k8s cluster (using rook) -- 4 nodes, 2x 4TB enterprise SSDs on each node. It's been pretty bulletproof; it took some time to set up and familiarize myself with Ceph, but now it's simple to operate.
Claude Code is amazing at managing Ceph, restoring, fixing CRUSH maps, etc. It's got all the Ceph motions down to a tee.
With the tools at our disposal nowadays, saying "I wouldn't dare deploy it without a deep understanding of the source code" seems like an exaggeration!
I encourage folks to try out Ceph if it supports their use case.
Considering the hallucinations I routinely deal with about databases, there isn’t a chance in hell I would trust an LLM to manage my storage for me.
If you set up Ceph correctly (multiple failure domains, correct replication rules across those failure domains, monitors spread across failure domains, OSDs never force-purged), it is actually pretty hard to break. Rook helps a lot here too, since it makes it easier to set up Ceph correctly; a rough sketch of what that looks like is below.
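For anyone wondering what "set up correctly" roughly translates to in Rook terms, here's a minimal sketch, not a production config: the pool name, namespace, and Ceph image tag are placeholders, and exact fields vary by Rook version. The idea is three monitors forced onto separate nodes and a pool that replicates 3x with the host as the failure domain.

```yaml
# Minimal sketch of Rook manifests illustrating the points above.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  cephVersion:
    image: quay.io/ceph/ceph:v18   # placeholder; pin whatever release you actually run
  mon:
    count: 3
    allowMultiplePerNode: false    # keep monitors in separate failure domains
  storage:
    useAllNodes: true
    useAllDevices: true
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool               # placeholder name
  namespace: rook-ceph
spec:
  failureDomain: host             # CRUSH places each replica on a different host, not just a different OSD
  replicated:
    size: 3
    requireSafeReplicaSize: true
```

The `failureDomain: host` setting is what "replication rules across failure domains" comes down to in practice: losing a whole node only costs you one copy of each object, so the cluster keeps serving I/O while it re-replicates.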