← Back to context

Comment by PeterCorless

2 years ago

When datasets are small and can easily fit into a single node [a few terabytes], this isn't as much of an issue. Yet when datasets grow far larger, or when compute/QPS needs grow while the dataset grows slower — when either side of the equation does not scale in balanced proportion with each other — that's when this separation of compute & storage becomes vital. [Either that, or you need to find hardware servers or cloud instance types that also support this imbalance of compute & storage, which is sometimes harder to do; it also locks you into a hardware configuration that cannot dynamically scale as needs and workloads change.]

Apache Pinot also offers the same 2-tier compute/storage separation. And it also has nodes for minion [administrative] tasks. Again, these are more issues for larger scale analytical use cases.