Comment by PeterCorless

3 years ago

When datasets are small and can easily fit into a single node [a few terabytes], this isn't as much of an issue. Yet when datasets grow far larger, or when compute/QPS needs grow while the dataset grows slower — when either side of the equation does not scale in balanced proportion with each other — that's when this separation of compute & storage becomes vital. [Either that, or you need to find hardware servers or cloud instance types that also support this imbalance of compute & storage, which is sometimes harder to do; it also locks you into a hardware configuration that cannot dynamically scale as needs and workloads change.]

Apache Pinot also offers the same 2-tier compute/storage separation. And it also has nodes for minion [administrative] tasks. Again, these are more issues for larger scale analytical use cases.

2 comments

PeterCorless

ddorian43 3 years ago

> Apache Pinot also offers the same 2-tier compute/storage separation.

Based on looking at the docs, I don't think so. Maybe only with HDFS. Feel free to link to a page that says otherwise.

PeterCorless 3 years ago

The Brokers are the compute layer. The Servers are the storage later.
Note that this is separate from the fact that the Servers can also be in a tiered storage configuration.
https://docs.pinot.apache.org/basics/architecture