Comment by PeterCorless
2 years ago
When datasets are small and can easily fit into a single node [a few terabytes], this isn't as much of an issue. Yet when datasets grow far larger, or when compute/QPS needs grow faster than the dataset does (that is, when the two sides of the equation do not scale in proportion to each other), that's when this separation of compute & storage becomes vital. [Either that, or you need to find hardware servers or cloud instance types that happen to match this imbalance of compute & storage, which is sometimes harder to do; it also locks you into a hardware configuration that cannot dynamically scale as needs and workloads change.]
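To put rough numbers on that imbalance, here's a back-of-the-envelope sketch. Every capacity figure in it is a made-up assumption for illustration, not a benchmark of any real system, and the function names are hypothetical. It compares how many nodes you need when each node bundles a fixed amount of storage and QPS versus when the two are sized independently.

```python
import math


def coupled_nodes(data_tb, qps, node_tb, node_qps):
    """Nodes needed when every node bundles fixed storage AND fixed QPS.

    The cluster must be sized for whichever dimension is the bottleneck.
    """
    return max(math.ceil(data_tb / node_tb), math.ceil(qps / node_qps))


def decoupled_nodes(data_tb, qps, storage_node_tb, compute_node_qps):
    """(storage nodes, compute nodes) when each tier is sized on its own."""
    return (
        math.ceil(data_tb / storage_node_tb),
        math.ceil(qps / compute_node_qps),
    )


def idle_storage_tb(data_tb, qps, node_tb, node_qps):
    """Storage provisioned but unused in the coupled layout."""
    return coupled_nodes(data_tb, qps, node_tb, node_qps) * node_tb - data_tb


# Assumed workload: 10 TB of data, 50,000 QPS.
# Assumed node: 2 TB of storage and 2,000 QPS each.
print(coupled_nodes(10, 50_000, 2, 2_000))    # sized by QPS, not by data
print(decoupled_nodes(10, 50_000, 2, 2_000))  # each tier sized separately
print(idle_storage_tb(10, 50_000, 2, 2_000))  # storage paid for but unused
```

With these assumed numbers, the coupled layout is sized entirely by QPS, so most of the storage it provisions sits idle; the decoupled layout only buys as much storage as the data needs.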
Apache Pinot also offers the same 2-tier compute/storage separation. It also has Minion nodes for [administrative] tasks. Again, these matter more for larger-scale analytical use cases.
> Apache Pinot also offers the same 2-tier compute/storage separation.
Based on looking at the docs, I don't think so. Maybe only with HDFS. Feel free to link to a page that says otherwise.
The Brokers are the compute layer. The Servers are the storage layer.
Note that this is separate from the fact that the Servers can also be in a tiered storage configuration.
https://docs.pinot.apache.org/basics/architecture