Comment by jl6

2 years ago

I worked at a corp that had built a Hadoop cluster for lots of different heterogeneous datasets used by different teams. It was part of a strategy to get "all our data in one place". Individually, these datasets were small enough that they would have fitted perfectly fine on single (albeit beefy for the time) machines. Together, they arguably qualified as big data, and justification for the decision to use Hadoop was because some analytics users occasionally wanted to run queries that spanned all of these datasets. In practice, these kind of queries were rare and not very high value, so the business would have been better off just not doing them, and keeping the data on a bunch of siloed SQL Servers (or, better, putting some effort into tiering the rarely used data onto object storage).