Comment by aub3bhat

9 years ago

They are not overkill at all; rather, they are tuned towards a different set of performance characteristics. E.g. in the Boeing 777 example above, a transatlantic journey.

In the article above, the data and results stay on the local disk. In any organization, however, they need to be stored in a distributed manner and be available to multiple users with varying levels of technical expertise, typically on NFS or HDFS, preferably as records stored/indexed via Hive/Presto. At that point the real issue becomes how you reduce the delay caused by transferring data over the network, which is exactly the original idea behind Hadoop/MapReduce: move the computation closer to the data.
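A quick back-of-envelope sketch of the network-transfer delay in question. The throughput numbers and dataset size here are my own assumed, order-of-magnitude figures, not measurements from the article:

```python
# Illustrative comparison: how long does it take to move a dataset over
# the network versus reading it from a local disk? This is the delay
# that "moving computation closer to the data" avoids.

def transfer_seconds(size_gb: float, throughput_gbps: float) -> float:
    """Time to move size_gb gigabytes at throughput_gbps gigabits/sec."""
    return (size_gb * 8) / throughput_gbps  # gigabytes -> gigabits

DATASET_GB = 3.5         # hypothetical dataset size

# Assumed, typical order-of-magnitude throughputs:
LOCAL_SSD_GBPS = 4.0     # ~500 MB/s SATA SSD read
GIGABIT_NET_GBPS = 1.0   # a saturated 1 GbE link

print(f"local SSD read : {transfer_seconds(DATASET_GB, LOCAL_SSD_GBPS):.1f} s")
print(f"1 GbE transfer : {transfer_seconds(DATASET_GB, GIGABIT_NET_GBPS):.1f} s")
```

At these assumed rates the network fetch takes several times longer than the local read, which is the gap data locality is meant to close.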

rolls eyes

The point is that if you have such tiny quantities of data, why are you storing it in a distributed manner, and why are you breaking out the 777 for a trip around the racetrack? Grab the 777 when you need it, and take the Tesla when you need the performance characteristics of a Tesla.