← Back to context

Comment by aub3bhat

9 years ago

In my experience, organizations have adopted Hive/Presto/Spark on top of Hadoop, which solves a whole bunch of problems the "script" approach would not, with several added benefits. Executing scripts (cat, grep, uniq, sort) does not provide similar benefits, even if they might be faster. A dedicated solution such as Presto by Facebook will provide similar if not even faster results.

https://prestodb.io/
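For concreteness, the "script" approach being contrasted here is roughly a pipeline like the one below. This is just an illustrative sketch; the file `events.tsv` and its contents are made up for the example, not taken from either side of the thread.

```shell
# Hypothetical sample data: tab-separated event log, value in column 2.
printf 'a\tx\nb\ty\nc\tx\n' > events.tsv

# The classic Unix frequency count: extract a column, sort it,
# count duplicates, then rank by count (most frequent first).
cut -f2 events.tsv | sort | uniq -c | sort -rn
```

This is the kind of ad-hoc aggregation that a SQL `GROUP BY ... ORDER BY COUNT(*)` expresses in one query, which is part of the disagreement below.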

Ah, so it doesn't solve data storage, and it runs SQL queries, which are less capable than UNIX commands. If your data's stuck inside 15 SQL DBs, then that'd make sense, but a lot of data is just stored in flat files. And you know what's really good at analyzing flat files? Unix commands.

  • Did you even read it? Presto reads directly from HDFS, which is as close to distributed "flat files" as you can get. As for SQL being less capable than UNIX commands, you have got to be kidding me. SQL allows type checking, conversion, and joins, all of which are difficult if not impossible with grep | uniq | sort etc.

    • I read it.

      >Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.

      That doesn't sound like HDFS to me. I mean, I assume it can read from HDFS, but Presto is backend agnostic. You could probably write code to run it on Manta. That would be neat for people who like Presto, I guess.

      Type checking and conversions, no, and table joins only matter when you're handling relational data.

      Also, how many formats can Presto handle? Unix utilities can handle just about any tabular data, and you can run them against non-tabular data in a pinch (although nobody recommends it). I doubt Presto is that versatile.

      2 replies →
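The format-flexibility claim in the reply above can be sketched concretely: the same awk invocation pattern works on any single-character-delimited file just by changing `-F`. The data files here are hypothetical examples, not from the thread.

```shell
# Hypothetical sample data: same records, two different delimiters.
printf 'alice,10\nbob,20\n' > data.csv
printf 'alice|10\nbob|20\n' > data.psv

# Sum column 2 in each file; only the -F delimiter flag changes.
awk -F, '{sum += $2} END {print sum}' data.csv    # comma-separated  → 30
awk -F'|' '{sum += $2} END {print sum}' data.psv  # pipe-separated   → 30
```

A SQL engine would need a table definition (or a connector/SerDe) per format before it could run the equivalent `SUM` query, which is the trade-off being argued here.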