Comment by matt_wulfeck
9 years ago
There are a lot of operational benefits to running on Hadoop/YARN as well. You get node resiliency (host went down? Run the application over there). You also get the Hadoop filesystem, which conveniently stores your data in S3 and distributed HDFS.
These systems were designed by people who probably managed difficult ETL pipelines that were nothing but what the author suggests: simplified shell scripts using UNIX pipes.
Besides, going up against Hadoop MR is easy. I'd like to see you compete against something like Facebook's Presto or Spark, which are optimized for network and memory.