Comment by cle
9 years ago
Not necessarily true. Depending on your use cases, it often still makes sense to use Hadoop. A really common scenario is that you'll implement your 3.5 GB job on one box, then you'll need to schedule it to run hourly. Then you'll need to schedule 3 or 4 such jobs to run hourly. Then your boss will want you to join it with some other dataset, and run that alongside the other jobs. You'll eventually implement retries, timeouts, caching, partitioning, replication, resource management, etc.
You'll end up with a half-assed, homegrown Hadoop implementation, wishing you had just used Hadoop from the beginning.
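The slide into a homegrown framework usually starts with something like the sketch below — a minimal, hypothetical retry wrapper (the function name and flaky-job example are illustrative, not from any real codebase). It's the first of many layers: next come timeouts, then caching, then partitioning, each bolted on the same way.

```python
import time

def run_with_retries(job, max_retries=3, backoff_s=1.0):
    """Naive retry wrapper -- typically the first piece of a homegrown scheduler."""
    for attempt in range(1, max_retries + 1):
        try:
            return job()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

# Hypothetical flaky job that succeeds on its third invocation.
calls = {"n": 0}

def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result = run_with_retries(flaky_job, backoff_s=0)
```

Each new requirement adds another wrapper like this one, and the stack of wrappers is exactly the "half-assed Hadoop" the comment describes.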