
Comment by toast0

9 years ago

My Hadoop experience is dated (circa 2011): do the worker nodes still poll the scheduler to see whether they have work to do? If so, that's still a giant impediment to speed for small tasks, especially if the poll intervals are in the range of minutes.
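To put rough numbers on the polling concern, here is a minimal back-of-envelope sketch. It assumes each worker polls on a fixed interval and that a task becomes ready at a uniformly random moment, so it waits half an interval on average before anyone picks it up. The function names and the 5-second, 4-stage job are hypothetical. This is not how Hadoop's scheduler is actually implemented.

```python
# Back-of-envelope model (hypothetical, not Hadoop's actual scheduler):
# a worker polls every `poll_interval` seconds; a task that becomes ready at a
# uniformly random moment waits, on average, half an interval before pickup.


def expected_pickup_delay(poll_interval: float) -> float:
    """Mean wait between a task becoming ready and a polling worker noticing it."""
    return poll_interval / 2.0


def job_wall_time(task_seconds: float, stages: int, poll_interval: float) -> float:
    """Wall time for a chain of `stages` dependent tasks, each picked up by polling."""
    return stages * (task_seconds + expected_pickup_delay(poll_interval))


if __name__ == "__main__":
    # Hypothetical job: a 5-second task repeated over 4 dependent stages.
    compute = 4 * 5.0
    for interval in (0.0, 3.0, 60.0, 180.0):
        total = job_wall_time(task_seconds=5.0, stages=4, poll_interval=interval)
        print(f"poll every {interval:>5.0f}s: {total:6.1f}s total, "
              f"{total - compute:6.1f}s spent waiting")
```

With minute-scale polling, waiting dominates: at a 60-second interval the model gives 140 seconds of wall time for 20 seconds of actual work, and the gap grows with every dependent stage.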

If Hadoop has put effort into making small tasks time-efficient, I think your argument has merit, provided there's a reasonable chance of actually needing to scale, or of picking up ancillary benefits (fault tolerance, access to other data that needs to be processed with Hadoop, etc.).