You don't have the same tools. You are probably thinking about emulating a POSIX filesystem API and the like, then running those command-line tools on top of it in a single-box kind of way. That's not how you treat your distributed system.
EDIT: For something that easily beats a single box, I envision an interpreter with a JIT running on each node of the distributed system, inside the same process that stores the data, so accessing and processing that data has almost no overhead.
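A toy sketch of the "run the code where the data lives" idea, assuming each node keeps its shard in memory and accepts a job function to execute in-process. `StorageNode`, `run_local`, `word_count`, and `run_job` are hypothetical names for illustration only. There is no JIT, networking, or parallelism here: a plain Python function stands in for the shipped job, and nodes are visited sequentially.

```python
from collections import Counter
from typing import Callable, Iterable, List

class StorageNode:
    """Hypothetical node that owns a shard of records and runs jobs in-process."""

    def __init__(self, records: List[str]):
        self.records = records  # the data lives in this process's memory

    def run_local(self, job: Callable[[Iterable[str]], Counter]) -> Counter:
        # The job reads the shard directly: no filesystem emulation,
        # no serialization, no copying data to another process.
        return job(self.records)

def word_count(records: Iterable[str]) -> Counter:
    # Example job: count words in the local shard only.
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts

def run_job(nodes: List[StorageNode], job) -> Counter:
    # Ship the same function to every node, then merge the partial results.
    # A real system would run the nodes in parallel; this loop is sequential.
    total = Counter()
    for node in nodes:
        total.update(node.run_local(job))
    return total

nodes = [StorageNode(["a b a", "c"]), StorageNode(["b a"])]
result = run_job(nodes, word_count)
print(result)
```

Only the small partial counts travel back to the coordinator. The raw records never leave the node that stores them, which is where the "pretty much no overhead" claim comes from.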
>You are probably thinking about emulating a POSIX filesystem API and the like, then running those command-line tools on top of it in a single-box kind of way. That's not how you treat your distributed system.
Yeah, but Manta's MapReduce does something close to that, and it seems to work okay.