
Comment by arjie

9 years ago

So you have large data storage, and processing that can handle large data (assuming for convenience that you have a conventional x86 processor with that throughput). The only problem that remains is moving the data from the former to the latter, and back again once you're done computing.

That's (100 * 1024 GB) / (20 GB/s) = 5120 s, about 85 minutes, just to move your 100 TB to the processor, assuming your storage can feed it at DDR4-like speeds. A 100 node Hadoop cluster with commodity disks reading at ~0.2 GB/s each gets 100 * 0.2 GB/s = 20 GB/s of aggregate throughput, so it streams through the same 100 TB in roughly the same time.
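
Just to make the arithmetic explicit, here's a quick sketch in Python using the figures assumed above (100 TB of data, a DDR4-ish 20 GB/s on the single machine, 100 nodes with 0.2 GB/s commodity disks):

    # Back-of-the-envelope data-movement times; all bandwidth figures are
    # the assumptions from the comment above, not measurements.
    DATA_GB = 100 * 1024            # 100 TB expressed in GB

    # Single machine: storage feeding the CPU at DDR4-like bandwidth.
    single_node_bw = 20.0           # GB/s, assumed
    single_node_seconds = DATA_GB / single_node_bw

    # 100-node cluster: commodity disks read in parallel.
    nodes = 100
    disk_bw = 0.2                   # GB/s per node, assumed
    cluster_bw = nodes * disk_bw    # aggregate GB/s
    cluster_seconds = DATA_GB / cluster_bw

    print(f"single node     : {single_node_seconds / 60:.0f} min at {single_node_bw} GB/s")
    print(f"100-node cluster: {cluster_seconds / 60:.0f} min at {cluster_bw} GB/s aggregate")

Both come out to roughly 85 minutes, which is the point: a pile of slow disks read in parallel keeps pace with one fast machine's memory bandwidth.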

Back-of-the-envelope stuff, obviously, with caveats everywhere.