Comment by rusanu
9 years ago
Reading 60TB at 500MB/s takes more than a day (about 33 hours). This is the problem of drinking the ocean through a straw. Even with SSD transfer rates, it is still a problem at scale. Clusters give you not only capacity but also a multiplication factor for transfer rates.
Just use 24 of them interleaved/striped and it will take a bit over an hour to load the data.
But then you need small disks (e.g. 2TB). My point is that huge-capacity drives are not appropriate in compute environments such as Hadoop. They're more for cold storage.
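
For anyone who wants to check the arithmetic in this thread, here is a rough sketch. It assumes decimal units (1TB = 10^12 bytes), reads "500" as megabytes per second, and assumes ideal striping with no controller, network, or filesystem overhead. The function name `read_time_hours` is made up for illustration.

```python
def read_time_hours(capacity_tb: float, rate_mb_per_s: float, drives: int = 1) -> float:
    """Hours to read capacity_tb of data striped evenly across `drives`,
    each drive sustaining rate_mb_per_s (ideal case, no overhead)."""
    total_bytes = capacity_tb * 1e12                  # decimal terabytes
    aggregate_rate = rate_mb_per_s * 1e6 * drives     # bytes per second across all drives
    return total_bytes / aggregate_rate / 3600


single = read_time_hours(60, 500)               # one 60TB drive
striped = read_time_hours(60, 500, drives=24)   # same data over 24 drives

print(f"1 drive:   {single:.1f} h")
print(f"24 drives: {striped:.1f} h, {60 / 24:.1f} TB per drive")
```

Under those assumptions a single drive needs about 33 hours, while 24 striped drives need about 1.4 hours and hold only 2.5TB each, which is why the big drives end up making sense mostly for cold storage.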