Comment by rusanu

9 years ago

Reading 60TB at 500MB/s takes more than a day (roughly 33 hours). This is the problem of drinking the ocean through a straw. Even with SSD transfer rates, it is still a problem at scale. Clusters give you not only capacity, but also a multiplication factor for transfer rates.

Just use 24 of them interleaved/striped and loading the data will take only about an hour.

  • But then you need small disks (e.g. 2TB). My point is that huge-capacity drives are not appropriate in compute environments such as Hadoop. They're more for cold storage.
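
The numbers in this thread can be checked with a rough sketch. It assumes decimal units (1 TB = 10^12 bytes), a sustained 500 MB/s per drive, and perfect scaling when the data is striped evenly across drives. The function name `read_time_hours` is only illustrative.

```python
def read_time_hours(capacity_tb, rate_mb_per_s, drives=1):
    """Hours needed to read capacity_tb terabytes striped evenly across `drives` drives.

    Assumes decimal units and that aggregate throughput scales linearly
    with the number of drives (an idealization; real arrays lose some).
    """
    total_bytes = capacity_tb * 1e12
    aggregate_bytes_per_s = rate_mb_per_s * 1e6 * drives
    return total_bytes / aggregate_bytes_per_s / 3600


single = read_time_hours(60, 500)              # one 60TB drive
striped = read_time_hours(60, 500, drives=24)  # same data over 24 drives
per_drive_tb = 60 / 24                         # capacity each drive must hold

print(f"one drive: {single:.1f} h")
print(f"24 striped drives: {striped:.1f} h, {per_drive_tb} TB each")
```

Under these assumptions the single drive needs about 33 hours and the 24-drive stripe about 1.4 hours, with each drive holding 2.5TB, which is in line with the "small disks" point in the reply above.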