I would make sure you understand all of the Hadoop software projects, both the ones in incubation and the active ones. A good starting place would be:
HDFS stores files as data blocks and distributes these blocks across the entire cluster. Because HDFS was designed to be fault tolerant and to run on commodity hardware, each block is replicated a number of times to ensure high data availability. The replication factor is a property that can be set in the HDFS configuration file, and it lets you adjust the global replication factor for the entire cluster. For each block stored in HDFS, there will be n - 1 duplicate blocks distributed across the cluster. For example, if the replication factor is set to 3 (the default value in HDFS), there will be one original block and two replicas.
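To make the idea of distributing block replicas across the cluster concrete, here is a minimal sketch in Python of a round-robin placement scheme. This is a hypothetical simplification for illustration only: real HDFS uses a rack-aware placement policy, and the DataNode names below are made up.

```python
from itertools import cycle

def place_blocks(num_blocks, replication, datanodes):
    # Simplified placement model (not the real HDFS rack-aware policy):
    # each block's replicas go to distinct DataNodes, chosen round-robin.
    # Assumes replication <= len(datanodes).
    placement = {}
    node_cycle = cycle(datanodes)
    for block_id in range(num_blocks):
        targets = []
        while len(targets) < replication:
            node = next(node_cycle)
            if node not in targets:
                targets.append(node)
        placement[block_id] = targets
    return placement

# A 4-block file, replication factor 3, spread over 5 (hypothetical) DataNodes
layout = place_blocks(num_blocks=4, replication=3,
                      datanodes=["dn1", "dn2", "dn3", "dn4", "dn5"])
```

Every block ends up on three distinct nodes, so losing any single DataNode never removes the last copy of a block.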
The replication factor determines how many copies of each HDFS block are kept in the Hadoop cluster.

default replication factor = 3
minimum replication factor that can be set = 1
maximum replication factor that can be set = 512
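The storage cost of replication follows directly from the numbers above. A short Python sketch of the arithmetic (the 128 MiB block size is a common default, used here only as an assumption):

```python
import math

def physical_usage(file_size_bytes, block_size_bytes, replication_factor):
    # Logical blocks needed to hold the file
    num_blocks = math.ceil(file_size_bytes / block_size_bytes)
    # Every block is stored replication_factor times across the cluster
    physical_blocks = num_blocks * replication_factor
    # Raw disk consumed (the last block only stores the bytes it holds)
    raw_bytes = file_size_bytes * replication_factor
    return num_blocks, physical_blocks, raw_bytes

# A 1 GiB file with a 128 MiB block size and the default replication factor of 3:
blocks, copies, raw = physical_usage(1 * 1024**3, 128 * 1024**2, 3)
# 8 logical blocks, 24 block replicas, 3 GiB of raw disk consumed
```

So with the default factor of 3, every gigabyte of data consumes three gigabytes of cluster disk, which is the price paid for availability.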
One can set the replication factor in the hdfs-site.xml file as follows:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
If one copy is inaccessible or corrupted, the data can be read from another copy.
You will have ample time for the failure to be reported to the NameNode and for the blocks of the failed node to be re-replicated onto a new node.
In the meantime, if a second node also fails unexpectedly, you will still have one live copy of the critical data to process.
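The failure scenario described above can be sketched as a couple of lines of Python. The probability function is an illustrative simplification (it assumes independent node failures, which real clusters do not guarantee), not a real reliability model:

```python
def replicas_remaining(replication_factor, failed_copies):
    # Readable copies left after some replicas are lost
    return max(replication_factor - failed_copies, 0)

def all_copies_lost_probability(replication_factor, node_failure_prob):
    # Chance that every replica is lost at once, assuming independent
    # node failures (an illustrative simplification only)
    return node_failure_prob ** replication_factor

# With the default replication factor of 3:
# one failure leaves 2 copies, a second unplanned failure still leaves 1
first_failure = replicas_remaining(3, 1)
second_failure = replicas_remaining(3, 2)
```

This is why the default of 3 is a reasonable trade-off: data is lost only if all three replicas fail before re-replication completes, which is far less likely than any single failure.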