WORKING OF HADOOP FRAMEWORK
The Hadoop framework follows a master-slave architecture. With Hadoop, large volumes of data can be stored and processed in parallel. HDFS provides high data transmission rates and allows the system to complete its operations even when individual nodes fail. It combines the file systems of the local nodes into a single large virtual file system. For processing, Hadoop uses MapReduce: the input data is first divided into chunks, which are processed in parallel by the Map step; the outputs of the maps are then sorted and passed as input to the Reduce step.
Normally, both the input and the output are stored in the file system. The framework also takes care of scheduling multiple jobs, monitoring currently running tasks, and re-executing failed tasks. A minimal MapReduce job illustrating this flow is sketched below.
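As a minimal sketch of the Map and Reduce steps described above, the classic word-count job below is written against Hadoop's Java MapReduce API. The class names and input/output paths are illustrative: the Map step tokenizes each input line and emits (word, 1) pairs, the framework sorts and groups the intermediate pairs by key, and the Reduce step sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map step: split each input line into tokens and emit (word, 1) for every token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce step: the framework has already sorted and grouped the map outputs by key,
  // so each call receives one word and all of its counts; sum them.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation of map output
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);        // framework schedules, monitors, retries
  }
}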
Figure 2 shows the Hadoop framework. In HDFS, data is divided into blocks, and these blocks are stored separately. When a client wants to store a file in HDFS, it first sends a request to the NameNode. The NameNode replies with how and where the blocks of the file should be stored. The client then divides the file into blocks of fixed size (128 MB by default) and stores them on the slave nodes accordingly. When a block is sent to a slave, it is replicated (three copies by default) across slaves to ensure availability. After the replicas have been stored on different slaves, the NameNode is notified of the storage and updates its metadata. In this way, HDFS stores files in a distributed manner.
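As an illustration of this write path, the short sketch below uses Hadoop's Java FileSystem API to create a file with the default 128 MB block size and a replication factor of three. The NameNode address and the file path are hypothetical; in practice the client library talks to the NameNode for block placement and streams the block data to the DataNodes (slaves), which replicate it among themselves.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Address of the NameNode; "hdfs://namenode:9000" is an assumed, illustrative value.
    conf.set("fs.defaultFS", "hdfs://namenode:9000");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/sample.txt");  // hypothetical target path

    short replication = 3;                    // default replication factor
    long blockSize = 128L * 1024 * 1024;      // default block size: 128 MB

    // create(path, overwrite, bufferSize, replication, blockSize):
    // the client asks the NameNode where to place each block, then writes to the DataNodes.
    FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize);
    out.writeUTF("Hello HDFS");
    out.close();   // on close, the NameNode's metadata reflects the stored blocks
    fs.close();
  }
}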
However, the Hadoop framework was originally designed to operate in a controlled, isolated environment. When it is moved to the public cloud, many challenges arise concerning Hadoop's security mechanisms. Traditional security mechanisms, which were designed for securing small, static data sets, are inadequate. Nowadays most technologies depend on Big Data, and therefore the security of Big Data is an important issue.
Figure 2. Hadoop framework
