Working of Hadoop Architecture

Advances in technology have driven the migration of data from file-based systems to computer-based systems. Handling such large volumes of data is a challenging task for IT professionals. Many techniques and technologies have been proposed by researchers to address the problem of large data; to explain the working of big data, we describe the working of Hadoop. The Hadoop architecture framework consists of four modules as follows:

  • 1. Hadoop Common
  • 2. Hadoop YARN
  • 3. HDFS
  • 4. Hadoop MapReduce

Hadoop Common consists of the Java utilities and libraries required by the other Hadoop modules. Hadoop YARN is the framework for job scheduling and cluster resource management. HDFS is a distributed file system that provides high-throughput access to application data, whereas Hadoop MapReduce performs parallel processing of large data sets so that information can be processed faster.

MapReduce and HDFS are the two most important pillars of the Hadoop system, as shown in the Hadoop architecture in Figure 3.6. MapReduce processes the whole data set for a whole query and thus has the ability to serve ad-hoc queries. It does so by means of a programming model that abstracts the problem away from the disk reads


FIGURE 3.6 Hadoop architecture.

and writes, transforming it into computation over sets of keys and values. MapReduce is thus a programming model for processing and generating large data sets in parallel, using a distributed algorithm across a cluster. MapReduce thereby helps unlock data that was previously archived on tape or disk.

HDFS, on the other hand, stores the data in an organized manner. Hadoop, combining MapReduce and HDFS, therefore provides reliable storage together with an efficient analysis system for big data sets.
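The MapReduce model described above can be illustrated with the classic word-count example. The sketch below is a minimal, pure-Python stand-in (no Hadoop cluster assumed; the function names and sample lines are illustrative only): the mapper emits a (key, value) pair per word, and the reducer sums the values for each key after the shuffle/sort step.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line of text."""
    for word in line.strip().lower().split():
        yield (word, 1)

def reducer(pairs):
    """Reduce phase: after the shuffle/sort groups pairs by key,
    sum the values for each key."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# Local stand-in for the distributed run: map every input line,
# then shuffle/sort the intermediate pairs and reduce them.
lines = ["big data big results", "hadoop processes big data"]
intermediate = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(intermediate))
print(counts)  # {'big': 3, 'data': 2, 'results': 1, 'hadoop': 1, 'processes': 1}
```

In a real Hadoop job the mapper and reducer run on many nodes at once, and the grouping between the two phases is performed by the framework rather than by an in-memory sort.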

Hadoop runs applications using the MapReduce algorithm, which processes data in parallel on different CPU nodes. The working of Hadoop can be divided into three stages as follows:

  • 1. In the first stage, a user submits a job to Hadoop through an application for a specific process, specifying parameters such as the locations of the input and output files within the file system. This stage also supplies the Java classes, packaged in a JAR file, that implement the map and reduce functions.
  • 2. In the second stage, the Hadoop job client submits the job and its configuration to the job tracker, which is responsible for distributing the work to the slave nodes, scheduling tasks, and monitoring the job.
  • 3. In the third stage, the task trackers on the different nodes execute the tasks as per the MapReduce implementation, and the output of the reduce function is stored in output files in the file system.
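The three stages above can be mimicked in miniature. The following sketch is purely illustrative (the job dictionary, the number of "task trackers", and the lambda functions are assumptions for the example, not Hadoop APIs): the job specification carries the input, the map/reduce logic, and the degree of parallelism; the "job tracker" splits the input among the "task trackers"; each tracker runs the map function, and the grouped results are then reduced.

```python
from collections import defaultdict

def run_job(job):
    """Toy stand-in for the three stages: submit, distribute, execute."""
    # Stage 1: the 'user' supplies the input and the map/reduce functions
    # (in real Hadoop these arrive as file paths and a JAR of classes).
    records = job["input"]
    # Stage 2: the 'job tracker' splits the input among the task trackers.
    n = job["num_task_trackers"]
    splits = [records[i::n] for i in range(n)]
    # Stage 3: each 'task tracker' runs the map function on its split;
    # the framework groups by key and applies the reduce function.
    grouped = defaultdict(list)
    for split in splits:
        for record in split:
            for key, value in job["mapper"](record):
                grouped[key].append(value)
    return {key: job["reducer"](key, values) for key, values in grouped.items()}

result = run_job({
    "input": ["error ok", "ok ok", "error"],
    "num_task_trackers": 2,
    "mapper": lambda line: [(word, 1) for word in line.split()],
    "reducer": lambda key, values: sum(values),
})
print(result)  # {'error': 2, 'ok': 3}
```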

Thus, Hadoop offers the following advantages over a traditional file system:

  • 1. Hadoop makes the writing and testing of distributed systems faster.
  • 2. Hadoop makes processing independent of the underlying hardware.
  • 3. Hadoop allows servers to be added to or removed from the cluster dynamically without affecting operation.
  • 4. Hadoop, being implemented in Java, is compatible with all platforms.

Image Processing with Big Data Analytics

With continuing inventions in image processing, its use has extended to industries, organizations, administrative divisions, various social organizations, economic/business institutions, healthcare, defense, etc. Image processing takes images as input and, using various image-processing techniques, produces modified and enhanced images as output. It can be applied to images as well as videos in order to extract the part of the image or video that needs to be addressed. Image processing involves large amounts of data in the form of images from satellite, medical, defense, and other sources, which need to be handled efficiently and quickly, and thus the need for big data comes into the picture. Image processing with big data processes this large data and stores the results of applying different computation techniques to images in a structured or unstructured format. Big data analytics integrated with image processing can then be used for mining knowledge from the data created, which can be applied in different sectors such as medicine, education, defense, agriculture, satellite mapping, etc.

Image processing processes images by applying different computation techniques. It takes images as input and enhances the properties of an image to extract the features of importance, making the image less complex for the purpose of study. Different image-processing techniques are as follows:

  • 1. Visualization: This technique is used to set up communication by means of messages using images, diagrams, and animation; one such visualization technique is visual imagery. Image visualization can be performed by two methods: abstract visualization, which uses 2D and 3D techniques, and model-based scientific visualization, which uses digitally constructed real images for its purpose.
  • 2. Image restoration: Image restoration is used for clearing noise and recovering lost resolution; it thus recovers the original image from a degraded image. This can be achieved with various software packages such as Microsoft Paint, Adobe Photoshop, Paint.NET, etc.
  • 3. Image retrieval: This is the process of retrieving images from a large database. It involves different techniques such as Content-based Image Retrieval, Human-oriented Image Retrieval, Document-based Image Retrieval, Content-based Visual Information Retrieval, etc.
  • 4. Image recognition: This is used to recognize the objects in an image. It takes an image as input and gives the recognized object as output.
  • 5. Image enhancement: This is used to bring the object of concern to a high quality so that it can be addressed easily. Different techniques are used for image enhancement, such as morphology, filtering, etc.
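As a concrete example of enhancement by filtering, the sketch below implements a simple median filter in pure Python (the 3 x 3 grid of pixel values is a hypothetical stand-in for a real grayscale image): each pixel is replaced by the median of its neighbourhood, which suppresses impulse ("salt and pepper") noise while largely preserving edges.

```python
def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood
    (clipped at the image borders)."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out

# A flat grey patch with one bright 'salt' pixel of impulse noise:
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter(noisy))  # the noise pixel is smoothed back to 10
```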

Image preprocessing

Image preprocessing enhances an image by modifying its pixel values, either in brightness or in contrast, to improve its visual impact; degradation may arise from blurriness in images captured with low-quality conventional/digital cameras or in images obtained from satellite pictures. Image preprocessing can be divided into two types as follows:

i. Static Thresholding

ii. Dynamic Thresholding
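The difference between the two can be sketched in pure Python (the threshold value, window size, and sample pixel row below are illustrative assumptions): static thresholding applies one fixed cut-off to every pixel, while dynamic (adaptive) thresholding compares each pixel against the mean of its local neighbourhood, which copes better with uneven illumination across an image.

```python
def static_threshold(image, t=128):
    """Static thresholding: one fixed cut-off for the whole image."""
    return [[255 if p >= t else 0 for p in row] for row in image]

def dynamic_threshold(image, size=3, offset=0):
    """Dynamic (adaptive) thresholding: each pixel is compared against
    the mean of its local size x size neighbourhood minus an offset."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            local_mean = sum(window) / len(window)
            row.append(255 if image[y][x] >= local_mean - offset else 0)
        out.append(row)
    return out

gradient = [[40, 60, 200, 220]]
print(static_threshold(gradient))  # [[0, 0, 255, 255]]
```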

The above-mentioned image-processing techniques, together with big data techniques, can be used for the processing and analytics of big data received from satellites, cameras, etc.

Digitized images are analyzed and manipulated to improve image quality. The process of image segmentation contributes to this: it separates the object of concern from the original image so that it can be used for analysis. The different techniques used in the field of image processing and big data analytics are discussed in the subsequent sections.

Wang et al. (2008) proposed an algorithm for denoising and contour extraction. The proposed algorithm uses operations such as feature extraction, image smoothing, image reconstruction, and enhancement of image quality. These operations, along with Canny edge detection, median filtering, contour tracing, and the wavelet transform, help in processing captured images that have spatial redundancy and high noise. Hu et al. (2014) classified a big data framework into four components: data generation, data acquisition, data storage, and data analytics. They used Hadoop as the data analytics tool to perform their research and draw their conclusions. Paulchamy et al. (2017) performed image detection, classification, and recognition of vehicles on the road using MATLAB software. In their experiment, they noted that the existing model involves region of interest (ROI) and pixel classification techniques, which require a large database. They used other technologies, such as the Raspberry Pi, eSpeak, etc., for slowing the vehicle at road signs such as speed breakers, school zones, etc. Pandian and Balasubramanian (2017) observed that the use of image processing is increasing and that gathering image data generates gigabytes of storage that need to be handled; the need for organizing, examining, retrieving, and recovering images from such large data creates a demand for computer vision combined with a database management system. Goutam et al. (2016) studied the role of big data and its technologies in the field of MapReduce. They noted that MapReduce plays an important role in creating keys and values. Their research covered structured and unstructured data; within unstructured data, they focused on image data that had become erroneous due to noise and other factors. For augmenting the quality of a degraded image, the mapper function could perform acceptably and precisely in creating the key-value pair data, and the same program can then be used as a reducer function for further processing. The authors also proposed a histogram equalization technique for image improvement; this technique, in combination with MapReduce, can be used for an effective and precise process.
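The histogram equalization technique mentioned above can be sketched in pure Python (the 2 x 2 sample image and 256 grey levels are illustrative assumptions, not the authors' implementation): the cumulative distribution of intensities is remapped so that a low-contrast image spreads across the full intensity range.

```python
def equalize_histogram(image, levels=256):
    """Histogram equalization: remap intensities via the cumulative
    distribution so the image uses the full [0, levels - 1] range."""
    flat = [p for row in image for p in row]
    n = len(flat)
    # Build the histogram and its cumulative distribution function (CDF).
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    # Standard remapping: scale the CDF to the output range.
    def remap(p):
        return round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
    return [[remap(p) for p in row] for row in image]

# Four pixels crowded into a narrow band of intensities:
low_contrast = [[100, 100], [101, 102]]
print(equalize_histogram(low_contrast))  # [[0, 0], [128, 255]]
```

The mapper/reducer split of this computation would place the histogram counting in the map phase and the accumulation and remapping in the reduce phase.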

Thus, it can be concluded that, with the advancement of image processing, there are various applications that process image and video data, producing large amounts of data that need to be stored and processed efficiently for better results. Image processing is applied in areas such as medicine, remote sensing, and defense, where data volumes are very high and processing must be efficient, accurate, and fast. Thus, the need for big data analytics comes into the picture: it helps process large image and video data quickly and effectively so that results can be achieved in a short span of time. Big data techniques are also used for the efficient, organized storage of data. Big data applications with image processing can further be integrated with other tools, such as machine learning, fuzzy logic, etc., to increase the efficiency of the system in terms of both accuracy and time.


References

Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499.

Archana, B.P.K., Bruno, A.D., and Gopala, I.D.M. (2017). A novel approach towards road safety based on investigational survey using image processing and user interface system, pp. 105-108.

Buddhiraju, K.M. and Alok, P. (2015). Hyperspectral image processing and analysis. Current Science 108, 833-841.

Elgendy, N. and Elragal (2016). Big data analytics in support of the decision making process. Procedia Computer Science 100, 1071-1084. 10.1016/j.procs.2016.09.251.

Goutam, D.S. and Gupta, S.D. (2016). Big data analytics: Image enhancement based approach. International Journal of Advanced Research in Computer Science and Software Engineering 6(5), 570-573.

Hu, H., Wen, Y., Chua, T.-S., and Li, X. (2014). Toward scalable systems for big data analytics: A technology tutorial. IEEE Access 2, 652-687. 10.1109/ACCESS.2014.2332453.

Komal, M. (2018). A review paper on big data analytics tools.

Pandian, A. and Ratnasamy, B. (2017). Performance analysis of texture image retrieval in curvelet, contourlet, and local ternary pattern using DNN and ELM classifiers for MRI brain tumor images. 10.1007/978-981-10-2104-6_22.

Sudhir, R. (2020). A Survey on Image Mining Techniques: Theory and applications.

Wang, Y., Zheng, J., Zhou, H., and Shen, L. (2008). Medical image processing by denoising and contour extraction. In Proceedings of the 2008 IEEE International Conference on Information and Automation, ICIA 2008, pp. 618-623. 10.1109/ICINFA.2008.4608073.
