A Novel MapReduce-Based Hybrid Decision Tree with TFIDF Algorithm for Public Sentiment Mining of Diabetes Mellitus

In the current information era, Internet of Health Things (IoHT) objects are embedded with sensors that observe the environment and use an Internet connection to transmit data, which makes it possible to collect and examine data in real time. At the same time, advances in information technology have steadily increased the rate of data generation, giving rise to the concept of “big data.” Big data is commonly characterized by five Vs: volume, velocity, variety, validity, and value [1]. It is therefore defined as data that is fundamentally heterogeneous and that accumulates in very large amounts within a short time. It is estimated that the volume of data produced may grow to 35 trillion gigabytes (GB) [2]. Big data is massive and complex, and it is harder to handle than data in traditional models [3]. As a result, handling big data requires diverse methodologies, models, devices, and structures that can extract analytical value as the data expands [4]. An analytical approach is essential for turning growing volumes of big data into applicable insight. Big data analytics is consequently a demanding task, since it must process large-scale data ranging from terabytes (TB) to yottabytes (YB).

Opinion mining, or sentiment analysis (SA), of Internet content is a relatively new research field that has attracted strong interest among developers. SA is defined as the analysis of emotion toward a product or action [5]. It is applied in business to locate and examine digital information in order to improve product quality and provide better service to users [6]. Previous work describes ways of deriving opinions and emotions from data obtained from social media. Initially, SA was used to classify movie or product reviews as either positive or negative [7]. Many techniques exist for detecting sentiment, based on syntactic, semantic, and feature-based (machine learning [ML]) approaches. The first approach applies n-grams [8] and has resulted in higher accuracy. The second technique is widely applied by several authors to find the opinion expressed in texts. Most research has focused on extracting public sentiment using diverse natural language processing (NLP) schemes. Saggion and Funk [9] employed a model to perform classification tasks [10]. Zhang et al. [11] used ML to identify sentiment, presenting a new ML technique with enhanced feature-based learning. Current research aims to understand opinions on social as well as geopolitical content.

The Hadoop ecosystem and its components are typically used to manage big data [12]. Hadoop is an open-source framework that enables users to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale from a single server to thousands of nodes while maintaining a high degree of fault tolerance and reliability [13]. Hadoop has three major components: the Hadoop Distributed File System (HDFS), MapReduce, and Hadoop YARN. HDFS is modeled on the Google File System (GFS). Hadoop MapReduce is the programming model at the core of Apache Hadoop that provides massive scalability across Hadoop clusters and can be used to process very large amounts of data. A MapReduce job consists of two main phases, the Map phase and the Reduce phase. Each phase has its own input and output, and the input and output of a job are stored in the file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
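The two phases described above can be illustrated with a minimal, single-process sketch in Python. It is not Hadoop code and not the model proposed in this chapter: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample records are hypothetical, and the grouping step that Hadoop performs between the two phases is simulated in memory rather than across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every token in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the map, shuffle, and reduce steps in sequence on a list of records."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample records, used only for illustration.
    sample = ["sugar free diet", "diet and exercise", "sugar free"]
    print(run_job(sample))
```

In a real Hadoop job, the map and reduce functions would run as distributed tasks, their input and output would reside in HDFS, and the framework would handle scheduling and the re-execution of failed tasks.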

YARN is a cluster management technology. It is a key feature of second-generation Hadoop, designed using the experience gained from first-generation Hadoop. YARN provides resource management as well as a central platform for delivering reliable operations, security, and data provisioning tools across Hadoop clusters. Fig. 7.1 depicts the Hadoop platform used to manage big data effectively.

FIGURE 7.1 Hadoop ecosystem.

The Proposed Model

This section describes the proposed system, which gathers and examines people’s emotions about food, lifestyle, and physical activity using big data methods. It comprises three stages: Data Collection, Data Integration, and Analysis.

