Decision Support System to Improve Patient Care

V. Diviya Prabha and R. Rathipriya


The amount of data in the healthcare sector is increasing rapidly. Extracting relevant data from such high-volume data is a challenging task, and the distinctive characteristics of medical data make data mining difficult. The wide variety and huge volume of data are valuable only when a useful pattern is extracted. The information needed for a model is present in the raw data alongside data that is not useful. Obtaining such useful knowledge (Archenaa and Mary Anita 2016; Malykh and Rudetskiy 2018) with pre-existing data mining approaches is crucial.

Similarly, deriving decisions from raw data that spans many dimensions is essential. Moreover, numerous reports are available in hospitals across cities (Liu et al. 2018) and villages. Making the correct decision based on a patient's data helps both patients and doctors achieve a good outcome (Abraham 2016). High-dimensional data with a large number of features shows the importance of feature selection.

Over the last few decades, feature selection with machine learning has become an important area of research. Choosing the best features helps doctors and patients understand the medical data (Sasikala et al. 2016), make appropriate decisions, and diagnose infected patients as soon as possible. There are several approaches to feature selection: filter, wrapper, and embedded methods.

This chapter discusses the likelihood of patients being readmitted to the hospital after discharge, and how knowledge of the drugs taken by the patient, together with other medical data, can help in making these predictions. Several previous studies motivate this work. An entropy-based measure is useful when integrated with machine learning techniques.

The approach analyzes data using attribute selection for a predictive model, keeping the attributes that support prediction and removing irrelevant ones. Much clinical data today contains irrelevant attributes that weaken prediction. Thus, the proposed approach concentrates on decision entropy-based attribute selection. A subset of attributes is selected based on entropy value and given as input to machine learning algorithms such as logistic regression (Prabha et al. 2019), support vector machines, and decision trees for the prediction of readmissions. This gives better accuracy than the existing model. It is also suited to dynamically growing data, so new data can be handled for readmission prediction. This helps to reduce readmission risk and improve patient care.

The chapter is organized as follows: In section two, the basic concepts of entropy-based feature selection are explained. In section three, the algorithms and the flow of the PySpark-based method are evaluated, and the concentrations of features are reported. Section five concludes the chapter.

Related Work

Considerable work has been carried out in the field of feature selection. An accurate prediction model needs to identify the relevant features (Xing et al. 2001). The significant features represent the strengths and weaknesses of the features. For subset feature selection, a DFL algorithm (Maji 2010) is used to find the optimal features. Large datasets have more features, so it is important to identify the relevant ones. Differentiation entropy (Cassara et al. 2016) is applied to feature subsets so that the important features can be selected. Neighborhood entropy works better for classical game theory processes. It suggests that Shannon's entropy works well only for nominal data and not for other data values. Subset feature selection using entropy (Zheng and Chee 2011) for a huge amount of data is the most critical task here, and identifying the correct subset of features is difficult in this approach (Ahmed and Kumar 2018; Masoudi-Sobhanzadeh et al. 2019).

Feature Selection

Basic filter and wrapper methods have been applied in previous papers, but these do not give significant importance to feature selection. Feature selection for high-dimensional datasets plays an important role in disease prediction (Agrawal 2016; Li et al. 2016). In this paper, the main objective is to study the different types of entropy in large datasets. Entropy is used to identify the significance of the attributes present in the dataset. The dataset is taken from the UCI repository. The first important step in data mining is data preprocessing. PySpark is used because it makes Spark available from Python. The first step is to import PySpark and create a SparkContext in the local environment.

The flowchart in Figure 4.1 represents the flow of the proposed method in the PySpark environment. The datasets are preprocessed in a pipeline that combines multiple algorithms into a single process. The pipeline includes StringIndexer for character variables, OneHotEncoder for binary values, and VectorAssembler for converting the results into vectors. StringIndexer converts the string values in the dataset into numeric form; for example, the gender feature consists of male and female, and StringIndexer converts these into 0 for males and 1 for females. The output of OneHotEncoder is passed to VectorAssembler for preprocessing. Running these steps together in one pipeline reduces the computation time of the model.
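The three preprocessing steps can be illustrated in plain Python. This is a minimal sketch of what the transformers do conceptually, not the PySpark API itself: the helper names `string_index`, `one_hot`, and `assemble`, and the toy `gender` and `age` columns, are hypothetical. It assumes labels are indexed by descending frequency with ties broken alphabetically, and it keeps every one-hot position for readability, whereas PySpark's own OneHotEncoder may drop the last category by default.

```python
from collections import Counter


def string_index(values):
    """Map each distinct string to an integer index, most frequent first.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: index for index, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


def one_hot(indices, size):
    """Turn each integer index into a vector with a single 1.0 entry."""
    return [[1.0 if i == idx else 0.0 for i in range(size)] for idx in indices]


def assemble(*columns):
    """Concatenate per-row values from several columns into one feature vector."""
    rows = []
    for parts in zip(*columns):
        row = []
        for part in parts:
            # Vector-valued columns are flattened; scalars become one float.
            row.extend(part if isinstance(part, list) else [float(part)])
        rows.append(row)
    return rows


# Hypothetical toy columns, not taken from the chapter's dataset.
gender = ["male", "female", "male", "male"]
age = [63, 47, 71, 58]

gender_idx, gender_map = string_index(gender)
gender_vec = one_hot(gender_idx, len(gender_map))
features = assemble(gender_vec, age)
```

In this toy data "male" is the more frequent label, so it receives index 0 and "female" receives index 1, matching the example in the text.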


Steps for the proposed methodology.

Entropy Formula

Entropy is an essential measure used to quantify the uncertainty of the predicted variables. Here, decision entropy does the same with respect to the target value. The entropy measurement is used to select the best variables for the target class. Different types of entropy are computed to find the criteria for the best feature selection. Following this, for each variable, the entropy formula is the following:

For each variable, the entropy is calculated as the following:
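Assuming the standard definition is intended, Shannon's entropy of a discrete variable $X$ with values $x_1, \dots, x_n$ and probabilities $p(x_i)$ can be written as follows. The symbols and the base-2 logarithm are this sketch's notation, not necessarily the authors'.

```latex
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
```

With base-2 logarithms, a variable with $n$ equally likely values has an entropy of $\log_2 n$ bits, so a balanced binary variable has an entropy of exactly 1 bit.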

Decision entropy is formulated on the basis of entropy_features, the entropy of each feature computed with respect to the target variable. If the entropy_features value is greater than 1, the feature is selected; if it is less than 1, the feature value is set to 0, which means the feature is rejected. The decision_entropy is used to identify relevant features based on the target variable.
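The selection rule can be sketched in plain Python. The text does not define entropy_features precisely, so this sketch assumes it is the Shannon entropy (base 2) of a feature's values within each target class, weighted by class frequency. The function names, the toy data, and that scoring choice are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_given_target(feature, target):
    """Assumed score: per-class entropy of the feature, weighted by class frequency."""
    total = len(target)
    score = 0.0
    for cls in set(target):
        subset = [f for f, t in zip(feature, target) if t == cls]
        score += (len(subset) / total) * shannon_entropy(subset)
    return score


def decision_entropy(features, target, threshold=1.0):
    """Keep a feature's score if it exceeds the threshold, otherwise set it to 0."""
    scores = {}
    for name, column in features.items():
        value = entropy_given_target(column, target)
        scores[name] = value if value > threshold else 0.0
    return scores


# Hypothetical toy data: eight patients, one binary and one multi-valued feature.
target = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
features = {
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "num_medications": [1, 2, 3, 4, 5, 6, 7, 8],
}

scores = decision_entropy(features, target)
selected = [name for name, value in scores.items() if value > 0]
```

Under this assumption a binary feature such as gender can never exceed 1 bit, so a strict threshold of 1 always rejects it; a different logarithm base or scoring rule would change which features pass.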

Figures 4.2-4.5 characterize the different types of entropy used to select the best features for readmission prediction (Marcello and Battiti 2018). These figures highlight each variable that supports the entropy method. Each selected attribute changes from one entropy method to another. The cross-entropy method (Weiss and Dardick 2019) compares the probability distribution of one method with that of other methods.


Shannon's entropy.


Relative entropy.


Boltzmann's entropy.
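Relative entropy and cross-entropy both compare two probability distributions. A minimal sketch, assuming the standard base-2 definitions and two made-up distributions `p` and `q` (hypothetical values, not taken from the chapter's dataset):

```python
from math import log2


def shannon_entropy(p):
    """H(p) = -sum p_i log2 p_i, skipping zero-probability outcomes."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum p_i log2 q_i; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)


def relative_entropy(p, q):
    """D(p || q) = sum p_i log2(p_i / q_i), also called KL divergence."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over the two values of a binary attribute.
p = [0.5, 0.5]
q = [0.25, 0.75]

h_p = shannon_entropy(p)
h_pq = cross_entropy(p, q)
d_pq = relative_entropy(p, q)
```

The three quantities are linked by the identity H(p, q) = H(p) + D(p || q), so cross-entropy is never smaller than Shannon's entropy of `p`, and relative entropy is zero when the two distributions are identical.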

In the current work, several entropy measures are computed, and the proposed decision entropy is used to improve the predictive accuracy. The entropy values are calculated based on the target value. If the value for a particular attribute is greater than 1, that feature is selected for prediction; otherwise the feature is rejected. The values for the other features are calculated similarly. Figure 4.6 represents decision entropy, which provides a way to identify the optimal features. The best 11 features are selected for the prediction of readmission.


Decision entropy.

Table 4.1 presents readmission predictions for hospitals using different entropy methods. The proposed decision entropy method is compared with the existing approaches, and its accuracy is higher. Shannon's entropy, Boltzmann's entropy, and cross-entropy yield a similar range of prediction accuracy. Relative entropy and decision entropy also fall in a similar range; overall, the proposed decision entropy outperforms the existing approach, with a reported accuracy of 92%.
