Experimental Result Analysis

Dataset Considered for This Experiment

For this experimental analysis, we considered five gene expression datasets that are freely available from different repositories (Table 7.1). The curse of dimensionality is the main issue with the adopted datasets, as the number of genes far exceeds the number of observations.

Normalization

As the datasets are drawn from different repositories, their value ranges may differ. To address this, we normalized the data using min-max normalization, which scales every feature value into a fixed range. After normalization, there is less chance that widely differing feature scales mislead the objective function. Ten-fold cross-validation is a widely accepted protocol, so the whole dataset was divided into training and testing subsets accordingly.
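The normalization and split can be illustrated with a short sketch; this is a minimal example assuming a NumPy/scikit-learn workflow, where the matrix X (samples × genes), the labels y, the 80/20 split ratio, and the random stand-in data are all illustrative assumptions rather than the exact setup used here.

```python
# Minimal sketch of min-max normalization and a train/test split (assumed workflow).
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

def min_max_normalize(X):
    """Scale every gene (column) into the [0, 1] range."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0          # avoid division by zero for constant genes
    return (X - col_min) / col_range

# Example usage with random stand-in data (not real expression values).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 7129))              # CNS-sized: 60 samples, 7,129 genes
y = rng.integers(0, 2, size=60)

X_norm = min_max_normalize(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_norm, y, test_size=0.2, stratify=y, random_state=0)

# Ten-fold cross-validation folds defined over the training portion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
```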

TABLE 7.1 Dataset Details

Database     No. of Genes   No. of Classes   Samples in Class 1   Samples in Class 2
Leukemia     7,129          2                27                   11
ALL/AML      7,129          2                29                   15
CNS          7,129          2                39                   21
ADCA Lung    12,533         2                15                   134
Prostate     12,600         2                52                   50

Details of Classifiers Used in This Experimental Study and Evaluation Metrics

In this experimental study, we used three different classifiers—SVM, ANN, and K-nearest neighbors (KNN)—to evaluate the selected feature subsets. The performance of the different classifiers is compared in terms of the following criteria:

  • True positive (True-pos);
  • False positive (Fal-pos);
  • True negative (True-neg);
  • False negative (Fal-neg).

From these counts we calculated three performance measures: accuracy, sensitivity, and specificity.
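For reference, these three measures follow directly from the four counts above; the short sketch below is only an illustration, and the function name and example counts are assumed for demonstration.

```python
# Accuracy, sensitivity, and specificity from raw confusion-matrix counts.
def evaluation_metrics(true_pos, fal_pos, true_neg, fal_neg):
    total = true_pos + true_neg + fal_pos + fal_neg
    accuracy = (true_pos + true_neg) / total
    sensitivity = true_pos / (true_pos + fal_neg)   # true positive rate
    specificity = true_neg / (true_neg + fal_pos)   # true negative rate
    return accuracy, sensitivity, specificity

# Example with made-up counts, for demonstration only.
acc, sens, spec = evaluation_metrics(true_pos=50, fal_pos=2, true_neg=45, fal_neg=3)
print(f"accuracy={acc:.4f}, sensitivity={sens:.4f}, specificity={spec:.4f}")
```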

Result Analysis

In this section, the performance of the proposed BSFLA-PSO is reported on different microarray datasets: Prostate, ALL/AML, Leukemia, ADCA Lung, and CNS. The proposed approach is first evaluated with the KNN classifier for different numbers of selected genes (Tables 7.2, 7.4, 7.6, 7.8, and 7.10). It is then compared with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA in Tables 7.3, 7.5, 7.7, 7.9, and 7.11. Figures 7.3, 7.6, 7.9, 7.12, and 7.15 present the output of the three classifiers (KNN, ANN, SVM) using the three performance metrics of accuracy, sensitivity, and specificity, and Figures 7.4, 7.7, 7.10, 7.13, and 7.16 present the error rates with the different classifiers and datasets.
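A minimal sketch of such a three-classifier comparison on an already-selected gene subset is shown below; the scikit-learn estimators (with an MLP standing in for the ANN), their hyperparameters, and the random stand-in data are assumptions, not the exact configuration used in this study.

```python
# Compare KNN, SVM, and an MLP (ANN stand-in) on a reduced gene subset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Random stand-in for a dataset already reduced to 250 selected genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(72, 250))
y = rng.integers(0, 2, size=72)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "ANN": MLPClassifier(hidden_layer_sizes=(50,), max_iter=2000, random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"{name}: accuracy={accuracy:.3f}  sensitivity={sensitivity:.3f}  specificity={specificity:.3f}")
```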

Performance of Proposed BSFLA-PSO with Prostate Dataset

Table 7.2 presents the performance of BSFLA-PSO with the prostate dataset. The results show that selecting the top 170 genes is optimal. Table 7.3 compares the proposed approach with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA, and shows that BSFLA-PSO is the best-performing solution.
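The per-subset-size evaluation summarized in Table 7.2 could, in principle, be generated by a loop of the following form; the ranked gene indices, the choice of five neighbors for KNN, and the stand-in data are assumptions, so this is a sketch of the procedure rather than the authors' exact pipeline.

```python
# Evaluate KNN on the top-k ranked genes for k = 10, 20, ..., 250.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def evaluate_gene_counts(X, y, ranked_genes, counts=range(10, 260, 10)):
    """Mean 10-fold CV accuracy of KNN using the first k ranked genes, for each k."""
    results = {}
    for k in counts:
        subset = X[:, ranked_genes[:k]]
        knn = KNeighborsClassifier(n_neighbors=5)
        results[k] = cross_val_score(knn, subset, y, cv=10, scoring="accuracy").mean()
    return results

# Illustrative call with random stand-in data and a random gene ranking.
rng = np.random.default_rng(2)
X = rng.normal(size=(102, 12600))            # prostate-sized: 102 samples, 12,600 genes
y = rng.integers(0, 2, size=102)
ranking = rng.permutation(X.shape[1])[:250]  # stand-in for the 250 genes kept by BSFLA-PSO
accuracy_by_k = evaluate_gene_counts(X, y, ranking)
```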

TABLE 7.2
Classification Performance of BSFLA-PSO with the Prostate Dataset

No. of Genes   Accuracy   Sensitivity   Specificity
10             96.76      100           97.12
20             95.43      99.87         97.45
30             96.34      99.74         95.43
40             97.41      99.61         96.34
50             96.89      99.48         97.25
60             95.77      99.35         98.16
70             96.54      100           99.07
80             96.91      99.09         99.98
90             96.34      98.96         96.34
100            96.45      98.83         96.45
110            94.67      98.7          94.67
120            98.63      100           98.63
130            86.56      98.44         86.56
140            96.12      98.31         96.12
150            98.41      98.18         98.41
160            97.34      98.05         97.34
170            100        97.92         96.05
180            98.67      97.79         96.16
190            99.42      97.66         96.27
200            97.43      100           96.38
210            97.88      97.4          96.48
220            96.49      97.27         96.59
230            96.78      97.14         96.70
240            98.03      97.01         96.81
250            99.12      98.11         96.92

TABLE 7.3
Comparison of BSFLA-PSO with Basic BSFLA, Basic PSO, DE, SFLA-PSO, wPSO, and SFLA with Prostate Dataset

Feature Selection   KNN                                   ANN                                   SVM
Approach            Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity
BSFLA-PSO           100       100          100            100       100          100            100       100          100
BSFLA               100       100          100            100       100          100            100       100          100
Basic PSO           100       100          100            100       100          100            100       100          100
DE                  100       100          100            99.67     100          99.56          100       100          100
SFLA-PSO            100       100          100            100       100          100            100       100          99.68
wPSO                96.94     100          100            100       97.57        100            100       100          100
SFLA                100       100          100            100       100          100            100       100          100

FIGURE 7.3 Performance comparison with different classifiers with prostate dataset.

FIGURE 7.4 Error rate with prostate dataset.

Performance of Proposed BSFLA-PSO with Leukemia Dataset

Table 7.4 presents the performance of BSFLA-PSO with the leukemia dataset. The results show that selecting the top 130 genes is optimal. Table 7.5 compares the proposed approach with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA, and shows that BSFLA-PSO is the best-performing solution.

TABLE 7.4
Classification Performance of BSFLA-PSO with Leukemia Dataset

No. of Genes   Accuracy   Sensitivity   Specificity
10             95.78      94.92         100.00
20             94.89      97.56         100.00
30             97.45      98.14         99.45
40             96.23      98.57         100.00
50             97.07      96.89         99.67
60             97.46      96.78         99.37
70             97.85      99.45         100.00
80             98.24      98.22         100.00
90             98.63      98.30         99.89
100            99.02      98.38         100.00
110            99.41      98.46         99.89
120            99.80      98.55         99.90
130            100.00     98.63         99.91
140            96.02      98.71         99.92
150            97.56      98.80         99.93
160            99.10      98.88         99.94
170            93.45      98.96         99.96
180            95.23      99.05         99.97
190            98.45      99.13         99.98
200            98.23      99.21         99.99
210            98.67      99.29         100.00
220            97.12      99.38         100.00
230            96.90      99.46         100.00
240            96.34      99.54         100.00
250            95.79      99.63         100.00

TABLE 7.5
Comparison of BSFLA-PSO with Basic BSFLA, Basic PSO, DE, SFLA-PSO, wPSO, and SFLA with Leukemia Dataset

Feature Selection   KNN                                   ANN                                   SVM
Approach            Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity
BSFLA-PSO           100       100          100            100       100          100            100       100          100
BSFLA               100       100          99.33          99.57     100          100            97.58     96.79        100
Basic PSO           96.94     100          99.67          100       99.79        100            98.65     99.67        97.23
DE                  96.95     100          99.88          99.46     100          100            97.89     98.57        97.68
SFLA-PSO            97.67     99.45        98.57          99.68     99.79        99.79          98.67     97.78        98.73
wPSO                95.24     98.34        98.57          97.89     97.89        98.77          95.99     95.89        98.67
SFLA                97.89     97.34        99.33          98.67     99.45        97.95          95.63     98.41        97.93

FIGURE 7.6 Performance comparison with different classifiers with leukemia dataset.

FIGURE 7.7 Error rate with leukemia dataset.

Performance of Proposed BSFLA-PSO with ALL/AML Dataset

Table 7.6 presents the performance of BSFLA-PSO with the ALL/AML dataset. The results show that selecting the top ten genes is optimal. Table 7.7 compares the proposed approach with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA, and shows that BSFLA-PSO is the optimal and better solution.

TABLE 7.6
Classification Performance of BSFLA-PSO with ALL/AML Dataset

No. of Genes   Accuracy   Sensitivity   Specificity
10             99.05      99.78         98.46
20             98.34      97.23         99.67
30             99.9       98.77         99.37
40             97.66      99.44         100.00
50             99.45      100           100.00
60             98.22      97.37         99.89
70             98.30      97.75         100.00
80             98.38      96.45         99.89
90             98.46      97.39         99.90
100            98.55      97.56         99.91
110            98.63      94.56         99.92
120            98.71      97.45         99.93
130            98.80      98.67         99.94
140            98.88      98.56         99.96
150            98.96      97.9          99.99
160            98.22      97.37         99.89
170            96.91      99.54         98.55
180            96.34      98.61         98.63
190            96.45      99.74         98.79
200            94.67      99.64         98.85
210            98.63      97.78         98.84
220            86.56      99.56         99.83
230            96.12      97.34         97.89
240            98.41      98.66         97.99
250            97.34      95.56         98.09

TABLE 7.7
Comparison of BSFLA-PSO with Basic BSFLA, Basic PSO, DE, SFLA-PSO, wPSO, and SFLA with ALL/AML Dataset

Feature Selection   KNN                                   ANN                                   SVM
Approach            Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity
BSFLA-PSO           100       100          100            100       100          100            100       100          100
BSFLA               100       100          100            100       100          100            100       100          100
Basic PSO           97.67     99.57        98.68          98.56     100          97.37          95.54     99.67        99.14
DE                  98.56     97.57        97.26          96.56     96.68        98.56          100       100          100
SFLA-PSO            99.35     98.67        98.67          97.57     97.56        99.36          98.45     99.67        99.7
wPSO                98.14     99.78        98.66          96.89     96.92        97.78          97.78     99.45        99.45
SFLA                99.77     99.45        97.67          97.79     97.98        98.35          97.46     99.78        99.48

FIGURE 7.9 Performance comparison with different classifiers with ALL/AML Dataset.

FIGURE 7.10 Error rate with different classifiers with the ALL/AML dataset.

Performance of Proposed BSFLA-PSO with ADCA Lung Dataset

Table 7.8 presents the performance of BSFLA-PSO with the ADCA lung dataset. The results show that selecting the top 100 genes is optimal. Table 7.9 compares the proposed approach with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA, and shows that BSFLA-PSO is the best-performing solution.

TABLE 7.8
Classification Performance of BSFLA-PSO with ADCA Lung Dataset

No. of Genes   Accuracy   Sensitivity   Specificity
10             97.89      97.27         96.93
20             97.66      97.45         97.29
30             98.45      96.92         96.34
40             98.56      96.86         97.95
50             98.84      96.99         98.26
60             99.12      97.41         98.48
70             99.4       96.45         97.23
80             99.68      94.67         96.49
90             97.32      98.63         97.78
100            99.95      98.99         99.56
110            98.79      96.12         98.45
120            97.56      98.41         98.55
130            98.68      97.88         97.57
140            97.23      95.89         98.67
150            98.68      98.56         96.89
160            95.87      97.23         97.54
170            98.88      97.77         98.14
180            97.67      86.56         98.57
190            97.88      97.12         98.99
200            97.32      96.7          96.78
210            97.76      97.43         98.44
220            97.99      96.64         97.53
230            98.37      98.11         97.14
240            99.67      96.99         98.33
250            98.69      96.33         97.11

TABLE 7.9
Comparison of BSFLA-PSO with Basic BSFLA, Basic PSO, DE, SFLA-PSO, wPSO, and SFLA with ADCA Lung Dataset

Feature Selection   KNN                                   ANN                                   SVM
Approach            Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity
BSFLA-PSO           100       100          100            100       100          99.66          100       100          100
BSFLA               97.66     99.67        99.33          99.57     100          100            97.58     96.79        100
Basic PSO           99.78     99.23        99.67          100       99.79        100            98.65     99.67        97.23
DE                  98.67     98.77        99.88          99.46     100          100            97.89     98.57        97.68
SFLA-PSO            98.94     99.24        99.57          98.68     99.79        99.79          98.67     97.78        98.73
wPSO                99.78     97.02        98.57          97.89     97.89        98.77          95.99     95.89        98.67
SFLA                99.88     99.67        99.33          98.67     99.45        97.95          95.63     98.41        97.93

FIGURE 7.12 Performance comparison with different classifiers with ADCA lung dataset.

FIGURE 7.13 Error rate for ADCA lung dataset.

Performance of Proposed BSFLA-PSO with CNS Dataset

Table 7.10 presents the performance of BSFLA-PSO with the CNS dataset. The results show that selecting the top 90 genes is optimal. Table 7.11 compares the proposed approach with basic BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA, and shows that BSFLA-PSO is the best-performing solution.

TABLE 7.10
Classification Performance of BSFLA-PSO with CNS Dataset

No. of Genes   Accuracy   Sensitivity   Specificity
10             83.21      87.68         84.56
20             82.12      85.99         87.54
30             86.41      87.45         85.89
40             81.42      86.49         86.98
50             83.11      85.78         87.01
60             83.79      85.99         83.43
70             84.59      78.89         87.12
80             82.89      77.81         86.23
90             88.98      78.99         82.67
100            81.78      76.84         81.45
110            83.56      82.22         81.34
120            82.67      84.12         82.12
130            81.45      83.16         86.41
140            81.34      83.56         81.42
150            86.78      84.89         84.56
160            86.49      85.84         87.97
170            84.78      84.56         85.99
180            85.89      85.79         78.89
190            86.98      84.34         77.81
200            87.01      81.78         78.99
210            83.43      83.65         76.84
220            87.12      81.98         83.16
230            86.23      77.82         83.56
240            81.69      79.61         84.89
250            86.49      81.55         85.84

TABLE 7.11
Comparison of BSFLA-PSO with Basic BSFLA, Basic PSO, DE, SFLA-PSO, wPSO, and SFLA with CNS Dataset

Feature Selection   KNN                                   ANN                                   SVM
Approach            Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity    Accuracy  Sensitivity  Specificity
BSFLA-PSO           91.89     87.19        89.11          90.45     84.53        91.67          89.67     85.11        89.56
BSFLA               88.89     85.33        87.11          88.33     81.34        87.46          87.78     78.82        83.79
Basic PSO           87.82     82.32        83.51          86.11     83.39        86.11          87.71     78.72        83.35
DE                  86.89     77.82        81.83          86.13     83.75        86.13          86.13     83.79        86.97
SFLA-PSO            88.57     85.91        78.67          85.11     82.67        86.01          87.76     83.88        87.11
wPSO                84.95     81.41        81.89          78.31     80.46        85.58          87.67     82.93        78.94
SFLA                85.91     79.45        79.91          78.56     81.66        85.78          79.67     83.67        81.22

FIGURE 7.15 Performance comparison with different classifiers with CNS dataset.

FIGURE 7.16 Comparison graph of error rate with different classifiers with CNS dataset.

Conclusion

In this chapter, we presented BSFLA-PSO, a novel hybrid algorithm that combines the binary shuffled frog-leaping algorithm with PSO. In the primary stage, the best 250 genes were identified; BSFLA-PSO was then applied, and classification was performed with KNN. A closer analysis of the performance shows that BSFLA-PSO outperforms the other learning approaches, namely BSFLA, basic PSO, DE, SFLA-PSO, wPSO, and SFLA. On the prostate dataset, the feature selection approaches BSFLA-PSO, BSFLA, basic PSO, and SFLA achieve 100% accuracy; DE achieves 100% with KNN and SVM and 99.67% with ANN. In the case of the leukemia dataset, BSFLA-PSO reaches 100% with all classifiers, whereas BSFLA reaches 100% with KNN only and the other approaches perform considerably worse. With ALL/AML, BSFLA-PSO and BSFLA perform at 100% with all classifiers, whereas only DE reaches 100% with SVM. On the ADCA lung dataset, BSFLA-PSO achieves 100% accuracy with all classifiers, and on the CNS dataset it again outperforms all other feature selection approaches, achieving accuracies of 91.89%, 90.45%, and 89.67% with KNN, ANN, and SVM, respectively. The datasets considered for this experiment are binary-class in nature, so in the future we will extend this work to multiclass high-dimensional classification problems.

 