Advanced Ensemble Machine Learning Model for Balanced BioAssays


Introduction

Machine Learning (ML) is a technique by which a model learns from a training dataset. It is one of the most common forms of Artificial Intelligence (AI); according to a report by the global research and advisory firm Gartner, AI is expected to generate close to 2.3 million jobs by the year 2020 [1]. ML takes a statistical approach: it applies algorithms to a training dataset in order to predict results for a testing dataset. ML has recently made considerable progress in the medical field. For example, Google has worked on an ML algorithm to identify cancer and tumor patients [2], and many universities and other organizations are using deep learning algorithms to classify skin cancer [3, 4].

The International Research View report stated in 2018 that the global medical discovery market was estimated at nearly $0.7 billion in 2016 and was expected to grow at a compound annual growth rate of 12.6% in this decade. AI and ML are encouraging many researchers to find inexpensive solutions in numerous fields, including medical discovery. By the end of this decade, AI is expected to be capable of saving $0.07 billion in the medical field. This paper works with a biopsy dataset consisting of 144 attributes. Different classifiers are applied to the dataset, the best 10 are chosen based on their accuracy, and the chosen classifiers are then combined into an ensemble using Stacking and Voting to attain higher accuracy. The ensemble model is created in WEKA [5, 6].
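To make the Voting idea concrete, here is a minimal sketch of plain majority voting over the predictions of several base classifiers. It is only an illustration, not the WEKA configuration used in this work: the class labels, the three base classifiers, and their predictions are all hypothetical, and Stacking (which trains a further model on the base classifiers' outputs) is not shown.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties are broken in favor of the label that appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical predictions of three base classifiers for four samples.
base_predictions = [
    ["positive", "negative", "negative", "positive"],  # classifier A
    ["positive", "positive", "negative", "negative"],  # classifier B
    ["negative", "negative", "negative", "positive"],  # classifier C
]

# zip(*...) regroups the predictions sample by sample.
ensemble = [majority_vote(list(votes)) for votes in zip(*base_predictions)]
print(ensemble)
```

Each sample receives the label that at least two of the three hypothetical classifiers agree on, so a single classifier's mistake on a sample can be outvoted by the others.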

A biopsy is a procedure to remove a piece of tissue or a sample of cells from the body so that it can be analyzed in a laboratory. If an individual is experiencing signs and symptoms, or if the doctor has identified an area of concern, a biopsy may be ordered to determine whether cancer is present or whether the cause is another condition [7]. While imaging tests, such as X-rays, are useful in detecting masses or areas of abnormality, they alone cannot differentiate cancerous cells from noncancerous cells. For the majority of cancers, the only way to make a definitive diagnosis is to perform a biopsy to collect cells for closer examination.

Every year, pathologists diagnose 14 million new patients with cancer around the world. These millions of individuals will face years of uncertainty. Pathologists have been making cancer diagnoses and prognoses for many years. Most pathologists have a 96-98% success rate in diagnosing cancer; at that part of the task, they are highly capable. The problem lies with the other part, the prognosis: the part of a biopsy that comes after cancer has been diagnosed and that predicts the course of the illness. It is time for the next step to be taken in pathology [8]. ML has key advantages over pathologists. Firstly, machines work much faster than humans. A biopsy typically takes a pathologist ten days, whereas a computer can process thousands of biopsies in a matter of seconds.

Machines can also do something that humans are not good at: they can repeat themselves thousands of times without becoming exhausted. After each iteration, the machine repeats the process in order to do it better. Humans can work hard to perfect a result, but they still cannot match the computational power of a computer.

Related Work

In this paper, a supervised biopsy dataset is first taken from the Internet, and its classes are balanced using the Class Balancer in WEKA. The filter weka.filters.supervised.instance.ClassBalancer is a simple filter that assigns instance weights so that every class of instances has the same total weight while the total sum of instance weights in the dataset remains the same [9].
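The reweighting described above can be sketched in a few lines. This is a plain-Python illustration of the idea, not WEKA's implementation; it assumes every instance starts with a weight of 1, and the labels are hypothetical.

```python
from collections import Counter

def class_balanced_weights(labels):
    """Weight each instance so that every class has the same total weight,
    while the sum of all weights stays equal to the number of instances."""
    counts = Counter(labels)
    total = len(labels)               # original total weight (1 per instance)
    per_class = total / len(counts)   # share of the total given to each class
    return [per_class / counts[label] for label in labels]

# Hypothetical imbalanced labels: one "pos" instance and three "neg" instances.
labels = ["pos", "neg", "neg", "neg"]
weights = class_balanced_weights(labels)
print(weights)
```

With four instances and two classes, each class receives a total weight of 2: the single "pos" instance carries all of it, while the three "neg" instances share it equally, and the overall sum remains 4.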

When weka.classifiers.meta.FilteredClassifier is used with this filter and the base classifier does not implement WeightedInstancesHandler, the weights are in turn used to form a probability distribution for sampling with replacement. This yields a training set in which each class is (approximately) balanced. After the Class Balancer is applied, different classifiers are applied to the dataset, and graphs are prepared for their Accuracy, True Positive Rate, False Positive Rate, and ROC [10].
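The sampling step can be illustrated as follows. This sketch only mirrors the idea of treating instance weights as a probability distribution for sampling with replacement; it is not WEKA code, and the data, the weights, the sample size, and the helper name resample_by_weight are all hypothetical.

```python
import random

def resample_by_weight(instances, weights, k=None, seed=0):
    """Draw k instances with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    if k is None:
        k = len(instances)
    return rng.choices(instances, weights=weights, k=k)

# Hypothetical imbalanced data: 1 "pos" instance and 9 "neg" instances.
labels = ["pos"] + ["neg"] * 9
# Class-balanced weights: each class has a total weight of 5.
weights = [5.0] + [5.0 / 9] * 9

# A large sample is drawn here only to make the class shares easy to see.
sample = resample_by_weight(labels, weights, k=10_000)
pos_share = sample.count("pos") / len(sample)
print(f"share of 'pos' in the resample: {pos_share:.3f}")
```

Because both classes carry the same total weight, each class is drawn with probability one half, so the resampled set is only approximately balanced rather than exactly balanced.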

The classifiers used in this work are Random Forest, Random Tree, REPTree, Decision Stump, Decision Table, PART, ZeroR, JRip, Input Mapped Classifier, AdaBoostM1, Bagging, LogitBoost, Random Subspace, Multi Class Classifier Updateable, Randomizable Filtered Classifier, IBk, LWL, Logistic, Multilayer Perceptron, SGDText, Simple Logistic, SMO, Bayes Net, Naive Bayes, Naive Bayes Multinomial Text, Naive Bayes Updateable, Filtered Classifier, Multi Class Classifier, Random Committee, and J48. The best 10 classifiers were chosen based on their Accuracy, True Positive Rate, False Positive Rate, and ROC. Table 1.1 below shows the performance of the chosen classifiers. With the proposed advanced ensemble model, samples that previously had poor outcomes now achieve high balanced accuracy [11].
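As a reminder of how these measures relate to one another, the sketch below computes Accuracy, True Positive Rate, False Positive Rate, and balanced accuracy from the four cells of a confusion matrix. It assumes a two-class problem, and the counts are made up for illustration; they are not results from this work. The area under the ROC curve is not shown, because it requires the classifier's scores rather than just the counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common evaluation measures from confusion-matrix counts."""
    tpr = tp / (tp + fn)                 # true positive rate (sensitivity)
    fpr = fp / (fp + tn)                 # false positive rate
    tnr = tn / (fp + tn)                 # true negative rate (specificity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (tpr + tnr) / 2  # mean of the per-class recalls
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "fpr": fpr,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts for an imbalanced test set (10 positives, 100 negatives).
metrics = binary_metrics(tp=8, fp=10, tn=90, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, plain accuracy is dominated by the larger class, whereas balanced accuracy gives each class equal influence, which is why the two values differ in this example.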

 