Breast Cancer and Machine Learning: Interactive Breast Cancer Prediction Using Naive Bayes Algorithm
- Introduction to Breast Cancer
- Types of Breast Cancer
- Breast Cancer Symptoms
- Treating Breast Cancer
- Introduction to Machine Learning Algorithm
- Classification of Machine Learning
- Grouping Machine Learning Based on the Results Obtained
- Implementing Machine Learning
- Risk Factors of Breast Cancer
- Risk Factors
- TNM Staging System
- T Category
- N Category
- M Category
Introduction to Breast Cancer
Cancer arises when cells in the human body develop abnormally. The human body is composed of tiny building blocks called cells. Cells normally form when the body needs them and die when they are no longer needed. Cancer cells are abnormal cells that build up in the body. In most cancer types, these abnormal cells grow without limit and form a lump called a tumor.
In breast cancer, the abnormal cells usually form in either the lobules or the ducts of the breast. Lobules are the glands that produce milk, whereas ducts are the channels that carry milk to the nipple. Cancer can also occur in the fatty tissue or fibrous connective tissue of the breast. Uncontrolled cancer cells can travel to the lymph nodes under the arms, and from these lymph nodes they can move to other parts of the body. When breast cancer cells move to other parts of the body through the blood vessels or lymph vessels, the process is called metastasis.
Types of Breast Cancer
Breast cancer is classified into two types: invasive and noninvasive. Invasive cancer spreads to nearby tissues. Noninvasive breast cancer does not spread beyond the milk ducts or lobules of the breast. Cancer that begins in the ducts or lobules is termed ductal carcinoma or lobular carcinoma, respectively.
- • Ductal carcinoma. Most breast cancers are of this type, which begins in the cells lining the milk ducts.
- • Ductal carcinoma in situ (DCIS). Cancer that is confined to the duct.
- • Invasive or infiltrating ductal carcinoma. Cancer that has spread beyond the duct.
- • Lobular carcinoma. Cancer that begins in the lobules.
- • Lobular carcinoma in situ (LCIS). The condition in which abnormal cells are present only in the lobules is called LCIS, which is not considered cancer. However, LCIS increases the risk of developing invasive breast cancer.
- • Invasive lobular carcinoma. Lobular carcinoma that has spread beyond the lobules.
Breast Cancer Symptoms
Breast cancer has several symptoms; one of the most important is a lump or an area of thickened breast tissue.
Some of the common symptoms are as follows:
- • A breast lump or thickening that feels different from the surrounding tissue.
- • A change in the appearance of a breast, such as its size or shape.
- • Dimpling of the breast.
- • Changes in the nipple.
- • Peeling, scaling, or crusting of the skin around the nipple.
- • Redness or another change in the color of the skin on the breast.
Factors that increase the risk of breast cancer:
- • Older age
- • A family history of breast cancer
- • A previous history of breast cancer
- • Overweight or obesity
- • Excess intake of alcohol
Treating Breast Cancer
Cancers that are diagnosed at an early stage can be treated.
Breast cancer treatment includes a combination of:
- • Surgery
- • Chemotherapy
- • Radiotherapy
Surgery is usually the first type of treatment performed, followed by chemotherapy or radiotherapy or, in some cases, hormone or biological treatments.
Introduction to Machine Learning Algorithm
Machine learning is a subfield of artificial intelligence (AI). Its main goal is to understand the structure of data and to fit that data into models that can be analyzed and put to practical use.
Even though machine learning belongs to the field of computer science, it differs from traditional computing. In traditional computing, algorithms are sets of explicitly programmed instructions that analyze the data to find a solution. In machine learning, by contrast, the system is trained on a set of inputs, and learning algorithms are applied to produce a set of outputs. Machine learning thus automates decision making in various fields (Fig. 1.1).
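The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the single "size" feature, the hand-written 20.0 cutoff, and the six labeled samples are all made up for the example and carry no clinical meaning. The midpoint-of-class-means rule is simply one of the simplest possible ways to derive a decision rule from data.

```python
from statistics import mean


def rule_based(size, cutoff=20.0):
    """Traditional approach: the programmer fixes the cutoff by hand."""
    return 1 if size > cutoff else 0


def learn_threshold(sizes, labels):
    """Learning approach: derive the cutoff from labeled examples.

    Uses the midpoint between the mean of each class, a deliberately
    simple rule chosen only for illustration.
    """
    positives = [s for s, y in zip(sizes, labels) if y == 1]
    negatives = [s for s, y in zip(sizes, labels) if y == 0]
    return (mean(positives) + mean(negatives)) / 2


def learned_rule(size, threshold):
    """Apply the cutoff that was learned from the data."""
    return 1 if size > threshold else 0


# Hypothetical toy data: one feature value and a 0/1 label per sample.
sizes = [8.0, 12.0, 10.0, 30.0, 26.0, 34.0]
labels = [0, 0, 0, 1, 1, 1]

threshold = learn_threshold(sizes, labels)
predictions = [learned_rule(s, threshold) for s in sizes]
```

In the first function the decision rule never changes unless a programmer edits it; in the second, feeding in different training data yields a different rule without touching the code.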
In the field of AI, machine learning is described as giving “computers the ability to learn without being explicitly programmed.”
In the field of data analytics, researchers and data scientists use machine learning to develop complex models and algorithms that predict hidden facts. This process is called predictive analysis. It takes historical relationships and trends in the data into consideration.
Classification of Machine Learning
Depending on the nature of the learning, machine learning is classified as:
- • Supervised
- • Unsupervised
- • Reinforcement
- • Semi-supervised
Supervised learning: The algorithm learns from sample data with associated target responses and then makes predictions for new examples. The target responses can be numeric values or string labels.
Unsupervised learning: The algorithm learns from sample data that has no associated target responses and tries to determine the patterns in the data on its own. This type of analysis can produce new features that serve as input to supervised learning.
Reinforcement learning: This type relies on trial and error. The algorithm is given examples accompanied by positive or negative feedback, depending on the solution it proposes.
Semi-supervised learning: In this type, the training data is incomplete, with the target output missing for some of the samples.
Grouping Machine Learning Based on the Results Obtained
Machine learning systems can also be grouped based on the result obtained after the analysis.
- 1. Classification: This type is usually carried out in a supervised way; the model learns from the data that is fed into it and, based on this learning, assigns newly observed data to classes.
- 2. Regression: This is also a supervised problem, in which the outputs are continuous rather than discrete.
- 3. Clustering: This is an unsupervised approach in which a set of inputs is divided into groups that are not known beforehand.
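As an illustration of supervised classification, the following is a minimal sketch of a Gaussian Naive Bayes classifier written with the Python standard library only. It is an assumption-laden example rather than the chapter's actual implementation: the Gaussian variant, the function names `fit_gaussian_nb` and `predict`, the two unnamed features, and the six toy samples are all hypothetical. The model stores a prior and a per-feature mean and variance for each class, then picks the class with the highest log posterior under the "naive" assumption that features are independent given the class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density for this feature value.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical toy data: two numeric features per sample, made up for the sketch.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

model = fit_gaussian_nb(X, y)
new_label = predict(model, [1.1, 2.0])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together. A real study would use measured features and hold out part of the data to evaluate the model.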
Implementing Machine Learning
Prediction: Machine learning is used as a prediction system. In healthcare, for example, it is used to predict disease, such as types of breast cancer.
Image recognition: Machine learning can be used to recognize images. For example, X-ray images can be analyzed to help diagnose a problem.
Speech recognition: Voice recognition has become important for purposes such as security and search.
Financial industry and trading: Companies use machine learning for fraud investigation and credit card security.
Risk Factors of Breast Cancer
Breast cancer is one of the most common cancers among women today. The disease occurs when abnormal cells grow and accumulate at a fast rate, forming lumps in the body. The abnormal cells travel from the breast to the lymph nodes and later to other parts of the body. This spread is called metastasis. Possible causes of the disease include lifestyle, diet, and environmental factors.
Factors associated with a higher chance of breast cancer include:
- • Being female. According to research, women are far more commonly affected than men.
- • Age. The risk of breast cancer increases with age.
- • A medical history of breast conditions. Breast conditions such as LCIS or atypical hyperplasia of the breast increase the risk.
- • A personal history of breast cancer. A person who has had cancer in one breast has an increased chance of developing it in the other breast.
- • A family history of breast cancer. Heredity is considered to be one of the risk factors for breast cancer.
- • Inherited genes. Certain gene mutations passed from parent to child are another risk factor. The best known are in the genes BRCA1 and BRCA2. These mutations increase the chances of breast cancer and other cancers.
- • Radiation exposure. Radiation treatment to the chest at a young age is a risk factor.
- • Obesity. Obesity increases the chances of the disease.
- • Early onset of menstruation. A woman who began her periods before the age of 12 has a greater chance of breast cancer.
- • Late menopause. Beginning menopause at an older age increases the chances of breast cancer.
- • Giving birth to a first child at an older age. A woman who has her first child after the age of 30 is at greater risk of breast cancer.
- • Never having been pregnant. Women who have never been pregnant have a greater chance of breast cancer than women who have had one or more pregnancies.
- • Postmenopausal hormone therapy. A woman who takes hormone therapy combining estrogen and progesterone to treat the symptoms of menopause has an increased risk of developing breast cancer.
- • Alcohol intake. Alcohol intake increases the chances of breast cancer.
TNM Staging System
The method healthcare professionals most commonly use to describe the extent of breast cancer is the tumor, nodes, metastasis (TNM) system. Specialists use the results of various examinations to assess the following factors:
- • Tumor (T): T represents the size of the primary tumor and where it is located.
- • Node (N): N represents the extent to which the tumor has spread to the lymph nodes.
- • Metastasis (M): M represents the extent to which the cancer has spread to other parts of the body.
The TNM analysis tells the cancer patient their condition and how far the disease has progressed. There are five stages: Stage 0 (zero), which is noninvasive DCIS, and stages I through IV (1 through 4), which are used for invasive breast cancer. This analysis helps the specialist plan the best treatment for the patient.
The T category (T0, Tis, T1, T2, T3, or T4) represents the size of the tumor and whether it has extended to the skin of the breast or to the chest wall below the breast. Higher T numbers indicate a larger tumor and/or wider spread to nearby tissues.
TX: The primary tumor cannot be assessed.
T0 (T plus zero): There is no evidence of cancer in the breast.
Tis: Cancer is present only inside the ducts or lobules of the breast and has not spread to the surrounding breast tissue. This case is termed carcinoma in situ.
T1: The tumor measures 20 millimeters (mm) or less, which is a little less than an inch. This category is further divided into four substages based on tumor size:
- • T1mi is a tumor that is 1 mm or smaller.
- • T1a is a tumor that is larger than 1 mm but no larger than 5 mm.
- • T1b is a tumor that is larger than 5 mm but no larger than 10 mm.
- • T1c is a tumor that is larger than 10 mm but no larger than 20 mm.
T2: The tumor is larger than 20 mm but no larger than 50 mm.
T3: The tumor is larger than 50 mm.
T4: The tumor, regardless of its size, has grown into the chest wall and/or the skin, or is inflammatory breast cancer. This stage is subdivided as follows:
- • T4a implies the tumor has grown into the chest wall.
- • T4b implies the tumor has grown into the skin.
- • T4c implies the tumor has grown into both the chest wall and the skin.
- • T4d is inflammatory breast cancer.
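The size-based part of the T category lends itself to a small lookup function. The sketch below is illustrative only and is not a clinical tool. The function name `t_category` and its parameters are hypothetical. It assumes the boundaries T1mi ≤ 1 mm, T1a ≤ 5 mm, T1b ≤ 10 mm, T1c ≤ 20 mm, T2 ≤ 50 mm, and T3 > 50 mm, and it assumes that any T4 finding takes precedence over size. TX, T0, and Tis are deliberately left out because they depend on findings other than a measured size.

```python
def t_category(size_mm, chest_wall=False, skin=False, inflammatory=False):
    """Return a T category from tumor size and extension flags.

    Simplified sketch: TX, T0, and Tis are not handled.
    """
    if size_mm <= 0:
        raise ValueError("size_mm must be a positive measurement")

    # T4 depends on extension, not size, so it is checked first.
    if inflammatory:
        return "T4d"
    if chest_wall and skin:
        return "T4c"
    if skin:
        return "T4b"
    if chest_wall:
        return "T4a"

    # Upper bounds in millimeters, checked from smallest to largest.
    bounds = [(1, "T1mi"), (5, "T1a"), (10, "T1b"), (20, "T1c"), (50, "T2")]
    for upper, label in bounds:
        if size_mm <= upper:
            return label
    return "T3"


examples = {size: t_category(size) for size in (0.5, 5, 15, 35, 60)}
```

Keeping the thresholds in one ordered list makes the boundaries easy to audit against the staging definitions.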
The N category (N0, N1, N2, or N3) indicates whether the cancer has spread to the lymph nodes near the breast and, if so, how many lymph nodes are affected. The higher the number, the more lymph nodes are affected. If the lymph nodes cannot be evaluated, the N category is NX.
NX: The lymph nodes cannot be evaluated.
N0: A case falls into either of two categories:
- • No cancer is present in the lymph nodes.
- • Only areas of cancer smaller than 0.2 mm are present in the lymph nodes.
N1: The cancer has spread to one to three axillary lymph nodes and/or to the internal mammary lymph nodes.
N2: The cancer has spread to four to nine axillary lymph nodes, or it has spread to the internal mammary lymph nodes.
N3: The cancer has spread to ten or more axillary lymph nodes, or it has spread to the lymph nodes positioned beneath the clavicle, or collarbone. It may also have spread to the internal mammary lymph nodes. Cancer that has spread to the lymph nodes above the clavicle, termed the supraclavicular lymph nodes, is also classified as N3.
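The node counts in the N category can be expressed the same way. The following is a rough sketch under simplifying assumptions, not a clinical tool: the function name `n_category` and its parameters are hypothetical, only the number of affected axillary (underarm) lymph nodes and a single flag for nodes around the clavicle are considered, and the internal mammary node criteria, the 0.2 mm rule for N0, and NX are left out.

```python
def n_category(axillary_nodes, clavicular_nodes=False):
    """Return an N category from the count of affected axillary nodes.

    Simplified sketch: internal mammary nodes and NX are not handled.
    """
    if axillary_nodes < 0:
        raise ValueError("axillary_nodes cannot be negative")

    # Nodes above or below the clavicle place the case in N3 on their own.
    if clavicular_nodes or axillary_nodes >= 10:
        return "N3"
    if axillary_nodes >= 4:
        return "N2"
    if axillary_nodes >= 1:
        return "N1"
    return "N0"


node_examples = [n_category(count) for count in (0, 2, 6, 12)]
```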
The M category in the TNM staging system describes whether the cancer has spread to other parts of the body, which is called distant metastasis.
MX: Distant spread of the cancer cannot be evaluated.
M0: The cancer has not spread to distant sites.
M0 (i+): There is no clinical or radiographic evidence of distant metastases, but microscopic examination finds tumor cells in the blood, bone marrow, or other lymph nodes, in deposits no larger than 0.2 mm.
M1: There is clinical confirmation of metastasis to another part of the body, meaning that abnormal cells have developed or spread in other organs.