Sarfaraz Ahmed Mohammed and Anca Ralescu
Adv. Artif. Intell. Mach. Learn., 3 (3):1494-1525
Sarfaraz Ahmed Mohammed : College of Engineering and Applied Science
Anca Ralescu : Department of Computer Science, College of Engineering and Applied Science
DOI: 10.54364/AAIML.2023.1187
Article History: Received on: 22-Jul-23, Accepted on: 23-Sep-23, Published on: 30-Sep-23
Corresponding Author: Sarfaraz Ahmed Mohammed
Email: mohammsm@mail.uc.edu
Citation: Sarfaraz Ahmed Mohammed, Senuka Abeysinghe, Anca Ralescu (2023). Feature Selection and Comparative Analysis of Breast Cancer Prediction using Clinical Data and Histopathological Whole Slide Images. Adv. Artif. Intell. Mach. Learn., 3 (3):1494-1525
Breast carcinoma is a common cancer among women, with invasive ductal carcinoma
and lobular carcinoma being the two most frequent types. Early detection is
critical to prevent the disease from progressing. Diagnostic tests include
mammography, ultrasound, MRI, and biopsy. Machine learning algorithms can play
a key role in analyzing complex clinical datasets to predict disease outcomes.
This study
uses machine learning and deep learning techniques to analyze publicly
available clinical and medical image data. For clinical data, Principal
Component Analysis (PCA) and Particle Swarm Optimization (PSO) are applied to
the Wisconsin Diagnostic Breast Cancer (WDBC) dataset for feature selection,
and the performance of each method in distinguishing between benign and
malignant tumors is evaluated. The results show that the Random Forest (RF)
classifier outperforms the other classification algorithms under both PSO and
PCA feature selection, achieving predictive accuracies of 95.7% and 97.2%,
respectively.
The first part of the paper contains a comprehensive analysis of the two
feature selection methods on clinical data to optimize predictive performance. The
second part of the paper is concerned with image data. Although
histopathological Whole Slide Imaging (WSI) has been validated for a variety of
pathological applications for over two decades, manual detection of cancerous
tumors remains challenging and prone to human error. Deep learning models have
the potential to aid pathologists in detecting cancer subtypes, and current
image analysis techniques are increasingly able to identify the underlying
genomic data and cancer-causing mutations. The second part of the paper
therefore focuses on feature extraction using a deep convolutional neural
network (U-Net) trained on WSIs from The Cancer Genome Atlas (TCGA) to
accurately classify and extract relevant features. The focus is
on feature extraction, nuclei-based instance segmentation, H&E-stained
image extraction, and quantifying intensity information for a given WSI to
classify the disease type. A comprehensive analysis of feature selection
methods is presented for both clinical and medical image data.
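
As a rough illustration of the clinical-data pipeline summarized above, the following minimal sketch chains PCA with a Random Forest classifier on the WDBC data as bundled with scikit-learn. It is not the authors' implementation: the function name, the number of components, the train/test split, the number of trees, and the random seed are all assumptions chosen for illustration, so the resulting accuracy should not be expected to match the figures reported in the paper. PSO-based selection is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pca_rf_accuracy(n_components=10, seed=0):
    """Hypothetical helper: held-out accuracy of a PCA + Random Forest pipeline."""
    # scikit-learn ships the WDBC data (569 samples, 30 features).
    X, y = load_breast_cancer(return_X_y=True)

    # Assumed 70/30 stratified split; the paper's protocol may differ.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Standardize first, because PCA is sensitive to feature scale.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),  # assumed number of components
        RandomForestClassifier(n_estimators=200, random_state=seed),
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


accuracy = pca_rf_accuracy()
print(f"Held-out accuracy: {accuracy:.3f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline ensures that the projection is fitted on the training split only, which avoids leaking test information into the reduced feature space.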