Ayad Hameed Mousa, Elham Mohammed Thabit A. Alsaadi, Mohammed Abdallazez Mohammed, Hussam Mezher Merdas and Shahad Dakhil Khalaf
Adv. Artif. Intell. Mach. Learn., XX (XX):-
1. Ayad Hameed Mousa: University of Kerbala
2. Elham Mohammed Thabit A. Alsaadi: Department of Information Technology, College of Computer Science and Information Technology, University of Kerbala, Karbala, Iraq
3. Mohammed Abdallazez Mohammed: Computer Science Department, College of Computer Science and Information Technology, University of Kerbala, Karbala, Iraq
4. Hussam Mezher Merdas: Department of Artificial Intelligence Engineering, Faculty of Engineering and Information Technology, Al-Zahraa University for Women, Karbala, Iraq
5. Shahad Dakhil Khalaf: College of Pharmacy, University of Kerbala, Iraq
DOI: 10.54364/AAIML.2026.62296
Article History: Received on: 25-Jan-26, Accepted on: 11-Apr-26, Published on: 18-Apr-26
Corresponding Author: Ayad Hameed Mousa
Email: ayad.h@uokerbala.edu.iq
Citation: Ayad Hameed Mousa, et al. Evaluating the Impact of SMOTE and SHAP on Machine Learning Classifiers: Enhancing Predictive Performance through Imbalance Mitigation and Interpretability. Advances in Artificial Intelligence and Machine Learning. 2026. (Ahead of Print). https://dx.doi.org/10.54364/AAIML.2026.62296
Class imbalance is a common problem in medical datasets: when healthy cases far outnumber Thalassemia cases, machine learning (ML) prediction accuracy suffers. This paper evaluates how effectively SMOTE and SHAP, used together, address imbalance and improve model interpretability in ML-based Thalassemia diagnosis. Five algorithms (SVM, Logistic Regression, Decision Tree, Random Forest, and XGBoost) were trained and tested on an imbalanced patient dataset. SMOTE oversampled the minority class, balancing the dataset and reducing bias towards the majority class. SHAP identified and analyzed the most informative diagnostic features, substantially improving interpretability. The evaluation showed that SMOTE markedly improved minority-class performance: with XGBoost, the F1-score increased by 60% (0.57 to 0.86) and AUC-ROC rose by 21% (0.74 to 0.95). Overall, the average F1-score and AUC-ROC increased by approximately 41% and 21%, respectively. SHAP highlighted clinically relevant features, such as hemoglobin concentration and total RBC count, consistent with medical knowledge. Physicians reviewed the SHAP output and confirmed that it agreed with diagnostic procedures. This two-pronged approach addresses data imbalance while enhancing interpretability, increasing the reliability and transparency of the model in clinical use. Our open-source code is available to support the implementation of widespread thalassemia screening initiatives.
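To make the oversampling step concrete, the following is a minimal sketch of the idea behind SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustration only, not the authors' code and not the implementation found in any particular library. The function name `smote_sketch`, its parameters, and the toy feature values (loosely standing in for hemoglobin concentration and RBC count) are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Simplified illustration of the SMOTE idea (hypothetical helper).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, sorted by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data, invented for illustration: two features per patient.
healthy = [[14.1, 4.9], [13.8, 5.0], [15.0, 5.2], [14.4, 4.7], [13.5, 4.8], [14.9, 5.1]]
thalassemia = [[10.2, 5.9], [9.8, 6.1], [10.9, 5.7]]

# Generate enough synthetic minority samples to match the majority class size.
new_cases = smote_sketch(thalassemia, n_new=len(healthy) - len(thalassemia), k=2)
balanced_minority = thalassemia + new_cases
```

Because every synthetic point is an interpolation between two real minority samples, it stays within the range of values already observed for that class. In practice, oversampling of this kind would typically be applied to the training split only, so that the test set still reflects the original class distribution.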