Ines Rahmany
Adv. Artif. Intell. Mach. Learn., 2 (4):516-532
Ines Rahmany: Faculty of Science and Techniques of Sidi Bouzid, University of Kairouan
DOI: 10.54364/AAIML.2022.1135
Article History: Received on: 19-Oct-22, Accepted on: 01-Dec-22, Published on: 10-Dec-22
Corresponding Author: Ines Rahmany
Email: ines.rahmani@fstsbz.u-kairouan.tn
Citation: Ines Rahmany (2022). Missing Data Recovery in the e-health context based on Machine Learning models. Adv. Artif. Intell. Mach. Learn., 2 (4):516-532
Diabetes mellitus is a set of metabolic illnesses characterized by abnormally high blood sugar levels. In 2017, 8.8% of the world’s population
had diabetes. By 2045, it is expected that this percentage will have
risen to approximately 10%. Missing data, a prevalent problem even
in a well-designed and controlled study, can have a major impact on
the conclusions that can be derived from the available data. Missing
data may decrease a study’s statistical validity and lead to erroneous
results due to distorted estimations. In this study, we hypothesize
that (a) replacing missing values using machine learning techniques
rather than the mean value or the group mean value and (b) using
an SVM classifier with an RBF kernel will yield the highest accuracy
in comparison to traditional techniques such as decision trees (DT),
random forests (RF), naive Bayes (NB), SVM, AdaBoost, and artificial
neural networks (ANN). Classification results improved significantly
when regression, rather than the group median or the mean, was used
to replace the missing values. This is a 10% improvement over
previously developed strategies that have been reported in the literature.
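
As a rough illustration of the contrast drawn above between mean-based and regression-based replacement of missing values, the following minimal sketch imputes one missing entry both ways. It is not the implementation used in the study: the single-predictor least-squares model, the feature names `glucose` and `insulin`, the helper functions, and the toy records are all assumptions made for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x on complete pairs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_mean(rows, target):
    """Replace every missing `target` value with the mean of the observed ones."""
    fill = mean(r[target] for r in rows if r[target] is not None)
    return [dict(r, **{target: fill}) if r[target] is None else dict(r)
            for r in rows]


def impute_regression(rows, target, predictor):
    """Replace every missing `target` value with a prediction from `predictor`."""
    complete = [r for r in rows if r[target] is not None]
    a, b = fit_line([r[predictor] for r in complete],
                    [r[target] for r in complete])
    return [dict(r, **{target: a + b * r[predictor]}) if r[target] is None
            else dict(r)
            for r in rows]


# Hypothetical toy records; None marks a missing measurement.
rows = [
    {"glucose": 90.0, "insulin": 50.0},
    {"glucose": 110.0, "insulin": 70.0},
    {"glucose": 130.0, "insulin": 90.0},
    {"glucose": 150.0, "insulin": None},
]

by_mean = impute_mean(rows, "insulin")
by_regression = impute_regression(rows, "insulin", "glucose")

print("mean imputation:      ", by_mean[-1]["insulin"])
print("regression imputation:", by_regression[-1]["insulin"])
```

In this toy data the mean fill ignores the record's high glucose reading, whereas the regression fill follows the trend in the complete records. This is the intuition behind preferring a learned model over a constant replacement value.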