Shambhab Chaki, Souhardya Das, Proma Mondal, Pratyusha Rakshit and Archana Chowdhury
Adv. Artif. Intell. Mach. Learn., 4 (3):2648-2664
Shambhab Chaki : Jadavpur University
Souhardya Das : Jadavpur University
Proma Mondal : Jadavpur University
Pratyusha Rakshit : Jadavpur University
Archana Chowdhury : Christian College of Engineering and Technology, Bhilai
DOI: https://dx.doi.org/10.54364/AAIML.2024.43154
Article History: Received on: 16-Jul-24, Accepted on: 21-Sep-24, Published on: 28-Sep-24
Corresponding Author: Shambhab Chaki
Email: shambhabc@gmail.com
Citation: Souhardya Das, Proma Mondal, Shambhab Chaki, Pratyusha Rakshit, Archana Chowdhury. (2024). EHR Innovations: Shedding Light on Anemia in the Healthcare Paradigm. Adv. Artif. Intell. Mach. Learn., 4 (3):2648-2664
This study introduces a novel approach to Electronic Health Record (EHR) analysis,
extending the use of phenotyping with machine learning (ML) models to enhance
the recognition and treatment of anemia. It first examines the healthcare
scenario in India and suggests potential improvements through data-driven
personalized care. Using the MIMIC-III dataset, the research involves extensive
data preprocessing and analysis to uncover key insights into anemia's
prevalence, gender distribution, comorbidities, and Intensive Care Unit (ICU)
stays. Partitioning clustering algorithms such as K-Means, K-Medoids, and Fuzzy
C-Means, and hierarchical clustering algorithms such as Agglomerative
Clustering, DIANA, and HDBSCAN, are used to identify groups of patients with
similar medical profiles. The distance metrics employed are Levenshtein and
Euclidean distances combined with TF-IDF Vectorization. The effectiveness of
these algorithms is evaluated based on Length of Stay (LoS) estimation, a
critical parameter in EHR studies. To predict a new patient's LoS, the patient
is first assigned to the existing cluster that shows the highest
support for the patient's clinical activities. A decision tree regressor is then
trained using data from the selected cluster to predict the new patient's LoS,
significantly improving predictive accuracy and reliability. Notably, the
HDBSCAN algorithm, applied to the TF-IDF vectorizer output, achieves a 50.82%
reduction in Root Mean Squared Error (RMSE) compared to the baseline model. The
novelty of this study lies in proposing an efficient approach for EHR analysis,
specifically for predicting ICU patients' LoS, and identifying the most
effective clustering algorithm to improve healthcare delivery for anemic
patients in the Indian healthcare scenario.
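
The cluster-then-predict idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy activity sequences, the LoS values, and the two medoids are hypothetical, the medoids are assumed to come from a prior clustering step, and a per-cluster mean stands in for the decision tree regressor to keep the example dependency-free. It shows Levenshtein distance over activity sequences, assignment of a new patient to the nearest cluster, and the RMSE reduction relative to a global-mean baseline.

```python
import math
from statistics import mean


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of activity codes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def assign_cluster(activities, medoids):
    """Index of the medoid closest to the patient's activity sequence."""
    return min(range(len(medoids)),
               key=lambda k: levenshtein(activities, medoids[k]))


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical toy data: (clinical activity sequence, LoS in days).
train = [
    (["CBC", "IRON", "TRANSFUSION"], 6.0),
    (["CBC", "IRON", "TRANSFUSION", "DIALYSIS"], 7.0),
    (["CBC", "IRON"], 2.0),
    (["CBC", "B12"], 3.0),
]
# Assumed output of an earlier clustering step (e.g. K-medoids).
medoids = [["CBC", "IRON", "TRANSFUSION"], ["CBC", "IRON"]]

# Group training LoS values by assigned cluster.
clusters = {}
for acts, los in train:
    clusters.setdefault(assign_cluster(acts, medoids), []).append(los)

# Stand-in predictor: per-cluster mean LoS (the paper trains a decision
# tree regressor on the selected cluster instead).
cluster_mean = {k: mean(v) for k, v in clusters.items()}
global_mean = mean(los for _, los in train)  # baseline ignores clusters

# Hypothetical held-out patients.
held_out = [
    (["CBC", "IRON", "TRANSFUSION", "XRAY"], 7.5),
    (["CBC", "FOLATE"], 2.0),
]
y_true = [los for _, los in held_out]
baseline_pred = [global_mean] * len(held_out)
cluster_pred = [cluster_mean[assign_cluster(acts, medoids)]
                for acts, _ in held_out]

rmse_base = rmse(y_true, baseline_pred)
rmse_clustered = rmse(y_true, cluster_pred)
reduction_pct = 100 * (rmse_base - rmse_clustered) / rmse_base
print(f"baseline RMSE={rmse_base:.3f}, clustered RMSE={rmse_clustered:.3f}, "
      f"reduction={reduction_pct:.1f}%")
```

The percentage printed here reflects only the toy data and has no bearing on the 50.82% figure reported in the study.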