Arshmeet Kaur
Adv. Artif. Intell. Mach. Learn., 4 (1):2091-2102
Arshmeet Kaur: Evergreen Valley College, Student, Transferring to Bioengineering
DOI: https://dx.doi.org/10.54364/AAIML.2024.41119
Article History: Received on: 10-Jan-24, Accepted on: 15-Mar-24, Published on: 22-Mar-24
Corresponding Author: Arshmeet Kaur
Email: Arka7783@stu.evc.edu
Citation: Arshmeet Kaur and Morteza Sarmadi (2024). Predicting Loss-of-Function Impact of Genetic Mutations: A Machine Learning Approach. Adv. Artif. Intell. Mach. Learn., 4 (1):2091-2102.
The advent of next-generation sequencing (NGS) techniques has significantly reduced
the cost of genome sequencing, lowering barriers to future medical research; it is now feasible
to apply genome sequencing to studies where it would previously have been cost-prohibitive.
Identifying damaging or pathogenic mutations in vast amounts of complex, high-dimensional
genome sequencing data may be of particular interest to researchers. This paper therefore aimed
to train machine learning models on the attributes of a genetic mutation to predict LoFtool
scores (which measure a gene’s intolerance to loss-of-function mutations). These attributes
included, but were not limited to, the position of a mutation on a chromosome, changes in amino
acids, and changes in codons caused by the mutation. Models were built using the univariate
feature selection technique f-regression combined with K-nearest neighbors (KNN), Support
Vector Machine (SVM), Random Sample Consensus (RANSAC), Decision Trees, Random
Forest, and Extreme Gradient Boosting (XGBoost). These models were evaluated using five-fold
cross-validated averages of R-squared, mean squared error, root mean squared error, mean
absolute error, and explained variance. Multiple models achieved testing-set R-squared
values of 0.97.
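
The modeling and evaluation setup described above can be illustrated with a minimal sketch, assuming scikit-learn. It is not the study's code: synthetic data from make_regression stands in for the mutation attributes and LoFtool scores, and the feature scaling step, the number of selected features (k=5), and the number of neighbors (n_neighbors=5) are illustrative assumptions. Only the KNN variant is shown; the other regressors named above could be substituted in the same pipeline.

```python
# Sketch: f-regression feature selection + KNN regression, scored with
# five-fold cross-validation. All data and hyperparameters are placeholders.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded mutation attributes (X) and LoFtool scores (y).
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# Selection sits inside the pipeline so it is refit on each training fold only.
model = make_pipeline(
    StandardScaler(),                           # assumption: KNN is distance-based
    SelectKBest(score_func=f_regression, k=5),  # assumption: keep 5 features
    KNeighborsRegressor(n_neighbors=5),         # assumption: 5 neighbors
)

# The five metrics listed in the abstract; scikit-learn reports errors negated.
scoring = {
    "r2": "r2",
    "mse": "neg_mean_squared_error",
    "rmse": "neg_root_mean_squared_error",
    "mae": "neg_mean_absolute_error",
    "explained_variance": "explained_variance",
}

cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)

# Average each metric over the five folds, flipping the sign of error metrics.
averages = {}
for name in scoring:
    fold_mean = cv_results[f"test_{name}"].mean()
    averages[name] = -fold_mean if name in ("mse", "rmse", "mae") else fold_mean

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

Keeping the feature selection inside the pipeline means the f-regression scores are computed without access to the held-out fold, which avoids optimistic bias in the cross-validated averages.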