Abstract
The increasing availability of healthcare data and advances in machine learning (ML) have created opportunities to improve patient outcomes through predictive analytics. This thesis addresses critical healthcare challenges in intensive care units (ICUs) and hospital settings by developing ML models to predict key clinical outcomes. The research is structured around three studies: early prediction of renal replacement therapy (RRT) requirement, non-invasive ventilation (NIV) failure, and hospital readmissions for uncontrolled diabetic patients.
This research uses two primary datasets: the HiRID high-resolution ICU dataset and the UCI Diabetes dataset. An in-depth analysis of these datasets was conducted to identify relevant features and define problem assumptions. Rigorous data pre-processing, including imputation of missing values, feature selection, and class balancing, was applied to ensure high-quality data for modelling. The datasets were then prepared for machine learning analysis to address the specific predictive tasks.
The first study focuses on predicting renal replacement therapy requirement within 24 hours of ICU admission using demographic, clinical, and laboratory data from the HiRID ICU dataset. Models including Random Forest (RF), Neural Networks, and XGBoost were evaluated with and without dimensionality reduction and feature evaluation. The XGBoost model demonstrated the highest performance, with an area under the receiver operating characteristic curve (AUROC) of 0.94 using the top 10 features identified through SHAP analysis. This study highlights the potential of ML to enhance decision-making in resource-intensive interventions by identifying at-risk patients early.
The second study develops a genetic algorithm (GA)-optimized ensemble framework to predict, within the first 2 hours of ICU admission, early (≤48 hours) and late (>48 hours) NIV failure. Using data from the HiRID dataset, the GA-optimized model combined predictions from base classifiers, including RF, Gradient Boosting, and LightGBM. The ensemble model achieved a mean AUROC of 0.71 and identified critical features such as ventilation mode, mean arterial pressure, and peripheral oxygen saturation. This approach demonstrates the value of early predictions and interpretable feature analysis for clinical decision-making in respiratory failure management.
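As an illustration of what GA-optimized ensembling can look like, the sketch below evolves blending weights for three base classifiers so that the weighted average of their scores maximizes AUROC. It is a minimal sketch, not the thesis's implementation: the abstract does not describe the GA operators, so the truncation selection, uniform crossover, Gaussian mutation, population size, and the synthetic `base_scores` are all assumptions, and the function names are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blend(weights, base_scores):
    """Weighted average of the base-classifier scores, one value per sample."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, row)) / total for row in zip(*base_scores)]

def fitness(weights, base_scores, labels):
    return auroc(labels, blend(weights, base_scores))

def ga_optimize(base_scores, labels, pop_size=16, generations=20, mutation_sd=0.1, seed=0):
    """Evolve ensemble weights (assumed operators; see the lead-in)."""
    rng = random.Random(seed)
    n = len(base_scores)
    population = [[rng.random() + 1e-6 for _ in range(n)] for _ in range(pop_size)]
    population[0] = [1.0] * n  # seed with equal weights as a baseline individual
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: fitness(w, base_scores, labels), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [max(1e-6, w + rng.gauss(0, mutation_sd)) for w in child]  # mutation
            children.append(child)
        population = parents + children
    best = max(population, key=lambda w: fitness(w, base_scores, labels))
    total = sum(best)
    return [w / total for w in best]  # normalize so the weights sum to 1

# Synthetic stand-in data: three base classifiers of decreasing quality.
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.3 else 0 for _ in range(120)]

def noisy_scores(noise_sd):
    return [min(1.0, max(0.0, 0.5 + 0.2 * (2 * y - 1) + data_rng.gauss(0, noise_sd)))
            for y in labels]

base_scores = [noisy_scores(sd) for sd in (0.2, 0.3, 0.4)]
weights = ga_optimize(base_scores, labels)
print("weights:", [round(w, 3) for w in weights])
print("ensemble AUROC:", round(fitness(weights, base_scores, labels), 3))
```

In practice the weights would be fitted on a validation fold rather than on the data used for the final evaluation, since optimizing and scoring on the same samples overstates performance.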
The third study applies six ML algorithms, including RF, Support Vector Machine (SVM), and Naive Bayes (NB), to predict short-term (within 30 days) and long-term hospital readmissions in uncontrolled diabetic patients using the UCI Diabetes dataset. For short-term readmissions, RF achieved the highest accuracy (86.38%) with an AUROC of 0.63, while SVM showed superior performance in predicting all readmissions (accuracy: 64%, AUROC: 0.65). These findings emphasize the potential of ML to support early intervention and reduce healthcare costs in chronic disease management.
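Accuracy and AUROC can diverge sharply when one class dominates, which is why both metrics are worth reading together. The sketch below shows the extreme case on hypothetical data: the abstract does not report the class distribution, so the 10% positive rate is purely an assumption, and the constant-score "model" is a deliberately trivial stand-in rather than any classifier from the thesis.

```python
def accuracy(labels, preds):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)

def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cohort: 10 readmitted (1) and 90 not readmitted (0).
labels = [1] * 10 + [0] * 90

# A trivial model that gives every patient the same low risk score.
scores = [0.0] * len(labels)
preds = [int(s >= 0.5) for s in scores]  # thresholding yields "not readmitted" for all

print(accuracy(labels, preds))  # 0.9: high, although no readmission is detected
print(auroc(labels, scores))    # 0.5: no ability to rank patients by risk
```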
Altogether, these studies highlight the transformative potential of machine learning in enhancing patient outcomes and optimizing resource allocation within hospital and ICU settings. By integrating advanced predictive models into clinical workflows, this research contributes to the growing field of Health Informatics and predictive analytics in healthcare. The findings emphasize the importance of a multidisciplinary approach, encouraging collaboration between data scientists, healthcare professionals, and domain experts to advance ML applications in clinical practice. Future work will focus on external validation, real-time implementation, and expanding feature sets to improve model robustness and generalizability.
Date of Award | 6 May 2025 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Mohamed Bader-El-Den (Supervisor), James McNicholas (Supervisor) & Adrian Hopgood (Supervisor) |