Committee Chair

Sartipi, Mina

Committee Member

Fell, Nancy; Gao, Lani; Cho, Jin

Department

Dept. of Computer Science and Engineering

College

College of Engineering and Computer Science

Publisher

University of Tennessee at Chattanooga

Place of Publication

Chattanooga (Tenn.)

Abstract

Stroke is one of the leading causes of long-term disability and death in the United States. Stroke patients often face severe health consequences that significantly affect their lives and place a substantial financial burden on their families and the wider healthcare system. Reliable predictions of patient outcomes, such as early hospital readmission, length of stay (LOS) in the hospital, and risk of mortality, can therefore help both patients and healthcare providers. Furthermore, successful modeling of these outcomes can help identify the factors that influence them and thereby improve the quality of patient care. In this research, we combined statistical analysis and machine learning (ML) algorithms to enhance the prediction of three patient outcomes (30-day readmission, LOS, and mortality) for stroke patients in Tennessee. Because such datasets are typically imbalanced, with the events of interest making up only a small fraction of the records, ML algorithms suited to imbalanced data, such as XGBoost, LightGBM, and CatBoost, were employed. To further improve model performance, several data-level approaches were used to counter the class imbalance, including Cluster Centroids, NearMiss, and Instance Hardness Threshold. This combination of data modification, especially under-sampling methods, with suitable ML algorithms was shown to yield high model performance, measured in terms of recall and other metrics. In addition, based on the features available in our data, the SHAP explainable ML method was used to identify the influential factors affecting these outcomes. Higher age and, mainly, the vital signs at the time of admission play an important role in LOS. For 30-day readmission, peripheral artery disease, sleep disorders, and prescribed medications such as anticoagulant and antibiotic agents were among the most influential features. For mortality, static patient health conditions were the most influential factors. A simple graphical user interface (GUI) was also developed for one of the LOS outcomes, and can be extended to other outcomes, to demonstrate the capability of this work for practical applications.
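
To make the under-sampling idea in the abstract concrete, the following is a minimal sketch of the NearMiss-1 heuristic together with a recall calculation, written with the Python standard library only. It is an illustration under stated assumptions, not the dissertation's implementation: the abstract does not say which NearMiss variant or which software was used, and the function names `near_miss_1` and `recall`, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def near_miss_1(X, y, minority_label=1, k=3):
    """Under-sample the majority class with the NearMiss-1 heuristic.

    Keeps every minority row, plus the majority rows whose mean distance
    to their k nearest minority rows is smallest, until both classes have
    the same number of rows. Assumes a binary labeling.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    if not minority:
        raise ValueError("no rows carry the minority label")
    k = min(k, len(minority))

    def mean_knn_distance(x):
        # Mean Euclidean distance from x to its k closest minority rows.
        dists = sorted(math.dist(x, m) for m in minority)
        return sum(dists[:k]) / k

    majority.sort(key=lambda pair: mean_knn_distance(pair[0]))
    kept = majority[:len(minority)]
    X_res = minority + [x for x, _ in kept]
    y_res = [minority_label] * len(minority) + [label for _, label in kept]
    return X_res, y_res


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive cases that the model flagged as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (hypothetical, not from the dissertation): 2 events, 6 non-events.
X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9),
     (0.3, 0.1), (9.0, 9.0), (0.2, 0.4), (8.0, 7.5)]
y = [1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = near_miss_1(X, y, minority_label=1, k=2)
```

In this sketch the balanced set would then be passed to a classifier for training, and recall would be computed on an untouched, still-imbalanced test split, since recall tracks how many of the rare events (for example, readmissions) are actually caught.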

Acknowledgments

I would like to express my gratitude to my advisor, Dr. Sartipi, for her invaluable mentorship, encouragement, and financial support throughout my doctoral studies. Her insightful guidance, thoughtful feedback, and commitment to excellence have been instrumental at every stage of this research. Dr. Sartipi has been not only an exceptional mentor but also someone who is always supportive and genuinely invested in the success of the entire research group. Her openness to discussing concerns and her dedication to fostering a collaborative environment have meant a great deal to me. I am also thankful to the members of my dissertation committee, Dr. Nancy Fell, Dr. Lani Gao, and Dr. Jin Cho, for their time, expertise, and constructive feedback, all of which greatly enhanced the quality and depth of this dissertation. I am especially grateful to Dr. Cho, whose patience and guidance helped me get started on this project and navigate the essential steps of conducting academic research. I would also like to acknowledge the financial support provided through UTC’s employee benefits program, which played a vital role in making my doctoral journey possible.

Degree

Ph. D.; A dissertation submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

Date

8-2025

Subject

Cerebrovascular disease--Patients--Tennessee--Data processing; Medical informatics; Medical statistics

Keyword

Machine Learning; Statistical analysis; Healthcare; Stroke; Data analytics

Document Type

Doctoral dissertations

DCMI Type

Text

Extent

xvii, 119 leaves

Language

English

Rights

http://rightsstatements.org/vocab/InC/1.0/

License

http://creativecommons.org/licenses/by/4.0/

Date Available

8-31-2026

Available for download on Monday, August 31, 2026
