Committee Chair

Liang, Yu

Committee Member

Wu, Dalei; Nasab, Ahad; Heath, Gregory

Department

Dept. of Computer Science and Engineering

College

College of Engineering and Computer Science

Publisher

University of Tennessee at Chattanooga

Place of Publication

Chattanooga (Tenn.)

Abstract

Multimodal generative models are reshaping digital therapeutics by enabling real-time synthesis of personalized content aligned with a user’s physiological and affective state. However, existing systems remain fragmented across modalities and often lack a unified framework that can jointly represent biosignals, natural language intent, music, and video under long-context constraints. This dissertation presents an end-to-end multimodal large language model ecosystem for personalized therapeutic music generation that combines discrete tokenization, evidence-grounded reasoning, and stabilized preference alignment. At the representation layer, the dissertation develops a family of tokenizers that convert continuous biomedical and media signals into compact discrete sequences for transformer-based modeling. Harmonizer provides high-fidelity music tokenization for stable long-horizon conditioning and controllable generation. EEG-Harmonizer introduces neural tokenization with a Token-Transformer and Electrode-Aware Importance mechanism, achieving 99.97% classification accuracy while maintaining strong performance with only 50% of electrodes. Biomedical-Harmonizer extends the same encoder-quantizer-decoder principles to multi-lead ECG, producing morphology-preserving token sequences for physiological conditioning and safety-aware personalization. Video-Harmonizer, a quick-learning tokenizer framework that reduces learning time by 83.3%, extends the ecosystem to ultra-high-resolution visual data and supports future EEG- and ECG-conditioned audiovisual therapy generation. At the reasoning layer, Qmusic-MLLM unifies text, EEG, ECG, and music token spaces within a single generative framework. Patient or session context is grounded through retrieval-augmented generation, and a chain-of-thought planner produces concise therapeutic conditioning plans before long-horizon music-token generation.
To support adaptive personalization without expensive token-level reinforcement learning, the system also introduces a contextual-bandit prompt-bank mechanism that selects interpretable prompts using EEG-grounded reward signals. To improve reliability under preference learning, this dissertation further proposes Hallucination-Suppressed Preference Optimization, a reference-anchored alignment method that constrains updates relative to a frozen pretrained model while improving preference adherence. Together, these contributions establish a scalable foundation for EEG- and ECG-conditioned therapeutic generation by combining high-fidelity tokenization, grounded reasoning, adaptive personalization, and stabilized multimodal preference optimization within the Qmusic-MLLM ecosystem.

Acknowledgments

I would like to express my deepest appreciation to my advisors, Dr. Yu Liang and Dr. Dalei Wu, for their guidance, support, and mentorship throughout my doctoral studies. Their expertise and commitment to excellence have profoundly shaped this work and my growth as a researcher. I am also grateful to the members of my dissertation committee, Dr. Ahad Nasab and Dr. Gregory Heath, for their constructive feedback, insightful discussions, and encouragement. Additionally, I thank UTC MDRB for providing the supercomputing resources and large-scale storage that supported this work. Finally, I acknowledge the financial support from NIH AIM-AHEAD (grant number OT2OD032581) and NSF (grant number 192847), which made this work possible.

Degree

Ph. D.; A dissertation submitted to the faculty of the University of Tennessee at Chattanooga in partial fulfillment of the requirements of the degree of Doctor of Philosophy.

Date

5-2026

Subject

Artificial intelligence; Combined modality therapy

Keyword

Multimodal AI; Multimodal Large Language Models (MLLM); Multimodal Tokenization; Biomedical Signal Modeling; EEG and Audio Tokenization; Chain of Thought (CoT)

Document Type

Doctoral dissertations

DCMI Type

Text

Extent

xx, 284 leaves

Language

English

Rights

http://rightsstatements.org/vocab/InC/1.0/

License

http://creativecommons.org/licenses/by/4.0/
