Explainable Clinical LLM for Disease Prediction from Multi-modal Clinical Data

Last updated:: 2 July 2025

ID:: 499202
Start date:: 14 May 2025
Project status:: Current
Principal investigator:: Dr Diego Paez-Granados
Lead institution:: ETH Zurich, Switzerland

In recent years, the rapid advancement of large language models (LLMs) has had a profound impact across many domains, including healthcare. However, current healthcare LLMs are usually text-based models fine-tuned on medical question-answering datasets. There has been limited exploration of multi-modal clinical data, such as sequential records (e.g., ECG, PPG), tabular medical records (e.g., temperature, blood test results), and medication records. Exploring this area could be highly valuable, because combining these data sources mirrors the decision-making process of a doctor and could offer significant support to real-world medical practice. This project will develop an explainable clinical LLM based on Llama, an open-source LLM, to predict disease types from diverse clinical signals. Unlike conventional LLMs, which primarily process text data, this model will integrate multi-modal clinical inputs. Using data from the Biobank, we aim to fine-tune this model to achieve high prediction accuracy and robust explainability for use in clinical decision-making.

Research Questions:
How can we embed multi-modal clinical features effectively into an LLM framework to enable disease prediction?
How can the model be designed, and explainable AI techniques applied, so that disease predictions are interpretable and clinically meaningful?
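
As an illustration of the first research question, the sketch below shows one common way to feed non-text inputs to an LLM: each modality is encoded as a fixed-size feature vector, projected to the model's embedding width, and prepended to the text token embeddings as a "soft token". This is not the project's method. The function names, the window-mean signal encoder, the random projection weights, and the toy embedding width are all assumptions made for illustration; in practice the projections would be learned during fine-tuning.

```python
import random

EMBED_DIM = 8  # hypothetical embedding width; real LLMs use thousands of dimensions


def make_projection(in_dim, out_dim, seed):
    """Build a random (out_dim x in_dim) weight matrix standing in for a learned layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, features):
    """Apply a linear projection: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def summarize_signal(samples, n_windows):
    """Toy encoder for a sequential record (e.g. ECG): the mean of each window."""
    size = len(samples) // n_windows
    return [sum(samples[i * size:(i + 1) * size]) / size for i in range(n_windows)]


def build_input_sequence(text_embeddings, signal, tabular, proj_signal, proj_tabular):
    """Prepend one soft token per modality to the text token embeddings."""
    n_windows = len(proj_signal[0])  # the projection's input size fixes the window count
    signal_token = project(proj_signal, summarize_signal(signal, n_windows))
    tabular_token = project(proj_tabular, tabular)
    return [signal_token, tabular_token] + text_embeddings


# Hypothetical inputs: 16 signal samples, 3 tabular values, 2 text token embeddings.
proj_signal = make_projection(in_dim=4, out_dim=EMBED_DIM, seed=0)
proj_tabular = make_projection(in_dim=3, out_dim=EMBED_DIM, seed=1)
signal = [float(i % 4) for i in range(16)]
tabular = [36.8, 5.2, 140.0]  # e.g. temperature and two blood test values (made up)
text_embeddings = [[0.0] * EMBED_DIM, [0.1] * EMBED_DIM]

sequence = build_input_sequence(text_embeddings, signal, tabular, proj_signal, proj_tabular)
print(len(sequence), len(sequence[0]))
```

The resulting sequence has one embedding per modality followed by the text embeddings, all of the same width, so a transformer could in principle consume it like any other token sequence.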

We plan to publish our results in relevant academic papers. We will follow the applicable rules and AI policy, and we will not open-source model weights trained on Biobank data without permission from the Biobank team.