Last updated:
ID:
1108406
Start date:
21 November 2025
Project status:
Current
Principal investigator:
Mr Pranav Iyer
Lead institution:
Sedona Health Inc, United States of America

Aim:
To develop a foundation model that learns mechanistic representations of human biology from multi-modal UK Biobank data including proteomics, genomics, imaging, and health records to predict disease risk and uncover biological pathways, beginning with proteomics and genomics.
Background & Rationale:
Proteomic data reflects the functional state of the body and captures disease processes more directly than genetic or transcriptomic data. Yet transformer models remain underpowered on omics data due to its high dimensionality and non-Euclidean structure.
We propose a graph-aware transformer that embeds inferred protein-protein interaction (PPI) networks directly into its attention mechanism, enabling reasoning over biological pathways rather than individual biomarkers. Because proteomic interactions are shaped by genetic context, we will condition the proteomic networks on whole-genome data by aligning both modalities in a shared latent space. Model outputs will be evaluated against ICD-coded disease outcomes and electronic health records.
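One way to read "embedding PPI networks into the attention mechanism" is to mask attention scores with the PPI adjacency matrix, so each protein attends only to itself and its inferred interaction partners. The sketch below illustrates that idea for a single attention head in NumPy; the function name, masking scheme, and toy network are our own illustration, not the project's implementation.

```python
import numpy as np

def graph_masked_attention(X, A, Wq, Wk, Wv):
    """One attention head whose scores are masked by a PPI adjacency
    matrix A: each protein attends only to itself and its neighbours."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = (Q @ K.T) / np.sqrt(K.shape[1])
    # Disallow attention between proteins with no inferred interaction,
    # while always keeping self-attention.
    mask = A + np.eye(A.shape[0])
    scores = np.where(mask > 0, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

# Toy example: 4 proteins on a chain graph, embedding dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = graph_masked_attention(X, A, Wq, Wk, Wv)
# Protein 0 has no edge to protein 2, so its attention weight there is ~0.
```

In this reading, the hard mask is the simplest option; a soft bias on the scores (adding a learned function of edge confidence instead of masking) would let uncertain inferred edges contribute partially.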
Research Questions:
(1) Can we construct biologically meaningful PPI graphs directly from serum proteomics data?
(2) Can graph-aware attention mechanisms improve the interpretability and predictive accuracy of disease models relative to purely sequential methods?
(3) How transferable are learned embeddings across omics modalities, whether through multi-omic models or fusion of single-omic models?
(4) Can we condition proteomic networks on genomic data with biologically consistent outcomes?
Objectives:
(1) Create, fine-tune, and interpret PPI structure using models such as variational graph autoencoders.
(2) Unify proteomic and genomic latent spaces using self-supervised ML techniques.
(3) Integrate these embeddings into a transformer trained on masked protein reconstruction and disease prediction.
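The masked protein reconstruction objective in objective (3) can be sketched in a few lines: hide a random subset of each sample's protein levels, ask the model to reconstruct them, and score only the hidden positions. The NumPy sketch below is illustrative only; the mask fraction, the zero "mask token", and the toy identity model are our own assumptions, not the project's design.

```python
import numpy as np

rng = np.random.default_rng(1)

def masked_reconstruction_loss(x, predict, mask_frac=0.15):
    """Masked-protein objective, sketched: hide a random subset of
    protein levels, reconstruct, and take MSE over hidden positions only."""
    mask = rng.random(x.shape) < mask_frac   # positions to hide
    x_masked = np.where(mask, 0.0, x)        # assumed mask token: 0
    x_hat = predict(x_masked)                # model's reconstruction
    if not mask.any():
        return 0.0
    return float(((x_hat - x)[mask] ** 2).mean())

# Toy 'model': the identity, which leaves masked positions at zero,
# so the loss reduces to the mean squared value of the hidden proteins.
x = rng.normal(size=(16, 32))                # 16 samples, 32 proteins
loss = masked_reconstruction_loss(x, predict=lambda xm: xm)
```

Because only masked positions are scored, a model cannot lower the loss by copying visible inputs; it must use the PPI context (its attention neighbourhood) to impute the hidden proteins, which is what makes this objective a natural fit for the graph-aware transformer.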