Optimizing cross-population proteomics prediction for PrediXcan

Last updated: 5 August 2025

ID: 898835
Start date: 5 August 2025
Project status: Current
Principal investigator: Dr Heather E. Wheeler
Lead institution: Loyola University Chicago, United States of America

Our method PrediXcan has been widely adopted in the complex trait genetics community [1]. We have built and shared genetic prediction models of gene expression and protein levels in diverse populations, broadening the scope of PrediXcan [2-5]. Matthew Fischer, a graduate student in our lab, has trained prediction models of the Olink plasma proteome across populations in TOPMed MESA. He seeks to test the performance of his models in UK Biobank using genotype and proteomics data from all available ancestries.

Objective: Improve proteome prediction models for maximum utility in diverse populations.

Rationale: Since allele frequencies and linkage disequilibrium structures differ between genetic ancestries due to different demographic histories, genetic prediction models trained in one population often do not perform as well in another and thus are currently of limited utility for risk prediction and mechanistic interpretation.

Hypothesis: We expect the underlying biological mechanisms of complex disease to be shared across human populations, and thus proteome prediction methods that account for allelic heterogeneity and better pinpoint causal effects will improve discovery and interpretation of proteome-wide association studies (PWAS) across populations.

Approach: We will use fine-mapping, elastic net, multivariate adaptive shrinkage, and ultimate deconvolution to optimize genotypic prediction of protein levels across populations. We will use whole genome sequencing and Olink proteomics data from the Trans-Omics for Precision Medicine (TOPMed) Multi-Ethnic Study of Atherosclerosis (MESA) for model training. We will test how the models predict protein levels from genotypes in UK Biobank. We will also perform PWAS with S-PrediXcan to identify replicable protein-disease trait associations using GWAS summary statistics from Pan-UK Biobank and All of Us.
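To make the train-then-test design in the Approach concrete, the following is a minimal sketch of one of the named techniques, elastic net, fitted by coordinate descent on genotype dosages and then scored in a held-out cohort by the Pearson correlation between predicted and measured protein levels. It is an illustration under stated assumptions, not the lab's pipeline or the PrediXcan software: the function names (fit_elastic_net, predict, pearson_r), the penalty values, and the toy dosage and protein arrays are all hypothetical, and SNPs are centred but not standardized.

```python
from math import sqrt


def center(values):
    """Return the mean-centred values and the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(dosages, protein, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*RSS + lam*(alpha*|w|_1 + (1 - alpha)/2*|w|_2^2).
    dosages: one row of SNP dosages per sample; protein: measured levels.
    """
    n, p = len(protein), len(dosages[0])
    centred = [center([row[j] for row in dosages]) for j in range(p)]
    cols = [c for c, _ in centred]
    col_means = [m for _, m in centred]
    residual, y_mean = center(protein)
    weights = [0.0] * p
    for _ in range(n_iter):
        for j, x in enumerate(cols):
            z = sum(v * v for v in x) / n
            if z == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the partial residual (SNP j's fit added back).
            rho = sum(xi * (ri + xi * weights[j]) for xi, ri in zip(x, residual)) / n
            new_w = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [ri - xi * delta for xi, ri in zip(x, residual)]
                weights[j] = new_w
    intercept = y_mean - sum(w * m for w, m in zip(weights, col_means))
    return intercept, weights


def predict(dosages, intercept, weights):
    """Genetically predicted protein level for each sample."""
    return [intercept + sum(w * d for w, d in zip(weights, row)) for row in dosages]


def pearson_r(a, b):
    """Pearson correlation, used here as the performance measure."""
    ac, _ = center(a)
    bc, _ = center(b)
    den = sqrt(sum(x * x for x in ac) * sum(y * y for y in bc))
    return sum(x * y for x, y in zip(ac, bc)) / den if den else 0.0


# Hypothetical toy data: 2 SNPs, protein level driven by the first SNP only.
train_dosages = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 0]]
train_protein = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
test_dosages = [[0, 0], [1, 2], [2, 1], [1, 0]]  # stand-in for the test cohort
test_protein = [1.0, 1.5, 2.0, 1.5]

intercept, weights = fit_elastic_net(train_dosages, train_protein, lam=0.01)
predicted = predict(test_dosages, intercept, weights)
print("weights:", weights)
print("test-cohort correlation:", pearson_r(predicted, test_protein))
```

In practice the penalty strength would be chosen by cross-validation within the training cohort, and performance would be reported per protein and per ancestry group; both steps are omitted here for brevity.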

1. Gamazon 2015
2. Mogil 2018
3. Okoro 2021
4. Schubert 2022
5. Araújo 2023