Predicting missing omics modality in UK Biobank

Last updated:: 2 July 2025

ID:: 134941
Start date:: 3 January 2024
Project status:: Current
Principal investigator:: Dr Bingxin Zhao
Lead institution:: University of Pennsylvania, United States of America

Omics data such as gene expression and DNA methylation are largely unavailable in UK Biobank. Having such data for a cohort of UK Biobank's scale would be highly valuable for functional omics research. Predictive models pre-trained on reference data from other large-scale cohorts that collect multiple omics layers can be applied to UK Biobank to recover missing omics data through prediction. This approach can fill in individual missing entries, and it can also impute entire omics features that were never measured, provided the relevant non-omics variables are available in UK Biobank. This project will build predictive models for omics features that can be applied directly to the UK Biobank dataset to recover missing omics entries or omics features from existing observed non-omics data. We expect the models generated by this project to become a valuable research resource with broad uptake across the omics research community.
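The summary does not name a model class, so the sketch below is a minimal illustration only. It assumes a multi-output ridge regression as a stand-in predictor. It is fitted on a reference cohort where both non-omics predictors and omics features are observed, then applied to a target cohort that has only the non-omics predictors. The function names (`fit_ridge`, `predict_omics`, `fill_missing`) are hypothetical and the data are simulated. The example shows both use cases described above: imputing whole omics features, and filling missing entries while keeping observed values.

```python
import numpy as np

def fit_ridge(X_ref, Y_ref, alpha=1.0):
    """Fit multi-output ridge regression (unpenalized intercept) on a reference cohort.

    X_ref: (n, p) non-omics predictors; Y_ref: (n, q) omics features.
    Ridge is an assumed stand-in, not the project's stated method.
    """
    x_mean = X_ref.mean(axis=0)
    y_mean = Y_ref.mean(axis=0)
    Xc = X_ref - x_mean
    Yc = Y_ref - y_mean
    p = X_ref.shape[1]
    # Closed-form ridge solution: W = (Xc'Xc + alpha*I)^-1 Xc'Yc
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def predict_omics(W, b, X_target):
    """Predict every omics feature for target participants from non-omics data."""
    return X_target @ W + b

def fill_missing(Y_obs, Y_pred):
    """Keep observed omics values; replace NaN entries with model predictions."""
    return np.where(np.isnan(Y_obs), Y_pred, Y_obs)

# Simulated example (hypothetical data, not real cohort data)
rng = np.random.default_rng(0)
n_ref, n_target, p, q = 500, 200, 5, 3
true_W = rng.normal(size=(p, q))
X_ref = rng.normal(size=(n_ref, p))
Y_ref = X_ref @ true_W + rng.normal(scale=0.1, size=(n_ref, q))
X_target = rng.normal(size=(n_target, p))
Y_target_true = X_target @ true_W + rng.normal(scale=0.1, size=(n_target, q))

# Whole-feature imputation: the target cohort contributes only X_target
W, b = fit_ridge(X_ref, Y_ref, alpha=1.0)
Y_target_pred = predict_omics(W, b, X_target)

# Entry-level imputation: mask about 30% of entries, then fill them from predictions
Y_partial = Y_target_true.copy()
Y_partial[rng.random(Y_partial.shape) < 0.3] = np.nan
Y_filled = fill_missing(Y_partial, Y_target_pred)
```

In practice a real reference-to-target transfer would also need harmonised predictor definitions across cohorts and held-out validation. The simulation sidesteps both by drawing the two cohorts from the same distribution.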