Genotype-to-phenotype mapping with artificial intelligence

Last updated: 2 July 2025

ID: 575761
Start date: 8 January 2025
Project status: Current
Principal investigator: Dr Nadav Brandes
Lead institution: NYU Grossman School of Medicine, United States of America

Research Question: Can predictions made by modern artificial intelligence (AI) models, such as genomic large language models, increase the power of genetic studies to identify causal genes and predict disease risk?

Objectives:
1. Integrate AI-driven predictions with traditional GWAS to improve the identification of causal variants and genes.
2. Enhance the accuracy of polygenic risk scores with variant effect predictions.
3. Characterize the multi-dimensional effects of genetic variants to distinguish between loss-of-function, gain-of-function and other classes of genetic effects.
4. Expand the application of AI predictions to non-coding regions of the genome.
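
As a purely illustrative sketch of the idea behind Objective 2, a polygenic risk score can be written as a weighted sum of allele dosages, with each variant's GWAS effect size optionally rescaled by a prior derived from a variant effect predictor. The function name, the multiplicative weighting scheme, and all numbers below are hypothetical assumptions chosen for illustration; they are not the project's method and do not come from any real data.

```python
from typing import Optional, Sequence


def polygenic_score(
    dosages: Sequence[float],
    betas: Sequence[float],
    priors: Optional[Sequence[float]] = None,
) -> float:
    """Return sum_i dosage_i * beta_i * prior_i for one individual.

    dosages: allele counts (0, 1 or 2) per variant.
    betas:   per-variant effect sizes, e.g. from a GWAS.
    priors:  hypothetical per-variant weights from a variant effect
             predictor; defaults to 1.0 (a standard unweighted score).
    """
    if priors is None:
        priors = [1.0] * len(betas)
    if not (len(dosages) == len(betas) == len(priors)):
        raise ValueError("dosages, betas and priors must have equal length")
    return sum(d * b * w for d, b, w in zip(dosages, betas, priors))


# Made-up toy values for three variants (not real data).
toy_dosages = [0, 1, 2]
toy_betas = [0.10, -0.20, 0.05]
toy_priors = [1.0, 0.5, 2.0]  # assumed predictor-derived weights

unweighted = polygenic_score(toy_dosages, toy_betas)
weighted = polygenic_score(toy_dosages, toy_betas, toy_priors)
```

In this toy case the prior halves the contribution of the second variant and doubles that of the third, which changes the resulting score. Whether, and in what form, such reweighting improves prediction accuracy is the open question the objective addresses.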

Scientific Rationale: The Brandes lab has been at the forefront of leveraging powerful AI models such as protein language models (Brandes et al., Bioinformatics 2022) to predict variant effects (Brandes et al., Nature Genetics 2023) and discover gene-phenotype associations (Brandes et al., Genome Biology 2020). We intend to build on these results and further expand the use of AI to inform human genetic studies with genomic prior knowledge.

Dissemination of Findings:
We plan to disseminate our research findings through peer-reviewed scientific publications. All publications will adhere to UK Biobank’s AI policy by clearly describing the methodologies, ensuring transparency about the model architecture and parameters, and confirming that the trained model cannot retrieve or generate participant-level data (actual or synthetic) or create a research environment equivalent to using participant-level data. Additionally, we will strictly comply with UK Biobank’s policy by ensuring that participant-level data will not be incorporated, directly or indirectly, into any publicly accessible Generative AI or similar models. We will also take precautions to avoid any inadvertent exposure of participant-level data, such as by refraining from posting such data to publicly accessible repositories like GitHub.