Last updated:
ID:
295265
Start date:
27 May 2025
Project status:
Current
Principal investigator:
Professor James Oliver Joshua Davies
Lead institution:
University of Oxford, Great Britain

Only 5% of the human genome encodes proteins, the molecules responsible for the way in which cells, tissues and organisms function. The remainder of the genome determines when and how much of each protein should be made in different cell types. We have been able to interpret the protein-coding part of the genome since the 1960s because the same code is used in all living organisms. However, the non-coding genome is much more difficult to interpret because every cell type in every organism reads it as a different language. Our project aims to develop artificial-intelligence-based models to interpret the 95% of the genome that creates the instructions for complex life.
We plan to use these models to interpret the genome sequencing datasets in UK Biobank to find new genes that are associated with human phenotypes and disease. In addition, the large datasets in UK Biobank could be used to test whether our models are accurate.
The project has the potential to have an impact at several levels. First, it might strengthen the link between genotype and disease, improving our ability to identify people at high risk of developing disease. It is also likely to identify new genes and biological pathways that are important in the development of disease and in modulating disease outcomes, which could have an important impact on drug discovery.