Exploring the role of non-coding structural variations in common traits and diseases in the UK Biobank

Last updated: 2 July 2025

ID: 145111
Start date: 27 June 2025
Project status: Current
Principal investigator: Dr Uira Souto Melo
Lead institution: Max Planck Institute for Molecular Genetics, Germany

There are currently three major issues in the field of rare genetic disease diagnosis:
The first is detecting large mutations known as structural variants (SVs). New technologies such as long-read sequencing can detect SVs with high precision, but they are prohibitively expensive, difficult to scale, and lack the computational tools essential for comprehensive analysis in the field of rare diseases. To address this issue, we created a machine-learning system, trained on data generated by novel, high-throughput sequencing machines, for use with older, low-cost technologies.
The second challenge is prioritizing genetic variants. To address this issue, we created a machine-learning algorithm that learns from publicly available data and prioritizes genetic variants based on their likelihood of being deleterious, regardless of their location in the genome.
The third challenge is that clinical geneticists struggle to interpret non-coding variants. To address this issue, we developed an automated approach to interpret mutations detected in both coding and non-coding parts of the DNA, taking into account how the three-dimensional architecture of the DNA contributes to the patient’s condition.
The aim of this proposal is to gain access to a UK Biobank cohort of common traits and disorders in order to investigate the potential of our established machine-learning algorithms and scoring system for identifying new molecular pathways. We would like to examine the role of rare SVs in common traits and diseases, as well as the potential for identifying new druggable targets in the genome by combining the analysis of common SVs and single nucleotide variants. Our primary goal is to enable early disease detection, which can lead to preventative measures, changes in lifestyle based on prognosis, therapy decisions after disease onset, and the identification of potential medications for common diseases. The project is expected to last for 12-18 months.