Approved Research

Improving the power of genetic studies for complex diseases with deep learning

Principal Investigator: Dr Silvia Paracchini
Approved Research ID: 80652
Approval date: February 15th 2022

Lay summary

Neurodevelopmental and psychiatric disorders, like dyslexia and schizophrenia, have a large genetic component (~70%). The identification of specific genetic factors could improve the clinical management of these conditions. Gene identification has two requirements: first, accurate classification of patients; and second, computational methods for genetic analysis.

Current clinical categories tend to be very crude. They fail to capture the spectrum of symptoms that characterise these disorders. Furthermore, co-occurrence of multiple disorders in the same individuals is often dismissed. As a result, a single category can be very mixed: it could include individuals with mild or severe symptoms, and with a range of disorders. In terms of data analysis, the current strategy focuses on increasing sample sizes. It is clear, however, that many genetic factors will remain unidentified even in very large samples. Instead, new methods for data analysis are required.

Machine learning (ML) approaches have already proved useful in the medical field. For example, they have improved predictions in cancer diagnosis from medical images. ML can be defined as a system that identifies patterns in complex data. ML methods have the potential to resolve the two limiting factors described above. At the clinical level, ML can better categorise individuals from multiple assessments (e.g. cognitive, behavioural, clinical and brain imaging). Computationally, ML can exploit many layers of information associated with the human genome. Current methods treat the genome as a uniform line. In reality, different parts of the genome are more active than others and have different functions. For example, we know which genes and which sequences are important for brain development. Here, we propose to identify genes for psychiatric disorders using ML methods. This will be possible thanks to the multiple types of data now available from many public databases and from the UK Biobank.

We are a team with expertise in genomics, psychiatry, and machine learning, united by the goal of advancing research in the field of psychiatric disorders. We are based in three institutions across Germany and the UK. The UK Biobank is the ideal dataset for applying these methods because of its unique combination of multiple types of data available for the same individuals. Our results have the potential to improve risk predictions and strategies for managing these conditions.