Improving the power of genetic studies for complex diseases with deep learning

Last updated:: 2 July 2025

ID:: 80652
Start date:: 15 February 2022
Project status:: Closed
Principal investigator:: Professor Silvia Paracchini
Lead institution:: University of St Andrews, Great Britain

Neurodevelopmental and psychiatric disorders, such as dyslexia and schizophrenia, have a large genetic component (~70%). Identifying the specific genetic factors involved could improve the clinical management of these conditions. Gene identification depends on two things: the classification of patients, and computational methods for genetic analysis.

Current clinical categories tend to be very crude. They fail to capture the spectrum of symptoms that characterise these disorders. Furthermore, the co-occurrence of multiple disorders in the same individual is often dismissed. As a result, a single diagnostic category can be very mixed, including individuals with mild or severe symptoms and with a range of disorders. In terms of data analysis, the current strategy focuses on increasing sample sizes. However, it is clear that many genetic factors will remain unidentified even in very large samples. New methods for data analysis are therefore required.

Machine learning (ML) can be defined as a system that identifies patterns in complex data. ML approaches have already proved useful in the medical field; for example, they have improved cancer diagnosis from medical images. ML methods have the potential to address both of the limiting factors described above. At the clinical level, ML can categorise individuals more accurately by combining multiple assessments (e.g. cognitive, behavioural, clinical and brain imaging). Computationally, ML can exploit the many layers of information associated with the human genome. Current methods treat the genome as a uniform line, whereas in reality some parts of the genome are more active than others and have different functions. For example, we know which genes and which sequences are important for brain development. Here, we propose to identify genes for psychiatric disorders using ML methods. This is now possible thanks to the multiple types of data available from many public databases and from the UK Biobank.

We are a team with expertise in genomics, psychiatry and machine learning, united by the goal of advancing research on psychiatric disorders. We are based in three institutions across Germany and the UK. The UK Biobank is the ideal dataset to apply these methods because of its unique combination of multiple types of data available for the same individuals. Our results have the potential to improve risk prediction and strategies for managing these conditions.