Principal Investigator: Professor Roland Eils
Charité – Universitätsmedizin Berlin, Berlin, Germany
Tags: 49966, Epistasis, genotype-phenotype-relation, hierarchical structure, Machine Learning, neural network
Genetic contributions to many health-related phenotypes, such as blood pressure and body mass index, often arise from complex interactions between multiple genes. Although some diseases, like cystic fibrosis, are related to variants of a single gene, this is not true for most diseases, so understanding how variants of different genes interact is of great interest. We plan to use recent advances in machine learning to analyse the comprehensive data available in the UK Biobank. Starting with variants of individual genes, we want to predict how they influence the activity of the cellular systems they are involved in, for example DNA repair. Based on these predictions, we want to predict the activity of more complex systems and repeat this process until the information from all genes is combined in a single system describing the effect of all variants. At that point, the information on thousands of genes will be available in a highly compressed representation, which will serve as the basis for predicting phenotypes such as blood pressure.
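The bottom-up combination described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the gene and system names, the toy hierarchy, the size of each system's representation, and the random, untrained weights are hypothetical and are not taken from the project. It only shows how gene-level variant counts could be propagated through nested systems into one compressed state from which a phenotype value is read out.

```python
import numpy as np

# Hypothetical toy hierarchy: each system lists its children (genes or subsystems).
HIERARCHY = {
    "dna_repair": ["GENE_A", "GENE_B"],
    "cell_cycle": ["GENE_C", "GENE_D"],
    "cell": ["dna_repair", "cell_cycle"],  # root system combining everything
}
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
STATE_DIM = 2  # assumed size of each system's compressed representation


def init_params(hierarchy, genes, state_dim, rng):
    """Create random (untrained) weights for every system and for the readout."""
    params = {}
    for system, children in hierarchy.items():
        # A gene contributes one input value, a subsystem contributes state_dim values.
        in_dim = sum(1 if child in genes else state_dim for child in children)
        params[system] = (0.5 * rng.normal(size=(in_dim, state_dim)), np.zeros(state_dim))
    params["_readout"] = (0.5 * rng.normal(size=(state_dim, 1)), np.zeros(1))
    return params


def forward(genotypes, params, hierarchy=HIERARCHY, genes=GENES, root="cell"):
    """Propagate variant counts upwards; return the phenotype and all system states."""
    # Leaves: one column per gene, one row per individual.
    states = {g: np.asarray(genotypes[g], dtype=float)[:, None] for g in genes}

    def activity(node):
        if node not in states:
            inputs = np.concatenate([activity(c) for c in hierarchy[node]], axis=1)
            weights, bias = params[node]
            states[node] = np.tanh(inputs @ weights + bias)
        return states[node]

    root_state = activity(root)
    weights, bias = params["_readout"]
    phenotype = (root_state @ weights + bias)[:, 0]
    return phenotype, states


rng = np.random.default_rng(0)
params = init_params(HIERARCHY, GENES, STATE_DIM, rng)
# Simulated variant counts (0, 1 or 2 copies) for five individuals.
genotypes = {g: rng.integers(0, 3, size=5) for g in GENES}
phenotype, states = forward(genotypes, params)
```

In this sketch, `phenotype` holds one predicted value per individual, and `states` keeps the activity of every intermediate system (for example `states["dna_repair"]`), so the intermediate predictions stay available for inspection. A real model would learn the weights from data rather than draw them at random.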
Additionally, this approach can be used to predict which cellular systems are affected by a given set of variants that results in a specific phenotype. For example, for an individual with increased blood pressure and variants in multiple genes, the model could be used to predict whether the DNA repair system is involved in the increased blood pressure. Such information can be used to check whether the model makes its predictions in an intuitive way and, if strong connections are found, can also be the starting point for new research.
Finally, if the model's predictions are accurate enough, it could form the basis of a tool that integrates an individual's genomic data to make prognoses for certain diseases and thus guide preventive action. For this purpose, the model would be trained to predict risks for diseases and disease-associated phenotypes.