Approved Research

Statistical Methods for Reproducible Discoveries in Genome-Wide Association Studies

Technical University of Darmstadt

Lay summary

In recent years, it has become evident that statistical learning and machine learning currently lack computationally scalable, tractable, and robust methods for large-scale, high-dimensional data. Consequently, some discoveries, for example in genomic data, may be coincidental findings that merely happen to reach statistical significance.

The aim of this 36-month project is to develop new, fast statistical learning methods for reproducible large-scale genome-wide association studies (GWAS). More specifically, the developed methods will control the false discovery rate, that is, the expected proportion of false positives among all reported discoveries, while maximizing the total number of discoveries. This leads to fewer false positives and, therefore, more reliable discoveries. By applying the newly developed statistical methods, which come with provable performance guarantees, to the UK Biobank genomics data, we will systematically evaluate the reproducibility of existing discoveries in GWAS.
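
To make the idea of false discovery rate control concrete, the following is a minimal sketch of the classical Benjamini-Hochberg procedure, a standard textbook method. It is not the method to be developed in this project; it is shown only as an illustration. The function name, the target level `alpha`, and the example p-values are assumptions chosen for this sketch.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of discoveries at target FDR level alpha.

    Illustrative only: classical Benjamini-Hochberg step-up procedure,
    which controls the FDR for independent p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of p-values, smallest first
    sorted_p = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m  # step-up thresholds i * alpha / m
    below = sorted_p <= thresholds

    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()            # largest rank meeting its threshold
        rejected[order[: k + 1]] = True           # reject all hypotheses up to that rank
    return rejected


# Hypothetical p-values, e.g. one per tested genetic variant.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selected = benjamini_hochberg(p_values, alpha=0.05)
print(selected)
```

In this sketch, a variant is reported as a discovery only if its p-value is small relative to its rank among all tested variants, which limits the expected share of false positives among the reported findings.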

The expected value of the methods to be developed, and of their application to the UK Biobank genomics data and a large variety of phenotypes, is as follows:

Providing sophisticated statistical learning tools and theory that allow faster progress in areas such as personalized medicine, drug discovery, and medical research.
Providing open-source software packages that implement the developed statistical methods and benefit practitioners in the field of genomics.
Providing a reproducibility evaluation of many existing discoveries in published GWAS (e.g., in the GWAS Catalog).

In summary, our aim is to contribute useful tools and methods that enable the discovery of valuable medical information from the UK Biobank.