Author(s):
Zhimei Ren, Yuting Wei, Emmanuel Candès
Publish date:
14 September 2021
Journal:
Journal of the American Statistical Association

Abstract

Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, which discovers true effects while rigorously controlling the number or fraction of false positives. Model-X knockoffs is a randomized procedure which relies on the one-time construction of synthetic (random) variables. This article introduces a derandomization method by aggregating the selection results across multiple runs of the knockoffs algorithm. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure to yield stable decisions without compromising statistical power. When applied to the base procedure of Janson and Su, we prove that derandomized knockoffs controls both the per family error rate (PFER) and the k family-wise error rate (k-FWER). Furthermore, we carry out extensive numerical studies demonstrating tight Type I error control and markedly enhanced power when compared with alternative variable selection algorithms. Finally, we apply our approach to multistage genome-wide association studies of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
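The aggregation idea in the abstract can be illustrated with a minimal sketch. It assumes one natural aggregation rule that the abstract does not spell out: run a randomized base selection procedure several times, record how often each feature is selected, and keep the features whose selection frequency reaches a threshold. The names `derandomize`, `toy_base_select`, `n_runs`, and `threshold` are hypothetical, and the toy base procedure is a stand-in, not a knockoffs implementation.

```python
import random
from typing import Callable, List, Set, Tuple


def derandomize(
    base_select: Callable[[random.Random], Set[int]],
    n_features: int,
    n_runs: int = 30,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[List[int], List[float]]:
    """Aggregate selections over repeated runs of a randomized procedure.

    Assumption (not from the abstract): a feature is kept when the fraction
    of runs that selected it is at least `threshold`.
    """
    rng = random.Random(seed)
    counts = [0] * n_features
    for _ in range(n_runs):
        # Each call represents one independent run of the base procedure.
        for j in base_select(rng):
            counts[j] += 1
    freqs = [c / n_runs for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= threshold]
    return stable, freqs


def toy_base_select(rng: random.Random) -> Set[int]:
    """Hypothetical stand-in for one run of a randomized selection procedure.

    Features 0 and 1 are selected in every run; feature 2 is a spurious
    selection that appears in roughly 20% of runs; feature 3 never appears.
    """
    selected = {0, 1}
    if rng.random() < 0.2:
        selected.add(2)
    return selected


stable, freqs = derandomize(toy_base_select, n_features=4, n_runs=50)
print("selection frequencies:", freqs)
print("stable selections:", stable)
```

In this toy setting, features chosen in every run survive the aggregation, while a feature that appears only occasionally because of the randomness of a single run has a low selection frequency and would typically be filtered out.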

Related projects

Our goal is to develop new data analysis methods that are well suited to discover the many genetic signals that influence traits of medical relevance.

Institution:
Stanford University, United States of America
