Last updated:
Author(s):
Bingxin Zhao, Shurong Zheng, Hongtu Zhu
Publish date:
1 June 2024
Journal:
The Annals of Statistics
PubMed ID:
39281348

Abstract

Genetic prediction holds immense promise for translating genetic discoveries into medical advances. As the high-dimensional covariance matrix (or the linkage disequilibrium (LD) pattern) of genetic variants often presents a block-diagonal structure, numerous methods account for the dependence among variants in predetermined local LD blocks. Moreover, due to privacy considerations and data protection concerns, genetic variant dependence in each LD block is typically estimated from external reference panels rather than the original training data set. This paper presents a unified analysis of blockwise and reference panel-based estimators in a high-dimensional prediction framework without sparsity restrictions. We find that, surprisingly, even when the covariance matrix has a block-diagonal structure with well-defined boundaries, blockwise estimation methods adjusting for local dependence can be substantially less accurate than methods controlling for the whole covariance matrix. Further, estimation methods built on the original training data set and external reference panels are likely to have varying performance in high dimensions, which may reflect the cost of having only access to summary level data from the training data set. This analysis is based on novel results in random matrix theory for block-diagonal covariance matrix. We numerically evaluate our results using extensive simulations and real data analysis in the UK Biobank.

Related projects

We intend to explore different imaging and genetic characteristics to work on automatic early tumor detection, benign and malignant tumor discrimination, and prediction of tumor…

Institution:
University of North Carolina at Chapel Hill, United States of America

All projects