Author(s):
Amanda Elswick Gentry, Robert M. Kirkpatrick, Roseann E. Peterson, Bradley T. Webb
Publish date:
20 July 2023
Journal:
Frontiers in Genetics
PubMed ID:
37547462

Abstract

Introduction: The availability of large-scale biobanks linking genetic data, rich phenotypes, and biological measures is a powerful opportunity for scientific discovery. However, real-world collections frequently have extensive missingness. While missing data prediction is possible, performance is significantly impaired by the block-wise missingness inherent to many biobanks. Methods: To address this, we developed Missingness Adapted Group-wise Informed Clustered (MAGIC)-LASSO, which performs hierarchical clustering of variables based on missingness, followed by sequential Group LASSO within clusters. Variables are pre-filtered for missingness and for balance between training and target sets, with final models built using stepwise inclusion of features ranked by completeness. This research has been conducted using the UK Biobank (n > 500 k) to predict unmeasured Alcohol Use Disorders Identification Test (AUDIT) scores. Results: The phenotypic correlation between measured and predicted total score was 0.67, while genetic correlations between independent subjects were high (>0.86). Discussion: Phenotypic and genetic correlations in the real data application, as well as simulations, demonstrate that the method has significant accuracy and utility for increasing power for genetic loci discovery.

Related projects

Psychiatric disorders, including major depressive disorder and alcohol use disorder, are common and contribute substantially to global morbidity. Efforts to identify genetic variants influencing risk…

Institution:
Virginia Commonwealth University, United States of America
