Author(s):
Benjamin B Chu, Seyoon Ko, Jin J Zhou, Aubrey Jensen, Hua Zhou, Janet S Sinsheimer, Kenneth Lange
Publish date:
3 April 2023
Journal:
Bioinformatics
PubMed ID:
37067496

Abstract

MOTIVATION: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association studies operate marker by marker and are computationally intensive.

RESULTS: We present a sparsity-constrained regression algorithm for multivariate genome-wide association studies based on iterative hard thresholding and implement it in a Julia package, MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA’s linear mixed models and mv-PLINK’s canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits.
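
For readers unfamiliar with iterative hard thresholding, the following is a minimal sketch of the generic technique for a single trait: a gradient step on the least-squares loss followed by a projection that keeps only the k largest coefficients. It is not the MendelIHT.jl implementation and does not reproduce the multivariate model from the paper. The function names (`hard_threshold`, `iht`), the fixed step size, and the toy data are all assumptions made for illustration.

```python
def hard_threshold(beta, k):
    """Project onto k-sparse vectors: keep the k largest-magnitude entries."""
    keep = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:k]
    out = [0.0] * len(beta)
    for j in keep:
        out[j] = beta[j]
    return out


def iht(X, y, k, iters=200):
    """Approximately minimize 0.5 * ||y - X beta||^2 with at most k nonzeros."""
    n, p = len(X), len(X[0])
    # Assumed conservative fixed step: 1 / ||X||_F^2, which never exceeds
    # 1 / lambda_max(X^T X). Practical solvers typically choose larger steps.
    step = 1.0 / sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(iters):
        # Residual of the current fit.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Gradient of the least-squares loss: -X^T (y - X beta).
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step, then projection onto the sparsity constraint.
        beta = hard_threshold([beta[j] - step * grad[j] for j in range(p)], k)
    return beta


# Hypothetical toy data: 8 samples, 4 orthogonal +/-1 predictors,
# of which predictors 0 and 2 truly affect the trait (no noise).
X = [[(-1.0) ** bin(i & j).count("1") for j in range(1, 5)] for i in range(8)]
beta_true = [2.0, 0.0, -1.5, 0.0]
y = [sum(x_ij * b_j for x_ij, b_j in zip(row, beta_true)) for row in X]

beta_hat = iht(X, y, k=2)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
```

In this toy design the predictors are orthogonal and the trait is noise-free, so the two true predictors are recovered. Real genotype matrices have correlated columns and far more variants than samples, which is where the choice of step size and of k starts to matter.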

AVAILABILITY AND IMPLEMENTATION: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.

Related projects

The aim of the proposed research is to identify genetic variants which are associated with either one circulating biomarker or with several circulating biomarkers at…

Institution:
University of Arizona, United States of America

The aim of this proposal is to develop several statistical methods and computational algorithms that address challenges of large datasets for identifying biomarkers associated with…

Institution:
University of California, Los Angeles, United States of America
