Author(s):
Mustafa İsmail Özkaraca, Mulya Agung, Pau Navarro, Albert Tenesa
Publish date:
13 March 2025
Journal:
Genetics
PubMed ID:
40080676

Abstract

Genome-wide association studies (GWAS) are computationally intensive, requiring significant time and resources, with computational complexity that scales at least linearly with sample size. Here, we present an accurate and resource-efficient pipeline for GWAS that mitigates the impact of sample size on computational demands. Our approach involves (1) randomly partitioning the cohort into equally sized sub-cohorts, (2) conducting independent GWAS within each sub-cohort, and (3) integrating the results using a novel meta-analysis technique that accounts for population structure and other confounders between sub-cohorts. Importantly, we demonstrate through simulations and real-data examples in humans that our approach effectively handles related individuals, a critical factor in real datasets, while controlling for inflated effect sizes, a phenomenon known as winner’s curse. We show that our method achieves the same discovery levels as standard approaches but with significantly reduced computational costs. Additionally, it is well-suited for incremental GWAS as new samples are added over time. Our implementation within a bioinformatics workflow management system enhances reproducibility and scalability.
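
The three steps in the abstract can be illustrated with a small sketch on simulated data for a single variant. This is not the authors' implementation: step (3) here uses a standard fixed-effect inverse-variance-weighted meta-analysis as a stand-in for the novel technique the abstract mentions, and it does not account for relatedness, population structure, or winner's curse. All function names, the simulated allele frequency, and the effect size are hypothetical choices made for illustration.

```python
import numpy as np

def split_cohort(n_samples, n_parts, rng):
    """Step 1: randomly partition sample indices into near-equal sub-cohorts."""
    return np.array_split(rng.permutation(n_samples), n_parts)

def snp_assoc(genotype, phenotype):
    """Step 2: simple linear regression of phenotype on one SNP.

    Returns the effect estimate and its standard error.
    """
    g = genotype - genotype.mean()
    y = phenotype - phenotype.mean()
    sxx = g @ g
    beta = (g @ y) / sxx
    resid = y - beta * g
    sigma2 = (resid @ resid) / (len(y) - 2)
    return beta, np.sqrt(sigma2 / sxx)

def ivw_meta(betas, ses):
    """Step 3 (stand-in): fixed-effect inverse-variance-weighted meta-analysis."""
    w = 1.0 / np.asarray(ses) ** 2
    beta = np.sum(w * np.asarray(betas)) / np.sum(w)
    return beta, np.sqrt(1.0 / np.sum(w))

# Simulated data (assumed values): 6000 unrelated samples, one SNP with
# allele frequency 0.3 and a true additive effect of 0.2.
rng = np.random.default_rng(0)
n = 6000
genotype = rng.binomial(2, 0.3, size=n).astype(float)
phenotype = 0.2 * genotype + rng.normal(size=n)

parts = split_cohort(n, 3, rng)
stats = [snp_assoc(genotype[idx], phenotype[idx]) for idx in parts]
meta_beta, meta_se = ivw_meta(*zip(*stats))

# Reference: the same regression run once on the full cohort.
full_beta, full_se = snp_assoc(genotype, phenotype)
print(f"meta-analysis: beta={meta_beta:.4f}, se={meta_se:.4f}")
print(f"full cohort:   beta={full_beta:.4f}, se={full_se:.4f}")
```

In this idealized setting the meta-analysed estimate and the full-cohort estimate should agree closely, which is the behaviour the abstract reports for discovery levels. Each sub-cohort analysis is independent, so the three regressions could run in parallel or be extended with a fourth sub-cohort as new samples arrive.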

Related projects

Genes and environmental exposures determine susceptibility to common diseases such as diabetes or cancer. The relative contribution of genes to disease risk is known as…

Institution:
University of Edinburgh, Great Britain
