Genome-wide investigation of copy number variation under selection and its phenotypic and public health implications across populations

Last updated: 16 March 2026

ID: 1072759
Start date: 2 March 2026
Project status: Current
Principal investigator: Dr Ho-Young Son
Lead institution: Seoul National University, Korea (South)

Copy number variation (CNV) is a major form of genetic diversity that influences disease susceptibility, complex traits, and population adaptation. Yet conventional CNV detection methods often fail in genomic regions containing paralogous genes or pseudogenes, where read mapping is unreliable. As a result, our understanding of CNV architecture and its relevance to human health remains incomplete.
The aim of this project is to overcome these limitations by applying computational strategies that can accurately estimate paralog- and allele-specific copy numbers at scale. Using UK Biobank’s whole genome sequencing (WGS) and SNP array data, we will:
1) Identify CNVs under selection by comparing frequency patterns across ancestry groups, highlighting loci that may have evolved under environmental or dietary pressures.
2) Characterize phenotypic associations by conducting phenome-wide CNV association analyses across a broad set of traits (anthropometric, metabolic, lifestyle, clinical outcomes).
3) Integrate SNP array data to compare CNV- and SNP-based associations and to detect SNPs in linkage disequilibrium with CNVs, thereby identifying tagging variants that connect CNVs to existing GWAS.
4) Produce outputs including sample-level paralog-specific CNV estimates, ancestry-stratified frequency maps, and SNP-CNV correlation data.
This project will provide new insights into the role of structural variation in health and disease, improve our understanding of population-specific adaptation, and create resources that will be shared with UK Biobank for use by the wider scientific community.
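
The third aim, detecting SNPs in linkage disequilibrium with CNVs, could be illustrated with a minimal sketch along the following lines. It is not the project's actual pipeline: the function names (`r_squared`, `tagging_snps`), the use of squared Pearson correlation between SNP dosages and integer copy numbers as the LD measure, the 0.8 threshold, and the toy data are all assumptions made for illustration.

```python
def r_squared(snp_dosages, cnv_copies):
    """Squared Pearson correlation between per-sample SNP dosages (0/1/2)
    and per-sample integer copy numbers. Hypothetical helper."""
    n = len(snp_dosages)
    if n != len(cnv_copies) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(snp_dosages) / n
    mean_y = sum(cnv_copies) / n
    sxx = sum((x - mean_x) ** 2 for x in snp_dosages)
    syy = sum((y - mean_y) ** 2 for y in cnv_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(snp_dosages, cnv_copies))
    if sxx == 0 or syy == 0:
        # A monomorphic SNP or an invariant CNV carries no LD information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def tagging_snps(snp_matrix, cnv_copies, threshold=0.8):
    """Return the IDs of SNPs whose r^2 with the CNV meets the threshold.
    snp_matrix maps SNP ID -> list of dosages; the 0.8 cutoff is an
    assumed convention, not a value taken from the project description."""
    return sorted(
        snp_id
        for snp_id, dosages in snp_matrix.items()
        if r_squared(dosages, cnv_copies) >= threshold
    )


# Toy data for six samples (fabricated purely to exercise the functions).
cnv = [2, 2, 3, 3, 4, 2]
snps = {
    "snp_a": [0, 0, 1, 1, 2, 0],  # tracks the copy number exactly
    "snp_b": [1, 0, 1, 0, 1, 0],  # only weakly correlated
}
tags = tagging_snps(snps, cnv)
```

In this toy case only `snp_a` passes the threshold, so it would be the candidate tagging variant linking the CNV to SNP-based GWAS signals. At biobank scale, a vectorised implementation with ancestry stratification would be needed.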