A human genome is a sequence of about 3 billion letters (A, C, G, T). Knowing which genetic variants, i.e. changes in this sequence, influence our health is crucial for understanding the underlying molecular mechanisms and, ultimately, for devising medical interventions. Today, scientific studies that statistically test which genetic variants are associated with health and disease typically focus on so-called single-nucleotide variants (SNVs). An SNV is a difference between the genomes of two individuals that affects a single letter, for instance where one person carries a T and another person carries a C. When two individual genomes are compared, they usually differ at 3-5 million such SNVs.
Beyond SNVs, human genomes also differ through more drastic changes, for instance when a whole segment of DNA is present in one individual and absent in another. Differences that affect more than 50 letters of DNA are called structural variants (SVs). While SVs are fewer in number, with typically 25,000 to 30,000 SVs per genome, they are larger in size and collectively affect more letters in an individual's genome than SNVs do. Despite this large impact on our genomes, structural variants are rarely tested directly for association with human traits, mostly because of the technical challenges in analyzing them.
Within this project, we use new data resources created by the Human Pangenome Reference Consortium (HPRC) and new computational tools created in our lab to characterize SVs in the genome sequencing data of the UK Biobank (UKBB) and to perform association tests that determine which SVs are correlated with measured biomarkers and health outcomes. This project will run for three years and, if successful, will yield many new associations between SVs and human disease. These findings can then serve as a foundation for future studies to elucidate the causal mechanisms and, potentially, to develop new treatments.