Genetic variation is a difference in the DNA sequence between individuals. There are many types of variants. The most common type affects a single building block of DNA. Another class of genetic variation, called structural variants (SVs), can affect tens to millions of consecutive building blocks. This class of variation accounts for more of the genetic difference between individuals than single-building-block changes do. Additionally, SVs can play a significant role in disease risk, as seen in neurodegenerative disorders such as ALS. Despite their importance, SVs are challenging to analyze and have been understudied. We have created a diverse, high-quality genetic dataset spanning multiple variant classes, including SVs, that can be used to study the effect of SVs on disease. This dataset was created from whole-genome sequence (WGS) data on 3,202 samples from 26 populations around the world through an international collaboration known as the 1000 Genomes Project. This dataset can be used as a reference by geneticists looking to study variants that were not directly targeted in their own study, a process called imputation. Imputation is based on the idea that a set of variants in individuals of the same ancestry can provide useful information about other variants that were not directly targeted. By filling in these variants, imputation increases the likelihood that a genetic analysis will detect an effect (e.g., a gene contributing to the risk of developing a disease) when one truly exists.
Our goal is to use the UK Biobank to study SVs in health and disease. We will first demonstrate how accurately our reference dataset can be used to impute SVs, and then study the effect these variants have on a disease or trait. For this analysis, we will use height and body mass index (BMI) data. Upon study completion, we will publicly release our reference dataset, enabling other researchers to perform SV imputation in their own studies. We will also publish a best-practices guide to SV imputation to promote robust research practices in the community. Next, we will characterize the role of SVs in cardiovascular disease; neuropsychiatric conditions, such as autism; and neurodegenerative diseases, such as Alzheimer’s. This analysis may pinpoint genes that contribute to a person’s risk of developing these diseases. We anticipate that this project will take approximately 5 years. Overall, knowledge gained from these studies has the potential to uncover new relationships between SVs and traits, reveal important biological complexities, and ultimately improve the assessment of individual disease risk.