Approved Research
Integrative analysis of inferred molecular traits and human phenotypic variation
Approved Research ID: 74519
Approval date: August 25th 2022
Lay summary
Determining the impact of inherited and acquired DNA sequence changes on human traits can help to reveal how diseases arise. The contribution of an individual change in the DNA sequence to a trait is usually subtle, and it can act either by changing the biochemical function of the gene itself or by altering the gene's level of activity. We plan to use insights from other data collections that include gene activity levels (known as gene expression) and other types of information to better understand how genomic variation may contribute to variation in human traits, with a particular focus on cancer risk. DNA sequencing makes it possible to identify not only the inherited set of DNA variations but also the acquired changes to the genome. Using sequencing data derived from the UK Biobank cohort, this project aims to test new statistical strategies to remove artefacts that arise during DNA sequencing, which can be difficult to distinguish from acquired changes to the DNA sequence. Because the changes to the genome that our cells acquire throughout our lifetime can lead to diseases (particularly cancer) and are likely to have a role in ageing, understanding how the number and pattern of such changes vary between people may provide insights that will help to predict and ultimately prevent disease. We expect that advances in understanding the association of genome-wide variation with human traits may contribute to cancer prevention by identifying those at greatest risk, and may also provide insights relevant to developing novel treatments for human diseases more generally. We anticipate that this work will take three years to complete.
Current Scope
We are interested in understanding the relationship between genomic variation (germline and somatic) and human phenotypes, particularly the physical measures, biomarkers and health outcomes (especially cancer) recorded in UKB. We plan to focus primarily on understanding and predicting phenotypic variation through the lens of molecular phenotypes. These molecular phenotypes will include those that can be predicted from genomic variants by integrating information from other data repositories (containing, for example, transcriptome data) as well as molecular phenotypes that can be inferred from sequencing data within UKB itself. The latter relates specifically to somatic mutations. Inferring patterns of somatic mutation from relatively low-depth sequencing will require the development and testing of new methods to distinguish somatic mutations from sequencing errors and other artefacts. We propose to develop statistical methods that can infer the frequency of distinct somatic mutation types at the sample level, even when individual mutations cannot be inferred with confidence. Given that somatic mutations are associated with ageing and human diseases (particularly cancer), we intend to assess the relationships among the inferred burden of classes of somatic mutations, age and disease risk. This could shed light on mechanisms leading to disease and so inform disease risk prediction.
Extension in scope
We would like to include UKB image data in our analyses, particularly cardiovascular image data. We propose to use the image data as endophenotypes to help us better understand the relationship between genotype and disease risk, particularly for age-associated conditions. This includes the genetics of innate differences in cardiovascular structures that may be relevant to the development of age-associated cardiovascular conditions. In line with the original proposal, this is likely to include integrating transcriptomic data, covering variation in gene expression predicted from genotype data as well as the gene dosage implications of copy number and truncating variants. For this and for other phenotypes, we are also specifically interested in the impact of rare and ultra-rare variants, including the development and application of methods to uncover aspects of disease aetiology through enrichment of rare variants in genomic regions of interest.