
Approved research

Integration of UK Biobank data to improve diagnostic yield of exomes for rare genetic disorders

Principal Investigator: Dr Klaus Schmitz-Abe
Approved Research ID: 52650
Approval date: August 30th 2019

Lay summary

Whole exome sequencing (WES) is increasingly used as part of the diagnostic odyssey; however, a large proportion of patients remain undiagnosed, creating a need for a systematic approach to increase the diagnostic yield. Using a cohort enrolled at the Manton Center for Orphan Disease Research at Boston Children's Hospital (BCH), our group has demonstrated a strategy for retrospective reanalysis of clinical genetic diagnostic studies (Schmitz-Abe et al., 'Unique bioinformatic approach and comprehensive reanalysis improve diagnostic yield of clinical exomes', European Journal of Human Genetics, 2019). In that study, we reported our cumulative experience from reanalyzing patients with negative clinical exome sequencing (CES) results, using a custom-built comprehensive variant detection and analysis pipeline together with updated phenotypic information, literature and databases, in collaboration with an interdisciplinary team. These advances allowed us to reach a confirmed or potential diagnosis for up to a third of previously negative CES cases, a meaningful improvement in clinical diagnostic yield achieved while minimizing costs to the hospital. Combining next-generation sequencing data with phenotypic information was crucial to enhancing diagnostic yield in our custom-built Variant Explorer Pipeline (VExP). We used 1,579 independent families (3,712 samples) and the Human Phenotype Ontology (HPO) terms for each patient to optimize the system. Increasing the number of independent families with UK Biobank data (50,000 samples) should improve the predictive capability of VExP and therefore enhance diagnostic yield in negative CES cases. The project will share our experience with the scientific community and show the importance of incorporating genetic datasets into custom pipelines to reduce the time and cost of clinical diagnosis.
In addition, it is important to continue discovering novel pathogenic mutations and expanding genotype-phenotype correlations, filling knowledge gaps in gene-disease associations. We estimate that it will take 18 to 24 months to integrate UK Biobank data, train our neural network, optimize our processes, publish our results and implement the pipeline at our hospital.