
Approved Research

Improving polygenic scores using exome and whole genome resequencing data and repeat imaging

Principal Investigator: Dr Shinichi Morishita
Approved Research ID: 171511
Approval date: April 4th 2024

Lay summary

Polygenic scores (PGSs), which aggregate the effects of many genetic variants, are widely studied for use in phenotypic prediction, disease diagnosis, and treatment. Most PGS methods consider only additive effects, even though non-additive genetic effects, including those under recessive, dominant, over-recessive, and over-dominant inheritance models, are non-negligible. Over the last several years, using UK Biobank genotyping data, we have developed a method named GenoBoost that jointly estimates additive and non-additive genetic effects from individual-level Axiom Array genotyping data. Applied to twelve phenotypes in UK Biobank, GenoBoost was competitive across all traits, ranking first for four phenotypes and second for three when compared with seven widely used methods. Our results demonstrate that non-additive models can improve risk prediction for polygenic diseases.
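To make the additive versus non-additive distinction concrete, the following Python sketch encodes each variant's genotype (0, 1 or 2 copies of the effect allele) under a few common inheritance models and sums the weighted results into a score. It is a toy illustration under stated assumptions, not GenoBoost: the encodings, the function name `polygenic_score`, and the example weights are hypothetical, and the over-recessive model is omitted because its exact encoding is not given here.

```python
from typing import Callable, Dict, List

# Genotype g = number of copies of the effect allele (0, 1 or 2).
# Hypothetical encodings for illustration; GenoBoost's own parameterization may differ.
ENCODINGS: Dict[str, Callable[[int], float]] = {
    "additive": lambda g: float(g),                    # effect grows with each copy
    "dominant": lambda g: 1.0 if g >= 1 else 0.0,      # one copy is enough
    "recessive": lambda g: 1.0 if g == 2 else 0.0,     # two copies are required
    "overdominant": lambda g: 1.0 if g == 1 else 0.0,  # heterozygote-specific effect (assumed encoding)
}


def polygenic_score(genotypes: List[int], weights: List[float], models: List[str]) -> float:
    """Toy PGS: sum over variants of weight * encoded genotype."""
    if not (len(genotypes) == len(weights) == len(models)):
        raise ValueError("genotypes, weights and models must have the same length")
    return sum(w * ENCODINGS[m](g) for g, w, m in zip(genotypes, weights, models))
```

For example, `polygenic_score([2, 1, 0], [0.5, -0.3, 0.2], ["additive", "recessive", "dominant"])` returns 1.0: only the first variant contributes, because the heterozygous second variant has no effect under the recessive encoding and the third variant carries no copy of the effect allele. Scoring the same genotypes with all-additive encodings gives about 0.7 instead, which is how a choice of inheritance model changes an individual's score.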

In addition to Axiom Array genotyping data, we have designed GenoBoost to also incorporate single nucleotide variants (SNVs) identified from the exome and whole genome resequencing data that UK Biobank has collected from approximately 500,000 individuals.

We will also develop an approach to improve PGSs by using repeat imaging to capture time-series changes that reflect disease progression. Given the large volume of data in UK Biobank, it may also be possible to use genetic information to predict a follow-up image from the first image and thereby assess disease progression in each individual.

Our research question is therefore whether incorporating exome and whole-genome resequencing data, together with repeat imaging features, into GenoBoost and other PGS methods can further improve prediction accuracy. It remains to be shown whether our method is effective when applied to exome and whole-genome resequencing data from approximately 500,000 individuals. The number of SNVs detectable in exome and whole-genome resequencing data is orders of magnitude larger than the number in Axiom Array genotyping data, while the number of individuals stays fixed. This imbalance can cause our boosting method to overfit the resequencing data. To avoid this, it is essential to develop methods that select relevant SNVs from exome and whole-genome resequencing data. We plan to address these problems over a three-year project.
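One simple, generic way to shrink the set of candidate SNVs before model fitting is marginal screening: rank each SNV by the absolute correlation between its genotype and the phenotype, then keep the top k. The sketch below illustrates only that baseline under stated assumptions; it is not the selection method this project proposes to develop, and the names `pearson_r`, `select_top_snvs`, and the parameter `k` are hypothetical.

```python
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_snvs(genotype_columns: List[List[int]], phenotype: List[float], k: int) -> List[int]:
    """Hypothetical marginal screen: indices of the k SNVs most correlated with the phenotype.

    genotype_columns[j][i] is the allele count (0, 1 or 2) of SNV j in individual i.
    """
    scores = [abs(pearson_r(col, phenotype)) for col in genotype_columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

Marginal screening ignores linkage disequilibrium and interactions between variants, so in practice it would serve only as a starting point before more careful selection.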

Because our method has already been shown to improve PGSs over conventional methods when applied to Axiom Array genotyping data, we expect that applying our approach to exome and whole-genome resequencing data will improve PGSs further and thereby contribute to future disease diagnosis and treatment.