Approved Research

Genetic and environmental analysis for disease prediction models

Principal Investigator: Professor Bermseok Oh
Approved Research ID: 83990
Approval date: June 7th 2022

Lay summary

This study aims to develop models that predict disease or health indices from individual genetic and environmental data. To this end, we will 1) investigate the distribution of health variables based on the genome-wide polygenic risk score (GPS), 2) characterize subjects whose observed measures deviate from the estimates predicted by the GPS model, 3) determine additional genetic and environmental factors, and 4) construct models based on these variables using both conventional and machine learning-based methods.
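As an illustration of the first two steps, the sketch below computes a polygenic score as a weighted sum of allele dosages, fits a straight line of a health measure on that score, and flags subjects whose observed value lies far from the prediction. It is a minimal sketch on simulated data, not the study's pipeline: the function names, the 2-standard-deviation cutoff, the simulated effect sizes and the BMI-like phenotype are all assumptions made for the example.

```python
import random
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) across SNPs."""
    return sum(d * w for d, w in zip(dosages, weights))


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def flag_deviating(x, y, k=2.0):
    """Indices of subjects whose residual exceeds k residual SDs (k is an assumed cutoff)."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


# Simulated data only: effect sizes, genotypes and phenotype are hypothetical.
rng = random.Random(0)
n_subjects, n_snps = 500, 200
weights = [rng.gauss(0, 0.1) for _ in range(n_snps)]
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)]
             for _ in range(n_subjects)]
scores = [polygenic_score(g, weights) for g in genotypes]
bmi = [25 + s + rng.gauss(0, 1.5) for s in scores]

deviating = flag_deviating(scores, bmi)
print(f"{len(deviating)} of {n_subjects} simulated subjects deviate by more than 2 SD")
```

In practice the weights would come from genome-wide association results and the model would include covariates; the flagged subjects are the ones for whom additional genetic or environmental factors would then be examined.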

Although hundreds of thousands of genetic signals have been identified in the last two decades, the accuracy of prediction models for diseases and health indices remains far below what clinical practice requires. Our previous study on obesity found that 1) models using genome-wide SNPs (GPS) were considerably more accurate than models using the limited number of genome-wide significant SNPs (GRS), 2) many subjects have BMI values that deviate greatly from the values predicted by the GPS model, and 3) this deviation tends to be larger in the higher GPS groups. In addition, recent publications suggest that rare genetic variants are a major source of the low prediction accuracy. Therefore, we hypothesize that the low accuracy of prediction models with GPS is attributable to the following causes: 1) additional genetic factors such as rare variants affect the phenotypes in addition to GPS, 2) heteroscedasticity increases the residuals of the model, and 3) gene-environment interaction modifies the genetic effect on the phenotype. This study will take approximately three years.
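The heteroscedasticity mentioned above can be made concrete by comparing the spread of model residuals across GPS groups. The following is a minimal sketch on simulated data, assuming a phenotype whose noise grows with the score; the function names, the five-group split and the simulation parameters are hypothetical choices for illustration and are not taken from the study.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_sd_by_group(scores, residuals, n_groups=5):
    """Residual SD within equal-sized score groups, lowest-score group first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_groups
    sds = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = (g + 1) * size if g < n_groups - 1 else len(order)
        sds.append(statistics.pstdev([residuals[i] for i in order[start:end]]))
    return sds


# Simulated data only: the noise SD is assumed to increase with the score.
rng = random.Random(1)
gps = [rng.gauss(0, 1) for _ in range(2000)]
bmi = [25 + 2 * s + rng.gauss(0, math.exp(0.4 * s)) for s in gps]

a, b = fit_line(gps, bmi)
residuals = [y - (a + b * s) for s, y in zip(gps, bmi)]
sds = residual_sd_by_group(gps, residuals)
print("residual SD by GPS quintile:", [round(sd, 2) for sd in sds])
```

A residual spread that rises from the lowest to the highest group is the pattern described in point 3) of the earlier obesity findings. A formal analysis would use a statistical test for unequal variance and would also model gene-environment interaction terms, which this sketch leaves out.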

This study will improve public health in the following ways. A good prediction model can identify genetically high-risk subjects who may benefit from a prevention program. As a result, the prevention program becomes more effective at decreasing disease incidence and reduces the social cost of disease. Further, subjects at high genetic risk may pay more attention to their health and try to change their lifestyle and environment. Because the prediction model uses lifestyle and environmental risk factors in addition to genetic risk factors, it may provide high-risk individuals with personalized recommendations on how to manage their health with respect to lifestyle and environment. By advancing prediction modelling, we ultimately hope to find ways to manage diseases effectively through personalized prevention, treatment and medication.