Identifying germline genetic variants, environmental risk factors, and gene-by-environment interactions associated with cancer risk using non-linear machine learning models
Cancers are complex diseases caused by gene mutations, which are induced by both environmental factors and an individual's genetic background. Discovering the environmental and genetic risk factors that contribute to cancer development will help us understand the mechanisms of cancer and inform preventive measures. So far, hundreds of genetic mutations and a few environmental factors have been found to be associated with cancer risk under the assumption of a linear relationship. However, few studies have examined non-linear relationships between cancer and its risk factors, or higher-order interactions between genetic and environmental risk factors, because efficient models have been lacking.
In this research project, we aim to build machine learning models to study non-linear associations between cancers and their risk factors across the whole genome. These risk factors include environmental factors such as lifestyle, genetic mutations, and genotype-by-environment interactions. We will identify the genetic mutations, environmental factors, and biomarker features that best predict cancer risk. We will also compare our results with those of linear models to show how the two approaches differ in describing the relationships among genotype, environment, and phenotype.
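To illustrate why a comparison with linear models can matter, the following minimal sketch shows a case where a purely additive (main-effects) model misses a genotype-by-environment interaction entirely. It is not the project's method. The risk values, the balanced 2x2 design, and all names (TRUE_RISK, additive_fit, interaction_contrast) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical risk table: P(cancer | genotype g, exposure e).
# These values are made up for illustration, not taken from any study.
TRUE_RISK = {
    (0, 0): 0.05,  # non-carrier, unexposed
    (0, 1): 0.15,  # non-carrier, exposed
    (1, 0): 0.15,  # carrier, unexposed
    (1, 1): 0.05,  # carrier, exposed
}

# Assumption: carriers and exposed individuals each make up half of the
# population and are independent, so all four cells carry equal weight.
CELLS = list(product((0, 1), repeat=2))


def mean_risk(cells):
    """Average true risk over a collection of equally weighted cells."""
    cells = list(cells)
    return sum(TRUE_RISK[c] for c in cells) / len(cells)


def additive_fit():
    """Least-squares fit of risk = b0 + bg*g + be*e on the balanced table.

    In a balanced 2x2 design g and e are uncorrelated, so each slope is
    simply the difference between the two marginal mean risks.
    """
    bg = mean_risk(c for c in CELLS if c[0] == 1) - mean_risk(
        c for c in CELLS if c[0] == 0
    )
    be = mean_risk(c for c in CELLS if c[1] == 1) - mean_risk(
        c for c in CELLS if c[1] == 0
    )
    b0 = mean_risk(CELLS) - 0.5 * bg - 0.5 * be
    return b0, bg, be


def additive_prediction(g, e):
    """Risk predicted by the additive (main-effects-only) model."""
    b0, bg, be = additive_fit()
    return b0 + bg * g + be * e


def interaction_contrast():
    """Difference of differences; zero when effects are purely additive."""
    r = TRUE_RISK
    return r[(1, 1)] - r[(1, 0)] - r[(0, 1)] + r[(0, 0)]


for g, e in CELLS:
    print(
        f"g={g} e={e} true={TRUE_RISK[(g, e)]:.2f} "
        f"additive={additive_prediction(g, e):.2f}"
    )
print(f"interaction contrast = {interaction_contrast():.2f}")
```

In this toy table neither genotype nor exposure changes the average risk on its own, so the additive model assigns every group the same risk. A model that can represent the interaction, which is what the non-linear models in this project are intended to do, could recover the group-specific risks.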
If the project is successful, we expect our findings to complement the results of previous linear models. Our non-linear machine learning models may identify novel cancer risk factors and flag individuals at high risk of cancer whom linear models would miss.
A person predicted to be at high risk of cancer can take preventive measures, such as controlling certain baseline indices or changing their lifestyle, to lower the chance of developing cancer in the future.
We estimate that this study will take two full-time staff up to two years.