Identifying potential gene-by-environment interactions for lung cancer based on genome-wide single nucleotide polymorphisms (SNPs) using newly developed robust inference methods
Our research aims to promote newly developed, flexible statistical methods for analyzing lung cancer data to discover additive gene-by-environment interactions affecting the disease, where interaction is measured as a difference of disease risks. Existing methods for additive gene-by-environment interaction assume very specific genetic models. Our methods are more flexible in the sense that they give more reliable results when the true genetic model is unknown. We further show that, when gene and environment are assumed to be independent in the population, our method can better detect a truly existing interaction. The project is currently estimated to continue for three years. Through this project, we aim to extend our understanding of the relative importance of various environmental factors in the development of lung cancer. The environmental variables that we are interested in are given below.
- Addictions
- Alcohol
- Alcohol use
- Alcoholic beverages yesterday
- Anxiety
- Arterial stiffness
- Body size measures
- Bone size, mineral and density by DXA
- Bone-densitometry of heel
- Brain MRI
- Bread/pasta/rice yesterday
- Breathing
- Broad WGS pilot
- COPD outcomes
- Cancer register
- Cancer screening
- Dementia outcomes
- Depression
- Diet
- Diet questionnaire performance
- Early life factors
- Education
- Ethnicity
- Family history
- Female-specific factors
- Medical conditions
- Medical information
- Medication
- Medications
- Mental distress
- Mental health
- Residential air pollution
- Residential noise pollution
Our primary research question is: Are our newly developed robust inference methods for additive interaction able to uncover and validate hitherto undiscovered gene-by-environment interactions for lung cancer?
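To make "interaction measured using a difference of disease risks" concrete, the following minimal sketch computes the usual additive interaction contrast for a binary genetic factor and a binary exposure. It only illustrates the quantity of interest and is not the robust inference method of this project; the binary coding of the genotype, the function names, and the counts are all assumptions made for illustration.

```python
def risks_from_counts(counts):
    """Convert counts[(g, e)] = (cases, total) into observed risks per cell."""
    return {cell: cases / total for cell, (cases, total) in counts.items()}


def interaction_contrast(risk):
    """Additive interaction on the risk-difference scale.

    risk[(g, e)] is the disease risk for genetic factor g and exposure e,
    each coded 0/1 (an illustrative simplification). A value of zero means
    the joint effect equals the sum of the two separate risk differences.
    """
    return risk[(1, 1)] - risk[(1, 0)] - risk[(0, 1)] + risk[(0, 0)]


# Hypothetical counts for illustration only (not study data).
counts = {
    (0, 0): (10, 1000),  # non-carrier, unexposed
    (0, 1): (30, 1000),  # non-carrier, exposed
    (1, 0): (20, 1000),  # carrier, unexposed
    (1, 1): (60, 1000),  # carrier, exposed
}
risks = risks_from_counts(counts)
ic = interaction_contrast(risks)
print(f"interaction contrast = {ic:.3f}")
```

A positive contrast indicates that the risk among exposed carriers exceeds what the two separate risk differences would add up to.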
Our research aims are as follows.
1) Applying and promoting newly developed robust inference methods to identify gene-by-environment interactions using genome-wide single nucleotide polymorphisms (SNPs), and demonstrating the efficiency and flexibility of these methods.
2) Uncovering novel potential causal variants whose associations with lung cancer are moderated by one or more environmental risk factors (such as smoking or pollution), where the genetic effect can differ across levels of the environmental factor(s).
3) Furthering our understanding of the nature and pattern of gene-by-environment interaction for subsequent primary lung cancer, which might motivate more nuanced methodological development for such analyses in the future.
Extended scope: Three additional aims relevant to the scope of the current project.
1) Establishing environmental and genetic factors associated with the risk of subsequent primary lung cancer (SPLC) and validating a previously developed risk-prediction model for SPLC among lung cancer patients using the UK Biobank data
- Method: Cause-specific Cox regression will be conducted to establish the factors associated with SPLC risk. We will further validate the SPLC prediction model by applying it to the UK Biobank data and assessing predictive performance metrics including calibration, discrimination and predictive accuracy (Brier score).
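As a rough illustration of the three kinds of performance metric named above, the sketch below computes a Brier score, a concordance (c) statistic for discrimination, and an observed-to-expected ratio as a simple calibration summary. It assumes a binary outcome at a fixed time horizon and ignores censoring and competing risks, which a real SPLC validation would need to handle; the function names and the toy data are hypothetical.

```python
def brier_score(y, p):
    """Mean squared difference between predicted risk p and outcome y (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Share of (event, non-event) pairs where the event has the higher
    predicted risk; ties count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def observed_to_expected(y, p):
    """Calibration-in-the-large: observed events over the sum of predicted risks."""
    return sum(y) / sum(p)


# Hypothetical outcomes and predicted risks, for illustration only.
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
print(brier_score(y, p), c_statistic(y, p), observed_to_expected(y, p))
```

Lower Brier scores indicate better accuracy, a c statistic closer to 1 indicates better discrimination, and an observed-to-expected ratio near 1 indicates good overall calibration.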
2) Examining the effect of smoking cessation on risk of SPLC using biomarker and genetic data
- Method: Smoking is one of the environmental factors associated with SPLC risk. Using the repeated measures of smoking data and smoking-related metabolomics in the UK Biobank data, we will examine the reduction in SPLC risk following smoking cessation, based on a cause-specific Cox model to account for the high competing risks among lung cancer patients.
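The defining step of a cause-specific analysis is how events are coded: only the event of interest counts as an event, and follow-up ending in a competing event is treated as censored at that time. The sketch below shows this recoding and a crude cause-specific rate; it is not a Cox model fit, and the record layout, event labels, and follow-up times are hypothetical.

```python
def cause_specific_events(records, cause):
    """Recode (time, event_type) records for a cause-specific analysis.

    event_type is None for a censored record. Any event other than `cause`
    is treated as censoring at its event time.
    """
    return [(time, 1 if event == cause else 0) for time, event in records]


def crude_rate(coded):
    """Events of interest divided by total follow-up time."""
    total_time = sum(time for time, _ in coded)
    n_events = sum(indicator for _, indicator in coded)
    return n_events / total_time


# Hypothetical follow-up in years, for illustration only.
records = [
    (2.0, "SPLC"),
    (3.5, "death"),  # competing event: censored for the SPLC analysis
    (5.0, None),     # administratively censored
    (1.5, "SPLC"),
    (4.0, "death"),
]
coded = cause_specific_events(records, cause="SPLC")
rate = crude_rate(coded)
print(coded, rate)
```

The recoded (time, indicator) pairs are the form that a standard Cox regression routine would take as input for the SPLC-specific hazard.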
3) Extending the lung cancer cohort to breast cancer patients to examine SPLC outcome
- Method: Based on the pipeline we developed to construct the lung cancer cohort for the SPLC outcome, we will construct a breast cancer cohort to estimate SPLC risk and the associated genomic and environmental factors.