Scientific Rationale
Understanding the interplay between genomic, lifestyle, and environmental factors is crucial for distinguishing health maintenance from disease progression. Our group previously established a Genomic Information Management System (GIMS) and identified genetic factors for chronic diseases (e.g., kidney disease) using the Korean Genome and Epidemiology Study (KoGES). However, validating these findings and discovering robust biomarkers requires a larger-scale, multi-ethnic analysis. We hypothesize that distinct clinical-genomic patterns explain the transition from health to disease, and that these patterns can be uncovered by analyzing large-scale biobank data.
Research Objectives
Our primary objective is to discover explanatory factors for health maintenance and disease mechanisms using machine learning on large-scale biobank data.
Specific aims include:
Integrate UK Biobank (n≈488,000) data with the KoGES cohort (n=8,840) and public omics databases (TCGA, NCBI GEO, ClinicalTrials.gov).
Utilize machine learning algorithms (e.g., Random Forest, Decision Trees) via R and Python to identify key features from clinical, genomic (genotype, DNA methylation), lifestyle, and environmental data.
Perform comparative analyses (Healthy vs. Disease, Survival vs. Mortality, Recurrence vs. Cure) to isolate variables significantly associated with health outcomes.
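As an illustration of the comparative analyses described above, the following minimal sketch tests whether a variable differs between a healthy group and a disease group. It uses a permutation test on group means as a simple stand-in, not the Random Forest or Decision Tree models named in the aims. The data are synthetic, and the variable names (egfr, sleep_hours), group sizes, and distribution parameters are assumptions chosen only for illustration; they are not drawn from UK Biobank or KoGES.

```python
import random
from statistics import mean


def permutation_p_value(healthy, disease, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(healthy) - mean(disease))
    pooled = list(healthy) + list(disease)
    n_healthy = len(healthy)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_healthy]) - mean(pooled[n_healthy:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


def make_synthetic_cohort(n_per_group=200, seed=42):
    """Build a toy cohort (hypothetical variables, synthetic values)."""
    rng = random.Random(seed)
    return {
        "healthy": {
            # Assumed to differ between groups in this toy example.
            "egfr": [rng.gauss(90, 15) for _ in range(n_per_group)],
            # Assumed to have the same distribution in both groups.
            "sleep_hours": [rng.gauss(7, 1) for _ in range(n_per_group)],
        },
        "disease": {
            "egfr": [rng.gauss(70, 15) for _ in range(n_per_group)],
            "sleep_hours": [rng.gauss(7, 1) for _ in range(n_per_group)],
        },
    }


def rank_variables(cohort, n_permutations=2000):
    """Return (variable, p-value) pairs sorted by ascending p-value."""
    results = []
    for name in cohort["healthy"]:
        p = permutation_p_value(
            cohort["healthy"][name], cohort["disease"][name], n_permutations
        )
        results.append((name, p))
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    for name, p in rank_variables(make_synthetic_cohort()):
        print(f"{name}: p = {p:.4f}")
```

In a real analysis the same comparison would be repeated for the other contrasts (Survival vs. Mortality, Recurrence vs. Cure), with correction for multiple testing and adjustment for covariates such as age, sex, and ancestry.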
Research Questions
What distinct clinical-genomic patterns differentiate healthy individuals from those with specific diseases across different populations?
Can we identify universal biomarkers for health maintenance by cross-analyzing Korean (KoGES) and British (UK Biobank) datasets?
How do lifestyle and environmental factors interact with genetic backgrounds to influence disease prognosis, survival, and recurrence?