Approved Research
Causal inference for complex data structure with Mendelian randomization
Lay summary
Data-driven biological research has led to numerous breakthroughs in the life sciences and medicine, including helping deliver new treatments for complex diseases and deepening our understanding of biological evolution and disease etiologies. Such research calls for better statistical methods for analyzing biological datasets, to ensure that the knowledge drawn from data analyses is not distorted by violations of the assumptions made or by biases in the data collection process. Good methods should also leverage as much information as possible, including information from other data collection centers, other data modalities, and existing biological knowledge about the system under study. The more information available to us, the less likely we are to make a mistake, provided that we have good methods for using that information.
In this project, we aim to partially fulfill this goal by developing rigorous methods for drawing causal conclusions from UK Biobank data, using datasets from multiple centers, multiple modalities, and multiple time points. We expect that our new methods can leverage more information than existing ones, including datasets and abstract information from databases such as KEGG and Gene Ontology. Our goal is to ground the statistical methodology in rigorous statistical theory, in order to offer biologists convincing evidence from statistical analyses with statistical errors as small as possible. In the long run, we believe that results from such rigorous statistical methods will be more trustworthy. Such results should also make it easier for our "consumers", such as decision-makers and physicians, to rely on the new knowledge when facing important public health questions.