A Data Mining-based Workbench: Advancing Precision Medicine by the Use of Machine Learning and Expert Knowledge

University Medical Center

Lay summary

Precision medicine is a form of healthcare in which disease prevention and treatment are tailored to the individual patient. In addition to environmental factors and lifestyle, genetic variation is taken into account. This proposal aims to simulate the potential effects of pharmacological and lifestyle interventions in an individual person. Much of what we know about genomic risk factors comes from statistical measures of association in large-scale population studies. The conceptual and practical disconnect between the populations we study and the individuals we want to treat is a major research topic. The primary goal of this proposal is to develop a machine learning-based methodology that facilitates precision medicine for patients with cardiovascular disease (CVD) by connecting population-level and individual-level genomic phenomena. We aim to develop a so-called data mining-based workbench, which will allow clinicians to carry out thought experiments about the treatment of individual patients using models of CVD risk derived from population-level studies. This will help clinicians understand how these risk factors might be useful for the diagnosis and treatment of an individual, accelerating the translation of genomic findings into the clinic.

The proposed APM-GDM is based on representation learning, which means it can be fed raw data and automatically extract the representations needed for prediction. An ensemble method or a deep learning (DL) network can provide representations at different levels. In a neural network, for example, the output of each hidden layer is considered the representation at that level. The higher the layer, the more abstract the representation of the data. In a range of studies, such higher-level representations of raw data have proved very effective for classification and detection problems.
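The idea that each hidden layer yields a representation at its own level can be illustrated with a minimal sketch. It is not the APM-GDM itself: the layer sizes, the tanh activation, the random (untrained) weights, and the function names dense, random_layer and layer_representations are all assumptions made for illustration. In a trained network the weights would be learned from data.

```python
import math
import random


def dense(inputs, weights, biases):
    """Apply one fully connected layer followed by a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Create illustrative, untrained weights and biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def layer_representations(raw_features, layers):
    """Return the output of every hidden layer, lowest level first."""
    representations = []
    current = raw_features
    for weights, biases in layers:
        current = dense(current, weights, biases)
        representations.append(current)
    return representations


rng = random.Random(0)
layer_sizes = [6, 4, 2]  # hypothetical: 6 raw features, then 4 and 2 hidden units
layers = [
    random_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

raw = [0.2, -1.0, 0.5, 0.0, 1.3, -0.7]  # made-up raw feature vector
reps = layer_representations(raw, layers)
for level, rep in enumerate(reps, start=1):
    print(f"level {level}: {len(rep)} values")
```

Each entry of reps is the representation at one level. A downstream classifier could take the last, most abstract entry as its input instead of the raw features.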

The project will run for three years and intends to include as many individuals as are available, in order to meet the data requirements of statistical and machine learning methods.