Principal Investigator: Dr Manuel Rivas
Department: Biomedical Data Science
Stanford University, Biomedical Data Science, 365 Lausen Street, Third Floor Littlefield, Stanford CA 94305, United States
Tags: 24983, genetics, High-dimensional methods, Learning, Therapeutics
Lead Collaborators: 1) Dr Matti Pirinen
Collaborating Institutions and Addresses:
- University of Helsinki, Institute for Molecular Medicine Finland, Human Genetics, Biomedicum Helsinki 2U, Tukholmankatu 8, Helsinki 00014, Finland
Internally funded by Stanford University’s start-up fund
1a: This proposal seeks access to UK Biobank data to support efforts to generate effective therapeutic hypotheses from genomic and hospital in-patient data. We have developed novel statistical methods to assess the impact of genetic variation across a broad range of disease outcomes. We plan to take advantage of the tree structure of the ICD-10 codes to improve inference. By doing so, we hope to prioritize genetic effects that are consistent with a protective profile. This will result in a set of therapeutic hypotheses that academics, pharmaceutical companies, and the public may be able to pursue.
1b: The research we plan is in agreement with the stated aim of UK Biobank “research intended to improve the prevention, diagnosis and treatment of illness and the promotion of health throughout society”.
By communicating to the public the set of therapeutic hypotheses that we generate from UK Biobank data, we hope to accelerate interest in drug development based on these insights.
1c: We will combine assessments of genetic associations with the tree structure of ICD-10 codes and apply new statistical learning techniques to the summary data.
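As an illustration only, the idea of exploiting the ICD-10 hierarchy can be sketched in Python by rolling diagnoses up from specific codes to their parent categories, so that an association can be examined at more than one level of the tree. This is a minimal sketch and not the statistical method proposed here: it assumes that a code's parent is simply its shorter prefix (the real ICD-10 hierarchy also has blocks and chapters above the three-character category), and the function names and participant identifiers are hypothetical.

```python
from collections import defaultdict


def icd10_ancestors(code):
    """Return the code and its coarser prefixes, most specific first.

    Assumption: the parent of a code is its prefix, down to the
    three-character category (e.g. 'I25.1' -> ['I251', 'I25']).
    """
    code = code.replace(".", "").upper()
    return [code[:n] for n in range(len(code), 2, -1)]


def roll_up(diagnoses):
    """Map each ICD-10 node to the participants diagnosed at or below it.

    `diagnoses` maps a (hypothetical) participant ID to a list of codes.
    """
    cases = defaultdict(set)
    for participant, codes in diagnoses.items():
        for code in codes:
            for node in icd10_ancestors(code):
                cases[node].add(participant)
    return dict(cases)


# Toy input, invented for illustration; not UK Biobank data.
toy = {"p1": ["I25.1"], "p2": ["I25.9"], "p3": ["E11"]}
cases_by_node = roll_up(toy)
```

In this toy input, the parent category I25 collects both p1 and p2, while each child code keeps only its own participant. Case counts at a parent node are therefore never smaller than at any of its children, which is the property that sharing information along the tree relies on.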
A special class of genetic variants that we will focus on is protein-truncating variants (PTVs), commonly referred to as loss-of-function variants. Scanning for protective PTVs has been a successful strategy. These protective genetic variants reveal a process that is safe (naturally occurs in healthy adults) and effective (proven to reduce risk of disease).
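A scan for protective variants could, in its simplest form, flag variants whose estimated effect on disease risk is confidently below zero on the log odds ratio scale. The sketch below is an assumption-laden illustration, not the prioritization method of this proposal: it assumes per-variant summary statistics with a log odds ratio (`beta`) and standard error (`se`), uses an unadjusted 95% interval with no multiple-testing correction, and all variant names and numbers are invented.

```python
def is_protective(beta, se, z=1.96):
    """True if the upper bound of the confidence interval for the
    log odds ratio lies below zero (odds ratio confidently below 1)."""
    return beta + z * se < 0.0


def protective_variants(summary_stats, z=1.96):
    """Return the IDs of variants flagged as putatively protective.

    `summary_stats` is a list of dicts with hypothetical keys
    'variant', 'beta' (log odds ratio) and 'se' (standard error).
    """
    return [
        row["variant"]
        for row in summary_stats
        if is_protective(row["beta"], row["se"], z)
    ]


# Invented summary statistics, for illustration only.
toy_stats = [
    {"variant": "ptv_A", "beta": -0.50, "se": 0.10},  # interval entirely below 0
    {"variant": "ptv_B", "beta": -0.10, "se": 0.10},  # interval crosses 0
    {"variant": "ptv_C", "beta": 0.30, "se": 0.05},   # risk-increasing
]
flagged = protective_variants(toy_stats)
```

Only `ptv_A` is flagged here, because its upper bound is -0.50 + 1.96 × 0.10 = -0.304. A real analysis would need to account for the number of variants and outcomes tested.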
1d: The full cohort.
We will use deep learning techniques to derive new features from the bulk field and assess how they are related to genetic variants that we prioritize as putative protective alleles or genetic variants that modify disease risk.