Approved Research

Assessment and characterization of protein-protein interaction affinity networks from whole exomes

Columbia University, New York

Lay summary

Proteins are the molecules of action encoded by genetic sequences, and they dictate a large portion of our biology. These proteins interact continuously in our bodies to carry out a myriad of biological functions. We plan to characterize natural variation in protein interactions in humans to improve our understanding of the normal range of interaction dynamics. This will help determine when differences arising from mutations meaningfully deviate from normal biology, and how they affect drug efficacy, disease predisposition, and biological phenotypes. However, measuring how strongly proteins interact (their interaction affinity) is not feasible experimentally, because there is a very large number of protein pairs, they can interact in multiple ways, and their interactions vary between individuals. Recently, computational tools have been developed that use the amino acid sequences of proteins (their building blocks) to predict their affinity for interacting. This new technology allows us to computationally quantify interaction affinities across the UK Biobank cohort and improve our understanding of normal variation. That understanding can then be applied to identify meaningful differences in interactions between sub-populations, such as those at higher risk of disease onset or adverse clinical outcomes. Given the complicated nature of these protein interactions, both traditional statistical and contemporary machine learning approaches will be used to determine the relationship between interaction network topology and sub-populations. These analytical pipelines would allow scientists, clinicians, and other healthcare professionals to incorporate a protein interaction perspective into their decision making, with the ultimate goal of improving biological understanding. As this project is computational in nature, its duration depends on the scope of the analytical targets chosen. Targets will consist of biologically and clinically relevant phenotypes of interest, such as height, lifespan, and cancer diagnosis.
Regardless of the analytical target, developing an efficient pipeline capable of quantifying interaction variation across the entire UK Biobank cohort would likely take approximately 2 years. An additional year would be needed for post-hoc analysis to determine what meaningful conclusions can be drawn and what their consequences are, and in some cases for experimental follow-up in cell lines. Including multiple phenotypic targets would lengthen the timeline but would also increase the benefit of our work; as such, a subset of meaningful targets will be chosen on the basis of prevalence, impact, and pre-existing knowledge.