PheMAP: Measured, Automated Profile to Facilitate High Throughput Phenotyping

Last updated:: 12 October 2025

ID:: 158652
Start date:: 12 October 2025
Project status:: Current
Principal investigator:: Dr Wei-Qi Wei
Lead institution:: Vanderbilt University Medical Center, United States of America

Electronic health records (EHRs) are a powerful and efficient tool for biological discovery worldwide. A vital step in EHR-based research, however, is valid, accurate, and reliable phenotyping (i.e., correctly identifying individuals with a particular trait of interest). Conventional approaches to phenotyping are ad hoc, dependent on domain experts, rule-based, and usually specific to a single EHR environment. Each requires an extensive investment of time and resources to develop because EHRs are heterogeneous, complex, inaccurate, and frequently fragmented. The lack of general, automatic, and portable approaches to accurate high-throughput phenotyping is a critical barrier that hampers our ability to leverage valuable clinical data in EHRs for better healthcare. We propose a new generalizable high-throughput approach, Phenotyping by Measured, Automated Profile (PheMAP), which we have developed from public resources and will further refine and implement across various EHRs. We recognize that a large amount of information about phenotypes is described in significant detail and continually accumulates within publicly available resources. We hypothesize that this information can be retrieved, filtered, organized, measured, and formalized into standard EHR phenotype profiles. Indeed, we have used such an ensemble approach to integrate four generalizable online medication resources to create MEDI, a resource linking medications and indications. In preliminary studies, we extended this strategy to phenotyping and created a prototype PheMAP. For each phenotype, we identified relevant clinical concepts and weighted each based on its importance to the phenotype. We then mapped all associated concepts to commonly used clinical terminologies. Our preliminary studies showed an average consistency of 98.6% ± 0.8% between our early-stage PheMAP and three validated eMERGE phenotype algorithms.
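As a rough illustration of the concept weighting described above, a phenotype profile can be thought of as a table of clinical concepts with weights, and a patient's score as the sum of weights for the concepts found in that patient's record. The sketch below is only an assumption about how such a profile might be applied; the concept identifiers, weights, and function names are hypothetical placeholders and are not taken from PheMAP itself.

```python
from typing import Dict, Iterable

# Hypothetical profile: concept identifier -> weight.
# These identifiers and weights are illustrative placeholders only;
# they are not real PheMAP values or real terminology codes.
EXAMPLE_PROFILE: Dict[str, float] = {
    "CONCEPT_A": 3.0,
    "CONCEPT_B": 1.5,
    "CONCEPT_C": 0.5,
}


def phenotype_score(profile: Dict[str, float],
                    patient_concepts: Iterable[str]) -> float:
    """Sum the weights of profile concepts present in the patient's record.

    Each concept counts once, however often it appears in the record.
    """
    observed = set(patient_concepts)
    return sum(weight for concept, weight in profile.items()
               if concept in observed)


def normalized_score(profile: Dict[str, float],
                     patient_concepts: Iterable[str]) -> float:
    """Scale the score to [0, 1] by the total weight in the profile."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return phenotype_score(profile, patient_concepts) / total


if __name__ == "__main__":
    record = ["CONCEPT_A", "CONCEPT_C", "UNRELATED_CONCEPT"]
    print(phenotype_score(EXAMPLE_PROFILE, record))
    print(normalized_score(EXAMPLE_PROFILE, record))
```

A threshold on such a score would then separate likely cases from likely controls; how that threshold is chosen and validated is outside the scope of this sketch.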
We seek support to refine and optimize PheMAP and to develop tools that allow researchers to implement PheMAP efficiently in different EHRs. This will allow researchers to rapidly and accurately determine the status of thousands of phenotypes for millions of individuals with minimal human intervention. Because PheMAP is created from independent resources that are more generalizable than a local clinical dataset, its implementation will generate more consistent outcomes across different EHRs for large-scale analyses. The work we propose is a necessary step toward conducting high-throughput GWASs and PheWASs. We will use data from multiple biobanks to accomplish these tasks. Specifically, we will pursue the following goals in this grant: (1) refine PheMAP and conduct large-scale validation; (2) implement PheMAP and perform representative GWASs and PheWASs; and (3) use PheMAP to conduct GWASs for unstudied or understudied diseases and phenotypes.