Author(s):
Michael Lape, Daniel Schnell, Sreeja Parameswaran, Kevin Ernst, Shannon O'Connor, Nathan Salomonis, Lisa J. Martin, Brett M. Harnett, Leah C. Kottyan, Matthew T. Weirauch
Publish date:
20 June 2025
Journal:
Communications Medicine
PubMed ID:
40542146

Abstract

Background: Many relationships between pathogens and human disease are well-established. However, only a small fraction involve diseases considered non-communicable (NCDs). In this study, we sought to leverage the vast amount of newly available electronic health record data to identify potentially novel pathogen-NCD associations and find additional evidence supporting known associations.

Methods: We leverage data from the UK Biobank and TriNetX to perform a systematic survey across 20 pathogens and 426 diseases, primarily NCDs. To this end, we assess the association between disease status and infection history proxies using a logistic regression-based statistical approach.

Results: Our approach identifies 206 pathogen-disease pairs that replicate in both cohorts. We replicate many established relationships, including Helicobacter pylori with several gastroenterological diseases and connections between Epstein-Barr virus and both multiple sclerosis and lupus. Overall, our approach identifies evidence of association for 15 pathogens and 96 distinct diseases, including a currently controversial link between human cytomegalovirus (CMV) and ulcerative colitis (UC). We validate the CMV-UC connection through two orthogonal analyses, revealing increased CMV gene expression in UC patients and enrichment for UC genetic risk signal near human genes that have altered expression upon CMV infection.

Conclusions: Collectively, these results form a foundation for future investigations into mechanistic roles played by pathogens in the processes underlying NCDs. All results are easily accessible on our website, https://tf.cchmc.org/pathogen-disease.


Institution:
Cincinnati Children's Hospital Medical Center, United States of America
