Author(s):
Seon-Kyeong Jang, Zitian Wang, Richard Border, Dinh Tuan, Angela Wei, Ulzee An, Sriram Sankararaman, Vasilis Ntranos, Jonathan Flint, Noah Zaitlen
Publish date:
19 November 2025
Journal:
Cell Genomics
PubMed ID:
41265447

Abstract

Protein language models (PLMs) improve variant effect predictions, but their role in gene discovery for complex traits remains unclear. We introduce an allelic series-based regression test that uses PLM-derived variant effect predictions as proxies for effect sizes, identifying ∼46% more associations than standard burden tests. Extending this to isoform-level analysis, we find 26 gene-trait pairs with stronger associations in non-canonical versus canonical transcripts, highlighting isoform-specific effects. Finally, we identify evolutionary plausible variants (EPVs), missense variants assigned higher likelihoods than the wild-type alleles by PLMs, representing 0.45% of missense variants. EPVs show higher allele frequencies than synonymous variants, consistent with differential selection pressures, and are linked to nine traits, including protective associations with low-density lipoprotein (LDL) and bone mineral density. Together, our results demonstrate how PLMs can enhance rare-variant interpretation and gene-trait association discovery in exome data.
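As a rough illustration of the two ideas in the abstract, the sketch below flags EPVs and builds a PLM-weighted burden score that is then regressed against a trait. It is not the authors' test: the abstract does not give the test statistic, so a plain least-squares slope stands in for it. The names `is_epv`, `weighted_burden`, and `ols_slope`, the definition of the score as a log-likelihood ratio (alternate minus wild type), the choice of the negated score as the weight, and the toy data are all assumptions made for this example.

```python
from typing import List, Sequence


def is_epv(llr: float) -> bool:
    """Assumed EPV rule: the PLM scores the alternate amino acid above the
    wild type, i.e. log p(alt) - log p(wt) > 0."""
    return llr > 0.0


def weighted_burden(genotypes: Sequence[Sequence[int]],
                    weights: Sequence[float]) -> List[float]:
    """Per-individual gene score: sum of allele count times variant weight."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Toy data (hypothetical): three missense variants in one gene, four individuals.
llr = [-8.0, -2.0, 0.5]            # PLM log-likelihood ratios per variant
weights = [-s for s in llr]        # assumed proxy: more negative LLR, larger effect
genotypes = [
    [1, 0, 0],                     # carries the most damaging variant
    [0, 1, 0],
    [0, 0, 1],                     # carries the EPV
    [0, 0, 0],                     # non-carrier
]
phenotype = [4.0, 1.0, -0.25, 0.0]

epv_flags = [is_epv(s) for s in llr]
scores = weighted_burden(genotypes, weights)
slope = ols_slope(scores, phenotype)
```

A standard burden test would correspond to setting every weight to 1, so all carriers are treated alike; the weighted version lets variants with more extreme PLM scores contribute more to the gene-level score.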

Related projects

The goal of the proposed work is to develop computational and statistical methods for analyzing large-scale genetic and phenotypic data. These methods include fast methods…

Institution:
University of California, Los Angeles, United States of America
