Author(s): David Curtis
Publish date: 15 August 2025
Journal: Journal of Human Genetics
PubMed ID: 40813450

Abstract

A recently described method to predict the pathogenicity of DNA variants uses a DNA language model and can be applied to both coding and non-coding variants. For coding variants, the performance of this method, termed GPN-MSA (genomic pretrained network with multiple-sequence alignment), was reported to be superior to that of CADD. We compare the performance of this method against 45 other predictors applied to rare coding variants in 18 gene-phenotype pairs. We find that, while GPN-MSA produces stronger evidence for association than CADD, it is not the best-performing method for any gene and, on average, other prediction methods are superior. While GPN-MSA may be useful for predicting the pathogenicity of non-coding variants, it would seem sensible for clinicians and researchers to utilise other methods when dealing with coding variants. This research has been conducted using the UK Biobank Resource.

Related projects

Both common and rare genetic variants contribute to the heritability of complex traits. However, usually they are analysed separately using different analytical techniques, such as…

Institution: University College London, Great Britain
