Abstract
Although neuromuscular junction disorders (NMDs) and inflammatory polyneuropathies (IPNs) are biologically distinct, direct genetic comparisons between them remain limited, suggesting that additional biological differences may yet be uncovered. Few studies have explored whether differences in variant patterns within shared biological pathways can be leveraged to distinguish NMDs from IPNs using machine learning (ML). We propose an interpretable ML framework based on the Pathway-based Genetic Variant Dosage Average (PGVDA) to classify NMDs and IPNs and to identify the key genes and pathways that differentiate the two disease groups. Disease-associated variants were identified by logistic regression on nonsynonymous variants from 667 UK Biobank participants. Significant pathways were identified via pathway enrichment analysis (adjusted P-value < 0.05). PGVDA was calculated by assigning the log odds ratio to each variant dosage and then computing a weighted average at the pathway level. Dimensionality reduction was performed by hierarchical clustering based on gene-set overlaps, followed by exclusion of PGVDAs with a variance inflation factor (VIF) > 10. ML models were evaluated using leave-one-out cross-validation. Using the best-performing model, SHAP-based interpretation was applied with two distinct input configurations. Pathway-level interpretation, with PGVDA as input, covered the stages of PGVDA scaling and ML-based classification, whereas variant-level interpretation, with variant dosage as input, covered the stages from odds ratio-based weight assignment through ML-based classification. With logistic regression, the best-performing model, five key differentiating PGVDAs and 10 genes within each of these pathways were identified, suggesting that pathway-level variant aggregation enables accurate and interpretable classification of these two neuromuscular diseases. External validation is needed to ensure generalizability across populations.
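
As an illustration of the PGVDA calculation described above, the following minimal Python sketch weights each variant dosage by its log odds ratio and averages the weighted dosages over the variants assigned to each pathway. The exact weighting and normalization scheme is not specified here, so the unweighted mean over pathway variants is an assumption, as are the function name `compute_pgvda`, the 0/1/2 dosage coding, and the toy values.

```python
import numpy as np

def compute_pgvda(dosages, log_odds, pathway_variants):
    """Hypothetical helper: pathway-level average of log-OR-weighted dosages.

    dosages          : (n_samples, n_variants) array, assumed coded 0/1/2
    log_odds         : (n_variants,) log odds ratios, one per variant
    pathway_variants : dict mapping pathway name -> list of variant column indices
    """
    dosages = np.asarray(dosages, dtype=float)
    log_odds = np.asarray(log_odds, dtype=float)
    # Assign the log odds ratio to each variant dosage (broadcast over samples).
    weighted = dosages * log_odds
    # Assumption: average the weighted dosages over the variants in each pathway.
    return {name: weighted[:, idx].mean(axis=1)
            for name, idx in pathway_variants.items()}

# Toy example: 2 participants, 3 variants, 2 overlapping pathways.
dosages = [[0, 1, 2],
           [2, 0, 1]]
log_odds = [0.5, -1.0, 0.25]
pathways = {"pathway_A": [0, 1], "pathway_B": [1, 2]}

scores = compute_pgvda(dosages, log_odds, pathways)
for name, values in scores.items():
    print(name, values)
```

Each pathway yields one score per participant, so the resulting features can be stacked into a participants-by-pathways matrix before any scaling, collinearity filtering, or classification step.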