Abstract
Deep neural networks have substantially advanced the modelling of complex non-linear relationships in high-dimensional biomedical data. Understanding the interplay between genetic variants and disease susceptibility nevertheless remains a considerable challenge, and it prevents certain genomic diseases from being predicted accurately enough to guide clinical intervention. In this study, we introduce the Extensive Multi-Variant Deep Neural Network (EMV-DNN), an innovative deep learning methodology designed to enhance polygenic risk prediction. Unlike conventional polygenic risk score (PRS) methods, EMV-DNN incorporates single nucleotide polymorphisms (SNPs) alongside structural variants, including insertions and deletions (indels), short tandem repeats (STRs), and copy number variants (CNVs). It uses variant-specific subnetworks to extract informative embeddings that capture a richer, more holistic genomic context. Evaluated on real-world cohorts from the UK Biobank and All of Us, EMV-DNN outperforms conventional PRS methods and classic machine learning algorithms across binary and multi-class prediction tasks. Beyond predictive performance, SHapley Additive exPlanations (SHAP) analysis revealed biologically plausible variant-gene-disease associations, highlighting pathways related to endometrial cell proliferation, fibrosis, and immune regulation. Our findings underscore the value of multi-variant integration and non-linear approaches for capturing the intricate genetic architecture of complex genomic diseases. Despite challenges such as dataset limitations and the complexity of diseases with multiple contributing factors, the EMV-DNN methodology presents a promising avenue for improving the predictive accuracy of PRS, thereby facilitating personalized healthcare interventions and advancing our understanding of genetic predispositions to disease.
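As a rough illustration of the variant-specific subnetwork idea described above, the following dependency-free Python sketch gives each variant type its own small dense layer, concatenates the resulting embeddings, and passes them to a shared prediction head. It is not the EMV-DNN implementation: the class name `MultiVariantNet`, the layer sizes, the single-layer subnetworks, the feature encodings, and the untrained random weights are all assumptions made for illustration only.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiVariantNet:
    """Hypothetical sketch: one subnetwork per variant type, fused by a shared head."""

    def __init__(self, input_sizes, embed_dim=4, seed=0):
        rng = random.Random(seed)
        # One embedding layer per variant type (assumed: a single dense layer each).
        self.subnets = {name: init_layer(n, embed_dim, rng)
                        for name, n in input_sizes.items()}
        # Shared head over the concatenated embeddings (binary task assumed).
        self.head = init_layer(embed_dim * len(input_sizes), 1, rng)

    def predict(self, features):
        fused = []
        for name, layer in self.subnets.items():
            # Each variant type is embedded separately, then concatenated.
            fused.extend(dense(features[name], layer, relu))
        return dense(fused, self.head, sigmoid)[0]


# Toy input with made-up encodings: SNP dosages, indel presence flags,
# STR repeat counts, and CNV copy numbers.
model = MultiVariantNet({"snp": 6, "indel": 3, "str": 2, "cnv": 2})
sample = {
    "snp": [0, 1, 2, 0, 1, 0],
    "indel": [1, 0, 0],
    "str": [12.0, 9.0],
    "cnv": [2.0, 3.0],
}
risk = model.predict(sample)
print(f"untrained risk score: {risk:.3f}")
```

Because the weights are random and untrained, the printed score carries no meaning. The sketch only shows how separate per-variant embeddings can be fused before a single risk prediction.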