Author(s):
Wenjie Peng, Liubin Zhang, Bin Tang, Junhao Liang, Zhi Liu, Lihang Ye, Yangyang Yuan, Yifei Wang, Ruijie Tan, Nan Lin, Chao Xue, Hui Jiang, Li Fang, Miaoxin Li
Publish date:
30 September 2025
Journal:
Genome Biology
PubMed ID:
41029866

Abstract

Structural variants (SVs) contribute significantly to genetic diversity yet present computational challenges during analysis. We introduce SDFA, a standardized decomposition format and toolkit for efficient analysis of SVs in large-scale population genomics. SDFA efficiently stores and retrieves all SV types while providing algorithms for consistent SV merging, memory-efficient annotation, and precise gene feature annotation across large cohorts. SDFA outperforms existing tools, achieving at least 17.64 times faster merging than four tools and 120.93 times faster annotation than three tools, and uniquely handles complex SVs. We validate SDFA on 895,054 SVs from 150,119 individuals in the UK Biobank dataset.

Related projects

Mental diseases such as schizophrenia, bipolar disorder, and depression rank among the top causes of disability worldwide. Multigenerational cohort studies demonstrate heritability of risk of…

Institution:
West China Hospital of Sichuan University, China
