Author(s):
Uri Hartmann, Eran Shaham, Dafna Nathan, Ilana Blech, Danny Zeevi
Publish date:
28 October 2025
Journal:
GigaScience
PubMed ID:
41147705

Abstract

Phasing, the assignment of alleles to their respective parental chromosomes, is fundamental to studying genetic variation and identifying disease-causing variants. Traditional approaches, including statistical, pedigree-based, and read-based phasing, face challenges such as limited accuracy for rare variants and reliance on external reference panels. To address these limitations, we developed TinkerHap, a novel phasing algorithm that integrates a read-based phaser, built on pairwise distance-based unsupervised classification, with external phased data, such as statistical or pedigree phasing. We evaluated TinkerHap’s performance against other phasing algorithms using 1,040 parent-offspring trios from the UK Biobank (Illumina short reads) and the GIAB Ashkenazi trio (PacBio long reads). TinkerHap’s read-based phaser alone achieved higher phasing accuracy than all other algorithms, with 95.1% for short reads (second best: 94.8%) and 97.5% for long reads (second best: 95.5%). Its hybrid approach further improved short-read accuracy to 96.3% and phased 99.5% of all heterozygous sites. TinkerHap also extended haplotype block sizes to a median of 79,449 bp for long reads (second best: 68,303 bp) and demonstrated higher accuracy for both SNPs and indels. This combination of a robust read-based algorithm and hybrid strategy makes TinkerHap a uniquely powerful tool for genomic analyses.
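To illustrate the general idea of read-based phasing by pairwise comparison of reads, here is a minimal Python sketch. It is not TinkerHap's implementation: the abstract does not specify the distance measure or the classification procedure, so the agreement score, the greedy grouping, and all function names (`agreement`, `phase_reads`, `consensus`) are assumptions made for this toy example. Each read is reduced to the alleles it carries at heterozygous sites (0 = reference, 1 = alternate).

```python
from collections import defaultdict


def agreement(a, b):
    """Matches minus mismatches over heterozygous sites covered by both reads."""
    shared = a.keys() & b.keys()
    return sum(1 if a[s] == b[s] else -1 for s in shared)


def phase_reads(reads):
    """Split reads into two haplotype groups (0 or 1) by pairwise agreement.

    reads: list of dicts {site_position: allele}.
    Returns one label per read; None marks reads that could not be linked.
    Hypothetical greedy scheme, chosen for brevity.
    """
    labels = [None] * len(reads)
    if not reads:
        return labels
    labels[0] = 0  # arbitrary seed: the first read defines group 0
    changed = True
    while changed:
        changed = False
        for i, read in enumerate(reads):
            if labels[i] is not None:
                continue
            score = 0
            for j, other in enumerate(reads):
                if labels[j] is None:
                    continue
                s = agreement(read, other)
                # Agreeing with a group-1 read is evidence against group 0.
                score += s if labels[j] == 0 else -s
            if score != 0:
                labels[i] = 0 if score > 0 else 1
                changed = True
    return labels


def consensus(reads, labels):
    """Majority allele per site on haplotype 0; haplotype 1 is the complement."""
    votes = defaultdict(int)
    for read, label in zip(reads, labels):
        if label is None:
            continue
        for site, allele in read.items():
            # A group-1 read carrying allele a implies allele 1 - a on haplotype 0.
            h0 = allele if label == 0 else 1 - allele
            votes[site] += 1 if h0 == 1 else -1
    return {site: (1 if v > 0 else 0) for site, v in sorted(votes.items()) if v != 0}


# Toy input: five overlapping reads spanning four heterozygous sites.
reads = [
    {100: 0, 200: 1},
    {100: 1, 200: 0},
    {200: 1, 300: 1},
    {200: 0, 300: 0},
    {300: 1, 400: 0},
]
labels = phase_reads(reads)
haplotype0 = consensus(reads, labels)
```

On this toy input the reads alternate between the two groups, and the consensus links all four sites into one haplotype block. A real phaser must additionally handle sequencing errors, base qualities, indels, and breaks between blocks where no read connects neighbouring sites.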

Related projects

Type 1 Diabetes affects over 20 million people worldwide, and it has severe and costly implications on the health of patients. A better understanding of…

Institution:
Jerusalem Multidisciplinary College, Israel
