Cancer remains a major global health burden, causing nearly 10 million deaths annually. Early detection substantially improves outcomes, yet many cancers, such as lung and oesophageal cancer, remain difficult to diagnose at an early stage because onset is often asymptomatic and progression aggressive. Genomic instability, comprising somatic mutations and chromosomal alterations, drives tumour evolution and offers opportunities for molecular detection. Minimally invasive methods such as liquid biopsy and capsule sponge sampling show promise, but the low tumour DNA content of these samples presents a major challenge, especially in early-stage disease.
To overcome this, we propose to leverage the UK Biobank’s extensive genotype and sequencing resources to perform population-scale statistical phasing and thereby construct high-confidence, sample-specific haplotypes. These haplotypes should increase sensitivity for detecting allelic imbalance and copy number alterations (CNAs) in tumour-derived DNA from low-purity samples.
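To make the low-purity challenge concrete, a standard two-component mixture model (an assumption here; the proposal does not specify a model) gives the expected fraction of reads carrying haplotype 1 at a heterozygous SNP, for tumour purity $\rho$ and tumour copy numbers $n_1$, $n_2$ of the two parental haplotypes, assuming diploid normal cells and no subclonality. The worked example assumes a single-copy loss of haplotype 2 ($n_1 = 1$, $n_2 = 0$) at 5% purity:

```latex
f_1 = \frac{\rho\, n_1 + (1 - \rho)}{\rho\,(n_1 + n_2) + 2(1 - \rho)}
\qquad
f_1\big|_{n_1 = 1,\; n_2 = 0,\; \rho = 0.05} = \frac{0.05 + 0.95}{0.05 + 1.90} = \frac{1}{1.95} \approx 0.513
```

At an illustrative 30 reads per SNP, the binomial standard error of a single SNP's allele fraction is $\sqrt{0.25/30} \approx 0.091$, roughly seven times the 0.013 shift, so no individual SNP can resolve the imbalance. Pooling $N$ reads that are correctly assigned to the same haplotype shrinks the standard error to about $0.5/\sqrt{N}$, which is why accurate phasing matters.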
Building on this principle, we developed AstroCNA, a tumour-naïve, cost-efficient method that uses statistical phasing across more than 500,000 individuals to infer haplotypes and then aggregates allelic signal across phased heterozygous SNPs. This aggregation could improve the signal-to-noise ratio, enabling robust CNA detection from moderate-depth whole-genome sequencing (WGS) data, even without matched normal samples.
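The aggregation idea can be illustrated with a small simulation. This is a minimal sketch, not AstroCNA's implementation: the function names (`pooled_haplotype_fraction`, `imbalance_z`, `simulate_segment`) are hypothetical, and the sketch assumes error-free phasing, fixed depth per SNP, binomial read sampling, no reference-mapping bias, and a tumour that has lost one haplotype. It compares pooling reads per haplotype against phase-naive pooling of ALT reads.

```python
import numpy as np


def pooled_haplotype_fraction(ref_counts, alt_counts, hap1_is_alt):
    """Fraction of reads carrying the haplotype-1 allele, pooled over SNPs.

    hap1_is_alt[i] is True when the ALT allele of heterozygous SNP i lies on
    haplotype 1 according to the phasing (assumed error-free in this sketch).
    """
    ref = np.asarray(ref_counts)
    alt = np.asarray(alt_counts)
    # At each SNP, take the read count of whichever allele sits on haplotype 1
    hap1_reads = np.where(np.asarray(hap1_is_alt, dtype=bool), alt, ref)
    return hap1_reads.sum() / (ref + alt).sum()


def imbalance_z(fraction, total_reads):
    """Binomial z-score of a pooled fraction against the balanced value 0.5."""
    return (fraction - 0.5) / np.sqrt(0.25 / total_reads)


def simulate_segment(n_snps, depth, purity, rng):
    """Simulate REF/ALT counts for a segment where the tumour lost haplotype 2.

    With diploid normal cells, the expected haplotype-1 read fraction in the
    mixture is 1 / (2 - purity). Depth is fixed per SNP for simplicity.
    """
    expected_fraction = 1.0 / (2.0 - purity)
    hap1_is_alt = rng.random(n_snps) < 0.5  # random true phase orientation
    hap1_reads = rng.binomial(depth, expected_fraction, size=n_snps)
    alt = np.where(hap1_is_alt, hap1_reads, depth - hap1_reads)
    ref = depth - alt
    return ref, alt, hap1_is_alt


rng = np.random.default_rng(0)
ref, alt, phase = simulate_segment(n_snps=2000, depth=30, purity=0.05, rng=rng)
total = int((ref + alt).sum())

# Phase-aware pooling: reads are summed per haplotype, so the small shift accumulates
z_phased = imbalance_z(pooled_haplotype_fraction(ref, alt, phase), total)
# Phase-naive pooling: treating ALT as haplotype 1 everywhere mixes the two
# haplotypes at random, so the shift cancels on average
z_unphased = imbalance_z(pooled_haplotype_fraction(ref, alt, np.ones_like(phase)), total)
print(f"phased z = {z_phased:.1f}, unphased z = {z_unphased:.1f}")
```

Under these assumed settings the phased estimate typically lies several standard errors from 0.5, while phase-naive pooling stays near zero. Real data add phasing switch errors, variable depth and reference bias, which this sketch ignores.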
In parallel, we aim to use UK Biobank data not only to apply AstroCNA at scale but also to evaluate and refine statistical phasing methods themselves. By comparing phasing accuracy across diverse populations and genomic regions, we hope to identify the factors that influence phasing quality and to improve its utility for liquid biopsy applications.
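The proposal does not name an accuracy metric; one commonly used measure is the switch error rate, the fraction of adjacent heterozygous sites whose relative phase is inferred incorrectly. The sketch below assumes an independent "truth" phase is available (for example from trio or long-read data, an assumption here), and `switch_error_rate` is a hypothetical helper, not part of any specific phasing tool.

```python
import numpy as np


def switch_error_rate(true_hap1_is_alt, inferred_hap1_is_alt):
    """Fraction of adjacent heterozygous-site pairs whose relative phase is wrong.

    Each array gives, per heterozygous site, whether the ALT allele lies on
    haplotype 1. Only relative orientation matters, so exchanging the labels
    of the two inferred haplotypes everywhere gives the same rate.
    """
    true = np.asarray(true_hap1_is_alt, dtype=bool)
    inferred = np.asarray(inferred_hap1_is_alt, dtype=bool)
    if true.shape != inferred.shape or true.size < 2:
        raise ValueError("need two equal-length arrays with at least two sites")
    # Orientation of the inferred haplotypes relative to the truth at each site
    flipped = true != inferred
    # A switch error is a change of orientation between consecutive sites
    switches = np.count_nonzero(flipped[1:] != flipped[:-1])
    return switches / (true.size - 1)


truth = [True, False, True, True, False, False]
one_switch = [True, False, False, False, True, True]  # phase inverted from site 3 onwards
label_swap = [not x for x in truth]                   # same haplotypes, labels exchanged

print(switch_error_rate(truth, one_switch))  # → 0.2 (1 switch over 5 adjacent pairs)
print(switch_error_rate(truth, label_swap))  # → 0.0
```

A single switch inverts the phase for every downstream SNP, so when allelic signal is aggregated along a haplotype, reads after the switch are credited to the wrong haplotype and partially cancel the imbalance. This links phasing accuracy directly to CNA detection sensitivity.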
This project will advance allele-specific CNA detection from minimally invasive samples, enabling earlier diagnosis and longitudinal monitoring of cancer and precancerous conditions.