Author(s):
Olivier Delaneau, Jean-François Zagury, Matthew R. Robinson, Jonathan L. Marchini, Emmanouil T. Dermitzakis
Publish date:
28 November 2019
Journal:
Nature Communications
PubMed ID:
31780650

Abstract

The number of human genomes being genotyped or sequenced increases exponentially and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods to process large genotype and high coverage sequencing datasets. It notably exhibits sub-linear running times with sample size, provides highly accurate haplotypes and allows integrating external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome In A Bottle.

Related projects

Despite decades of study, there is generally a poor understanding of the modifiable risk factors for common disease (lifestyle, diet, environmental exposure), with a limited…

Institution:
Institute of Science and Technology, Austria
