Nat Commun. 2017 Nov 3;8(1):1293. doi: 10.1038/s41467-017-01389-4.

Dense and accurate whole-chromosome haplotyping of individual genomes.

Author information

1. European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Building 3226, 9713 AV, Groningen, The Netherlands.
2. Max Planck Institute for Informatics, Saarbrücken, Germany.
3. Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123, Saarbrücken, Germany.
4. Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123, Saarbrücken, Germany.
5. Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123, Saarbrücken, Germany.
6. European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117, Heidelberg, Germany.
7. Terry Fox Laboratory, BC Cancer Agency, 601 West 10th Avenue, Vancouver, BC, V5Z 1L3, Canada.
8. Department of Medical Genetics, University of British Columbia, 2350 Health Science Mall, Vancouver, BC, V6T 1Z3, Canada.
9. Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, 66123, Saarbrücken, Germany. t.marschall@mpi-inf.mpg.de.
10. Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123, Saarbrücken, Germany. t.marschall@mpi-inf.mpg.de.

Abstract

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.
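To make the idea of combining sparse global haplotypes with dense local ones concrete, here is a minimal toy sketch. It is not the authors' algorithm, and every name in it (`orient_blocks`, `blocks`, `sparse`) is hypothetical. It assumes each locally phased block lists alleles (0 or 1) for one of its two haplotypes, and that a sparse chromosome-wide haplotype covers only a few variant positions. Each block is oriented by majority vote against the sparse haplotype; blocks with no anchoring variants, or with a tie, are left unassigned.

```python
def orient_blocks(blocks, sparse):
    """Toy illustration only (assumed data model, not the published method).

    blocks: list of dicts {position: allele} giving alleles (0/1) on an
            arbitrary "first" haplotype of each locally phased block.
    sparse: dict {position: allele} giving alleles on global haplotype 1
            at a sparse subset of positions.
    Returns a dict {position: allele on global haplotype 1}.
    """
    phased = {}
    for block in blocks:
        # Count anchoring variants that agree or disagree with the sparse haplotype.
        agree = sum(1 for pos, a in block.items()
                    if pos in sparse and sparse[pos] == a)
        disagree = sum(1 for pos, a in block.items()
                       if pos in sparse and sparse[pos] != a)
        if agree == disagree:
            # No anchors or a tie: orientation is undetermined, skip the block.
            continue
        flip = disagree > agree
        for pos, a in block.items():
            # Flipping a block swaps its two haplotypes (0 <-> 1).
            phased[pos] = 1 - a if flip else a
    return phased


# Hypothetical example data: three local blocks, four sparse anchors.
blocks = [
    {100: 0, 200: 1, 300: 0},
    {400: 1, 500: 1, 600: 0},
    {700: 0, 800: 1},
]
sparse = {100: 0, 300: 0, 500: 0, 600: 1}
result = orient_blocks(blocks, sparse)
```

In this example the first block already matches the sparse haplotype, the second is flipped, and the third stays unassigned because no sparse variant falls inside it. The dense blocks fill in the positions the sparse haplotype lacks, while the sparse haplotype ties the blocks together along the chromosome.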

PMID: 29101320
PMCID: PMC5670131
DOI: 10.1038/s41467-017-01389-4
[Indexed for MEDLINE]
Free PMC Article
