Mol Biol Evol. 2013 May;30(5):1145-58. doi: 10.1093/molbev/mst016. Epub 2013 Jan 30.

Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data.

Author information

Bioinformatics Interdepartmental Program, University of California, Los Angeles, USA.

Abstract

DNA samples are often pooled, either by experimental design or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g., bacterial species comprising a microbiome or pathogen strains in a blood sample). We present an expectation-maximization algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different species within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.
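The abstract names the core technique: an expectation-maximization (EM) algorithm that estimates the frequencies of a known set of haplotypes from mapped reads. The following is a minimal sketch of that general idea, not the authors' implementation or their tool. It rests on several assumptions that are not from the abstract:

- Each read is represented by a hypothetical `(start, bases)` pair aligned to haplotypes of equal length.
- Sequencing errors are independent per base, with a fixed `error_rate` and mismatches spread uniformly over the three other bases.
- The function names are hypothetical.

The E-step assigns each read a posterior probability of originating from each haplotype. The M-step sets each haplotype's frequency to the average of those posteriors.

```python
from typing import List, Tuple

# Hypothetical read representation: (0-based start position, observed bases).
Read = Tuple[int, str]


def read_likelihood(read: Read, haplotype: str, error_rate: float) -> float:
    """P(read | haplotype) under an assumed independent per-base error model."""
    start, bases = read
    p = 1.0
    for offset, base in enumerate(bases):
        if haplotype[start + offset] == base:
            p *= 1.0 - error_rate
        else:
            # Mismatch: assume the error produced one of the 3 other bases uniformly.
            p *= error_rate / 3.0
    return p


def estimate_haplotype_frequencies(
    reads: List[Read],
    haplotypes: List[str],
    error_rate: float = 0.01,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> List[float]:
    """EM estimate of mixture frequencies of known haplotypes (illustrative sketch)."""
    k = len(haplotypes)
    # P(read_j | haplotype_i) does not depend on the frequencies, so compute it once.
    lik = [[read_likelihood(r, h, error_rate) for h in haplotypes] for r in reads]
    freqs = [1.0 / k] * k  # start from uniform frequencies
    for _ in range(max_iter):
        totals = [0.0] * k
        for row in lik:
            # E-step: posterior probability that this read came from each haplotype.
            weights = [f * l for f, l in zip(freqs, row)]
            norm = sum(weights)  # > 0 whenever error_rate > 0
            for i in range(k):
                totals[i] += weights[i] / norm
        # M-step: each frequency becomes the mean posterior membership over all reads.
        new_freqs = [t / len(reads) for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs
```

For example, with haplotypes `"AAAA"` and `"CCCC"`, three reads `(0, "AAAA")` and one read `(0, "CCCC")`, the estimate is approximately `[0.75, 0.25]`. Reads that cover only sites where all haplotypes agree carry no information under this model. Their posteriors simply equal the current frequencies, so the estimate is driven by reads spanning distinguishing sites.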

PMID: 23364324
PMCID: PMC3670732
DOI: 10.1093/molbev/mst016
[Indexed for MEDLINE]
Free PMC Article