Send to

Choose Destination
See comment in PubMed Commons below
J Virol Methods. 2014 Jul;203:73-80. doi: 10.1016/j.jviromet.2014.03.008. Epub 2014 Mar 26.

PAPNC, a novel method to calculate nucleotide diversity from large scale next generation sequencing data.

Author information

Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD, United States. Electronic address:
HIV Drug Resistance Program, NCI, Frederick, MD, United States.
Division of Infectious Diseases, University of Pittsburgh, Pittsburgh, PA, United States.
Department of Molecular Biology and Microbiology, Tufts University, Boston, MA, United States.


Estimating viral diversity in infected patients can provide insight into pathogen evolution and emergence of drug resistance. With the widespread adoption of deep sequencing, it is important to develop tools to accurately calculate population diversity from very large datasets. Current methods for estimating diversity that are based on multiple alignments are not practical to apply to such data. In this study, the authors report a novel method (Pairwise Alignment Positional Nucleotide Counting, PAPNC) for estimating population diversity from 454 sequence data. The diversity measurements determined using this method were comparable to those calculated by average pairwise difference (APD) of multiply aligned sequences using MEGA5. Diversities were estimated for 9 patient plasma HIV samples sequenced with Titanium 454 technology and by single-genome sequencing (SGS). Diversities calculated from deep sequencing using PAPNC ranged from 0.002 to 0.021 while APD measurements calculated from SGS data ranged proximately from 0.001 to 0.018, with the difference being attributable to PCR error (contributing background diversity of 0.0016 in a control sample). Comparison of APDs estimated from 100 sets of sequences drawn at random from 454 generated data and from corresponding SGS data showed very close correlation between the two methods with R(2) of 0.96, and differing on average by about 1% (after correction for PCR error). The authors have developed a novel method that is good for calculating genetic diversities for large scale datasets from next generation sequencing. It can be implemented easily as a function in available variation calling programs like SAMtools or haplotype reconstruction software for nucleotide genetic diversity calculation. A Perl script implementing this method is available upon request.


HIV-1; Next generation sequencing; Viral population diversity calculation

[Indexed for MEDLINE]
Free PMC Article
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for Elsevier Science Icon for PubMed Central
    Loading ...
    Support Center