BMC Evol Biol. 2016 Dec 1;16(1):262.

PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R.

Author information

1 North Carolina Museum of Natural Sciences, Raleigh, North Carolina, 27601, USA. alex.dornburg@naturalsciences.org.
2 Department of Biostatistics, Yale University, New Haven, Connecticut, 06510, USA.
3 Center for Infectious Disease Modeling and Analysis, Yale School of Public Health, Yale University, New Haven, Connecticut, 06510, USA.
4 Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, 06525, USA.
5 Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, 06511, USA.

Abstract

BACKGROUND:

Analyses of phylogenetic informativeness represent an important step in screening potential or existing datasets for their proclivity toward convergent or parallel evolution of molecular sites. However, while new theory has been developed from which to predict the utility of sequence data, adoption of these advances has been stymied by a lack of software that enables their application, especially for large next-generation sequence data sets. Moreover, there are no theoretical barriers to applying phylogenetic informativeness or calculating quartet internode resolution probabilities in a Bayesian setting that more robustly accounts for uncertainty, yet no software exists with which a computationally intensive Bayesian approach to experimental design could be implemented.

RESULTS:

We introduce PhyInformR, an open source software package that performs rapid calculation of phylogenetic information content using the latest advances in phylogenetic informativeness-based theory. These advances include modifications that incorporate uneven branch lengths and any model of nucleotide substitution to provide assessments of the phylogenetic utility of any given dataset or dataset partition. PhyInformR provides new tools for data visualization and routines optimized for rapid statistical calculations, including approaches making use of Bayesian posterior distributions and parallel processing. By running the computation on user hardware, PhyInformR increases by orders of magnitude the computational power users can apply toward screening datasets for phylogenetic/genomic information content.
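To make the notion of a phylogenetic informativeness profile concrete, the following is a minimal sketch of the classic per-site formulation (Townsend 2007), in which a site evolving at rate λ contributes 16 λ² T e^(−4λT) at time depth T and a dataset's profile is the sum over its sites. This is an illustration only and is not the PhyInformR interface: the function names, the example rates, and the time grid are all hypothetical, and the sketch does not include the extensions named above (uneven branch lengths, arbitrary substitution models, Bayesian posterior distributions, or parallel processing).

```python
import math


def site_informativeness(rate, time):
    """Per-site informativeness under the classic formulation:
    16 * rate^2 * time * exp(-4 * rate * time)."""
    return 16.0 * rate ** 2 * time * math.exp(-4.0 * rate * time)


def informativeness_profile(rates, times):
    """Sum the per-site values over all site rates at each time depth."""
    return [sum(site_informativeness(r, t) for r in rates) for t in times]


# Hypothetical per-site substitution rates and an arbitrary time grid
# (illustrative values only, not taken from the paper).
rates = [0.05, 0.1, 0.5]
times = [i / 100 for i in range(1, 101)]

profile = informativeness_profile(rates, times)

# Time depth at which the summed profile is highest on this grid.
peak_index = max(range(len(times)), key=profile.__getitem__)
peak_time = times[peak_index]
print(f"profile peaks at T = {peak_time}")
```

For a single site, this expression peaks at T = 1/(4λ), so fast-evolving sites are most informative for shallow divergences and slow-evolving sites for deep ones.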

CONCLUSIONS:

PhyInformR provides a means to implement diverse substitution models and specify uneven branch lengths for phylogenetic informativeness or calculations providing quartet-based probabilities of resolution, produce novel visualizations, and facilitate analyses of next-generation sequence datasets while incorporating phylogenetic uncertainty through the use of parallel processing. As an open source program, PhyInformR is fully customizable and expandable, thereby allowing for advanced methodologies to be readily integrated into local bioinformatics pipelines. Software is available through CRAN, and a package containing the software, a detailed manual, and additional sample data is also provided freely through GitHub: https://github.com/carolinafishes/PhyInformR .

PMID: 27905871
PMCID: PMC5134231
DOI: 10.1186/s12862-016-0837-3
[Indexed for MEDLINE]