Front Genet. 2013 Apr 2;4:41. doi: 10.3389/fgene.2013.00041. eCollection 2013.

Multivariate analysis of functional metagenomes.

Author information

Department of Biology, San Diego State University, San Diego, CA, USA.


Metagenomics is a primary tool for describing microbial and viral communities. The sheer volume of data generated in each metagenome makes it difficult to identify key differences in function and taxonomy between communities. Here we discuss the application of seven different data mining and statistical analyses by comparing and contrasting the metabolic functions of 212 microbial metagenomes within and between 10 environments. Not all approaches are appropriate for all questions, and researchers should decide which approach addresses their questions. This work demonstrated the use of each approach: for example, random forests provided a robust and enlightening description of both the clustering of metagenomes and the metabolic processes that were important in separating microbial communities from different environments. All analyses identified the presence of phage genes within the microbial community as a predictor of whether that community was host-associated or free-living. Several analyses identified the subtle differences that occur within environments, such as those seen in different regions of the marine environment.
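As a rough illustration of one of the listed approaches (principal component analysis), the sketch below extracts the first principal component from a small table of functional abundances and reports which function carries the largest loading. It is not the authors' pipeline. The abundance values, the three function names, and the host-associated/free-living split are all synthetic and hypothetical, chosen only so that the phage column varies most. The component is found by plain power iteration on the covariance matrix, using only the Python standard library.

```python
import math

# Synthetic normalized abundances (NOT data from the paper).
# Rows are metagenomes; columns are hypothetical functional categories.
FUNCTIONS = ["phage", "photosynthesis", "respiration"]
SAMPLES = [
    [0.30, 0.10, 0.21],  # rows 0-2: synthetic "host-associated" samples
    [0.28, 0.12, 0.19],
    [0.32, 0.09, 0.20],
    [0.05, 0.11, 0.20],  # rows 3-5: synthetic "free-living" samples
    [0.07, 0.10, 0.21],
    [0.04, 0.12, 0.19],
]


def covariance_matrix(rows):
    """Return (sample covariance matrix, column-centered rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    return cov, centered


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


cov, centered = covariance_matrix(SAMPLES)
loadings = first_component(cov)

# Function with the largest absolute loading on the first component.
top_index = max(range(len(FUNCTIONS)), key=lambda j: abs(loadings[j]))
top_function = FUNCTIONS[top_index]

# Project each centered sample onto the first component.
scores = [sum(c[j] * loadings[j] for j in range(len(loadings))) for c in centered]

print("largest loading:", top_function)
for row_index, score in enumerate(scores):
    print(f"sample {row_index}: PC1 score = {score:+.3f}")
```

On this synthetic table the two groups fall on opposite sides of the first component because the phage column was constructed to differ between them. Real metagenome tables have far more functions and samples, and would normally be analyzed with an established statistics package, not a hand-written routine.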


Keywords: canonical discriminant analysis; metagenomics; microbiology; principal component analysis; random forest; statistics
