IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):619-28. doi: 10.1109/TCBB.2011.111. Epub 2011 Aug 4.

The impact of normalization and phylogenetic information on estimating the distance for metagenomes.

Author information

  • Academia Sinica, Taipei.

Abstract

Metagenomics enables the direct study of unculturable microorganisms in different environments. Distinguishing the compositional differences between metagenomes is an important and challenging problem. Several distance functions have been proposed to estimate these differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. Initially, we analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions so that we could cluster samples from both real and synthetic data sets. The results indicate that rank-based normalization with phylogenetic information significantly improves sample clustering, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.
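
To make the idea of rank-based normalization concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the abstract does not name the three distance functions or the exact normalization, so the average-rank scheme, the Euclidean distance, the function names, and the toy abundance profiles below are all hypothetical. Phylogenetic information is left out entirely.

```python
from math import sqrt


def rank_normalize(counts):
    """Replace each count with its rank (1 = smallest); tied counts share the mean rank.

    Hypothetical helper: one common way to rank-normalize a profile,
    not necessarily the scheme used in the paper.
    """
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    ranks = [0.0] * len(counts)
    i = 0
    while i < len(order):
        # Extend j over the run of positions holding the same count.
        j = i
        while j + 1 < len(order) and counts[order[j + 1]] == counts[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def euclidean(a, b):
    """Plain Euclidean distance, used here only as a stand-in distance function."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy taxon counts (made up): sample_b is sample_a sequenced ten times
# less deeply; sample_c has the opposite composition.
sample_a = [900, 50, 30, 20]
sample_b = [90, 5, 3, 2]
sample_c = [20, 30, 50, 900]

ranks_a, ranks_b, ranks_c = (rank_normalize(s) for s in (sample_a, sample_b, sample_c))

raw_ab = euclidean(sample_a, sample_b)   # dominated by sequencing depth
rank_ab = euclidean(ranks_a, ranks_b)    # identical rank profiles
rank_ac = euclidean(ranks_a, ranks_c)    # reversed rank profiles
```

In this toy case the raw distance between sample_a and sample_b is large only because of the difference in total counts, while their rank profiles coincide; sample_c, whose composition is reversed, remains far from sample_a after ranking.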

[PubMed - indexed for MEDLINE]