    J Bioinform Comput Biol. 2009 Jun;7(3):455-71.

    An ORFome assembly approach to metagenomics sequences analysis.

    Source

    School of Informatics, Indiana University, Bloomington, IN 47408, USA. yye@indiana.edu

    Abstract

    Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. Current analyses of metagenomic data rely largely on computational tools originally designed for microbial genomics projects. Assembling metagenomic sequences is challenging mainly because the reads are short and the species complexity of the community is high. Alternatively, individual (short) reads are searched directly against databases of known genes (or proteins) to identify homologous sequences. This latter approach may have low sensitivity and specificity in identifying homologous sequences, which may in turn bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The computational framework consists of three steps. First, each read from a metagenomics project is annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. the ORFome) are used for database searching of homologs and for subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low-coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, ORFome assembly significantly increases the sensitivity of homology searching and may improve the diversity analysis of metagenomic data. This improvement is especially useful for metagenomic projects in which genome assembly does not work because of low sequence coverage.
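
The first step of the framework, annotating each read with putative ORFs, can be illustrated with a minimal sketch. The abstract does not say how MetaORFA predicts ORFs, so the code below is an assumption rather than the authors' method: it translates a read in all six frames using the standard genetic code and keeps every stop-free stretch of at least `min_aa` residues. The names `putative_orfs`, `translate`, `reverse_complement` and the `min_aa` threshold are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def putative_orfs(read, min_aa=20):
    """Naive stop-to-stop ORF calling over all six reading frames.

    Assumption: any stop-free stretch of at least min_aa residues counts
    as a putative ORF fragment. Short reads rarely contain a complete gene,
    so no start codon is required.
    """
    read = read.upper()
    peptides = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_aa:
                    peptides.append(segment)
    return peptides


if __name__ == "__main__":
    # Toy read; real reads would come from a FASTA or FASTQ file.
    for peptide in putative_orfs("ATGGCTAAATTTGGGTAACCC", min_aa=5):
        print(peptide)
```

In this sketch the peptide fragments returned for each read would be the input to the assembly step; the EULER-based assembly and the database search described above are not reproduced here.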

    PMID:
    19507285
    [PubMed - indexed for MEDLINE]
    PMCID:
    PMC2829862
    Free PMC Article
