Nucleic Acids Res. Jul 2010; 38(13): 4207–4217.
Published online Mar 9, 2010. doi:  10.1093/nar/gkq140
PMCID: PMC2910039

Transposases are the most abundant, most ubiquitous genes in nature

Abstract

Genes, like organisms, struggle for existence, and the most successful genes persist and widely disseminate in nature. The unbiased determination of the most successful genes requires access to sequence data from a wide range of phylogenetic taxa and ecosystems, which has finally become achievable thanks to the deluge of genomic and metagenomic sequences. Here, we analyzed 10 million protein-encoding genes and gene tags in sequenced bacterial, archaeal, eukaryotic and viral genomes and metagenomes, and our analysis demonstrates that genes encoding transposases are the most prevalent genes in nature. The finding that these genes, classically considered as selfish genes, outnumber essential or housekeeping genes suggests that they offer a selective advantage to the genomes and ecosystems they inhabit, a hypothesis in agreement with an emerging body of literature. Their mobile nature not only promotes dissemination of transposable elements within and between genomes but also leads to mutations and rearrangements that can accelerate biological diversification and, consequently, evolution. By securing their own replication and dissemination, transposases are guaranteed to thrive so long as nucleic acid-based life forms exist.

INTRODUCTION

Since life first emerged, organisms have been struggling for survival and competing over the finite resources within their ecosystems (1,2). This struggle for survival is not confined to the organism level; it also applies to individual genes (3) and even non-coding DNA segments (4,5). As a corollary, a gene’s success can be determined by its ability to persist in nature and to spread throughout genomes and biomes (6). For this to take place, genes need some sequence plasticity to adapt to different environments while retaining enough sequence conservation to preserve the structure of their encoded proteins and the identity of their encoded biological functions (7,8).

Every time a new genome is sequenced, many genes are identified and annotated based on their homology to sequences available in databases, but new genes with novel functions are also identified, adding to the universal gene pool. To date, no study has systematically and directly surveyed the millions of protein-encoding genes (PEGs) deposited in sequence databases to identify their relative prevalence. There have been several challenges to such an endeavor: (i) the absence of numerical parameters to assess a gene’s prevalence; (ii) the lack of fair representation of the tree of life within available sequence data (9,10) and (iii) the difficulty of defining what is meant by ‘same gene’ in different organisms and ecosystems.

To overcome these difficulties, (i) we calculated both the abundance and ubiquity of all known biological functions encoded in genomes and ecosystems to estimate their prevalence, with the assumption that these values will be correlated with gene fitness; (ii) we surveyed both genomic and metagenomic data sets to reduce bias caused by the uneven sampling of the tree of life in genomic data sets; and (iii) we defined similar genes as those encoding proteins with similar specific biological functions. In some instances, this definition could be regarded as an oversimplification, notably in cases of convergent evolution or homoplasy, where multiple genes of different origin evolve to perform similar biological functions. However, the majority of current gene annotations are specific enough to distinguish many instances of paralogous genes or different classes within gene/protein families. It is also understandable that different genes are under different selection pressures, as some are forced to endure mutations and tolerate sequence variability to escape pressure (e.g. bacterial genes encoding immunogenic proteins that are under pressure of the host immune system and genes encoding surface proteins that are easily recognized by predators) while others are under strict sequence conservation pressure (e.g. genes encoding housekeeping enzymes and essential biological functions).

Importantly, in determining gene prevalence we distinguished between ubiquity and abundance. Ubiquity is one of the indicators of essentiality, while abundance without ubiquity is an indicator of adaptive, organism-specific or habitat-specific functionality. In other words, ubiquitous genes are assumed to be those that carry essential functions and are thus indispensable in every genome (elements of core genomes) or every ecosystem (eco-essential genes). On the other hand, genes that are overly abundant in a few ecosystems and absent in others are likely to play essential habitat-specific roles (e.g. photosynthesis, anaerobic metabolism, detoxification, etc.).
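To make the distinction concrete, the following minimal Python sketch tallies both quantities from a table of per-genome copy numbers. The function name, the data layout and the toy counts are hypothetical illustrations, not the SEED-based pipeline used in this study; abundance is taken as the total number of copies and ubiquity as the number of genomes (or metagenomes) with at least one copy.

```python
from collections import defaultdict


def abundance_and_ubiquity(counts_per_genome):
    """Tally abundance and ubiquity for each functional role.

    counts_per_genome maps a genome (or metagenome) name to a dict of
    {functional role: copy number}. This layout is an assumption made
    for illustration only.
    """
    abundance = defaultdict(int)  # total copies across all genomes
    ubiquity = defaultdict(int)   # genomes holding at least one copy
    for roles in counts_per_genome.values():
        for role, copies in roles.items():
            if copies > 0:
                abundance[role] += copies
                ubiquity[role] += 1
    return dict(abundance), dict(ubiquity)


# Hypothetical toy data: a multi-copy gene versus a single-copy core gene.
toy = {
    "genome_A": {"transposase": 12, "tRNA synthetase": 1},
    "genome_B": {"tRNA synthetase": 1, "photosystem II protein": 3},
    "genome_C": {"transposase": 30, "tRNA synthetase": 1},
}
abundance, ubiquity = abundance_and_ubiquity(toy)
```

In this toy table the transposase is the most abundant role but is missing from one genome, whereas the tRNA synthetase is present everywhere at a single copy, which is the pattern of "abundant but not ubiquitous" versus "ubiquitous but not abundant" described above.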

Contenders for the ‘fittest gene’ title include the gene encoding ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), an enzyme that plays a critical role in the fixation of carbon dioxide via the Calvin cycle and that has been touted as the single most successful, most abundant enzyme on the planet (11). Genes encoding ribosomal proteins are also plausible candidates. However, those are largely limited to cellular life forms, are non-essential in and almost absent from viruses (12), and are divergent between eukaryotes and prokaryotes. Additionally, DNA polymerase genes and other genes involved in DNA synthesis and nucleotide/nucleoside metabolism (e.g. ribonucleotide reductases, RNR) are essential for DNA-based life and are not restricted to cellular organisms, being found in viral genomes as well. Their essentiality favors them as strong competitors; yet, they are often present in one or a few copies per genome. To our surprise, none of the previous candidate genes topped the list of the most abundant, most ubiquitous genes. Instead, our analysis singled out genes encoding transposases as the most abundant genes in genomes and metagenomes, and the most ubiquitous in metagenomes.

ANALYSIS OF GENOMES AND METAGENOMES

To determine the most abundant non-hypothetical PEG, we examined almost 10 million annotated genes or gene tags: over 3.2 million PEGs in fully sequenced viral, bacterial, archaeal and eukaryotic genomes (2137 genomes on 1 May 2009) and over 6.7 million environmental gene tags (EGT)—with significant matches to known proteins—in 187 random community genomes (metagenomes). For functional assignments, we mostly relied on the annotations available in the SEED database (13) because it uses subsystems-based controlled vocabulary curated by human experts and automatically propagated among genomes (14). For consistency, the same SEED subsystems were used for the annotation of all metagenomic data sets described in this study (15).

Analysis of complete genome sequences

We screened 2137 complete genomes (47 archaeal, 725 bacterial, 29 eukaryotic and 1336 viral genomes at the time this study was performed) available in the SEED database (URL: http://seed-viewer.theseed.org) and identified 37 258 PEGs (1.163% of all PEGs) annotated as transposase-related. Out of these, 26 625 (0.825% of all PEGs) were explicitly annotated as ‘transposases’, 360 were annotated as ‘degenerate transposases’, and then there were a variety of insertion sequence-related transposases, which may or may not be functional. Even when these ambiguous annotations were excluded from the final counts, transposases remained the most abundant PEGs in the completely sequenced genomes (Figure 1).

Figure 1.
Abundance of different functional roles in 2137 genomes plotted against the ubiquity of these functional roles (defined as the number of genomes in which the functional role is represented at least once). r, Pearson’s product moment correlation ...

These data imply that out of a set of 2000 randomly sampled genes (the average number of genes in a typical bacterial genome), 22 genes are expected to encode transposases, at least 16 of which are likely functional. Obviously, genomes that have transposase genes tend to have them in multiple copies; this explains why although two-thirds of sequenced genomes (mostly viral) lack known functional transposases, the average number of transposases—when present—is 38.42 per genome (Table 1 and Supplementary Table S1). This observation is also in agreement with reports that transposases are unequally distributed among bacterial genomes, with higher abundance in facultative pathogens and free-living bacteria than in obligate pathogens and endosymbionts (16), and with extraordinarily high numbers in some species, e.g. Crocosphaera watsonii (17,18).
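The per-genome expectations quoted in this paragraph are simple proportions. The sketch below, with hypothetical function and variable names, recomputes them from the percentages reported for the complete genomes (1.163% transposase-related, 0.825% explicitly annotated as transposases). It also back-calculates how many genomes would carry transposases, assuming the mean of 38.42 refers to the 26 625 explicitly annotated transposases; that genome count is an inference from the reported numbers, not a value stated in the text.

```python
def expected_in_sample(percent_of_pegs, sample_size):
    """Expected number of genes of a given kind in a random sample of genes."""
    return percent_of_pegs / 100.0 * sample_size


# Percentages reported for the 2137 complete genomes.
explicit_per_2000 = expected_in_sample(0.825, 2000)  # explicit 'transposases'
related_per_2000 = expected_in_sample(1.163, 2000)   # all transposase-related

# Inference (assumption): if 26 625 transposases average 38.42 copies in the
# genomes that carry them, this many genomes carry at least one copy.
implied_genomes = 26625 / 38.42
fraction_lacking = 1 - implied_genomes / 2137

print(f"explicit transposases per 2000 genes: {explicit_per_2000:.1f}")
print(f"transposase-related per 2000 genes:   {related_per_2000:.1f}")
print(f"implied genomes with transposases:    {implied_genomes:.0f}")
print(f"implied fraction lacking them:        {fraction_lacking:.2f}")
```

Under these assumptions the explicit-transposase expectation comes out at about 16.5, matching "at least 16", and roughly two-thirds of genomes lack transposases, as stated. The transposase-related expectation comes out at about 23 rather than the 22 quoted; the text does not say which denominator or rounding produced 22.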

Table 1.
The 20 most abundant non-hypothetical protein-encoding genes in all sequenced genomes

While the abundance of transposase genes in microbial genomes has been recognized for a long time, only recently has it been exploited for inferring microbial cohabitation patterns and lateral gene transfer (19). Next to transposases, the most abundant functional roles in all sequenced genomes include ABC transporters, transcriptional regulators of different families, signal transduction kinases, chemotaxis proteins, acetyl- and glycosyl- transferases and cysteine desulfurase (Table 1 and Supplementary Table S1). On the other hand, the most ubiquitous functional roles in sequenced genomes are encoded by low-copy-number genes that consequently have a low overall abundance. Only four out of the 100 most ubiquitous functional genes have a mean copy number >2 per genome. These are genes encoding thioredoxin reductase; thioredoxin; cysteine desulfurase and the ABC transporter, ATP-binding protein (Supplementary Table S2). The list of most ubiquitous functional roles in genomes was topped by tRNA synthetases (Figure 1), and other genes associated with protein synthesis and post-translational protein sorting (e.g. translation elongation factor and preprotein translocase, Supplementary Table S2).

Analysis of metagenomic sequences

In spite of the striking prevalence and high copy numbers of transposase genes in fully sequenced genomes, the use of those data sets is prone to biases. The available genomes unevenly represent the tree of life as they mostly correspond to cultured organisms from just four bacterial phyla (9). Moreover, there is an over-representation of microbes of interest to humans (20), such as bacterial pathogens and microbes used in agriculture or industry (21). Finally, while viruses are at least 10 times as abundant as bacteria in nature (22,23), sequenced viral genomes are lagging behind both in terms of numbers (~2:1 viral to bacterial genome ratio) and annotation quality (most encoded proteins are of unknown functions). In contrast, analysis of community genomes (metagenomes) offers a less-biased representation of life forms and biological functions in various habitats.

The term ‘metagenome’ describes the collective genomes found in a particular ecosystem (24,25). Since the first uncultured viral community genomic sequences were published in 2002 (26), metagenomics has emerged as a rapid and efficient method of identifying not only the species present in a given ecosystem but also the ecosystem-associated metabolic signatures or patterns (27–31). The emergence of low-cost, high-throughput next-generation sequencing technologies (32–37) has enabled the quick implementation of metagenomics in the analysis of different environments, allowing an unprecedented view of biodiversity (25,38–42). Over the past few years, metagenomic sequencing has been used to explore a wide range of environments, encompassing various marine ecosystems (28,43–47), hydrothermal vents (48,49), corals (50–52), salterns (53,54), soil (55–57), sludge (58), mines (59), human and animal guts (60–64) and lungs (65), microbialites (66,67) and even mosquitoes (31).

Metagenomic analysis is shifting the paradigm from organism/genome-centric to gene-centric and pathway-centric approaches to understanding biodiversity (68,69). Several bioinformatics and statistical tools allow the metabolic reconstruction of a particular ecosystem by enumerating EGTs in metagenomes and binning them either phylogenetically or biochemically (15,68,70–73), as well as the comparison of multiple metagenomes (59,74,75).

In this study, we followed a gene-centric approach by enumerating EGTs, and estimating the abundance and ubiquity of their different functional roles in 187 different metagenomic samples representing a broad range of environments. Assessing EGT abundance in metagenomic data is slightly different from determining PEG frequency in fully sequenced genomes. In genomes, a single, full-length copy of a gene reflects a single occurrence of that gene in one cell of an organism. In metagenomic data, multiple occurrences of an EGT can be attributed to multiple copies of the same gene, multiple orthologs (from different genomes), multiple paralogs or just multiple sequences covering different parts of the exact same DNA segment. Moreover, the coding sequence length is a potential confounding factor: longer genes are more likely to be sampled by random sequencing (unless the sample is large enough to provide 100% coverage). For those reasons, the frequency of each EGT was normalized to the mean length of the most similar proteins [from BLASTX (76) results] to generate an abundance index, which was further divided by the number of informative sequence reads (those sequence tags matching annotated proteins in known databases) to generate a normalized abundance index (see the legend of Table 2 for more details).
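The two-step normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, protein length is assumed to be in amino acids, and any further scaling described in the legend of Table 2 is not reproduced here.

```python
def abundance_index(egt_hits, mean_protein_length):
    """EGT frequency corrected for gene length.

    mean_protein_length is the mean length of the most similar proteins
    (assumed here to be in amino acids).
    """
    if mean_protein_length <= 0:
        raise ValueError("mean_protein_length must be positive")
    return egt_hits / mean_protein_length


def normalized_abundance_index(egt_hits, mean_protein_length, informative_reads):
    """Abundance index further divided by the number of informative reads."""
    if informative_reads <= 0:
        raise ValueError("informative_reads must be positive")
    return abundance_index(egt_hits, mean_protein_length) / informative_reads


# Hypothetical example: the same number of hits to a short and a long protein
# in a metagenome with 10 000 informative reads.
short_gene = normalized_abundance_index(150, 300.0, 10_000)
long_gene = normalized_abundance_index(150, 1500.0, 10_000)
```

With equal raw hit counts, the shorter gene receives the higher index, which compensates for the greater chance of sampling a long gene by random sequencing. Dividing by the number of informative reads then makes metagenomes of different sizes comparable.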

Table 2.
The 20 most abundant functional roles in metagenomes

The metagenomic data sets, which have been sequenced by different research groups, have been analyzed, consistently annotated and made publicly available through the metagenomics RAST server [http://metagenomics.theseed.org (15)]. They include both free-living and metazoan-associated viral, bacterial and eukaryotic sequences from autotrophic and heterotrophic communities from a wide variety of environments. In the analyzed metagenomes, the two most abundant functional genes are related to transposable elements [transposase and the retrotransposon-related p150 protein (77)]. Next to these, a set of photosynthesis-related genes; genes encoding viral structural, nonstructural, capsid and integrase proteins; genes associated with DNA replication; and genes involved in DNA synthesis are all among the most abundant biological functions in environmental metagenomes (Table 2 and Supplementary Table S3).

Since gene abundance in metagenomes is sensitive to sampling bias and sequencing depth, we also combined ubiquity with abundance data. The combined analysis confirmed the prevalence of transposases (abundant in 95% of the analyzed metagenomes) over the retrotransposon-related p150 genes (overly abundant in only 36% of these metagenomes) and other replication and DNA metabolism-related genes that are equally ubiquitous but less abundant than transposases (Figure 2). The abundance of all analyzed non-hypothetical functions does not necessarily correlate with their ubiquity (Pearson correlation index = 0.524, Figure 2), i.e. many EGTs were pervasive in some ecosystems but absent in others (e.g. photosystem II proteins, p150 and viral structural genes; Table 2). Ubiquitous EGTs, on the other hand, include those matching transposases, DNA polymerases and enzymes involved in nucleotide metabolism (e.g. dTDP-glucose 4,6-dehydratase, UDP-glucose 4-epimerase and RNR; see Table 3 and Supplementary Table S4). Most of the ubiquitous EGTs are likely to be ‘housekeeping’ and essential for life, rather than habitat-specific (Figure 2). Additionally, many of these EGTs (e.g. DNA polymerases and RNRs) are found in all cellular and non-cellular biological entities, including viruses. As with genome sequence data, transposases are unequally distributed in ecosystems. This unequal distribution is in accordance with studies of ocean community genomics that showed a depth-dependent abundance of transposase genes (30) and a recent study that reported an unusually high abundance of transposase and retroviral integrase genes in a hydrothermal chimney biofilm (49).
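For readers who wish to reproduce this kind of comparison, Pearson's product moment correlation between the abundance and ubiquity of functional roles can be computed as in the sketch below. The function name and the toy input are hypothetical; the value of 0.524 reported above comes from the full data set, which is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient of two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: abundance index and ubiquity for four roles.
toy_abundance = [9.0, 7.5, 1.2, 0.8]
toy_ubiquity = [178, 67, 180, 175]
r = pearson_r(toy_abundance, toy_ubiquity)
```

A coefficient well below 1, as in the study, indicates that some roles are highly abundant in only a subset of samples while others are widespread at low copy number.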

Figure 2.
The normalized cumulative abundance indices (nCAI) of different functional roles in 187 metagenomes plotted against the ubiquity of these functional roles (defined as the number of metagenomes in which the functional role is represented at least once). ...

Table 3.
The 20 most ubiquitous functional roles in metagenomes

Other than the predominance of transposases, ABC transporter ATP-binding proteins and phage integrases (Table 2), there is little agreement in the gene abundance data between genomes and metagenomes (Tables 1 and 2). In genomic data, the most abundant functional roles reflect the over-representation of bacterial proteins in currently available fully sequenced genomes (2.5 million bacterial proteins versus 560 000 eukaryotic, 100 000 archaeal and 40 000 viral proteins). This bias may decrease when more viral genomes are sequenced and better annotated to reflect their actual distribution in nature. In metagenomic data, abundance indices reflect an overrepresentation of bacterial, archaeal and viral over eukaryotic sequences in currently available data sets; however, this overrepresentation is in agreement with reports that bacteria and archaea dominate the cellular world (78) while viruses are the most abundant biological entities (22,23).
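The extent of the bacterial over-representation in the genomic data can be checked with a short calculation. The sketch below uses the rounded protein counts quoted in this paragraph; the variable names are hypothetical, and the resulting shares are therefore approximate.

```python
# Rounded protein counts per domain, as quoted in the paragraph above.
protein_counts = {
    "bacterial": 2_500_000,
    "eukaryotic": 560_000,
    "archaeal": 100_000,
    "viral": 40_000,
}

total_proteins = sum(protein_counts.values())
shares = {domain: count / total_proteins for domain, count in protein_counts.items()}

for domain, share in shares.items():
    print(f"{domain:>10}: {share:.1%}")
```

The rounded counts sum to 3.2 million, in line with the "over 3.2 million PEGs" examined in the complete genomes, and bacterial proteins account for roughly 78% of that total.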

DISCUSSION

The main assumption of this study is that the most successful genes are likely to be prevalent in genomes and ecosystems. We defined the most prevalent gene as the one ‘spreading its DNA around’ and not the one expressing the most protein molecules. Thus, while RuBisCO, for example, is claimed as the most abundant enzyme on Earth (11) based on the estimated number of its protein molecules, its genes are neither the most abundant nor most widely distributed (Supplementary Table S5). In addition, we focused on PEGs and did not include genes encoding ribosomal RNA in the analysis; those are absent in viruses and usually present in multiple copies in cellular genomes [1–15, mean = 4, (79)], which would place them at the 12th rank in gene abundance in all sequenced genomes (compare with Table 1).

This study demonstrates that transposases are the most abundant genes in both completely sequenced genomes and environmental metagenomes, and are also the most ubiquitous in metagenomes. Transposase genes encode DNA-binding enzymes, members of the polynucleotidyl transferase superfamily, that catalyze ‘cut-and-paste’ or ‘copy-and-paste’ reactions promoting the movement of DNA segments to new sites (80). The term transposase is often used to describe what are classically known as DNA transposases or type II transposases. These move double-stranded DNA directly by excision and insertion, and are sometimes associated with insertion sequences, but often just catalyze their own mobilization (81,82). The major group of dsDNA transposases is known as DDE transposases due to their possession of a non-contiguous, highly conserved catalytic triad of two aspartate (D) and one glutamate (E) residues (83). Other protein families that essentially use transposition but lack the DDE motif include tyrosine and serine recombinases, and rolling-circle transposases (82). In addition, within these transposase subclasses, several protein family domains [PFam domains (84)] have been described (49,83), yet a large fraction of transposases identified in genomes and metagenomes remain unclassified.

There are two other classes of transposable elements (Types I and III) that are distinguished as separate categories and were not as abundant or ubiquitous as Type II transposases in our analyzed data sets. Type I includes retrotransposons, which use the enzyme retrotransposase to move DNA by reverse transcription of an RNA intermediate (85). Retrotransposases (Type I transposases) are suggested to be responsible for the majority of ‘junk’ repeats, which make up >40% of the human genome and seem to code for no other genes (86–88). Type III transposable elements are associated with miniature inverted-repeat transposable elements (MITEs) (89,90). Transposases, in general, and Type II transposases, in particular, constitute a highly diverse group of enzymes. It is difficult to provide a robust, consistent scheme for classifying transposase sequences in ecosystems; however, structure-based classification schemes are being developed (83).

The prevalence of transposons (Type II) and retrotransposons (Type I) in eukaryotic genomes has been well documented, but in these genomes they are mostly associated with non-coding, repetitive DNA (91–93). Moreover, Type II transposases are continuously being detected in bacterial, archaeal and, to a lesser extent, bacteriophage genomes. In this work, we demonstrate that these jumping genes are also almost omnipresent in every ecosystem that contains nucleic acid-based life forms.

OUTLOOK

Transposase genes have been classically considered as ‘selfish genes’ with no other purpose than spreading themselves and are thus expected to be universal DNA parasites (6,85). If this were their only raison-d’être, they have certainly fulfilled it by surviving, persisting and prevailing in all ecosystems. An open question is whether their ubiquity is also an indication of eco-essentiality. The finding that transposases are as ubiquitous as housekeeping DNA-processing enzymes but that they outnumber all essential genes (Figure 3) supports the idea that these mobile, self-replicating genes strive to inhabit and multiply in as many genomes as possible.

Figure 3.
Word clouds (created on http://www.wordle.net) representing (A) the 100 most abundant functional roles (Supplementary Table S3) and (B) the 100 most ubiquitous functional roles (Supplementary Table S4) in metagenomes. The font size of each functional ...

Besides the obvious detrimental effect that transposition can cause to host genomes—by inactivating housekeeping genes or impairing the chromosome’s structure—transposases also play beneficial roles (92). For example, transposases may mobilize or activate genes that enhance their hosts’ fitness (94,95), induce advantageous rearrangements (96) or enrich the host’s gene pool (97–100). There are accruing documented examples of transposase genes co-opted by the host to encode transcription factors (99), centromere-binding proteins (100) or generators of diversity in the immune system (97,98), a process described as exaptation [or domestication, from a host-centric view (94)]. Such cases can involve one or a few transposases per genome or, as more recently shown, thousands of transposases (95).

Despite their ubiquity and abundance, there is neither evidence nor reason to believe that transposases encode conserved essential cellular functions. In our opinion, the role of transposases as diversifying agents (94,101) is beneficial enough to be selected for; however, the cost of transposon-induced mutations also puts pressure on the cells to inactivate or delete their transposases (16,91,93,101).

In conclusion, the prevalence of transposases in metagenomes and completely sequenced genomes from bacteria, archaea, eukaryotes and viruses is in accordance with suggestions that they may offer a selective advantage to the genomes and ecosystems that they ‘parasitize’ (17,94,101). The diversification they induce in these genomes and ecosystems is arguably an essential way of maintaining, diversifying and evolving life on our planet.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Science Foundation, Division of Biological Infrastructure (DBI-0850356 to R.A.E. and DBI-0850206 to M.B.); the NMPDR project was supported by National Institutes of Health (HHSN266200400042C). Funding for open access charge: National Science Foundation, Division of Biological Infrastructure (DBI-0850356 to R.A.E.).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors thank Anca Segall, Elizabeth Dinsdale, Forest Rohwer, Peter Salamon, Jim Nulton and Ben Felts for stimulating discussions and helpful suggestions, and Moselio Schaechter and Stanley Maloy for valuable suggestions to improve the manuscript.

REFERENCES

1. Darwin C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: John Murray; 1859.
2. Huxley JS. Evolution: The Modern Synthesis. 1st edn. London: Harper; 1942.
3. Dawkins R. The Selfish Gene. Oxford: Oxford University Press; 1976.
4. Edgell DR, Fast NM, Doolittle WF. Selfish DNA: the best defense is a good offense. Curr. Biol. 1996;6:385–388.
5. Doolittle WF, Sapienza C. Selfish genes, the phenotype paradigm and genome evolution. Nature. 1980;284:601–603.
6. Orgel LE, Crick FH. Selfish DNA: the ultimate parasite. Nature. 1980;284:604–607.
7. Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134:341–352.
8. Koonin EV. Darwinian evolution in the light of genomics. Nucleic Acids Res. 2009;37:1011–1034.
9. Hugenholtz P. Exploring prokaryotic diversity in the genomic era. Genome Biol. 2002;3:REVIEWS0003.
10. Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, et al. A phylogeny-driven genomic encyclopaedia of bacteria and archaea. Nature. 2009;462:1056–1060.
11. Dhingra A, Portis AR Jr, Daniell H. Enhanced translation of a chloroplast-expressed RbcS gene restores small subunit levels and photosynthesis in nuclear RbcS antisense plants. Proc. Natl Acad. Sci. USA. 2004;101:6315–6320.
12. Kristensen DM, Mushegian AR, Dolja VV, Koonin EV. New dimensions of the virus world discovered through metagenomics. Trends Microbiol. 2010;18:11–19.
13. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33:5691–5702.
14. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, Formsma K, Gerdes S, Glass EM, Kubal M, et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75.
15. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics. 2008;9:386.
16. Ochman H, Davalos LM. The nature and dynamics of bacterial genomes. Science. 2006;311:1730–1733.
17. Mes TH, Doeleman M. Positive selection on transposase genes of insertion sequences in the Crocosphaera watsonii genome. J. Bacteriol. 2006;188:7176–7185.
18. Zehr JP, Bench SR, Mondragon EA, McCarren J, DeLong EF. Low genomic diversity in tropical oceanic N2-fixing cyanobacteria. Proc. Natl Acad. Sci. USA. 2007;104:17807–17812.
19. Hooper SD, Mavromatis K, Kyrpides NC. Microbial co-habitation and lateral gene transfer: what transposases can tell us. Genome Biol. 2009;10:R45.
20. Aziz RK. The case for biocentric microbiology. Gut. Pathogen. 2009;1:16.
21. Ahmed N. A flood of microbial genomes-do we need more? PLoS ONE. 2009;4:e5831.
22. Furuse K, Osawa S, Kawashiro J, Tanaka R, Ozawa A, Sawamura S, Yanagawa Y, Nagao T, Watanabe I. Bacteriophage distribution in human faeces: continuous survey of healthy subjects and patients with internal and leukaemic diseases. J. Gen. Virol. 1983;64(Pt 9):2039–2043.
23. Breitbart M, Rohwer F. Here a virus, there a virus, everywhere the same virus? Trends Microbiol. 2005;13:278–284.
24. Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem. Biol. 1998;5:R245–R249.
25. Riesenfeld CS, Schloss PD, Handelsman J. Metagenomics: genomic analysis of microbial communities. Annu. Rev. Genet. 2004;38:525–552.
26. Breitbart M, Salamon P, Andresen B, Mahaffy JM, Segall AM, Mead D, Azam F, Rohwer F. Genomic analysis of uncultured marine viral communities. Proc. Natl Acad. Sci. USA. 2002;99:14250–14255.
27. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004;428:37–43.
28. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74.
29. Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, et al. Comparative metagenomics of microbial communities. Science. 2005;308:554–557.
30. DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Frigaard NU, Martinez A, Sullivan MB, Edwards R, Brito BR, et al. Community genomics among stratified microbial assemblages in the ocean’s interior. Science. 2006;311:496–503.
31. Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, Furlan M, Desnues C, Haynes M, Li L, et al. Functional metagenomic profiling of nine biomes. Nature. 2008;452:629–632.
32. Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P. Real-time DNA sequencing using detection of pyrophosphate release. Anal. Biochem. 1996;242:84–89.
33. Ronaghi M. Pyrosequencing sheds light on DNA sequencing. Genome Res. 2001;11:3–11.
34. Bennett S. Solexa Ltd. Pharmacogenomics. 2004;5:433–438.
35. Kartalov EP, Quake SR. Microfluidic device reads up to four consecutive base pairs in DNA sequencing-by-synthesis. Nucleic Acids Res. 2004;32:2873–2879.
36. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380.
37. Schuster SC. Next-generation sequencing transforms today’s biology. Nat. Methods. 2008;5:16–18.
38. Schloss PD, Handelsman J. Biotechnological prospects from metagenomics. Curr. Opin. Biotechnol. 2003;14:303–310.
39. Handelsman J. Metagenomics: application of genomics to uncultured microorganisms. Microbiol. Mol. Biol. Rev. 2004;68:669–685.
40. Edwards RA, Rohwer F. Viral metagenomics. Nat. Rev. Microbiol. 2005;3:504–510.
41. Xu J. Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Mol. Ecol. 2006;15:1713–1731.
42. Casas V, Rohwer F. Phage metagenomics. Methods Enzymol. 2007;421:259–268.
43. Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, Carlson C, Chan AM, Haynes M, Kelley S, Liu H, et al. The marine viromes of four oceanic regions. PLoS Biol. 2006;4:e368.
44. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007;5:e16.
45. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007;5:e77.
46. McDaniel L, Breitbart M, Mobberley J, Long A, Haynes M, Rohwer F, Paul JH. Metagenomic analysis of lysogeny in Tampa Bay: implications for prophage gene expression. PLoS ONE. 2008;3:e3263.
47. Persson OP, Pinhassi J, Riemann L, Marklund BI, Rhen M, Normark S, Gonzalez JM, Hagstrom A. High abundance of virulence gene homologues in marine bacteria. Environ. Microbiol. 2009;11:1348–1357.
48. Grzymski JJ, Murray AE, Campbell BJ, Kaplarevic M, Gao GR, Lee C, Daniel R, Ghadiri A, Feldman RA, Cary SC. Metagenome analysis of an extreme microbial symbiosis reveals eurythermal adaptation and metabolic flexibility. Proc. Natl Acad. Sci. USA. 2008;105:17516–17521.
49. Brazelton WJ, Baross JA. Abundant transposases encoded by the metagenome of a hydrothermal chimney biofilm. ISME J. 2009;3:1420–1424.
50. Dinsdale EA, Pantos O, Smriga S, Edwards RA, Angly F, Wegley L, Hatay M, Hall D, Brown E, Haynes M, et al. Microbial ecology of four coral atolls in the Northern Line Islands. PLoS ONE. 2008;3:e1584.
51. Vega Thurber RL, Barott KL, Hall D, Liu H, Rodriguez-Mueller B, Desnues C, Edwards RA, Haynes M, Angly FE, Wegley L, et al. Metagenomic analysis indicates that stressors induce production of herpes-like viruses in the coral Porites compressa. Proc. Natl Acad. Sci. USA. 2008;105:18413–18418.
52. Vega Thurber R, Willner-Hall D, Rodriguez-Mueller B, Desnues C, Edwards RA, Angly F, Dinsdale E, Kelly L, Rohwer F. Metagenomic analysis of stressed coral holobionts. Environ. Microbiol. 2009;11:2148–2163.
53. Santos F, Meyerdierks A, Pena A, Rossello-Mora R, Amann R, Anton J. Metagenomic approach to the study of halophages: the environmental halophage 1. Environ. Microbiol. 2007;9:1711–1723.
54. Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M, Buchanan J, Desnues C, Dinsdale E, Edwards R, et al. Viral and microbial community dynamics in four aquatic environments. ISME J. 2010 [12 February 2010, Epub ahead of print] [PubMed]
55. Kim KH, Chang HW, Nam YD, Roh SW, Kim MS, Sung Y, Jeon CO, Oh HM, Bae JW. Amplification of uncultured single-stranded DNA viruses from rice paddy soil. Appl. Environ. Microbiol. 2008;74:5975–5985. [PMC free article] [PubMed]
56. Pang H, Zhang P, Duan CJ, Mo XC, Tang JL, Feng JX. Identification of cellulase genes from the metagenomes of compost soils and functional characterization of one novel endoglucanase. Curr. Microbiol. 2009;58:404–408. [PubMed]
57. Zhang K, He J, Yang M, Yen M, Yin J. Identifying natural product biosynthetic genes from a soil metagenome by using T7 phage selection. Chembiochem. 2009;10:2599–2606. [PubMed]
58. Kunin V, He S, Warnecke F, Peterson SB, Garcia Martin H, Haynes M, Ivanova N, Blackall LL, Breitbart M, Rohwer F, et al. A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Res. 2008;18:293–297. [PMC free article] [PubMed]
59. Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, Saar MO, Alexander S, Alexander E.C., Jr, Rohwer F. Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics. 2006;7:57. [PMC free article] [PubMed]
60. Breitbart M, Hewson I, Felts B, Mahaffy JM, Nulton J, Salamon P, Rohwer F. Metagenomic analyses of an uncultured viral community from human feces. J. Bacteriol. 2003;185:6220–6223. [PMC free article] [PubMed]
61. Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. Metagenomic analysis of the human distal gut microbiome. Science. 2006;312:1355–1359. [PMC free article] [PubMed]
62. Frank DN, Pace NR. Gastrointestinal microbiology enters the metagenomics era. Curr. Opin. Gastroenterol. 2008;24:4–10. [PubMed]
63. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457:480–484. [PMC free article] [PubMed]
64. Tuohy KM, Gougoulias C, Shen Q, Walton G, Fava F, Ramnani P. Studying the human gut microbiota in the trans-omics era–focus on metagenomics and metabonomics. Curr. Pharm. Des. 2009;15:1415–1427. [PubMed]
65. Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, Silva J, Tammadoni S, Nosrat B, Conrad D, Rohwer F. Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS ONE. 2009;4:e7370. [PMC free article] [PubMed]
66. Desnues C, Rodriguez-Brito B, Rayhawk S, Kelley S, Tran T, Haynes M, Liu H, Furlan M, Wegley L, Chau B, et al. Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature. 2008;452:340–343. [PubMed]
67. Breitbart M, Hoare A, Nitti A, Siefert J, Haynes M, Dinsdale E, Edwards R, Souza V, Rohwer F, Hollander D. Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Cienegas, Mexico. Environ. Microbiol. 2009;11:16–34. [PubMed]
68. Eisen JA. Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes. PLoS Biol. 2007;5:e82. [PMC free article] [PubMed]
69. Hugenholtz P, Tyson GW. Microbiology: metagenomics. Nature. 2008;455:481–483. [PubMed]
70. Angly F, Rodriguez-Brito B, Bangor D, McNairnie P, Breitbart M, Salamon P, Felts B, Nulton J, Mahaffy J, Rohwer F. PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinformatics. 2005;6:41. [PMC free article] [PubMed]
71. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res. 2008;36:2230–2239. [PMC free article] [PubMed]
72. Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P. A bioinformatician’s guide to metagenomics. Microbiol. Mol. Biol. Rev. 2008;72:557–578. [PMC free article] [PubMed]
73. Schloss PD, Handelsman J. A statistical toolbox for metagenomics: assessing functional diversity in microbial communities. BMC Bioinformatics. 2008;9:34. [PMC free article] [PubMed]
74. Rodriguez-Brito B, Rohwer F, Edwards RA. An application of statistics to comparative metagenomics. BMC Bioinformatics. 2006;7:162. [PMC free article] [PubMed]
75. Huson DH, Richter DC, Mitra S, Auch AF, Schuster SC. Methods for comparative metagenomics. BMC Bioinformatics. 2009;10(Suppl. 1):S12. [PMC free article] [PubMed]
76. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
77. Sassaman DM, Dombroski BA, Moran JV, Kimberland ML, Naas TP, DeBerardinis RJ, Gabriel A, Swergold GD, Kazazian H.H., Jr. Many human L1 elements are capable of retrotransposition. Nat. Genet. 1997;16:37–43. [PubMed]
78. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: the unseen majority. Proc. Natl Acad. Sci. USA. 1998;95:6578–6583. [PMC free article] [PubMed]
79. Lee ZM, Bussema C, 3rd, Schmidt TM. rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res. 2009;37:D489–D493. [PMC free article] [PubMed]
80. Rice PA, Baker TA. Comparative architecture of transposase and integrase complexes. Nat. Struct. Biol. 2001;8:302–307. [PubMed]
81. Craig NL, Craigie R, Gellert M, Lambowitz AM. Mobile DNA II. Washington, DC: ASM press; 2002.
82. Curcio MJ, Derbyshire KM. The outs and ins of transposition: from mu to kangaroo. Nat. Rev. Mol. Cell. Biol. 2003;4:865–877. [PubMed]
83. Hickman AB, Chandler M, Dyda F. Integrating prokaryotes and eukaryotes: DNA transposases in light of structure. Crit. Rev. Biochem. Mol. Biol. 2010;45:50–69. [PMC free article] [PubMed]
84. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. 2008;36:D281–D288. [PMC free article] [PubMed]
85. Wright S, Finnegan D. Genome evolution: sex and the transposable element. Curr. Biol. 2001;11:R296–R299. [PubMed]
86. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. [PubMed]
87. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. The sequence of the human genome. Science. 2001;291:1304–1351. [PubMed]
88. Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 2009;10:691–703. [PMC free article] [PubMed]
89. Wessler SR, Bureau TE, White SE. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 1995;5:814–821. [PubMed]
90. Feschotte C, Mouches C. Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from the Arabidopsis thaliana genome has arisen from a pogo-like DNA transposon. Mol. Biol. Evol. 2000;17:730–737. [PubMed]
91. Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N. Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 2003;4:R74. [PMC free article] [PubMed]
92. Mills RE, Bennett EA, Iskow RC, Devine SE. Which transposable elements are active in the human genome? Trends Genet. 2007;23:183–191. [PubMed]
93. Pace J.K., 2nd, Feschotte C. The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res. 2007;17:422–432. [PMC free article] [PubMed]
94. Benjak A, Forneck A, Casacuberta JM. Genome-wide analysis of the “cut-and-paste” transposons of grapevine. PLoS ONE. 2008;3:e3107. [PMC free article] [PubMed]
95. Nowacki M, Higgins BP, Maquilan GM, Swart EC, Doak TG, Landweber LF. A functional role for transposases in a large eukaryotic genome. Science. 2009;324:935–938. [PMC free article] [PubMed]
96. Mendiola MV, Bernales I, de la Cruz F. Differential roles of the transposon termini in IS91 transposition. Proc. Natl Acad. Sci. USA. 1994;91:1922–1926. [PMC free article] [PubMed]
97. Agrawal A, Eastman QM, Schatz DG. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature. 1998;394:744–751. [PubMed]
98. Hiom K, Melek M, Gellert M. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell. 1998;94:463–470. [PubMed]
99. Lin R, Ding L, Casola C, Ripoll DR, Feschotte C, Wang H. Transposase-derived transcription factors regulate light signaling in Arabidopsis. Science. 2007;318:1302–1305. [PMC free article] [PubMed]
100. Casola C, Hucks D, Feschotte C. Convergent domestication of pogo-like transposases into centromere-binding proteins in fission yeast and mammals. Mol. Biol. Evol. 2008;25:29–41. [PMC free article] [PubMed]
101. Condit R, Stewart FM, Levin BR. The population biology of bacterial transposons: A priori conditions for maintenance as parasitic DNA. Am. Nat. 1988;132:129–147.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press