Proc Natl Acad Sci U S A. Feb 19, 2008; 105(7): 2510–2515.
Published online Feb 11, 2008. doi: 10.1073/pnas.0711165105
PMCID: PMC2268167

Genome evolution in cyanobacteria: The stable core and the variable shell


Cyanobacteria are the only known prokaryotes capable of oxygenic photosynthesis, the evolution of which transformed the biology and geochemistry of Earth. The rapid increase in published genomic sequences of cyanobacteria provides the first opportunity to reconstruct events in the evolution of oxygenic photosynthesis on the scale of entire genomes. Here, we demonstrate the overall phylogenetic incongruence among 682 orthologous protein families from 13 genomes of cyanobacteria. However, using principal coordinates analysis, we discovered a core set of 323 genes with similar evolutionary trajectories. The core set is highly conserved in amino acid sequence and contains genes encoding the major components in the photosynthetic and ribosomal apparatus. Many of the key proteins are encoded by genome-wide conserved small gene clusters, which often are indicative of protein–protein, protein–prosthetic group, and protein–lipid interactions. We propose that the macromolecular interactions in complex protein structures and metabolic pathways retard the tempo of evolution of the core genes and hence exert a selection pressure that restricts piecemeal horizontal gene transfer of components of the core. Identification of the core establishes a foundation for reconstructing robust organismal phylogeny in genome space. Our phylogenetic trees constructed from 16S rRNA gene sequences, concatenated orthologous proteins, and the core gene set all suggest that the ancestral cyanobacterium did not fix nitrogen and probably was a thermophilic organism.

Keywords: horizontal (lateral) gene transfer, oxygenic photosynthesis, gene family, nitrogen fixation

Oxygenic photosynthesis is arguably the most important biological process on Earth. Approximately 2.3 billion years ago (Ga) (1–4), this energy transduction pathway transformed Earth's atmosphere and upper ocean, ultimately facilitating the development of complex life forms that depend on aerobic metabolism (5–7). Cyanobacteria are widely accepted as the progenitors of oxygenic photosynthesis, and the clade has evolved into one of the largest and most diverse groups of bacteria on the planet (8). Cyanobacteria contribute significantly to global primary production (9, 10), and diazotrophic taxa are central to the global nitrogen cycle (11–13). Arguably, no other prokaryotic group has had a greater impact on the biogeochemistry and evolutionary trajectory of Earth, yet its own evolutionary history is poorly understood.

The availability of complete genomes of related organisms provides the first opportunity to reconstruct events of genome evolution through the analysis of entire functional classes of genes (14). Currently, cyanobacteria represent one of the densest clusters of fully sequenced genomes [supporting information (SI) Table 1]. Comparisons of the genome sequences of closely related marine Prochlorococcus and Synechococcus species have demonstrated an intimate link between genome divergence in specific strains and their physiological adaptations to different oceanic niches (15, 16). This ecotypic flexibility appears to be driven by myriad selective pressures that govern genome size, GC content, gene gains and losses, and rate of evolution (17, 18). Moreover, phylogenetic analyses of genes shared by all five known phyla of photosynthetic bacteria, namely cyanobacteria, purple bacteria (Proteobacteria), green sulfur bacteria (Chlorobi), green filamentous bacteria (Chloroflexi), and Gram-positive heliobacteria (Firmicutes), have provided important insights into the origin and evolution of photosynthesis, a subject intensively debated over the past decades (19–29). This information has been substantially extended by genome-wide comparative informatics (30–32). One of the major implications of the latter work is a significant extent of horizontal gene transfer (HGT) among these photosynthetic bacteria. The observation that cyanophages sometimes carry photosynthetic genes (33–35) provides one mechanism for rapid HGT among these phyla. However, HGT almost certainly does not occur with equal probability for all genes. For example, informational genes (those involved in transcription, translation, and related processes), which are thought to have more macromolecular interactions than operational genes (those involved in housekeeping), are postulated to be seldom transferred (36, 37).
The existence of a core of genes that remain closely associated and resistant to HGT has been reported in recent studies using relatively dense taxon sampling (38, 39). Identification of such core genes potentially allows true phylogenetic signal to be separated from "noise." It is therefore of considerable interest to translate the full body of genome data into pertinent phylogenetic information and to identify which genes are more susceptible to HGT.

Here, we report the identification and phylogenetic reconstruction of 682 orthologs from 13 cyanobacterial genomes. Our primary goals are twofold: (i) to examine the impact of HGT on the evolution of photosynthesis and the radiation of cyanobacterial lineages; and (ii) to identify a core set of genes that are resistant to HGT, on which a robust organismal phylogeny can be reconstructed. Our results reveal that >52% (359) of the orthologs are susceptible to HGT within the cyanobacterial phylum and hence are responsible for the inconsistent phylogenetic signal of this taxon in genome space. In contrast, the remaining 323 orthologs show broad phylogenetic agreement. This core set includes key photosynthetic and ribosomal proteins. This observation suggests that the macromolecular interactions in complex protein structures (e.g., ribosomal proteins) and metabolic pathways (e.g., oxygenic photosynthesis) strongly resist piecemeal HGT. Transfer was ultimately accomplished by wholesale incorporation of cyanobacteria into eukaryotic host cells, giving rise to primary photosynthetic endosymbionts that retained both photosynthetic genes and genes coding for their own ribosomes (40–44).


Results

Conserved Protein Families in Genomes of Cyanobacteria.

Our pairwise genome comparison reveals a total of 682 orthologs common to all 13 genomes examined (SI Table 2). These orthologs constitute the conserved gene set shared by all 13 genomes, and some of them define aspects of the genotype that are uniquely cyanobacterial. This set represents only 8.9% (in the largest genome, Nostoc punctiforme) to 39.7% (in the smallest genome, Prochlorococcus marinus MED4) of the protein-coding genes in each genome under study (SI Table 1), yet it appears to account for all of the principal functions (SI Table 3). Our estimate of the pool of orthologs is similar to that obtained from 10 cyanobacterial genomes (45), but nearly three times the number of cyanobacterial signature genes identified bioinformatically by Martin et al. (46) and only 65% of the number of cyanobacterial clusters of orthologous groups (31). These discrepancies arise mainly, in the case of the former, from a filtering procedure that removed homologs found in chloroplasts and anoxygenic photoautotrophs and, in the case of the latter, from the less-stringent unidirectional BLAST hit scheme used. In addition, some of the genomes used in this study are incomplete and still awaiting confirmation by final assembly, so equivalent genes may have been overlooked in some cases. It is quite possible that the reciprocal best BLAST hit scheme, because its criterion is overly restrictive (47), underestimates the actual number of orthologs even when no particular threshold is imposed (e.g., the default BLAST e value threshold of 10) (18).

Phylogenetic Incongruence Among Conserved Protein Families.

Based on amino acid sequences, we built phylogenetic trees for each of the 682 orthologous protein families, using both neighbor joining (NJ) and maximum likelihood (ML) methods. Surprisingly, the frequency distribution of observed topologies reveals no predominant topology shared by a large number of orthologs (Fig. 1). Instead, most of the orthologs (58% and 67% for NJ and ML, respectively) exhibit their own unique topologies. As a result, the most common topology accounts for only 1.9–2.1% of the orthologous datasets (Fig. 1).
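Counting how many gene trees share a topology requires a representation in which two rootings of the same unrooted tree compare equal. One standard choice is the set of non-trivial bipartitions (splits), each oriented away from a fixed reference taxon. The sketch below is only an illustration: it assumes trees arrive as nested tuples of taxon names (a hypothetical input format, not the authors' pipeline), and the function names are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Tuple, Union

# Hypothetical input format: a tree is a taxon name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> list:
    """Return the leaf set of every internal node, including the root."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def topology_key(tree: Tree) -> FrozenSet[FrozenSet[str]]:
    """Encode the unrooted topology as its set of non-trivial bipartitions.

    Each bipartition is stored as the side that excludes a fixed reference
    taxon, so different rootings of one unrooted tree yield the same key.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    key = set()
    for clade in clades(tree):
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (terminal) splits
            key.add(frozenset(side))
    return frozenset(key)


def topology_frequencies(trees) -> Counter:
    """Count how many gene trees share each unrooted topology."""
    return Counter(topology_key(t) for t in trees)


if __name__ == "__main__":
    gene_trees = [
        ((("A", "B"), "C"), ("D", "E")),
        (("A", "B"), ("C", ("D", "E"))),  # same unrooted topology, rooted elsewhere
        (("A", "C"), ("B", ("D", "E"))),
    ]
    counts = topology_frequencies(gene_trees)
    top = max(counts.values())
    print(f"{len(counts)} topologies; most common shared by {top / len(gene_trees):.0%}")
```

Dividing the largest count by the number of gene trees gives the kind of fraction reported above for the most common topology.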

Fig. 1.
Distribution of tree topologies among 682 sets of orthologs. Both NJ (black bars) and ML (red bars) tree topologies give similar distribution patterns. There is no unanimous support for a single topology; rather, most of the orthologs (58% and 67% for ...

Phylogenomic Reconcilement.

To determine whether a common signal can be extracted despite this phylogenetic incongruence, we used three approaches: a consensus tree, a supertree, and a phylogeny reconstructed from the concatenation of all 682 individual proteins. These approaches greatly reduce the topological incongruence, yielding the five topologies shown in Fig. 2. Specifically, the NJ and ML trees based on the concatenated sequences give three topologies in total (T1 and T2 for NJ; T2 and T3 for ML), one of which agrees with the 16S ribosomal RNA (rRNA) gene tree obtained repeatedly with NJ, ML, and maximum parsimony methods. The consensus and supertree built from the 682 individual NJ trees show two other topologies (T4 and T5), whereas those built from the ML trees recover a topology identical to one of the concatenated ML trees. These five topologies are remarkably similar: Synechocystis sp. PCC6803 and five diazotrophic species form a monophyletic clade, and Synechococcus sp. WH8102 and three Prochlorococcus ecotypes form three different monophyletic clades. The notable conflicts concern Synechococcus elongatus PCC7942 and the thermophilic Thermosynechococcus elongatus BP-1, which tend to cluster at the base of the two major subgroups but whose placements vary among topologies.
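Of the three reconciliation approaches, the consensus tree is the simplest to illustrate. Below is a minimal sketch of a majority-rule consensus, assuming the same hypothetical nested-tuple tree format: it keeps every split present in more than half of the gene trees. The text does not specify the consensus variant used for the 682 trees (details are in SI Methods), so this shows the general technique rather than the authors' exact procedure; function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # hypothetical nested-tuple tree format


def leaves(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree, taxa: FrozenSet[str]) -> set:
    """Non-trivial bipartitions of `tree`, each oriented away from min(taxa)."""
    ref = min(taxa)
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:
            found.add(frozenset(side))
        stack.extend(node)
    return found


def majority_rule_splits(trees, threshold: float = 0.5) -> Dict[FrozenSet[str], float]:
    """Splits found in more than `threshold` of the trees, with their frequencies.

    Splits present in a strict majority of trees are mutually compatible,
    so together they define a single (possibly multifurcating) consensus tree.
    """
    taxa = leaves(trees[0])
    counts = Counter()
    for tree in trees:
        if leaves(tree) != taxa:
            raise ValueError("all trees must contain the same taxa")
        counts.update(splits(tree, taxa))
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}
```

The returned frequencies play the role of support values on the consensus branches; thresholds below 0.5 lose the compatibility guarantee.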

Fig. 2.
Representative backbone tree topologies. Phylogenetic trees were constructed by using both 16S rRNA gene and orthologous proteins through phylogenomic approaches (see Materials and Methods for details). Phylogenetic tree construction methods are highlighted ...

Analyses of the fit of each topology to the 682 sequence alignments (SI Figs. 6 and 7) indicate that almost all (97.5–99.6%) of the datasets fail to reject topologies T1–T5 at the 5% significance level, suggesting a lack of resolution in single-gene phylogenies.

The Stable Core and the Variable Shell in Genome Space.

To extract the evolutionary trajectories least affected by unrecognized paralogs or by genes potentially obtained through HGT, we calculated tree distances between all possible pairs of the orthologous sets. The pairwise distances were then used in a principal coordinates analysis (PCoA). This analysis identifies a core set of 323 genes that share similar evolutionary histories (i.e., coevolving and rarely transferred), as opposed to the other 359, which exhibit divergent phylogenies (i.e., independently evolving and frequently transferred) (Fig. 3A). Almost all ribosomal proteins fall in the densest part of the core, whereas the much sparser region of the cloud is formed largely by operational and nonribosomal informational genes. The core also includes proteins constituting the scaffold of the photosynthetic apparatus and, at least partially, proteins that participate in ATP synthesis, chlorophyll biosynthesis, and the Calvin cycle. Using an "embedded quartets" approach that detects HGT events with significantly improved resolution, Zhaxybayeva et al. (32) found that some of the major photosynthetic genes were subject to HGT and that the bias toward transfers of metabolic (operational) genes was detectable only in transfers between cyanobacteria and other phyla. The apparent conflict between our analysis and that of Zhaxybayeva et al. is almost certainly due to methodological differences.
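PCoA turns the matrix of pairwise tree distances into coordinates whose Euclidean distances approximate the originals, so that trees with similar histories fall close together. The sketch below is classical (Torgerson/Gower) scaling in NumPy, offered under the assumption that it approximates what the SAS multidimensional scaling procedure named in Materials and Methods computed; SAS options may differ, and the ellipse used to delimit the core is not reproduced. The function name is hypothetical.

```python
import numpy as np


def pcoa(dist, k: int = 2):
    """Classical principal coordinates analysis of a distance matrix.

    dist: symmetric (n, n) matrix of pairwise distances between trees.
    Returns (coordinates of shape (n, k), the k largest eigenvalues).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.full((n, n), 1.0 / n)  # centring matrix J
    b = -0.5 * centre @ (d ** 2) @ centre          # double-centred Gower matrix
    evals, evecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]            # keep the k largest axes
    evals, evecs = evals[order], evecs[:, order]
    # Negative eigenvalues (from non-Euclidean distances) are clipped to zero.
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return coords, evals
```

Plotting `coords[:, 0]` against `coords[:, 1]` gives the two-axis view used for Fig. 3A; the eigenvalues indicate how much of the variation each axis captures.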

Fig. 3.
PCoA of trees compared with topological distance. (A) Plot of the first two axes of the PCoA computed from 628 ML trees. The other 54 genes are excluded as a result of axis demarcation. The same experiment with NJ trees gave very similar results. The ellipse ...

Using the sum of amino acid substitutions per site over all branches of a tree as a "rough-and-ready" measure of protein variability (48), we compared the rates of evolution of genes in different functional categories. Both the PCoA (Fig. 3B) and the frequency distribution of protein variability (Fig. 4) reveal that ribosomal and photosynthetic genes are highly conserved, whereas operational and other informational genes are strongly skewed toward high protein variability. High protein sequence conservation is significantly biased toward genes in the core and genes encoded in genome-wide conserved gene clusters (Fig. 4), most likely because these categories contain many ribosomal and photosynthetic genes. This result suggests that the core gene set has remained relatively stable throughout the evolutionary history of cyanobacteria, whereas genes in the shell are much more likely to have been acquired via HGT.
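The variability measure above is simply the total branch length of each gene tree, and Fig. 4 bins genes in 1.0-wide intervals of it. A small sketch, assuming gene trees are available as Newick strings with branch lengths in substitutions per site (hypothetical helper names; not Rujan and Martin's own code):

```python
import math
import re
from collections import Counter

# Matches the branch length after each ':' in a Newick string, e.g. ":0.125",
# ":1.5e-3", or a negative NJ length such as ":-0.01". Support values written
# as node labels are not preceded by ':' and are therefore ignored.
BRANCH_LENGTH = re.compile(r":\s*(-?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def tree_length(newick: str) -> float:
    """Total substitutions per site summed over all branches of a tree."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


def variability_histogram(lengths) -> Counter:
    """Count genes per 1.0-wide interval of protein variability (0-1, 1-2, ...)."""
    return Counter(math.floor(value) for value in lengths)
```

Applying `tree_length` to each of the 682 gene trees and passing the results to `variability_histogram` yields counts per interval; splitting those counts by functional category would give a distribution of the kind plotted in Fig. 4.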

Fig. 4.
Frequency distribution of genes belonging to designated categories within each 1.0 interval of protein variability. Protein variability was measured according to Rujan and Martin's method (48). Dashed lines denote the threshold that segregates the predominance ...

We further reconstructed the phylogeny of the 13 genomes from a superalignment of 100,776 sites obtained by concatenating the 323 core proteins. All three methods we used yield a tree with the same topology as that obtained from the consensus, supertree, and concatenation of all 682 protein families (T3 in Fig. 2; tree presented in Fig. 5). This topology differs only slightly from the other tested topologies, which are not rejected by most individual alignments, but it has superior likelihood support (Fig. 2). Intriguingly, all of the diazotrophic cyanobacteria fall within a distinct group, and their divergence from the nondiazotrophic taxa appears to have occurred well after the origin of the clade, based on rooting with Gloeobacter violaceus PCC7421, most likely the earliest lineage in the radiation of cyanobacteria (49), and T. elongatus BP-1, a unicellular thermophilic cyanobacterium that inhabits hot springs. The early diazotrophic cyanobacteria appear to have been nonheterocystous, with heterocyst-forming lineages emerging later, possibly in response to elevated levels of atmospheric O2 (50).

Fig. 5.
Phylogenetic tree reconstructed based on the concatenation of the 323 core proteins. The topology shown agrees with the consensus topology of the 682 orthologs (T3 in Fig. 2) and is supported by almost all individual datasets (Fig. 2 and SI Fig. 7). Bootstrap ...


Discussion

Our analyses reveal overwhelming phylogenetic discordance among the genes selected as likely orthologs (Fig. 1). Conflicting phylogenies can result from artifacts of phylogenetic reconstruction, from HGT, or from unrecognized paralogy. In our reciprocal best hit approach, we retained as orthologs only families containing exactly one gene per species. Therefore, only orthologous replacement and hidden paralogy (i.e., differential loss of the two copies in two lineages) can occur in the selected families, and both types of event are expected to be comparatively rare under the reciprocal best hit criterion (51). Thus, the phylogenetic incongruence is unlikely to be an artifact of biased ortholog selection. Nor does the overall phylogenetic disagreement appear to be caused by tree reconstruction or model selection artifacts, because both NJ and ML individual trees unambiguously support multiple partitions (Fig. 2). HGT is likely one of the most important forces driving the discordant evolutionary histories of the conserved protein families. Indeed, HGT has played an important role in the evolution of prokaryotic genomes (52–54). A hallmark of HGT is that transferred genes often exhibit aberrant organismal distributions that contrast with the relationships inferred from both the 16S rRNA gene tree and phylogenies of vertically inherited protein-coding genes. But how can such superficially random gene transfer events be reconciled with the conservation of many of the key genes that make up the functional core across all cyanobacterial taxa?

Although phylogenomic approaches can capture the consensus or frequent partitions that outline trends in genome evolution, they do not necessarily eliminate conflicting phylogenetic signal in genome space. The plural support for the consensus/supertree/concatenation topologies indicates that the five top topologies are not significantly different from each other; that is, >90% of the datasets do not discriminate among them (Fig. 2). Do the consensus/supertree/concatenation trees accurately reflect organismal history, or do they instead blur the vertical inheritance signal by incorporating potential HGTs? The margin of uncertainty is large. Part of the uncertainty may be due to the limited power of the Shimodaira–Hasegawa (SH) test (55), especially when comparing similar topologies. Indeed, the SH test was applied to only 15 of the 13,749,310,575 possible unrooted tree topologies for 13 species (SI Figs. 6 and 7). Although the majority of possible topologies would not be supported by any dataset, the selection of a limited number of trees may have biased the analyses. Part of the uncertainty can also be attributed to the data, most notably to genes subject to HGT and to homologous recombination between closely related species. This is even more pronounced in the PCoA, which demonstrates clearly that ≈53% of the orthologs are subject to HGT that complicates or dilutes the vertical inheritance signal within the cyanobacterial phylum (Fig. 3).
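The figure of 13,749,310,575 candidate topologies follows from the standard count of unrooted, fully resolved trees for n labelled taxa, the double factorial (2n − 5)!!. A short check of that arithmetic (the function name is hypothetical):

```python
def unrooted_binary_topologies(n_taxa: int) -> int:
    """Number of distinct unrooted, fully resolved topologies for n labelled taxa.

    Equals the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5) for n >= 3.
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


if __name__ == "__main__":
    print(f"{unrooted_binary_topologies(13):,}")  # 13,749,310,575
```

Each added taxon multiplies the count by the number of branches it could attach to, which is why exhaustive likelihood testing is infeasible and only a short list of candidate topologies was evaluated.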

Our results reveal that both photosynthetic and ribosomal genes share similar evolutionary histories and belong to the cyanobacterial genome core (Fig. 3). This finding of limited HGT in proteins with extraordinarily conserved primary structure is consistent with the complexity hypothesis: genes coding for large complex systems with many macromolecular interactions are less subject to HGT than genes coding for small assemblies of a few gene products (37). Translation in prokaryotes requires the coordinated assembly of at least 100 gene products, including ribosomal small- and large-subunit proteins, which interact with 5S, 16S, and 23S rRNA; numerous tRNAs and mRNAs; initiation and termination factors; ions; etc. Similarly, the oxygenic photosynthetic apparatus requires a huge number of proteins, pigments, cofactors, and trace elements to function effectively. All of the components required by both machines are presumed to be present already in a potential host, so a transferred gene product must fit into an established network of interactions; this complexity of gene product interactions is a significant factor restricting the successful HGT of these genes relative to the high HGT rates observed for operational genes. It is noteworthy, however, that not all photosynthetic genes are significantly resistant to HGT. Photosynthetic genes outside the core, including genes encoding proteins whose functions have yet to be confirmed (ycf) and genes for proteins that tend to form the periphery (i.e., supplemental "add-ons") of the photosynthetic scaffold, may be less critical to biophysical interactions and hence more readily transferred between cyanobacteria than the large integral membrane proteins that belong to the functional core of the photosynthetic apparatus. Membrane protein interactions appear to continue to limit the transfer of core photosynthetic genes to the nucleus in higher plants and algae, even after the endosymbiotic event.
This is supported by the fact that the proteins whose genes are most resistant to transfer to the nucleus constitute the functional physical core of the photosynthetic apparatus (56). In contrast, some genes that belong to the peripheral scaffold of photosynthesis, for example, petC and psbO, have been transferred to the nucleus, as have thousands of other easily transferred cyanobacterial genes (57). A striking feature of these HGT-resistant components is that they tend to cluster together in putative operons of two to four genes that are conserved among all cyanobacteria and plastids (58). The mechanism underpinning this conservation of gene order is unknown. It could confer an advantage in gene expression through coordinated transcription of the genes and assembly of the subunits of a multimeric complex. However, it is more likely that protein–protein, protein–cofactor, and protein–membrane interactions exert strong selection pressure to maintain synteny, reducing the chance of perturbation by HGT via genetic recombination (58, 59). These interactions govern not only the conservation of gene order but also the tempo of evolution of these genes (Figs. 3 and 4). There appears to be a link between the tempo of evolution and resistance to HGT: the probability of HGT increases as the amino acid sequence conservation of a gene product decreases (48). Moreover, the complexity of the oxygenic photosynthetic machinery makes it difficult to transfer its components piecemeal to nonphotosynthetic prokaryotes. Indeed, because the photosynthetic apparatus is encoded in split operons, transferring it would require many independent transfers of noncontiguous operons. Although large-scale HGT among photosynthetic prokaryotes (30) may suggest a complex, nonlinear process of evolution that results in a mosaic structure of the photosynthetic pathway (60), transfer of the key photosynthetic genes is very rare (33, 34).
Transfer of this key pathway was achieved only by wholesale incorporation of cyanobacteria into eukaryotic host cells (40–44).

Phylogenetic analysis of the cyanobacterial genome core strongly suggests that the last common ancestor of extant cyanobacteria was incapable of N2 fixation (Fig. 5). This metabolic pathway appears to have been acquired via HGT well after the origin of the clade, possibly as a result of a "fixed nitrogen crisis" in the late Archean and early Proterozoic eons (61). In this scenario, the accumulation of a small concentration of oxygen (resulting from oxygenic photosynthesis) would have led to massive denitrification of the upper ocean, with a concomitant loss of the fixed inorganic nitrogen available for growth of marine photoautotrophs. This process created an evolutionary bottleneck that potentially selected for the stable transfer of the nif operon from a (presumably) heterotrophic prokaryote to cyanobacteria. Note that selection of the specific nitrogenase was probably not related to metal availability, because Fe was abundant under these mildly oxidizing conditions (62). Only after the "great oxidation event," ≈2.3 Ga (1–3), and later would Fe have become limiting, leading to the sequential selection for V- and Mo-containing nitrogenases. Thus, under the mildly oxidizing conditions that prevailed in the late Archean to early Proterozoic, Fe-based nitrogenases would have been naturally selected within the archaea and subsequently transferred to a large group of bacteria via HGT (63). In the late Proterozoic and throughout the Phanerozoic, oxygenic photosynthesis ultimately led to the precipitation of insoluble oxidized (ferric) Fe, thereby making this element a major factor limiting N2 fixation in the ocean, a condition that appears to continue to limit productivity in the contemporary ocean (64). Macrogenomic features of cyanobacteria potentially provide clues to the ability of these organisms to acquire nitrogenase and other genes.
For example, all cyanobacterial diazotrophs have significantly larger genomes than their nonfixing counterparts (SI Table 1), suggesting that the genomes may have been exposed to frequent HGT and are more competent to incorporate genes.

Materials and Methods

Gene Family Selection.

We performed all-against-all BLAST (65) comparisons of protein sequences for all possible pairs of the 13 cyanobacterial genomes (SI Table 1), discarding hits with e values above a cutoff of 10⁻⁴, and identified reciprocal genome-specific best hits. A total of 682 protein families consisting of one gene per genome were retrieved and assigned to functional categories as defined for the clusters of orthologous groups (COGs) (66).
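The pairwise reciprocal best hit step can be sketched as follows, assuming BLAST results have already been parsed into query, subject, e value, and bit score for one genome pair in each direction. The `Hit` record, the tie-breaking by bit score, and the function names are illustrative assumptions rather than the authors' code.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One row of a tabular BLAST report (hypothetical minimal field set)."""
    query: str
    subject: str
    evalue: float
    bitscore: float


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-4) -> Dict[str, str]:
    """Best subject per query: lowest e value, ties broken by higher bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:  # discard hits weaker than the cutoff
            continue
        current = best.get(hit.query)
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-4) -> Set[Tuple[str, str]]:
    """Gene pairs (a, b) where b is a's best hit in genome B and vice versa."""
    forward = best_hits(a_vs_b, max_evalue)
    backward = best_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}
```

Extending this to 13 genomes needs a rule for merging the 78 pairwise results into one-gene-per-genome families; requiring every pair of members to be mutual best hits is one plausible choice, but the exact rule is an assumption here.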

Alignments and Tree Construction.

Protein sequences were aligned with ClustalW (67), after which unambiguously aligned regions were selected and all gap-containing sites were excluded. ML trees were computed with PHYML (68), using the JTT model of substitution and a Gamma (Γ) distribution to correct for rate heterogeneity among sites. NJ trees were constructed from the distance matrix computed by TREE-PUZZLE (69) under a Γ-based model of substitution (alpha parameter estimated, eight Γ rate categories) and bootstrapped by using SEQBOOT and CONSENSE from PHYLIP (70). See SI Methods for concatenation, consensus, and supertree reconstruction.
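Removing gap-containing columns and then joining per-gene alignments end to end produces a superalignment like the 100,776-site one built from the 323 core proteins. A minimal sketch, assuming each alignment is a dictionary from taxon name to aligned sequence (a hypothetical representation); the manual selection of unambiguous regions is not reproduced.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned protein sequence


def drop_gap_columns(aln: Alignment, gap: str = "-") -> Alignment:
    """Remove every alignment column in which any sequence has a gap."""
    taxa = list(aln)
    length = len(aln[taxa[0]])
    if any(len(aln[t]) != length for t in taxa):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length) if all(aln[t][i] != gap for t in taxa)]
    return {t: "".join(aln[t][i] for i in keep) for t in taxa}


def concatenate(alignments: List[Alignment]) -> Alignment:
    """Join per-gene alignments end to end into one superalignment."""
    taxa = set(alignments[0])
    for aln in alignments[1:]:
        if set(aln) != taxa:
            raise ValueError("every gene family must contain the same taxa")
    return {t: "".join(aln[t] for aln in alignments) for t in sorted(taxa)}
```

Because every family has exactly one gene per genome, each taxon contributes one row to every alignment, so the concatenation needs no missing-data handling.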

Comparisons Among Trees.

Trees were compared with the Treedist program in PHYLIP, using the branch score distance of Kuhner and Felsenstein (71) to generate an n × n distance matrix, where n is the number of trees. Principal coordinates analysis (PCoA) was then performed with the multidimensional scaling procedure in SAS software, Version 8.2 (SAS Institute). PCoA embeds the n trees in a space of up to n − 1 dimensions. By plotting the objects (the trees) along the first two (most significant) dimensions, the major trends and groupings in the data can be visualized graphically.
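The branch score distance compares two trees by both topology and branch lengths: for every split present in either tree, the difference in its branch length (taken as zero when the split is absent) is squared and summed. The sketch below assumes trees have already been decomposed into split-to-length dictionaries (a hypothetical input format). Kuhner and Felsenstein's definition is the sum of squares; some software reports its square root, so whether Treedist outputs one or the other should be checked against the PHYLIP documentation, and the `take_sqrt` flag covers both conventions.

```python
import math
from typing import Dict, FrozenSet, List

# Hypothetical input: each tree as {split: branch length}, where a split is the
# set of taxa on the side of the branch that excludes one fixed reference taxon.
# Terminal branches are included as single-taxon splits.
SplitLengths = Dict[FrozenSet[str], float]


def branch_score(t1: SplitLengths, t2: SplitLengths, take_sqrt: bool = False) -> float:
    """Squared branch-length differences summed over the union of splits;
    a split missing from a tree contributes a length of 0."""
    total = sum((t1.get(s, 0.0) - t2.get(s, 0.0)) ** 2 for s in set(t1) | set(t2))
    return math.sqrt(total) if take_sqrt else total


def distance_matrix(trees: List[SplitLengths], take_sqrt: bool = False) -> List[List[float]]:
    """n x n matrix of pairwise branch scores, usable as input to PCoA."""
    return [[branch_score(a, b, take_sqrt) for b in trees] for a in trees]
```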

For each of the 682 alignments, the likelihood of the best topology was compared with those of the candidate topologies (SI Figs. 6 and 7) by using the SH test (55) implemented in TREE-PUZZLE. Similarly, the five backbone topologies (Fig. 2) were compared with the SH, Kishino–Hasegawa (72), and expected likelihood weight (73) tests, using the concatenated 323-core-gene set. All tests used a 5% significance level and the resampling of estimated log-likelihoods (RELL) method with 1,000 replicates.
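The SH test with RELL resampling can be sketched from per-site log-likelihoods alone: sites are resampled with replacement, the stored per-site values are re-summed rather than the trees re-optimized, each tree's bootstrap totals are centred, and the observed likelihood deficit of each tree is compared with its bootstrap distribution. This is a minimal NumPy illustration of the general technique, assuming the per-site log-likelihoods are already available; TREE-PUZZLE's implementation details may differ, the function name is hypothetical, and the Kishino–Hasegawa and expected likelihood weight tests are not shown.

```python
import numpy as np


def sh_test_rell(site_lnl, n_boot: int = 1000, seed: int = 0) -> np.ndarray:
    """Shimodaira-Hasegawa test using RELL bootstrap replicates.

    site_lnl: array of shape (n_trees, n_sites) holding the per-site
    log-likelihood of each candidate topology (include the ML tree).
    Returns one p-value per topology; values below 0.05 reject it at 5%.
    """
    lnl = np.asarray(site_lnl, dtype=float)
    n_trees, n_sites = lnl.shape
    totals = lnl.sum(axis=1)
    observed = totals.max() - totals  # T_i = max_j L_j - L_i
    rng = np.random.default_rng(seed)
    boot = np.empty((n_trees, n_boot))
    for b in range(n_boot):
        # RELL: resample sites with replacement and re-sum the stored per-site
        # log-likelihoods instead of re-optimizing each tree.
        weights = np.bincount(rng.integers(0, n_sites, n_sites), minlength=n_sites)
        boot[:, b] = lnl @ weights
    centred = boot - boot.mean(axis=1, keepdims=True)
    t_boot = centred.max(axis=0) - centred  # bootstrap T_i for every replicate
    return (t_boot >= observed[:, None]).mean(axis=1)
```

The best-scoring tree always receives p = 1; because the test corrects for comparing many trees at once, it is conservative, which is consistent with the observation that most single-gene alignments reject none of T1–T5.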



Acknowledgments

We thank Robert Blankenship and J. Peter Gogarten for critical reviews of the manuscript; William Martin, Colomban de Vargas, and Kay Bidle for stimulating discussions and insightful comments; and Lin Jiang for assistance with the principal coordinates analysis. This work was supported by grants from the Agouron Foundation and National Aeronautics and Space Administration Exobiology Program Grant NX7AK14G (to P.G.F.).


The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0711165105/DC1.


1. Bekker A, et al. Dating the rise of atmospheric oxygen. Nature. 2004;427:117–120. [PubMed]
2. Anbar AD, et al. A whiff of oxygen before the great oxidation event? Science. 2007;317:1903–1906. [PubMed]
3. Kaufman AJ, et al. Late Archean biospheric oxygenation and atmospheric evolution. Science. 2007;317:1900–1903. [PubMed]
4. Knoll AH, Summons RE, Waldbauer JR, Zumberge JE. In: Evolution of Primary Producers in the Sea. Falkowski PG, Knoll AH, editors. New York: Academic; 2007. pp. 133–163.
5. Blankenship RE, Hartman H. The origin and evolution of oxygenic photosynthesis. Trends Biochem Sci. 1998;23:94–97. [PubMed]
6. Falkowski PG, et al. The rise of oxygen over the past 205 million years and the evolution of large placental mammals. Science. 2005;309:2202–2204. [PubMed]
7. Raymond J, Segre D. The effect of oxygen on biochemical networks and the evolution of complex life. Science. 2006;311:1764–1767. [PubMed]
8. Whitton BA, Potts M. In: The Ecology of Cyanobacteria. Whitton BA, Potts M, editors. Dordrecht, The Netherlands: Kluwer Academic; 2000. pp. 1–11.
9. Waterbury JB, Watson SW, Guillard RRL, Brand LE. Widespread occurrence of a unicellular, marine, planktonic, cyanobacterium. Nature. 1979;277:293–294.
10. Chisholm SW, et al. A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature. 1988;334:340–343.
11. Capone DG, Zehr JP, Paerl H, Bergman B, Carpenter EJ. Trichodesmium, a globally significant marine cyanobacterium. Science. 1997;276:1221–1229.
12. Zehr JP, et al. Unicellular cyanobacteria fix N2 in the subtropical North Pacific Ocean. Nature. 2001;412:635–638. [PubMed]
13. Karl D, et al. Nitrogen fixation in the world's oceans. Biogeochemistry. 2002;57/58:47–98.
14. Koonin EV, Aravind L, Kondrashov AS. The impact of comparative genomics on our understanding of evolution. Cell. 2000;101:573–576. [PubMed]
15. Palenik B, et al. The genome of a motile marine Synechococcus. Nature. 2003;424:1037–1042. [PubMed]
16. Rocap G, et al. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003;424:1042–1047. [PubMed]
17. Hess WR. Genome analysis of marine photosynthetic microbes and their global role. Current Opin Biotech. 2004;15:191–198. [PubMed]
18. Dufresne A, Garczarek L, Partensky F. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 2005;6:R14. [PMC free article] [PubMed]
19. Olson JM, Pierson BK. Origin and evolution of photosynthetic reaction centers. Orig Life. 1987;17:419–430.
20. Blankenship RE. Origin and early evolution of photosynthesis. Photosynth Res. 1992;33:91–111.
21. Vermaas WFJ. Evolution of heliobacteria: Implications for photosynthetic reaction center complexes. Photosynth Res. 1994;41:285–294.
22. Xiong J, Inoue K, Bauer CE. Tracking molecular evolution of photosynthesis by characterization of a major photosynthesis gene cluster from Heliobacillus mobilis. Proc Natl Acad Sci USA. 1998;95:14851–14856.
23. Xiong J, Fischer WM, Inoue K, Nakahara M, Bauer CE. Molecular evidence for the early evolution of photosynthesis. Science. 2000;289:1724–1730.
24. Baymann F, Brugna M, Muhlenhoff U, Nitschke W. Daddy, where did (PS)I come from? Biochim Biophys Acta Bioenerget. 2001;1507:291–310.
25. Gupta RS. Evolutionary relationships among photosynthetic bacteria. Photosynth Res. 2003;76:173–183.
26. Rutherford AW, Faller P. Photosystem II: Evolutionary perspectives. Philos Trans R Soc Lond B. 2003;358:245–253.
27. Olson JM, Blankenship RE. Thinking about the evolution of photosynthesis. Photosynth Res. 2004;80:373–386.
28. Sadekar S, Raymond J, Blankenship RE. Conservation of distantly related membrane proteins: Photosynthetic reaction centers share a common structural core. Mol Biol Evol. 2006;23:2001–2007.
29. Xiong J. Photosynthesis: What color was its origin? Genome Biol. 2006;7:245.
30. Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE. Whole-genome analysis of photosynthetic prokaryotes. Science. 2002;298:1616–1620.
31. Mulkidjanian AY, et al. The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci USA. 2006;103:13126–13131.
32. Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT. Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events. Genome Res. 2006;16:1099–1108.
33. Lindell D, et al. Transfer of photosynthesis genes to and from Prochlorococcus viruses. Proc Natl Acad Sci USA. 2004;101:11013–11018.
34. Millard A, Clokie MRJ, Shub DA, Mann NH. Genetic organization of the psbAD region in phages infecting marine Synechococcus strains. Proc Natl Acad Sci USA. 2004;101:11007–11012.
35. Coleman ML, et al. Genomic islands and the ecology and evolution of Prochlorococcus. Science. 2006;311:1768–1770.
36. Rivera MC, Jain R, Moore JE, Lake JA. Genomic evidence for two functionally distinct gene classes. Proc Natl Acad Sci USA. 1998;95:6239–6244.
37. Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: The complexity hypothesis. Proc Natl Acad Sci USA. 1999;96:3801–3806.
38. Daubin V, Gouy M, Perriere G. A phylogenomic approach to bacterial phylogeny: Evidence of a core of genes sharing a common history. Genome Res. 2002;12:1080–1090.
39. Lerat E, Daubin V, Moran NA. From gene trees to organismal phylogeny in prokaryotes: The case of the gamma-proteobacteria. PLoS Biol. 2003;1:101–109.
40. Martin W, et al. Gene transfer to the nucleus and the evolution of chloroplasts. Nature. 1998;393:162–165.
41. Bhattacharya D, Medlin L. Algal phylogeny and the origin of land plants. Plant Physiol. 1998;116:9–15.
42. Delwiche CF. Tracing the thread of plastid diversity through the tapestry of life. Am Nat. 1999;154:S164–S177.
43. Grzebyk D, Schofield O, Vetriani C, Falkowski PG. The Mesozoic radiation of eukaryotic algae: The portable plastid hypothesis. J Phycol. 2003;39:259–267.
44. Falkowski PG, Knoll AH. Evolution of Primary Producers in the Sea. New York: Academic; 2007.
45. Zhaxybayeva O, Lapierre P, Gogarten JP. Genome mosaicism and organismal lineages. Trends Genet. 2004;20:254–260.
46. Martin K, et al. Cyanobacterial signature genes. Photosynth Res. 2003;75:211–221.
47. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637.
48. Rujan T, Martin W. How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet. 2001;17:113–120.
49. Nakamura Y, et al. Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Res. 2003;10:137–145.
50. Berman-Frank I, Lundgren P, Falkowski P. Nitrogen fixation and photosynthetic oxygen evolution in cyanobacteria. Res Microbiol. 2003;154:157–164.
51. Zhaxybayeva O, Gogarten JP. An improved probability mapping approach to assess genome mosaicism. BMC Genomics. 2003;4:37.
52. Doolittle WF. Lateral genomics. Trends Cell Biol. 1999;9:M5–M8.
53. Ochman H, Lawrence JG, Groisman EA. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405:299–304.
54. Gogarten JP, Doolittle WF, Lawrence JG. Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 2002;19:2226–2238.
55. Shimodaira H, Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999;16:1114–1116.
56. Race HL, Herrmann RG, Martin W. Why have organelles retained genomes? Trends Genet. 1999;15:364–370.
57. Martin W, et al. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA. 2002;99:12246–12251.
58. Shi T, Bibby TS, Jiang L, Irwin AJ, Falkowski PG. Protein interactions limit the rate of evolution of photosynthetic genes in cyanobacteria. Mol Biol Evol. 2005;22:2179–2189.
59. Dandekar T, Snel B, Huynen M, Bork P. Conservation of gene order: A fingerprint of proteins that physically interact. Trends Biochem Sci. 1998;23:324–328.
60. Blankenship RE. Molecular evidence for the evolution of photosynthesis. Trends Plant Sci. 2001;6:4–6.
61. Fennel K, Follows M, Falkowski PG. The co-evolution of the nitrogen, carbon and oxygen cycles in the Proterozoic ocean. Am J Sci. 2005;305:526–545.
62. Anbar AD, Knoll AH. Proterozoic ocean chemistry and evolution: A bioinorganic bridge? Science. 2002;297:1137–1142.
63. Raymond J, Siefert JL, Staples CR, Blankenship RE. The natural history of nitrogen fixation. Mol Biol Evol. 2004;21:541–554.
64. Falkowski PG. Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean. Nature. 1997;387:272–275.
65. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl Acids Res. 1997;25:3389–3402.
66. Tatusov RL, et al. The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucl Acids Res. 2001;29:22–28.
67. Thompson J, Higgins D, Gibson T. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res. 1994;22:4673–4680.
68. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003;52:696–704.
69. Strimmer K, von Haeseler A. Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol. 1996;13:964–969.
70. Felsenstein J. PHYLIP (Phylogeny Inference Package), Version 3.6. Seattle, WA: Department of Genetics, University of Washington; 2002.
71. Kuhner MK, Felsenstein J. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol. 1994;11:459–468.
72. Kishino H, Hasegawa M. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 1989;29:170–179.
73. Strimmer K, Rambaut A. Inferring confidence sets of possibly misspecified gene trees. Proc R Soc Lond B. 2002;269:137–142.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences