Proc Natl Acad Sci U S A. Jan 5, 2010; 107(1): 472–477.
Published online Dec 29, 2009. doi:  10.1073/pnas.0908007107
PMCID: PMC2806719
Plant Biology

Angiosperm genome comparisons reveal early polyploidy in the monocot lineage

Abstract

Although the timing and extent of a whole-genome duplication occurring in the common lineage of most modern cereals are clear, the existence or extent of more ancient genome duplications in cereals and perhaps other monocots has been hinted at, but remains unclear. We present evidence of additional duplication blocks of deeper hierarchy than the pancereal rho (ρ) duplication, covering at least 20% of the cereal transcriptome. These more ancient duplicated regions, herein called σ, are evident in both intragenomic and intergenomic analyses of rice and sorghum. Resolution of such ancient duplication events improves the understanding of the early evolutionary history of monocots and the origins and expansions of gene families. Comparisons of syntenic blocks reveal clear structural similarities in putatively homologous regions of monocots (rice) and eudicots (grapevine). Although the exact timing of the σ-duplication(s) is unclear because of uncertainties of the molecular clock assumption, our data suggest that it occurred early in the monocot lineage after its divergence from the eudicot clade.

Keywords: collinearity, paleopolyploidy, synteny

Whole-genome duplications (WGDs) have occurred in the lineages of plants (1–3), animals (4, 5), and fungi (6), with consequences ranging from the origin of evolutionary novelty (7) to the provision of genetic “buffer capacity” that increases genetic robustness (8–10). Reciprocal gene loss following a WGD can contribute to reproductive isolation through divergent resolution of duplicate copies (11), foreshadowing the diversification of species (12–14).

WGDs are particularly widespread in the phylogeny of flowering plants, giving rise to large gene families and complicating comparative studies (1, 15–17). Relatively recent WGDs often are readily identified through intragenomic comparisons; however, more ancient WGDs are less tractable and often have been studied through “bottom-up” reconstruction of intermediate orders (1, 5), iteratively inferring the state of the ancestral genome before successively more ancient duplications.

It is well established that one WGD (hereinafter denoted as ρ) occurred in the cereal lineage an estimated 70 million years ago, preceding the radiation of the major cereal clades by 20 million years or more (2, 18). “Quartet” comparisons of the two resulting paralogous (homeologous) chromosomal regions in rice and sorghum show that 97%–98% of postduplication gene losses are orthologous (19), consistent with the ρ event predating the diversification of major grass lineages (2, 20). This suggests that rice–sorghum gene arrangements likely are representative of those of most grass genomes, albeit further modified in some lineages by additional cycles of duplication and gene loss. The ρ duplication is extensive, involving all modern chromosomes of rice and sorghum and covering much of the euchromatin (2, 21). Even a duplication previously thought to be recent and segmental apparently results from ρ with subsequent concerted evolution (19, 22).

While several studies (3, 20, 23) have hinted that additional monocot duplications may have predated ρ, the extent of such earlier duplications has not yet been elucidated. Inferences of more ancient polyploidy based on inspection of amino acid differences between duplicate genes (dA) (23) are affected by varying substitution rates among different gene families (1). A recent study identified 29 duplications in the rice genome, including 19 minor blocks that overlap with 10 major blocks (20), but did not systematically study these segments in a hierarchical context to elucidate their evolutionary history.

In the present work, we combined a visually intuitive approach with a gene-based multicollinearity search algorithm, MCscan (24), to improve understanding of the paleoevolution of the cereal lineage before ρ and explore its implications for comparative genomics and gene family evolution. In particular, consideration of these additional monocot duplications in multiple alignments clarifies monocot–eudicot sequence comparisons and reveals clear associations between sets of segments in representative genomes from each clade.

Results

Quartet Alignments Among Rice and Sorghum Gene Orders (ρ-Blocks).

We compiled a list of syntenic gene quartets from rice and sorghum, showing both orthologous and ρ-paralogous matches. We analyzed a total of nine large segmental duplications attributed to the ρ-genome duplication using previously described block identifiers (2). The extent of ρ synteny between ρ-duplicate segments is summarized in Table S1, and boundaries of ρ blocks are highlighted in a rice intragenomic dot plot in Fig. 1A. These 9 ρ-blocks correspond to 9 of 10 major blocks described by Salse et al. (20). We consider one block involving chromosomes 4–10 of Salse et al. (20) to overlap with both ρ2 and ρ5, indicating an origin more ancient than ρ.

Fig. 1.
Illustration of bottom-up reconstruction of ρ-blocks and σ-blocks. (A) Classifications of ρ duplicated blocks are visualized in the lower left triangle. (B) During the second iteration, each of the paired hits is converted into ...

Each ρ-block merges two regions of rice and two regions of sorghum into a single gene order that approximates the genome composition before the ρ duplication. Specifically, the ρ-order collapses 15,640 rice genes and 15,636 sorghum genes into 13,308 ρ-nodes (~50% of the rice and sorghum transcriptomes), excluding tandemly duplicated genes. The incorporation of sorghum gene orders validates the ρ-blocks previously identified in rice while better resolving a few duplicated regions that were reciprocally silenced in rice and sorghum. This reconstruction of pre-ρ gene order is intended to computationally reverse post-ρ gene loss, increasing the sensitivity of subsequent analysis. We emphasize that this order is only an approximation, because the ancestral positions of the intervening singleton genes between consecutive pairs of ρ-paralogs cannot be determined precisely. Nonetheless, we show below that this intermediate order is useful to mask post-ρ events and infer the structure of more ancient blocks.

Pre-ρ Duplications in the Cereal Lineage (σ-Blocks).

The σ-blocks (involved in duplication events before ρ) were identified through further bottom-up reconstruction (1). Reconstructed ρ-orders of 13,308 ρ-nodes from the previous step were compared, revealing collinear patterns of correspondence involving all nine major ρ-blocks (Fig. 1B). Some collinear patterns between pairs of ρ-blocks are one to one, whereas others (i.e., σ2, σ4, and σ5) involve more than two ρ-blocks, suggesting the identification of additional duplications.

To facilitate further analysis, we curated a second list of 8 large σ-blocks (Table S2) that have retained collinearity following σ. These blocks contain a total of 4,168 σ-nodes, covering 5,747 rice genes and 5,738 sorghum genes (~20% of the rice and sorghum transcriptomes). Enumerating all patterns of σ collinearity is difficult, because some duplicated regions become highly degenerate during post-WGD diploidization, creating gene orders that are largely reciprocal or sometimes complementary (25, 26). Relationships between some degenerate segments can still be identified through transitive comparisons of grapevine and rice genomes (see below), but there is little remaining intragenomic correspondence between rice segments.

The bottom-up approach, starting from the modern gene order to deduce ρ- and σ-orders, offers inherent hierarchical structures that reflect the relationships among chromosome segments. Figure 2 shows a section of σ6; collinearity is well retained, and anchored gene pairs, including rice–sorghum orthologs, ρ-paralogs and σ-paralogs, often retain consistent transcriptional orientations. Nonetheless, gene losses (due to fractionation) are extensive, particularly across the σ duplication (between the 2 ρ-blocks) where there are the fewest corresponding genes (Fig. 2).

Fig. 2.
Example alignment showing syntenic relationships among four rice (Os) and four sorghum (Sb) regions. Three well-retained gene clusters along these syntenic groups are plotted to show the relatively stable gene phylogeny that is consistent with the duplication ...

Genetic Distances of the Gene Pairs.

Paralogous gene pairs fall into separate age groups, in accordance with the hierarchical relationships among the segments on which they reside. Synonymous nucleotide substitutions per synonymous site (Ks) for the groups of orthologs and paralogs from different events (ρ and σ) were well separated (Fig. 3). However, variations in the GC content of cereal genes can affect Ks calculations, with different algorithms generating differing Ks values for pairs involving genes with high third codon position GC content (GC3) (27). Accordingly, we focused on gene pairs with an average GC3 < 75%; we provide justification for this cutoff in Methods.

Fig. 3.
Ks value distributions for rice–sorghum orthologs, cereal WGD paralogs (ρ and σ paralogs), and grape‒cereal orthologs. The recent ρ paralog pairs (rice‒rice and sorghum–sorghum pairs) are readily ...

Rice–sorghum orthologs show a sharp Ks peak (median, 0.62), consistent with previous estimates (19). The populations of ρ-paralogs from both rice and sorghum show a major peak at Ks 0.94, along with a small peak at Ks ~0.15 resulting from concerted evolution of the terminal part of ρ9 (22, 28) (Table S1).

Paralogs derived from σ duplications show a well-defined peak around much older Ks (median, 1.72) and with a larger variance than that of other groups. Based on a molecular clock of 6.5 × 10⁻⁹ synonymous substitutions per synonymous site per year (29), the σ duplications are estimated to have occurred ≈130 million years ago. Because the Ks values for many σ-paralogous pairs are almost saturated and there are uncertainties in the calibration of the molecular clock (30), this date can be considered only a rough estimate.
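The date above follows from the usual strict-clock relation T = Ks / (2r), where the factor of 2 reflects substitutions accumulating independently along both copies. The short Python sketch below reproduces that arithmetic from the median Ks values and the rate quoted in the text; the function name is ours, and the strict-clock assumption is exactly the caveat raised above.

```python
def duplication_age_mya(ks, rate_per_site_per_year):
    """Approximate age in million years, assuming a strict molecular clock.

    Both copies diverge from their common ancestor, so T = Ks / (2 * r).
    """
    return ks / (2.0 * rate_per_site_per_year) / 1e6


RATE = 6.5e-9  # synonymous substitutions per synonymous site per year (29)

# Median Ks values reported in the text
sigma_age = duplication_age_mya(1.72, RATE)
rho_age = duplication_age_mya(0.94, RATE)
rice_sorghum_age = duplication_age_mya(0.62, RATE)

print(f"sigma paralogs:         ~{sigma_age:.0f} Mya")
print(f"rho paralogs:           ~{rho_age:.0f} Mya")
print(f"rice-sorghum orthologs: ~{rice_sorghum_age:.0f} Mya")
```

The same relation applied to the ρ peak gives a value close to the 70 million years cited in the introduction, which is a useful sanity check on the calibration.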

Decomposition of Ks distributions explains why previous studies were not able to identify the σ event (and to some extent the ρ event as well) based solely on the Ks distribution of ESTs (31). Several analyses have relied on curve-fitting methods to find multiple duplication events based on Ks distributions (16, 24). The combined set of ρ and σ paralogs shows a distribution with the mixed peak extending from 1.0 to 2.0, which can be readily separated into distinct components using our synteny-based classifications. Synteny-based classifications of gene pairs also remove the L-shaped component resulting from recent single-gene duplication events in the Ks plot (24).

Judging from the Ks distribution, the distances between both ρ and σ duplicates appear to be bounded between rice‒sorghum orthologs and grape–cereal orthologs (Fig. 3), suggesting that each of these WGDs occurred between cereal diversification and monocot–eudicot divergence. Indeed, the distances between grape–cereal orthologs (median Ks, 1.95) are higher than those between the cereal paralogs from the σ duplications (P = 4.8 × 10⁻²⁴; Student t test). However, differences in lineage-specific mutation rates between grass and grape confound interpretation of Ks values, and we reemphasize that our divergence time estimates must be considered only rough approximations. Initial interpretation of the Arabidopsis β WGD provides a cautionary example; Ks analyses of duplicated genes suggested that the β-duplication predated the divergence of Arabidopsis and Carica (papaya), but analyses of blocks of genomic sequence showed that the β-duplication occurred after the divergence of these lineages (24).

Episodic Expansions for Some Cereal Gene Families.

Different ancestral loci often show varying levels of retention following polyploidy; particular gene functional groups are preferentially retained or lost following WGD events (9, 24, 32, 33). To further investigate this, we calculated the enrichment of molecular function terms of Gene Ontology (GO) for the rice WGD paralogs within different age groups (see Methods).

Interestingly, ρ-duplicates and the more ancient σ-duplicates have the same four most-enriched gene functional groups: transcriptional factor activity (GO:0003700), ligand binding (GO:0005488), DNA binding (GO:0003677), and transcriptional regulator activity (GO:0030528) (Table S3). This trend of retention in rice WGD duplicates is consistent with previous findings for WGD paralogs in Arabidopsis (33, 34). Many rice transcriptional regulators and kinases were preferentially retained following the recurring WGDs (ρ and σ), leading to episodic expansions of those gene families. Enrichment for paleo-duplicates suggests that the diversification of signal transduction pathways in both the upstream elements (protein kinases) and the downstream elements (transcriptional factors) may increase the regulatory complexity for flowering plants following WGDs (33).

Effective Comparisons Between Cereal and Eudicot Genomes.

Similarities between monocot and eudicot genomes resulting from common ancestry have been obscured by many rounds of paleopolyploidy and numerous genome rearrangements (3, 35). Recurring polyploidy events pose significant challenges when comparing monocot and eudicot genomes because of the degeneration caused by independent gene fractionation (or “diploidization”) following several rounds of paleopolyploidy in each lineage.

To compare monocot and eudicot genomes, we applied a hierarchical clustering approach (see Methods) that partially circumvents such difficulties in identifying synteny across grape and rice (36). In brief, the chromosomes were cut into small segments, and each pair of rice and grape segments were compared. For example, assume that we had rice segments O1 and O2 and grape segment V1, and that comparisons O1V1 and O2V1 showed a significant number of homologs. Based on this information, O1 and O2 could be clustered together, because both matched the same grape region(s). In this approach, only the “dense” (syntenic) portions of the whole-genome dot plot were clustered, assembled, and interpreted; the “sparse” (nonsyntenic) portions were eliminated from further analysis (Fig. S1).

Based on our clustering approach, duplicated segments retained in grape following the eudicot γ hexaploidy event (3), as well as homologous segments retained in rice following at least two rounds of duplication (ρ and σ), were found to contain 38 “putative ancestral regions” (PARs). Each PAR consists of regions showing a high density of homologs (P < 1 × 10⁻¹⁰; see Methods). The PARs collectively explain 19.1% of all observed homolog pairs and 31.0% of the reciprocal best hits between grape and rice genes, although by chance they should explain only 2.1% for both categories (the 38 PARs, as highlighted in Fig. S1, occupy only 2.1% of the total area on the dot plot), representing a ~10-fold enrichment. The PARs interleave multiple grape and rice genomic regions collectively covering ~70% of each genome. By consolidating much of the redundancy in each genome, the PARs create syntenic blocks with less ambiguity and in most cases show an association between one γ block and one σ block. We found no PAR that mapped simultaneously to two different γ or σ blocks (Table S4).

When scrutinizing a particular PAR, analyzing syntenic relationships among the clustered regions is more informative than analyzing any individual pair of syntenic segments that contribute to the PAR. For example, in PAR17 (Fig. 4A), three grape regions resulting from the γ triplication (γ6) (3, 24) correspond to several regions in rice matching one another, which can be partially explained by σ1, as well as additional duplications unobserved in intragenomic comparisons.

Fig. 4.
Synteny comparisons with PARs. (A) Zoom-in view of an exemplary PAR17 consisting of corresponding regions from grape and rice. The segment labels on the right and below the graph have the format species (Vv for grape, Os for rice) followed by “chromosome:start-stop,” ...

Discussion

Integration of Intragenomic and Intergenomic Analyses.

The individual PARs derived from grapevine–rice comparisons using our clustering approach offer an independent and important validation of the σ blocks that we identified through cereal intragenomic comparisons. All σ blocks that we identified above are evident in the grapevine–rice PARs (Table S4). Some “ghost duplications” (25) in rice that we failed to identify through intragenomic comparisons (due to reciprocal gene losses in largely complementary fashion) are much clearer in cross-species comparisons (i.e., PARs).

Compared with the WGD events in grape, where 94.5% of the genome appears to be duplicated (3), the cereal WGDs are more complicated and degenerate. The dual approach of intragenomic and intergenomic comparisons provides a far more complete picture of the duplication landscape than afforded by either approach alone. The intergenomic approach is slightly more exhaustive, but has some limitations. First, interpreting the segments in a phylogenetic context without a hierarchical structure (which is inherent to the intragenomic approach) is more difficult; second, this approach shows duplications only in those regions where grape–rice synteny is well conserved, missing some duplicated blocks. In summary, neither the intragenomic nor the intergenomic approach provides an exhaustive list of duplicated blocks; rather, each method provides a complementary view of genome duplication and fractionation.

Consideration of duplications in both lineages improves inferences of correspondence between divergent genomes (or segments thereof). The 38 grapevine–rice PARs represent a qualitative advance toward a global view of monocot–dicot synteny (Fig. 5). Collinearity (represented by the white lines in the figure) appears to be disrupted around the pericentromeric regions in 10 of the 12 rice chromosomes, suggesting dynamic reorganization possibly resulting from massive transpositions or gene losses (21).

Fig. 5.
Schematic view of syntenic grape and rice regions. The color scheme is consistent with the original description (3), with different colors assigned to the different sets of triplicate chromosomes. The centromere positions for rice were retrieved from ...

Number of WGD Events in the Monocot Lineages.

In many lineages, the existence of WGD events, and the numbers of these events, remain unclear. Whether the vertebrate lineage had experienced two or three rounds of WGD was the subject of a long debate that was resolved only recently through a careful analysis of the synteny patterns of WGD paralogs (37). Similarly, various studies offered conflicting estimates of the number of WGDs in Arabidopsis (1, 38). Different sources of evidence might favor different models; in particular, estimates based on the distribution of genetic distances of paralogs or topologies of gene trees alone are now known to be complicated by unequal evolutionary rates between gene families and lineages (15, 39). Currently, analyses based on synteny patterns provide the most accurate inferences of WGD events (15).

Our unique approach to synteny analysis provides new insight into the number of WGD events experienced by modern cereal genomes. The pattern exemplified by the one PAR that we had the space to show with fine resolution (Fig. 4A), usually with 3-fold redundancies on the grape axis and at least 4-fold redundancies on the rice axis, is representative of all 38 PAR patterns (shown in Fig. S2). In 22 of the 38 PARs, grapevine–rice collinearity was clear, allowing us to evaluate the level of redundancy in both genomes (Table S4). These redundancies reflect the number of genome duplication events observable in both lineages. Among the 22 PARs, 12 were 3-fold redundant in grapevine, consistent with hexaploidy (3). The level of redundancy in rice was less clear, ranging from as little as 2-fold (one PAR) to 7-fold (three PARs) and 8-fold (five PARs). In line with the intragenomic evidence from our bottom-up analysis, these high redundancies suggest that the rice lineage experienced more than two, perhaps three, rounds of WGD.

Implications for Comparisons Between Cereal and Basal Genomes.

The high level of synteny and collinearity among cereal genomes has long been clear, but parallels to other monocots, such as banana (40), onion, and asparagus (41), have been more difficult to discern. The generally low synteny found in these previous studies may improve after redundancies within cereal (and other) genomes are accounted for.

The duplicated regions that we identified in rice also are evident in comparisons to banana, a nongrass monocot (40). Our synteny search in limited outgroup sequences revealed that two banana BACs (198Kb and 135Kb) match the set of rice regions in PAR17, which was used as an example in the rice–grape PARs (Fig. 4A). A sorghum genomic region (c3:67–68Mb) was selected as a cereal reference (Fig. 4B). Sorghum shows very strong synteny corresponding to the orthologous rice region (Os01:1720–1819), then lesser but still easily discernible synteny to one matching ρ-block (Os05:775–832), whereas σ-blocks (the remaining six regions) show only a few homologs. In contrast, banana–rice homolog concentrations in each duplicated region are comparable, suggesting that the banana–rice divergence may have predated both ρ and σ duplications. The limited banana sequence data available prevent us from falsifying the alternative hypothesis that this lesser stratification of synteny patterns simply reflects greater divergence between banana and rice.

Synthesis and Ongoing Needs.

Progress in understanding and quantifying the ancient events that shaped eudicot and monocot genomes facilitates comparisons across plant lineages whose genomes have been dynamically restructured over the last 150 million years. Such comparisons promise to improve the understanding of the evolution of functional and regulatory complexity (33) that may have contributed to radiation of angiosperms, as well as the evolution of novel features that may have motivated aboriginal peoples to use and domesticate a subset of these plants. Clarification of angiosperm evolutionary history also provides a firm foundation on which to base translational genomics, the leveraging of structural and functional genomic information from botanical models to dissect specific traits in poorly characterized organisms, such as “orphan crops” that are staples for large and impoverished human populations but for which little genomic data exist (42).

Methods

Sequence Sources and Similarity Search.

We retrieved the rice gene set from the Rice Annotation Project (RAP2) (43), the sorghum gene set (Sbi1.4) from the Joint Genomics Institute (19), and the grapevine gene set from Genoscope (3). Two Musa balbisiana BACs (AC226052.1 and AC226053.1) were downloaded from GenBank, with putative gene models predicted using FGENESH (http://www.softberry.com). Similarity between two proteins was considered significant if the E-value of BLASTP (44) was < 1 × 10⁻¹⁰.

Bottom-Up Method for Identifying Ancestral Duplications.

Using the gene order alignment algorithm implemented in the software MCscan (24), together with some manual curation, we constructed an approximate order of the ancestral cereal genome before the ρ-duplication from quartet alignments between sorghum and rice (19). Manual curation was done to remove minor segments with fragmentary synteny that overlap with major segments, so that the ρ-order represents the most recent event. Based on the comparisons within the ρ-order, more ancient blocks (σ blocks) were circumscribed by clustering gene pairs within 40 Manhattan distance units using a single-linkage algorithm. The Manhattan distance is calculated by combining the number of intervening genes in both regions. We focused our analysis on segments with >10 anchor points.
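The clustering step can be illustrated with a minimal Python sketch. It is not the MCscan implementation: it assumes each anchor is a pair of gene-rank positions, approximates the Manhattan distance as the sum of the rank differences in the two regions, treats "within 40" as inclusive, and uses a simple quadratic union-find. The function name and the toy anchors are hypothetical.

```python
from collections import defaultdict


def cluster_anchors(anchors, max_dist=40, min_size=11):
    """Single-linkage clustering of (rank_a, rank_b) anchor points.

    Two anchors are linked when their Manhattan distance in gene-rank
    units is <= max_dist. Clusters with fewer than min_size anchors are
    dropped (min_size=11 mirrors the ">10 anchor points" criterion).
    """
    parent = list(range(len(anchors)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); adequate for a sketch
    for i, (a1, b1) in enumerate(anchors):
        for j in range(i + 1, len(anchors)):
            a2, b2 = anchors[j]
            if abs(a1 - a2) + abs(b1 - b2) <= max_dist:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, anchor in enumerate(anchors):
        groups[find(i)].append(anchor)
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


# Toy data: 12 collinear anchors spaced 5 genes apart, plus one stray hit
anchors = [(5 * i, 5 * i) for i in range(12)] + [(500, 20)]
blocks = cluster_anchors(anchors)
print(len(blocks), "block(s);", len(blocks[0]), "anchors in the first")
```

Single linkage lets a block grow along a diagonal of arbitrary length as long as consecutive anchors stay within the distance threshold, which suits collinear gene runs interrupted by fractionation.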

Clustering and Reconstruction of PARs.

Putative ancestral regions between grapevine and rice genomes were derived through clustering of syntenic segments, inspired by the methodology used in previous analyses of sea anemone and amphioxus genomes (36, 45). The whole analysis, streamlined in a set of computer programs, involves 3 major steps, as detailed here. Fig. S3 provides a graphical representation of the methodology.

Filtering of the Matching Set.

We first scanned for tandem gene families, defined as clusters of genes within 10 intervening genes from one another, and kept the longest peptide. Next, we used c-value filtering to exclude weak peptide matches. The c-value is defined as c(x,y) = b(x,y)/max {b(x,z) for z in Y or b(w,y) for w in X}, for each BLAST hit between peptide x in genome X and peptide y in genome Y. The c-value generalizes the concept of mutual best hit, because the mutual best hit would have a value of 1 (36). We used a c-value cutoff of 0.7, which implies that we excluded matches that were < 70% similar to the best match in either genome. The filtered BLASTP results contained 35,386 matches between 14,982 grape genes and 15,395 rice genes. The genes were reindexed according to the rank order on each chromosome.
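As an illustration of the c-value filter, the sketch below computes c(x,y) for a small set of hits and applies the 0.7 cutoff. It assumes that b(x,y) is a positive BLAST score held in a dictionary keyed by gene pair; the gene identifiers and scores are invented for the example and the function name is ours.

```python
from collections import defaultdict


def c_value_filter(hits, cutoff=0.7):
    """Keep hits whose c-value is at least `cutoff`.

    hits maps (x, y) -> b(x, y), with x from genome X and y from genome Y.
    c(x, y) = b(x, y) / max(best score of x, best score of y).
    """
    best_x = defaultdict(float)  # best score seen for each gene in X
    best_y = defaultdict(float)  # best score seen for each gene in Y
    for (x, y), b in hits.items():
        best_x[x] = max(best_x[x], b)
        best_y[y] = max(best_y[y], b)

    kept = {}
    for (x, y), b in hits.items():
        c = b / max(best_x[x], best_y[y])
        if c >= cutoff:
            kept[(x, y)] = c
    return kept


# Hypothetical grape (Vv) and rice (Os) gene identifiers and scores
hits = {
    ("Vv1", "Os1"): 200.0,  # mutual best hit, c = 1.0
    ("Vv1", "Os2"): 150.0,  # c = 150 / 200 = 0.75, kept
    ("Vv1", "Os3"): 100.0,  # c = 100 / 200 = 0.50, dropped
    ("Vv2", "Os3"): 90.0,   # c = 90 / 100 = 0.90, kept
}
kept = c_value_filter(hits)
for pair, c in sorted(kept.items()):
    print(pair, round(c, 2))
```

Unlike a strict reciprocal-best-hit filter, this retains near-best matches, which matters when several paralogs from successive duplications are almost equally similar to the same gene.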

Segmentation of Chromosomes and Scaffolds.

BLASTP matches within 40 Manhattan distance units were clustered for a first-pass evaluation of syntenic blocks, and as before the blocks with more than 10 gene pairs were retained. The start and stop boundaries of the first-pass syntenic blocks were used as indications of the breakpoints that disrupt otherwise even distributions of homologs. The chromosomes or scaffolds in both genomes were cut into “atomic” segments according to these breakpoints. (Note that some breakpoints can be shared by several synteny blocks.) A total of 180 and 266 “atomic” segments were identified in grape and rice, respectively, including the breaks created by chromosomal ends. Such segments are less affected by genome rearrangements and are suitable for defining simple synteny patterns.
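The segmentation step amounts to cutting each chromosome at every block boundary. A minimal Python sketch follows, assuming half-open gene-rank intervals and treating the chromosome ends as breakpoints; the function name and the coordinates are illustrative only.

```python
def atomic_segments(chrom_length, blocks):
    """Cut a chromosome of `chrom_length` genes at synteny block boundaries.

    blocks is a list of (start, stop) gene-rank intervals, stop exclusive.
    Returns the resulting "atomic" segments as (start, stop) tuples.
    """
    cuts = {0, chrom_length}  # chromosome ends count as breakpoints
    for start, stop in blocks:
        for pos in (start, stop):
            if 0 < pos < chrom_length:
                cuts.add(pos)  # a set merges breakpoints shared by blocks
    ordered = sorted(cuts)
    return list(zip(ordered[:-1], ordered[1:]))


# Three first-pass blocks on a 100-gene chromosome; two share a boundary
segments = atomic_segments(100, [(10, 40), (40, 70), (25, 70)])
print(segments)
```

Because every block boundary becomes a cut, no atomic segment straddles a breakpoint, which is what makes the segments suitable units for the clustering that follows.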

Clustering of Segments Free of Rearrangements into PARs.

The segments from grape and rice identified above were compared in a pairwise manner, and homolog concentration scores (36) were calculated using -log(p), where p is the probability of the observed number of homolog pairs as modeled by a Poisson distribution. For each segment, the array of scores against all segments in the other genome forms a unique profile. The segments were then clustered based on the similarity of these profiles (determined by Pearson correlation coefficient r) using the average linkage method. The clusters were defined at a cutoff of r = 0.3, as determined by visual inspection of the resulting clusters (Fig. S1). This resulted in 56 and 56 reconstructed regions in the grape and rice genomes, respectively. Significant synteny between the reconstructed regions was evaluated statistically by summing the likelihoods of observing as many or more gene pairs under the null hypothesis of these pairs occurring randomly. For all pairwise comparisons (56 × 56) in grape and rice, we kept 38 grape–rice comparisons that were significantly enriched for homologs (P < 1 × 10⁻¹⁰), using this stringent cutoff to limit consideration to particularly strong synteny. These 38 comparisons, each containing an ensemble of syntenic patterns, were referred to as PARs, and a unique PAR identifier was assigned to each.
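The two quantities that drive this clustering, the homolog concentration score and the profile correlation, can be sketched in a few lines of Python. Several details are assumptions made for illustration: p is taken as the Poisson upper-tail probability P(X ≥ observed), the logarithm is base 10, and the expected count is supplied by the caller. The average-linkage clustering itself is not shown, and the profiles are invented.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and (i <= lam or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def concentration_score(observed, expected):
    """-log10 of the Poisson tail probability of `observed` homolog pairs."""
    return -math.log10(max(poisson_sf(observed, expected), 1e-300))


def pearson(u, v):
    """Pearson correlation coefficient between two score profiles."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


# 20 homolog pairs observed where 2 are expected by chance
score = concentration_score(20, 2.0)

# Hypothetical score profiles of three rice segments against four grape segments
o1 = [30.0, 0.5, 28.0, 0.2]
o2 = [25.0, 0.1, 31.0, 0.4]
o3 = [0.3, 40.0, 0.1, 0.2]
print(round(score, 1), round(pearson(o1, o2), 2), round(pearson(o1, o3), 2))
```

In this toy case the first two rice segments match the same grape segments and so correlate well above the r = 0.3 cutoff, whereas the third matches a different grape segment and would fall into a separate cluster.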

Availability of Reconstructed Orders and Composition of PARs.

The compiled rice–sorghum ρ-order and σ-orders, and 38 grapevine–rice PARs, are available as downloadable Excel spreadsheets at http://chibba.agtec.uga.edu/duplication/par.

Calculation of Synonymous Substitutions (Ks).

For homologs inferred from syntenic alignments, the protein sequences were aligned using CLUSTALW (46), and the resulting protein alignments were used to guide coding sequence alignments by PAL2NAL (47). Ks values were calculated using the Nei–Gojobori method implemented in the yn00 program in the PAML package (48). In-house python scripts were used to pipeline all of the calculations. Extra caution was needed when calculating the Ks values for cereal genes, because there are 2 distinct groups of genes with significantly different third codon position GC content (GC3) (Fig. S4A). Ks values calculated using the Nei–Gojobori and Yang–Nielsen methods were quite consistent for low-GC3 gene pairs, but differed significantly for high-GC3 gene pairs (Fig. S4B) (18, 27). Consequently, we chose to not use Ks values for gene pairs with average GC3 > 75%. [We chose 75% because this is the saddle point in the bimodal distribution in Fig. S4A and also was used in previous analyses (18, 27).] In addition, we considered Ks values >3.0 to indicate saturated substitutions at synonymous positions and also excluded these pairs from the later analysis.
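The two exclusion rules can be expressed as a small Python filter. This is a sketch rather than the in-house pipeline: it assumes in-frame coding sequences as plain strings and a Ks value already computed elsewhere (for example by yn00). The function names and the toy sequences are hypothetical.

```python
def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    thirds = cds.upper()[2::3]  # every third base, starting at codon position 3
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


def keep_pair(cds_a, cds_b, ks, gc3_max=0.75, ks_max=3.0):
    """Apply the two exclusions described above.

    A pair is dropped when its average GC3 exceeds gc3_max or when its
    Ks exceeds ks_max (treated as saturated).
    """
    avg_gc3 = (gc3(cds_a) + gc3(cds_b)) / 2
    return avg_gc3 <= gc3_max and ks <= ks_max


# Toy sequences: three codons each
low_a, low_b = "ATGGCCAAA", "ATGAAATTT"   # GC3 = 2/3 and 1/3
high_a, high_b = "ATGGCGGCC", "ATGCCGGGC"  # GC3 = 1.0 for both

print(keep_pair(low_a, low_b, ks=1.2))    # low GC3, unsaturated Ks
print(keep_pair(high_a, high_b, ks=1.2))  # high GC3
print(keep_pair(low_a, low_b, ks=3.5))    # saturated Ks
```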

GO Enrichment for Rice WGD Paralogs.

We tested the enrichment of GO broad terms in the two duplicate sets (4,831 ρ-duplicates and 1,098 σ-duplicates in rice, with some genes retained in both sets) using Fisher’s exact test, calculating the P value for the null hypothesis that there is no association between the duplicate status and a particular functional category. The P values were corrected with the total number of terms to account for multiple testing. Mappings from the rice genes to the molecular function terms were based on GO-SLIM assignments from the MSU rice annotation database (http://rice.plantbiology.msu.edu/).
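A minimal Python sketch of this test is given below. It assumes a one-sided (enrichment) form of Fisher's exact test, which the text does not specify, and interprets the multiple-testing correction as multiplication by the number of terms tested (Bonferroni style). The counts are toy values, not data from this study.

```python
from math import comb


def fisher_enrichment_p(k, n_dup, n_term, n_total):
    """One-sided Fisher's exact P value, P(X >= k), under the hypergeometric null.

    k       : duplicates annotated with the GO term
    n_dup   : all duplicates
    n_term  : all genes annotated with the GO term
    n_total : all genes considered
    """
    upper = min(n_dup, n_term)
    tail = sum(
        comb(n_term, i) * comb(n_total - n_term, n_dup - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_dup)


def corrected(p, n_terms):
    """Correct a P value for the number of terms tested, capped at 1."""
    return min(1.0, p * n_terms)


# Toy counts: 20 genes, 5 carry the term, 5 are duplicates, 4 are both
p = fisher_enrichment_p(k=4, n_dup=5, n_term=5, n_total=20)
p_adj = corrected(p, n_terms=10)
print(f"P = {p:.4g}, corrected P = {p_adj:.4g}")
```

Exact integer binomials keep the calculation free of rounding error until the final division, at the cost of speed on genome-scale counts.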

Acknowledgments

We thank Jim Leebens-Mack for his helpful comments on the manuscript. Financial support was provided by the National Science Foundation (Grant MCB-0450260, to A.H.P. and J.E.B.; Grant MCB-0821096, to A.H.P.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. D.E.S. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/cgi/content/full/0908007107/DCSupplemental.

References

1. Bowers JE, Chapman BA, Rong J, Paterson AH. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003;422:433–438.
2. Paterson AH, Bowers JE, Chapman BA. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 2004;101:9903–9908.
3. Jaillon O, et al. French-Italian Public Consortium for Grapevine Genome Characterization. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007;449:463–467.
4. Jaillon O, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957.
5. Aury JM, et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–178.
6. Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428:617–624.
7. Zhang G, Cohn MJ. Genome duplication and the origin of the vertebrate skeleton. Curr Opin Genet Dev. 2008;18:387–393.
8. Ha M, Kim ED, Chen ZJ. Duplicate genes increase expression diversity in closely related species and allopolyploids. Proc Natl Acad Sci USA. 2009;106:2295–2300.
9. Paterson AH, et al. Many gene and domain families have convergent fates following independent whole-genome duplication events in Arabidopsis, Oryza, Saccharomyces, and Tetraodon. Trends Genet. 2006;22:597–602.
10. Gu Z, et al. Role of duplicate genes in genetic robustness against null mutations. Nature. 2003;421:63–66.
11. Lynch M, Force AG. The origin of interspecific genomic incompatibility via gene duplication. Am Nat. 2000;156:590–605.
12. Bikard D, et al. Divergent evolution of duplicate genes leads to genetic incompatibilities within A. thaliana. Science. 2009;323:623–626.
13. Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe KH. Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature. 2006;440:341–345.
14. Soltis DE, et al. Polyploidy and angiosperm diversification. Am J Bot. 2009;96:336–348.
15. Tang H, et al. Synteny and collinearity in plant genomes. Science. 2008;320:486–488.
16. Cui L, et al. Widespread genome duplications throughout the history of flowering plants. Genome Res. 2006;16:738–749.
17. Paterson AH, et al. Toward a unified genetic map of higher plants, transcending the monocot–dicot divergence. Nat Genet. 1996;14:380–382.
18. Wang X, Shi X, Hao B, Ge S, Luo J. Duplication and DNA segmental loss in the rice genome: Implications for diploidization. New Phytol. 2005;165:937–946.
19. Paterson AH, et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–556.
20. Salse J, et al. Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. Plant Cell. 2008;20:11–24.
21. Bowers JE, et al. Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci USA. 2005;102:13206–13211.
22. Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH. Extensive concerted evolution of rice paralogs and the road to regaining independence. Genetics. 2007;177:1753–1763.
23. Zhang Y, Xu GH, Guo XY, Fan LJ. Two ancient rounds of polyploidy in rice genome. J Zhejiang Univ Sci B. 2005;6:87–90.
24. Tang H, et al. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008;18:1944–1954.
25. Van de Peer Y. Computational approaches to unveiling ancient genome duplications. Nat Rev Genet. 2004;5:752–763.
26. Freeling M, et al. Many or most genes in Arabidopsis transposed after the origin of the order Brassicales. Genome Res. 2008;18:1924–1937.
27. Shi X, et al. Nucleotide substitution pattern in rice paralogues: Implication for negative correlation between the synonymous substitution rate and codon usage bias. Gene. 2006;376:199–206.
28. Wang X, Tang H, Bowers JE, Paterson AH. Comparative inference of illegitimate recombination between rice and sorghum duplicated genes produced by polyploidization. Genome Res. 2009;19:1026–1032.
29. Gaut BS, Morton BR, McCaig BC, Clegg MT. Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc Natl Acad Sci USA. 1996;93:10274–10279.
30. Hedges SB, Kumar S. Precision of molecular time estimates. Trends Genet. 2004;20:242–247.
31. Blanc G, Wolfe KH. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004;16:1667–1678.
32. Gout JF, Duret L, Kahn D. Differential retention of metabolic genes following whole-genome duplication. Mol Biol Evol. 2009;26:1067–1072.
33. Freeling M, Thomas BC. Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res. 2006;16:805–814.
34. Seoighe C, Gehring C. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 2004;20:461–464.
35. Liu H, Sachidanandam R, Stein L. Comparative genomics between rice and Arabidopsis shows scant collinearity in gene order. Genome Res. 2001;11:2020–2026.
36. Putnam NH, et al. The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008;453:1064–1071.
37. Dehal P, Boore JL. Two rounds of whole-genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:1700–1708.
38. Vision TJ, Brown DG, Tanksley SD. The origins of genomic duplications in Arabidopsis. Science. 2000;290:2114–2117.
39. Fares MA, Byrne KP, Wolfe KH. Rate asymmetry after genome duplication causes substantial long-branch attraction artifacts in the phylogeny of Saccharomyces species. Mol Biol Evol. 2006;23:245–253.
40. Lescot M, et al. Insights into the Musa genome: Syntenic relationships to rice and between Musa species. BMC Genomics. 2008;9:58.
41. Jakse J, et al. Comparative sequence and genetic analyses of asparagus BACs reveal no microsynteny with onion or rice. Theor Appl Genet. 2006;114:31–39.
42. Naylor RL, et al. Biotechnology in the developing world: A case for increased investments in orphan crops. Food Policy. 2004;29:15–44.
43. Itoh T, et al. Rice Annotation Project. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 2007;17:175–183.
44. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
45. Putnam NH, et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science. 2007;317:86–94.
46. Larkin MA, et al. Clustal W and Clustal X, version 2.0. Bioinformatics. 2007;23:2947–2948.
47. Suyama M, Torrents D, Bork P. PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–W612.
48. Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–1591.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences