![]() | ![]() |
Formats:
|
||||||||||||||
Copyright © 2009 The Author(s) Structure and expression analysis of rice paleo duplications 1UMR 1095 INRA/UBP, Génétique, Diversité et Ecophysiologie des Céréales (GDEC), Domaine de Crouelle, 234, 63100 Clermont Ferrand, 2Biogemma, Z.I. Du Brézet, 63028 Clermont Ferrand Cedex 2, 3IRD, Avenue Agropolis, 34394 Montpellier Cedex 05 and 4Ecole Normale Superieure de Lyon, 69364 Lyon Cedex 07, France *To whom correspondence should be addressed. Tel: Phone: +33 4 73 62 43 80; Fax: +33 4 73 62 44 53; Email: jsalse/at/clermont.inra.fr The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors. Received November 7, 2008; Revised December 15, 2008; Accepted December 15, 2008. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. This article has been cited by other articles in PMC.Abstract Having a well-known history of genome duplication, rice is a good model for studying structural and functional evolution of paleo duplications. Improved sequence alignment criteria were used to characterize 10 major chromosome-to-chromosome duplication relationships associated with 1440 paralogous pairs, covering 47.8% of the rice genome, with 12.6% of genes that are conserved within sister blocks. Using a micro-array experiment, a genome-wide expression map has been produced, in which 2382 genes show significant differences of expression in root, leaf and grain. By integrating both structural (1440 paralogous pairs) and functional information (2382 differentially expressed genes), we identified 115 paralogous gene pairs for which at least one copy is differentially expressed in one of the three tissues. A vast majority of the 115 paralogous gene pairs have been neofunctionalized or subfunctionalized as 88%, 89% and 96% of duplicates, respectively, expressed in grain, leaf and root show distinct expression patterns. On the basis of a Gene Ontology analysis, we have identified and characterized the gene families that have been structurally and functionally preferentially retained in the duplication showing that the vast majority (>85%) of duplicated have been either lost or have been subfunctionalized or neofunctionalized during 50–70 million years of evolution. INTRODUCTION Early studies with the first generation of molecular markers indicated the presence of duplicated loci on the genetic maps of various cereals, suggesting ancestral genome duplications and polyploidization events in the history of species that are now considered as diploids (1). In rice (i) restriction fragment length polymorphism mapping performed in the nineties suggested that chromosomes 1 and 5 (2) as well as chromosomes 11 and 12 (3) were ancient duplicates and (ii) comparative genomics studies on the sequence level also suggested ancient polyploidy in rice (4–6). The release of genome sequence drafts from japonica and indica rice subspecies allowed whole genome sequence comparisons and further characterization of duplications in rice (7–11). The most recent analysis (11) concluded that a whole-genome duplication event (involving 10 chromosome-to-chromosome duplication relationships) predated the divergence of cereal genomes 53–94 million years ago, while a more recent, independent duplication event between rice chromosomes 11 and 12 occurred 21 million years ago. Together, these duplications cover 65.7% of the genome. The identification of 163 or even 319 duplicated blocks in the rice genome has recently been published by Lin et al. (12) and Wang et al. (13), respectively. Unfortunately, many of these studies were based on low-stringency sequence alignment criteria, such as the direct use of pairwise sequence alignment information through BLAST expect or score values, and did not take into account the density and location of genes to identify precisely the structure and evolution of paralogous regions. Because it is difficult to infer paralogous relationships from sequence comparisons, expertized alignment criteria and statistical validation are required to (i) evaluate objectively and accurately whether the association between two or more genes in the same order on two chromosome segments occurs by chance or truly reflects duplications; (ii) eliminate the presence of massive background noise linked to the identification of artefactual paralogs necessary to produce a unique view of the duplicated nature of the rice genome from either 10 (11), 163 (12) or 319 (13) duplicated regions. Recently, we have reassessed the duplicated nature of the rice genome based on a combination of (i) new alignment criteria that increase analysis stringency and (ii) statistical tests to re-define interchromosomal duplications (14). We identified 29 rice duplications covering 72% (267 Mb) of the genome with an average density of one gene every 0.8 Mb involved in the duplications. Ten of the 29 duplications were those previously reported in the literature (11) covering 47.8% of the rice genome. The remaining 19 duplicated blocks associated with 539 paralogous gene pairs were newly identified in the study. Moreover, the identification of seven paleo-duplicated blocks (among the 29) shared with the wheat, maize and sorghum genomes allowed us to propose a model in which grass genomes have evolved from a common ancestor with a basic number of five chromosomes, by whole genome and segmental duplications, chromosome fusions and translocations.Gene duplication generates functional redundancy followed by either pseudogenization (i.e. unexpressed or functionless paralog), concerted evolution (i.e. conservation of function for paralog), subfunctionalization (i.e. complementary function of paralog) and neofunctionalization (i.e. novel function of paralog) during the course of genome evolution. Functional divergence either by subfunctionalization or neofunctionalization among duplicated genes is one of the most important sources of evolutionary innovation in complex organisms. Recent studies suggested that a majority of duplicated genes that are structurally retained during the evolution have at least partially diverged in their function (15,16). These studies were based either on (i) systematic studies of the changes in protein sequences through the estimates of synonymous (Ks) or non-synonymous (Ka) substitution per site between paralogs or (ii) the analysis of the timing, location and relative number of gene transcripts available in public expressed sequence tag (EST) databases. However, these approaches are only indirectly related to gene expression as it is generally assumed that (i) variation in substitution rate is not related to variation in the rate of expression divergence (17,18) and (ii) estimation of the level, location and timing of gene expression based on the available EST are limited to the type of cDNA libraries available, and these are not representative of all the spacial and temporal conditions of plant development. Recent micro-array studies in Arabidopsis clearly demonstrated that the vast majority of duplicated genes diverged in their expression profile (18–20). Ganko et al. (18) conclude that ~70% of gene pairs show asymmetric divergence based on micro-array data. Blanc et al. (19) previously concluded that 57% and 73% of the gene pairs acquired divergent expression patterns for recent and old duplication events in Arabidopsis, respectively. In order to analyze the impact of rice genome duplications on gene structure and expression, we produced a rice expression map (eMAP), based on expertized micro-array data collected from a unique platform, and compared expression profiles of all paralogous gene pairs identified in the rice genome. This allowed us to provide new insights into the structural and functional evolution of genes after a whole genome duplication event. MATERIALS AD METHODS Rice duplication analysis Rice sequence databases The sequences of the 12 rice pseudomolecules (build 4; 372 Mb) were downloaded from the TIGR web site (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml) as well as the annotation data for 42 653 genes and 13 237 transposable elements (TEs). The 42 653-gene ftp file was used in this study. Gene sequences (42 653) were used for the analysis of the rice duplications by taking into account the gene position (coordinates) on the pseudomolecules. Nucleic-acid sequence alignments We recently defined three new parameters [AL, cumulative identity percentage (CIP) and cumulative alignment length percentage (CALP)] by parsing the BLASTN results and rebuilding high scoring pairs (HSPs) for pairwise sequence alignments (14). The first, AL (aligned length), is the sum of all the HSP lengths. The second, cumulative identity percentage (CIP = ∑ nb ID/HSP/AL × 100), corresponds to the cumulative percent identity obtained for all the HSPs. The third parameter is cumulative alignment length percentage (CALP = AL/query length). It represents the sum of the HSP lengths (AL) for all the HSPs divided by the length of the query sequence. The CIP and CALP criteria allow identification of the best alignment, i.e. the highest cumulative percent identity in the longest cumulative length, taking into account all HSPs obtained for any pairwise alignment. Graphical display Duplications were graphically visualized using the CIRCOS software (http://mkweb.bcgsc.ca/circos/). Rice micro-array data Oligo micro-array design and production Oryza sativa Genome Oligo Set Version 1.0 was designed by Beijing Genomics Institute (BGI) and contained 60 727 70-mer oligos representing both indica and japonica genomes. All oligos were designed from cDNAs, EST sequences, predicted genes of BGI rice genome build and other public resources. The oligo set was spotted onto three Amersham Mirror T7 Star slides (21). Plant samples Samples were collected from Nipponbare (Oryza sativa, japonica) plants grown in greenhouse at 28°C. Photoperiod was 12 h and relative humidity was 70%. Seeds were sown in small pots containing 4/5 Neuhaus compost and 1/5 Pouzzolane. Kernels (endosperm and embryo) were sampled at 2–3 days after pollination (DAP), 5–6 DAP, 15 DAP and 30 DAP. Unfertilized pistils (ovaries and stigmates) were sampled at 2 days before pollination. These samples could be contaminated by pollen during dissection. Leaves were sampled at different growing stages and pooled. Roots were sampled on 12-day-old seedlings grown in sand. RNA isolation, probe labeling and hybridization Total RNA were extracted as described by Kay et al. (22). Quantification was performed using spectrophotometer and quality checked by gel migration. Oligo array hybridization mRNA (250 ng) was amplified and cDNA was labeled using Low DNA Input Fluorescent Linear Amplification kit and protocol (Agilent). Fifty nanograms of Cy3 and Cy5 (Amersham) were used for the labeling of the probes. Hybridization were performed with two technical repeats and fluorophore dye swaps using Corning hybridization chambers for 16 h at 37°C with an Amersham hybridization buffer RPK 0325 with 30% formamide. Additional RNA cleanup and DNase treatment were performed on Qiagen RNAeasy mini column (cat# 74106) according to the manufacturer's instructions. One microgram of total RNA was amplified using the aminoallyl MessageAmp a RNA Kit (Ambion# 1752). Antisense RNA synthesis was performed over 13 h, quality was controlled using the Agilent 2100 Bioanalyser and RNA was quantified with a Nanodrop ND-100. Five micrograms of lyophilized aminoallyl RNA were labeled by coupling of NHS Ester Dyes (Cy5 Mon-Reactive Dye Pack; Amersham # PA25001 et Cy3 Mon-Reactive Dye Pack; Amersham # PA23001) and purified according to the Ambion Aminoallyl MessageAmp a RNA Kit protocol. Fifty picomoles of each of the labeled aRNA (Cy5 and Cy3, respectively) were vacuum dried to a final volume of 9 μl and mixed with 1 μl of Ambion fragmentation buffer (cat# 8740), the solution was incubated 15 min at 70°C and supplemented with 1 μl of the Stop solution (included in the fragmentation buffer packaging) then stored on ice. The two fragmented solutions were mixed to 2.5 μl of 0.1 mg/ml sonicated herring sperm DNA (Sigma), denatured 2 min at 95°C and stored on ice. Then 7.5 μl of hybridization buffer 2 (Amersham RPK0325) and 9 μl of 100% deionized formamide was added to the denatured probe and directly used for hybridization. Hybridizations were performed overnight at 42°C in Corning hybridization chambers. For each biological replicate, a dye swap was performed. Slides were washed once in 1× SSC/0.2%SDS (10 min at 42°C), twice in 0.1× SSC/0.2%SDS (each 10 min at 42°C), twice in 0.1× SSC (each 1 min at room temperature) and 10 s in water at room temperature. Data processing and normalization Hybridized slides were scanned on Amersham Gen3 scanner. Data acquisition was performed using software package ArrayVision. Potential sample and dye intensity biases in micro-array data sets were visualized by using log intensity ratio (M) − mean log intensity (A) scatter plots constructed for each array, in which M [M = log (Cy5/Cy3) = (log Cy5 − log Cy3)] was plotted against A [A = (log Cy3 + log Cy5)/2] for each array spot, as described by Yang et al. (23). Array-specific data normalization of ratio value was then performed by using a robust local regression technique in the LOWESS procedure of LIMMA GUI package (24,25). Normalized data were then back transformed in order to obtain normalized intensity by using the following formula: log Cy3 = A − M/2 and log Cy5 = A + M/2. Intensity log values were finally corrected by median substraction. For a given spot and a given sample, we considered that the level of expression is different from the background when intensity log value >1.5 for root and leaf experiment and intensity log value >1 for the grain experiment. Expression of a given spot was considered significantly different between two samples when associated P-value FDR calculated on M value (with LIMMA GUI package) is <0.1 (less than 10%) of genes discovered to be potentially significant by chance alone. For the grain developmental kinetic, global profiles on the five developmental stages were considered when expression is significantly different between at least two successive samples (FDR threshold). Expression correlation To determine a cut-off r value below which duplicated gene pairs can be considered divergent, we calculated a Pearson correlation value (r ranking from −1 to +1) between the expression profiles of gene pairs. Any gene pairs with r = 0.52 can be considered to be significantly co-regulated at a P-value of 0.05, so we used r < 0.52 as a criterion for determining that two duplicated genes have diverged in expression as described in Blanc and Wolfe (19). RESULTS Rice eMAP The fourth release of the rice genome annotation (http://www.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml) contains 55 890 genes including 13 237 that are related to TEs. The 42 653 non-TE-related genes were used to perform a genome-wide transcriptome analysis based on a rice oligo-array to produce a rice eMAP (cf. Figure 1 727 spots (70-nt oligonucleotides) from the Oryza sativa Genome Oligo Set Version 1.0 designed by Beijing Genomics Institute (21). When aligning the 60 727 oligonucleotides with the rice genome annotation (TIGR v4), 41 865 genes were associated with single unambiguous rice genes on the basis of at least 90% sequence identity with its associated gene over at least 60 bp, providing evidence of mis- and/or non-annotated genes in rice (26,27). Thus, the transcriptome experiments performed with this array allow us to study the expression pattern of 32 493 genes annotated by TIGR (corresponding to 41 865 spots on the array).
We assessed rice gene expression during grain development as well as in root and leaf. The five samples for grain developmental kinetics (two technical replicates for each stage) were harvested just prior to pollination and at 2–3, 5–6, 15 and 30 days after pollination (DAP). Samples for the root (two technical replicates) and the leaf (two technical replicates), taken at single time points, completed our assessment of gene expression in three major rice tissues. We found 29 191 (89%) genes expressed (i.e. presence of signal) in at least one sample among the 32 493 genes present on the array (from the 42 653 annotated genes from TIGR), among which 2382 genes significantly and differentially expressed in root, leaf grain tissues based on normalized intensity log value as well as P-value FDR thresholds detailed in the materials and methods section (Figure 1 Ren et al. (28) reported recently the presence of coexpression domains for ~5% the rice genome based on a data set of 14 789 differentially expressed genes from affymetrix experiments. Moreover, several studies have suggested that coexpressed genes may participate in the same biological pathway (29). In order to test this hypothesis, 32 annotated transcription factors (TFs) [from a total of 373 TF in TIGR v4 (30)] were selected as associated with a single or several expression profiles in our data set in root (10 genes TF), grain (24 genes TF) and leaf (five genes TF), cf. highlighted with green stars on the Figure 1 653 (1.8%) genes were expressed in leaves, a random co-regulation value would be 1.8% of 100 (i.e. 1.8 genes), 300 (i.e. 5.4 genes) and 600 (i.e. 10.8 genes) genes for each physical window. Finally, for the 10 TF expressed in root, 2.6 ± 1.4, 7.2 ± 3.2 and 12 ± 5.1 genes are co-regulated, respectively, for the three physical windows considered. The whole rice eMAP showing a total of 803 among 42 653 (1.8%) genes expressed in roots, a random co-regulation value would be 1.8% of 100 (i.e. 1.8 genes), 300 (i.e. 5.6 genes) and 600 (i.e. 11.2 genes) genes for each considered physical window (cf. Figure 2
Improved identification of duplicated regions in the rice genome We recently published a refined analysis of the duplicated nature of the rice genome (14) identifying 29 duplications between chromosomes r1–r2/3/5/10/12, r2–r4/6/7/8/12, r3–r7/9/10/11/12, r4–r5/8/10, r5–r9/11/12, r6–r7/8/12, r7–r8, r8–r9/11, r9–r11 and r11–r12). These 29 duplications can be divided into two groups. The 10 major duplications (between chromosomes r1–r5, r2–r4/6, r3–r7/10/12, r4–r8/10, r8–r9 and r11–r12) cover 47.8% of the rice genome. The 19 additional duplicated regions (between chromosomes r1–r2/3/10/12, r2–7/8/12, r3–r9/11, r4–r5, r5–r9/11/12, r6–r7/8/12, r7/r8, r8–r11 and r9/r11) correspond to duplicated regions (i) not identified previously; (ii) superimposed on previous major duplications; (iii) small in physical size. They define novel relationships between the chromosomes and represent 24.2% of the genome. Thus, in total, the 29 duplications cover 72% (267 Mb) of the rice genome.In order to compare the expression profile between paralogous genes in rice, we have refined our previous analysis and then performed expertized alignments of sister blocks one by one from the 10 major duplicated blocks to identify precisely the number of paralogous gene pairs that had not been performed in our recent study (14). The 11 628 genes contained in the 10 block pairs were aligned [BLASTN (31,32)] using three new parameters for analysis: AL for aligned length, CIP for cumulative identity percentage and CALP for cumulative alignment length percentage (cf. Materials and methods section). With these parameters, BLAST produced the highest cumulative percent of identity on the longest cumulative length, thereby increasing the stringency in defining conservation between chromosome sequences. The rice block sequences were aligned against themselves (BLASTN) using 70% CIP and 70% CALP as cutoff to identify duplicated gene pairs. Paralogous genes (1440) were identified between r1–r5 (two blocks, 251 genes), r2–r4 (127 genes), r2–r6 (two blocks, 392 genes), r3–r7 (two blocks, 96 genes), r3–r10 (63 genes), r3–r12 (75 genes), r4–r8 (16 genes), r4–r10 (12 genes), r8–r9 (89 genes) and r11–r12 (319 genes) (cf. Figure 3
The largest duplication (duplication #3 in Figure 3 Mb located on chromosome 2, which has a paralogous counterpart on a 22.56 Mb region of chromosome 6, corresponding to a cumulative region size of 44.62 Mb involving 312 paralogs, i.e. an average density of one paralogous gene every 71.5 kb. The most gene-dense duplicated region (duplication #10 in Figure 3 Mb) and r11 (2.56 Mb) with 319 paralogs, i.e. an average density of one paralogous gene every 8.4 kb. This density was expected as the r11–r12 duplication event had been reported to be the most recent and, consequently, the most highly conserved (11,14). The identification of 10 known duplication blocks with the highest number of paralogous gene pairs (i.e. 1440 single copy paralogs) compared to previous data [i.e. 772 (11), 529 (14) and 539 (33)] clearly demonstrated that the introduction of new alignment criteria (CIP and CALP parameters) increased the stringency of the analysis, re-defined interchromosomal duplications in rice and was suitable for the comparative analysis of the expression profile between gene pairs.Comparative expression profiling of paralogous genes in rice When we compared the 1440 paralogous gene pairs to the 2382 expressed genes, we identified 115 gene pairs for which at least one gene copy was significantly expressed in one of the three tissues but only 14 gene pairs with significant expression for both copies. The remaining paralogous pairs were either not present on the array or not differentially expressed in our experiment. This result contradicts recent findings from genome-wide transcription analyses in rice using tiling arrays developed by Li et al. (26), in which the authors examined the transcriptional relationship between 1217 (compared to 1440 in the present study) marker pairs defining the same duplication events described here. They concluded that there was no significant difference in expression between the two genes forming paralagous pairs for all the duplications except some pairs of r4–r10 and r8–r9. Both the array and the probe used by the authors were fundamentally different from ours; in fact, they used a rice genome wide tiling array and, more importantly, a cDNA mixture from different tissues (shoots, roots, panicules and suspension cultures) rather than separate probes from each tissue. The mixed nature of the probe may well have masked expression differences in particular tissues, if both genes are expressed more strongly in a second tissue. In addition, the correlation was calculated on average intensity values that made it difficult to identify precise cut-off values to distinguish background noise from significant expression. In our data set, 88% of the gene pairs present were associated with a complete loss of expression for one of the paralogs in one of the three tissues involved. Figure 4
A more detailed analysis of the 10 (out of 37) paralogous couples, for which both genes of a pair were clearly expressed in grain, showed that the percentage of neofunctionalization was even higher. Figure 4 Gene ontology (GO) classification of rice paralogs The 42 653 annotated genes at TIGR are classified into 65 GO classes (available at http://gnn.tigr.org/tdb/e2k1/osa1/GO.retrieval.shtml). Figure 5
In an attempt to identify the gene functions that have retained the same expression pattern or have been preferentially neo- or subfunctionalized within duplicated blocks during the evolution after the duplication event, we represented in the Figure 5 DISCUSSION Structural divergence between rice paralogous blocks Re-analysis of the duplication within the rice genome based on improved sequence alignment criteria allowed the identification of 1440 gene pairs originating from paleo duplication in rice. These paralogs define 10 chromosome to chromosome duplication relationships involving 11 628 annotated genes. In total, 2880 (i.e. 1440 gene pairs) of the 22 884 genes present on the duplicated segments are structurally conserved in the duplicated regions. Since the time when these 10 duplications occurred as part of a whole genome duplication event (WGD) 50–70 million years ago, 87.4% of the duplicated genes have lost their orthologous counterparts. It is noteworthy that no sister blocks were identified as part of a centromeric region. Although transcribed genes are known to be present in these parts of the genome (35), these regions are rich in repeated elements that decrease the number of links between possible sister regions, leaving them difficult to detect by any BLAST alignment strategy. When considering the 65 GO classes associated with the 1440 rice paralogs, no bias is identified for gene function that may have been retained in duplication blocks suggesting that gene loss within duplicated chromosomal regions is a random process. Our results are consistent with recent findings on the duplicated nature of the Arabidopsis genome where ~20% of paralogs are retained within duplicated segments (36). More precisely, the authors stated that ~28% and ~13.5% of duplicated genes are retained in recent (date back to 24–40 MYA) and old (date back to the monocot/dicot divergence) duplication blocks, respectively (36,37). Our data on rice duplication (i.e. 12.6% of paralogs retained in sister blocks after 50–70 MY of evolution) combined with Blanc et al. (36) data on Arabidopsis duplication (i.e. 13.5% of paralogs retained in sister blocks after a double period, 100 MY, of evolution) clearly establish that gene loss after polyploidization is not a linear process. This concept is already visible between the 10 major duplication blocks in rice where 78.1% of the duplicates are retained within the r11–r12 duplication that date back between 14 and 27.3 MYA, whereas 12.3% of the duplicates are retained within the longest duplicated segment between r1 and r5 that date back 53.2–76.3 MYA (14). Associated with recent data obtained in dicots (i.e. Arabidopsis), our results clearly demonstrate that most of the genetic redundancy originating from polyploidy events is erased by a massive loss of duplicated genes by pseudogenisation in one of the duplicated segments soon after the polyploidization event (38). Expression divergence between rice paralogous blocks Detailed analysis of 115 gene pairs (from the 1440 paralogs) for which at least one copy is associated with significant expression (out of the 2382 profiles of the eMAP), clearly shows that 88%, 89% and 96% have diverged in their expression pattern in grain, leaf and root, respectively. When considering the 65 GO classes associated with 115 expressed gene pairs among the 1440 rice paralogs as described previously, clear biases are identified for gene functions that have retained the same function within duplicated blocks indicating a nonramdom process of gene expression divergence. The ‘endogenous stimuli’ gene function (blue star on the Figure 5 Previous studies in yeast and humans have also suggested a rapid phase of initial divergence between duplications (39–41). Moreover, our results are consistent with recent findings of Blanc et al. (19) where 57% (of 1137 genes pairs for young duplications) to 73% (of 420 gene pairs for old duplications) have diverged in expression based on a computational analysis of 62 Affymetrix micro-arrays experiment in Arabidopsis. Our data indicate a greater percentage of expression divergence. However, Blanc et al. (19) mentioned that the 73% of pairs that have diverged in expression in the context of old duplications is an underestimate if cross-hybridization occurred at a high rate in their experiment. Because many genes exert their function through interaction networks, a change in the expression pattern of one gene could drag along changes in the expression pattern of genes present in the same functional pathway. This could explain why we did not detect any difference in expression divergence for gene pairs present in the recent duplication between r11 and r12 dating back to 21 MYA compared with the other paleo-duplications suggested as part of a WGD dated between 53 and 94 MYA (11,14). Even if our estimates of expression divergence between gene pairs may be an underestimation of the true proportion of divergent pairs in rice, because the expression data set analyzed here is necessarily only a sampling of all the possible environmental conditions or tissues where the duplicated genes may be expressed, a large majority of the polyploidy-derived duplicated genes in rice have acquired divergent functions. Haberer et al. (42) noted that tandem as well as segmental duplicate gene pairs had divergent expression in Arabidopsis even when they shared many similar cis-regulatory sequences and suggested that changes to a small fraction of cis-elements could be sufficient for neofunctionalization or subfunctionalization. Finally, epigenetic differences between duplicates may contribute to rapid expression differentiation. Wang et al. (43) observed silencing of polyploidy-derived duplicates due to hypermethylation in Arabidopsis polyploids. Epigenetic mechanisms as well as interaction networks may be the origin of a extremely rapid expression divergence of gene duplicates soon after polyploid events. Time scale of structural and expression shuffling after duplications We concluded that our analysis supports the hypothesis that (i) only 12.6% of 22 884 paralogous genes are retained within rice sister blocks for which (ii) the expression pattern shift identified for more than 88% of the 115 gene pairs available is significantly impacted by neofunctionalization or subfunctionalization after duplication. Moreover, this data, identified for the 10 major duplications in rice, will apply to any cereal genome as these paleo-duplications have been suggested to correspond to a WGD that occurred before the cereal divergence from a common ancestor 50–70 MYA (11,14). Given the prevalence of gene and genome duplication in the evolutionary history of plants, evolution of development in angiosperms may differ from organisms where genome duplication is rare and where extensive expression divergence after duplication would have a profound impact on the evolution of developmental and regulatory networks. Our data support the idea that after 50–70 million years of evolution since the genome has undergone a polyploidization event, the vast majority of the paralogous genes have been lost within a sister block and that the remaining gene couples have largely diverged in their expression profile. A survey in gene expression variation between A. thalina and A. arenosa, which derived from a common ancestor ~1.5 MYA, clearly established ~2.5% of gene expression differences (44). Changes in gene expression can occur immediately after polyploidy formation or may need sometimes only two generations to establish expression status (33,45). This trend toward silencing (or gene loss via pseudogenization) or expression shift (via neo- or subfunctionalization) of a particular locus soon after a polyploid event may be advantageous for adaptation and establishment of a successful polyploid genome. Our data (i.e. >88% of expression divergence in rice within a 50–70 MY of evolution) complement previous findings [i.e. 2.5% and 57% of expression divergence in Arabidopsis within respectively a ~1.5 MY and ~32 MY of evolution (36,44)] demonstrated that within ~30 MY of evolution after a polyploidization event, more than 50% of the paralogs have been either lost or have been sub- or neofunctionalized in plants. CONCLUSION The access to large structural (genome sequences) and functional (micro-array data experiments) genomic resources gives a new insight into the impact of paleo polyploidy in plant. However, the present study demonstrates that in a short time scale after polyploidization, a large majority of duplicated genes are no longer maintained intact in the genome. Among those maintained, a vast majority have diverged in their expression pattern. This is a key process in creating biological novelties or adaptation to ecological environments by avoiding redundant function soon after duplication and by removing selection pressure allowing gene copy to evolve faster through subfunctionalization or neofunctionalization. FUNDING Data included in the current article including comparative genomics data as well as transcriptomics data were generated within programs partially funded by GENOPLANTE. Funding for open access charge: National French Cooperative Plant Genomics Program http://www.genoplante.com/. Conflict of interest statement. None declared. REFERENCES 1. Salse J, Feuillet C. Comparative genomics of cereals. In: Varshney RK, Tuberosa R, editors. Genomics-Assisted Crop Improvement. New York: Springer; 2007. pp. 177–205. 2. Kishimoto N, Higo H, Abe K, Arai S, Saito A, Higo K. Identification of the duplicated segments in rice chromosomes 1 and 5 by linkage analysis of cDNA markers of known functions. Theor. Appl. Genet. 1994;88:722–726. 3. Nagamura Y, Inoue T, Antonio BA, Shimano T, Kajiya H, Shomura A, Lin SY, Kuboki Y, Harushima Y, Kurata N, et al. Conservation of duplicated segments between rice chromosomes 11 and 12. Breed Sci. 1995;45:373–376. 4. Salse J, Piegu B, Cooke R, Delseny M. Synteny between Arabidopsis thaliana and rice at the genome level: a tool to identify conservation in the ongoing rice genome sequencing project. Nucleic Acids Res. 2002;30:2316–2328. [PubMed] 5. Salse J, Piegu B, Cooke R, Delseny M. New in silico insight into the synteny between rice (Oryza sativa L.) and maize (Zea mays L.) highlights reshuffling and identifies new duplications in the rice genome. Plant J. 2004;38:396–409. [PubMed] 6. Paterson AH, Bowers JE, Peterson DG, Estill JC, Chapman BA. Structure and evolution of cereal genomes. Curr. Opin. Genet. Devel. 2003;13:644–650. [PubMed] 7. Vandepoele K, Simillion C, Van de Peer Y. Evidence that rice and other cereals are ancient aneuploids. Plant Cell. 2003;15:2192–2202. [PubMed] 8. Paterson AH, Bowers JE, Chapman BA. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl Acad. Sci. USA. 2004;101:9903–9908. [PubMed] 9. Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002;296:79–92. [PubMed] 10. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature. 2005;436:793–800. [PubMed] 11. Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, et al. The genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005;3:266–281. 12. Lin H, Ouyang S, Egan A, Nobuta K, Haas B, Zhu W, Gu X, Silva JC, Meyers BC, Buell CR. Characterization of paralogous protein families in rice. BMC Plant Biol. 2008;8:18. [PubMed] 13. Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH. Extensive concerted evolution of rice paralogs and the road to regaining independence. Genetics. 2007;177:1753–1763. [PubMed] 14. Salse J, Bolot S, Throude T, Jouffe V, Piegu B, Masood Quraishi U, Calcagno C, Cooke C, Delseny M, Feuillet C. Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. Plant Cell. 2008;20:11–24. [PubMed] 15. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. [PubMed] 16. Zhang J. Evolution by gene duplication: an update. Trends Ecol. Evol. 2003;18:292–298. 17. Zhang L, Vision TJ, Gaut BS. Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 2002;19:1464–1473. [PubMed] 18. Ganko EW, Meyers BC, Vision TJ. Divergence in expression between duplicated genes in Arabidopsis. Mol. Biol. Evol. 2007;24:2298–2309. [PubMed] 19. Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–1691. [PubMed] 20. Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW. Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol. Biol. Evol. 2006;23:469–478. [PubMed] 21. Jiao Y, Ma L, Strickland E, Deng XW. Conservation and divergence of light-regulated genome expression patterns during seedling development in rice and Arabidopsis. Plant Cell. 2005;17:3239–3256. [PubMed] 22. Kay R, Chan A, Daly M, McPherson J. Duplication of CaMV 35S promoter sequences creates a strong enhancer for plant genes. Science. 1987;236:1299–1302. [PubMed] 23. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002;15:e15. [PubMed] 24. Cleveland WS, Grosse E. Computational methods of local regression. Stat. Comput. 1991;1:47–62. 25. Wettenhall JM, Smyth GK. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics. 2004;20:3705–3706. [PubMed] 26. Li L, Wang X, Stolc V, Li X, Zhang D, Su N, Tongprasit W, Li S, Cheng Z, Wang J, et al. Genome-wide transcription analyses in rice using tiling microarrays. Nat. Genet. 2005;38:124–129. [PubMed] 27. Li L, Wang X, Xia M, Stolc V, Su N, Peng Z, Tongprasit W, Li S, Wang J, Wang X, et al. Tiling microarray analysis of rice chromosome 10 to identify the transcriptome and relate its expression to chromosomal architecture. Genome Biol. 2005;6:R52. [PubMed] 28. Ren XY, Stiekema WJ, Nap JP. Local coexpression domains in the genome of rice show no microsynteny with Arabidopsis domains. Plant Mol. Biol. 2007;65:205–217. [PubMed] 29. Williams EJ, Bowles DJ. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004;14:1060–1067. [PubMed] 30. Xiong Y, Liu T, Tian C, Sun S, Li J, Chen M. Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol. Biol. 2005;59:191–203. [PubMed] 31. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. [PubMed] 32. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PubMed] 33. Wang X, Shi X, Hao B, Ge S, Luo J. Duplication and DNA segmental loss in the rice genome: implications for diploidization. New Phytol. 2005;165:937–946. [PubMed] 34. The Rice Chromosomes 11, & 12 Consortia. The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications. BMC Biol. 2005;3:20. [PubMed] 35. Wu J, Yamagata H, Hayashi-Tsugane M, Hijishita S, Fujisawa M, Shibata M, Ito Y, Nakamura M, Sakaguchi M, Yoshihara R, et al. Composition and structure of the centromeric region of rice chromosome 8. Plant Cell. 2004;16:967–976. [PubMed] 36. Blanc G, Hokamp K, Wolfe KH. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003;13:137–144. [PubMed] 37. Blanc G, Barakat A, Guyot R, Cooke R, Delseny M. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell. 2000;12:1093–1101. [PubMed] 38. Chapman BA, Bowers JE, Feltus FA, Paterson AH. Buffering of crucial functions by paleologous duplicated genes may contribute cyclicality to angiosperm genome duplication. Proc. Natl Acad. Sci. USA. 2006;103:2730–2735. [PubMed] 39. Gu X, Zhang Z, Huang W. Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc. Natl Acad. Sci. USA. 2005;102:707–712. [PubMed] 40. Makova KD, Li WH. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 2003;13:1638–1645. [PubMed] 41. Yang J, Su AI, Li WH. Gene expression evolves faster in narrowly than in broadly expressed mammalian genes. Mol. Biol. Evol. 2005;22:2113–2118. [PubMed] 42. Haberer G, Hindemitt T, Meyers BC, Mayer KF. Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiol. 2004;136:3009–3022. [PubMed] 43. Wang J, Tian L, Madlung A, Lee HS, Chen M, Lee JJ, Watson B, Kagochi T, Comai L, Chen ZJ. Stochastic and epigenetic changes of gene expression in Arabidopsis polyploids. Genetics. 2004;167:1961–1973. [PubMed] 44. Lee HS, Chen ZJ. Protein-coding genes are epigenetically regulated in Arabidopsis polyploids. Proc. Natl Acad. Sci. USA. 2001;98:6753–6758. [PubMed] 45. Chen ZJ, Comai L, Pikaard CS. Gene dosage and stochastic effects determine the severity and direction of uniparental ribosomal RNA gene silencing (nucleolar dominance) in Arabidopsis allopolyploids. Proc. Natl Acad. Sci. USA. 1998;95:14891–14896. [PubMed] |
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||
Nucleic Acids Res. 2002 Jun 1; 30(11):2316-28.
[Nucleic Acids Res. 2002]Plant J. 2004 May; 38(3):396-409.
[Plant J. 2004]Curr Opin Genet Dev. 2003 Dec; 13(6):644-50.
[Curr Opin Genet Dev. 2003]Plant Cell. 2003 Sep; 15(9):2192-202.
[Plant Cell. 2003]Proc Natl Acad Sci U S A. 2004 Jun 29; 101(26):9903-8.
[Proc Natl Acad Sci U S A. 2004]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Science. 2000 Nov 10; 290(5494):1151-5.
[Science. 2000]Mol Biol Evol. 2002 Sep; 19(9):1464-73.
[Mol Biol Evol. 2002]Mol Biol Evol. 2007 Oct; 24(10):2298-309.
[Mol Biol Evol. 2007]Mol Biol Evol. 2007 Oct; 24(10):2298-309.
[Mol Biol Evol. 2007]Plant Cell. 2004 Jul; 16(7):1679-91.
[Plant Cell. 2004]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Plant Cell. 2005 Dec; 17(12):3239-56.
[Plant Cell. 2005]Science. 1987 Jun 5; 236(4806):1299-1302.
[Science. 1987]Nucleic Acids Res. 2002 Feb 15; 30(4):e15.
[Nucleic Acids Res. 2002]Bioinformatics. 2004 Dec 12; 20(18):3705-6.
[Bioinformatics. 2004]Plant Cell. 2004 Jul; 16(7):1679-91.
[Plant Cell. 2004]Plant Cell. 2005 Dec; 17(12):3239-56.
[Plant Cell. 2005]Nat Genet. 2006 Jan; 38(1):124-9.
[Nat Genet. 2006]Genome Biol. 2005; 6(6):R52.
[Genome Biol. 2005]Plant Mol Biol. 2007 Sep; 65(1-2):205-17.
[Plant Mol Biol. 2007]Genome Res. 2004 Jun; 14(6):1060-7.
[Genome Res. 2004]Plant Mol Biol. 2005 Sep; 59(1):191-203.
[Plant Mol Biol. 2005]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]J Mol Biol. 1990 Oct 5; 215(3):403-10.
[J Mol Biol. 1990]Nucleic Acids Res. 1997 Sep 1; 25(17):3389-402.
[Nucleic Acids Res. 1997]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]New Phytol. 2005 Mar; 165(3):937-46.
[New Phytol. 2005]Nat Genet. 2006 Jan; 38(1):124-9.
[Nat Genet. 2006]BMC Biol. 2005 Sep 27; 3():20.
[BMC Biol. 2005]Plant Cell. 2004 Apr; 16(4):967-76.
[Plant Cell. 2004]Genome Res. 2003 Feb; 13(2):137-44.
[Genome Res. 2003]Plant Cell. 2000 Jul; 12(7):1093-101.
[Plant Cell. 2000]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Proc Natl Acad Sci U S A. 2006 Feb 21; 103(8):2730-5.
[Proc Natl Acad Sci U S A. 2006]Proc Natl Acad Sci U S A. 2005 Jan 18; 102(3):707-12.
[Proc Natl Acad Sci U S A. 2005]Genome Res. 2003 Jul; 13(7):1638-45.
[Genome Res. 2003]Mol Biol Evol. 2005 Oct; 22(10):2113-8.
[Mol Biol Evol. 2005]Plant Cell. 2004 Jul; 16(7):1679-91.
[Plant Cell. 2004]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Plant Physiol. 2004 Oct; 136(2):3009-22.
[Plant Physiol. 2004]Genetics. 2004 Aug; 167(4):1961-73.
[Genetics. 2004]Plant Cell. 2008 Jan; 20(1):11-24.
[Plant Cell. 2008]Proc Natl Acad Sci U S A. 2001 Jun 5; 98(12):6753-8.
[Proc Natl Acad Sci U S A. 2001]New Phytol. 2005 Mar; 165(3):937-46.
[New Phytol. 2005]Proc Natl Acad Sci U S A. 1998 Dec 8; 95(25):14891-6.
[Proc Natl Acad Sci U S A. 1998]Genome Res. 2003 Feb; 13(2):137-44.
[Genome Res. 2003]