Copyright © 2009 Schreiber et al; licensee BioMed Central Ltd.

Comparative transcriptomics in the Triticeae

1 Australian Centre for Plant Functional Genomics, Univ of Adelaide, PMB 1 Glen Osmond, SA 5064, Australia
2 Dept. of Plant Pathology and Center for Plant Responses to Environmental Stresses, Iowa State Univ., Ames, IA 50011-1020, USA
3 Dept. of Agronomy and Plant Genetics, Univ of Minnesota, St. Paul, MN 55108, USA
4 Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, UK
5 Corn Insects and Crop Genetics Research, USDA-ARS, Iowa State Univ, Ames, IA 50011-1020, USA

Corresponding author.
Andreas W Schreiber: andreas.schreiber/at/adelaide.edu.au; Tim Sutton: tim.sutton/at/adelaide.edu.au; Rico A Caldo: rico.a.caldo/at/monsanto.com; Elena Kalashyan: elena.kalashyan/at/acpfg.com.au; Ben Lovell: ben.lovell/at/acpfg.com.au; Gwenda Mayo: gwenda.mayo/at/acpfg.com.au; Gary J Muehlbauer: muehl003/at/umn.edu; Arnis Druka: arnis.druka/at/scri.ac.uk; Robbie Waugh: robbie.waugh/at/scri.ac.uk; Roger P Wise: rpwise/at/iastate.edu; Peter Langridge: peter.langridge/at/adelaide.edu.au; Ute Baumann: ute.baumann/at/adelaide.edu.au

Received February 18, 2009; Accepted June 29, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Barley and particularly wheat are two grass species of immense agricultural importance. In spite of polyploidization events within the latter, studies have shown that genotypically and phenotypically these species are very closely related and, indeed, fertile hybrids can be created by interbreeding. The advent of two genome-scale Affymetrix GeneChips now allows comparative studies of their transcriptomes.
Results

We have used the Wheat GeneChip to create a "gene expression atlas" for the wheat transcriptome (cv. Chinese Spring). For this, we chose mRNA from a range of tissues and developmental stages closely mirroring a comparable study carried out for barley (cv. Morex) using the Barley1 GeneChip. This, together with large-scale clustering of the probesets from the two GeneChips into "homologous groups", has allowed us to perform a genomic-scale comparative study of expression patterns in these two species. We explore the influence of the polyploidy of wheat on the results obtained with the Wheat GeneChip and quantify the correlation between conservation in gene sequence and gene expression in wheat and barley. In addition, we show how the conservation of expression patterns can be used to elucidate, probeset by probeset, the reliability of the Wheat GeneChip.

Conclusion

While there are many differences in expression on the level of individual genes and tissues, we demonstrate that the wheat and barley transcriptomes appear highly correlated. This finding is significant not only because, given the small evolutionary distance between the two species, it is widely expected, but also because it demonstrates that it is possible to use the two GeneChips for comparative studies. This is the case even though their probeset composition reflects rather different design principles as well as, of course, the present incomplete knowledge of the gene content of the two species. We also show that, in general, the Wheat GeneChip is not able to distinguish contributions from individual homoeologs. Furthermore, the comparison between the two species leads us to conclude that the conservation of both gene sequence and gene expression is positively correlated with absolute expression levels, presumably reflecting increased selection pressure on genes coding for proteins present at high levels.
In addition, the results indicate the presence of a correlation between sequence and expression conservation within the Triticeae.

Background

Considerable divergence has occurred between bread wheat (Triticum aestivum) and barley (Hordeum vulgare) since their evolution from a common ancestor 10–14 million years ago. Since then, these two members of the Triticeae have been subjected to largely parallel processes of cultivation and domestication, starting in the Fertile Crescent over 10,000 years ago [1]. Barley has remained diploid with a base chromosome number of 7 (HH genome, 2n = 2x = 14) while bread wheat is the product of a series of hybridization events between related species that has resulted in an allo-hexaploid genome with three homoeologous sets of 7 chromosome pairs (AABBDD genome, 2n = 6x = 42 [2]). Despite these major genomic perturbations during its evolution, genetic mapping [3] and detailed structural genomic studies [4] have shown that the wheat and barley genomes are highly conserved. Indeed, barley chromosomes can even be substituted for wheat chromosomes [5]. As a consequence of its simplified genetics, many have suggested that barley is a good genetic model for its genetically more complex cousin. This assertion is supported by the broad range of common morphological and developmental characteristics shared by both species, though fundamental biological differences do exist (such as spike and spikelet morphology). Polyploidization is common across the plant kingdom and the process has been associated with a range of changes in newly synthesized hybrids of several species. These include the genome-wide removal of some (but not all) duplicated, and hence redundant, genetic information, sub- and/or neo-functionalization of duplicated genes, pseudogenization, differential cytosine methylation and epigenetic reprogramming of gene expression (silencing and activation), and transposable element activation (reviewed in [6]).
Levy and Feldman [7] summarized some of the major consequences resulting from the recent polyploidization of the wheat genome. In common with other plant species, the outcome for wheat was more than simply the additive combination of genomes and included many of the features described across the species range [8-10]. Wheat is an important species for studying the impact of polyploidization because it is a relatively recent polyploid. Moreover, the outcomes can be studied in very early generations because it is possible to artificially re-synthesize polyploids from their diploid and tetraploid relatives. In such cases, epigenetic silencing of duplicated genes appears to be a common response, with indications of reciprocal silencing in different organs an early sign of sub-functionalization [11-15]. Gene activation or silencing may also occur as a result of transcriptional interference associated with stochastic rearrangements of non-coding RNA [8]. Over longer time frames, the evolutionary consequences of such events are better observed in ancient polyploids. In Arabidopsis (an ancient tetraploid), for example, Blanc and Wolfe [16] reported that more than half of the observed gene pairs retained in the genome exhibited differential transcript abundance in different tissues. An immediate impact of polyploidy is therefore to provide the raw genetic material for adaptation and the evolution of phenotype. The close evolutionary relationship between wheat and barley, reflected in largely parallel morphological and developmental patterns, makes a comparison of their transcriptomes particularly intriguing. It may provide insight, for example, into consequences of speciation and polyploidization. Ideally a genomic-scale comparison of this sort would be carried out once the genomes have been sequenced. 
This would permit the reliable disentanglement of the evolutionary relationships between individual genes and also provide the foundation on which to build dependable expression analysis platforms. Regrettably, the size and complexity of the wheat and barley genomes have been a major impediment to full-scale sequencing, so that even the diploid barley genome is not expected to be available before 2012 (http://barleygenome.org/). In short, among plants comparative transcriptomics is rare: comprehensive pair-wise comparisons have so far only been carried out in rice and Arabidopsis [17], various cotton species [18] and in poplar and Arabidopsis [19]. Recently, a three-way study between Arabidopsis, poplar and rice has also appeared [20]. Compared to genome-wide studies, comparisons of expression patterns of individual orthologous gene pairs, individual gene families and/or in connection with a particular phenotypic characteristic are more frequent. For example, Mangelsen et al. [21] compared, within a number of tissues, expression patterns of members of the WRKY transcription factor family among barley, rice and Arabidopsis and found, at least within this gene family, coordinated conservation of expression patterns and sequence. Horvath et al. [22] found that groups of genes associated with cell division were consistently expressed preferentially in shoot apices in Arabidopsis, wild oats, poplar and leafy spurge. Differential gene expression, on the other hand, has been observed in some members of the ZIP and NAS metal homeostasis gene families in two closely related Arabidopsis species when exposed to both low and high Zn levels, presumably associated with different Zn accumulation patterns in these two species [23]. Analogously, differential time-dependent expression of a small number of genes in response to salt stress in both barley (relatively salt tolerant) and rice (relatively salt sensitive) was studied by Ueda et al. [24], while Taji et al.
[25] performed a similar comparative study in salt cress (tolerant) and Arabidopsis (intolerant). Comprehensive Affymetrix GeneChip platforms have now been developed for both wheat and barley, based on extensive EST collections for both species (Ref. [26]; http://www.plexdb.org/index.php). The Barley1 GeneChip has already been used to develop an atlas of gene expression covering the entire developmental cycle of the barley cultivar Morex [27] and intra-species varietal comparisons have been carried out both for Morex and Golden Promise [27,28] as well as Morex and Steptoe [29,30]. Taking advantage of these resources, we have sampled a similar set of biological material collected through the developmental cycle of wheat (Chinese Spring), grown under near-identical conditions to those in Druka et al. [27]. This permits the first comprehensive comparison of developmental expression patterns in these two important crop species. We report on this transcriptome-wide comparison here. At the same time, in order to facilitate more detailed studies of individual homologous genes motivated, say, by particular phenotypic differences as in [21-25], we make available a convenient web-based comparative tool enabling access to the developmental expression profiles of any individual wheat and barley homologs probed by the two GeneChips. It is well known that meaningful comparative expression analyses using microarray platforms based solely on EST collections can be difficult because of the frequent and confounding presence of multiple splice forms, paralogs and orthologs, as well as, in the case of polyploids, homoeologs with near-identical sequence [31]. Because of this, we have also investigated, in some detail, the specificity of the Wheat GeneChip to individual homoeologs and expended considerable effort to avoid misidentification of orthologs in the two species.

Results

Gene expression measurements were carried out on a developmental tissue series for wild-type wheat (cv.
Chinese Spring) using the Affymetrix Wheat GeneChip. Tissues and developmental stages were chosen to match the barley (cv. Morex) tissue series of Druka et al. [27], employing the Barley1 GeneChip, as closely as possible. They consisted of root tissue at two different developmental stages, leaf, crown, caryopsis, anther, pistil, inflorescence, bracts, mesocotyl, endosperm, embryo and coleoptile (for details, see Materials and Methods: Microarray experiment). This wheat expression dataset may be obtained from and visualized at PLEXdb (http://www.plexdb.org, experiment TA3) or GEO (http://www.ncbi.nlm.nih.gov/projects/geo, experiment GSE12508). Because the 61,115 probesets on the Wheat GeneChip reflect the complete collection of publicly available wheat ESTs at the time of design of the chip, this dataset serves as an 'expression atlas' for hexaploid wheat. While in this paper we concentrate on the comparison of the barley and wheat transcriptomes, the comprehensive nature of the dataset means that it may be used by wheat researchers seeking to explore correlations in transcript levels across tissues to discover putatively co-regulated and/or functionally related genes, and it provides a baseline against which transcription in biotically or abiotically stressed plants, other cultivars and mutants may be compared. Using single varieties as representatives for both barley and wheat may to some extent be an oversimplification, as it is known that considerable variation can exist among varieties of the same species. Indeed, extensive intra-species variation in barley has been used (employing the Barley1 GeneChip) as a genotyping tool for a cross between the barley varieties Steptoe and Morex [29,32] and for expression polymorphisms among Morex, Steptoe, OWB REC, OWB DOM, Barke, Haruna Nijo, Golden Promise and Optic (PLEXdb accession BB20, ArrayExpress E-TABM-113).
We compared the relative importance of intra- and interspecies variation by making use of a common series of 6 tissues from the barley cultivars Golden Promise and Morex (taken from Ref. [27]). For the purpose of the comparison of the transcriptomes of the two species, we found it convenient to define a set of probesets, obtained by eliminating those probesets that were potentially unreliable for various reasons (for details, see Table 1 and Materials and Methods: The Wheat GeneChip). This set of probesets will be referred to as the "high quality" set throughout this paper. This resulted in an expression dataset for 13,822 wheat probesets that could be compared, across 13 tissues, with the equivalent dataset obtained with 12,549 barley probesets hybridized by Druka et al. [27].
Polyploidy and the Wheat GeneChip

A comparison of the transcriptomes of barley and wheat is complicated by the fact that the latter is a hexaploid, but the design of the probesets on the wheat GeneChip did not specifically take this polyploidy into account. Hence, depending on the stringency of the probeset design, an expression profile obtained from the wheat GeneChip may receive contributions from one, two or three homoeologs. This has the potential to complicate a comparative study of gene expression in the two species. We explored this issue by comparing the probes on the wheat GeneChip with known sequences of wheat homoeologs. In Mochida et al. [33], ESTs for 90 genes were assigned to the A, B and D genomes using nullisomic lines of Chinese Spring. These authors found that 11 genes had one of the homoeologs silenced while 79 exhibited transcript accumulation for all three homoeologs. We extracted all 25-mer probe sequences on the wheat GeneChip that exhibited a perfect match to this set of 79 × 3 = 237 sequences. Discarding probes from those probesets which either are known to cross-hybridize or are likely to hybridize inefficiently (for details, see Materials and Methods: The Wheat GeneChip), as well as homoeologous triplets for which only a very small number (< 7) of matching probes are present on the GeneChip, left 56 homoeolog triplets for which relatively reliable expression information was available from the microarray. The distribution of matching probes for these 56 triplets is shown in Figure 1.
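The probe-to-homoeolog comparison described above can be illustrated with a minimal sketch. It assumes that probes and homoeolog sequences are given on the same strand and that "perfect match" means exact substring containment; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def matching_homoeologs(probe, homoeologs):
    """Return the genome labels whose sequence contains the probe exactly."""
    return {genome for genome, seq in homoeologs.items() if probe in seq}


def classify_probes(probes, homoeologs):
    """Count probes by the number of homoeologs they match perfectly."""
    counts = {n: 0 for n in range(len(homoeologs) + 1)}
    for probe in probes:
        counts[len(matching_homoeologs(probe, homoeologs))] += 1
    return counts


# Toy data (invented): a 25-mer shared by A and B, with one substitution in D.
core = "ACGTACGTTGCAAGCTTAGGCTAAC"
d_variant = core[:12] + "T" + core[13:]
toy_triplet = {
    "A": "TT" + core + "GG",
    "B": "TT" + core + "GG",
    "D": "TT" + d_variant + "GG",
}
toy_probes = [core, d_variant, "G" * 25]
toy_counts = classify_probes(toy_probes, toy_triplet)
```

In this toy case one probe matches two homoeologs, one matches a single homoeolog, and one matches none. A probeset whose probes mostly match all three homoeologs would be expected to report their combined signal.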
Homolog identification on the Wheat and Barley1 GeneChips

Putative homologous sequences on the wheat and barley GeneChips were inferred using a clustering approach based on sequence similarity [34-38]. This methodology was chosen over, for example, direct putative ortholog identification via the traditional best reciprocal Blast hit (RBH) method [39-41] because (a) the latter is known to be unreliable in the presence of multiple paralogs [42], (b) the RBH method can lead to misidentifications due to the incomplete coverage of the wheat and barley genomes [43-45] and (c) the RBH method could not simultaneously detect the frequent presence of both 3' and 5' fragments of individual genes on the wheat GeneChip. For details relating to this clustering, see Materials and Methods: Homolog identification on the wheat and barley GeneChips. As a resource for the Triticeae community, we have built an on-line tool where these "homologous clusters", as well as their associated expression profiles and sequence annotation, can be interrogated. This tool, the WebComparator, is available at http://contigcomp.acpfg.com.au. It may be used to assess similarities and differences of expression profiles of 10,708 individual homologous clusters. Expression profile differences may be due to biological differences, but quite frequently may simply reflect problems with hybridization efficiencies for individual probesets. The tool, therefore, is also useful in assessing the reliability of individual probesets on the Affymetrix Wheat and Barley1 GeneChips. Apart from similarities and differences in expression profiles of genes in individual homologous clusters, underlying global patterns are observable in the two datasets and it is those on which we focus here.
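To illustrate why a clustering approach can group 3' and 5' fragments that a one-to-one reciprocal-best-hit assignment would separate, the following sketch uses single-linkage clustering over pairwise similarity hits. This is an assumption made for illustration only: the clustering procedure actually used is described in the Materials and Methods, and the identifiers below are hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(ids, similar_pairs):
    """Single-linkage clustering: sequences joined by any chain of hits
    end up in the same cluster (union-find with path compression)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for i in ids:
        groups[find(i)].add(i)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy data (invented): two wheat fragments each hit the same barley sequence
# but do not overlap each other, so there is no direct wheat-wheat hit.
toy_ids = ["wheat_3p", "wheat_5p", "barley_1", "wheat_x", "barley_2"]
toy_hits = [("wheat_3p", "barley_1"), ("wheat_5p", "barley_1")]
toy_clusters = cluster_by_similarity(toy_ids, toy_hits)
```

Here the two wheat fragments land in one cluster with `barley_1`, while the sequences without hits remain singletons. Single linkage is the most permissive choice; a stricter linkage criterion would yield smaller clusters.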
Gene expression in wheat and barley is highly correlated

Direct gene-by-gene comparison of expression profiles in wheat and barley is complicated by the ambiguities associated with the many-to-many relationships characteristic of the homologous clusters. However, for the class of clusters consisting of exactly one wheat probeset and one barley probeset, ortholog association is less ambiguous: apart from the aforementioned general insensitivity to individual wheat homoeologs, one would expect this class to be enriched for probesets targeting single-copy and single-spliceform genes. Using only the "high quality" probesets (Table 1), this group of clusters consists of 1,875 sequence pairs. Transcript profiles for these probesets are shown as heat maps in Figure 2.
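As a rough illustration of how the similarity of two expression profiles across a common tissue series can be quantified, the sketch below computes a Pearson correlation coefficient. The choice of Pearson correlation is an assumption made here for illustration, and the 13 log2 values are invented toy numbers, not measurements from either dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same tissues")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Toy log2 profiles over 13 tissues (invented values).
toy_wheat = [7.1, 8.3, 6.9, 9.4, 10.2, 5.8, 6.1, 7.7, 8.8, 9.9, 6.4, 7.0, 8.1]
toy_barley = [v + 1.5 for v in toy_wheat]  # same shape, constant offset
r_shifted = pearson(toy_wheat, toy_barley)
```

A constant offset between the two profiles leaves the coefficient essentially at 1, so a correlation measure captures similarity of pattern across tissues but says nothing about agreement in overall expression level.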
Systematic relative shifts in expression levels are frequent in wheat and barley

While expression profiles in wheat and barley are highly correlated, it is also of interest to know whether the overall level of expression is similar. In this case the Euclidean distance between corresponding expression profiles should be small. Indeed, the accumulation of points near the origin of the Euclidean distance plot shown in Figure 4A
If this unexpected difference of signal intensities in wheat and barley were to reflect an underlying difference in mRNA levels for these genes, it would be of interest to compare the corresponding protein levels in these two species. This might indicate a surprising shift of regulatory control from the transcriptional to the translational level. However, because overall shifts in signal intensities as measured by two different platforms can easily have a technical origin, we have sought to independently verify the effect using quantitative real-time PCR (QPCR). The results are shown in Figure 5.
We first concentrate on those data points, marked by a "W", where the wheat profile obtained with the microarray is significantly higher than the corresponding barley profile. Expression data for barley (Panel A, Figure 5)

For sequence pairs where the barley profile is systematically higher than its wheat counterpart (i.e. those shown in the upper boxed portion of Figure 4A

5' Probesets on the Wheat GeneChip hybridize unpredictably

The comparison of expression information between wheat and barley provides a unique opportunity to assess the reliability of the hybridization signal from the wheat GeneChip. This is important because the wheat microarray contains, apart from the usual probesets designed for sequences for which there is good evidence that they are near the 3' end of the gene (such as the presence of a poly-A tail), a large number of probesets for which this evidence does not exist (see Materials and Methods: The Wheat GeneChip; for convenience, we refer to the latter as "5' probesets"). Because of mRNA degradation away from the 3' end, one might expect the latter to lead to a reduced signal. Individual 3' and 5' probesets can be inferred to correspond to the same gene if they both show strong sequence similarity to a barley sequence but at the same time do not have a significant similarity with each other. Indeed, of the approximately 5,300 consensus sequences from the 5' set homologous to a barley sequence, 78% can be associated to a wheat 3' sequence in this way. A subset of these sequences is contained in the class of homologous clusters shown in Figure 6A.
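The rule for associating a 5' probeset with a 3' probeset can be sketched as follows. The data structures are assumptions made for illustration: a mapping from each wheat sequence to the barley sequence it is similar to, and a set of wheat pairs that show significant similarity to each other. All identifiers are hypothetical.

```python
def pair_fragments(barley_hit, wheat_similar, three_prime, five_prime):
    """Pair 5' and 3' wheat sequences that share a barley homolog
    but show no significant similarity to each other."""
    pairs = []
    for five in sorted(five_prime):
        for three in sorted(three_prime):
            shares_barley = (
                five in barley_hit
                and barley_hit.get(five) == barley_hit.get(three)
            )
            overlaps = frozenset((five, three)) in wheat_similar
            if shares_barley and not overlaps:
                pairs.append((five, three))
    return pairs


# Toy data (invented identifiers).
toy_barley_hit = {
    "w5_a": "b1", "w3_a": "b1",   # same barley homolog, no overlap: paired
    "w5_b": "b2", "w3_b": "b2",   # same barley homolog but similar: not paired
    "w5_c": "b3",                 # no 3' partner
}
toy_wheat_similar = {frozenset(("w5_b", "w3_b"))}
toy_pairs = pair_fragments(
    toy_barley_hit, toy_wheat_similar,
    three_prime={"w3_a", "w3_b"},
    five_prime={"w5_a", "w5_b", "w5_c"},
)
```

Only the first pair satisfies both conditions. Requiring the absence of wheat-to-wheat similarity is what distinguishes two fragments of one gene from two paralogous or homoeologous copies that both resemble the same barley sequence.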
The distribution of correlations of the expression profiles from this set of 3' and 5' wheat sequences is shown in Figure 6B.

Conservation and divergence of gene function in wheat and barley

Gene and protein expression studies indicate that, in general, sequence divergence after duplication events is associated with a divergence of functionality in the resulting paralogs, presumably because of reduced selection pressure after duplication [46,47]. In fact, it is widely believed that gene duplication, either individually or as part of genome-scale duplication events, is crucial for providing the resource for the subsequent evolution of genes with new functions [48,49]. Genome-wide studies of this sequence divergence have mostly been undertaken in sequenced organisms separated by reasonably large evolutionary distances. The wheat and barley tissue series permit such a study in these more closely related grass species. Consider homologous clusters of the type presented in Figure 7.
A measure of functional divergence, on the other hand, is once more provided by the correlation between wheat and barley expression profiles. For the cluster shown in Figure 7A,
Dosage constraints in wheat and barley

While gene duplication provides opportunities for the evolution of new gene function, it has also been argued that selective pressure can maintain original function after a duplication event. This might occur if the gene codes for part of a protein complex [50], thus imposing strong stoichiometric constraints, or if 'buffering' of crucial functions [51] is required. In addition, studies in yeast [52] and Paramecium [53] indicate that dosage constraints may account for the inhibition of divergence of duplicated genes. These authors found that, at least in these two species, duplicate copies of genes are more likely to be retained if the expression level is high than if it is low, presumably in order to maintain high transcript levels. We have compared the wheat and barley transcriptomes to see if such a correlation between maintenance of gene function and expression level persists in these two species. Again, we use conservation of the expression profiles (i.e. the correlation) between wheat and barley genes as an indirect measure of conservation of function. The results, using all wheat and barley probesets from the "high quality" set linked by a reciprocal Blast hit, are shown in Figure 8A.
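One simple way to look for a relationship between expression level and conservation of expression is to bin sequence pairs by expression level and average the wheat-barley profile correlation within each bin. The sketch below assumes this binning scheme for illustration; the bin edges and the (level, correlation) values are invented toy numbers and do not reproduce any result of the study.

```python
from bisect import bisect_right


def mean_correlation_by_level(pairs, bin_edges):
    """Average correlation per expression-level bin.

    pairs: iterable of (mean log2 expression, profile correlation).
    bin_edges: ascending edges; values below the first edge fall in bin 0.
    Returns one mean per bin, or None for an empty bin.
    """
    sums = [0.0] * (len(bin_edges) + 1)
    counts = [0] * (len(bin_edges) + 1)
    for level, corr in pairs:
        index = bisect_right(bin_edges, level)
        sums[index] += corr
        counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy input (invented): correlation rising with expression level.
toy_pairs_by_level = [
    (5.0, 0.25), (5.5, 0.5),
    (7.0, 0.5),
    (9.0, 0.75),
    (11.0, 0.875), (12.0, 1.0),
]
toy_means = mean_correlation_by_level(toy_pairs_by_level, [6.0, 8.0, 10.0])
```

With real data, the trend across bins (rather than any single bin) would be the quantity of interest, and bins with few pairs should be interpreted cautiously.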
Discussion

Gene expression in wheat and barley is highly correlated

Our results (Figures 2

Conservation and divergence of gene function in wheat and barley

The evolution of new functional roles for duplicated genes can depend on a number of mechanisms. Most directly, accumulation of sequence changes in coding regions can lead to changes in protein structure, either by substitution of amino acids or through the evolution of new splice variants. This type of sequence divergence after gene duplication is well documented [47]. Numerous studies have observed the expected decreased selective pressure on sequence conservation after duplication through comparisons of synonymous and non-synonymous nucleotide substitutions [54]. A second mechanism is provided through alteration of gene regulation imparted through mutation in cis-acting elements and/or alterations of trans-acting factors. This alters a gene's expression repertoire even in the absence of changes in its sequence. The relationship between these two mechanisms also sheds light on the importance of selective pressure in the evolutionary process; random drift under a neutral model would result in unrelated expression and sequence changes, while strong selective pressure would be reflected by a positive correlation. The results in Table 2 provide an indication that gene activity patterns across tissues and accumulation of sequence changes after gene duplication are positively correlated in wheat and barley, as one would expect in the presence of selective pressure [55]. This finding is in agreement with some of the studies in model species such as yeast [56], C. elegans [57], Drosophila [58] and primates [59]. However, these results are by no means universal and in other studies no clear correlation is observed [60]. For example, no [61] or weak [16] correlation was observed in two studies using the model plant Arabidopsis.
It is likely that the lack of agreement reflects differences in approach; for example, in some studies sequence divergence is measured on the protein level, in others it is quantified through the rate of non-synonymous substitutions, and in others through the rate of synonymous substitutions. In some studies tandem duplications are separated out from ancient segmental duplications, while in others (including ours) they are not. Expression divergence, on the other hand, will clearly depend on the number and types of different temporal, spatial and environmental conditions which are probed. We believe that at present the results shown in Table 2 should be seen as indicative only. Gene duplication and subsequent divergence is only one possible source for the apparently correlated divergence of sequence and expression. A second source for this correlation might be found in the generation of new alternative splice forms rather than new genes, with the original splice form maintaining its expression pattern but the new one diverging. Furthermore, as opposed to the species mentioned above, wheat and barley do not have sequenced genomes and this naturally has an influence on the reliability of the available sequence and expression information. Here sequence divergence was assessed purely on the basis of the existence (for W1 and B1 in Figure 7)

A second effect apparent in Table 2 should be treated with even more caution. The expression profiles of duplicated wheat genes appear to be considerably less similar to that of their barley ortholog (correlation coefficients ~0.33 & 0.15) than those of duplicated barley genes are to that of their wheat ortholog (correlation coefficients ~0.51 & 0.43). These correlations can be compared to those of the 1,875 sequence pairs discussed earlier, where there is no evidence of gene and/or splice form duplication (correlation ~0.66).
While it might be tempting to conclude that expression patterns in wheat evolve faster than their counterparts in barley, this signal is sensitive to any asymmetry in design of the two GeneChips. In particular, as discussed in [26] and at http://www.affymetrix.com, the EST clustering procedures used in the construction of the Barley1 and Wheat GeneChips were rather different, particularly in their treatment of potential splice variants. It could well be that these differences have led to differing proportions of probesets on the two GeneChips designed to individual splice forms. Because both alternative splice forms and gene duplications potentially contribute to the asymmetries in Table 2, a differing admixture of the two could easily be responsible for this asymmetry between the species.

Highly expressed genes show correlated expression in wheat and barley

Numerous studies have shown that genes transcribed at high levels tend to evolve more slowly than genes transcribed at low levels [62-64]. As mentioned above, the "rate of evolution" in this type of study is quantified by sequence divergence, either on the DNA level by counting synonymous and/or non-synonymous substitutions or on the protein level by counting amino acid changes. The results shown in Figure 8B

Systematic shifts in expression levels and platform-dependent biases

While differential hybridization signal (i.e. fluorescence) intensities in a select number of tissues are highly likely to be indicative of true biological differences between wheat and barley, great care must be taken in interpreting overall shifts of fluorescence levels in the same way. The RNA extractions for these experiments were performed in two different laboratories and, of course, with different species using GeneChips with two different design philosophies. RNA hybridization to the GeneChips, on the other hand, was performed at the same facility in a virtually identical manner.
Even issues such as the poor hybridization efficiency of the large number of wheat 5' probesets are likely to have an impact on the relative normalization. Without an absolute standard, a rigorous relative calibration of the two datasets therefore seems very difficult, if not impossible. Our approach to this issue was two-fold: firstly, while one might not be able to control the overall normalization uncertainty one can attempt to estimate its importance and secondly, as described, we performed additional QPCR measurements in an attempt to verify interesting outcomes. An estimate of the overall normalization uncertainty is provided by the data shown in Figure 4A.

For those probesets where there is a systematic decrease in the observed fluorescence levels in barley as compared to wheat, the barley QPCR results do not confirm the barley microarray results. This indicates that for these probesets, at least, the difference has a technical rather than biological origin. An obvious possibility for a lack of signal is the possible presence of single feature polymorphisms (SFPs; see Ref. [28]) between the probes on the Barley1 array and the Morex mRNA being hybridized to it. If this is the case, the rough consistency of QPCR and microarray results for wheat, for those probesets with enhanced microarray fluorescence levels in barley, would indicate that the SFPs do not play as much of a role in the hybridization of Chinese Spring to the wheat GeneChip as they do for the hybridization of Morex to the Barley1 GeneChip. There is corroborating evidence supporting this conjecture. The Barley1 GeneChip was designed using the ESTs collected from 84 libraries, originating from EST projects in Japan, Finland, Germany, Scotland, and the US, respectively. Five major and a few minor cultivars were used, representing the favorite from each project.
In the end, the majority of these ESTs were from Barke (Germany) and, to a lesser extent, Morex (US) [26]; in total, only about 1 in 7 ESTs used to design Barley1 came from Morex [28]. The dominant cultivar in the EST collections used to design the wheat GeneChip, on the other hand, is Chinese Spring. One would, therefore, expect a greater prevalence of SFPs between Morex mRNA and the probes on the Barley1 GeneChip as compared to SFPs between Chinese Spring mRNA and the probes on the wheat GeneChip. We have compared the sequences in current publicly available EST collections against the probe sequences on the two GeneChips and find that this is indeed the case: from this comparison, we estimate the probability of any mismatch between a 25-mer barley probe and a Morex sequence to be around 2.9%, while for the wheat GeneChip the probability of a mismatch between a probe and a Chinese Spring sequence is around 1.5%. Assuming independence, this implies that in Figure 4A

Conclusion

We have performed a comparative study of gene expression in barley and hexaploid wheat, using 13 different tissues and developmental stages. The comparison has been achieved through the clustering of almost 84,000 wheat and barley sequences represented on the Affymetrix wheat and barley GeneChips into homologous clusters, with over 10,700 clusters containing more than one sequence. Detailed comparisons of expression profiles for all of these sequences have been made available at http://contigcomp.acpfg.com.au and individually the two gene expression atlases can be explored further at http://www.plexdb.org, accession numbers BB3 and TA3. We have established that on the whole there are strong similarities between expression patterns of homologous genes in the two species. This conclusion could only be reached, however, by first taking into account the differing designs of the two GeneChips.
Among several confounding factors, the most significant is the presence of over 32,000 probesets on the wheat GeneChip not clearly anchored to the 3' end of gene sequences. The expression profiles obtained with these probesets and, particularly, their comparison to expression profiles of homologous barley sequences clearly show that most lead to a significantly compromised signal. In this way, our comparative results provide a significant resource aiding the interpretation of the hybridization signal from individual probesets in future experiments employing the wheat GeneChip. Our results indicate that the hybridization signal obtained from the wheat GeneChip generally does not differentiate between wheat homoeologs. Detailed study of homoeolog expression patterns across tissues awaits the construction of microarray platforms that specifically target regions of homoeolog sequence divergence and/or studies employing direct transcriptome sequencing. Finally, we have used several high-quality subsets of our expression datasets to investigate some of the more prominent, but nevertheless comparatively small, systematic differences between the wheat and barley data. As is to be expected, we found that great care must be taken to distinguish genuine differences in the transcriptomes from artifactual differences due to either the dissimilar design of the GeneChips or the disparity in our current knowledge of the wheat and barley genomes. Examples of the latter include a systematic shift in absolute expression found in a significant number of wheat and barley putative orthologs. On the other hand, we also found a comparatively clear indication that highly expressed wheat and barley genes tend to be evolutionarily conserved, both in sequence and in transcriptional activity. This observation for these two grasses is in agreement with results from previous studies of model species.

Methods

Experiment design

Wild-type wheat (Triticum aestivum L. cv.
Chinese Spring) was grown in a temperature-controlled growth room with 16 h light (22°C) and 8 h dark (16°C) at approximately 80% humidity. Thirteen plant tissues were selected to represent the major stages of wheat development and to mirror the experiment of Ref. [27] for barley (cv. Morex). The number of plants harvested and the developmental stages selected are as described in [27], with the following exception: while in the latter, samples were collected for three stages of caryopsis (namely 5, 10 and 16 DAP), for wheat only caryopsis 3–5 DAP was used. Each tissue type was represented by three independent biological samples (replicates). RNA isolation and quality checking were performed as described in Ref. [27]. Labeling and hybridization to the Affymetrix wheat GeneChip were carried out at the Iowa State University GeneChip facility (http://www.biotech.iastate.edu/facilities/genechip/Genechip.htm). Background subtraction and normalization for both experiments were performed using the RMA normalization procedure [65,66], and the three biological replicates were averaged. The data are expressed on a logarithmic scale (base 2), as usual.

Data access

All detailed data and protocols from these experiments have been deposited in PLEXdb (http://www.plexdb.org/), a unified public resource for gene expression for plants and plant pathogens. Files are categorized under accession TA3 for the wheat gene atlas and BB3 for the barley gene atlas. TA3 has also been deposited at NCBI-GEO as accession GSE12508.

The Wheat GeneChip

It is crucial to take into account the different design philosophies of the Affymetrix wheat and barley GeneChips when comparing the transcriptome data obtained with them. In order to maximize the reliability of results, we impose rigorous constraints to arrive at a set of probesets that may be judged to be reliable. The design of the Barley1 GeneChip has already been discussed in detail in Ref. [26].
Additional information on both GeneChips can be found in the technical support section of http://www.affymetrix.com. Here we briefly summarize the relevant details of the wheat GeneChip. This GeneChip contains, apart from a small number of reporters and controls, 61,115 probesets. All but 73 of these are made up of eleven 25-mer perfect-match (and accompanying mismatch) probes. As is usual for these GeneChips, there are a number of probesets where one or more probes are known to cross-hybridize in one way or another (for details, see Appendix B of the "GeneChip Expression analysis manual" available at http://www.affymetrix.com): their names are suffixed by "_s_at" (2617 probesets), "_x_at" (6766 probesets) and "_a_at" (3321 probesets). Because of the danger of unwanted cross-hybridization complicating ortholog identification across the two species, and because these probesets are in any case often provided in addition to uniquely hybridizing probesets, we do not include them in the comparative analysis carried out in this paper. For the same reason, the results presented here only make use of those probesets from the Barley1 GeneChip for which ESTs could be assembled into a contig. Singleton ESTs tend to have shorter sequences, increasing the chance of confusion when trying to match them to a particular sequence present on the other GeneChip. In our comparative analysis we also disregard the 10,643 probesets marked with the suffix ".A1" because they are predominantly of the wrong orientation. Furthermore, the wheat GeneChip includes a considerable number of probesets not clearly anchored to the 3' end (32,578 out of 61,115; Close and Davies, personal communication). These form part of the so-called "prune" set in Affymetrix's design pipeline and are usually used for checking probes for potential cross-hybridization. Throughout this paper we refer to these as "5' sequences".
This type of sequence was not included on the Barley1 GeneChip because, while such sequences may be useful for gene discovery, their hybridization efficiency is unreliable. Unless explicitly indicated otherwise, we do not consider them in our comparative analysis. Finally, as discussed below, an additional quality control on the GeneChip sequences was obtained by demanding that the relative orientation of the consensus sequences on the barley and wheat GeneChips should be the same. Our comparative analysis leaves out sequences with opposite or inconsistent orientation on the two chips. After all these cuts, 13,822 wheat probesets and 12,549 barley probesets remained, and it was this set that we used. We stress, however, that expression results from all probesets have been included in the data contained in the WebComparator (http://contigcomp.acpfg.com.au).

Homolog identification on the Wheat and Barley GeneChips

We identified putative wheat and barley homologs using the following approach:

1) After constructing non-redundant sets of consensus and exemplar sequences for the wheat and barley GeneChips, respectively, we performed all possible wheat-barley, barley-wheat, wheat-wheat and barley-barley sequence comparisons using NCBI's gapped Blastn [67] algorithm. The intra-species comparisons were performed in order to avoid, as much as possible, issues associated with the incomplete representation of the wheat and barley genomes on the two GeneChips.

2) A directed graph was constructed from the results of these sequence comparisons, with the nodes consisting of the non-redundant sequences.
A directed edge starting at node i (a sequence from genome I) and ending at node j (a sequence from genome J) was defined to exist if a) node j was the best Blast hit to node i when sequence i was compared to genome J and this Blast hit had an E-value better than the cut-off C = 10^-50, or b) the Blast hit had an E-value within a tolerance T = 10^-5 of that of the best Blast hit (if the best Blast hit had an E-value of 0, this limit was taken to be within 10^-5 of machine precision instead). Note that keeping Blast hits which are close to the best Blast hit is useful if several homoeologs with near-identical sequence are present and/or if probesets have been tiled to both the 3' and 5' end of the same sequence, as was done for the wheat GeneChip.

3) Finally, the resulting graph was decomposed into connected sub-graphs (termed "homology graphs"), with those sequences contained within a sub-graph defining a putative "homologous cluster".

The results are quite insensitive to the choices for C and T. The precise value of C tends to be immaterial because either I = J (an intra-species comparison), in which case the best Blast hit naturally almost always links the sequence back to itself with an E-value of 0, or I ≠ J (an inter-species comparison), in which case the general similarity between wheat and barley sequences ensures that if a homolog is present at all it tends to have a similarity very much better than E ~ 10^-50. The precise value of T is not critical for a similar reason: most Blast hits are found to be either very close to the best Blast hit (usually with an E-value within a factor of 100 or so of the best E-value) or considerably further removed. In other words, while we have not attempted to distinguish homoeologs, paralogs and orthologs (only a phylogenetic treatment can do this), the above approach makes the detection of homologs in general relatively unambiguous.
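The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (tuples of query, target genome, subject, E-value), the function names, and the toy data are hypothetical. Two points the text leaves open are treated here as explicit assumptions: the tolerance T is read as a ratio (a hit is kept if its E-value is at most the best E-value divided by T), with the smallest positive normal double standing in for "machine precision" when the best E-value is 0; and "connected sub-graphs" are taken to mean weakly connected components, i.e. edge direction is ignored when clustering.

```python
import sys
from collections import defaultdict

C = 1e-50                  # E-value cut-off for the best hit (value from the text)
T = 1e-5                   # tolerance for near-best hits (value from the text)
FLOOR = sys.float_info.min # assumed stand-in for "machine precision" when best E = 0


def build_edges(hits):
    """Return directed edges (query, subject) from Blast hits.

    hits: iterable of (query, target_genome, subject, evalue) tuples
    (a hypothetical input format chosen for this sketch).
    """
    by_query = defaultdict(list)
    for query, genome, subject, evalue in hits:
        by_query[(query, genome)].append((evalue, subject))

    edges = set()
    for (query, _genome), candidates in by_query.items():
        best = min(e for e, _ in candidates)
        if best > C:
            continue  # best hit is not significant enough: no edges
        # Assumed reading of the tolerance: keep hits with E <= best / T.
        limit = max(best, FLOOR) / T
        for evalue, subject in candidates:
            if evalue <= limit:
                edges.add((query, subject))
    return edges


def homologous_clusters(nodes, edges):
    """Decompose the graph into (weakly) connected components."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


# Toy data: w* are wheat sequences, b* are barley sequences.
toy_hits = [
    ("w1", "barley", "b1", 1e-120),  # best hit, passes C
    ("w1", "barley", "b2", 1e-118),  # near-best hit, kept
    ("w1", "barley", "b3", 1e-60),   # far from the best hit, dropped
    ("w1", "wheat", "w1", 0.0),      # intra-species self hit
    ("b1", "wheat", "w1", 1e-120),
    ("w2", "barley", "b3", 1e-20),   # best hit fails C, no edge
]
toy_nodes = ["w1", "w2", "b1", "b2", "b3"]
toy_clusters = homologous_clusters(toy_nodes, build_edges(toy_hits))
```

In this toy example, w1, b1 and b2 fall into one non-trivial cluster, while w2 and b3 each form a single-node ("trivial") graph of the kind discussed in the following paragraph.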
Typically, the homology graphs are rather small: only 105 out of a total of 10,708 non-trivial homology graphs contain more than 10 vertices. A much larger number of these graphs, just over 40,000, are found to be 'trivial' in the sense that they contain only 1 node, i.e. for these sequences, the Blast searches did not result in a significant hit to any other sequence. This should not be interpreted to mean that there are large numbers of genes in wheat and barley having no counterpart in the other species. Rather, inspection of the trivial graphs shows that about 57% of them correspond to 5' wheat sequences (presumably not having a significant overlap with the typically longer barley sequences) and slightly less than 10% correspond to short barley ESTs rather than longer contigs. It is to be expected that most of the remaining 13,000 sequences or so are unmatched because the two GeneChips do not represent the entire complement of genes from the two species. In principle, the number of trivial graphs could be reduced by increasing C considerably; however, we did not do so in order not to increase the number of false positive associations.

Quantitative RT-PCR (QPCR) verifications

cDNA was synthesized from the same RNA that was hybridized to the wheat and barley GeneChips for 11 of the 13 tissues (excluding anthers and pistils). While three independent RNA samples were used in the microarray experiment, the QPCR cDNA was prepared from only one of the three RNA samples. Results from this sample were compared to the microarray results from the same sample. Templates of 5 μg total RNA for barley and 0.5 μg total RNA for wheat were used for the cDNA synthesis reaction with Superscript III RNase H- Reverse Transcriptase (Invitrogen, Australia) according to the manufacturer's protocol. Four control genes were assessed (actin, GAPDH, EFA and cyclophilin). The primers for the barley control genes are described in Ref. [68], while the wheat primers are listed in Table 3.
The selection of barley and wheat probesets used for the comparison of microarray and QPCR results was drawn from the boxed region indicated in Figure 4A.
Authors' contributions

AWS conceived and performed the analysis and drafted the manuscript. TS and GM grew the plants and extracted the mRNA. RAC and RPW organized and carried out the microarray hybridizations. EK created the WebComparator software application. BL performed the QPCR verifications. GJM, AD, RW, RPW and PL conceived, organized and obtained the funding for the collaborations behind the wheat and barley tissue series. UB participated in the analysis and QPCR verification and, together with RW, participated in drafting the manuscript. All authors reviewed and edited the manuscript.

Acknowledgements

We would like to thank Prof. Y. Ogihara for providing us with the homoeologous sequences of Mochida et al. (2003). Funding for this research was provided by USDA Initiative for Future Agriculture and Food Systems (IFAFS) grant no. 2001-52100-11346 (RPW), USDA-ARS CRIS Project 3625-21000-049-00D (RPW), the Grains Research and Development Corporation of Australia, as well as the Australian Research Council.