Copyright © 2009, EMBO and Nature Publishing Group

Evidence for a major role of antisense RNAs in cyanobacterial gene regulation

1 Faculty of Biology and Freiburg Initiative in Systems Biology, University of Freiburg, Freiburg, Germany
2 Justus-Liebig University Giessen, Institute of Microbiology and Molecular Biology, Heinrich-Buff-Ring, Giessen, Germany
a Faculty of Biology and Freiburg Initiative in Systems Biology, University of Freiburg, Schänzlestr. 1, Freiburg 79104, Germany. Tel.: +49 761 2032796; Fax: +49 761 2036996; Email: wolfgang.hess/at/biologie.uni-freiburg.de

Received December 22, 2008; Accepted August 3, 2009.

This is an open-access article distributed under the terms of the Creative Commons Attribution Licence, which permits distribution and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation or the creation of derivative works without specific permission.

Abstract

Information on the numbers and functions of naturally occurring antisense RNAs (asRNAs) in eubacteria has thus far remained incomplete. Here, we screened the model cyanobacterium Synechocystis sp. PCC 6803 for asRNAs using four different methods. In the final data set, the number of known noncoding RNAs rose from 6 earlier identified to 60 and of asRNAs from 1 to 73 (28 were verified using at least three methods). Among these, there are many asRNAs to housekeeping, regulatory or metabolic genes, as well as to genes encoding electron transport proteins. Transferring cultures to high light, carbon-limited conditions or darkness influenced the expression levels of several asRNAs, suggesting their functional relevance. Examples include the asRNA to rpl1, which accumulates in a light-dependent manner and may be required for processing the L11 r-operon and the SyR7 noncoding RNA, which is antisense to the murF 5′ UTR, possibly modulating murein biosynthesis.
Extrapolated to the whole genome, ~10% of all genes in Synechocystis are influenced by asRNAs. Thus, chromosomally encoded asRNAs may have an important function in eubacterial regulatory networks. Keywords: antisense RNA, cyanobacteria, microarray, noncoding RNA, Synechocystis Introduction Bacteria, as well as eukaryotes, possess a significant number of regulatory RNAs. Eubacterial regulatory RNAs mainly control mRNA translation or decay, but some also bind proteins and thereby modify protein function (for reviews see Gottesman, 2004; Urban and Vogel, 2007). The majority of eubacterial regulatory RNAs are encoded at genomic locations far away from their target genes and exhibit only partial base complementarity to their mRNA targets. However, a small number of regulatory RNAs are transcribed from the reverse complementary strand of an annotated gene and hence these fully or partially overlap with their potential targets (cis-encoded regulatory RNAs). It was known early on that such natural antisense RNAs (asRNAs) control phage development and plasmid replication in bacteria (Wagner and Simons, 1994), yet recent work has made much more progress on trans-encoded regulatory RNAs. In several eukaryotic model organisms, it was found that the main transcriptional output from their genomes is noncoding RNA (ncRNA). Sense/antisense transcript pairs occur frequently in mammalian genomes (Katayama et al, 2005) and asRNAs were found opposite 1555 genes during high-resolution transcript screening of the yeast genome (David et al, 2006). 
It is now estimated that asRNAs or overlapping transcripts from adjacent transcriptional units exist for ~22–26% of annotated genes in the human genome (Yelin et al, 2003; Chen et al, 2004; Zhang et al, 2006), for 14.9–29% of mouse genes (Okazaki et al, 2002; Kiyosawa et al, 2003; Katayama et al, 2005; Zhang et al, 2006), 15.4–16.8% of Drosophila genes (Zhang et al, 2006), and 8.9% of Arabidopsis thaliana genes (Jen et al, 2005; Wang et al, 2005). Despite the earlier reported examples of antisense transcripts in prokaryotes, experimental evidence for a more general role of chromosomally encoded asRNAs in eubacteria has remained scarce. Using a tiled microarray and a protocol optimized for detection of sRNAs, two asRNAs to transposase genes and three ncRNAs overlapping a substantial part of an mRNA or of another ncRNA were reported in Caulobacter (Landt et al, 2008). On the other hand, Selinger et al (2000) found a very high number of potential asRNAs in Escherichia coli by using Affymetrix microarrays with an inverted probe set capable of detecting antisense transcription. Although not corroborated by independent experiments, this array detected antisense transcription for ~3000–4000 genes, suggesting that there is a low level of transcription virtually throughout the E. coli genome (Selinger et al, 2000). More recently, evidence for 127 putative asRNAs in Vibrio cholerae was obtained through parallel sequencing (Liu et al, 2009), but these asRNAs were not studied further. There is only one publication describing the biocomputational prediction of asRNAs in bacteria (Yachie et al, 2006). On the basis of a combination of promoter and rho-independent terminator prediction, 87 ncRNA and 46 asRNA candidates were predicted for E. coli. Of these, eight ncRNAs and four asRNAs could be verified experimentally. In cyanobacteria, evidence from earlier work indicated a function of chromosomal cis-encoded asRNAs in the regulation of gene expression.
The asRNA IsrR in Synechocystis sp. PCC 6803 (hereafter Synechocystis) regulates the accumulation of the isiA mRNA, and thereby controls the amount of IsiA protein and, ultimately, of protein–chromophore light harvesting complexes in cyanobacterial cells under iron limitation and redox stress (Duehring et al, 2006). A transcript complementary to the transcription factor furA mRNA was found in the filamentous cyanobacterium Anabaena PCC 7120. The furA asRNA originates by read-through from the adjacent gene alr1690 encoding a putative cell wall protein (Hernandez et al, 2005) and covers furA over its full length. Interrupting read-through from alr1690 resulted in increased expression of FurA; thus, the asRNA contributes to determining cellular levels of the protein. Other, less characterized, examples of asRNAs in cyanobacteria include a cis-encoded asRNA starting from the 3′ end of the gas vesicle gene gvpB and ending within the gvpA gene of the filamentous Calothrix PCC 7601 (Csiszar et al, 1987), and 24 asRNAs found by microarray hybridization in the marine unicellular Prochlorococcus MED4 (Steglich et al, 2008). In addition, there is a growing number of publications that hint at the impact of regulatory RNA in cyanobacteria without providing molecular details (Nakamura et al, 2007; Dienst et al, 2008; Voss et al, 2009). Here, a computational search was implemented for the 3.6 Mb genome of Synechocystis to find such RNAs. To test the existence of predicted candidates efficiently, a tiling microarray was designed, in which all genome regions containing predicted regulatory RNAs were covered, together with a control set of the same size. Focusing on high scoring as well as on randomly selected candidates for asRNAs, 28 asRNAs were verified independently by 5′ RACE (rapid amplification of cDNA ends) and Northern blot analysis (Table I).
Among the targets possibly influenced by these asRNAs are mRNAs for ribosomal proteins, mRNAs for enzymes of primary metabolism as well as for proteins that are involved in signal transduction and electron transfer.
Results

Large-scale analysis using a tiling microarray

A tiling microarray was developed, covering all genes and intergenic regions for which a terminator, and thus a candidate asRNA or ncRNA, was predicted. As a control set, probes were designed for genes and intergenic regions without a prediction, covering approximately the same total size. The resulting 102 739 probes amount to an accumulated length of 1 441 146 nt in tiled probes in both orientations, which represent ~40% of the chromosome. The arrays were hybridized in quadruplicate with pooled RNA from nine different conditions, such as exponential and stationary growth phase and different stress conditions (high light (HL), low light, 12 h incubation in the dark, iron and nitrogen depletion, heat and cold stress), to detect transcripts that are induced only under specific conditions. To avoid labeling artifacts from reverse transcription and second strand synthesis during cDNA synthesis (Perocchi et al, 2007), we labeled the RNA directly for microarray hybridization. Two additional microarrays were hybridized with genomic DNA and used for the normalization of signal intensities from individual probes as described by Huber et al (2006). The mapping of transcribed segments was carried out according to Huber et al (2006), yielding ~2500 transcript segments with arbitrary expression values from −5 to +10 (see Supplementary information ‘Segmentation2500_final.pdf’). As evidence for low-level transcription of virtually every part of a bacterial genome has been provided (Selinger et al, 2000), we established a robust threshold at +1.0, leaving 646 transcript segments for closer inspection. As a positive control, IsrR (Duehring et al, 2006) was detected as one contiguous segment of the array (Figure 1).
Of the 646 transcript segments above the expression threshold of +1.0, 432 corresponded to mono-, di-, and multicistronic mRNAs, whereas 60 originated from intergenic regions and were considered ncRNAs, and 73 at least partially overlapped sense transcripts and were therefore designated asRNAs (see Supplementary Table S1 for details). We also detected transcripts that likely represent short mRNAs (labeled ‘new ORF’ in Supplementary Table S1). These are not included in the numbers of candidate asRNAs and ncRNAs, nor are the segments representing putative 5′ and 3′ UTRs (Figure 2).
Synechocystis transcripts expression levels The 15 most highly accumulating mRNAs (see Supplementary Table S1) in our tiling microarray originate from an intron-located endonuclease gene (slr0915), the photosynthetic genes psaAB (slr1834/slr1835), psbD2 (slr0927), psbD (sll0849), psbT (smr0001), and rbcL (slr0009), the cell division cycle gene slr0374, the groESL operon (slr2075_slr2076), the genes slr0742, sll0524, sll0623, and slr1667, the RNA-binding protein A gene rbpA (sll0517), the molybdopterin biosynthesis gene moeA (slr0900), as well as the iron-stress-induced protein A gene isiA (sll0247). We found 14 ncRNAs and 4 asRNAs within the same range of expression levels. These asRNAs are opposite to isiA, slr0320, sll1121, and sll1049 (Supplementary Table S1). Finding stress-induced genes such as isiA among the top-expressed genes is not an artifact, but results from the fact that we hybridized pooled RNA samples from cultures grown under nine different conditions. Assessing the reliability of the prediction strategy The transcription of many bacterial genes, and thus also of ncRNAs and asRNAs, finishes at a rho-independent terminator, which can be computationally predicted (see Materials and methods). Our terminator prediction identified 713 putative transcripts within all non-annotated sequences (intergenic and antisense). Assuming an average transcript length of 300 nt, ~20% were completely intergenic (ncRNA candidates), whereas ~80% were antisense to an annotated gene. The iron stress regulated asRNA IsrR (Duehring et al, 2006), as well as the small ncRNAs Yfr1 (Axmann et al, 2005; Voss et al, 2007), SyR1, and SyR2 (Voss et al, 2009), were among the predicted transcripts, indicating the reliability of this procedure. To evaluate the performance of the prediction strategy further, we compared its outcome against the results from the tiling microarrays. 
As the segmentation procedure could be erroneous in itself, we took the following approach: for each predicted terminator, we computed the mean normalized expression of probes within four 100 nt long segments, starting from the 5′ end of the terminator. For expression cut-offs ranging from 0 to 9, the number of terminators passing it was computed. Two background sets (one antisense-only, and one freely distributed) of randomly chosen segments of size 100 nt were handled the same way. Altogether, the analyses showed that there is a clear tendency of regions close to predicted terminators to have a higher mean expression. This is even more pronounced in the antisense-only analyses (Supplementary Figure S1). In absolute numbers, 11 out of 73 asRNAs and 27 out of 60 intergenic ncRNAs with a microarray expression level of at least +1, have been predicted here, based on the presence of a rho-independent terminator (Table II; Supplementary Table S1), including five ncRNAs reported earlier in a comparative genomics study (Voss et al, 2009). Examples for false-negatives include SyR9, the 5′ UTR of the isiA gene that accumulates in large quantities as an ~160 nt small RNA (Duehring et al, 2006) and ffs, the ncRNA of the signal recognition particle (Table II). If all 60 segments identified in the array were real ncRNAs, the true-positive rate of the terminator-based prediction for this class of RNA molecules would be ~45%. The higher true-positive rate for ncRNAs is reflected in their better terminator scores. In Figure 3
Finding new intergenic ncRNAs

In total, our data revealed 60 segments that represent possible ncRNA genes within the total set of high scoring transcripts (Supplementary Table S1). Among these are the known ncRNAs Yfr1 (Voss et al, 2007), Yfr2b, SyR1, and SyR2 (Voss et al, 2009). Additionally, seven ncRNA candidates were verified in Northern blot experiments (Figure 4).
New asRNAs

An overview of 73 different candidate asRNAs detected in our array is provided in Supplementary Table S1. With an average expression value of 7.8, an asRNA to slr0320 was the most highly accumulated asRNA, followed by the asRNA IsrR (7.0), which served as the internal control (Duehring et al, 2006). The five next most highly expressed asRNAs have expression levels (4.2 to 6.3) similar to those of highly expressed protein-coding genes such as amt1 (sll0108, 5.5) or rbcL (slr0009, 6.0). We chose 28 asRNAs for independent verification by Northern blot analysis and 5′ RACE. The Northern data can be broadly divided into clear signals and more complex patterns observed for a subset of asRNAs, which may result from either co-degradation or co-processing with their corresponding mRNAs. Prominent examples are asRNAs to the flavoprotein gene sll0217, rlpA, slr0580, and ndhF1 (Figure 5).
Verification and characterization of newly found asRNAs by transcriptome microarrays

A novel transcriptome microarray was designed as an efficient tool for verifying the newly found asRNAs and ncRNAs and for examining their possible regulation. This array includes probe sets for all protein-coding genes as well as for all other transcripts that we identified in the course of this study. Cultures were exposed to three different stress conditions that are highly relevant for a photosynthetic organism, namely HL, darkness and CO2 depletion. The fold changes (FCs) in expression levels were measured in triplicate for all ncRNAs, asRNAs and their cognate mRNAs and can be found in Tables I and II. For six selected asRNA/mRNA pairs and for the SyR7 ncRNA, we confirmed the changes in expression levels by Northern blot hybridization (Figure 6).

Characteristic changes were also obtained for the other asRNA/mRNA pairs studied in more detail. A situation inverse to SyR7 was observed with as_slr0882, which increases dramatically under HL and almost disappears in darkness (Figure 6B).

as_rpl1: a possible role in discoordinating gene expression

The ratio between as_rpl1 and the rpl1 mRNA is close to 1 under all tested conditions, except under HL, where it declines to 0.2 (Figure 8).
Discussion

Identification of eubacterial asRNAs

Despite early reports on asRNAs in bacteria and phages (Wagner and Simons, 1994), a systematic screening for asRNAs in bacteria has been missing. Here, we present a partial transcriptome analysis in the cyanobacterial model organism Synechocystis, combined with extensive verification, and provide first functional insight into the role of asRNAs. There are three main technical problems in dealing with antisense transcription in bacteria: (i) the general lack of robust algorithms to predict them; (ii) the high risk of measuring experimental artifacts generated during cDNA synthesis in microarray analyses (Perocchi et al, 2007); and (iii) a low level of transcription reported to occur virtually throughout the entire genome (Selinger et al, 2000), making it difficult to differentiate asRNAs with a regulatory function from transcriptional noise. Here, we have tried to overcome all three obstacles: (i) we rigorously interrogated all computational predictions using tiled microarrays; (ii) to avoid unintended second strand synthesis, we labeled RNA samples directly before their hybridization on the microarray; and (iii) we focused predominantly on very highly expressed asRNAs. Computational screens have been used successfully for the prediction of ncRNAs in various eubacteria, but very rarely for finding asRNAs. Yachie et al (2006) presented a strategy that also predicts asRNAs, based on sequence patterns, nucleotide biases, and higher-order base relations, such as those that arise through basepairing in structured RNA molecules. This is reasonable for (intergenic) ncRNA prediction, yet it is less suitable for a prediction focusing on asRNAs, as these function mainly by complementarity rather than by specific sequence and/or structure features. Here, we found a correlation between the prediction and the actual presence of a terminator.
However, based on the array results, the number of false-negative predictions turned out to be high. The predicted terminators come with the following parameters: free energy of the stem-loop ΔGS, hybridization energy ΔGH and a poly-U scoring. Comparing any single parameter or combination of parameters with the actual presence of a transcript did not indicate any particular correlation. The poor performance of the prediction for antisense transcripts may be explained by the existence of alternative termination signals (involving proteins similar to Rho, or RNA–RNA interaction (Stork et al, 2007)), or a lack of specific termination because of functional peculiarities, such as transcriptional interference (Sneppen et al, 2005). Moreover, the accumulation of asRNAs with secondary 3′ ends resulting from co-degradation or co-processing of asRNAs with their cognate mRNAs could, in some cases, also provide an explanation. Further work is required to differentiate between these possibilities. Total number of asRNAs Here, we found 73 candidates for cis-asRNAs and 60 free-standing genes for putative ncRNAs which all had an average expression of more than +1.0. With regard to mRNAs, such an expression threshold of +1.0 corresponded to the top third of the most-strongly expressed genes. The false-positive rate appears low in this candidate set. False-positives would be expected predominantly among those 13 asRNA candidates (18% of all) represented only by one or two probes in the microarray; however, further testing did not support this view. Nevertheless, if we conservatively assume a false-positive rate of 5% and a true-positive rate of 95% for the array-selected candidate asRNAs, 69 of the 73 asRNA candidates can be expected to exist. On the other hand, focusing on one third of the most-strongly accumulating transcripts leaves two thirds of the segments to be investigated. 
In fact, there is strong evidence to suggest that less highly expressed asRNAs also exist in Synechocystis. As examples, we selected three possible asRNAs for the genes uvrA, dnaX, and accA, which were predicted based on the possible presence of a terminator but not found during autosegmentation of the array data. Their expression levels were also below the threshold of +1.0. These candidate asRNAs were detectable in Northern hybridizations (Supplementary Figure S2) and verified in 5′ RACE experiments. These three weakly expressed asRNAs accumulate to levels that correspond to the amounts of their respective mRNAs. The stoichiometric ratio between an asRNA and its respective mRNA is probably more important than the absolute accumulation level of the asRNA. Therefore, it appears valid to assume that the medium-expressed third of all transcripts contains as many asRNAs (69) as the top third, suggesting 138 asRNAs from 40% of the bacterial chromosome. Extrapolated to the whole genome, the resulting number of more than 300 chromosomally encoded asRNAs does not appear unlikely for a bacterial cell. Recently, evidence for 127 asRNAs was found by parallel sequencing in Vibrio cholerae (Liu et al, 2009). Chromosomally encoded cis-asRNAs in Synechocystis are much more frequent than originally thought and seem to outnumber intergenic ncRNAs. With this conservative approximation taken into account, asRNAs may affect 8–10% of all genes in Synechocystis, a number that lies within the range of asRNAs in eukaryotic genomes.

Possible mechanisms of asRNA functions

If nearly every tenth open reading frame has an asRNA encoded on the opposite DNA strand, very complex regulatory circuits would be possible (Levine et al, 2007; Shimoni et al, 2007). We detected a large variety among the asRNAs in our study. The asRNAs can be classified by their transcript level, the mRNA/asRNA ratio, and their position relative to their corresponding open reading frame.
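The extrapolation given under 'Total number of asRNAs' can be retraced with simple arithmetic. The following is a minimal sketch that uses only figures stated in the text (73 candidates, an assumed 95% true-positive rate, equal numbers in the top and medium-expressed thirds, ~40% of the chromosome on the array, a 3.6 Mb genome, 1 441 146 nt of array coverage). The function and variable names are illustrative and are not taken from the study.

```python
def extrapolate_asrna_count(candidates=73, true_positive_percent=95, array_percent=40):
    """Retrace the estimate: candidates -> expected real asRNAs -> whole genome."""
    # 73 candidates at an assumed 95% true-positive rate (69.35, rounded to 69)
    expected_real = round(candidates * true_positive_percent / 100)
    # Assumption from the text: the medium-expressed third holds as many as the top third
    both_thirds = 2 * expected_real
    # The array covers ~40% of the chromosome, so scale up to 100%
    whole_genome = both_thirds * 100 / array_percent
    return expected_real, both_thirds, whole_genome


def array_coverage(covered_nt=1_441_146, genome_nt=3_600_000):
    """Fraction of the 3.6 Mb genome represented by the tiled sequences."""
    return covered_nt / genome_nt


expected_real, both_thirds, whole_genome = extrapolate_asrna_count()
print(expected_real, both_thirds, whole_genome)
print(round(array_coverage(), 3))
```

With the default values this gives 69 expected asRNAs among the candidates, 138 for the two thirds together, and 345 for the whole genome, which is consistent with the 'more than 300' stated in the text. The estimate scales linearly with each assumption, so a lower true-positive rate or an unequal distribution between the thirds would change the result proportionally.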
Functionally, it makes a difference if an asRNA overlaps a 5′ or 3′ end of its cognate mRNA or if it is fully internal. For this reason, we differentiated the asRNAs into these three classes according to mapping data from 5′ RACE, the lengths of hybridizing fragments in Northern blots, and array hybridization. From the set of 28 asRNAs confirmed by multiple methods (Table I), 13 were internal, 8 were 5′ overlapping, and 7 were 3′ overlapping. Together with other factors, such as half-life, length or expression patterns (induced, transient, constitutive), a multitude of functions and mechanisms appear possible. Some of these are discussed below, but more experimental effort is necessary to investigate the individual functions of Synechocystis asRNAs. It is well established that asRNAs and their cis-targets can form RNA–RNA duplexes, which are degraded by dsRNA-specific RNases (Hernandez et al, 2005; Duehring et al, 2006; Darfeuille et al, 2007; Kawano et al, 2007; Fozo et al, 2008). Hence, antisense transcription is a powerful natural tool in repressing gene expression. There is a growing number of examples that support the idea of bacterial asRNAs serving as novel types of transcriptional terminators, such as the 427 nt asRNA RNAβ in Vibrio anguillarum (Stork et al, 2007), to achieve discoordinated expression of different operon segments. Obviously, the most likely candidates for such termination and processing events are asRNAs overlapping the 3′ ends of their target mRNAs. Another such candidate is as_rpl1 (Figure 8).

Another possible level of regulation includes asRNAs that directly modulate transcriptional activity. There is strong evidence to suggest that divergently located promoters can interfere with each other (Prescott and Proudfoot, 2002), and work with E. coli showed that the length of transcripts generated from the divergently located promoter (Sneppen et al, 2005) is one important factor for this interaction.
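The three positional classes used above (internal, 5′ overlapping, 3′ overlapping) can be made concrete with a small coordinate comparison. This is a minimal sketch, assuming 1-based inclusive genome coordinates with start ≤ end, an asRNA located on the strand opposite to the mRNA, and known mRNA boundaries. The function name, the argument layout and the extra labels for 'no overlap' and for an asRNA spanning the whole mRNA (as reported in the text for furA in Anabaena) are illustrative and are not part of the study's procedure, which relied on 5′ RACE, Northern blots and array data.

```python
def classify_asrna(mrna_start, mrna_end, mrna_strand, as_start, as_end):
    """Classify an asRNA by which end of its cognate mRNA it covers.

    Coordinates are 1-based and inclusive, with start <= end regardless of strand.
    mrna_strand is '+' or '-'; the asRNA is assumed to lie on the opposite strand.
    """
    if as_end < mrna_start or as_start > mrna_end:
        return "no overlap"
    # On the '+' strand the 5' end is the lower coordinate; on '-' it is the higher one
    five_prime = mrna_start if mrna_strand == "+" else mrna_end
    three_prime = mrna_end if mrna_strand == "+" else mrna_start
    covers_5 = as_start <= five_prime <= as_end
    covers_3 = as_start <= three_prime <= as_end
    if covers_5 and covers_3:
        return "spans whole mRNA"
    if covers_5:
        return "5' overlapping"
    if covers_3:
        return "3' overlapping"
    return "internal"


print(classify_asrna(100, 500, "+", 200, 300))
print(classify_asrna(100, 500, "-", 50, 150))
```

The first call describes an asRNA lying entirely inside the mRNA and the second one an asRNA covering the low-coordinate end of a minus-strand mRNA, which is its 3′ end.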
We noticed that the average length of asRNAs tends to be longer than that of ncRNAs. According to the literature, the latter are typically 50–250 nt in length (Vogel and Papenfort, 2006; see also Supplementary Figure S4 in Shi et al, 2009). Here, we observed ~180 nt as the average ncRNA length (Figure 4).

We found several asRNAs extending into the 5′ UTR region of their mRNA targets, and some of them probably terminate beyond the TSS of the mRNA on the reverse complementary strand. It is well established that initiation of degradation through RNase E requires free 5′ ends (Mackie, 1998). Therefore, the selective stabilization of transcripts by masking of endonuclease (RNase E) recognition sites appears to be another important function of natural asRNAs. Moreover, such 5′ overlapping asRNAs are prime candidates for providing translational regulation by extending into the regions for interaction with the ribosome, regulating translation rather than RNA stability (Darfeuille et al, 2007; Kawano et al, 2007; Fozo et al, 2008).

Biological relevance of asRNAs

The substantial amounts of different asRNAs in Synechocystis raise the question of their biological benefit for the organism. One known role of bacterial asRNAs is to act as the antidote to mRNAs coding for toxic peptides (Kawano et al, 2007; Fozo et al, 2008) or transposons (Sittka et al, 2008). Systematic searches for toxin–antitoxin systems have revealed an abundance in free-living prokaryotes, including Synechocystis (Pandey and Gerdes, 2005). But what is the relevance of the majority of the asRNAs detected here? Their appearance is not restricted to a specific functional class of genes (such as regulation, primary metabolism, transcription, translation, DNA repair, etc.). Furthermore, their expression level, which is in part very high (IsrR, as_sll1049, as_slr0320) and otherwise covers the whole range of mRNA expression levels, indicates a vital function.
A bacterial cell has several means of achieving gene regulation. There are regulatory proteins as well as RNA-based elements, for example, riboswitches or ncRNAs. Although one regulatory protein per gene is clearly impossible and not very sophisticated, the concept of asRNA theoretically allows the system to have an individual regulator for every single element at a very low cost. Moreover, mathematical modeling of sRNA-based gene regulation has revealed a particular niche for regulatory RNA in allowing cells to transition quickly yet reliably between distinct states, consistent with the widespread appearance of bacterial sRNAs in stress regulatory networks (Mehta et al, 2008). In addressing this possibility, we examined the expression of all asRNAs and ncRNAs found in this study in a genome-wide expression microarray under four different conditions and verified the results for seven of them in more detail. In several of the newly found asRNAs, we discovered the expression to be strongly affected by some of these conditions, resulting in distinct and characteristic changes in the ratios between asRNAs and their cognate mRNAs. These changes provide circumstantial evidence for a functional role of the newly found asRNAs in regulatory networks. Beyond Synechocystis In a systematic screening for cyanobacterial ncRNAs in four strains of marine Prochlorococcus/Synechococcus, seven different ncRNAs were identified based on comparative genome analysis (Axmann et al, 2005). More recently, we used high coverage whole genome microarrays to screen genome wide for the presence of ncRNAs in Prochlorococcus MED4 (Steglich et al, 2008). This complements the earlier analysis of Axmann et al (2005) in the identification of 14 novel ncRNAs and 24 possible asRNAs (Steglich et al, 2008), although these were not characterized in detail. 
Considering that Prochlorococcus MED4 is the cyanobacterium with the most streamlined genome (Strehl et al, 1999; Rocap et al, 2003; Hess, 2004), and given the paucity of such analyses for this class of bacteria as a whole, the number of asRNAs detected here in a related unicellular cyanobacterium is astonishing. Synechocystis or even cyanobacteria as a whole may not be so exceptional in this respect. Recent publications have presented a growing number of asRNAs in a wide variety of bacteria such as Calothrix (Csiszar et al, 1987), Anabaena sp. PCC7120 (Hernandez et al, 2005), Vibrio anguillarum (Stork et al, 2007), Vibrio cholerae (Liu et al, 2009), Caulobacter crescentus (Landt et al, 2008), Clostridium acetobutylicum (Andre et al, 2008), Streptomyces coelicolor (Swiercz et al, 2008), Bacillus subtilis (Eiamphungporn and Helmann, 2009), and Salmonella (Sittka et al, 2008). A closer look at E. coli supports this view: first, albeit not studied in detail, an E. coli tiling array detected antisense transcription (Selinger et al, 2000). Second, Vogel et al (2003a, 2003b) and Kawano et al (2005) detected asRNAs in RNomics experiments. Third, a bioinformatic approach predicted 46 asRNAs, of which four were verified (Yachie et al, 2006). Finally, the five QUAD1 or Sib RNAs in E. coli lie antisense to short open reading frames coding for toxic oligopeptides (Fozo et al, 2008). Most approaches for the systematic detection of ncRNAs discriminate against asRNAs, for example by size exclusion of the relatively big asRNAs (<65 nt (Kawano et al, 2005), <50 nt (Swiercz et al, 2008), 50–500 nt (Vogel et al, 2003b)), by focusing on Hfq-bound RNAs (Sittka et al, 2008), or by focusing on intergenic regions (Landt et al, 2008). Taking this into account, the actual number of asRNAs in E. coli and other bacteria is undoubtedly underestimated.
Therefore, a potentially high number of bacterial asRNAs still remaining to be discovered could dramatically increase the regulatory capacity, flexibility and redundancy. It is very likely that chromosomally encoded asRNAs constitute an important component of another, not yet fully appreciated, level of gene regulation in bacteria. Materials and methods Bacterial strains and growth conditions Synechocystis sp. PCC 6803 used in this study (originally from S. Shestakov, Moscow State University, Russia) was propagated on BG11 (Rippka et al, 1979) 1% (w/v) agar (Bacto agar, Difco) plates. Liquid cultures of Synechocystis 6803 were grown at 30 °C in BG11 (20 mM TES pH 7.6) medium under continuous illumination with white light of 50 μmol of photons m–2 s–1 and a continuous stream of air. Different growth and stress conditions were applied to exponentially growing Synechocystis cultures (OD750 0.6–0.8) to allow virtually all kinds of RNAs to be expressed. For HL stress, light intensity was shifted from 50 to 500 μmol of photons m−2 s–1, samples were collected 30, 60, and 120 min after the shift. For low light conditions, light intensity was shifted from 100 to 10 μmol of photons m–2 s–1, samples were collected 30, 60, and 120 min after the shift. For iron and nitrogen stress, cells were collected by centrifugation and washed twice with iron-free (replacing ammonium iron (III) citrate with di-ammonium hydrogen citrate) or nitrogen-free (omitting sodium nitrate from the medium) BG11 medium. Resulting pellets were then resuspended in their respective medium. For iron stress, cells were harvested after 20 and 45 h, for nitrogen stress after 12.5 and 20 h. Heat and cold stress were applied by a temperature shift from 30 to 42 °C or 15 °C, respectively. For heat stress, sample collection occurred after 20 and 60 min, for cold stress after 30 and 120 min. Another culture was harvested after 12 h incubation in the dark. 
For stationary phase cells, a culture was harvested at OD750 of 3.5. Exponentially growing cells were harvested at OD750 0.56. The cultures for the expression microarray were grown at control conditions (OD 0.6 at 750 nm; 50 μmol photons m–2 s–1), or transferred to dark for 1 h, depleted for CO2 for 6 h by transferring to carbon-free BG11 (BG11 w/o Na2CO3, pH 7.0) without aeration after washing once in carbon-free BG11, or transferred to HL (500 μmol of photons m–2 s–1) for 30 min.

RNA extraction and analysis

Synechocystis 6803 cells were collected by rapid filtration (Pall Supor 800 Filter, 0.8 μm). Filters with cells were dissolved in 1 ml TRIzol (Invitrogen) per 40 ml culture, immediately frozen in liquid nitrogen and incubated 15 min at 65 °C in a water bath. Further RNA isolation followed the manufacturer's protocol.

Northern blot analysis and 5′ RACE

High resolution Northern blots were prepared from the separation of 10 to 25 μg of total RNA on 10% urea-polyacrylamide gels as described by Steglich et al (2008). Blots for RNAs with higher molecular weight were prepared from the separation of 5 to 10 μg of total RNA on 1.5% denaturing agarose gels. Hybridization conditions were described by Steglich et al (2008). 5′ RACE was performed as described in Steglich et al (2008). The sequences of all oligonucleotides used in this study for the preparation of transcript probes and 5′ RACE are listed in Supplementary Table S2.

Microarray hybridization

For RNA hybridizations, the RNA mix was labeled directly, without cDNA synthesis, in 5 μg aliquots with the Kreatech ‘ULS labeling kit for Agilent gene expression arrays’ with Cy3 or Cy5 according to the manufacturer's protocol. Fragmentation and hybridization were performed following the manufacturer's instructions for Agilent one color microarrays with 3 to 5.5 μg of labeled RNA. DNA was fragmented by 3 h incubation at 95 °C in H2O and Cy3 labeled with the Kreatech kit mentioned above.
Hybridization was performed similarly to RNA hybridization, but without the fragmentation step in the Agilent protocol. For DNA hybridization, 0.5 to 3.8 μg of labeled DNA were used. For the expression microarray, we directly labeled 2 μg RNA using the Cy3 labeling kit mentioned above. Hybridization was done with 1.5 μg RNA per array according to the Agilent protocol for 4 × 44k single color microarrays. Each stress condition was hybridized in triplicate. The data for both types of microarrays have been deposited in the GEO database under the accession numbers GSE16162 and GSE14410.

Transcript prediction

In general, a transcribed region of a genome is characterized by a TSS and a region of termination. A TSS can be identified by its preceding promoter region and the nucleotide identity (preferably A or G; Vogel et al (2003a)). Preliminary studies showed that the current standard method for TSS prediction, based on a position-specific scoring matrix as developed by Vogel et al (2003a), is on its own not statistically significant for ab initio transcript prediction and also does not improve significance in combination with terminator prediction. For this reason, we only made use of terminator prediction, as described in the following.

For termination of transcription, two possibilities exist: rho-dependent and rho-independent termination. Only the latter can be identified at the sequence level, as it shows a characteristic GC-rich hairpin in front of a T/U-rich region, the so-called T-tail. The T-tail can be further divided into the proximal part (the first five bases) and the distal part (the four bases after the proximal part).
With the help of RNAll (Wan and Xu, 2005), such intrinsic terminators were predicted and subjected to a postfiltering step with the following rules: (1) at least four G–C or G–U pairs; (2) at most 2 nt spacer between stem and T-tail; (3) at least three ‘T/U’s in the proximal part; (4) no more than one ‘G’ in the proximal part; (5) a ‘T’ at position 2 or 3 in the proximal part; (6) at most three purines or three cytosines in the distal part; (7) at least four ‘T’s in the proximal and distal parts together; (8) no multiloops and at most one bulged nucleotide; and (9) free energy of the stem-loop at most −8.0 kcal/mol. Rules 1–7 were taken from Lesnik et al (2001) and rules 8 and 9 were defined by ourselves. Calculation of the free energy was performed using RNAshapes (Steffen et al, 2006), as RNAll provides a heuristic structure prediction, leading to artifacts in the subsequent energy calculation by efn2 (Mathews et al, 2004).

Design of microarrays

The design of probes for the tiling microarray was based on the terminator predictions. For each prediction, the sequence of the corresponding gene or intergenic region was extracted in both orientations and redundancy was removed. Neighboring genes were concatenated with their intergenic spacer to capture antisense transcripts overlapping two genes. This resulted in 646 (480 antisense + 166 intergenic) sequences with a total length of 691 759 nt. As controls, we aimed at a similar number of genes and intergenic regions yielding a similar number of bases. We selected 474 genes with a total length of 698 590 nt and 158 intergenic regions comprising 50 797 nt. This sums up to a total of 632 control sequences holding 749 387 nt. Altogether, the sequences on the array covered 1 441 146 nt.
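Because the probe design starts from the terminator predictions, it may help to see how the T-tail part of the postfilter could be checked in code. The following is a minimal Python sketch, not the authors' implementation: the function names `split_t_tail` and `passes_t_tail_rules` are hypothetical, the input is assumed to be the 9 nt that follow the hairpin and spacer, and rule 6 is read here as "no more than three purines and no more than three cytosines in the distal part", which is one possible interpretation of the wording. Rules 1, 2, 8 and 9 concern the hairpin structure and its free energy and are not covered.

```python
def split_t_tail(tail: str) -> tuple[str, str]:
    """Split a T-tail into its proximal (first 5 nt) and distal (next 4 nt) part.

    RNA input is accepted; U is converted to T so that both alphabets work.
    """
    tail = tail.upper().replace("U", "T")
    if len(tail) < 9:
        raise ValueError("expected at least 9 nt after the hairpin and spacer")
    return tail[:5], tail[5:9]


def passes_t_tail_rules(tail: str) -> bool:
    """Check the sequence-only T-tail rules (3)-(7) of the postfilter.

    Assumption: rule 6 is interpreted as 'at most three purines AND at most
    three cytosines in the distal part'.
    """
    proximal, distal = split_t_tail(tail)
    distal_purines = distal.count("A") + distal.count("G")
    checks = [
        proximal.count("T") >= 3,                       # rule 3
        proximal.count("G") <= 1,                       # rule 4
        "T" in proximal[1:3],                           # rule 5 (positions 2 or 3, 1-based)
        distal_purines <= 3 and distal.count("C") <= 3, # rule 6 (assumed reading)
        (proximal + distal).count("T") >= 4,            # rule 7
    ]
    return all(checks)


if __name__ == "__main__":
    # Made-up example tails, for illustration only.
    for candidate in ["TTTTTTTTT", "ATTTATTTT", "TTTTTAGAG", "GGTTTTTTT"]:
        print(candidate, passes_t_tail_rules(candidate))
```

Keeping each rule as a separate entry in `checks` makes it straightforward to relax or tighten a single criterion when exploring how sensitive the number of predicted terminators is to the filter.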
The probe design included generating overlapping sequences of length 50 with an offset of 28 nt, trimming the sequences to obtain a Tm as close as possible to 72 °C with a minimum length of 25 nt, checking the redundancy of each trimmed sequence within the genome and the plasmids pCB2.4, pSYSG, pSYSX, pSYSM, and pSYSA, and discarding sequences with multiple perfect or 1-mismatch hits or with a Tm outside 70–74 °C. This procedure resulted in 102 739 probes fitting on a 2 × 104K-Agilent custom array together with control probes from the mouse actin gene.

The expression microarray holds probe sets for all annotated genes from the chromosome (NC_000911) as well as the seven plasmids (pSYSA: NC_005230, pSYSG: NC_005231, pSYSM: NC_005229, pSYSX: NC_005232, pCA2.4, pCB2.4, pCC5.2 available at http://genome.kazusa.or.jp/cyanobase/Synechocystis/) and, additionally, each genomic region corresponding to an expressed segment seen with the tiling microarray. On average, 3 to 5 probes per transcript were designed using the Agilent eArray system (https://earray.chem.agilent.com/earray/). The chosen design criteria were ‘best distribution method’, Tm 80 °C and a length between 45 and 60 nt, resulting in 20 293 probes. These probes were manufactured in a 44K Agilent custom microarray format with an internal duplication of all probes, hence providing an obligatory internal technical replicate. Descriptions of the array design and probe sequences for both microarrays have been deposited in the GEO database under the accession numbers GSE16162 and GSE14410.

Data normalization, transcript mapping, and identification of antisense transcripts

The procedure of transcript mapping on data from the tiling microarray was performed as described in Huber et al (2006).
To be able to make use of the author-provided software (R-package tilingArray), we had to design virtual probes for the genomic regions not covered by the probes on the microarray and assigned to them the arbitrarily chosen normalized expression value of −20.0. This is possible without affecting the segmentation algorithm, because the algorithm minimizes the sum, over all segments, of the squared residuals, that is, the squared differences between each individual probe and the mean of all probes in its segment. Segments containing solely virtual probes have a mean of −20.0 and, as each probe has the same expression value, the contribution of such virtual-only segments is 0.0 and thereby does not affect the overall optimization. To find the optimal segmentation, the algorithm needs to be given an expected number of segments. To calculate this number, we considered 646 regions based on predictions and 632 regions as controls, making a total of 1278 genomic regions. As a region always implies ‘empty’ regions surrounding it, we get 2 × 1278 = 2556 regions. Overall, this gives an estimate of ~2500 segments per strand.

Data extraction from transcriptome microarray. Spot intensities were extracted with the ‘Agilent Feature Extraction Software 10.5.1.1’ (Protocol: GE1_105_Dec08); for further processing we used the R-package ‘limma’. The median spot intensities were quantile normalized and the contrasts between control and stress conditions were extracted using the linear model provided by limma. The P-values were calculated with Benjamini–Hochberg adjustment. Only probes with an adjusted P-value <0.05 were used for further calculations. All probes of one feature were unified in a probe set for calculation of FC and mean expression.
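The Benjamini–Hochberg adjustment and the subsequent cut-off at an adjusted P-value of 0.05 can be illustrated with a short sketch. The analysis itself was done in R with limma; the Python function below, with the hypothetical name `benjamini_hochberg`, is only a stand-in that shows the arithmetic of the step-up procedure, and the P-values in the usage part are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Each P-value is scaled by m / rank (rank 1 = smallest P-value), and a
    running minimum taken from the largest rank downwards keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_probes(p_values, alpha=0.05):
    """Indices of probes whose adjusted P-value is below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < alpha]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # illustrative values, not data from the study
    print(benjamini_hochberg(raw))
    print(significant_probes(raw))
```

In this toy input, the raw P-values 0.03 and 0.04 both fall below 0.05 but no longer do so after adjustment, which shows why filtering on the adjusted rather than the raw P-value reduces the number of probes carried forward.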
To test the experimental variability, we determined the average in-group FCs between the normalized triplicates. The borders for a significance level of 0.05 are −0.34 and 0.31 for the control (all log2 values), −0.72 and 0.54 for the sample from dark, −0.92 and 0.85 for HL and −0.64 and 0.46 for CO2 depletion. Thus, FCs beyond ±0.9 (log2) were listed as differentially expressed. The mean expression is the mean of all quantile normalized median probe intensities of one probe set. For the calculation of asRNA/mRNA ratios, the mean expression of the asRNA was divided by the mean expression of the corresponding mRNA.

ORF analysis

Candidate asRNAs and ncRNAs were scanned for conserved ORFs. Initially, ORFs with possible start codons (ATG, GTG, TTG, and ATT) and a minimum length of 45 nt were predicted. Conservation was checked using TBLASTN against the NCBI nr database.

Supplementary Material overview and Figures S1 and S2 (565K, pdf)
Supplementary Table S1: Synarray (106K, xls)
Supplementary Table S2: Oligonucleotides (35K, xls)
Supplementary Material: Segmentation file (6.0M, pdf)
Machine-readable version (R-object) of the segmentation file (3.7M, zip)

Acknowledgments

This work was supported by the Deutsche Forschungsgemeinschaft Focus program ‘Sensory and regulatory RNAs in Prokaryotes’ SPP1258 (project HE 2544/4-1 to WRH and WI 2014/3-1 to AW), the graduate school ‘Signal systems in plant model organisms’ (to JG) and by the BMBF Freiburg Initiative in Systems Biology, project 0313921 (WRH).

Footnotes

The authors declare that they have no conflict of interest.

References