Copyright © The Author 2005. Published by Oxford University Press. All rights reserved.
A computational study of off-target effects of RNA interference
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
1Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
*To whom correspondence should be addressed at Department of Computer Science, University of New Mexico, Farris Engineering Building Room 325, Albuquerque, NM 87131-1386, USA. Tel: +1 505 277 9609; Fax: +1 505 277 9627; Email: terran/at/cs.unm.edu
Received December 13, 2004; Revised February 19, 2005; Accepted March 7, 2005.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions/at/oupjournals.org
Abstract
RNA interference (RNAi) is an intracellular mechanism for post-transcriptional gene silencing that is frequently used to study gene function. RNAi is initiated by short interfering RNA (siRNA) of ~21 nt in length, either generated from double-stranded RNA (dsRNA) by the enzyme Dicer or introduced experimentally. Following association with an RNAi silencing complex, siRNA targets mRNA transcripts that have sequence identity for destruction. A phenotype resulting from this knockdown of expression may inform about the function of the targeted gene. 
However, ‘off-target effects’ compromise the specificity of RNAi if sequence identity between siRNA and random mRNA transcripts causes RNAi to knockdown expression of non-targeted genes. The complete off-target effects must be investigated systematically on each gene in a genome by adjusting a group of parameters, which is too expensive to conduct experimentally and motivates a study in silico. This computational study examined the potential for off-target effects of RNAi, employing the genome and transcriptome sequence data of Homo sapiens, Caenorhabditis elegans and Schizosaccharomyces pombe. The chance for RNAi off-target effects proved considerable, ranging from 5 to 80% for each of the organisms, when using as parameter the exact identity between any possible siRNA sequences (arbitrary length ranging from 17 to 28 nt) derived from a dsRNA (range 100–400 nt) representing the coding sequences of target genes and all other siRNAs within the genome. Remarkably, high-sequence specificity and low probability for off-target reactivity were optimally balanced for siRNA of 21 nt, the length observed mostly in vivo. The chance for off-target RNAi increased (although not always significantly) with greater length of the initial dsRNA sequence, inclusion into the analysis of available untranslated region sequences and allowing for mismatches between siRNA and target sequences. siRNA sequences from within 100 nt of the 5′ termini of coding sequences had low chances for off-target reactivity. This may be owing to coding constraints for signal peptide-encoding regions of genes relative to regions that encode for mature proteins. Off-target distribution varied along the chromosomes of C.elegans, apparently owing to the use of more unique sequences in gene-dense regions. 
Finally, biological and thermodynamical descriptors of effective siRNA reduced the number of potential siRNAs compared with those identified by sequence identity alone, but off-target RNAi remained likely, with an off-target error rate of ~10%. These results also suggest a direction for future in vivo studies that could help both in calibrating true off-target rates in living organisms and in contributing evidence toward the debate of whether siRNA efficacy is correlated with, or independent of, the target molecule. In summary, off-target effects present a real but not prohibitive concern that should be considered for RNAi experiments.
INTRODUCTION
RNA interference (RNAi) (1) is an intracellular mechanism for post-transcriptional gene silencing that most probably functions in the regulation of gene expression and defense against transposable DNA elements and viruses. RNAi is triggered by double-stranded RNA (dsRNA). Dicer, an enzyme with RNase activity, cleaves dsRNA into fragments of ~21 nt, termed short interfering RNA (siRNA). The siRNA associates with several proteins to form an RNAi silencing complex (RISC). The sequence of the minus-strand of the siRNA then targets mRNA molecules that have sequence identity for cleavage by RISC. This sequence-directed removal of particular mRNA transcripts yields a knockdown of expression of the affected gene. Extensive investigations are ongoing to gain a more detailed understanding of RNAi. RNAi has been widely used as an experimental tool for the study of gene function and can be applied for large-scale analyses (2–4). 
RNAi has aroused a great deal of excitement in both therapeutic and genomic experimental communities because of its potential for the treatment of a wide spectrum of diseases, such as HIV (5,6), spinocerebellar ataxia type 1 and Huntington's disease (7), certain classes of cancers (8–10) and hypercholesterolemia (11,12), as well as its demonstrated use in functional genomic studies via controlled gene knockdown (13–15). Both dsRNA and siRNA have been used to knock down the expression of genes of interest. Resulting phenotypes are then used to infer gene function. Unfortunately, RNAi is not without complications. Empirically, RNAi was shown to function in many different organisms. However, some organisms (Saccharomyces cerevisiae, Trypanosoma cruzi and Leishmania major) are considered to be RNAi-negative, based on the lack of experimental observations for specific knockdown of targeted genes and on the absence from the genomes of these organisms of components, such as Dicer and RISC, that are critical for effective RNAi (16,17). More importantly, concern has arisen that the specificity of RNAi, targeted by the sequence of siRNA, may not be perfect. Initially, RNAi was regarded as a highly specific means of gene repression. Several studies dealing with various model systems supported this idea (13,18–20). However, siRNA can still direct RNAi to target mRNA sequences that lack complete sequence identity (21). Agrawal et al. (4) raised concerns over the specificity of gene repression in RNAi. Saxena et al. (22) have demonstrated the effect of siRNA mismatches on target specificity in mammalian tissue culture cells and reported ‘off-target’ gene knockdown. Sequence identity of as few as 11 contiguous nucleotides to siRNA caused direct silencing of non-target genes in experiments conducted on specificity of siRNA in cultured human cells (23). Scacheri et al. 
(24) pointed out that mismatches between siRNA and target sequences could have caused off-target RNAi in mammalian cells but such effects are difficult to detect. Combined, the above examinations of RNAi off-target effects have yielded mixed results. Perhaps as a consequence, RNAi studies do not explicitly control for off-target effects on a routine basis. Of course, a lack of specificity resulting in knockdown of unknown or unintended genes has considerable negative implications for functional genomics. Target specificity is also of paramount importance when considering applications of RNAi in therapeutics (3,4). For clarification of these uncertainties regarding RNAi, the off-target effect should be evaluated for each gene expressed by the organism under study, by considering multiple possible factors affecting off-target silencing. Such comprehensive studies are most probably expensive and cumbersome to conduct experimentally. A computational approach is less expensive to implement and permits the extension of real parameters into wider ranges for fully observing the trends and effects upon RNAi specificity. This work represents a systematic computational study of RNAi-related off-target effects in several organisms. Current guidelines for the design of siRNA and dsRNA for RNAi experiments recommend BLAST similarity searches (25) against sequence databases to identify potential off-target genes to improve the likelihood that only the intended single gene is targeted (26). However, the BLAST algorithm was not specifically designed to assess RNAi off-target effects. Therefore, dedicated computational methods were developed for improved detection of sequence identity to accurately and systematically evaluate RNAi off-target effects between siRNA sequences and target genes on a transcriptome-wide scale. In this computational study, three organisms, Schizosaccharomyces pombe (fission yeast), Caenorhabditis elegans and Homo sapiens (human) were examined. 
The likelihood of off-target effects for all known genes in each of these organisms was evaluated, including factors that may impact the target specificity and efficiency of RNAi. These factors included the length of siRNA, the length of dsRNA, the length of siRNA-target sequence mismatch, the position of mismatch within the siRNA sequence, the position of dsRNA within its target, coding sequences (CDSs) and untranslated regions (UTRs) as targets for RNAi, the chromosomal location and density of genes, and the effect of siRNA selection by rational siRNA design (27). These analyses aimed to gain insights toward improving the specificity of RNAi for functional genomics and potential future therapeutic application by facilitating a better understanding of off-target effects of RNAi. It would also be desirable to include effects such as RNAi directed against promoter regions, concentration dependences and the non-linear silencing effects of siRNA pools. Unfortunately, published empirical data on such effects are currently so sparse that we cannot construct a reasonable computational model for them; hence these classes of interactions are omitted from this study.
MATERIALS AND METHODS
Sequence data
The sequence data used in this study were collected from S.pombe, C.elegans and H.sapiens. RNAi has been observed in each of these organisms and extensive sequence data, including full genome sequences, were available for analysis. These three organisms represent a wide phylogenetic range. We used the cDNA sequences of 5401 genes of S.pombe available at the Sanger Institute (ftp://ftp.sanger.ac.uk/pub/yeast/pombe). The cDNA sequences from 22 168 genes of C.elegans (release WS110) were obtained from the Wormbase at Sanger Institute. 
The collective sequence data considered to represent 3′-UTR sequences from C.elegans consisted of 1000 UTRs that were present in the expressed sequence tag database combined with sequences that resulted from the UTR prediction method of Hajarnavis et al. (28). The dataset of human genes representing 27 852 mRNAs with 3′-UTRs was taken from the RefSeq database at NCBI (http://www.ncbi.nlm.nih.gov).Modeling RNAi and off-target effects Although computational methods exist to model aspects of mechanisms that employ short RNA sequences to regulate gene expression, such as microRNA (miRNA) genes (29,30), miRNA targets (31,32) and siRNA efficacy (33–35), none was available to study RNAi off-target effects. Thus, dedicated computational methods were developed for improved detection of sequence identity to accurately and systematically evaluate RNAi off-target effects based on sequence identity between siRNA sequences and target genes on a transcriptome-wide scale. RNAi is guided by complete and near complete sequence identity of siRNA and the target mRNA transcript (21–23). siRNA sequences are generated by the activity of Dicer, an enzyme that cleaves long dsRNA into fragments of ~21 bp (19). To model RNAi, we determined the incidence of sequence identity (exact and allowing for some mismatch) of each of all possible siRNA sequences (arbitrary length range of 17–29 nt) derived from the length of dsRNA (100, 200, 300 and 400 nt starting at the first coding nucleotide, and the sequence region from 100 to 200 nt) representing any of the CDSs relative to all possible siRNA sequences predicted from the CDSs of each of the organisms studied. Sequence identity of the siRNA derived from a given gene by using another gene was considered to signify a potential off-target RNAi. 
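The exact-identity model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy sequences and the use of a plain dictionary are assumptions made for illustration. The sketch enumerates every n-mer of a dsRNA taken from one gene and reports the other genes that contain an identical n-mer.

```python
from collections import defaultdict


def ngrams(seq, n):
    """Yield every contiguous n-mer of seq, left to right."""
    for i in range(len(seq) - n + 1):
        yield seq[i:i + n]


def build_index(genes, n):
    """Map each n-mer to the set of gene names that contain it."""
    index = defaultdict(set)
    for name, seq in genes.items():
        for kmer in ngrams(seq, n):
            index[kmer].add(name)
    return index


def off_targets(source, genes, index, n, dsrna_start=0, dsrna_len=100):
    """Genes other than `source` sharing an exact n-mer with its dsRNA."""
    dsrna = genes[source][dsrna_start:dsrna_start + dsrna_len]
    hits = set()
    for kmer in ngrams(dsrna, n):
        hits |= index.get(kmer, set())
    hits.discard(source)  # the intended target is not an off-target
    return hits


# Hypothetical toy sequences; real input would be full CDS sets.
genes = {
    "geneA": "ATGGCTAGCTAGGATCCGATTACA",
    "geneB": "TTTTGCTAGCTAGGAAAA",
    "geneC": "CCCCCCCCGGGGGGGG",
}
index = build_index(genes, n=8)
print(off_targets("geneA", genes, index, n=8))
```

Here `n`, `dsrna_start` and `dsrna_len` stand in for the siRNA length and the dsRNA position and length that the study varies; a realistic run would use n between 17 and 29.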
To mimic RNAi that is directed through siRNA with sequence identity to the UTRs of mRNA transcripts, both upstream and downstream UTR sequences (if available) were included for the analysis of off-target effects. With these variables, the effect of length of both siRNA and initial dsRNA upon the chance of off-target effects was investigated. This was implemented as follows. The similarity between two oligonucleotides is computed with inner product in the feature space using the n-gram feature map, as described previously (36). The use of an inverted file and red black tree (RBT) for calculating the inner products in the feature space achieved efficient computational performance. Computational representation of siRNA-target binding We describe each gene by its possible contiguous subsequences of length n (typically ~21, Table 1 and Figure 1
Computing the similarity of Equation 2 to find the off-target error in the genome using the vector space model directly requires $O(DF \cdot 4^{n})$ time, where $F$ ($40 \times 10^{6}$ for C.elegans and $60 \times 10^{6}$ for human) is the number of n-grams in the genome that may include UTR sequence and $D$ (close to $F$) is the number of n-grams to be compared in the CDSs. For genome-wide scanning, this computing time is prohibitive and can be improved by using the sparsity of the feature vectors. We use an inverted file where the n-grams serve as identifiers and their gene names and positions within the genes serve as attributes (the positions are used for mismatches later). If we ignore n-mers having zero occurrence and allow for the duplication of n-mers, a gene $g_x$ can be represented in the feature space compactly
$K(g_x, g_y)$ in Equation 2 is computed by searching each n-mer of $g_x$ for $g_y$ in the inverted file. $K(g_x, g_y)$ is the number of occurrences of $g_y$ among the matched genes. Each search in the RBT takes $O(\log F)$ time, resulting in a time of $O(k_x \log F)$ for computing $K(g_x, g_y)$.
Definition of off-target error rate
We define the off-target error using the exact match feature map; the same definition applies to the mismatch feature map defined later. To simulate Dicer's cleavage of dsRNA into siRNAs, we take an oligonucleotide, $o_x$, as dsRNA from gene $g_x$ and map it onto the feature space, expressed compactly as
To assess the off-target error rates, we employ measures from information retrieval theory. Let
We take an oligonucleotide as dsRNA from each gene in the genome and compute its off-target error and average the errors for all genes to evaluate the effects of the parameters. Thus, we define the average error rate for a given parameter set to be
An algorithm for detecting siRNA-target binding allowing mismatches Experiments have shown that RNAi works despite the existence of a number of mismatched nucleotides between the siRNA and its target gene (22,23). However, the efficacy changes with the length of the mismatch and the position of the mismatch on the siRNA. Several algorithms have been developed for string mismatching, a problem that relates to siRNA-target similarity. Leslie et al. (36) used a trie to construct a mismatch tree for computing their mismatch string kernels applied in a support vector machine classifier to detect protein families. Suffix trees were used as data structures to predict putative RNAi (40). Amir et al. (41) have developed an algorithm for single mismatch string searches (m = 1), which is not enough for our study. BLAST (25) also allows for mismatch by using substitutions based on alignment cost. However, the related mismatch algorithms are not particularly developed for RNAi and cannot control the positions of the mismatch as required in computational models for RNAi. We define mismatch feature map as follows. For an n-mer a from an alphabet
We first introduce some notations. Let Definition. A mirrored tree of a binary search tree (BST) populated with strings from S is the BST populated with reverse strings s1, s2, …, sN. A u leading range of a string s from S searched in a BST is the set of nodes returned by a search that only matches the beginning u letters of s. The mismatch kernel corresponding to Equation 9 can be computed by the mirrored tree search (MTS) in Algorithm 1. We omit its correctness proof owing to space limitation. At Steps 5 and 6, the substring before the mismatch is exact-matched in T1 and the leading range is stored in R1, the substring after the mismatch is exact-matched in T2 and the leading range is stored in R2. The genes corresponding to the mismatch letters are sandwiched in C at Step 7 by the intersection based on gene names and positions of the n-mers. At Step 9, Ei is computed using the definition based on Ci.
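The idea behind the mirrored search can be sketched in Python as follows. This is an illustrative approximation, not a reproduction of Algorithm 1: sorted lists with binary search (`bisect`) are swapped in for the red-black trees, and all names and toy sequences are hypothetical. The substring before the mismatch window is matched in a forward index, the substring after it is matched in an index of reversed n-mers, and the two leading ranges are intersected on gene name and position.

```python
from bisect import bisect_left


def _sorted_index(entries):
    """Return parallel lists (keys, ids) ordered by key."""
    ordered = sorted(entries)
    return [k for k, _, _ in ordered], [(g, p) for _, g, p in ordered]


def build_indexes(genes, n):
    """Forward index of n-mers and mirrored index of reversed n-mers."""
    entries = [(seq[i:i + n], name, i)
               for name, seq in genes.items()
               for i in range(len(seq) - n + 1)]
    forward = _sorted_index(entries)
    mirrored = _sorted_index((k[::-1], g, p) for k, g, p in entries)
    return forward, mirrored


def leading_range(index, prefix):
    """(gene, position) ids of all keys that begin with prefix."""
    keys, ids = index
    lo = bisect_left(keys, prefix)
    hi = bisect_left(keys, prefix + "\uffff")  # sentinel above A/C/G/T
    return set(ids[lo:hi])


def mismatch_search(sirna, start, length, forward, mirrored):
    """n-mers equal to sirna outside the window [start, start + length)."""
    before = sirna[:start]
    after = sirna[start + length:]
    r1 = leading_range(forward, before)        # exact match before window
    r2 = leading_range(mirrored, after[::-1])  # exact match after window
    return r1 & r2


# Hypothetical toy data: g2 differs from g1 at position 3 only.
genes = {"g1": "ACGTACGTAA", "g2": "ACGAACGTCC"}
fwd, mir = build_indexes(genes, n=8)
print(mismatch_search("ACGTACGT", 3, 1, fwd, mir))
```

Because the window is contiguous and its position is an explicit argument, the caller controls where mismatches are tolerated, which is the property the text requires of an RNAi-specific search.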
Let the size of the inverted file be F and the total number of n-grams from all the dsRNAs be D. MTS has a cost of Simulating positional effect of mismatches Experiments suggested that nucleotides in the region of 2–9 nt at the 5′ end of the guide strand are crucial for gene silencing (22,43). It therefore seems that transcripts containing sequence identity within this critical binding region would have a higher probability of being targeted for silencing and that mismatches within this region would have a more significant effect on reducing off-target silencing. To see this positional effect of mismatch, we use a weighted scheme where we assign lower silencing efficiency scores if the mismatches are in the critical binding region, and higher scores if the mismatches are outside of the region. The silencing efficiency score for silencing a gene is the sum of all the scores contributed by each siRNA. A gene is considered silenced only when its total efficiency score is above a threshold. We use contiguous mismatches to control their positions. Distribution of redundant siRNA sequences in the transcriptome A coincidental high frequency of particular 21mer target sequences within a transcriptome would increase the probability for off-target effects of any given siRNA sequences. The transcriptome of each organism tested was described as a collection of all possible 21mer sequences contained within and frequency of each sequence was determined. Web utilities were made available so that the frequencies of siRNAs of a particular sequence and genes targeted by the sequence can be retrieved, for the benefit of siRNA design. Effect of dsRNA position Frequently, dsRNA (of various lengths) is used to affect RNAi experimentally. Success was obtained with dsRNA sequences from various locations within the full-length CDS of targeted genes. The position of dsRNA along the target sequence was investigated as a parameter for off-target effects. 
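The weighted scheme described under "Simulating positional effect of mismatches" can be sketched as below. The numeric weights and the threshold are placeholders chosen for illustration only; the text does not state the values used. Each siRNA hit contributes a score that is low when its contiguous mismatch window overlaps guide positions 2–9 and higher otherwise, and a gene counts as silenced only when the summed score exceeds the threshold.

```python
# Guide-strand positions 2-9 (1-based) as 0-based indices.
CRITICAL_REGION = range(1, 9)


def sirna_score(mismatch_start, mismatch_len,
                exact=1.0, outside=0.8, inside=0.2):
    """Silencing efficiency score of one siRNA hit.

    The three weights are assumed values, not taken from the paper.
    """
    if mismatch_len == 0:
        return exact
    window = range(mismatch_start, mismatch_start + mismatch_len)
    in_region = any(pos in CRITICAL_REGION for pos in window)
    return inside if in_region else outside


def is_silenced(hits, threshold=1.5):
    """Sum the scores of all siRNA hits on a gene and compare to threshold.

    `hits` is a list of (mismatch_start, mismatch_len) pairs, one per
    siRNA matching the gene; the threshold is an assumed value.
    """
    total = sum(sirna_score(start, length) for start, length in hits)
    return total > threshold


# One exact hit plus one hit mismatched outside the critical region.
print(is_silenced([(0, 0), (12, 2)]))
# Three hits, all mismatched inside the critical region.
print(is_silenced([(2, 3), (4, 1), (3, 2)]))
```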
Beginning with the first nucleotide of the coding region of a gene, the start position of dsRNA was incremented 6 nt (two codons) until position 600 (on average, the final dsRNA closely approached the end of the CDS). The off-target error (based on exact sequence identity) was determined for such dsRNA for all CDSs in the transcriptome. Off-target error distributions in chromosomes The physical distribution of genes on chromosomes is not uniform. Often more genes are located in the middle of a chromosome than in the ends. The chance for off-target errors, based on exact sequence identity, for each gene was plotted against its coordinates on the genetic map of C.elegans. The curves were smoothed by averaging the error of a gene with that of its neighbors. Implementation of rational siRNA design Different siRNAs from the same target gene have highly variable efficacy (27,43). Several biological and thermodynamical properties have been identified to characterize siRNA sequences that mediate especially efficient RNAi knockdown (27,34,35,43,44). This has led to a set of rules for ‘rational design’ to optimize siRNA development (27). All possible siRNA sequences of each of the organisms studied were scored by the eight criteria of rational design. The length of siRNA sequences analyzed ranged from 17 to 29 nt (odd numbers only). The off-target error (based on exact sequence identity only) was determined for the highest scoring siRNAs from each gene (in pools of 5, 10 and 20 sequences) to evaluate whether rational design may reduce off-target errors. 
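The position scan described under "Effect of dsRNA position" can be sketched as follows. The 6 nt step and the final start position of 600 come from the text; everything else is an assumption. In particular, the per-position measure used here (the fraction of a dsRNA's n-mers that also occur in other genes) is a simple stand-in and is not the off-target error rate defined in this paper.

```python
def start_positions(cds_len, dsrna_len, step=6, last_start=600):
    """Start offsets 0, 6, 12, ... that keep the dsRNA inside the CDS."""
    stop = min(last_start, cds_len - dsrna_len)
    return list(range(0, stop + 1, step))


def scan(cds, dsrna_len, n, other_kmers, step=6):
    """Profile of (start, shared fraction) along one coding sequence.

    `other_kmers` is the set of n-mers found in all other genes.
    """
    profile = []
    for start in start_positions(len(cds), dsrna_len, step):
        dsrna = cds[start:start + dsrna_len]
        kmers = [dsrna[i:i + n] for i in range(len(dsrna) - n + 1)]
        shared = sum(k in other_kmers for k in kmers)
        profile.append((start, shared / len(kmers)))
    return profile


# Hypothetical toy CDS: the second half is shared with another gene.
cds = "A" * 12 + "C" * 12
profile = scan(cds, dsrna_len=12, n=4, other_kmers={"CCCC"})
for start, fraction in profile:
    print(start, round(fraction, 2))
```

Averaging such profiles over all CDSs in a transcriptome would give a curve of the kind the study reports as a function of dsRNA start position.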
RESULTS
Off-target error based on siRNA sequence identity versus the transcriptome
The comparison of siRNA sequences with an arbitrary range of lengths (from 17 to 29 nt) derived from particular dsRNA sequences against all possible targets on a genome-wide scale showed, for all three organisms tested, that the length of siRNA sequences is an important parameter for determining off-target effects, defined as sequence identity with a sequence other than the intended target. Only CDSs were used as target sequences. The chance for off-target errors decreased with increasing lengths of siRNA. siRNA of 21 nt proved optimal: the chance for off-target effects with this length was significantly lower than for shorter siRNA sequences, whereas it did not differ significantly from that of longer siRNAs (Figure 2).
Effects of length and position of mismatches Allowing sequence mismatch of up to nine contiguous nucleotides between siRNA and its target sequences increased the off-target error (Figure 3
Using the weighted scheme for simulating the positional effect of mismatches, we found that off-target error rates corresponding to mismatches within the critical binding region (2–9 nt at the 5′ end of the guide strand) were significantly lower, whereas the error rates corresponding to mismatches outside this region were much higher, consistent with the findings in the literature (22,23,43). Figure 4
Effects of UTRs
Incorporation of available UTR sequence data (none were available for S.pombe) considerably increased the size and diversity of the target sequences for H.sapiens and C.elegans. The 3′-UTR sequences described for human transcripts, when added to the inverted file containing the CDSs, increased the RBT by 58%, and the number of leaf nodes grew from 41.4 million to 65.5 million. With exact sequence identity as the parameter, the analysis of siRNA of various lengths, derived from different lengths of dsRNA representing CDS target sequences only, showed only a non-significant increase in off-target errors of RNAi in the case of H.sapiens and C.elegans (Figure 5).
Frequency of specific 21mer sequences in different transcriptomes The sequence data of the transcriptome of each of the three organisms studied were computationally scanned for the occurrence of all possible 21mer sequences representing siRNA, derived from the same transcriptome. Particular sequences were present at distinctly different frequencies (Figure 6
To assist siRNA designers to evaluate off-target errors, we have made available the frequency count of each siRNA in the three genomes of H.sapiens, C.elegans and S.pombe on the Web at http://rnai.cs.unm.edu/rnai/off-target/sirna_freq/. The website accepts siRNAs and returns their occurrence count that serves as indicators for off-target chances. We also provide a web tool that searches for the genes targeted by a given sequence allowing mismatches and different siRNA lengths (http://rnai.cs.unm.edu/rnai/off-target/genes-targeted/). Effect of dsRNA position along the target sequence The incremental variation of the position of the dsRNA (that served as source for the siRNA) along the target sequence showed that off-target errors were significantly lower for the beginning 100 nt positions than for those in the following positions. Figure 7A
Off-target error distributions on physical maps of chromosomes The chance for off-target errors for each CDS in the transcriptome was mapped onto the physical map of chromosomes from C.elegans (Figure 8
Effect of rational siRNA design All possible siRNA sequences of various lengths from dsRNA (l = 300) were selected using rational design parameters to identify a subset of siRNA sequences that are more likely to effectively guide RNAi. The off-target error for this subset of sequences was determined for H.sapiens and C.elegans owing to their high off-target error rates (Figure 9
To understand the relationship between an siRNA's frequency in the genome and its efficacy, as represented by its rational design score, we computed the correlation coefficient between the count and the rational score of each siRNA. These correlation coefficients are very small, as shown in Table 2, indicating that the frequency and the rational score are not correlated. This independence between the frequency of an siRNA and its rational score suggests that the objectives of minimizing off-target error rates of siRNAs and maximizing their efficacy can be pursued independently in an siRNA design.
DISCUSSION
This computational study of mechanistic aspects of RNAi against the background of extensive transcriptome and genome information available for the nematode C.elegans, H.sapiens (human) and (to a lesser extent) S.pombe (fission yeast) indicated a considerable likelihood that the specificity of RNAi knockdown is compromised by off-target RNAi effects. The similarity of observations from organisms of a wide phylogenetic range (fungi to both protostome and deuterostome animals) suggests that the conclusions from our analyses may provide insights into general aspects of RNAi. The results reported here were derived from computational approaches only; the feasibility of experimental validation is compromised by the large (genome-size) scale of the sequence data considered in these analyses. However, the parameters used for the computational analyses were applied at a high stringency compared with the conditions that allow RNAi in vivo. For instance, only sequence identity and minimal mismatch were considered to define siRNA specificity for a target sequence; bulge or wobble phenomena that relax sequence-specific target recognition by siRNA (22) were not allowed for. 
In addition, with a computational approach it was feasible to test parameters (such as lengths of dsRNA and especially siRNA) beyond the naturally occurring ranges to examine properties and trends of RNAi specificity. Our work does, however, suggest an empirical investigation that would be informative about both in vivo off-target rates and the properties of the siRNA binding/knockdown process. In each of the organisms studied here, we have identified a number of siRNA that have the highest potential for off-target effects, along with the predicted affected genes and predicted efficacy according to the rational design rules (http://rnai.cs.unm.edu/rnai/off-target/). An in vivo study of the knockdown produced by some or all of these siRNAs with regard to the putative affected genes and controlled by monitoring predicted non-target genes (measured, e.g. by microarray analysis) could reveal whether the predicted off-target effects do, in fact, occur in living systems. The rates of off-target knockdown would help to calibrate the predicted rates in this paper. Furthermore, such an experiment would contribute evidence toward the current debate of whether or not efficacy is purely a function of the siRNA or is also dependent on the target molecule (45–47). The algorithm used to detect sequence similarity as parameter for off-target RNAi was designed specifically for use with short (siRNA) sequences, while it also incorporated the use of dsRNA as a source for siRNA sequences. Thus, the analyses are relevant for two ways to experimentally affect RNAi, introduction of dsRNA and siRNA (18,26). The algorithm is superior to the BLASTN algorithm (25), which is usually recommended to evaluate potential off-target properties of siRNA and dsRNA designs toward other genes (26). While BLAST search offers some protection against off-target effects, and is certainly better than no control whatsoever, it is not, by itself, sufficient for general use for at least two reasons. 
First, the BLAST homology function was not particularly developed to model the RNAi-binding process and does not account for some of its known features. For example, mismatches and bulges are known to have differential effects on efficacy, varying along the length of the siRNA (22,23). Although BLAST allows for mismatch, insertion and deletion based on alignment cost, it cannot control the positions of these imperfect match patterns. Our algorithm is capable of modeling these patterns by controlling the length and positions, allowing it to detect off-target effects that would be missed by BLAST searches. Second, BLAST is suitable only when the entire genome sequence is available—in the absence of complete genome information, it is possible that significant off-target interactions will be missed. Although we can also only search complete genomes in the current work, we have quantified expected off-target error rates in a number of organisms, establishing a range of probable off-target rates. In an otherwise unsequenced organism, these bounds can be used to estimate the probability of off-target effects based on comparison of its genome size and evolutionary history. They can also be used to ameliorate such effects through multiple trials with varying siRNA selected from the target gene. Using off-target framework built in this work, we are able to develop quantitative models to predict off-target errors by incorporating a number of variables such as genome size and chromosomal location of a target gene in addition to the parameters we used in Materials and Methods. These models will provide reliable prediction of false positive error rates when an organism is partially sequenced. We should note that our predictions neither include the effects of siRNA concentration nor do they attempt to account for the non-linear (synergistic or mutually interfering) interactions of a pool of siRNA. 
It is clear that both these effects are of critical practical consequence and that a computational model supporting them is desirable. At the moment, however, there are insufficient published data on the efficacies of pools to be able to construct a high-confidence model of pool effects. From some reports (15) it is clear that simplistic models, such as linear combinations weighted by concentration, are inadequate. Thus, the results in this paper do not attempt to model either concentration or non-linear siRNA pool effects. Our results should, therefore, be interpreted as the chance that any single siRNA arising from a chosen dsRNA has an off-target interaction within the genome. In practice, this may be an overestimate of true off-target effects, but it does still provide an indication of off-target genes that should be monitored for potential off-target repercussions. Remarkably, the examination of RNAi off-target error as a function of siRNA length disclosed that siRNA sequences of 21 nt, the length most observed in vivo, optimally balanced target specificity and low chance of off-target RNAi. siRNA sequences of <21 nt had an increased chance for off-target effects, whereas longer sequences did not gain adequate target specificity to significantly reduce off-target reactivity. This siRNA length effect suggests that the chance for off-target RNAi effects may increase with the use of artificial siRNA sequences of <21 nt, such as 12–15 nt dsRNA fragments that result from RNase III digestion of dsRNA (48). The protozoan parasite Trypanosoma brucei employs comparatively long siRNA (24–26 nt) to target RNAi (39), perhaps for the benefit of gaining some critical specificity of RNAi. However, sufficient sequence data are lacking at this time to validly investigate the off-target dynamics for siRNA of various lengths in this organism. 
Despite inherent properties that combine optimally for specific sequence-based recognition, 21 nt siRNA still have a considerable chance for off-target effects when considering all coding domains within a transcriptome. Not surprisingly, the incidence of off-target effects increased when sequence mismatch of up to nine consecutive residues between the siRNA and the potential targets was allowed. Varying the position of these mismatches within the siRNA sequence changed the number of potential target sequences. Consistent with experimental observations (22,23,43), we found that off-target error rates corresponding to mismatches within the region of 2–9 nt at the 5′ end of the guide strand were significantly lower. The off-target effects also increased following inclusion of upstream and downstream UTR sequences within the target sequences, to reflect the in vivo reality that complete mRNA transcripts (not just the protein-encoding sequences) can be attacked by RNAi. Although this increase was not significant (Figure 5

Combined, the above computational findings suggest an extensive potential for off-target effects in RNAi experiments. However, in practice, chances for off-target errors may be less severe. RNAi targets mRNA for destruction and can only knock down genes that are expressed when siRNA is present. Potential off-target genes (that have adequate sequence identity to siRNA) will not be affected if they are not expressed simultaneously with the intended target gene. Our analysis showed that relatively few siRNAs target a sequence that is repeated frequently throughout the transcriptome of each of the organisms tested. In fact, siRNA designs can be screened for this property (http://rnai.cs.unm.edu/rnai/off-target/) to avoid the use of siRNA with increased chance for off-target errors.
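The kind of screening described above, including tolerance for mismatches at chosen positions, can be sketched as follows. This is a simplified illustration and not the web tool or the algorithm of this study: it scans by brute force, treats sequences as plain strings, and uses 0-based positions within the candidate string (mapping them onto guide-strand coordinates is left out). The function names `matches`, `off_target_genes` and `screen`, and the toy transcripts, are hypothetical.

```python
def matches(sirna, site, tolerated):
    """True if site equals sirna at every position except, possibly,
    the 0-based positions listed in tolerated."""
    return len(sirna) == len(site) and all(
        a == b or i in tolerated
        for i, (a, b) in enumerate(zip(sirna, site))
    )


def off_target_genes(sirna, transcripts, target, tolerated=frozenset()):
    """Names of non-target transcripts that contain a matching site."""
    k = len(sirna)
    hits = set()
    for gene, seq in transcripts.items():
        if gene == target:
            continue
        if any(matches(sirna, seq[i:i + k], tolerated)
               for i in range(len(seq) - k + 1)):
            hits.add(gene)
    return hits


def screen(target, transcripts, k, tolerated=frozenset()):
    """Map each k-nt candidate from the target transcript to the set of
    other transcripts it could hit; an empty set means no match found."""
    seq = transcripts[target]
    return {
        seq[i:i + k]: off_target_genes(seq[i:i + k], transcripts,
                                       target, tolerated)
        for i in range(len(seq) - k + 1)
    }


# Toy transcripts, invented for illustration only.
transcripts = {
    "geneA": "ACGTTGCA",
    "geneB": "TTGCAGGG",
    "geneC": "CCACGATGG",
}
exact = screen("geneA", transcripts, 5)
relaxed = screen("geneA", transcripts, 5, tolerated={3})
```

In this toy case the candidate `ACGTT` has no off-target under exact identity but picks up `geneC` once a mismatch at position 3 is tolerated, which mirrors the observation that allowing mismatches increases the number of potential targets and that the effect depends on where the mismatch falls. Candidates whose set stays empty under the chosen rule would be the ones to prefer.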
Moreover, we determined the chance for off-target error for each gene within the transcriptome of C.elegans relative to its position on the physical map of the genome of this nematode. CDSs from chromosome regions that contain more densely packed genes had a lower probability for off-target RNAi, as observed in all chromosomes except chromosomes IV and V. This implies that densely packed genes generally employ more unique sequences within the genome of C.elegans. Regardless, once a physical map is available for an organism, it may be possible to correlate the need to consider RNAi off-target error for a particular gene with the location of that gene within the genome. In addition, the results of the combined analysis suggest a trend in which the chance for off-target error is elevated for larger genomes. Of note, C.elegans and H.sapiens have roughly the same proportions of unique siRNAs (Table 2), but the off-target error rate in H.sapiens was much higher (Figure 2D

Finally, several properties of siRNA sequences have been found to be associated with a high efficacy to cause RNAi. For instance, the relative thermodynamic stability of the sequence termini may determine how a double-stranded siRNA dissociates to correctly incorporate the negative RNA strand into the RISC complex (43). Such properties have been combined into rational design methods for improving siRNA efficacy (43). Implementation of rational design yielded a considerable reduction in the number of functional siRNA sequences derived from the transcriptomes of H.sapiens and C.elegans, thereby reducing the likelihood of off-target error. Statistical analysis showed that minimizing off-target error and enhancing siRNA efficacy can be performed independently.

In summary, experimental RNAi targeted by siRNA has a certain degree of specificity. However, off-target effects yielding unintentional knockdown of unrelated genes are probable.
The random occurrence of some level of sequence identity (including imperfect match) between siRNA and multiple targets in a transcriptome contributes to this undesired effect. The computational methods applied here may underestimate the off-target effects because of fairly stringent matching of sequence identity. Further studies will consider more relaxed rules for siRNA–target interaction, such as the bulge and wobble effects that occur in vivo. Although off-target effects can be reduced by minimizing sequence similarity with known transcripts and by rational design, it is recommended to include controls for specific targeting in RNAi experiments. Further understanding of siRNA will lead to more precise targeting of RNAi and reduce off-target effects, to the benefit of the study of gene function and other future applications of RNAi.

Acknowledgments

The authors thank Vladimir Vuksan for implementing the web tools. This work was supported by NIH under grant number P20RR18754 from the Institutional Development Award Programme of the National Center for Research Resources. C.M.A. is supported by NIH grant RO1-AI052363. Funding to pay the Open Access publication charges for this article was provided by NIH grant number P20RR18754.

Conflict of interest statement. None declared.

REFERENCES

1. Fire A., Xu S.Q., Montgomery M.K., Kostas S.A., Driver S.E., Mello C.C. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature. 1998;391:806–811.
2. Fraser A.J.G., Kamath R.S., Zipperlen P., Campos M.M., Sohrmann M., Ahringer J. Functional genomic analysis of C.elegans chromosome I by systematic RNA interference. Nature. 2000;408:325–330.
3. Dillin A. The specifics of small interfering RNA specificity. Proc. Natl Acad. Sci. USA. 2003;100:6289–6291.
4. Agrawal N., Dasaradhi P.V.N., Mohmmed A., Malhotra P., Bhatnagar R.K., Mukherjee S.K. RNA interference: biology, mechanism and applications. Microbiol. Mol. Biol. Rev. 2003;67:657–685.
5. Jacque J.M., Triques K., Stevenson M. Modulation of HIV-1 replication by RNA interference. Nature. 2002;418:435–438.
6. Surabhi R., Gaynor R. RNA interference directed against viral and cellular targets inhibits human immunodeficiency virus type 1 replication. J. Virol. 2002;76:12963–12973.
7. Xia H., Mao Q., Eliason S.L., Harper S.Q., Martins I.H., Orr H.T., Paulson H.L., Yang L., Kotin R.M., Davidson B.L. RNAi suppresses polyglutamine-induced neurodegeneration in a model of spinocerebellar ataxia. Nature Med. 2004;10:816–820.
8. Hannon G.J. RNA interference. Nature. 2002;418:244–251.
9. Borkhardt A. Blocking oncogenes in malignant cells by RNA interference: new hope for a highly specific cancer treatment? Cancer Cell. 2002;2:167–168.
10. Barik S. Development of gene-specific double-stranded RNA drugs. Ann. Med. 2004;36:540–551.
11. Check E. Hopes rise for RNA therapy as mouse study hits target. Nature. 2004;432:136.
12. Soutschek J., Akinc A., Bramlage B., Charisse K., Constien R., Donoghue M., Elbashir S., Geick A., Hadwiger P., Harborth J., et al. Therapeutic silencing of an endogenous gene by systemic administration of modified siRNAs. Nature. 2004;432:173–178.
13. Chi J.T., Chang H.Y., Wang N.N., Chang D.S., Dunthy N., Brown P.O. Genomewide view of gene silencing by small interfering RNAs. Proc. Natl Acad. Sci. USA. 2003;100:6343–6346.
14. Kamath R.S., Fraser A.G., Dong Y., Poulin G., Durbin R., Gotta M., Kanapin A., Le Bot N., Moreno S., Sohrmann M., et al. Systematic function analysis of the C. elegans genome using RNAi. Nature. 2003;421:231–237.
15. Hsieh A.C., Bo R., Manola J., Vazquez F., Bare O., Khvorova A., Scaringe S., Sellers W.R. A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens. Nucleic Acids Res. 2004;32:893–901.
16. Catalanotto C., Azzalin G., Macino G., Cogoni C. Transcription: gene silencing in worms and fungi. Nature. 2000;404:245.
17. Ullu E., Tschudi C., Chakraborty T. RNA interference in protozoan parasites. Cell. Microbiol. 2004;6:509–519.
18. Tuschl T., Zamore P.D., Lehmann R., Bartel D.P., Sharp P.A. Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev. 1999;13:3191–3197.
19. Elbashir S., Harborth J., Lendeckel W., Yalcin A., Weber K., Tuschl T. Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature. 2001;411:494–498.
20. Semizarov D., Frost L., Sarthy A., Kroeger P., Halbert D.N., Fesik S.W. Specificity of short interfering RNA determined through gene expression signatures. Proc. Natl Acad. Sci. USA. 2003;100:6347–6352.
21. Elbashir S.M., Martinez J., Patkaniowska A., Lendeckel W., Tuschl T. Functional anatomy of siRNA for mediating efficient RNAi in Drosophila melanogaster embryo lysate. EMBO J. 2001;20:6877–6888.
22. Saxena S., Jonsson Z.O., Dutta A. Small RNAs with imperfect match to endogenous mRNA repress translation. J. Biol. Chem. 2003;278:44312–44319.
23. Jackson A.L., Bartz S.R., Schelter J., Kobayashi S.V., Burchard J., Mao M., Li B., Cavet G., Linsley P.S. Expression profiling reveals off-target gene regulation by RNAi. Nat. Biotechnol. 2003;21:635–637.
24. Scacheri P.C., Rozenblatt-Rosen O., Caplen N.J., Wolfsberg T.G., Umayam L., Lee J.C., Hughes C.M., Shanmugam K.S., Bhattacharjee A., Meyerson M., Collins F.S. Short interfering RNAs can induce unexpected and divergent changes in the levels of untargeted proteins in mammalian cells. Proc. Natl Acad. Sci. USA. 2004;101:1892–1897.
25. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410.
26. Elbashir S.M., Harborth J., Weber K., Tuschl T. Analysis of gene function in somatic mammalian cells using small interfering RNAs. Methods. 2002;26:199–213.
27. Reynolds A., Leake D., Boese Q., Scaringe S., Marshall W.S., Khvorova A. Rational siRNA design for RNA interference. Nat. Biotechnol. 2004;22:326–330.
28. Hajarnavis A., Korf I., Durbin R. A probabilistic model of 3′ end formation in Caenorhabditis elegans. Nucleic Acids Res. 2004;32:3392–3399.
29. Lim L.P., Glasner M.E., Yekta S., Burge C.B., Bartel D.P. Vertebrate microRNA genes. Science. 2003;299:1540.
30. Lai E.C., Tomancak P., Williams R.W., Rubin G.M. Computational identification of Drosophila microRNA genes. Genome Biol. 2003;4:R42.
31. Lewis B.P., Shih I.-H., Jones-Rhoades M.W., Bartel D.P., Burge C.B. Prediction of mammalian microRNA targets. Cell. 2003;115:787–798.
32. Enright A.J., John B., Gaul U., Tuschl T., Sander C., Marks D.S. MicroRNA targets in Drosophila. Genome Biol. 2003;5:R1.
33. Pancoska P., Moravek Z., Moll U.M. Efficient RNA interference depends on global context of the target sequence: quantitative analysis of silencing efficiency using Eulerian graph representation of siRNA. Nucleic Acids Res. 2004;32:1469–1479.
34. Amarzguioui M., Prydz H. An algorithm for selection of functional siRNA sequences. Biochem. Biophys. Res. Commun. 2004;316:1050–1058.
35. Chalk A.M., Wahlestedt C., Sonnhammer E.L. Improved and automated prediction of effective siRNA. Biochem. Biophys. Res. Commun. 2004;319:264–274.
36. Leslie C., Eskin E., Cohen A., Weston J., Noble W.S. Mismatch string kernels for discriminative protein classification. Bioinformatics. 2003;1:1–10.
37. Vapnik V.N. Statistical Learning Theory. NY: Wiley; 1998.
38. Qiu S., Lane T. String kernels of imperfect matches for off-target detection in RNA interference. In: Sunderam V., Albada G.D., Sloot P.M.A., Dongarra J.J., editors. Proceedings of Fifth International Conference on Computational Science, Lecture Notes in Computer Science. Atlanta, GA: Springer-Verlag; 2005 (to appear).
39. Djikeng A., Shi H., Tschudi C., Ullu E. RNA interference in Trypanosoma brucei: cloning of small interfering RNAs provides evidence for retroposon-derived 24–26-nucleotide RNAs. RNA. 2001;7:1522–1530.
40. Horesh Y., Amir A., Michaeli S., Unger R. A rapid method for detection of putative RNAi target genes in genomic data. Bioinformatics. 2003;19(Suppl. 2):ii73–ii80.
41. Amir A., Landau G., Keselman D., Lewenstein M., Lewenstein N., Rodeh M. Text indexing and dictionary matching with one error. J. Algorithms. 2000;37:309–325.
42. Garcia-Molina H., Ullman J.D., Widom J.D. Database Systems: The Complete Book. NJ: Prentice Hall Inc; 2002.
43. Khvorova A., Reynolds A., Jayasena S.D. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003;115:209–216.
44. Ui-Tei K., Naito Y., Takahashi F., Haraguchi T., Ohki-Hamazaki H., Juni A., Ueda R., Saigo K. Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference. Nucleic Acids Res. 2004;32:936–948.
45. Sætrom P., Ola Snøve J. A comparison of siRNA efficacy predictors. Biochem. Biophys. Res. Commun. 2004;321:247–253.
46. Yoshinari K., Miyagishi M., Taira K. Effects on RNAi of the tight structure, sequence and position of the targeted region. Nucleic Acids Res. 2004;32:691–699.
47. Luo K.Q., Chang D.C. The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region. Biochem. Biophys. Res. Commun. 2004;318:303–310.
48. Yang D., Buchholz F., Huang Z., Goga A., Chen C.-Y., Brodsky F.M., Bishop M.J. Short RNA duplexes produced by hydrolysis with Escherichia coli RNase III mediate effective RNA interference in mammalian cells. Proc. Natl Acad. Sci. USA. 2002;99:9942–9947.
49. Nielsen H., Engelbrecht J., Brunak S., von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 1997;10:1–6.
50. Dykxhoorn D.M., Novina C.D., Sharp P.A. Killing the messenger: short RNAs that silence gene expression. Nature Rev. Mol. Cell Biol. 2003;4:457–467.