Global analysis of mRNA splicing

Department of Systems Biology, Harvard Medical School, Boston, Massachusetts 02115, USA

Copyright © 2008 RNA Society
Reprint requests to: Pamela A. Silver, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA; e-mail: pamela_silver/at/hms.harvard.edu; fax: (617) 432-5012.

Abstract
Alternative mRNA splicing is a rich source of transcript diversity in eukaryotic cells with broad roles in development and disease. Systems-wide experimental methods have started to define how global splicing regulation shapes complex biological properties and pathways. Here, we review these approaches, describe recent insights they have yielded, and discuss avenues of future investigation.

Keywords: mRNA splicing, systems biology, genome-wide, alternative splicing regulation, microarrays, RNA network

INTRODUCTION
Alternative splicing (AS) vastly augments the coding potential of complex genomes by allowing single genetic loci to encode multiple transcripts with different protein coding sequences and RNA regulatory elements. Roughly 75% of human genes are thought to encode two or more splice isoforms, with striking variation across tissue types and developmental stages (Johnson et al. 2003). Moreover, many human diseases arise from deficient splicing of crucial transcripts, or are marked by the appearance of aberrant splice isoforms in affected tissues (Wang and Cooper 2007).

The mechanism and regulation of splicing have been informed mainly by genetic and biochemical studies in model organisms and human cells focusing on individual factors and events (Black 2003). Collectively these studies suggest that the use and/or suppression of splice sites are determined by combinatorial binding of RNA-binding proteins (RNABPs) to nearby regulatory elements. Deciphering this “splicing code” has proven a major experimental and bioinformatic challenge, which is further complicated by the many post-transcriptional and post-translational mechanisms that regulate RNABP expression and cellular localization (Fig. 1

EXPRESSION PROFILING
Microarrays permit the simultaneous measurement of all nucleic acids present in a biological sample, subject only to the technical constraint of how many sequences can be represented on a single chip (Stoughton 2005). Traditional expression profiling arrays are spotted with expressed sequence tag (EST)-derived cDNAs or 3′-clustered oligonucleotide sequences that probe total transcript abundance, but cannot distinguish between different splice isoforms (Fig. 2

SPLICE JUNCTION MICROARRAYS
“Splice junction” arrays bear oligonucleotide probes spanning annotated exon–exon junctions exclusive to individual splice isoforms (Johnson et al. 2003). Junction probes are position-constrained and thus subject to noise arising from nonideal sequence content. For this reason, most versions of these arrays include probes within constitutive exons so that transcripts can first be scored as present or absent from the sample, then assessed for AS based on junction probes. Splice junction arrays have proven valuable in functional classification of RNA processing mutants in yeast (Clark et al. 2002; Burckin et al. 2005) and definition of tissue-specific splicing patterns and regulatory motifs in mammalian cells (Johnson et al. 2003; Pan et al. 2004; Sugnet et al. 2006; Fagnani et al. 2007). The latter studies have produced the recurring observation that transcripts regulated at the level of alternative splicing comprise a distinct population from those regulated at the level of transcript abundance. This finding suggests that regulatory mechanisms governing transcript diversity and transcript abundance can evolve separately to influence distinct properties of a given cell type.

Several studies have extended these observations to contexts of differentiation and disease. Junction profiling following T-cell activation identified many transcripts showing changes in total abundance, with functional enrichment for immune defense and cytoskeleton dynamics; however, transcripts undergoing changes in AS were enriched for cell-cycle functions (Ip et al. 2007). Similarly, many transcripts in prostate and breast cancer cells showed altered splice isoform levels relative to normal tissue without significant changes in total transcript abundance, notably the mRNA for cell adhesion molecule and metastatic effector CD44 (Li et al. 2006a,b).
In addition, global patterns of splice isoform expression in various Hodgkin lymphoma tumors clustered according to how far the tumors had progressed (Relógio et al. 2005). These and related studies suggest that splice variant profiling may augment existing tools for tumor diagnosis.

Splice junction profiling upon removal of individual factors has revealed functionally coherent transcript networks controlled at the level of splicing. Individual knockdown of four broad-specificity RNABPs in Drosophila cells identified distinct sets of AS events regulated by each factor (Blanchette et al. 2005). In mouse, a subset of brain-specific exons regulated by the neuronal RNABP NOVA2 was discovered by comparing brain-derived transcript profiles of NOVA2-knockout to wild-type animals. Transcripts containing these exons encoded many synapse-specific factors, illuminating a clear biological function for this splicing network in synapse formation (Ule et al. 2005). Knockdown of splicing regulator PTBP1 and/or its neuronal paralog nPTB in mouse neuroblastoma cells identified exon sets that are differentially regulated by these proteins (Boutz et al. 2007). Since neuronal precursors express PTBP1, while differentiated neurons express nPTB, these data identified a “post-transcriptional switch” that reprograms AS during neuronal development.

Several studies have probed connections between AS and nonsense-mediated decay (AS–NMD) in light of estimates that some 35% of annotated alternative splicing events introduce premature termination codons (PTCs) (Green et al. 2003). Surprisingly, blocking NMD identified few changes in splice isoform levels in junction array profiles, suggesting that AS–NMD is confined to a small subset of AS events (Pan et al. 2006; Ni et al. 2007). However, transcripts encoding splicing regulators were highly enriched for AS–NMD regulation, with the cassette exons frequently overlapping “ultraconserved” DNA regions (Ni et al. 2007).
These observations suggest a rigorously conserved post-transcriptional mechanism for controlling levels of key splicing regulators in mammals, with potential cross- and auto-regulatory organization.

EXON ARRAYS
An alternative array strategy uses “exon-centric” probe (Fig. 2

TILED GENOMIC ARRAYS
Tiled oligonucleotide arrays spanning whole chromosomes or genomes provide comprehensive coverage and obviate the need for prior knowledge of exon coordinates. This probe topography is advantageous because it can detect patterns of alternative splicing, such as altered splice site use, intron retention, or use of nonannotated exons, which are difficult or impossible to detect on platforms limited to a defined set of annotated exons. The added expense and substantial computational challenges of analyzing tiled array data sets outweigh potential benefits in many experimental applications. However, some initial profiling efforts suggest tiled arrays can reveal transcript architecture that sequencing efforts and other microarray platforms have missed.

Analysis of the Saccharomyces cerevisiae transcriptome on tiled arrays identified previously non-annotated introns and examples of regulated intron retention (Juneau et al. 2007; Zhang et al. 2007). A survey of human transcripts from 12 tissues on tiled ENCODE arrays spanning roughly 1% of the human genome found inclusion of non-annotated sequences for an astonishing 81.5% of the 399 protein-coding transcripts represented on the array, with 20% of novel variants having altered open reading frames (Denoeud et al. 2007). Use of distal 5′ transcription starts, extension of annotated exons, and appearance of novel exons in annotated introns contributed significantly to this tally, and a large degree of tissue-specific transcript architecture was observed. The biological significance of these observations is unclear, but these experiments highlight that our knowledge of the exon landscape of the human genome is far from complete.
Since whole-genome tiling arrays for human and mouse are now available, comprehensive surveys of transcript architecture will address the ongoing need for a complete catalog of transcript diversity.

OTHER PROFILING TECHNIQUES
Coupling microarray analysis to molecular techniques such as chromatin and RNA immunoprecipitation (ChIP-chip and RIP-chip, respectively) can identify populations of genes and transcripts regulated by individual factors. ChIP-chip can provide a genome binding profile for processing factors that are recruited to mRNAs during transcription, thus allowing target identification (Fig. 3
RIP-chip has been used to identify targets of RNABPs in several systems (Fig. 3

Application of a related technique termed CLIP (cross-linking and immunoprecipitation) to the neuronal splicing factor Nova identified a network of targets encoding synaptic and neural inhibitory factors in mouse brain (Ule et al. 2003). CLIP is a modified RIP-chip protocol in which RNP complexes are cross-linked in vivo by UV exposure. This strategy allows stringent immunopurification, making it likely that copurified RNAs represent directly bound targets. Since traditional RIP-chip requires gentle conditions to preserve native RNP complexes, it cannot unambiguously determine direct RNA–protein interactions. In addition, there is some evidence that RNP complexes can reorganize following cell lysis (Mili and Steitz 2004). These caveats for RIP-chip analysis bear consideration but have proven manageable with sufficient validation and quality control measures.

A second difference between CLIP and RIP-chip is that CLIP targets were identified by sequencing rather than microarray analysis. This approach foreshadowed the more recent resurgence of high-throughput sequencing (HTS) technologies as an alternative to microarrays for massively parallel measurements of nucleic acids (Kim et al. 2007; Mardis 2007; Mikkelsen et al. 2007). HTS has the benefit of unbiased sequence determination and avoids problems with signal-to-noise and bad probes that are inescapable drawbacks of microarray hybridization. HTS platforms generate short sequence reads (~30 bp). Since human exons average roughly 200 bp, high read numbers would be required to sufficiently profile exon boundaries and distinguish the levels of different splice isoforms. To date, the suitability and cost of HTS for AS analysis relative to microarrays have not been examined.
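
The read-number consideration above can be made concrete with a back-of-the-envelope sketch. It is an illustration only, not a method from the studies reviewed here. It assumes that reads are sampled uniformly along a single transcript, and that a read counts as junction-informative only if it overlaps both flanking exons by a minimum number of bases. The function names, the 2000-nt transcript length, and the 5-base overhang are hypothetical choices; only the ~30-bp read length comes from the text.

```python
def junction_spanning_starts(read_len, junction_pos, transcript_len, min_overhang):
    """Count read start positions that cover a junction with at least
    min_overhang bases on each side.

    Coordinates are 0-based; the junction lies between bases
    junction_pos - 1 and junction_pos of the spliced transcript.
    """
    # Earliest start that still leaves min_overhang bases after the junction.
    first = max(0, junction_pos + min_overhang - read_len)
    # Latest start that still leaves min_overhang bases before the junction.
    last = min(transcript_len - read_len, junction_pos - min_overhang)
    return max(0, last - first + 1)


def junction_read_fraction(read_len, junction_pos, transcript_len, min_overhang):
    """Fraction of uniformly placed reads from one transcript that span the junction."""
    total_starts = transcript_len - read_len + 1
    if total_starts <= 0:
        return 0.0  # transcript shorter than a single read
    spanning = junction_spanning_starts(
        read_len, junction_pos, transcript_len, min_overhang
    )
    return spanning / total_starts


def reads_needed(target_junction_reads, fraction):
    """Expected number of reads from the transcript needed to observe
    target_junction_reads junction-spanning reads."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return target_junction_reads / fraction


if __name__ == "__main__":
    # Hypothetical example: 30-bp reads, a junction 200 nt into a 2000-nt transcript.
    frac = junction_read_fraction(30, 200, 2000, 5)
    print(f"fraction of reads spanning the junction: {frac:.4f}")
    print(f"reads from this transcript for 10 junction reads: {reads_needed(10, frac):.0f}")
```

Under these assumptions, only about 1% of the reads derived from the transcript report on any one junction. The total sequencing depth required also depends on the transcript's share of the sample, which this sketch does not model.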

COMBINED COMPUTATIONAL AND EXPERIMENTAL APPROACHES
A number of recent studies have combined experimental and bioinformatic methods to yield incisive insights into global splicing regulation. Screening a random library of decanucleotides for splicing regulatory activity identified many novel exonic splicing silencing (ESS) motifs (Wang et al. 2004). Bioinformatic searches for these ESSs in endogenous human genes revealed positive enrichment in alternatively spliced exons and pseudoexons, and negative enrichment in constitutively spliced exons. Moreover, ESSs correlated with splice site “strength”: constitutive exons with weak consensus splice sites contained fewer ESSs, which would presumably induce unwanted exon skipping. Subsequent work showed that the presence of ESSs inhibited the use of “decoy” 5′ and 3′ splice sites within or near regulated exons (Wang et al. 2006). In contrast, exonic splicing enhancers (ESEs) had either the opposite effect or no impact on decoy splice site use. In accordance with an endogenous role in regulating alternative splice site choice, ESSs were statistically enriched in alternatively spliced regions of regulated exons.

Other integrated approaches have focused on deciphering networks regulated by specific splicing factors. Ule and colleagues developed a comprehensive “RNA map” of NOVA binding sites proximal to regulated exons in mouse brain (Ule et al. 2006). This map not only correlated well with previous genome-scale assessments of NOVA-dependent splicing (Ule et al. 2003, 2005), but allowed highly accurate prediction of NOVA-dependent regulation for many previously unidentified targets. Several novel binding sites for Drosophila splicing regulator SXL were identified by a genome-wide computational search, and sex-specific, regulated splicing was experimentally verified in some instances (Robida et al. 2007).
Since many splicing regulators bind highly degenerate RNA sequences or sequences that appear very frequently in the transcriptome (Singh and Valcárcel 2005), de novo searches in this mold have met with variable success. However, in vitro selection techniques such as SELEX have produced refined characterizations of RNA regulatory motifs for several splicing regulators, which may aid related efforts (Fairbrother et al. 2002; Hui et al. 2005; Paradis et al. 2007).

Bioinformatic strategies identified significant overlap between “ultraconserved” DNA elements and alternative exons containing PTCs within genes for splicing regulators (Lareau et al. 2007). These findings revealed the highly conserved employment of AS–NMD as a means to regulate cellular levels of splicing regulators, as independently deduced from microarray analysis (Ni et al. 2007). Broadly, these studies demonstrate that skilled application of experimental techniques and computational analysis can reveal features of the “splicing code”. As more genome-wide data sets become available, bioinformatics analysis will improve predictive power in the analysis of splicing regulation.

SUMMARY AND FUTURE DIRECTIONS
Systems-wide approaches have yielded novel insights into mRNA splicing as a means of biological regulation, engendering several salient themes. First, the regulation of transcript abundance and transcript diversity are distinct processes that can influence different biological properties in the same cell. Accordingly, these interconnected modes of regulation encompass distinct, though often overlapping, transcript populations. Second, patterns of alternative splicing comprise a reproducible signature for specific cell types, consistent with a crucial role for splicing regulation in determining cell identity. These patterns are dynamic and show extensive reprogramming during differentiation and disease, and in response to intra- and extra-cellular signals.
Finally, functionally related transcripts can be coregulated in splicing networks to shape specific biological functions.

Though recent studies have defined fundamental themes in splicing regulation, many questions, especially relating to the roles of splicing in disease and development, remain unanswered. Global insights into mRNA splicing will continue to benefit from the publication and rigorous mining of genome-wide expression and ChIP/RIP binding profiles, which can form an empirical basis for testable, predictive models of splicing regulation. Large-scale proteomic profiling upon knockdown of PTB paralogs recently identified novel targets of those splicing regulators (Spellman et al. 2007). The success of this approach suggests that proteomic analysis, though assessing a downstream readout of alternative splicing, is a promising means to identify regulated splicing. Finally, the increasing availability of high-throughput RNAi and small molecule-based screening platforms opens the door to systematic screening for splicing regulators in mammalian cells.

The current view, supported by the work reviewed here, is that cell- and context-specific splicing patterns are a composite output of the RNA-encoded “splicing code”, the differential binding specificities of RNABPs, and the myriad regulatory mechanisms that control RNABP expression and localization (Matlin et al. 2005). Continued advances along the diverse and complementary experimental avenues described here hold promise in further illuminating each of these crucial areas.

ACKNOWLEDGMENTS
We thank Jessica Hurt, Natalie Gilks-Farny, and Ian Swinburne for critiques of the manuscript. M.J.M. is supported by a National Science Foundation Graduate Fellowship and grants from the National Institutes of Health to P.A.S.

Footnotes
Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.868008.

REFERENCES