Copyright Colinas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Intergenic and Genic Sequence Lengths Have Opposite Relationships with Respect to Gene Expression

1Department of Biology and IGSP Center for Systems Biology, Duke University, Durham, North Carolina, United States of America
2Department of Statistical Sciences, Duke University, Durham, North Carolina, United States of America
3Department of Civil & Environmental Engineering & Geodetic Science, Ohio State University, Columbus, Ohio, United States of America
4Hollywood, Florida, United States of America

Juan Valcarcel, Editor
Centre de Regulació Genòmica, Spain

* E-mail: philip.benfey/at/duke.edu

Received April 21, 2008; Accepted October 11, 2008.

Abstract

Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation.
To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.

Introduction

‘Noncoding DNA’ can be found both surrounding genes and within genes (see schematic Figure 1).
Over the past few years, studies in both plants and animals have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have focused on the relationship between intergenic sequence length and diversity of expression level across tissues, and found that it was positive [4]–[6]. The general interpretation, based on the current understanding of gene regulation, is that genes under greater regulatory control require more regulatory information, resulting in a longer upstream intergenic sequence [4], [6]. However this interpretation does not explain why a positive relationship is also observed for downstream intergenic sequences [6], if, as a recent study with 61 Arabidopsis thaliana transcription factor genes suggests, the downstream intergenic sequence is generally not required to drive the appropriate gene expression pattern [7]. Other studies have focused on gene rather than intergenic length, and found that there is a negative relationship between gene length and expression level. This was usually interpreted as a sign of selection pressure for highly expressed genes to remain small [8]–[10]. Other explanations have been proposed, such as “transcriptional interference”, whereby highly expressed genes would tend to be more distant from adjacent genes such that their transcription is not hindered by that of their neighbors [11]. Moreover, reports have shown that there is also a correlation between gene length and expression pattern [6], [12]. The reason for this remains unclear, since it cannot be explained easily by our current understanding of gene expression regulation [6], [12]. In any case, because different studies use different datasets and measures of gene expression, it is difficult to draw from them a clear picture of the relationships between sequence length and gene expression. 
Here, we sought to study the relationships between sequence length and mRNA expression of protein-coding genes thoroughly and without a priori hypotheses. Using nonparametric smoothing regression, we studied the relationships between intergenic, genic coding and genic noncoding sequence lengths and three different aspects of gene expression across tissues. For our expression data, we made use of high-resolution, tissue-specific global gene expression data from the root of the plant A. thaliana [7], [13]–[15]. We sought measures of gene expression that would relate not only to the “level” and “breadth” (or “diversity”) of expression across tissues, but also to biological noise (i.e. random variation), which has been shown in recent years to be an important component of gene expression [16], [17]. For expression level we used the median expression across tissues; for the expression breadth or pattern, the variability between tissues; and for noise, the variability between biological replicates. We found that intergenic and genic sequences have opposite relationships with respect to both expression variability and expression level, and that this does not hold for coding sequences when considered individually. Moreover, we found similar results for expression variability and for noise. Finally, categories of genes expressed with greater variability generally have longer intergenic sequences and shorter gene noncoding sequences, but not as much difference is observed for the coding sequence. To explain these results we propose a model in which the lengths of intergenic and genic sequences have opposite effects on the ability of a gene to be epigenetically regulated for differential expression.

Results

With the goal of precisely identifying genome-wide interdependencies between the length of the sequences associated with a gene and the expression of its mRNA transcript, we used the genome sequence of A.
thaliana and gene expression data derived from microarray experiments. Before comparing sequence lengths and gene expression, we first determined if interdependencies existed between the different sequences themselves, and between the measures of transcript expression that we used.

Interdependencies between genetic sequences

We first separated the sequences associated with a gene into four entities: 1) upstream intergenic, 2) downstream intergenic, 3) coding (the sum of all the coding regions) and 4) gene noncoding (introns and untranslated regions (UTRs)) (Figure 1). We found a positive relationship between the upstream and the downstream intergenic sequence lengths (Figure 2A).
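The four-way split can be illustrated with a minimal Python sketch. It applies two rules stated in Methods: the gene noncoding length is the transcribed length minus the coding length, and the intergenic length is set to zero when adjacent genes overlap. The record type `GeneAnnotation`, its field names, and the convention that upstream means smaller coordinates are assumptions made for illustration only; they are not the format of the TAIR annotation files.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GeneAnnotation:
    """Hypothetical record; coordinates are in bp and oriented so that
    'upstream' means smaller coordinates (an assumption of this sketch)."""
    start: int           # first transcribed base
    end: int             # last transcribed base (inclusive)
    coding_length: int   # sum of all coding regions


def sequence_lengths(gene: GeneAnnotation,
                     upstream_neighbor_end: int,
                     downstream_neighbor_start: int) -> Dict[str, int]:
    """Return the four sequence lengths associated with one gene."""
    transcribed = gene.end - gene.start + 1
    if gene.coding_length > transcribed:
        # Inconsistent annotation: Methods states such genes were discarded.
        raise ValueError("coding sequence longer than the gene")
    return {
        # An overlapping neighbor gives a negative gap, which is set to zero.
        "upstream_intergenic": max(0, gene.start - upstream_neighbor_end - 1),
        "downstream_intergenic": max(0, downstream_neighbor_start - gene.end - 1),
        "coding": gene.coding_length,
        # Introns plus UTRs: transcribed region minus coding sequence.
        "gene_noncoding": transcribed - gene.coding_length,
    }


# Made-up coordinates: the downstream neighbor overlaps the gene.
example = sequence_lengths(GeneAnnotation(start=1001, end=3000, coding_length=1200),
                           upstream_neighbor_end=500,
                           downstream_neighbor_start=2900)
print(example)
```

In this made-up example the downstream neighbor starts inside the gene, so the downstream intergenic length is zero and the gene would be dropped from any plot involving that variable.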
Expression level, variability between tissues and noise are inter-dependent

We chose to consider the three aspects of gene expression that have generally been considered by others: 1) overall expression level in a group of tissues (e.g. [8]–[10]); 2) some measure of the unevenness of the expression pattern across tissues [4]–[6]; and 3) expression noise [16], [17], which has not been previously studied in the context of relationships with sequence lengths. To obtain values for these three metrics, we used tissue-specific genome-wide gene expression data from different tissues of the root of A. thaliana [7], [13]–[15]. Comparisons of these data to other experimental results suggest that they provide a reliable estimate of the expression pattern of the mRNA [7], [13]–[15]. For level we used the median across-tissue expression; for the expression pattern we used the variability of the expression across tissues (a more variable expression is likely to represent a more tissue-specific expression pattern); and for noise we used the variability between biological replicates, i.e. between groups of plants grown independently (see Methods for precise definitions). This noise measure should include noise of both technical and biological origin. Because we do not expect technical noise to be related to sequence length (especially in the case of intergenic sequence length, which is unrelated to the microarray method of assaying mRNA level), any relationship between noise and length should presumably arise from the biological noise component. We investigated the relationships between our three measures of expression. Tissue variability and noise had a generally negative but non-linear relationship with median expression, but it was much stronger for noise than for tissue variability (Figure 3A, B).
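Non-linear trends of this kind are what a local linear regression smoother (the method named in Methods) is designed to reveal without assuming a shape in advance. The following Python sketch shows the idea under stated assumptions: the Gaussian kernel, the `bandwidth` parameter and the function name are illustrative choices, not the settings of the R analysis used in the paper.

```python
import math
from typing import Sequence


def local_linear_smooth(x: Sequence[float], y: Sequence[float],
                        x0: float, bandwidth: float) -> float:
    """Estimate the trend of y at x0 by a weighted straight-line fit,
    with weights that decay with distance from x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Evaluate the locally fitted line at x0.
    return mean_y + slope * (x0 - mean_x)


# Made-up data with a curved trend (y = x squared).
xs = [i / 10 for i in range(51)]
ys = [xi ** 2 for xi in xs]
trend = [local_linear_smooth(xs, ys, x0, bandwidth=0.5) for x0 in (1.0, 2.5, 4.0)]
print(trend)
```

Evaluating the function on a grid of `x0` values traces the smooth curve; a smaller bandwidth follows local structure more closely at the cost of a noisier estimate.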
Intergenic and genic sequence lengths have opposite relationships with respect to expression

We then studied the relationships between sequence length and mRNA expression for the 11,725 genes for which both types of data were available (see Methods). For variability between tissues, we found that more variable expression was associated with longer intergenic sequences [4]–[6] (Figure 4A, B).
These relationships should be viewed in light of the inter-dependencies between the variables themselves. First, we found that the gene coding and noncoding sequences have similar relationships to expression, although they are weaker for the coding sequence (Figure 4). Overall, therefore, we conclude that intergenic sequences (upstream or downstream) globally have relationships with gene expression that are the opposite of those of the genic sequences (noncoding or coding) (Figure 4).

Individual coding and gene noncoding sequences have opposite relationships with variability and level

We then examined the individual components of the genic sequences, i.e. the individual coding regions, introns, 5′ and 3′ UTRs, limiting ourselves to the first four introns or coding regions, and to tissue variability and median expression. For tissue variability, all components had very weak relationships. However, although weak, the relationships were negative for the noncoding components (introns and UTRs) and positive for the coding components (Figure 5A).
Therefore, the individual components of the gene noncoding sequence have relationships with expression similar to those of the total gene noncoding sequence. However, this is not the case for the gene coding sequence. This argues for the possibility that the relationships of the total coding sequence with expression are indirect and caused by its strong correlation with the total gene noncoding sequence.

The trends hold for individual examples

The results presented so far deal with trends over the scale of 10,000 genes. We asked if these trends can also be seen when looking at individual genes. For instance, can we observe that individual genes with higher noise have longer intergenic sequences and shorter genic sequences? To address this question, we randomly selected sets of ten genes with low or high noise, tissue variability, and median expression, and compared the intergenic and gene sequence lengths in the low and high sets. As expected from the relationships in Figure 4
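The selection of low and high sets can be sketched in Python, following the description in the Figure S1 legend (genes ordered by the metric, then sets of ten consecutive genes drawn at random near each end of the range). The function name, the `margin` parameter that defines how close to the ends a set may start, and the fixed `seed` are hypothetical choices for illustration; the paper does not state how wide the end regions were.

```python
import random
from typing import Dict, List, Tuple


def pick_extreme_sets(values: Dict[str, float], set_size: int = 10,
                      margin: float = 0.05, seed: int = 0
                      ) -> Tuple[List[str], List[str]]:
    """Rank genes by a metric and draw one run of consecutive genes
    from the low end and one from the high end of the ranking."""
    rng = random.Random(seed)
    ranked = sorted(values, key=values.get)
    n = len(ranked)
    # Width of each end region, in genes (assumed: a fraction of all genes).
    window = max(set_size, int(n * margin))
    if n < 2 * window:
        raise ValueError("too few genes for two non-overlapping end regions")
    low_start = rng.randint(0, window - set_size)
    high_start = rng.randint(n - window, n - set_size)
    low_set = ranked[low_start:low_start + set_size]
    high_set = ranked[high_start:high_start + set_size]
    return low_set, high_set


# Made-up noise values for 1,000 hypothetical genes.
noise = {f"gene{i}": i / 1000 for i in range(1000)}
low_noise, high_noise = pick_extreme_sets(noise)
print(low_noise[0], high_noise[-1])
```

The sequence lengths of the genes in the two returned sets can then be compared side by side, as done for the examples in Figure S1.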
Discussion

Our aim was to perform a thorough and objective study of the relationships between the mRNA expression of genes and the length of their associated noncoding and coding sequences. For this we used tissue-specific microarray data from the A. thaliana root, and the A. thaliana genome annotation. We considered three aspects of gene expression: tissue variability, expression noise, and median expression between tissues. We first divided sequences into intergenic (upstream and downstream) and genic (coding and noncoding) (Figure 1).

Comparison to previous reports

Previous studies have examined some of the relationships between sequence lengths and gene expression, and their results are generally consistent with ours. Expression variability between tissues: we found a positive relationship with intergenic sequences. In agreement with this, a positive relationship between intergenic sequence length and expression ‘complexity’ has been reported in Drosophila melanogaster and Caenorhabditis elegans [4]; human housekeeping genes (generally widely expressed) were found to have shorter intergenic sequences [5]; and finally a positive relationship was found between intergenic sequence length and the breadth of response (number of experiments with differential gene expression) in a recent A. thaliana study [6]. The negative relationship between variability and genic sequence length is also similar to that reported in another A. thaliana study [6]. Also in accordance with our findings, human genes expressed in all tissues (i.e., with low variability) were found to have shorter introns, UTRs and coding regions [19].
Median expression: we found that the relationships with median expression were not linear (Figure 4).

Contribution of this work

Even though the relationships between sequence lengths and gene expression have been studied in the past by various authors at different depths, here we employed a methodology which differs from previous studies in a number of ways, including:
1) Analyzing all the sequence lengths associated with a gene, i.e. both intergenic and genic;
2) Analyzing, from a single dataset, measures relating to three metrics of gene expression: level (median across tissues), expression diversity across tissues (variability between tissues) and noise (variability between biological replicates). As noted above, the relationships between noise and sequence length had not been studied previously;
3) Making explicit the possible inter-dependencies between the different variables;
4) Using contour plots and smooth regression instead of linear regression or ‘binned data’, i.e. observation methods which do not assume a specific shape for the relationships.

This approach allowed us to make several new observations:
1) Intergenic and genic noncoding sequences globally have opposite relationships with all aspects of gene expression studied: variability between tissues, noise, and level;
2) Expression noise has relationships to sequence lengths that are similar to those of expression variability, and in general these relationships are even stronger than those with variability;
3) The measures for expression noise, expression level and variability that we used here seem to be deeply interconnected with each other, since their inter-dependencies are in agreement with their relationships with expression.

Even though the genome-wide relationships that we report here are relatively weak, we believe that their biological significance should be taken seriously for the following reasons.
First, as noted above, some of our results have been previously reported in other studies, which used other types of datasets and other analysis methods, and animal rather than plant data. Therefore these relationships seem to hold both in plants and animals, strongly arguing for their biological significance. Second, although one should not over-interpret the direction or shape of the local linear regressions, since the contour plots show that the data are not uniformly distributed, it is clear that intergenic and genic sequences have opposite relationships with expression. This can be seen even in the contour plots, and it is difficult to imagine how such opposite relationships could arise spuriously.

Proposed explanation for relationships, based on epigenetic regulation of gene expression

These relationships are not explained by our current understanding of the role of genic noncoding sequences in regulating or influencing gene expression. Taken in isolation, the positive relationship between the upstream intergenic sequence length and expression variability would appear straightforward to explain. Indeed, since it is known that these sequences harbor regulatory elements important for gene expression, it would be plausible that longer upstream intergenic sequences have a greater potential for harboring a larger number of such regulatory elements, and are therefore able to drive more elaborate expression patterns. This is the explanation proposed by other authors [4], [6]. However, it cannot account for the fact that the length of the gene noncoding sequence also has a relationship with expression variability, nor for the fact that this relationship is negative (Figure 4).

It has been suggested earlier that ‘chromosome organization’ could be the source of the relationships between sequence lengths and gene expression [12]. What exactly could be the nature of this organization?
It is known that the transcription of a gene can be induced upon re-localization within the nucleus [21], and that cellular differentiation is associated with restriction of chromatin movement on the nuclear matrix [22]. Using our observation that intergenic and genic sequences have opposite relationships with expression, we suggest that intergenic and gene noncoding sequences could have opposite effects on these re-localization and restriction activities, because intergenic sequences are ‘outside’, while gene noncoding sequences are ‘inside’ the transcribed region. It could be that secure attachment of the gene region via the intergenic sequences is required to better control the gene region and to send it to an area of the nucleus where there can be high transcription upon induction. Long gene noncoding sequences could somehow prevent this re-localization, perhaps by keeping the transcribed region securely attached to the matrix, thus lowering the possible variability of expression. Variability of expression of a gene would therefore mostly depend on its ability to be re-localized to a different region of the nucleus upon induction, and this capability would be independent from the overall level at which it can be expressed when it is not under epigenetic regulation. Perhaps the coding sequence does not have much influence on these processes, as suggested by the fact that individual coding sequences have weaker relationships with variability and noise than the noncoding sequence does. How could this model explain that noise has similar relationships to sequence lengths as expression variability does, and explain why noise and variability are positively related to each other (Figure 3)?

This model could be tested experimentally in vivo by modifying the sequence lengths of reporters and studying the effects on chromosome structure using chromosome-structure capturing assays [24].
Effects on variability of gene expression across tissues could be accurately measured using newly developed image recognition methods (as in [25]). Overall, our work indicates that studying the relationships between genomic features and gene expression using large-scale gene expression data could help to better understand the relationships between the genome and gene expression.

Methods

Genetic annotation data

Arabidopsis thaliana genome annotation files available at the TAIR ftp site (ftp://ftp.arabidopsis.org/home/tair/) were parsed with java scripts to obtain genetic sequence components lengths (all but UTRs from: sv_gene_feature.data file, 04/27/06 version; UTRs: TAIR6_3_UTR_20060126 and TAIR6_5_UTR_20060126). The gene noncoding sequence length was calculated by subtracting the coding sequence length from the transcribed region length (because some introns can be in UTRs, this is not necessarily equal to the sum of the lengths of UTRs and introns in a gene, and it gives the accurate value for the ‘total noncoding sequence’ of a gene).

Transcript expression data

We used microarray data measured in three biological replicates each (except for quiescent center, which had two replicates) on the Affymetrix ATH1 GeneChip from the following seven tissues: lateral root cap and epidermis [13]; quiescent center and columella [14]; cortex, xylem and phloem [7]. Microarray data is available at http://www.arexdb.org. Gene expression values were calculated with the MAS5 algorithm (from the Affymetrix software), log(MAS5) values being used. Not shown here, RMA values were also used with slightly different results but not altering our conclusions (see [26]). Let $A_{ij}$ be the log10(MAS5) value for tissue $i$ (of 7 tissues) and replicate $j$ (of $n_i$ replicates: 3 for every tissue except the quiescent center, which has 2). The mean expression in a tissue and the total mean expression are then $\bar{A}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} A_{ij}$ and $\bar{A}_{\cdot\cdot}$, the corresponding average taken over tissues as well. We then fitted the model

$$A_{ij} = \alpha + \beta_i + \varepsilon_{ij}$$

for each gene via restricted maximum likelihood [27], where the $\beta_i$ are tissue random effects with variance $\sigma_\beta^2$, and where the $\varepsilon_{ij}$ have variance $\sigma^2$. The estimated variance components $\hat{\sigma}_\beta^2$ and $\hat{\sigma}^2$ quantify the variability between tissues and the variability between replicates (noise), respectively.
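To make the three expression measures concrete, here is a minimal Python sketch for a single gene. It is not the paper's procedure: the paper fits the random-effects model by restricted maximum likelihood, whereas this sketch substitutes the simpler one-way ANOVA (method of moments) estimators of the two variance components, which generally differ from REML when replicate numbers are unequal. The function name, the dictionary layout, and the choice to take the median over tissue means are assumptions for illustration.

```python
from statistics import mean, median
from typing import Dict, List


def expression_metrics(replicates_by_tissue: Dict[str, List[float]]) -> Dict[str, float]:
    """Median level, between-tissue variance and between-replicate variance
    for one gene, from log10 expression values grouped by tissue."""
    groups = list(replicates_by_tissue.values())
    n_tissues = len(groups)
    n_total = sum(len(g) for g in groups)
    tissue_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Within-tissue mean square: estimates the replicate (noise) variance.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, tissue_means) for v in g)
    ms_within = ss_within / (n_total - n_tissues)

    # Between-tissue mean square.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, tissue_means))
    ms_between = ss_between / (n_tissues - 1)

    # Effective number of replicates per tissue for unequal group sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_tissues - 1)

    return {
        "median_expression": median(tissue_means),
        # Negative moment estimates are truncated at zero.
        "tissue_variance": max(0.0, (ms_between - ms_within) / n0),
        "noise_variance": ms_within,
    }


# Made-up values for one gene in three tissues (one tissue has two replicates).
gene = {"cortex": [2.1, 2.0, 2.2], "xylem": [3.0, 3.1, 2.9], "qc": [1.0, 1.2]}
print(expression_metrics(gene))
```

A gene whose tissue means differ widely relative to its replicate scatter gets a large between-tissue variance and a small noise variance, which is the contrast the analysis relies on.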
Data set construction and analysis

Of 28,580 annotated genes in the A. thaliana genome, we retrieved 11,725 with both expression information and annotation (data table available in Table S1). Genes annotated such that the gene length was shorter than the coding sequence length, or which did not have an annotation for the coding sequence length, were discarded. Only the first listed alternative transcript of each gene was considered. When adjacent genes overlapped, the intergenic sequence length was set to zero. When a UTR was not annotated, its length was set to zero. For each graph, zero values of sequence length were discarded. Data analysis was performed in R. Contours were obtained from a 2D kernel density estimate; zero lengths are not shown. The solid red line shows the trend in the y-axis variable as a function of the x-axis variable, obtained via a local linear regression smoother. Dashed red lines show 95% confidence intervals, blue lines show the constant mean, and axes are scaled to the (0.5%, 0.95%) quantiles of the respective variables.

Figure S1
Examples of genetic sequence lengths and expression values for genes with low and high variability and median ~1, median ~3. Genes were ordered by increasing variability (A) or median (B) value, and sets of ten consecutive genes were randomly selected around the ends of the ranges of values seen in Figure 4A–D. (1.21 MB EPS)

Table S1
Data file used for the analysis. (1.72 MB TXT)

Acknowledgments

We thank E. Iversen and members of the Benfey lab for help.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was supported by grants from the National Science Foundation, Arabidopsis 2010 program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Mattick JS, Makunin IV. Non-coding RNA. Hum Mol Genet.
2006;15 Spec No 1:R17–29.
2. Le Hir H, Nott A, Moore MJ. How introns influence and enhance eukaryotic gene expression. Trends Biochem Sci. 2003;28:215–220.
3. Mignone F, Gissi C, Liuni S, Pesole G. Untranslated regions of mRNAs. Genome Biol. 2002;3:REVIEWS0004.
4. Nelson CE, Hersh BM, Carroll SB. The regulatory content of intergenic DNA shapes genome architecture. Genome Biol. 2004;5:R25.
5. Vinogradov AE. Compactness of human housekeeping genes: selection for economy or genomic design? Trends Genet. 2004;20:248–253.
6. Walther D, Brunnemann R, Selbig J. The regulatory code for transcriptional response diversity and its relation to genome structural properties in A. thaliana. PLoS Genet. 2007;3
7. Lee JY, Colinas J, Wang JY, Mace D, Ohler U, et al. Transcriptional and posttranscriptional regulation of transcription factor expression in Arabidopsis roots. Proc Natl Acad Sci U S A. 2006;103:6055–6060.
8. Seoighe C, Gehring C, Hurst LD. Gametophytic Selection in Arabidopsis thaliana Supports the Selective Model of Intron Length Reduction. PLoS Genet. 2005;1:e13.
9. Urrutia AO, Hurst LD. The signature of selection mediated by expression on human genes. Genome Res. 2003;13:2260–2264.
10. Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA. Selection for short introns in highly expressed genes. Nat Genet. 2002;31:415–418.
11. Chiaromonte F, Miller W, Bouhassira EE. Gene length and proximity to neighbors affect genome-wide expression levels. Genome Res. 2003;13:2602–2608.
12. Vinogradov AE. ‘Genome design’ model and multicellular complexity: golden middle. Nucleic Acids Res. 2006;34:5906–5914.
13. Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, et al. A gene expression map of the Arabidopsis root. Science. 2003;302:1956–1960.
14. Nawy T, Lee JY, Colinas J, Wang JY, Thongrod SC, et al. Transcriptional profile of the Arabidopsis root quiescent center. Plant Cell. 2005;17:1908–1925.
15. Levesque MP, Vernoux T, Busch W, Cui H, Wang JY, et al. Whole-genome analysis of the SHORT-ROOT developmental pathway in Arabidopsis. PLoS Biol. 2006;4:e143.
16. Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006;441:840–846.
17. Bar-Even A, Paulsson J, Maheshri N, Carmi M, O'Shea E, et al. Noise in protein expression scales with natural protein abundance. Nat Genet. 2006;38:636–643.
18. Zhang MQ. Statistical features of human exons and their flanking regions. 1998;7:919–932.
19. Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet. 2003;19:362–365.
20. Ren XY, Vorst O, Fiers MW, Stiekema WJ, Nap JP. In plants, highly expressed genes are the least compact. Trends Genet. 2006;22:528–532.
21. Taddei A, Van Houwe G, Hediger F, Kalck V, Cubizolles F, et al. Nuclear pore association confers optimal expression levels for an inducible yeast gene. Nature. 2006;441:774–778.
22. Gondor A, Ohlsson R. Transcription in the loop. Nat Genet. 2006;38:1229–1230.
23. Gaszner M, Felsenfeld G. Insulators: exploiting transcriptional and epigenetic mechanisms. Nat Rev Genet. 2006;7:703–713.
24. Cai S, Lee CC, Kohwi-Shigematsu T. SATB1 packages densely looped, transcriptionally active chromatin for coordinated expression of cytokine genes. Nat Genet. 2006;38:1278–1288.
25. Mace DL, Lee JY, Twigg RW, Colinas J, Benfey PN, et al. Quantification of transcription factor expression from Arabidopsis images. Bioinformatics. 2006;22:e323–331.
26. Colinas J. Non coding DNA sequences and gene regulation. Durham: Duke University; 2006. p. 147.
27. Searle SR, Casella G, McCulloch CE. Variance components. Hoboken, NJ: Wiley; 2006. p. xxiii, 501.
28. Pinheiro JC, Bates DM. Mixed-effects models in S and S-PLUS. New York: Springer; 2000. p. xvi, 528.