![]() | ![]() |
Formats:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Copyright © 2008 The Author(s) MicroRNA prediction with a novel ranking algorithm based on random walks 1Department of Computer Science and Engineering and 2Department of Genetics, Washington University, Saint Louis, MO, 63130, USA *To whom correspondence should be addressed. †The authors wish to be known that, in their opinion, the first two authors should be regarded as joint First Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract MicroRNA (miRNAs) play essential roles in post-transcriptional gene regulation in animals and plants. Several existing computational approaches have been developed to complement experimental methods in discovery of miRNAs that express restrictively in specific environmental conditions or cell types. These computational methods require a sufficient number of characterized miRNAs as training samples, and rely on genome annotation to reduce the number of predicted putative miRNAs. However, most sequenced genomes have not been well annotated and many of them have a very few experimentally characterized miRNAs. As a result, the existing methods are not effective or even feasible for identifying miRNAs in these genomes. Aiming at identifying miRNAs from genomes with a few known miRNA and/or little annotation, we propose and develop a novel miRNA prediction method, miRank, based on our new random walks- based ranking algorithm. We first tested our method on Homo sapiens genome; using a very few known human miRNAs as samples, our method achieved a prediction accuracy greater than 95%. We then applied our method to predict 200 miRNAs in Anopheles gambiae, which is the most important vector of malaria in Africa. Our further study showed that 78 out of the 200 putative miRNA precursors encode mature miRNAs that are conserved in at least one other animal species. These conserved putative miRNAs are good candidates for further experimental study to understand malaria infection. Availability: MiRank is programmed in Matlab on Windows platform. The source code is available upon request. Contact: zhang/at/cse.wustl.edu 1 INTRODUCTION MicroRNAs (miRNAs) are endogenous single-stranded non-coding RNAs of ~22 nt in length. They are derived from long precursors that fold into hairpin structures (Bartel, 2004; Jones-Rhoades et al., 2006). miRNAs have been shown to play fundamentally important roles in animal and plant development (Bartel, 2004; Jones-Rhoades et al., 2006, in stress response in plants (Bari et al., 2006; Chiou et al., 2006; Jones-Rhoades and Bartel, 2004; Lu et al., 2005; Sunkar and Zhu, 2004; Zhou et al., 2007), and in genetic diseases including various types of cancer (Blenkiron et al., 2007; Chen et al., 2007; He et al., 2007; Hobert, 2007; Ma et al., 2007; Ozen et al., 2008; Pedersen et al., 2007; Tili et al., 2007). In animals, most miRNAs bind to 3′ untranslated regions of their target mRNAs and repress the translation of the targets (Bartel, 2004). In contrast, in plants, most mature miRNAs directly base-pair with complementary sites in the coding regions of target mRNAs, resulting in cleavage or degradation of the targets (Bartel, 2004; Jones-Rhoades et al., 2006). Identification of novel miRNA genes is an eminent and challenging problem towards the understanding of post-transcriptional gene regulation. Two major strategies for identifying novel miRNAs are experimental cloning and in silico prediction (Bartel, 2004; Berezikov et al., 2006; Jones-Rhoades et al., 2006). In a cloning-based approach, distinct ~22 nt RNA transcripts were first isolated and then intensively cloned and sequenced. However, these methods are highly biased towards abundantly and/or ubiquitously expressed miRNAs; only abundant miRNA genes can be easily detected (Bartel, 2004; Berezikov et al., 2006; Jones-Rhoades et al., 2006). Note that not all miRNAs are well expressed in many tissues, cell types and developmental stages that have been tested (Bartel, 2004). Some miRNAs are expressed constitutively at low abundance, and some miRNAs have preferential or specific temporal and spatial expression patterns. Computational approaches have been developed to overcome some of the technical difficulties of experimental methods. First, in silico methods have been shown to be efficient for finding miRNAs that are expressed at constitutively low levels or in specific tissues (Bartel, 2004). Further, breakdown products of mRNA transcripts, other endogenous ncRNAs (e.g. tRNAs, rRNAs, nat-siRNAs) as well as exogenous siRNAs comprise a large portion of the non-coding small RNA population isolated from the cytoplasmic total RNA extracts (Bartel 2004; Berezikov et al. 2006; Jones-Rhoades et al. 2006; Lagos-Quintana et al., 2001). To avoid erroneously designation of other non-coding small RNAs and even broken mRNA fragments as novel miRNAs, the secondary structures of flanking genomic sequences of cloned small RNAs are further assessed computationally. Only those small RNA sequences reside on arms of hairpin-structured precursors are likely to be miRNAs. However, due to their short length, cloned small RNAs may match to many genome regions that can potentially fold into hairpin structures which may not be unique to miRNAs. Thus, genome-wide screening of novel miRNA precursors is technically involved. Based on particular strategies adopted, the existing computational methods for miRNA prediction can be grouped into two categories. Methods in the first category were developed to find homologous miRNAs in closely related species. In these methods, known miRNA precursors were first folded into typical hairpin structures, local features in the hairpins were extracted, and extreme values of these featured were obtained from all known miRNAs. A filter was then constructed to screen novel hairpinned sequences. Those hairpinned sequences that passed the filter were further analyzed in related species to assess their evolutionary conservation (Berezikov et al., 2005; Bonnet et al., 2004; Grad et al., 2003; Lai et al., 2003). As they were designed for, these conservation-based methods built upon this framework were successful in detecting evolutionarilly conserved miRNAs. In order to further identify distantly homologous miRNAs, a probabilistic co-learning method using a paired hidden Markov model (HMM) was recently developed as a more general miRNA prediction method (Nam et al., 2005). All these methods relied mainly on evolutionary conservation to eliminate a large number of false positive predictions. However, a substantial number of lineage- or species-specific miRNA genes do exist (Fahlgren et al., 2007; Kasschau et al., 2007; Lindow and Krogh, 2005; Molnar et al., 2007; Rajagopalan et al., 2006; Zhang et al., 2006), which escape the detection of a conservation-based approach. In the second category, supervised classification methods, e.g. Support Vector Machines (SVMs), were adopted to train models based on positive sets of genuine miRNA precursors and negative sets of hairpins obtained from exon regions of protein coding genes (Hertel and Stadler, 2006; Ng and Mishra, 2007; Xue et al., 2005). Thanks to the sequence features extracted from the training examples, such classification models were expected to preform well in predicting novel miRNAs from unseen sequences. However, many sequenced genomes are poorly annotated. Hence, it is difficult to obtain a good set of negative training samples for these methods. Moreover, miRNAs are located in intergenic regions and introns. A major task for miRNA prediction is to distinguish miRNA hairpins from other hairpin structures originated from intergenic or intronic sequences. In summary, classification-based methods are not effective on not well-annotated genomes. The methods in both categories discussed above all require a non-trivial set of positive samples of known miRNAs. Unfortunately, except well-studied model species such as human and Caenorhabditis elegans, most sequenced genomes have a small number of miRNAs reported. For instance, only 38 miRNAs have been identified in the genome of A.gambiae. This is particularly true for viral genomes for which not many miRNAs have been studied or reported (Stern-Ginossar et al., 2007). Furthermore, the small number of known viral miRNAs rarely share any sequence homology (Stern-Ginossar et al., 2007). Yet, viral miRNAs have been shown to play important roles in the pathology of viral infection by targeting host immune systems (Stern-Ginossar et al., 2007). In short, the importance of miRNAs in post-transcriptional regulation, the lack of a sufficient number of known miRNAs and poorly annotated genomes collectively call for novel effective computational approaches to miRNA prediction. In this work, we propose and develop a novel ranking algorithm based on random works to computationally identify novel miRNAs from genomes with a few known miRNAs and with poor annotation. Our algorithm uses very few positive samples, requires no negative sample and does not rely on genome annotation. When applied to human data, our approach was able to identify known miRNAs with a relatively high precision and recall. As an application, we applied our algorithm to predict novel miRNAs in A.gambiae, which is the most important vector of malaria in Africa and one of the most efficient malaria vectors in the world. Our analysis also showed that some of putative miRNAs encode mature miRNAs conserved in other species. 2 PROBLEM FORMULATION In our study, we cast the problem of miRNA prediction as a problem of information retrieval, in which novel miRNAs are to be retrieved from a pool of candidates by the known miRNAs as query samples. Specifically, we model this information retrieval process as belief propagation on a weighted graph, and develop a new ranking algorithm based on random walks. For a given species, the known miRNAs, putative candidate miRNAs and their relationship can be modeled by a weighted graph G= V, E . Each vertex v V in G represents a known miRNA precursor or a putative candidate, an edge e E V × V captures the relation between two vertices linked by the edge, and the weight w of edge e quantifies the relation. In general, edge weights are determined by pairwise distances. For example, two closely related samples may share an edge with a large weight. The degree of vertex vi is di=∑j wij, i.e. the total weight of all edges that are connected to vi. Note that the graph is fully connected, so that there always exists a path between each pair of nodes in the graph.We refer the known miRNA precursors as query samples and putative candidates as unknown samples. Consider query samples XQ={xq1, xq2, …, xqn} and unknown samples XU={xu1, xu2, …, xum}, where xqi, xuj, ![]() d. Our goal is to rank XU with respect to XQ. To achieve this goal, we associate each sample xi with a relevancy value fi, where fi=1 for all query samples and fi [0, 1] for all unknown samples. A larger fi value means a higher relevancy of xi with respect to the queries. We then sort the relevancy values of all unknown samples and select the top ranked samples as retrieved samples, which constitute our predicted miRNA precursors. Therefore, the key to the ranking algorithm is to precisely compute the relevancy values of all unknown samples. In this study, we adopted the random walks method for this ranking problem, to be discussed next.3 METHOD Query by samples is a paradigm for information retrieval in the information retrieval and machine learning fields. Zhou et al. proposed a manifold ranking method, which ranks the data with respect to the intrinsic manifold structure collectively revealed by the given data (Zhou et al., 2004). Different from their work, our method uses Markov random walks to model a belief propagation process and therefore has a direct physical interpretation. 3.1 Ranking based on random walks Random walks is a classical stochastic process on a weighted finite state graph, which exploits the structure of the given data probabilistically. In the random walks formulation, each sample is treated as a graph vertex, which corresponds to a state on a Markov chain. The one-step transition probability pij from vertex vi to vertex vj can be defined as
× n transition matrix among the query states, PQU is an n × m transition matrix from the query states to the unknown states, PUQ is a m × n transition matrix from the unknown states to the query states and PUU is a m × m transition matrix for the unknown states. Correspondingly, we partition the weight matrix W and the degree matrix D as
In our model, when a random walker transits from vi to vj, it will transmit the current relevancy information of vi to vj. This forms a dynamic process of relevancy information propagation: at each iterative step, a vertex transmits its relevancy information to its neighbors, and simultaneously receives the relevancy information from its neighbors. At the end of the iteration, each vertex updates its relevancy value according to the received information. Note that the query vertices only transmit their relevancy information and will never update their relevancy values. Furthermore, since query samples are more important than the unknown ones, the former are assigned higher weights for their relevancy information in transmission. This suggests the following relevancy updating rule,
(0, 1) the weight of the relevancy from the unknown samples. In our method, fi is also called the ranking score of sample i, which we use to rank the unknown samples.The convergence of the iterations is guaranteed by the following theorem. THEOREM 1.The sequence PROOF. generated by the updating rule of () converges when k approaches infinity.For convenience, let , =[0,1]m. Let , . Then (7) can be written as
→ and the measure d on it as follows:
, d) is a complete metric space.According to the Contractive Mapping Theorem of Banach (Istratescu, 1981), it is suffice to prove that T(f) is a contractive mapping, which holds if R satisfies
![]() Therefore, the limit of and can be substituted by fU, resulting in
× m identity matrix. Therefore,
3.2 The ranking algorithm The key to the above discussion and derivation is that we can compute the relevancy values of all unknown states according to (16) without actually performing the procedure of iterative random walks. Therefore, we propose the following ranking algorithm, which runs as follows:
For the problem of predicting new miRNAs, each sample is represented by a vector of 36 features (Section 3.3), and distance between two samples is the Euclidean distance of the feature vectors. In order to reduce the data noise and computational expense, we make the graph sparse by removing the ‘weak’ edges with low weights. Since putative miRNAs are to be retrieved by known miRNAs according to the ranking scores of the candidates, we named our miRNA prediction method as miRank. 3.3 Extraction of global and local sequence-structure features The most salient characteristics of miRNA precursors are their hairpinned secondary structures. In our study, we used RNAfold (Hofacker, 2003) to predict RNA secondary structures. We postulated that the entire hairpin structure of a miRNA precursor could be characterized solely by 36 global and local intrinsic attributes that capture sequence, structural and topological properties of the miRNA precursor. Four important global features at the structural and topological levels are the normalized minimum free energy of folding (MFE), the normalized base-pairing propensities (number of nucleotides that are paired) of both arms and the normalized loop length. Here the normalization factor is the length of the precursor sequence. Local features characterize various sequence properties in RNA sequences, which we discuss using an example. As shown in Figure 1 × 8) possible sequence-structure configurations for each triplet in the precursors of miRNAs. Similar features have been used in some of the existing methods, e.g. that in Xue et al. (2005).
4 EXPERIMENTAL EVALUATIONS AND DISCUSSIONS 4.1 Evaluation of the ranking algorithm on toy examples In order to show the belief propagation of our new ranking algorithm, along the manifold of the data, we generated a set of toy data, selected one data point as a query sample, and queried the rest samples to rank them. Figure 2
4.2 Datasets for miRNA prediction All reported miRNAs precursor sequences of H.sapiens and A.gambiae (533 and 38, respectively) were downloaded from the miRBase (http://microrna.sanger.ac.uk/sequences/) (Griffiths-Jones et al., 2006) as of September 1, 2007. Genome sequences of H.sapiens and A.gambiae were retrieved from UCSC Genome Browser (http://genome.ucsc.edu/). For the H.sapiens genome, we randomly extracted non-overlapping fragments of 90 nt from the genome so that no information of genome annotation was used. We first discarded all fragments overlapping with known miRNA precursors. For a fragment not overlapping with any known H.sapiens miRNA precursor, we further predicted its secondary structure using RNAfold (Hofacker, 2003). Nevertheless, not every such fragment would be kept for further analysis. The criteria for retaining hairpinned fragments are as follows: minimum 18 base pairings on the stem of the hairpin structure, maximum -12 kcal/mol free energy of the secondary structure and no multiple loops. The threshold 18 is the lowest number of base pairings, and the threshold -12 is the highest free energy among all genuine animal miRNA precursors. Finally, 1000 of such fragments were arbitrarily chosen and pooled together with some of the known human miRNA precursors—which were all known human miRNA precursors except the ones used as query samples in our experiments—to form the pool of candidates to be ranked by the miRank algorithm. The reason we added those known human miRNA precursors to this pool of samples is to evaluate the prediction performance of the miRank algorithm. Every chromosome of A.gambiae was fragmented, from 5′-end to 3′-end, using a sliding window of 90 nt and a shift increment of 45 nt. These fragments were folded using RNAfold (Hofacker, 2003), and hairpinned fragments were selected by the same criteria described above. The chosen hairpin sequences formed the initial candidate pool. In the fragmentation, some putative candidates might be cut into two pieces, and have lost their hairpin structures, hence were excluded from the candidate pool. To avoid this, we further fragmented, with the same sliding window and increment, the sequences between each pair of hairpinned fragments next to each other. The secondary structures of the new set of fragments were predicted and selected by the same tool and criteria. This process was iterated until no hairpinned fragment could be found. We obtained 22 297 hairpinned sequences, which included all 38 known miRNAs precursors for A.gambiae. 4.3 Evaluation on human miRNAs We first evaluated the performance of our miRank algorithm on H.sapiens data based on the known miRNA precursors embedded in the pool of samples to be ranked (Section 4.2). The prediction quality was assessed by the recall and the precision, which are, respectively, defined as:
The number of query samples is the most critical parameter for algorithm miRank. We tested on H. sapiens data with 1, 5, 10, 15, 20 and 50 known miRNA precursors randomly chosen as query samples. To reiterate, in each of these experiments, the rest known miRNA precursors were combined with the 1000 hairpin sequences extracted from the genome to form the pool of candidates to be ranked (Section 4.2). In each experiment, we chose n topmost ranked candidates, and determined the precision and recall of the result by comparing the chosen candidates with the known human miRNAs that were hidden in the candidate pool. By varying the number n, we obtained the receiver operating characteristic (ROC) curves (Spackman, 1989) for the performance of miRank using different number of query samples, as shown in Figure 3
In the extreme case of querying with only one sample, we obtained a precision greater than 70% in the 20 topmost retrieved samples. When we queried with more than 20 query samples, we obtained a high precision of over 95% in the 160 top ranked samples. These results suggest that we can expect miRank to be effective in predicting miRNAs on a species with a small number of reported miRNAs and a poorly annotated genome. Note that the precisions in these experiments could be under-estimated because genuine but unvalidated miRNA precursors may actually exist in the 1000 human hairpinned sequences that were analyzed in the experiment. Candidates with higher ranks are most likely to be true miRNAs; they are excellent candidates for further experimental validation. To obtain as many putative miRNAs with high confidence as possible, we need to set a ranking score cutoff. Figure 4
In order to further evaluate miRank, we compared it with some supervised classification algorithms (Hertel and Stadler, 2006; Ng and Mishra, 2007; Xue et al., 2005), which are the best existing miRNA prediction algorithms, on the human data. Since these existing methods are Support Vector Machines (SVMs)-based classification models trained with more than a hundred known miRNA precursors, it is difficult to directly compare miRank with these methods. To overcome the difficulty, we followed the strategy in (Xue et al., 2005), and trained six SVM-based models with 1, 5, 10, 15, 20 and 50 known miRNA precursors as positive training examples. We included twice as many negative samples as positive samples in the training sets as that gave the best results. All training examples and candidates used by the existing methods were represented by the same features as used by miRank. The results in Table 1 show that miRank is more accurate than the SVM-based methods. With fewer known miRNA precursors, miRank outperforms them all. Essentially, miRank possesses the advantage of Semi-supervised Learning (Zhu, 2005), which incorporates the data distribution information of the unlabeled data in the training process. Therefore, miRank has a much better performance than SVMs when the number of training samples is small.
4.4 Novel miRNA genes in A.gambiae Anopheles gambiae is one of the most important vectors of malaria in Africa, and one of the most efficient malaria vectors in the world. Identification of miRNAs is important for the study of the development of A. gambiae, hence may broaden our perspectives on the control of the prevalence of malaria (Moffett et al., 2007). To date, only 38 miRNAs of A.gambiae have been reported and curated in miRBase (Griffiths-Jones et al., 2006), and its genome is not well annotated. As discussed in Section 1, the existing methods are not effective in predicting novel miRNAs in A.gambiae. We took the 38 miRNAs curated in miRBase (Griffiths-Jones et al., 2006) as query samples, and applied miRank to 22 259 A.gambiae candidate sequences that have hairpinned secondary structures (Section 4.2). Candidates with higher ranks are most likely to be true miRNAs, as indicated by Figure 3 Conservation across related species has been widely exploited for predicting miRNAs in animals and plants. Three major observations have been made on known conserved miRNAs (Reinhart et al., 2002). First, the mature miRNA sequences are conserved, whereas the rest of the precursor sequences can be diverged. Second, the propensity of precursor sequences to form hairpinned secondary structures is conserved, although the actual structures may vary. Third, the conserved mature miRNA homologs are mostly located on the same arm of hairpinned secondary structures. In our research, we used hairpin structures as the most significant features to extract 22 259 candidates. We further analyzed the conservation of mature miRNAs of our top candidates. Figure 5
5 CONCLUSIONS AND FINAL REMARKS In this study, we cast the problem of miRNA prediction as a problem of information retrieval where novel miRNAs were retrieved by the known miRNAs (as query samples) from a genome-scale pool of candidate sequences that can form hairpinned structures. We modeled the novel miRNA retrieval process by a process of belief propagation on a weighted graph, which was constructed from known miRNAs and candidate RNA sequences to be analyzed. We then developed a novel ranking algorithm based on random walks to propagate information of known miRNAs to candidates. We named our final miRNA prediction method based on ranking miRank. The miRank method has the following remarkable properties. First, it does not require information of genome annotation. This is particularly important because many sequenced genomes have not been well annotated, and their closely related species are yet to be sequenced. Thus, a large number of false positive candidates with hairpinned secondary structures cannot be filtered out with genome annotation or by phylogenetic conservation. miRank can be applied to such newly sequenced genomes with little annotation. Second, it does not rely on cross-species conservation so that it can identify species-specific miRNAs. Third, miRank is able to accommodate a small number of known miRNAs while enjoys a high-prediction accuracy. Hence, miRank is a useful tool for many species including most viruses that have a very few reported miRNAs. Identification of novel miRNAs in various viral genomes will be our future research topic. To demonstrate these favorable features of the miRank algorithm, we tested it on human data where more than 530 miRNAs have been reported which we used to validate our result. miRank achieved a prediction accuracy of more than 95% using a small number of known miRNAs as labeled samples. We also applied miRank to A.gambiae to predict 200 novel miRNA precursors. Our result showed that 78 of the novel miRNA precursors encode mature miRNAs that are conserved in at least one other animal species. ACKNOWLEDGEMENTS Funding: This research was supported in part by NSF grant IIS-0535257, a grant from the Alzheimer's Association and a grant from Monsanto Corporation. Conflict of Interest: none declared. REFERENCES
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||||||||||||||||||||||||||||||||||||
Cell. 2004 Jan 23; 116(2):281-97.
[Cell. 2004]Annu Rev Plant Biol. 2006; 57():19-53.
[Annu Rev Plant Biol. 2006]Plant Physiol. 2006 Jul; 141(3):988-99.
[Plant Physiol. 2006]Plant Cell. 2006 Feb; 18(2):412-21.
[Plant Cell. 2006]Mol Cell. 2004 Jun 18; 14(6):787-99.
[Mol Cell. 2004]Cell. 2004 Jan 23; 116(2):281-97.
[Cell. 2004]Nat Genet. 2006 Jun; 38 Suppl():S2-7.
[Nat Genet. 2006]Annu Rev Plant Biol. 2006; 57():19-53.
[Annu Rev Plant Biol. 2006]Science. 2001 Oct 26; 294(5543):853-8.
[Science. 2001]Cell. 2005 Jan 14; 120(1):21-4.
[Cell. 2005]Proc Natl Acad Sci U S A. 2004 Aug 3; 101(31):11511-6.
[Proc Natl Acad Sci U S A. 2004]Mol Cell. 2003 May; 11(5):1253-63.
[Mol Cell. 2003]Genome Biol. 2003; 4(7):R42.
[Genome Biol. 2003]Nucleic Acids Res. 2005; 33(11):3570-81.
[Nucleic Acids Res. 2005]Bioinformatics. 2006 Jul 15; 22(14):e197-202.
[Bioinformatics. 2006]Bioinformatics. 2007 Jun 1; 23(11):1321-30.
[Bioinformatics. 2007]BMC Bioinformatics. 2005 Dec 29; 6():310.
[BMC Bioinformatics. 2005]Science. 2007 Jul 20; 317(5836):376-81.
[Science. 2007]Nucleic Acids Res. 2003 Jul 1; 31(13):3429-31.
[Nucleic Acids Res. 2003]Nucleic Acids Res. 2003 Jul 1; 31(13):3429-31.
[Nucleic Acids Res. 2003]BMC Bioinformatics. 2005 Dec 29; 6():310.
[BMC Bioinformatics. 2005]Nucleic Acids Res. 2006 Jan 1; 34(Database issue):D140-4.
[Nucleic Acids Res. 2006]Nucleic Acids Res. 2003 Jul 1; 31(13):3429-31.
[Nucleic Acids Res. 2003]Nucleic Acids Res. 2003 Jul 1; 31(13):3429-31.
[Nucleic Acids Res. 2003]Bioinformatics. 2006 Jul 15; 22(14):e197-202.
[Bioinformatics. 2006]Bioinformatics. 2007 Jun 1; 23(11):1321-30.
[Bioinformatics. 2007]BMC Bioinformatics. 2005 Dec 29; 6():310.
[BMC Bioinformatics. 2005]PLoS One. 2007 Sep 5; 2(9):e824.
[PLoS One. 2007]Nucleic Acids Res. 2006 Jan 1; 34(Database issue):D140-4.
[Nucleic Acids Res. 2006]Nucleic Acids Res. 2006 Jan 1; 34(Database issue):D140-4.
[Nucleic Acids Res. 2006]Genes Dev. 2002 Jul 1; 16(13):1616-26.
[Genes Dev. 2002]