Copyright © 2005, Cold Spring Harbor Laboratory Press

Discovering functional transcription-factor combinations in the human cell cycle

Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

1Corresponding authors. E-mail zhou-zhu/at/student.hms.harvard.edu; fax (617) 432-7266. E-mail http://arep.med.harvard.edu/gmc/email.html; fax (617) 432-7266.

Received October 21, 2004; Accepted March 31, 2005.

Abstract

With the completion of full genome sequences and advancement in high-throughput technologies, in silico methods have been successfully used to integrate diverse data sources toward unraveling the combinatorial nature of transcriptional regulation. So far, almost all of these studies are restricted to lower eukaryotes such as budding yeast. We describe here a computational search for functional transcription-factor (TF) combinations using phylogenetically conserved sequences and microarray-based expression data. Taking into account both orientational and positional constraints, we investigated the overrepresentation of binding sites in the vicinity of one another and whether these combinations result in more coherent expression profiles. Without any prior biological knowledge, the search led to the discovery of several experimentally established TF associations, as well as some novel ones. In particular, we identified a regulatory module controlling cell cycle-dependent transcription of G2-M genes and expanded its functional generality. We also detected many homotypic combinations, supporting the importance of binding-site density in transcriptional regulation of higher eukaryotes.

Cis-regulation of gene expression by the binding of transcription factors (TFs) is a critical component of cellular physiology. In eukaryotes, a battery of TFs often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals.
Integration of genome sequences and/or ChIP-chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of TF-binding sites underlie gene activation in a variety of cellular processes for relatively simple organisms such as Saccharomyces cerevisiae and Caenorhabditis elegans (Bussemaker et al. 2001; Pilpel et al. 2001; Banerjee and Zhang 2003; Beer and Tavazoie 2004; Kato et al. 2004; Terai and Takagi 2004). Application of these methods to the human genome, however, is complicated by its significantly larger size and substantial repetitive content, as well as the greater complexity of the transcriptional network. Early studies suggest that phylogenetic footprinting (Wasserman and Fickett 1998; Wasserman et al. 2000; Levy et al. 2001; Blanchette and Tompa 2002; Liu et al. 2004) and a focus on motif combinations (Frech et al. 1998; Kel et al. 1999; Krivan and Wasserman 2001; Aerts et al. 2003) may prove essential to computational analyses of human transcription-factor binding sites. As the functional interactions between TFs often require them to be in physical proximity, their binding sites are likely to be overrepresented in the vicinity of each other. Exploiting such property, we devised a two-step strategy (Fig. 1
Results

Extraction of human promoter sequence and phylogenetic footprinting

We had previously mapped UniGene clusters onto the human genome as well as generated a “mousenized” version of the genome (http://club.med.harvard.edu/hummus/hummus.html). To build a promoter sequence set, we extracted the sequence 1 kb upstream of 11,436 curated RefSeq mRNAs as putative promoter regions. While some regulatory elements can act over very large distances, up to several kilobases from transcriptional start sites (TSS), we focused on sequences in the relative proximity of TSS, as they are most likely to contain regulatory information for evolutionarily conserved biological processes such as the cell cycle.

Accuracy of identifying binding sites by weight matrix

For anchor motifs, we utilized 134 experimentally derived position weight matrices from the TRANSFAC database. They correspond to ~70 distinct motifs, as estimated by CompareACE (Hughes et al. 2000) with a cut-off of 0.85. Putative binding sites were identified by scanning promoter sequences with the anchor motifs using PATSER (Hertz and Stormo 1999). To estimate the accuracy of our in silico predictions, we compared the list of E2F site (M00516)-containing genes with those identified as E2F4 targets in primary fibroblasts using ChIP-chip technology (Ren et al. 2002). Our promoter set includes 96 of the experimentally determined target genes, and 56 of them were predicted computationally based on 1-kb promoter sequences (P = 1.1 E-11). Phylogenetic footprinting has been demonstrated to be a powerful strategy for filtering out false-positive results of motif discovery algorithms (Loots et al. 2000; Wasserman et al. 2000), as real binding sites are far more likely to be conserved under selection pressure than random sequences. By restricting the predictions to hits that are also found in mouse, we refined our in silico predictions from 2759 to 230, 15 of which overlap with the E2F4 target genes by ChIP-chip.
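The weight-matrix scan described above can be illustrated with a minimal sketch. It is not PATSER and does not reproduce its scoring or options; it only shows the general idea of sliding a log-odds matrix along both strands of a promoter. The background frequencies are the ones reported in Methods (A/T = 0.48, C/G = 0.52, assumed here to be split evenly within each pair). The toy count matrix, the pseudocount, and the threshold are hypothetical and do not correspond to any TRANSFAC entry.

```python
import math

# Background from Methods: A/T = 0.48, C/G = 0.52 (assumed split evenly per base).
BACKGROUND = {"A": 0.24, "T": 0.24, "C": 0.26, "G": 0.26}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log-odds scores against BACKGROUND."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount
        matrix.append({
            base: math.log(
                (column.get(base, 0) + pseudocount * BACKGROUND[base])
                / total / BACKGROUND[base]
            )
            for base in "ACGT"
        })
    return matrix


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def scan(sequence, matrix, threshold):
    """Return (start, strand, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BACKGROUND for base in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(column[base] for column, base in zip(matrix, site))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits


# Hypothetical 4-bp motif "TGAC", for illustration only.
TOY_COUNTS = [{"T": 10}, {"G": 10}, {"A": 10}, {"C": 10}]
TOY_MATRIX = log_odds_matrix(TOY_COUNTS)
```

Calling `scan("AATGACAA", TOY_MATRIX, threshold=4.0)` reports a single forward-strand hit at offset 2. A conservation filter in the spirit of the phylogenetic footprinting step would then keep only hits whose aligned mouse sequence also scores above the threshold.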
The evolutionarily conserved E2F-binding sites have a much sharper peak over the first 100 nucleotides upstream from transcription start site (TSS, as estimated by the start of mRNA sequence; Fig. 2
Significantly enriched neighbor motifs

To search for enriched neighbor motifs, we extracted both human and corresponding mouse sequences in the vicinity (e.g., 50 and 100 bp) of the conserved anchor motif sites. As the functional interactions between TFs often impose orientational constraints, upstream and downstream sequences were grouped separately. A blind and systematic search was then conducted for shared sequence features with the program AlignACE (Roth et al. 1998; Hughes et al. 2000), which identifies motifs that are overrepresented in a set of unaligned input sequences. AlignACE calculates a statistic called the MAP score, which is an internal metric to determine the statistical significance of an alignment. We only considered motifs with a MAP score of 10 or higher (Tavazoie et al. 1999) and at least five genes containing the anchor-neighbor combination. This resulted in a total of 6293 neighbor motifs (3227 downstream + 3066 upstream) for the window size of 50 bp, and 9278 (4831 downstream + 4447 upstream) for the window size of 100 bp.

We selected the most statistically significant neighbor motifs using a measure called the neighbor specificity (NS) score. It quantifies how specifically a neighbor motif targets the neighboring region of the anchor motif, given its rate of occurrence in all promoters. To reduce the bias in assessing the degree of specificity, the calculation was performed based on statistics over the entire 1-kb human sequences, including conserved as well as nonconserved regions. We corrected for multiple testing by calculating NS score cutoffs corresponding to an FDR (Storey and Tibshirani 2003) of 0.05.

Homotypic distribution of TF-binding sites

We noticed that the vicinity of anchor motifs is often specifically enriched with the anchor motifs themselves (i.e., anchor = neighbor; Table 1), e.g., 20 instances for 50 bp, and 14 for 100 bp. This observation is in agreement with findings in S. cerevisiae (Wagner 1999) and Drosophila (Lifanov et al. 2003), where statistically significant homotypic clusters of transcription-factor binding sites have been reported. The presence of multiple copies of the same cis-regulatory motifs has been documented in biological literature (Arnone and Davidson 1997). While some represent sites for transcription factors that act as oligomers (e.g., p53, NF-κB, GAGA), others activate morphogen TFs in response to their low local concentration (Gurdon and Bourillot 2001). Meanwhile, they could also contribute to the robustness of regulatory elements (Simpson 2002), or play a role in differentiating real sites from spurious ones by increasing the binding affinity of the former, a task increasingly important in larger genomes, such as that of human.
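The NS score used above to call enriched neighbor motifs is described in Methods as a binomial tail probability. The sketch below is one reading of that description, not the authors' code. It assumes the per-promoter success probability is the product of the terms listed in Methods (Nn/Nt, W/L, Ca, Cn), and it caps that probability at 1. The function names and the example numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p


def success_probability(n_neighbor, n_total, window, length, c_anchor, c_neighbor):
    """Chance that a neighbor motif falls inside the window by accident.

    Assumed form: p = (Nn / Nt) * (W / L) * Ca * Cn, capped at 1.
    """
    p = (n_neighbor / n_total) * (window / length) * c_anchor * c_neighbor
    return min(p, 1.0)


def ns_score(n_anchor, x, p):
    """Binomial upper tail P(X >= x) for X ~ Binomial(n_anchor, p).

    Terms are computed in log space so that large promoter counts
    do not overflow.
    """
    if x <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(x, n_anchor + 1):
        log_pmf = (
            lgamma(n_anchor + 1) - lgamma(i + 1) - lgamma(n_anchor - i + 1)
            + i * log(p) + (n_anchor - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)


# Hypothetical example: a neighbor motif present in 1000 of 11436 promoters,
# a 50-bp window on 1000-bp promoters, single-copy anchor and neighbor motifs.
p_example = success_probability(1000, 11436, 50, 1000, 1.0, 1.0)
ns_example = ns_score(n_anchor=230, x=8, p=p_example)
```

A smaller NS score means the neighbor motif lands near the anchor more often than its overall frequency in promoters would predict.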
Heterotypic neighbor motifs

It is conceivable that some neighbor motifs may happen to be enriched in the vicinity of anchor motifs simply because they both prefer the same location relative to TSS, but have nothing to do with each other. To filter out such potential positionally biased scenarios, for each statistically overrepresented hetero-neighbor motif (i.e., anchor ≠ neighbor), we randomly selected the same number of promoters as those with its parent anchor motif and extracted segments of the same window size from the same distance upstream of TSS as those containing the anchor motif, followed by identical motif search procedures. The random sampling process was repeated 100 times, and we rejected the neighbor motif if it was “rediscovered” by any of these runs. After applying such a positional bias filter, we ended up with 636 (852) significant hetero-neighbor motifs from a 50-bp (100-bp) window, 40 (37) of which could be mapped to known TRANSFAC matrices (CompareACE > 0.85).

Functional anchor-neighbor motif combinations

Given the specific enrichment of neighbor motifs, we next asked whether some of them may functionally interact with their corresponding anchors by analyzing a human cell cycle expression data set (Whitfield et al. 2002), which recorded genome-wide expression levels of synchronized HeLa S3 cells using spotted microarrays. For any anchor-neighbor motif combination (within a specified window size), we generated three groups of genes: one with the combination, one with the anchor motif but not the combination, and one with the neighbor motif but not the combination. An anchor-neighbor motif combination was considered “functional” if the expression profiles of the genes from the first set are significantly more highly correlated than those of both the second and third sets.
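The three-group comparison can be made concrete with a short sketch. It assumes the expression-coherence definition given in Methods (the fraction of gene pairs whose correlation falls within the top fifth percentile of all gene pairs) and uses Pearson correlation, which is an assumption since the correlation measure is not specified here. All function names and the tiny data set are hypothetical, and the significance test itself (the multivariate hypergeometric model) is not implemented.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def top_percentile_cutoff(profiles, fraction=0.05):
    """Correlation value that marks the top `fraction` of all gene pairs."""
    values = sorted(
        (pearson(profiles[g], profiles[h]) for g, h in combinations(profiles, 2)),
        reverse=True,
    )
    keep = max(1, int(len(values) * fraction))
    return values[keep - 1]


def expression_coherence(genes, profiles, cutoff):
    """Fraction of gene pairs in `genes` whose correlation is >= cutoff."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    close = sum(1 for g, h in pairs if pearson(profiles[g], profiles[h]) >= cutoff)
    return close / len(pairs)


def split_gene_groups(anchor_genes, neighbor_genes, combination_genes):
    """Return (with combination, anchor only, neighbor only) gene sets."""
    comb = set(combination_genes)
    return comb, set(anchor_genes) - comb, set(neighbor_genes) - comb


# Hypothetical three-gene data set, for illustration only.
toy_profiles = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}
toy_score = expression_coherence({"g1", "g2", "g3"}, toy_profiles, cutoff=0.9)
```

In a real analysis the cutoff would be computed once from all gene pairs on the array, and the coherence of the combination set would then be compared with the two control sets.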
We used a multivariate hypergeometric model (Banerjee and Zhang 2003) to calculate the probability of obtaining the observed or higher fraction of correlated gene pairs in the set with the combination, given the fractions of correlated gene pairs in the sets without the combination. After accounting for multiple testing with Bonferroni correction, we obtained 167 and 225 significant combinations for 50 and 100 bp, respectively (corrected P-value < 0.05). Among those containing neighbor motifs that could be mapped to known TRANSFAC matrices, 10 (10) homotypic and 8 (10) heterotypic pairs are represented (Tables 1, 2) with a window size of 50 bp (100 bp). A complete list of significant combinations can be found at http://genetics.med.harvard.edu/~zzhu/combination.html.
Our approach identified several experimentally established associations between TFs. For instance, the cooperation between E2F and NF-Y, two main regulators of the cell cycle, has been well documented (van Ginkel et al. 1997; Caretti et al. 2003), and we uncovered their connection using both 50- and 100-bp window sizes. We found RFX1 to be significantly enriched within 100 bp upstream of GABP, and the combination shows a functional effect on expression, in agreement with previous observations that GABP and RFX-1 act synergistically in boosting activity at the ribosomal protein L30 promoter, with an RFX-1 site at -128 and a GABP site at -56 (Safrany and Perry 1995). The cAMP responsiveness via CREB has been demonstrated to require a proximal TATA box (Conkright et al. 2003). We indeed observed TATA overrepresented within 50 bp downstream of CREB, and they appear to functionally interact with each other. In addition, YY1-cMyc, Oct1-C/EBP, CREB-YY1, Oct1-NF-Y, ELK1-HIF1, and POU-TBP, known to either form a physical complex or act in concert at some promoters, were also linked in our analysis (Shrivastava et al. 1993; Zhou et al. 1995; Hatada et al. 2000; Bertolino and Singh 2002; Chang et al. 2003; Hirose et al. 2003).

A module controlling transcription of G2-M genes

One of the most significant motif combinations uncovered in our analysis (P = 9.97 E-7; Fig. 3A
Most of the genes containing the NF-Y-CDE-CHR combination are indeed cell cycle periodic and peak in G2-M phases. We identified 20 genes with the module in their 1-kb promoter sequences, 17 of which are included in the cell cycle expression data set. Based on their microarray data set, Whitfield et al. (2002) reported 872 cell cycle periodic genes by Fourier Transform analysis. Each of them was assigned to a cell cycle phase by their peak correlation to an idealized expression profile from well-studied genes. Among our 17 putative targets, 14 were characterized as cell cycle periodic (Table 3), with all but one peaking in G2 or G2/M phases (P = 1.18 E-12).
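The module search and the enrichment test behind these numbers are specified in Methods, and a compact sketch may help make them concrete. The regular expression encodes the CDE ([G/C]GCG[G/C]) and CHR ([G/A][T/C]TTGAA) consensus strings with a linker of up to 10 bp, searched on the forward strand only (a simplification). The hypergeometric tail follows the verbal definition in Methods. The function names are hypothetical, and the example counts (17 module genes with expression data, 13 of them peaking in G2 or G2/M) are one reading of the paragraph above. The sketch is not claimed to reproduce the published P-value.

```python
import re
from math import comb

# CDE, then 0-10 bp of linker, then CHR; the lookahead allows overlapping matches.
CDE_CHR = re.compile(r"(?=([GC]GCG[GC][ACGT]{0,10}[GA][TC]TTGAA))")


def find_cde_chr(sequence):
    """Return the start offsets of CDE-CHR elements on the forward strand."""
    return [match.start() for match in CDE_CHR.finditer(sequence.upper())]


def hypergeom_tail(x, M, n, K):
    """P(X >= x) when n genes are drawn from M genes, K of which are 'hits'.

    Computed with exact integers before the final division.
    """
    upper = min(n, K)
    if x > upper:
        return 0.0
    numerator = sum(comb(K, i) * comb(M - K, n - i) for i in range(max(x, 0), upper + 1))
    return numerator / comb(M, n)


# M = 8162 genes with sequence and expression data and K = 230 G2 and G2/M genes
# are taken from Methods; n = 17 and x = 13 are an assumed reading of the Results.
p_module = hypergeom_tail(x=13, M=8162, n=17, K=230)
```

For example, `find_cde_chr("AAAGGCGGTTAGTTTGAACC")` returns `[3]`: the CDE GGCGG starts at offset 3 and is followed by a 3-bp linker and the CHR GTTTGAA. A fuller search would also scan the reverse strand and then look for an NF-Y site within 100 bp upstream of each element.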
Combinations enriched in other expression clusters

We also looked for TF modules that may regulate genes of other phases of the cell cycle. E2F has a well-established role in controlling G1/S transition. We found two E2F combinations with GC-rich motifs within 100 bp downstream of E2F that are overrepresented among G1/S genes (P = 1.40 E-5 and 8.12 E-5, respectively). Another enriched combination involves E2F and a neighbor motif strongly resembling the binding site of NF-Y (CompareACE score = 0.98). While genes with this combination have a clear preference for peaking in G1/S and S phases (P = 4.39 E-4), it should also be noted that a small number of them belong to G2 and G2/M clusters instead, including CDC2 and CYCLIN B1, whose promoter elements have recently been characterized to contain functional E2F and NF-Y sites (Zhu et al. 2004). Our in silico observations are in line with a newly emerged role for E2F beyond G1/S transition as uncovered from microarray and ChIP-chip studies (Ishida et al. 2001; Ren et al. 2002). We also noticed several combinations enriched with S-phase genes. However, close inspection of their putative targets revealed that all are histones. Given the cross-hybridization of histone genes on the microarray and their manual assignments to S phase (Whitfield et al. 2002), this result has to be interpreted with caution. None of our combinations appears to be predictive of M/G1.

Discussion

Understanding the regulation of the human cell-division cycle is central to the study of many diseases. A recent genome-wide in silico study identified TF binding sites that are overrepresented in the promoters of cell cycle periodic genes (Elkon et al. 2003). Here, in an effort to explore the combinatorial aspect of the transcriptional control in the human cell cycle, we developed a computational algorithm to identify transcription factors that preferentially act together.
Based on de novo motif finding, our method is not limited to interactions involving known TF target sites, but rather has the potential of discovering novel ones. Considering the small number of binding motifs that have been well characterized in mammals so far, we believe such capability is imperative to a thorough understanding of transcriptional regulatory networks. Functional interactions between TFs not only require their co-occurrence on the same promoter (enhancer), but often with positional (Makeev et al. 2003) and orientational (Terai and Takagi 2004) constraints as well. By grouping together neighboring sequences upstream and downstream of anchor motifs separately, our algorithm provided an additional layer of insights regarding whether there is a preference in a relative location of the two motifs with respect to the genes they regulate. Some of our findings are consistent with experimental observations (see Results). Furthermore, we incorporated a distance parameter into the algorithm, as the statistical overrepresentation of two potentially cooperating motifs in the vicinity of each other is more likely to be biologically relevant. In this study, we experimented with window sizes of 50 and 100 bp, respectively. The success of our approach clearly relies on the correct identification of true TF-binding sites from sequences. To estimate the reliability of in silico predictions, we compared the list of E2F site-containing genes with those in vivo targets determined using ChIP-chip technology. While there is a significant overlap between the two, it is worth noting that some ChIP-chip targets are “missed” by sequence-based search. Such apparent discrepancy has been reported before (Iyer et al. 2001; Ren et al. 2002; Weinmann et al. 2002; Cawley et al. 2004; Euskirchen et al. 2004), and may be contributed to by a number of factors. 
For instance, binding could occur indirectly through association with other proteins; there may exist unknown sequence variants or elements beyond primary sequence recognized by the transcription factor. In addition, as a relatively new experimental technique, ChIP-chip is prone to noise itself, and additional strategies have been utilized to filter out false positives (Garten et al. 2005). The findings of many significant motif pairs, where neighbor seems to be the same as anchor, underscores the importance of homotypic interactions in transcriptional regulation. Two recent bioinformatics studies have based their search for cis- regulatory modules (CRM) in Drosophila upon the clustering of a single motif (Markstein et al. 2002; Papatsenko et al. 2002). What we observed here using human sequence and expression data supports a functional role of binding-site densities, suggesting an analogous search strategy may also be applied to the genome of higher eukaryotes. Several TF combinations uncovered in our analysis appear to control specific phases of the cell cycle. For example, we found the NF-Y-CDE-CHR module predictive of G2 and G2/M genes. E2F-NF-Y, on the other hand, is preferably associated with those peaking in G1/S and S. NF-Y binding has been reported in many cell cycle promoters (Bolognese et al. 1999; Farina et al. 1999; Yun et al. 1999; Caretti et al. 2003), and its activity can be regulated through nuclear localization, splicing, or post-transcriptional modification (Mantovani 1999). Our results further suggest that as a master transcriptional regulator of cell cycle progression, it may achieve phase specificity by coupling with different functional partners. In addition to combinations of cell cycle-related regulators, we also identified a number of experimentally established regulatory modules involved in other biological processes. 
Deciphering transcription regulatory networks from genomic sequence is an exciting but challenging task, especially given the enormous size and complexity of the human genome. In this study, we attempted to uncover the signals that may direct gene expression by searching for evolutionarily conserved and overrepresented TF-binding site combinations associated with more coherent mRNA patterns. While the current analysis was performed with an expression data set obtained from synchronized HeLa cells, it can be readily extended to probe different cellular conditions and types. We anticipate such approaches will be useful for understanding how gene regulation is encoded in the genomic instruction book of life.

Methods

Promoter sequences

Sequence and human-mouse (HUMMUS) alignment information were obtained from a previous study (Shendure and Church 2002). For each of the 11,436 curated human mRNA RefSeq entries in the data set, we extracted the sequence 1 kb upstream of the mRNA as the putative promoter region. A total of 8141 have at least portions of their promoter regions conserved in mouse.

Anchor motifs

Position weight matrices (PWM) for transcription-factor binding sites were obtained from the TRANSFAC database (Wingender 2004) (release 6.1). There are 281 matrices annotated as bound by human factors (based on BF field). In consideration of computational time, we limited our analysis to 134 of those with no more than 7000 gene targets and 2000 human-mouse conserved sites in our human promoter set.

Expression data

We utilized the cell cycle expression data from Whitfield et al. (2002). In their study, in order to obtain better resolution at various cell cycle phases, HeLa S3 cells were synchronized with three different methods: double thymidine block, thymidine-nocodazole block, and mitotic shake-off. For our analysis, we used the time series from experiments Thy_Thy3, Thy_Noc, and mitotic shake-off, each of which covers one to two cell cycles at 1- to 2-h intervals.
Genome-wide scanning for anchor motifs

PATSER (Hertz and Stormo 1999) (v. 3e) was used to scan our promoter set for matches to the PWM. It was run with the following command line options “-c -li -s -u2.” An “alphabet” file was used to provide the following background frequencies: A/T = 0.48 and C/G = 0.52. These frequencies were determined from our 1-kb human promoter set.

Search for neighbor motifs

AlignACE was used to search for enriched neighbor motifs. It was run with default parameters and a GC background frequency of 0.54, which was calculated using the human-mouse conserved regions of our promoter set.

Statistics of overrepresented neighbor motifs

To determine whether the enrichment of a neighbor motif around an anchor motif is statistically significant, we devised a measure called the neighbor specificity (NS) score based on the binomial distribution. Consider constructing a set of sample genes (promoters) with all those containing anchor motifs. “Success” is scored if the sampled promoter contains a neighbor motif within a specified window around the anchor, or “failure” otherwise. The probability of a random success for each sampled promoter, p, is approximated by:

$$p = \frac{N_n}{N_t} \times \frac{W}{L} \times C_a \times C_n$$

where $N_n$ is the number of promoters with the neighbor motif, $N_t$ is the total number of promoters (11,436), $W$ is the specified window size (50 or 100 bp), $L$ is the promoter length (1000 bp), $C_a$ is the average copy number of anchor motifs per anchor-containing promoter, and $C_n$ is the average copy number of neighbor motifs per neighbor-containing promoter. The $W/L$ term is included because a success is scored only if the neighbor motif falls within a particular window around the anchor; the $C_a$ and $C_n$ terms take into account scenarios where there may be more than one copy of the anchor or neighbor motif on a promoter, which essentially leads to a larger window size or multiple tests for success, respectively.
The probability of getting at least the observed number of (anchor-containing) promoters with neighbor motifs within a specified window by chance follows as:

$$P = \sum_{i=x}^{N_a} \binom{N_a}{i} p^{i} (1-p)^{N_a-i}$$

where $N_a$ is the number of promoters with the anchor motif, and $x$ is the number of promoters with the anchor-neighbor motif combination.

Correcting for multiple hypothesis testing

Correction for multiple testing was conducted with a Q-value package (http://faculty.washington.edu/~jstorey/qvalue/), which uses an FDR method. FDR, or false discovery rate, is the expected proportion of features called significant that are truly null. It has increased power over the Bonferroni-type approach. We determined NS score thresholds corresponding to an FDR of 0.05: 5.75 E-3 and 5.09 E-3 for 50 and 100 bp, respectively.

Identifying functional anchor-neighbor motif combinations

We quantified the similarity of expression profiles within a given set of genes using the expression-coherence score (Pilpel et al. 2001), which is defined as the fraction of gene pairs whose expression profiles are closely correlated (i.e., correlation coefficient falls within the top fifth percentile of all gene pairs in the genome). To determine whether the expression-coherence score of genes with the motif combination is significantly higher than that of both anchor- and neighbor-containing genes without the combination, we adopted a model based on the multivariate hypergeometric distribution as previously described (Banerjee and Zhang 2003). The probability of observing at least $m_{comb}$ out of $n_{comb}$ gene pairs closely correlated was calculated as follows:

[equation not preserved]

where $n$ = the number of gene pairs in a set, $m$ = the number of closely correlated gene pairs in a set, comb = the set of genes with the anchor-neighbor motif combination, neg1 = the set of genes with the anchor motif but without the neighbor in the vicinity, neg2 = the set of genes with the neighbor motif but without the anchor in the vicinity, $N = n_{neg1} + n_{comb} + n_{neg2}$, and $M = m_{neg1} + m_{comb} + m_{neg2}$.
We reported the combinations with P-values <7.5 E-5 and 5.7 E-5 for 50 and 100 bp, respectively, as the implied significance level for these cut-offs is 0.05 when applying Bonferroni correction for multiple testing.

Search for genes containing the NF-Y-CDE-CHR module

Based on a few experimentally characterized instances of the CDE-CHR repressor, we expanded our search for genes containing the NF-Y-CDE-CHR module by allowing a flexible link of up to 10 bp between CDE ([G/C]GCG[G/C]) and CHR ([G/A][T/C]TTGAA). We then scanned 100 bp upstream of CDE-CHR for the NF-Y motif (M00287). The hypergeometric distribution was used to estimate the chance probability of obtaining at least the observed number of NF-Y-CDE-CHR module-containing genes peaking in G2 and G2/M phases. More specifically, it is calculated as follows:

$$P = \sum_{i=x}^{\min(n,K)} \frac{\binom{K}{i}\binom{M-K}{n-i}}{\binom{M}{n}}$$

where $x$ is the number of genes that have a potential NF-Y-CDE-CHR regulatory module and peak in G2 and G2/M phases, $M$ is the total number of genes for which we have both expression data and sequence data (8162), $n$ is the number of genes with a potential NF-Y-CDE-CHR regulatory module, and $K$ is the total number of G2 and G2/M genes (230).

Acknowledgments

We are grateful to John Aach, Patrik D'haeseleer, Philippe Marc, Allegra Petti, and Fritz Roth for inspiring discussions and valuable comments. We also thank the anonymous reviewers for their helpful feedback. Z.Z. was a Howard Hughes Medical Institute predoctoral fellow. This work was supported by CEGS, DOE-GTL, DARPA, and the Lipper Foundation.

Footnotes

[Supplemental material is available online at www.genome.org and http://genetics.med.harvard.edu/~zhu/combination.html.]

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3394405.