Nucleic Acids Res. Jan 2008; 36(Database issue): D854–D859.
Published online Oct 16, 2007. doi: 10.1093/nar/gkm729
PMCID: PMC2238932

Cyclebase.org—a comprehensive multi-organism online database of cell-cycle experiments

Abstract

The past decade has seen the publication of a large number of cell-cycle microarray studies and many more are in the pipeline. However, data from these experiments are not easy to access, combine and evaluate. We have developed a centralized database with an easy-to-use interface, Cyclebase.org, for viewing and downloading these data. The user interface facilitates searches for genes of interest as well as downloads of genome-wide results. Individual genes are displayed with graphs of expression profiles throughout the cell cycle from all available experiments. These expression profiles are normalized to a common timescale to enable inspection of the combined experimental evidence. Furthermore, state-of-the-art computational analyses provide key information on both individual experiments and combined datasets such as whether or not a gene is periodically expressed and, if so, the time of peak expression. Cyclebase is available at http://www.cyclebase.org.

INTRODUCTION

The cell division cycle is one of the most fundamental processes of life, allowing cells to multiply and faithfully pass on their genetic information to future generations. The full complexity of this process became apparent a decade ago with the first genome-wide microarray studies of the mitotic cell cycle of budding yeast (1,2). Since then, numerous other microarray studies have been published on the cell cycle of the budding yeast Saccharomyces cerevisiae (3,4), the fission yeast Schizosaccharomyces pombe (5–7), human (8) and the plant Arabidopsis thaliana (9).

Accessing, analyzing and comparing these many datasets has unfortunately remained difficult for a variety of reasons. First, there is no single database from which one can download all the datasets in a unified file format; the expression profiles for each experiment are often stored on individual websites. Second, the same gene identifiers are not used across datasets, making it difficult to compare expression profiles from different studies on the same organism. Third, a variety of different methods have, with varying success, been used for identifying the significantly regulated genes (1–28). The use of many different algorithms has introduced uncertainty as to which is the correct set of cell-cycle regulated genes. Fourth, new experimental studies tend to disregard already existing expression data, and thus only evaluate cell-cycle regulation based on their own experiments. Finally, general microarray repositories, analysis methods and visualization tools have by nature not been designed to meet the specific needs of the cell-cycle community.

Here, we present Cyclebase.org, a database and web resource of cell-cycle microarray expression datasets (see Table 1 for an overview of the datasets included in Cyclebase). These datasets have been mapped to common gene identifiers and normalized onto a common timescale, facilitating direct comparison of expression profiles between all experiments within an organism. The web interface provides a good visual overview of all available expression data on a given gene, as well as the results from state-of-the-art computational analyses. This interface aids the user in interpreting the combined evidence on the cell-cycle regulation of a given gene.

Table 1.
Summary of cell-cycle microarray experiments in Cyclebase

PRESENTING CYCLEBASE

The interface of Cyclebase is designed to make it as simple as possible for users to find and browse the genes of interest. Searching for key terms such as standard gene names (e.g. HTA2), systematic names (e.g. YBL003C) or descriptions (e.g. histone) will produce a list of candidate genes for inspection. Genes in this list are initially sorted by their match to the search criteria and then in ascending order on the cell-cycle rank score (most periodic genes at the top). The list can be sorted on any of the other columns simply by clicking them. In addition, an advanced search page allows the user to browse for genes that match certain criteria; for example, it allows researchers to find among the 100 most periodic human genes, those that peak in S-phase.
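
The default ordering of the result list can be illustrated with a minimal sketch. The record layout, the gene names and the numeric match score below are hypothetical and are not taken from Cyclebase itself; the sketch only shows the two-level sort described above (best match first, then ascending cell-cycle rank).

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    name: str         # hypothetical gene label, for illustration only
    match_score: int  # hypothetical: higher means a better match to the query
    rank: int         # cell-cycle rank, 1 = most periodic gene


def sort_hits(hits):
    """Order hits by match quality first, then by ascending cell-cycle rank."""
    return sorted(hits, key=lambda h: (-h.match_score, h.rank))


hits = [
    GeneHit("gene_a", match_score=1, rank=12),
    GeneHit("gene_b", match_score=2, rank=350),
    GeneHit("gene_c", match_score=2, rank=7),
    GeneHit("gene_d", match_score=1, rank=3),
]

ordered_names = [h.name for h in sort_hits(hits)]
print(ordered_names)
```

Because the sort key is a tuple, ties on the match score are broken by the rank, so the most periodic genes appear first within each group of equally good matches.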

When a gene of interest has been selected, or if a query is entered that matches only a single gene, the user is taken to the Gene Details page (Figure 1). This page is the primary interface for viewing expression profiles, key results from statistical analyses and general information about the gene in question. By default, the statistical results are based on all available experiments. Expression profiles and analysis results for the individual experiments can be accessed by clicking on a single experiment in the experiments list (Figure 1A).

Figure 1.
Screenshot for budding yeast CLB1. The figure shows the Gene Details Page for the gene CLB1 (a cyclin). (A) The list of experiments in which the gene is measured. Clicking any of these takes the user to another Gene Details Page with only data from that ...

To allow for inspection of the accumulated evidence for transcriptional regulation during the cell cycle, all available expression data for a gene of interest are depicted in the expression profile chart (Figure 1B). Easy comparison of different experiments is obtained by placing each profile onto a common time scale, which we have chosen to be in percent of the cell division cycle with zero corresponding to cytokinesis (M/G1-transition) (16,29,30). Such normalization is necessary as the individual experiments vary greatly in their absolute interdivision times, depending on the experimental conditions. Subsequent alignment of the timescales is also necessary, because different experiments release the cells from different points in the cell cycle. Finally, the expression values have been normalized to a standard deviation of one over the entire experiment to further aid comparison across experiments.
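
The two normalization steps can be sketched as follows. This is an illustration under stated assumptions, not the Cyclebase implementation: the function names, the `offset_min` parameter (the sampling time at which the culture passes cytokinesis) and the example numbers are hypothetical, and the scaling step simply divides by the standard deviation of all values in one experiment without centering.

```python
import statistics


def to_percent_of_cycle(times_min, interdivision_min, offset_min):
    """Map sampling times (minutes) onto 0-100% of the cell cycle.

    interdivision_min: length of one cell cycle in this experiment.
    offset_min: hypothetical time point corresponding to cytokinesis (0%).
    """
    return [((t - offset_min) / interdivision_min * 100.0) % 100.0
            for t in times_min]


def scale_experiment_to_unit_sd(profiles):
    """Scale all expression values of one experiment to a standard deviation of one.

    profiles: dict mapping gene identifier -> list of expression values.
    """
    all_values = [v for values in profiles.values() for v in values]
    sd = statistics.pstdev(all_values)
    if sd == 0:
        # A flat experiment cannot be rescaled; return an unchanged copy.
        return {gene: list(values) for gene, values in profiles.items()}
    return {gene: [v / sd for v in values] for gene, values in profiles.items()}


# Hypothetical experiment: 80 min cycle, cells pass cytokinesis at 20 min.
percent = to_percent_of_cycle([0, 20, 60, 100], interdivision_min=80, offset_min=20)
print(percent)

scaled = scale_experiment_to_unit_sd({"g1": [1.0, -1.0], "g2": [3.0, -3.0]})
```

The modulo operation wraps time points from successive cycles onto the same 0-100% axis, which is what allows profiles from experiments with different interdivision times and release points to be overlaid.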

To provide an unbiased and comparable assessment of the expression data, a common computational analysis framework has been applied to all datasets in the database. For every expression profile, two P-values are calculated that assess the significance of periodicity and regulation (16). The P-values are summarized across all experiments in an organism and combined to a final score, which is used to rank all genes in the genome (16) (Figure 1C). A brief explanation of the algorithms is provided in the Methods section of Cyclebase.
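
As a rough illustration of how per-experiment P-values can be summarized and turned into a genome-wide ranking, the sketch below multiplies the P-values across experiments (as a sum of logarithms) and adds the periodicity and regulation terms. This is a simplified stand-in chosen for clarity; the actual scoring scheme is the one defined in (16), and the function names, the `floor` parameter and the example values are assumptions.

```python
import math


def combined_log_p(p_values, floor=1e-300):
    """Sum of log10 P-values, i.e. the log of their product.

    floor is a hypothetical guard against taking log10(0).
    """
    return sum(math.log10(max(p, floor)) for p in p_values)


def rank_genes(evidence):
    """Rank genes from per-experiment (p_periodicity, p_regulation) pairs.

    evidence: dict mapping gene -> list of (p_periodicity, p_regulation),
    one pair per experiment. Returns dict gene -> rank (1 = strongest evidence).
    """
    scores = {}
    for gene, pairs in evidence.items():
        periodicity = combined_log_p(p for p, _ in pairs)
        regulation = combined_log_p(r for _, r in pairs)
        scores[gene] = periodicity + regulation  # more negative = stronger
    ordered = sorted(scores, key=scores.get)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


# Hypothetical P-values for three genes measured in two experiments.
evidence = {
    "gene_a": [(0.001, 0.01), (0.01, 0.01)],
    "gene_b": [(0.5, 0.5), (0.4, 0.9)],
    "gene_c": [(0.05, 0.1), (0.2, 0.05)],
}
ranks = rank_genes(evidence)
print(ranks)
```

Working in log space avoids numerical underflow when many small P-values are multiplied across experiments.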

Based on independent benchmarking, this methodology has previously been shown to be as good as or superior to all other published methods for identifying periodically expressed genes (16,29,30). We have expanded this benchmark to also include recent methods (1,2,5–28) and experiments (Figure 2). Benchmark sets enriched in cell-cycle regulated genes were compiled from the targets of known cell-cycle transcription factors (16,29,30), and we benchmarked each method's ability to retrieve the genes in these sets. Figure 2 displays the benchmarking results, which show that the method used in Cyclebase provides clear improvements over other methods and that combining all data for an organism is, not surprisingly, superior to any single dataset analyzed on its own. Based on the benchmarks, we have selected a set of significantly periodically expressed genes within each organism (labeled with a small ‘Periodic’ icon). We found 600 periodic genes in budding yeast, 500 in fission yeast, 600 in human and 400 in the plant A. thaliana. For these periodic genes, we compute the ‘peaktime’ based on all available expression profiles (16).
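
The kind of retrieval benchmark described here can be sketched as the fraction of a benchmark set that a method places among its top-ranked genes. The function name, gene labels and cut-offs below are hypothetical, and the exact evaluation protocol used for Figure 2 may differ.

```python
def benchmark_recovery(ranked_genes, benchmark_set, top_n):
    """Fraction of benchmark genes found among the top_n ranked genes.

    ranked_genes: list of gene identifiers, most periodic first.
    benchmark_set: set of genes assumed to be cell-cycle regulated.
    """
    if not benchmark_set:
        return 0.0
    top = ranked_genes[:top_n]
    hits = sum(1 for gene in top if gene in benchmark_set)
    return hits / len(benchmark_set)


# Hypothetical ranking produced by one method, and a small benchmark set.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
benchmark = {"g1", "g3", "g6"}

curve = [benchmark_recovery(ranked, benchmark, n) for n in (1, 3, 6)]
print(curve)
```

Evaluating the recovery at several cut-offs gives one curve per method, so that methods and datasets can be compared on the same benchmark set.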

Figure 2.
Benchmark of methods for identifying cell-cycle regulated genes. For each of the four organisms, a benchmark set was compiled of genes whose promoters are bound by known cell-cycle transcription factors (16,29,30), under the assumption that these genes ...

The peaktime is a measure of when in the cell cycle a given gene is maximally expressed, and represents a summary of all the expression data (16). The peaktime is given as percent into the cell cycle (from when the new cell is born in cytokinesis) and is depicted as a red dot in both the expression profile chart (Figure 1B) and the peaktime chart (Figure 1D). The phase length can vary widely from organism to organism (e.g. G2-phase occupies ~60–70% of the cell cycle in fission yeast versus only ~25% in budding yeast), and the peaktime chart is therefore drawn differently for each species. Consequently, the peaktime values cannot be directly compared across organisms, since a specific percent (e.g. 60%) into the cell cycle may correspond to different phases in different organisms. The peaktime is only computed for genes that display periodicity and the remaining genes are labeled with ‘uncertain’ for the peaktime value. This label is also used if the different experiments disagree too much for a peaktime to be reliably assigned (16).
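
Because the peaktime is a position on a cycle, per-experiment peak positions near 95% and 5% should average to roughly 0%, not 50%. The sketch below uses a circular mean to illustrate this, together with a simple agreement measure that falls back to ‘uncertain’. It is not the peaktime algorithm of (16); the function names and the `min_agreement` threshold are assumptions made for illustration.

```python
import math


def circular_mean_percent(peaks_percent):
    """Circular mean of peak positions given in percent of the cell cycle.

    Returns (mean_percent, agreement), where agreement is the mean resultant
    length: close to 1 when experiments agree, close to 0 when they disagree.
    """
    angles = [2.0 * math.pi * p / 100.0 for p in peaks_percent]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    mean = (math.atan2(y, x) / (2.0 * math.pi) * 100.0) % 100.0
    return mean, math.hypot(x, y)


def peaktime_or_uncertain(peaks_percent, min_agreement=0.5):
    """Return a summary peak position, or 'uncertain' if experiments disagree.

    min_agreement is a hypothetical threshold, not a Cyclebase parameter.
    """
    mean, agreement = circular_mean_percent(peaks_percent)
    return mean if agreement >= min_agreement else "uncertain"


print(peaktime_or_uncertain([20.0, 30.0]))  # experiments agree
print(peaktime_or_uncertain([10.0, 60.0]))  # experiments peak half a cycle apart
```

An ordinary arithmetic mean would place the summary of 95% and 5% at 50%, half a cycle away from both measurements, which is why a circular average is the natural choice on a percent-of-cycle scale.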

When comparing expression data across experiments, one issue is that different gene names for the same gene have been used in the different experiments. We have solved this problem by combining expression data and key results based on systematic gene identifiers. When they exist, a list of aliases is provided in the Gene Details page (Figure 1E), allowing the user to relate to the original experiment and to crosslink to external databases. The Gene Details page also contains a functional description (Figure 1E) populated from external databases (31–35) and is therefore not available for all genes.

All Cyclebase analysis results are available for download, both as values for individual genes and as whole-experiment datasets. XML and tab-delimited formats are available, both of which are fully documented on the website. Furthermore, where permission has been granted from the original authors, expression profile datasets are also available for download. Every page in Cyclebase also contains links to information about the database (FAQ and Methods), information about the individual experiments, and a link to the datasets available for download (Figure 1G).

OUTLOOK

Many more cell-cycle experiments may be performed in the future, and we encourage researchers to contact us so that new cell-cycle experiments can be analyzed consistently and included in Cyclebase. As other types of large-scale experiments (e.g. metabolite information, kinase activity or protein expression) become available, it will become imperative that researchers integrate and analyze these data together with existing datasets. Cyclebase has been designed to store diverse data types from time-series experiments, and we intend for Cyclebase to become a standard interface and tool for combining cell-cycle datasets beyond transcriptional regulation. This would give researchers a one-stop shop for visualizing and downloading time-series events from the cell cycle.

ACKNOWLEDGEMENTS

The authors wish to thank Hans-Henrik Stæfeldt, Kristoffer Rapacki and Peter W. Sacket for technical help with the database. This work was supported by grants from the Villum Kahn Rasmussen Foundation, the Danish Technical Research Council, as well as the BioSapiens Network of Excellence (LSHG-CT-2003-503265) funded by the European Commission FP6 Programme. Funding to pay the Open Access publication charges for this article was provided by the Villum Kahn Rasmussen Foundation.

Conflict of interest statement. None declared.

REFERENCES

1. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell. 1998;2:65–73. [PubMed]
2. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B. Comprehensive identification of cell cycle-regulated genes of the yeast S. cerevisiae by microarray hybridization. Mol. Biol. Cell. 1998;9:3273–3297. [PMC free article] [PubMed]
3. de Lichtenberg U, Wernersson R, Jensen TS, Nielsen HB, Fausbøll A, Schmidt P, Hansen FB, Knudsen S, Brunak S. New weakly expressed cell cycle-regulated genes in yeast. Yeast. 2005;22:1191–1201. [PubMed]
4. Pramila T, Wu W, Miles S, Noble WS, Breeden LL. The forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev. 2006;20:2266–2278. [PMC free article] [PubMed]
5. Rustici G, Mata J, Kivinen K, Lió P, Penkett CJ, Burns G, Hayles J, Nurse P, et al. Periodic gene expression program of the fission yeast cell cycle. Nature Genet. 2004;36:809–817. [PubMed]
6. Peng X, Karuturi RK, Miller LD, Lin K, Jia Y, Kondu P, Wang L, Wong L, Liu ET, et al. Identification of cell cycle-regulated genes in fission yeast. Mol. Biol. Cell. 2005;16:1026–1042. [PMC free article] [PubMed]
7. Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, Skiena S, Futcher B, Leatherwood J. The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol. 2005;3:e225. [PMC free article] [PubMed]
8. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou C, Hurt MM, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell. 2002;13:1977–2000. [PMC free article] [PubMed]
9. Menges M, Hennig L, Gruissem W, Murray JAH. Genome-wide gene expression in an Arabidopsis cell suspension. Plant Mol. Biol. 2003;53:423–442. [PubMed]
10. Zhao LP, Prentice R, Breeden L. Statistical modeling of large microarray data sets to identify stimulus-response profiles. Proc. Natl Acad. Sci. USA. 2001;98:5631–5636. [PMC free article] [PubMed]
11. Johansson D, Lindgren P, Berglund A. A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription. Bioinformatics. 2003;19:467–473. [PubMed]
12. Langmead C, Yan T, McClung CR, Donald BR. Phase-independent rhythmic analysis of genome-wide expression patterns. J. Comput. Biol. 2003;10:521–536. [PubMed]
13. Lu X, Zhang W, Qin ZS, Kwast KE, Liu JS. Statistical resynchronization and Bayesian detection of periodically expressed genes. Nucleic Acids Res. 2004;32:447–455. [PMC free article] [PubMed]
14. Wichert S, Fokianos K, Strimmer K. Identifying periodically expressed transcripts in microarray time series data. Bioinformatics. 2004;20:5–20. [PubMed]
15. Luan Y, Li H. Model-based methods for identifying periodically expressed genes based on time course microarray gene expression data. Bioinformatics. 2004;20:332–339. [PubMed]
16. de Lichtenberg U, Jensen LJ, Fausbøll A, Jensen TS, Bork P, Brunak S. Comparison of computational methods for the identification of cell cycle regulated genes. Bioinformatics. 2005;21:1164–1171. [PubMed]
17. Ahdesmaki M, Lahdesmaki H, Pearson R, Huttunen H, Yli-Harja O. Robust detection of periodic time series measured from biological systems. BMC Bioinformatics. 2005;6:117. [PMC free article] [PubMed]
18. Chen J. Identification of significant periodic genes in microarray gene expression data. BMC Bioinformatics. 2005;6:286. [PMC free article] [PubMed]
19. Willbrand K, Radvanyi F, Nadal J-P, Thiery J-P, Fink TMA. Identifying genes from up-down properties of microarray expression series. Bioinformatics. 2005;21:3859–3864. [PubMed]
20. Qiu P, Wang ZJ, Liu KJR. Tracking the herd: resynchronization analysis of cell-cycle gene expression data in Saccharomyces cerevisiae. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2005;5:4826–4829. [PubMed]
21. Qiu P, Wang ZJ, Liu KJR. Polynomial model approach for resynchronization analysis of cell-cycle gene expression data. Bioinformatics. 2006;22:959–966. [PubMed]
22. Ahnert SE, Willbrand K, Brown FCS, Fink TMA. Unbiased pattern detection in microarray data series. Bioinformatics. 2006;22:1471–1476. [PubMed]
23. Andersson CR, Isaksson A, Gustafsson MG. Bayesian detection of periodic mRNA time profiles without use of training examples. BMC Bioinformatics. 2006;7:63. [PMC free article] [PubMed]
24. Glynn EF, Chen J, Mushegian AR. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics. 2006;22:310–316. [PubMed]
25. Lu Y, Rosenfeld R, Bar-Joseph Z. Identifying cycling genes by combining sequence homology and expression data. Bioinformatics. 2006;22:e314–e322. [PubMed]
26. Xu H, Sung W-K, Feng L. PEM: a general statistical approach for identifying differentially expressed genes in time-course cDNA microarray experiment without replicates. Proc. IEEE Computer Society Bioinformatics Conference; 2006. pp. 123–132. [PubMed]
27. Liew AW-C, Xian J, Wu S, Smith D, Yan H. Spectral estimation in unevenly sampled space of periodically expressed microarray time series data. BMC Bioinformatics. 2007;8:137. [PMC free article] [PubMed]
28. Lu Y, Mahony S, Benos PV, Rosenfeld R, Simon I, Breeden LL, Bar-Joseph Z. Combined analysis reveals a core set of cycling genes. Genome Biol. 2007;8:R146. [PMC free article] [PubMed]
29. Marguerat S, Jensen TS, de Lichtenberg U, Wilhelm BT, Jensen LJ, Bähler J. The more the merrier: comparative analysis of microarray studies on cell cycle-regulated genes in fission yeast. Yeast. 2006;23:261–277. [PMC free article] [PubMed]
30. Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P. Coevolution of transcriptional and post-translational cell-cycle regulation. Nature. 2006;443:594–597. [PubMed]
31. Nash R, Weng S, Hitz B, Balakrishnan R, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, et al. Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Res. 2007;35:D468–D471. [PMC free article] [PubMed]
32. Hertz-Fowler C, Peacock CS, Wood V, Aslett M, Kerhornou A, Mooney P, Tivey A, Hall N, et al. GeneDB: a resource for prokaryotic and eukaryotic organisms. Nucleic Acids Res. 2004;32:D339–D343. [PMC free article] [PubMed]
33. Hubbard TJP, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Cunningham F, et al. Ensembl 2007. Nucleic Acids Res. 2007;35:D610–D617. [PMC free article] [PubMed]
34. Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Lander G, et al. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res. 2003;31:224–228. [PMC free article] [PubMed]
35. The UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 2007;35:D193–D197. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
