Copyright © 2007 Wu et al; licensee BioMed Central Ltd.

cTFbase: a database for comparative genomics of transcription factors in cyanobacteria

1 Institute of Biomedical Informatics/Zhejiang Provincial Key Laboratory of Medical Genetics, Wenzhou Medical College, Wenzhou 325000, China
2 Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
3 Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania 16802, USA

Corresponding author.
Jinyu Wu: iamwujy/at/yahoo.com.cn; Fangqing Zhao: biofqzhao/at/gmail.com; Shengqin Wang: wzsqwang/at/gmail.com; Gang Deng: gdeng/at/wzmc.net; Junrong Wang: jrwang/at/wzmc.net; Jie Bai: baijie/at/wzmc.net; Jianxin Lu: jxlu313/at/163.com; Jia Qu: jqu/at/wz.zj.cn; Qiyu Bao: baoqy/at/genomics.ac.cn

Received January 10, 2007; Accepted April 18, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
Comprehensive identification and classification of the transcription factors (TFs) in a given genome is an important step in understanding the transcriptional regulatory networks of a specific organism. Cyanobacteria are an ancient group of Gram-negative bacteria whose genome sizes vary widely, from about 1.6 to 9.1 Mb, yet little is known about their TF repertoires. We therefore constructed the cTFbase database to classify and analyze all the putative TFs in cyanobacterial genomes, followed by genome-wide comparative analysis.

Description
In the current release, cTFbase contains 1288 putative TFs identified from 21 fully sequenced cyanobacterial genomes.
Through its interactive interface, users can apply various criteria to retrieve all TF sequences and their detailed annotation, including sequence features, domain architecture and sequence similarity against the linked databases. Furthermore, cTFbase provides phylogenetic trees for individual TF families, multiple sequence alignments of the DNA-binding domains and ortholog identification across any selected genomes. Comparative analysis revealed great variability of the TF sequences in cyanobacterial genomes. The high variance in gene number and domain organization is likely related to their diverse biological functions and their adaptation to various environmental conditions.

Conclusion
cTFbase provides a centralized warehouse for comparative analysis of putative TFs in cyanobacterial genomes. Such an extensive database should be of great interest to researchers working on TFs or transcriptional regulatory networks in cyanobacteria. cTFbase is freely accessible at http://cegwz.com/ and will be updated as newly sequenced cyanobacterial genomes become available.

Background

Deciphering and reconstructing gene transcriptional regulatory networks is important for understanding fundamental cellular processes, such as cell division, growth control and gene expression, by which cells adapt to their environment more effectively [1]. The basic components of a transcriptional regulatory network are the transcription factors (TFs), the TF binding sites upstream and downstream of the target genes, and the target genes themselves. Among these, TFs play the crucial role: they enhance or inhibit target gene expression by binding to promoter sequences. Studies of TFs are therefore highly significant and can yield more information about the mechanisms of transcriptional regulatory networks.
Genome-wide analyses of completely sequenced genomes have revealed that TFs account for a large proportion of all encoded proteins [2-4]. Escherichia coli is one of the best-studied organisms and has been reported to encode more than 271 TFs in its genome. Despite their divergent domain organizations and sequence identities, the characterized TFs share a significant degree of structural similarity in the DNA-binding domain (DBD), which binds to a specific DNA region [5]. TFs can be classified into several families based on structural differences in the DBD, including the helix-turn-helix motif, zinc fingers, leucine zippers and the basic helix-loop-helix motif. The helix-turn-helix motif is the most common DBD structure in prokaryotes [3,4,6].

Cyanobacteria are an ancient group of Gram-negative bacteria that exhibit extraordinary diversity in physiological properties, ecological niches and morphology [7]. They survive in different environments, such as fresh and marine waters and extreme conditions. Genome size varies remarkably among cyanobacteria, ranging from about 1.6 to 9.1 Mb. For example, Prochlorococcus sp. 1986 has a genome of only 1.75 Mb, which makes it one of the smallest and most compact oxyphototrophic organisms discovered to date [8]. Nostoc punctiforme has a large genome and occupies complex ecological niches, which may suggest a relatively sophisticated organization in this species [9]. Currently, 21 cyanobacterial genomes have been fully sequenced, representing a wide range of species from unicellular to filamentous ones. In addition, the genomes of more than 20 other cyanobacteria are being finished or sequenced [10,11]. Such data resources provide an opportunity for genome-wide analysis of TFs in cyanobacteria.

To date, only a few TFs in cyanobacteria have been studied in detail.
However, those that have been studied provide useful insights into the crucial biological roles of TFs and into their functions. NtcA, one of the most extensively studied TFs, mediates global nitrogen control, regulates many genes involved in nitrogen assimilation, and has been identified in all cyanobacteria [12]. FUR regulates iron assimilation and storage and modulates the expression of genes involved in the response to different environmental stresses [13]. NtcB, another TF, activates nitrate assimilation [14].

To make all the putative TFs in cyanobacteria available to the scientific community and to fill the gap left by the absence of an online database, we present cTFbase, which will be a valuable resource for further research on TFs and transcriptional regulatory networks in cyanobacteria. Whole-genome comparative analysis revealed great variability of the TFs in cyanobacterial genomes. The high variance in gene number and domain organization is likely related to their diverse biological functions and their adaptation to various environmental conditions.

Construction and content

Collection of TFs
The amino acid sequences of the protein-encoding genes of 21 cyanobacterial genomes were retrieved from the IMG database (version 2.0). A number of bioinformatics tools were used to identify the TFs (Fig. 1).
Implementation and web interface
MySQL was used as the database backend to store the results. The web interfaces for database browsing and the result pages were developed with PHP scripts. The cTFbase web system implements the following main functions in the current release:

(1) Browsing the domain architectures of TFs. The repertoire of TFs can be browsed and the domain architectures viewed for a selected genome. The system can also show specific TF families and their related domain architectures in one or all of the cyanobacterial genomes.
(2) Identifying orthologs among the selected genomes. Orthology relationships among multiple species were precomputed with the program OrthoMCL [21].
(3) Browsing the phylogeny of individual TF families. The phylogenetic tree for each TF family was constructed from the whole TF sequences using the neighbor-joining method. Whole sequences were used instead of the DBD motifs because the DBD motif within a given family is well conserved and quite short, so that very few deep nodes would be supported by high bootstrap values.
(4) Searching the database by protein ID, species or family.
(5) Performing a BLAST-based sequence similarity search. Users can search for their target sequences or identify homologs in the database.
(6) Performing sequence alignment. The sequence alignment tool MUSCLE [22] was integrated so that users can align the amino acid sequences of DBDs within specific families.
(7) Linking to useful references, including literature and databases.
(8) Downloading one or all specific TF sequences, as proteins and/or their corresponding DNA sequences, and phylogenetic trees in phylip format.
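The search by protein ID, species or family described in item (4) can be illustrated with a minimal sketch. It is not the cTFbase implementation, which uses PHP and MySQL; here Python's built-in sqlite3 module stands in for the MySQL backend so that the example is self-contained. The table name `tf`, its columns and the `demo_*` rows are hypothetical placeholders, not the actual cTFbase schema or data.

```python
import sqlite3


def create_demo_db():
    """Build an in-memory table with a hypothetical schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tf ("
        "protein_id TEXT PRIMARY KEY, species TEXT NOT NULL, family TEXT NOT NULL)"
    )
    # Placeholder entries for illustration only; not records from cTFbase.
    conn.executemany(
        "INSERT INTO tf VALUES (?, ?, ?)",
        [
            ("demo_0001", "Nostoc punctiforme", "Crp"),
            ("demo_0002", "Nostoc punctiforme", "OmpR"),
            ("demo_0003", "Prochlorococcus sp.", "Crp"),
        ],
    )
    return conn


def search_tfs(conn, protein_id=None, species=None, family=None):
    """Return rows matching every criterion that was supplied (AND semantics)."""
    clauses, params = [], []
    # Column names are fixed here; only the values come from the caller,
    # and they are passed as bound parameters rather than pasted into the SQL.
    for column, value in (
        ("protein_id", protein_id),
        ("species", species),
        ("family", family),
    ):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT protein_id, species, family FROM tf"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY protein_id"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    db = create_demo_db()
    for row in search_tfs(db, family="Crp"):
        print(row)
```

Binding the search values as parameters keeps user input out of the SQL text, which matters for any public web form regardless of the backend actually used.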
Furthermore, each entry provides the sequence itself and detailed annotations, including basic information, the domain architecture assigned by the Pfam database (version 21.0) and the SUPERFAMILY database (release 1.69), and sequence similarity against major databases (PDB as of 11 March 2007, Swiss-Prot release 52.0, RefSeq release 22 and DBD version 2.0). Through its internal links and links to other databases, cTFbase is a platform in which information on putative TFs in cyanobacteria is highly integrated, and it serves as a centralized warehouse for comparative genomic analysis. We expect that this database will help to further the understanding of microbial transcriptional regulatory networks.

Utility and discussion

Comparison of TFs among different species of cyanobacteria
Unlike other prokaryotes, cyanobacteria exhibit extraordinary diversity in physiology, ecological niches and morphology. Genome-wide analysis of TFs helps to clarify the relationship between cyanobacterial TF composition and environmental adaptation. We found that cyanobacteria living in fresh water or soil have a larger number of putative TFs than those living in marine water (the distribution of TFs in the 21 cyanobacterial genomes can be seen in the statistics section of the website). The genomes of the fresh water and soil cyanobacteria are much larger than those of the marine species, and growth in bacterial genome size is known to be accompanied by the accumulation of paralogous protein families [23]. We therefore calculated the relative number of TFs in cyanobacteria. The results demonstrate that the relative number of TFs in fresh water or soil cyanobacteria is still significantly higher than that in marine cyanobacteria (Fig. 2).
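The text does not state how the relative number of TFs was normalized. The following is a minimal sketch, assuming it is the fraction of protein-coding genes annotated as putative TFs, averaged within each habitat group. The function names and the counts in `demo_records` are made-up placeholders for illustration and are not values from cTFbase.

```python
from statistics import mean


def relative_tf_number(tf_count, gene_count):
    """Fraction of protein-coding genes that are putative TFs (assumed definition)."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return tf_count / gene_count


def mean_relative_by_habitat(records):
    """Average the relative TF number within each habitat group.

    records: iterable of (habitat, tf_count, gene_count) tuples.
    """
    groups = {}
    for habitat, tf_count, gene_count in records:
        groups.setdefault(habitat, []).append(
            relative_tf_number(tf_count, gene_count)
        )
    return {habitat: mean(values) for habitat, values in groups.items()}


# Made-up placeholder counts, NOT measurements from cTFbase.
demo_records = [
    ("freshwater_or_soil", 120, 6000),
    ("freshwater_or_soil", 90, 5000),
    ("marine", 20, 2000),
    ("marine", 30, 2500),
]

if __name__ == "__main__":
    for habitat, value in mean_relative_by_habitat(demo_records).items():
        print(f"{habitat}: {value:.4f}")
```

Normalizing by gene count separates a genuine enrichment of TFs from the trivial effect of a larger genome encoding more of everything; a significance test on the per-genome fractions would be needed to support a claim of a significant difference.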
Several recent studies showed that many TFs from both eukaryotes and prokaryotes contain additional domains with distinct functions [5,29]. Consistent with this, a large number of the putative TFs in cyanobacteria were found to possess at least one domain in addition to the DBD (Fig. 3).
Furthermore, we found that 12 putative TF families were present in all cyanobacterial genomes. Four of them (BolA, DUF387, SfsA and DnaA) have nearly the same number of gene copies across the genomes, which highlights the fundamental importance of these families. They are presumably very ancient families shared by the most recent common ancestor of cyanobacteria and may not have undergone lineage-specific expansion or loss, or horizontal gene transfer. The remaining eight families (OmpR, GerE, Crp, LysR, ArsR, FUR, GntR and Bac_DNA_binding) exhibit different distribution patterns among species. However, we found that a variety of orthologous TFs in these families formed monophyletic clades, which were strongly supported by high bootstrap values (nearly 100% in several clades) in the phylogenetic trees mentioned above (the trees can be queried on the website). Within each of the FUR, Crp, LysR and GntR families, only one such branch is found, whereas two branches are observed in each of the GerE and OmpR phylogenies. The ArsR and Bac_DNA_binding phylogenies, however, do not have any such branches.

Previously, Brune et al. [2] and Moreno-Campuzano et al. [31] identified the conserved TFs among Corynebacterium and Firmicutes, respectively. Here, we define for the first time a minimal core of conserved TFs in cyanobacteria: the putative TFs in these nine branches plus the four TF families mentioned above (BolA, DUF387, SfsA and DnaA). These "universal" putative TFs act as response regulators of two-component systems (OmpR, GerE) and mediate global nitrogen control (Crp), cell-cycle regulation (BolA), sugar fermentation (SfsA), initiation and regulation of chromosomal replication (DnaA), general metabolism (GntR), chromosome condensation and segregation (DUF387), iron homeostasis control (FUR) and CO2 fixation (LysR), among other functions.
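The two criteria used above, presence in every genome and nearly constant copy number, can be expressed as a set intersection followed by a spread check. This is a minimal sketch, assuming per-genome family copy counts are available as a nested dictionary. The function names, the `max_spread` threshold of one copy and the demo counts are hypothetical; the text does not say how "nearly the same" was quantified.

```python
def core_families(counts):
    """Families with at least one copy in every genome.

    counts: {genome_name: {family_name: copy_number}}
    """
    present_sets = [
        {family for family, copies in families.items() if copies > 0}
        for families in counts.values()
    ]
    if not present_sets:
        return set()
    return set.intersection(*present_sets)


def stable_copy_families(counts, max_spread=1):
    """Core families whose copy number differs by at most max_spread across genomes."""
    stable = set()
    for family in core_families(counts):
        copies = [families[family] for families in counts.values()]
        if max(copies) - min(copies) <= max_spread:
            stable.add(family)
    return stable


# Made-up placeholder counts, NOT values from cTFbase.
demo_counts = {
    "genome_A": {"BolA": 1, "OmpR": 10, "FamilyX": 2},
    "genome_B": {"BolA": 1, "OmpR": 3},
    "genome_C": {"BolA": 2, "OmpR": 25, "FamilyX": 0},
}

if __name__ == "__main__":
    print(sorted(core_families(demo_counts)))
    print(sorted(stable_copy_families(demo_counts)))
```

In the demo data, `FamilyX` is excluded from the core because it is absent from one genome and has zero copies in another, while `OmpR` is in the core but fails the spread check because its copy number varies widely.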
Because the physiological function of most TFs in cyanobacteria is still unknown, this core set of conserved TFs might provide some guidance for further investigations.

Conclusion

Currently, cTFbase is limited to 21 cyanobacterial genomes retrieved from the IMG database and works as a centralized warehouse for the comparative genomic analysis of putative TFs in cyanobacteria. Without regular updates, however, the database would quickly lose its advantages. We therefore plan to update its data on a regular basis, and our update policy covers the following three cases. First, the repertoire of TFs will be identified and integrated into the database when newly completed or draft cyanobacterial genomes become available. Second, novel TFs verified by experiments will also be added to cTFbase. We encourage users to submit new TFs to our database through the interactive web interface. Third, cTFbase will be updated periodically to follow the main databases, such as Pfam and SUPERFAMILY. Questions, comments and suggestions are welcome and will be useful feedback for future updates.

Availability and requirements

Project name: cTFbase: a database for comparative genomics of transcription factors in cyanobacteria
Project home page: http://cegwz.com/
Operating system(s): For user: standard WWW browser (Safari, Mozilla and Internet Explorer); for server: Linux
Programming language: PHP, SQL, Perl and Bioperl
License: GNU GPL
Any restrictions to use by non-academics: None

Authors' contributions

JW performed bioinformatic analysis, constructed the database, developed the web interface, and wrote the manuscript. SW helped with the design of the web interface and the update of the database. GD, JB and JW prepared the figures in the manuscript and website. JQ and JL provided scientific suggestions and criticisms for improving the manuscript and website. QB and FZ participated in its design, helped write the manuscript and supervised the whole project.
All authors read and approved the final manuscript.

Acknowledgements

We are grateful to Dr. Juyuan Zhang from Huazhong Agricultural University and Mr. Rusty Childers from Wenzhou Medical College for checking the writing of our manuscript. We are indebted to the Institute of Biomedical Informatics and the Zhejiang Provincial Key Laboratory of Medical Genetics (Wenzhou Medical College, China). This work was supported by the National Natural Science Foundation of China (30571009).

References
Curr Opin Struct Biol. 2004 Jun; 14(3):283-91.
BMC Genomics. 2005 Jun 7; 6(1):86.
Comput Biol Chem. 2004 Dec; 28(5-6):341-50.
FEMS Microbiol Rev. 2005 Apr; 29(2):231-62.
DNA Res. 2005; 12(5):269-80.
J Bacteriol. 2001 Jan; 183(2):411-25.
Microbiology. 1996 Jun; 142(Pt 6):1469-76.
J Bacteriol. 2001 Oct; 183(20):5840-7.
J Mol Biol. 2001 Nov 2; 313(4):903-19.
Nucleic Acids Res. 2006 Jan 1; 34(Database issue):D247-51.
Bioinformatics. 1998; 14(9):755-63.
Nucleic Acids Res. 2006 Jan 1; 34(Database issue):D187-91.
Nucleic Acids Res. 1990 Sep 11; 18(17):5019-26.
Genome Res. 2003 Sep; 13(9):2178-89.
Nucleic Acids Res. 2004; 32(5):1792-7.
Proc Natl Acad Sci U S A. 2004 Mar 2; 101(9):3160-5.
J Mol Microbiol Biotechnol. 2005; 9(3-4):154-66.
Physiol Genomics. 2006 Feb 14; 24(3):181-90.
Photosynth Res. 2001; 70(1):85-106.
Proc Natl Acad Sci U S A. 2003 Aug 19; 100(17):9647-9.
J Bacteriol. 2006 Jun; 188(12):4169-82.
Microbiol Mol Biol Rev. 2006 Jun; 70(2):472-509.
BMC Genomics. 2006 Jun 13; 7:147.