Nucleic Acids Res. Jan 2011; 39(Database issue): D1114–D1117.
Published online Nov 18, 2010. doi: 10.1093/nar/gkq1141
PMCID: PMC3013715

PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database

Abstract

We updated the plant transcription factor (TF) database to version 2.0 (PlantTFDB 2.0, http://planttfdb.cbi.pku.edu.cn), which contains 53 319 putative TFs predicted from 49 species. These TFs were classified into 58 newly defined families by a combination of computational approaches and manual inspection, and annotated in detail with general information, domain features, gene ontology, expression patterns and ortholog groups, as well as cross-references to various databases and literature citations. Multiple sequence alignments and phylogenetic trees for each family can be viewed online or downloaded as text files, with the alignments shown as WebLogo pictures. We have redesigned the user interface in the new version. Users can search TFs with much more flexibility through the improved advanced search page, and the search results can be exported into various formats for further analysis. In addition, we now provide a web service for advanced users to access PlantTFDB 2.0 more efficiently.

INTRODUCTION

Transcription factors (TFs) are key regulators of transcriptional expression in biological processes (1). In recent years, several databases of plant TFs and other transcription regulators have become publicly available, such as PlnTFDB (2), PlanTAPDB (3), GRASSIUS (4), DATFAP (5), AGRIS (6), RARTF (7), LegumeTFDB (8) and TOBFAC (9). Starting in 2005, we constructed several species-specific plant TF databases from the available genome sequences of Arabidopsis (DATF) (10), rice (DRTF) (11) and poplar (DPTF) (12), and integrated them into a comprehensive plant TF database (PlantTFDB 1.0) (13) with 26 402 TFs identified from 22 species. Of these 22 plants, five have completed genome sequences and the others are represented by unique transcripts integrated by PlantGDB (14). PlantTFDB 1.0 has received millions of web hits since it went online in July 2007.

With the rapid increase of plant genome sequences in public databases, we have updated PlantTFDB 1.0 to version 2.0. PlantTFDB 2.0 contains TFs from 49 species covering the main lineages of the plant kingdom: 9 green algae, 1 moss, 1 fern, 3 gymnosperms and 35 angiosperms. Using the refined pipeline, a total of 53 319 TFs were identified from these 49 species and classified into 58 families. We performed both computational annotation and manual curation for these putative TFs. To infer the evolutionary relationships among the identified TFs, we constructed phylogenetic trees for each TF family and predicted ortholog groups for the TFs identified from species with completed genome sequences. The web interface of PlantTFDB 2.0 was redesigned to provide users with more flexible search functionality. In addition to access through a web browser, a standard web service interface is now supported so that advanced users can retrieve data from PlantTFDB 2.0 in batch mode or integrate PlantTFDB 2.0 data into their own websites. All resources in PlantTFDB 2.0 can be browsed, retrieved and downloaded freely.

RESULTS AND DISCUSSION

Improved identification pipeline for plant TFs

Annotations generated by genome sequencing projects provide the most abundant source for the proteome of a given species, but because they are produced automatically they are often incomplete or incorrect (15). Dedicated sequence databases such as RefSeq (16), on the other hand, provide relatively high-quality, curation-based annotation. Expressed sequence tags (ESTs) are another important source that complements genome annotation. By integrating all existing annotations derived from genome annotation, RefSeq, PlantGDB (14) and UniGene (17), we compiled a non-redundant reference proteome data set for all 49 species (Supplementary Table S1, Supplementary Figures S1 and S2) for TF prediction.
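
The merging step can be illustrated with a minimal Python sketch. It assumes that redundancy means exact identity of protein sequences and that records from an earlier source in a priority list are preferred; neither assumption is stated in the text, and the function name, record layout and priority order are hypothetical. Translation of transcripts into proteins is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical priority order: when the same protein sequence is reported
# by several sources, the record from the earliest source listed here wins.
SOURCE_PRIORITY = ["genome", "RefSeq", "PlantGDB", "UniGene"]

Record = Tuple[str, str, str]  # (source, identifier, protein sequence)


def merge_non_redundant(records: Iterable[Record]) -> List[Record]:
    """Keep one record per distinct protein sequence (exact match only)."""
    rank = {name: i for i, name in enumerate(SOURCE_PRIORITY)}
    best: Dict[str, Record] = {}
    for source, identifier, sequence in records:
        # Normalise case and drop a trailing stop symbol before comparing.
        key = sequence.upper().rstrip("*")
        current = best.get(key)
        if current is None or rank[source] < rank[current[0]]:
            best[key] = (source, identifier, key)
    # Sort by source priority, then identifier, for reproducible output.
    return sorted(best.values(), key=lambda r: (rank[r[0]], r[1]))
```

A real pipeline would probably also collapse near-identical sequences and splice variants, which exact matching does not attempt.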

TFs are characterized by their signature DNA-binding domains (DBDs). We employed HMMER 3.0 to identify these signature DBDs in the above proteome data set. In total, 64 HMM models were used to identify domains in TFs (Supplementary Table S2), of which 53 were collected from Pfam 24.0 (18) and 11 were built from sequences we collected locally. In the previous version, we used an e-value of 0.01 as the threshold for domain identification. Because the e-value depends on the size of the protein data set being searched, in the current version we adopted domain-specific bit-score thresholds, chosen on the basis of manual inspection and literature review (Supplementary Tables S3 and S4).
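
As an illustration of thresholding by bit score rather than e-value, the following Python sketch filters per-domain hits from the `--domtblout` table written by HMMER 3 `hmmsearch`. The threshold values passed in are placeholders supplied by the caller, not the curated values of Supplementary Tables S3 and S4, and the function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class DomainHit(NamedTuple):
    protein: str      # target sequence name
    model: str        # query HMM name
    bit_score: float  # per-domain bit score


def parse_domtblout(text: str) -> List[DomainHit]:
    """Parse the whitespace-delimited domain table of hmmsearch."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Columns (0-based): 0 = target name, 3 = query name,
        # 13 = bit score of this individual domain.
        hits.append(DomainHit(fields[0], fields[3], float(fields[13])))
    return hits


def filter_hits(hits: List[DomainHit],
                thresholds: Dict[str, float]) -> List[DomainHit]:
    """Keep hits whose bit score reaches the threshold of their own model.

    Models without a threshold entry are dropped (an assumption of this
    sketch).
    """
    return [h for h in hits
            if h.model in thresholds and h.bit_score >= thresholds[h.model]]
```

Unlike an e-value cutoff, a fixed bit-score cutoff gives the same decision for a given protein regardless of how many sequences are searched alongside it.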

In PlantTFDB 2.0, we adopted a somewhat more stringent definition, namely that TFs are ‘proteins that show sequence-specific DNA binding and are capable of activating or/and repressing transcription’ (19). We made an extensive literature review and refined the rule-based classification scheme accordingly (Figure 1 and Supplementary Table S5). We excluded families that do not meet the above criteria (Supplementary Table S6), including transcription cofactors and chromatin-related proteins such as remodeling factors, histone demethylases, DNA methyltransferases and histone acetyltransferases. Families such as TUBBY-like and Alfin-like were also removed because their status as TFs has been questioned or disproved by new experimental evidence. On the other hand, five newly identified TF families (DBB, FAR1, LSD, NF-X1, STAT) were added. Owing to differences in domain composition, DNA-binding specificity and function, AP2/ERF and HB were divided into subfamilies. The M type of MADS TFs was classified as a new subfamily, since it has been reported that some M-type MADS-box genes could be pseudogenes or a new class of transposable element (19). Finally, we predicted 53 319 TFs from 49 species and classified them into 58 families (Tables 1 and 2, Supplementary Tables S7 and S8) using the refined pipeline.
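
A rule-based assignment of this kind, built from required, auxiliary and forbidden domains, can be sketched in Python as follows. The family and domain names are placeholders rather than the actual rules of Figure 1 and Supplementary Table S5, and returning the first matching rule is an assumption of the sketch.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class FamilyRule(NamedTuple):
    family: str
    required: FrozenSet[str]   # signature DBD(s) plus auxiliary domains
    forbidden: FrozenSet[str]  # domains that exclude the protein


# Placeholder rules for illustration only.
RULES: List[FamilyRule] = [
    # DBD_1 together with auxiliary domain AUX_1 defines FamilyA ...
    FamilyRule("FamilyA", frozenset({"DBD_1", "AUX_1"}), frozenset()),
    # ... while DBD_1 without AUX_1 defines FamilyB.
    FamilyRule("FamilyB", frozenset({"DBD_1"}), frozenset({"AUX_1"})),
]


def assign_family(domains: Set[str],
                  rules: List[FamilyRule] = RULES) -> Optional[str]:
    """Return the first family whose rule the domain set satisfies."""
    for rule in rules:
        has_required = rule.required <= domains
        has_forbidden = bool(rule.forbidden & domains)
        if has_required and not has_forbidden:
            return rule.family
    return None  # no signature DBD found: not called as a TF
```

For example, `assign_family({"DBD_1", "AUX_1"})` yields `"FamilyA"`, whereas a protein carrying only `DBD_1` is assigned to `"FamilyB"`.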

Figure 1.
Family assignment rules used to identify and assign TFs into different families. Green ellipses represent TF families, and red rectangles denote DBDs. Blue and purple rectangles denote auxiliary and forbidden domains, respectively. Green solid lines link ...
Table 1.
Summary of TFs identified from species with genome sequences
Table 2.
Summary of TFs identified from species without genome sequences

Comprehensive annotation for plant TFs

Comprehensive and accurate annotations derived from various sources provide valuable clues for further functional analysis. Based on our established annotation pipeline, we performed systematic annotation for each family and individual TF.

The main page of each family has a distribution chart showing the number of TFs from each species in that family. The brief introduction and key references for each family were updated based on a literature survey. Multiple sequence alignments for the DBDs of each family, either within individual species or among species, can be viewed as WebLogo pictures or downloaded as text files. Phylogenetic trees can be displayed online or downloaded in Nexus format. Intra-species phylogenetic trees for each TF family were inferred by MrBayes (v3.2) (20) using the Dayhoff substitution model with 50 000 generations, and FastTree 2.1 (21) was employed to construct inter-species trees with 100 resamplings. Annotations at the individual TF level contain general information, domain architecture, gene ontology, PDB hits, expression profiles, cross-references to other databases, ortholog groups, literature citations and links to other useful resources.

Improvement of user interface

We have redesigned the web interface for PlantTFDB 2.0, which now has a uniform interface for all species. Users can browse individual TFs of different families for each species by simply clicking the unique ID assigned to each TF. The text search page has been greatly improved and gives users much more flexibility for advanced searches. Users can select several species from the same or different lineages within the species tree to search for TFs in one or more families. Several query conditions can be combined in a single search, including general descriptions, protein properties such as the range of sequence length, tissues of gene expression and different annotation fields of TF entries. Users can also customize the search results and save them in various formats for further processing.
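
The way such query conditions combine can be sketched in Python as a conjunction of optional filters. This is only an illustration of the search logic, not the database's implementation; the record fields, function name and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TFEntry:
    tf_id: str
    species: str
    family: str
    length: int                           # protein length in residues
    tissues: FrozenSet[str] = frozenset() # tissues with reported expression


def search_tfs(entries: Iterable[TFEntry],
               species: Optional[Set[str]] = None,
               families: Optional[Set[str]] = None,
               length_range: Optional[Tuple[int, int]] = None,
               tissue: Optional[str] = None) -> List[TFEntry]:
    """Return entries satisfying every condition that was supplied."""
    results = []
    for entry in entries:
        if species is not None and entry.species not in species:
            continue
        if families is not None and entry.family not in families:
            continue
        if length_range is not None and not (
                length_range[0] <= entry.length <= length_range[1]):
            continue
        if tissue is not None and tissue not in entry.tissues:
            continue
        results.append(entry)
    return results
```

A condition left as `None` is ignored, so `search_tfs(entries, families={"WRKY"}, length_range=(250, 400))` restricts by family and length but not by species or tissue.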

While accessing the resource through a web browser is easy and intuitive for most users, a web service is a more efficient way for advanced users to access the data and integrate them into their own sites. We implemented a standard web service interface for PlantTFDB 2.0 (http://planttfdb.cbi.pku.edu.cn/webservice/server.php). A demo client implemented in PHP is available to help users become familiar with the web service interface (http://planttfdb.cbi.pku.edu.cn/webservice_client/client.php).

FURTHER DIRECTION

In conclusion, PlantTFDB 2.0 is not only an extensive update of the previous version, with 29 newly released completed genomes and updated data sets, but also a substantial improvement of the user interface. The pipelines we developed for the prediction of TFs at genome scale and the scheme we defined to classify TF families in plants may provide the user community with useful tools. We will continue this project and further update and improve PlantTFDB in the future.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

China 863 (2007AA02Z165), 973 (2007CB946904) and NSFC (31071160) programs. Funding for open access publication: China NSFC (31071160) program.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank JGI for genome annotations of 10 unpublished species and MGSC for Medicago truncatula data. We appreciate critical comments from all users.

REFERENCES

1. Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, et al. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science. 2000;290:2105–2110.
2. Perez-Rodriguez P, Riano-Pachon DM, Correa LG, Rensing SA, Kersten B, Mueller-Roeber B. PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic Acids Res. 2010;38:D822–D827.
3. Richardt S, Lang D, Reski R, Frank W, Rensing SA. PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. Plant Physiol. 2007;143:1452–1466.
4. Yilmaz A, Nishiyama MY, Jr, Fuentes BG, Souza GM, Janies D, Gray J, Grotewold E. GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol. 2009;149:171–180.
5. Fredslund J. DATFAP: a database of primers and homology alignments for transcription factors from 13 plant species. BMC Genomics. 2008;9:140.
6. Palaniswamy SK, James S, Sun H, Lamb RS, Davuluri RV, Grotewold E. AGRIS and AtRegNet. a platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol. 2006;140:818–829.
7. Iida K, Seki M, Sakurai T, Satou M, Akiyama K, Toyoda T, Konagaya A, Shinozaki K. RARTF: database and tools for complete sets of Arabidopsis transcription factors. DNA Res. 2005;12:247–256.
8. Mochida K, Yoshida T, Sakurai T, Yamaguchi-Shinozaki K, Shinozaki K, Tran LS. LegumeTFDB: an integrative database of Glycine max, Lotus japonicus and Medicago truncatula transcription factors. Bioinformatics. 2010;26:290–291.
9. Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X, Timko MP. TOBFAC: the database of tobacco transcription factors. BMC Bioinformatics. 2008;9:53.
10. Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J. DATF: a database of Arabidopsis transcription factors. Bioinformatics. 2005;21:2568–2569.
11. Gao G, Zhong Y, Guo A, Zhu Q, Tang W, Zheng W, Gu X, Wei L, Luo J. DRTF: a database of rice transcription factors. Bioinformatics. 2006;22:1286–1287.
12. Zhu QH, Guo AY, Gao G, Zhong YF, Xu M, Huang M, Luo J. DPTF: a database of poplar transcription factors. Bioinformatics. 2007;23:1307–1308.
13. Guo AY, Chen X, Gao G, Zhang H, Zhu QH, Liu XC, Zhong YF, Gu X, He K, Luo J. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res. 2008;36:D966–D969.
14. Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V. PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res. 2008;36:D959–D965.
15. Ouyang S, Thibaud-Nissen F, Childs KL, Zhu W, Buell CR. Plant genome annotation methods. Methods Mol. Biol. 2009;513:263–282.
16. Pruitt KD, Tatusova T, Klimke W, Maglott DR. NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res. 2009;37:D32–D36.
17. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2010;38:D5–D16.
18. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222.
19. Riechmann J. Transcription factors of Arabidopsis and rice: a genomic perspective. In: Grasser K, editor. Regulation of Transcription in Plants. Oxford: Wiley-Blackwell; 2006. pp. 28–53.
20. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574.
21. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press