Laboratory of Gene-Product Informatics, Center for Information Biology-DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima, Shizuoka, Japan.
Identifying all transcription factors (TFs) from genome sequence data is not straightforward because TFs vary widely among species. Most members of TF families contain a DNA binding domain (DBD) and a contiguous non-DBD, which together form a characteristic SCOP or Pfam domain combination. We found that most experimentally verified TFs in prokaryotes are detectable by the combination of SCOP or Pfam domains assigned to their DBDs and non-DBDs. Based on this finding, we set up rules to detect TFs and classify them into 52 TF families. Applying the rules to 154 completely sequenced prokaryotic genomes detected >18,000 TFs classified into families, which have been made publicly available in the 'GTOP_TF' database. Although the number of TFs per genome is roughly proportional to genome size, species with reduced genomes, i.e. obligate parasites and symbionts, have few if any TFs, reflecting a nearly complete loss. The number of TFs is also significantly lower in archaea than in bacteria. In addition, all but 1 of the 19 TF families present in archaea are also present in bacteria, whereas 33 TF families are found exclusively in bacteria. This observation indicates that a number of new TF families have evolved in bacteria, making the transcription regulatory system more divergent in bacteria than in archaea.
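
The rule-based detection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule table, the family names and the domain identifiers (`DBD_1`, `SENSOR_1`, and so on) are hypothetical placeholders, and "contiguous" is assumed here to mean that the non-DBD is the immediate neighbour of the DBD in the protein's domain order.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    family: str    # TF family assigned when the rule matches (hypothetical name)
    dbd: str       # identifier of the DNA binding domain (hypothetical)
    non_dbd: str   # identifier of the required contiguous non-DBD (hypothetical)


# Hypothetical rule table; real rules would use SCOP or Pfam identifiers.
RULES = [
    Rule("FamilyA", "DBD_1", "SENSOR_1"),
    Rule("FamilyB", "DBD_1", "LIGAND_2"),
    Rule("FamilyC", "DBD_2", "SENSOR_1"),
]


def classify(domains: list[str]) -> Optional[str]:
    """Return the TF family for a protein, or None if no rule matches.

    `domains` lists the assigned domain identifiers in N- to C-terminal order.
    """
    for rule in RULES:
        for i, domain in enumerate(domains):
            if domain != rule.dbd:
                continue
            # Assumption: "contiguous" means directly adjacent to the DBD.
            neighbours = domains[max(i - 1, 0):i] + domains[i + 1:i + 2]
            if rule.non_dbd in neighbours:
                return rule.family
    return None
```

Under these assumptions, `classify(["SENSOR_1", "DBD_1"])` matches the first rule, whereas a protein whose DBD and non-DBD are separated by an unrelated domain matches none.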