Clustering of Type II REase sequences and their assignment to three-dimensional folds. (A) Representative structures of nuclease domains of Type II REases or of proteins sharing the same fold. PD-(D/E)XK: BamHI (3bam), with the universally conserved core shown in green and nonconserved structures in gray; HNH: catalytic domain of T4 endonuclease VII (1en7); PLD: catalytic domain of R.BfiI (2c1l); GIY-YIG: catalytic domain of homing endonuclease I-TevI (1mk0); HALFPIPE: R.PabI (2dvy). (B) Results of clustering Type II REases from REBASE and their homologs in the nr and env_nr databases with CLANS (promiscuous domains, such as MTase or GHKL domains, were excluded from the analysis). Structures in (A) and sequences in (B) are colored according to their assignment to fold families (see below): PD-(D/E)XK: green, HNH: blue, GIY-YIG: yellow, PLD: magenta, HALFPIPE: cyan, unclassified: red. Connections between dots represent the degree of pairwise sequence similarity, as quantified by BLAST P-values (the darker the line, the higher the similarity). The whole ‘galaxy’ of REases is held together by a certain level of ‘background’ similarity between different (often unrelated) sequences that is due to pure chance. Thus, while connections within dense clusters practically always reflect high similarity and an evolutionary relationship, connections between clusters do not necessarily reflect phylogenetic relationships (although they often do, especially in the case of close connections with multiple dark lines). All subfamilies with >20 members, or with representatives that have solved X-ray structures, are labeled with the name of their representative sequence.
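
The relationship between BLAST P-values and line darkness described in (B) can be illustrated with a minimal sketch. This is not the CLANS implementation. The function names `edge_weight` and `edge_shade`, the P-value cutoff of 1e-3, and the saturation value of 1e-100 are hypothetical choices made only for illustration. The sketch assumes that similarity is scored as the negative base-10 logarithm of the P-value and that pairs above the cutoff are not connected.

```python
import math

def edge_weight(p_value, floor=1e-200):
    """Similarity score as -log10(P); a larger score means higher similarity.

    `floor` (an assumed value) guards against log10(0) for P-values
    reported as exactly zero.
    """
    return -math.log10(max(p_value, floor))

def edge_shade(p_value, cutoff=1e-3, strongest=1e-100):
    """Map a P-value to a gray level in [0, 1]: 0 is black, 1 is white.

    Returns None when the P-value is above `cutoff`, i.e. no line is drawn.
    Both `cutoff` and `strongest` are illustrative assumptions.
    """
    if p_value > cutoff:
        return None
    w_min = edge_weight(cutoff)
    w_max = edge_weight(strongest)
    # Clamp so that anything at or below `strongest` is drawn fully black.
    w = min(edge_weight(p_value), w_max)
    return 1.0 - (w - w_min) / (w_max - w_min)

if __name__ == "__main__":
    for p in (0.5, 1e-3, 1e-10, 1e-50, 1e-120):
        print(p, edge_shade(p))
```

Under these assumptions, a pair with a P-value of 1e-50 receives a darker line than a pair with 1e-10, and weak chance-level hits just below the cutoff produce the pale ‘background’ connections that hold unrelated clusters together.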