Nucleic Acids Res. Jun 2008; 36(11): 3552–3569.
Published online May 2, 2008. doi:  10.1093/nar/gkn175
PMCID: PMC2441816

Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses

Abstract

For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed ‘half-pipe’. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. 
These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.

INTRODUCTION

Type II restriction endonucleases (REases) are enzymes that recognize short DNA sequences (usually 4–8-bp long) and cleave the target in both strands at, or in close proximity to, the recognition site. Orthodox REases are homodimeric, cleave within palindromic sequences, require Mg2+ ions and can act on single copies of their targets. Type II enzymes that exhibit structural and functional peculiarities (requirement of more than one target site for cleavage, cleavage at a distance from the asymmetrical target, etc.) have been classified into subtypes [nomenclature reviewed in ref. (1)]. Because of their remarkably high specificity in recognizing and cleaving their target sequences, Type II REases are of high interest as model systems for analyzing protein-DNA interactions and are among the most frequently used tools for recombinant DNA technology [most recent reviews: (2,3); a comprehensive collection of reviews on REases has also been published as a book (4)]. In nature, Type II REases are found in prokaryotic organisms, where they form so-called Restriction-Modification (RM) systems with DNA methyltransferases (MTases) of the same or very similar substrate specificity. DNA MTases use S-adenosylmethionine (AdoMet) as a methyl group donor to modify specific bases in the target sequence, thereby rendering it resistant to cleavage by the REase. Thus, while the RM system's own DNA (together with the whole DNA of the prokaryotic host) is protected against suicidal degradation by the REase, any foreign DNA that invades the host cell and lacks protective methylation (e.g. phages, plasmids, etc.) may be efficiently destroyed (5). To distinguish the components of an RM system, the names of the MTase and REase are preceded by the prefixes ‘M.’ and ‘R.’, respectively (e.g. M.FokI and R.FokI).

Type II REases have a very high specificity and simple substrate requirements, which makes them very popular as tools in biotechnology. There are other classes of REases (Types I, III and IV): multisubunit, complex molecular machines that may combine multiple activities including restriction, methylation and DNA translocation, require additional cofactors (e.g. AdoMet, ATP or GTP), bind more than one target site, and cleave outside the recognition sequence, often at a random distance. Comparative analysis of these enzymes is outside the scope of this article; the reader is referred to recent review articles for a survey and summary of their functional properties (4,6,7). A wealth of information about all REases, including sequences, structures and functional annotations, is stored in the dedicated database REBASE (8).

Since the first genes encoding Type II REases were cloned and sequenced, comparisons have been made, aimed at detecting similarities indicative of common evolutionary history and/or mechanism of action (9–11). Surprisingly, these analyses revealed very little sequence similarity, usually limited to groups of isoschizomers, i.e. enzymes that exhibit identical DNA recognition sites and cleavage specificities (11,12). Database searches with REase sequences typically revealed either no significant similarity to any protein, or very high similarity (often >90% identity) to a few isoschizomers, and no similarity to other proteins. This strongly biased distribution of similarities and dissimilarity made comparative sequence analysis of all REases impossible with the use of standard tools for sequence alignment and raised a question whether the diversity of amino acid sequences of REases indicates polyphyletic evolution (convergence) or extreme divergence from a common ancestor (5,13).

The first answer to the question whether or not REases are related to each other was provided by crystallographic analyses. Already the first two structures of REases with apparently dissimilar sequences [R.EcoRI (14) and R.EcoRV (15)] revealed a common three-dimensional fold and similar active sites (16), which indicates that they are evolutionarily related and that the overall sequence dissimilarity is due to divergent evolution (homology) rather than convergence (analogy). Essentially the same features were repeatedly observed in all crystal structures of Type II REases, at least until 2005, and in many other nucleases involved in a variety of cellular processes, e.g. DNA repair enzyme MutH or Holliday junction resolvases (17,18). Catalytic domains of these proteins share a common structural core, comprising a mixed β-sheet of 4 strands flanked on both sides by α-helices and additional, variable elements of secondary structure (16,19–21). The core serves as a scaffold for a weakly conserved active site, typically comprising two or three acidic residues (Asp or Glu) and one Lys residue, which together form the hallmark bipartite catalytic motif (P)DXn…(D/E)XK (where X is any amino acid). This motif has led to naming this superfamily of proteins as ‘PD-(D/E)XK’ (22,23).
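The bipartite motif described above can be written as a text pattern. The following is a minimal sketch, assuming a spacer of 5 to 40 residues between the first Asp and the (D/E)XK half-motif (the spacer bounds and the toy sequence are illustrative assumptions, not values taken from this article). Because the active site is only weakly conserved, a literal scan of this kind will both miss genuine family members and report spurious matches, so it illustrates the notation rather than a usable detection method.

```python
import re

# Assumption: the spacer "Xn" is limited to 5-40 residues. The article does
# not give a range, so these bounds are purely illustrative.
# The lookahead lets overlapping candidates be reported from every start.
MOTIF = re.compile(r"(?=(D.{5,40}?[DE].K))")


def find_motif_candidates(sequence):
    """Return (start_index, matched_text) for each D...(D/E)XK candidate."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence)]


# Made-up toy sequence for illustration only; it is not a real enzyme.
toy_sequence = "MSTPD" + "A" * 10 + "EVKLL"
candidates = find_motif_candidates(toy_sequence)
print(candidates)
```

The lazy quantifier returns the shortest candidate from each starting Asp; a real analysis would instead rely on profile and structure-based comparisons.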

It was found that some members of the PD-(D/E)XK superfamily exhibit deviations from the consensus. First, the active sites of Type II REases often contain nonstandard residues at the otherwise conserved positions, e.g. Q or N at the positions occupied by the (D/E)XK half-motif (24,25). Second, catalytic residues have been also found to ‘migrate’ between nonequivalent positions in sequence, preserving the spatial orientation of functional groups in the active site without the correspondence at the level of the sequence alignment (26–28). These two features have been also reported in some non-REase members of the PD-(D/E)XK superfamily (23,29,30), but when combined with the extreme overall sequence divergence characteristic for REases, they essentially prevent the identification of an active site by ‘sequence gazing’. As a result, sequence–function analysis usually requires the aid of three-dimensional structure (ideally—solved experimentally, or obtained by comparative modeling techniques).

Type II REases are notorious for presenting elaborations of the common fold in the form of large insertions and terminal extensions that often contain regular elements of secondary structure, even entire domains. These elaborations form a variable ‘shell’ surrounding the conserved core and are often involved in DNA binding or formation of contacts between protomers in oligomeric structures. They may be responsible for the formation of completely different quaternary structures even by enzymes that are very similar at the level of tertiary structure, e.g. R.EcoRV and R.BglI (31). In a phylogenetic tree of PD-(D/E)XK enzymes with known structures, Type II REases radiate from all major branches of the superfamily, indicating multiple independent recruitment of the same fold to the process of restriction. The accumulation of a large number of changes suggests higher speed of evolution associated with being involved in restriction, compared to other PD-(D/E)XK enzymes involved in house-keeping processes such as DNA repair (20). Type II REases are therefore extremely hard targets for protein structure prediction methods, and even detection of the PD-(D/E)XK motif in their sequence remains a formidable challenge (20,32).

Not all REases, however, are members of the PD-(D/E)XK superfamily. In 2000, three groups discovered a few REases that appeared to be members of structurally and evolutionarily unrelated superfamilies: Siksnys and co-workers discovered that R.BfiI belongs to the phospholipase D (PLD) superfamily (33), the group of Koonin and independently one of the authors of this article (J.M.B.) predicted that a few REases belong to the HNH superfamily (34,35); J.M.B. also predicted that R.Eco29kI and its two nearly identical isoschizomers belong to the GIY-YIG superfamily (35). Since then, all these theoretical predictions have been confirmed experimentally. The structure of R.BfiI has been solved, revealing a PLD-like dimer of catalytic domains with a single symmetrical active site at the domain interface (36). Structural models of HNH nuclease domain in R.KpnI (37) and GIY-YIG nuclease domain in R.Eco29kI (38) have been supported by mutagenesis and biochemical experiments. Most recently, a newly identified REase R.PabI was predicted to be a candidate for a new fold (39), which has been validated by X-ray crystallography and mutagenesis, revealing a novel tertiary and quaternary architecture (40). It must be mentioned that two of these nonstandard enzymes (R.BfiI and R.PabI) exhibit a feature that may be even more unusual than their nonclassical folds: they cleave DNA in the absence of metal ions (33,40). Thus, structurally characterized Type II REases present five unrelated three-dimensional folds, several different variants of active sites and catalytic mechanisms, and a plethora of modes for protein–protein and protein–DNA interactions.

REBASE, the database of restriction enzymes, makes available to the public (as of 25 January 2008) 1637 sequences of Type II REases, including 302 experimentally characterized enzymes and 1335 putative ones, inferred from sequence comparisons or genomic analyses. Many REase candidates are ORFans, i.e. proteins that show no similarity to any other protein (or only very high similarity to a few other proteins). Some of them have been predicted only because they are encoded by genes located close to genes encoding true or predicted DNA MTases. The disproportion between the number of known or predicted sequences and the number of experimentally characterized proteins with known three-dimensional structures (>50 to 1) is similar to the average value reported for sequences inferred from genome sequencing projects. Thus, Type II REases can be regarded as a ‘firing range’ for structural genomics projects, in the sense that any methodology (theoretical or experimental) developed to narrow this gap may be broadly applicable to all proteins. Some efforts have been made in this direction. Bioinformatic analyses have been carried out to assign a fraction of REase sequences to the previously identified folds (22,23,34,35,41), and site-directed mutagenesis has been used, often in connection with circular dichroism (CD) analysis, to test some of these predictions [e.g. (27,42–49)]. Because of the difficulties in predicting variable regions, most of the published alignments and models contain only the catalytic domain, or just the immediate neighborhood of the active site. Nonetheless, these predictions, especially if supported by experimental data, are usually sufficient to provide a confident three-dimensional fold assignment (which implies an evolutionary relationship to other members), and they provide numerous additional hints regarding the possible mechanism of action (e.g. the mode of DNA binding).

Bioinformatic and low-resolution experimental analyses have aided X-ray crystallography in assigning a number of Type II REases to known folds and superfamilies. However, a large fraction of REases remains without any predictions or experimental data. Moreover, there is no single resource that tells a researcher whether any structural or evolutionary prediction has been made for a given REase sequence, what the assigned fold is, where the structural model is available, and whether any experimental data support the theoretical analyses. Currently, navigating the large volume of data and literature concerning different REase structures and families is very difficult, not only for newcomers to the REase field but also for biochemists who are not necessarily experts in molecular evolution or structural bioinformatics but would like to take advantage of published predictions to plan new experiments. We have therefore decided to survey the published literature and databases for experimental data and predictions concerning the structure of all Type II REases with sequences available in REBASE, and to make new predictions for the great majority that had no such information available. We carried out a search for additional homologs of Type II REases not yet available in REBASE, and clustered all sequences to identify groups of close homologs that are likely to share very similar structures as well as substrate specificities (isoschizomers or near-isoschizomers). As a result, we provide the very first classification of all Type II REase sequences into families and superfamilies, and a comprehensive structural census. 
We also provide a list of prospective candidates for crystallographic analyses, with two priorities in mind: (i) maximization of structural coverage (availability of structural templates for confident modeling of a possibly largest number of sequences significantly related to these templates), and (ii) high-resolution structural characterization of folds that are either completely new or at least have not been reported among Type II REases.

METHODS

Sequence analyses

Sequence searches of the nonredundant (nr) and environmental samples (env_nr) databases were carried out using a locally installed version of PSI-BLAST (50). The gapped BLAST algorithm (blastpgp) was used with default parameters [BLOSUM62 substitution matrix, gap open penalty 11, gap extension penalty 1, without iterating and with an expectation (E) value threshold of 0.02].

To identify (sub)families of closely related sequences and visualize similarities within and between all genuine REases and their homologs we used CLANS (CLuster ANalysis of Sequences), a Java utility based on the Fruchterman-Reingold graph layout algorithm (51). CLANS uses the P-values of high-scoring segment pairs (HSPs) obtained from an N × N BLAST search, to compute attractive and repulsive forces between each sequence pair in a user-defined dataset. A 3D or 2D representation is achieved by randomly seeding sequences in the arbitrary distance space. The sequences are then moved within this environment according to the force vectors resulting from all pairwise interactions and the process is repeated to convergence.
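The layout procedure described above can be illustrated with a small force-directed sketch in the spirit of Fruchterman and Reingold. This is not the CLANS implementation: the attraction weight (taken here as -log10 of the P-value, rescaled to at most 1), the linear cooling schedule and all parameter values are assumptions chosen for illustration, and the function name `force_layout` is hypothetical.

```python
import math
import random


def force_layout(n, pvalues, iterations=300, k=1.0, seed=0):
    """Place n sequences in 2D; pvalues maps (i, j) with i < j to a P-value."""
    rng = random.Random(seed)
    # Random seeding of the sequences in an arbitrary distance space.
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    # Assumption: attraction weight is -log10(P), rescaled so the best hit is 1.
    raw = {pair: -math.log10(max(p, 1e-300)) for pair, p in pvalues.items()}
    top = max(raw.values(), default=0.0) or 1.0
    weight = {pair: w / top for pair, w in raw.items()}

    for step in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                # Every pair repels (k^2/d); only pairs with a BLAST hit
                # attract (w * d^2 / k). A positive f pushes the pair apart.
                f = k * k / d - weight.get((i, j), 0.0) * d * d / k
                disp[i][0] += f * dx / d
                disp[i][1] += f * dy / d
                disp[j][0] -= f * dx / d
                disp[j][1] -= f * dy / d
        # Linear cooling caps how far a point may move in one iteration.
        temperature = 0.1 * (1.0 - step / iterations)
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1])
            if length > 0.0:
                scale = min(length, temperature) / length
                pos[i][0] += disp[i][0] * scale
                pos[i][1] += disp[i][1] * scale
    return pos


def mean_distance(pos, pairs):
    """Average Euclidean distance over the given index pairs."""
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)


# Toy input (made up): two groups of three mutually similar sequences.
hits = {(0, 1): 1e-50, (0, 2): 1e-50, (1, 2): 1e-50,
        (3, 4): 1e-50, (3, 5): 1e-50, (4, 5): 1e-50}
coords = force_layout(6, hits)
within = mean_distance(coords, list(hits))
between = mean_distance(coords, [(i, j) for i in range(3) for j in range(3, 6)])
print(within < between)
```

In this toy case the two groups of mutually similar sequences end up closer internally than to each other, which is the visual effect that makes clusters recognizable in such a layout.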

Groups of two or more sequences that formed clusters were extracted from the CLANS output and aligned using MUSCLE (52). In cases of low sequence similarity, alignments were also constructed with other programs, MUMMALS (53), MAFFT (54) and PROBCONS (55), and checked for consistency. Sequences of REase homologs that could be aligned to true REases but exhibited large deletions (>30% of the alignment missing) were discarded. Manual adjustments were introduced into the alignments to preserve the continuity of secondary structure elements, either observed in crystal structures of representative family members, or predicted computationally (see below).
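The deletion filter can be sketched as follows. This is a minimal illustration that assumes "missing" means the fraction of gap characters in a sequence's row of the multiple alignment; the article does not spell out the exact definition, and the function names and toy alignment are hypothetical.

```python
def missing_fraction(aligned_seq, gap_chars="-."):
    """Fraction of alignment columns in which this sequence has a gap."""
    if not aligned_seq:
        raise ValueError("aligned sequence must not be empty")
    return sum(c in gap_chars for c in aligned_seq) / len(aligned_seq)


def drop_truncated(alignment, max_missing=0.30):
    """Keep only rows with at most max_missing of the alignment missing."""
    return {
        name: seq
        for name, seq in alignment.items()
        if missing_fraction(seq) <= max_missing
    }


# Made-up toy alignment; names and residues are illustrative only.
toy_alignment = {
    "ref": "MKTAYIAKQR",
    "ok": "MKTA-IAKQR",    # 10% missing, kept
    "frag": "MKT-----QR",  # 50% missing, discarded
}
kept = drop_truncated(toy_alignment)
print(sorted(kept))
```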

Domain assignment for proteins was performed mainly by Conserved Domain Database search service (56) with default parameters. Additional searches were made using HHPRED (57) against the database of all available sequence profiles. If a reliable multiple sequence alignment for a given sequence was available (see above), it was used as a query instead of a single sequence.

Structure prediction

Protein structure prediction was carried out using a new version (http://genesilico.pl/meta2/) of the GeneSilico MetaServer (58), which is a gateway for a variety of methods for making predictions and analyzing their results. For each REase subfamily, at least one representative sequence was submitted, and often additional predictions were made for individual domains, other members and whole alignments. Secondary structure was predicted using a consensus of PSIPRED (59), PROFsec (60), PROF (61), SABLE (62), JNET (63), JUFO (64), PORTER (65), SSPRO2 (66) and SAM-T02 (67). Solvent accessibility for individual residues was predicted with SABLE (62), ACCPRO2 (66) and JNET (63). The fold-recognition (FR) analysis (attempt to match the query sequence to known protein structures) was carried out using a series of methods: PDB-BLAST [local implementation of a PSI-BLAST (50) search against sequences of proteins from PDB], HHSEARCH (68), FORTE (69), SAM-T02 (67), 3DPSSM (70), INBGU (71), FUGUE (72), mGENTHREADER (73) and SPARKS (74). Target-template alignments reported by these methods were compared, evaluated and ranked by the PCONS server (75) to identify the preferred template.
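One simple way to combine several secondary structure predictions is a per-residue majority vote. The sketch below assumes three-state strings (H, E, C) of equal length and resolves ties as coil; the consensus scheme actually used by the GeneSilico MetaServer is not described in this article, so both the voting rule and the tie-break are assumptions.

```python
from collections import Counter


def consensus_ss(predictions):
    """Per-residue majority vote over equal-length H/E/C prediction strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        winners = [state for state, count in counts.items() if count == best]
        # Assumption: an ambiguous position is reported as coil ("C").
        consensus.append(winners[0] if len(winners) == 1 else "C")
    return "".join(consensus)


# Made-up predictions from three hypothetical methods for a 5-residue stretch.
toy_predictions = ["HHHEC", "HHCEC", "HCCEE"]
toy_consensus = consensus_ss(toy_predictions)
print(toy_consensus)
```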

We have not attempted to build three-dimensional models for all REases, as currently this analysis is too demanding; it usually requires iterative comparative modeling of the core and model evaluation, often accompanied by de novo folding of variable parts, with a lot of manual intervention and time-consuming calculations, which can take weeks or even months per protein [see previously published examples, e.g. (76)]. The alignments published in this work will, however, serve as a convenient starting point for building complete models in the future, when experimental data to directly test the models become available and it becomes worthwhile to invest the time and computing power.

RESULTS

Identification of known and putative REases

We retrieved 1637 sequences of all Type II REases (genuine and putative enzymes, including sequences from metagenomics projects) from REBASE (edition 25 January 2008). For these sequences, we carried out preliminary clustering with CLANS (51), to detect groups of proteins exhibiting BLAST P-value <0.001 in pairwise comparisons (see Methods section for details). The results (data not shown) revealed four large clusters of 471, 221, 125 and 42 sequences, comprising all experimentally characterized Type IIC enzymes (including Type IIG and Type IIB) and their closest homologs, and a large number of very small clusters and ORFans. By definition, all known Type IIC enzymes possess a nuclease domain and a DNA:m6A MTase domain in the same polypeptide. While the nuclease domains exhibit relatively low similarity (characteristic for REases of all types), the MTase domains exhibit very high sequence conservation (typical for MTases), leading all Type IIC enzymes to cluster together, regardless of the presence or absence of similarity between the non-MTase parts of their sequences. Preliminary clustering also revealed several smaller clusters of proteins that shared sequence similarity in various kinds of non-nuclease domains [such as the GHKL domain common to the ATPase/kinase superfamily (77) or the DEXDc helicase domain], but no similarity in known or predicted nuclease domains.
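Grouping sequences by a pairwise P-value cutoff can be sketched as single-linkage clustering, in which any pair below the threshold joins two groups. This is an illustrative assumption about how groups are formed, not a description of CLANS internals; the sequence names and P-values below are made up. The example also shows why a shared, highly conserved domain pulls otherwise dissimilar proteins into one cluster: a single strong link is enough.

```python
def cluster_by_pvalue(names, hits, threshold=1e-3):
    """Single-linkage clusters: join a and b whenever P(a, b) < threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in hits.items():
        if p < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Largest clusters first; ties are broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical names and P-values for illustration only.
toy_names = ["seqA", "seqB", "seqC", "seqD", "seqE"]
toy_hits = {("seqA", "seqB"): 1e-20, ("seqB", "seqC"): 1e-5,
            ("seqC", "seqD"): 0.5}
toy_clusters = cluster_by_pvalue(toy_names, toy_hits)
print(toy_clusters)
```

Here seqA and seqC share no direct hit but fall into the same cluster through seqB, while seqD and seqE remain singletons (ORFans in the terminology of this article).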

In order to cluster Type II REase sequences only with respect to similarity of their nuclease domains, we decided to identify all domains in sequences from REBASE and create a set of sequences from which all conserved non-nuclease domains have been deleted. This was made by retrieving sequences from sequence clusters, making multiple sequence alignments, assigning domains by CDD and HHPRED (see Methods section), followed by deletion of assigned non-nuclease domains. If necessary, additional subclustering and domain assignment was done for each cluster. We omitted very short sequences (<50 aa, e.g. from peptide sequencing), identical sequences and those lacking nuclease domains (e.g. due to truncation); this included partial sequences of some experimentally characterized enzymes, e.g. Aor13HI or PvuI.
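The two simplest exclusion rules above (very short sequences and identical sequences) can be sketched as below. Only the 50 aa cutoff comes from the text; the function name and toy records are hypothetical, and the check for a missing nuclease domain is omitted because it depends on the domain assignments.

```python
def filter_sequences(records, min_length=50):
    """Drop sequences shorter than min_length and exact duplicates.

    records is an iterable of (name, sequence) pairs; the first occurrence
    of each distinct sequence is kept.
    """
    seen = set()
    kept = []
    for name, seq in records:
        if len(seq) < min_length:
            continue  # e.g. short fragments from peptide sequencing
        if seq in seen:
            continue  # identical sequence already retained
        seen.add(seq)
        kept.append((name, seq))
    return kept


# Made-up records for illustration only.
toy_records = [
    ("full1", "M" * 120),
    ("dup_of_full1", "M" * 120),
    ("peptide", "MKTAYIAK"),
    ("full2", "A" * 60),
]
toy_kept = [name for name, _ in filter_sequences(toy_records)]
print(toy_kept)
```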

To identify additional homologs not present in REBASE, we carried out BLAST searches of the nr database and environmental samples database (env_nr) using all Type II REase sequences (without conserved non-nuclease domains). For all BLAST hits, we performed domain assignment with the same procedure as for sequences from REBASE. Likewise, non-nuclease domains were identified and removed. As a result, we obtained a set of 3132 sequences in two categories: one comprising full-length sequences, and the other with promiscuous domains removed (i.e. REases comprising either exclusively nuclease domains, or nuclease domains with extensions that did not exhibit high similarity to domains in non-REase proteins). The latter set will be referred to as the ‘nuclease domain’ set for simplicity.

Classification of REases

The nuclease domain dataset was clustered using CLANS (Figure 1), which allowed us to classify all Type II REases into 190 subfamilies that contain mutually related proteins and ORFans that exhibit no easily detectable similarity of nuclease domain to proteins from other subfamilies. The distribution of size of these 190 subfamilies is shown in Figure 2.

Figure 1.
Clustering of Type II REase sequences and their assignment to three-dimensional folds. (A) Representative structures of nuclease domains of Type II REases or proteins sharing the same fold: PD-(D/E)XK: BamHI (3bam); the universally conserved core is indicated ...
Figure 2.
The distribution of size (number of members) among REase subfamilies. Seventy-seven subfamilies (41% of all subfamilies) contain < 5 sequences, which makes it very difficult to analyze the patterns of sequence conservation and e.g. identify invariant ...

For all confirmed and putative Type II REases in our dataset, we carried out an extensive survey of the published literature and databases to identify experimental data, structural predictions, sequence analyses and phylogenetic studies. Our aim was to collect all experimental data and reasonable predictions that could provide hints to the structural and evolutionary classification of Type II REases, i.e. assignment of sequences to structural folds, grouping of subfamilies into families and families into superfamilies. We were able to identify published crystallographic evidence for members of 23 subfamilies, published structural prediction supported by experiment (e.g. mutagenesis) for members of additional 20 subfamilies and published predictions that have so far not been tested for additional 21 subfamilies. For 126 subfamilies we could find neither experimental data nor reliable predictions, which made them priority targets for our structure prediction methods. Based on analysis of all types of data available as well as the results of our preliminary sequence analyses, we named each subfamily after one representative enzyme, which in our subjective opinion was best studied from the structural or functional point of view or which exhibited features that were most typical for a given subfamily.

For 126 subfamilies that comprised structurally uncharacterized proteins and for any of the previously mentioned subfamilies where we had any doubts about the correctness of the published structural assignments, we carried out structure prediction via the GeneSilico MetaServer (58) using the protein Fold Recognition (FR) approach (see Methods section). The interpretation of FR results and selection of the best template was aided by analyzing the patterns of residue conservation in the light of predicted secondary structure, both in the target subfamily and in the putative templates. In a few particularly difficult cases the fold prediction was aided by building three-dimensional models using the FRankenstein's Monster approach (78,79) and analysis of sequence–structure compatibility in 3D using a series of Model Quality Assessment Programs (80) (see Methods section for details). The FR analysis allowed us to predict 3D folds and identify putative homology between Type II REase subfamilies and proteins of known structure, including all previously solved structures of Type II REases and their homologs. We have also used HHSEARCH (68) to perform a series of pairwise profile-to-profile comparisons for all alignments of subfamilies represented as profile hidden Markov models (HMMs) that include information both about sequence conservation (if more than one sequence is available) and secondary structure predicted by PSIPRED (59). This type of analysis allowed us to identify putative homology between different Type II REase subfamilies, including those for which no experimental structural information is available. Combination of structure and sequence-oriented searches allowed us to make fold predictions based on the principle of transitivity of homology. 
For example, if subfamily A was found to be homologous to subfamily B, and the same sequence region in subfamily B that was matched with subfamily A was also found to match a structure of a known fold characteristic for subfamily C, then subfamily A was predicted to be homologous to subfamily C regardless of the absence of a direct match.
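The transitivity rule hinges on the two matches covering the same region of the intermediate subfamily B. A minimal sketch of that check follows; the minimum overlap of 30 residues, the coordinates and the function names are illustrative assumptions, since the article does not state a numerical criterion.

```python
def overlap(region_a, region_b):
    """Number of residues shared by two inclusive (start, end) regions."""
    return max(0, min(region_a[1], region_b[1])
               - max(region_a[0], region_b[0]) + 1)


def infer_folds(a_on_b, b_fold_hits, min_overlap=30):
    """Folds transferable from B to A by transitivity.

    a_on_b      -- (start, end) of the region of B that matched subfamily A
    b_fold_hits -- list of (fold_name, (start, end)) matches of B to
                   structures of known fold, in B's coordinates
    """
    return sorted({
        fold
        for fold, region in b_fold_hits
        if overlap(a_on_b, region) >= min_overlap
    })


# Made-up coordinates: A matches residues 40-160 of B; B matches a
# PD-(D/E)XK structure over 50-170 and an unrelated HTH domain over 200-260.
toy_folds = infer_folds((40, 160), [("PD-(D/E)XK", (50, 170)),
                                    ("HTH", (200, 260))])
print(toy_folds)
```

Only the fold whose match overlaps the A-B region is transferred; the HTH match lies in a different part of B and therefore says nothing about A.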

As a result of the aforementioned analyses, we confirmed all previously reported 3D fold predictions, and made new predictions for 52 subfamilies. Thus, as a result of our survey, we assigned three-dimensional folds to 1528 Type II REase sequences and their homologs based on previously published analyses and our alignments, and we made new predictions about the fold and active site for 1027 Type II REase sequences and their homologs. For 577 Type II REase sequences and their homologs (i.e. 18.4% of all sequences; 73 of the 190 subfamilies), we could not make any structural assignment, based either on literature and database searches or on our new bioinformatic analyses. The results of our survey are summarized in Table 1. Sequence alignments of core residues for representatives of all ‘assignable’ subfamilies are shown in Figure 3 [PD-(D/E)XK superfamily, 98 or 51.6% subfamilies], Figure 4 (HNH superfamily, 14 or 7.4% subfamilies), Figure 5 (PLD superfamily, 2 or 1.1% subfamilies) and Figure 6 (GIY-YIG superfamily, 2 or 1.1% subfamilies). We found no new subfamilies from the HALFPIPE superfamily compared to the previously published study; readers are therefore referred to the original publication for comparative analysis (40,81).
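The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. All numbers below are taken from the text; the only inference is the last line, which assumes that the one subfamily not accounted for by the four listed superfamilies is the single HALFPIPE subfamily.

```python
# Sequence counts from the text.
assigned_from_literature = 1528
newly_predicted = 1027
unassigned = 577
total_sequences = assigned_from_literature + newly_predicted + unassigned
unassigned_pct = round(100 * unassigned / total_sequences, 1)

# Subfamily counts from the text.
total_subfamilies = 190
unassigned_subfamilies = 73
by_superfamily = {"PD-(D/E)XK": 98, "HNH": 14, "PLD": 2, "GIY-YIG": 2}
percentages = {
    name: round(100 * count / total_subfamilies, 1)
    for name, count in by_superfamily.items()
}

# Assumption: whatever remains corresponds to the HALFPIPE superfamily.
remaining = (total_subfamilies - unassigned_subfamilies
             - sum(by_superfamily.values()))

print(total_sequences, unassigned_pct, percentages, remaining)
```

The totals agree with the 3132 sequences of the ‘nuclease domain’ set and with the percentages given for each superfamily.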

Figure 3.
Sequence alignment of representative Type II REases from all subfamilies of the PD-(D/E)XK superfamily. Sequences of REases are preceded with sequences of several proteins from this superfamily with solved crystal structures and with typical secondary ...
Figure 4.
Sequence alignment of representative Type II REases from all subfamilies of the HNH superfamily. Sequences of REases are preceded with sequences of several proteins from this superfamily with solved crystal structures and with typical secondary structure ...
Figure 5.
Sequence alignment of representative Type II REases from the PLD superfamily. Sequences of REases are preceded with a sequence of Nuc nuclease (1BYR) from the PLD superfamily and with the secondary structure of R.BfiI (2c1l). Amino acids are colored according ...
Figure 6.
Sequence alignment of representative Type II REases from the GIY-YIG superfamily. Sequences of two REases are preceded by sequences of GIY-YIG members with solved crystal structures and with the secondary structure of I-TevI homing endonuclease (1mk0). ...
Table 1.
3D-fold classification for Type II REase subfamilies

Analysis of domain architectures

3D fold assignment of nuclease domains together with assignment of non-nuclease domains enabled us to study the diversity of domain organization of confirmed and putative Type II REases. REases show a great variety of domain compositions: we observed 50 different types of domain fusions and rearrangements (Figure 7). The domains most frequently found in REases (apart from nuclease domains) are MTase domains, variants of helix–turn–helix (HTH) DNA-binding domains (e.g. ‘winged helix’, wH) and different kinds of domains associated with helicase or ATPase functions (DEXD-box, GHKL). Interestingly, in seven subfamilies (e.g. R.MboI, R.SdaI) MTase domains are present only in one or a few members. This observation suggests that translational fusions of REase and MTase domains occurred independently multiple times in evolution, and that they have been facilitated by the frequent occurrence of REase and MTase genes in operons (i.e. transcriptional fusions).

Figure 7.
A variety of primary structures (domain architectures on the sequence level) in confirmed and putative Type II REases. Sequences are aligned by their nuclease domains. Drawn to scale; the length of the PD-(D/E)XK domain corresponds to 110 aa. Some very long ...

Characterization of selected subfamilies

Although a complete description of all new fold assignments and all domain organizations is beyond the limits of a single publication, we would like to describe in more detail the most interesting or most intriguing (in a few cases potentially controversial) new findings and predictions:

R.LlaBIIP: this long protein (1461 aa) appears to be a fusion of HsdR-like and HsdM-like subunits, comprising the putative ATP-dependent translocase and MTase modules. However, the N-terminal region appears to lack the PD-(D/E)XK domain common to HsdR subunits. Instead, the N-terminus contains a putative helical HEPN domain found in nucleotidyltransferases (aa 1–130), and another putative domain (aa 130–250), which shows no sequence or secondary structure similarity to any known nuclease domain. It would be very interesting to test experimentally whether R.LlaBIIP (and in particular its unusual N-terminal region) exhibits nuclease activity.

R.CviAI (GATC specific) (82) is predicted to be a PD-(D/E)XK superfamily member, yet it shows no obvious similarity to other GATC-specific enzymes (e.g. members of the R.MboI or R.Sau3AI subfamilies). Thus, we predict that its substrate specificity represents a case of convergent evolution within the same structural scaffold, used multiple times to independently develop recognition of the same DNA sequence.

R.HgiDII contains two domains. As mentioned earlier, the N-terminal domain belongs to the GHKL superfamily, which includes e.g. the MutL enzyme involved in DNA mismatch repair [where MutH is the associated nuclease from the PD-(D/E)XK superfamily]. The C-terminal domain of R.HgiDII remains unassigned to any of the known REase folds, or in fact to any known fold or protein family. Interestingly, among four other subfamilies of REases that exhibit the GHKL domain in the N-terminus, one (R.VeiORF1182P) contains the C-terminal domain of the PD-(D/E)XK fold, and in three others (R.NmeAIP, R.EcoUTORF4938P and R.LweSORF291P) the C-terminal extension is apparently different from that in either R.HgiDII or R.VeiORF1182P. The C-terminal domain of R.NmeAIP shows significant similarity to an uncharacterized protein family dubbed ‘Hypoth_Ymh’ in PFAM (CDD search e-value 3e-22). On the other hand, the C-terminus of R.EcoUTORF4938P exhibits similarity to a signal transduction histidine kinase domain from the GHKL superfamily (CDD search e-value 3e-8) with conserved N, D, F and G motifs required for the catalytic activity (83). However, middle parts of both R.NmeAIP and R.EcoUTORF4938P remain unassigned to any known protein family and may contain additional domains. It will be very interesting to determine experimentally the role of the unassigned domains in GHKL-containing REases, and if they turn out to be responsible for the REase activity, they would constitute interesting candidates for new folds (and thereby, for structure determination by X-ray crystallography).

R.DpnI is a representative of a large family of REases that cleave the GATC sequence only if the adenosine is methylated to m6A. We identified a putative Zn-binding region in the N-terminal part of their sequences (a conserved tetrad of Cys residues), but we have so far failed to determine its relationship to any known protein family or any known protein structure. Thus, we propose R.DpnI as an attractive target for structure determination by X-ray crystallography.

R.HphI: the analysis of this subfamily has been published (49), but we believe it is worth re-emphasizing that many members of this subfamily are most likely not Type II REases, as they lack MTase neighbors. Thus, it has been predicted that they might belong to another category of selfish nucleases, perhaps similar to homing endonucleases (HEases).

R.LcaA2P is a very close relative of HEases I-HmuI, I-HmuII and I-BsoI that act as nicking enzymes (BLAST e-value: 6e-11 with I-HmuI). Many other members of the LcaA2P family are therefore most likely HEases rather than Type II REases. On the other hand, it will be very interesting to determine whether R.LcaA2P is functional and, if it is, whether it acts as a nicking enzyme or as a ‘normal’ dsDNA nuclease and whether its activity can be inhibited by DNA methylation by the putative MTase encoded by the neighboring gene (M.LcaA2P). Should cleavage by LcaA2P be prevented by methylation, this enzyme may be considered an evolutionary intermediate between REases and HEases.

R.NgoAVIIP: sequences from this subfamily are confidently predicted to belong to the PLD superfamily, based on results of both FR and HHSEARCH analyses (e.g. FFAS score of −23.9 to the R.BfiI REase, HHSEARCH e-value 1.9e-14 to the profile of the R.BfiI subfamily). Moreover, analysis of the multiple sequence alignment reveals that putative catalytic residues are present. However, thus far efforts to detect the nuclease activity of R.NgoAVIIP have remained unsuccessful (V. Siksnys, IBT Vilnius, Lithuania, personal communication). Interestingly, the C-terminal domain of R.NgoAVIIP shows significant similarity (HHPRED e-value 7e-19) to the C-terminal domain of proteins from another nuclease subfamily (R.Fsp4HI), but they do not seem to share any detectable similarity in the catalytic domain. Thus far, we have been unable to identify a known nuclease domain in R.Fsp4HI; therefore, we propose it as an interesting candidate for further experimental analysis. It would be worthwhile to identify catalytic residues in this nuclease and to check whether its mode of action resembles that of other REases or of other enzymes from the PLD family.

Hypothetical protein SAV_2336 (gi:29828878): this very long protein (1667 aa) from Streptomyces avermitilis shows clear similarity in its C-terminus to the R.NaeI enzyme from the PD-(D/E)XK superfamily (aa 1359–1657; BLAST e-value 2e-28). The alignment spans the catalytic domain and the wH DNA-binding domain, suggesting that SAV_2336 binds two copies of the target DNA sequence, like R.NaeI. The N-terminal part of the SAV_2336 sequence shows significant similarity to the VWA-type domain of unknown function from CO-oxidizing operons in bacteria (HHPRED e-value 1e-06). The central part of SAV_2336 is related to ATPase domains from the MalT family of transcription regulators (e-value <1e-06). This combination of multiple domains, which may be involved not only in restriction but also in other aspects of nucleic acid metabolism, makes SAV_2336 an attractive target for experimental analyses.

R.PhoI shows remote similarity to archaeal Holliday junction resolvases from the PD-(D/E)XK superfamily (HHSEARCH hit to PFAM profile for archaeal Holliday junction resolvases Hjc with probability 58.4%) and an expected pattern of secondary structures associated with the catalytic core. However, its catalytic residues appear to be missing, as the PD-(D/E)XK motif is replaced by a PI-ERL variant. One possible explanation is that this protein exhibits an extreme case of catalytic residue migration to alternative locations in protein structure, as described earlier individually for the (D/E) residue in R.Cfr10I (84) and for the K residue in putative cyanobacterial nucleases (29) and in the R.SdaI subfamily (28). However, we found only one close homolog of R.PhoI, which provided insufficient information to predict catalytic residues based on residue conservation, and the preliminary model (data not shown) revealed no good candidates for a spatially reorganized active site. Thus, if R.PhoI is indeed active as a REase and if our 3D fold prediction is correct, it will be very interesting to determine its exact mode of action in vitro, especially its ability to catalyze the phosphodiester bond hydrolysis.

Some catalytically inactive mutants of the Type IV REase McrA have been shown to restrict phage growth in vivo, presumably due to unproductive site-specific binding of the protein to phage DNA, which could disrupt the phage development program at an early stage (85). It will be interesting to determine whether REases that may be inactive as nucleases, such as R.NgoAVIIP and perhaps also R.PhoI, can nevertheless function as REases in vivo, and whether this activity can be inhibited by site-specific methylation by the cognate MTase.

Putative REases from the MjaORF1200P subfamily (four sequences in the REBASE set) are most likely RNA MTases rather than REases. According to the fold-recognition analysis [recently published as a separate article (86)], these proteins show clear similarity to the SPOUT superfamily of RNA MTases, and they exhibit no additional domains or residues that would suggest that they act as REases. We suspect that they were (most likely incorrectly) assigned as Type II REases due to the genomic association of MjaORF1200P (ORF MJ1199) with a putative DNA:m5C MTase (M.MjaORF1200P).

Putative REases from the BceAUORF42P subfamily (three sequences in the REBASE set) are most likely Type III rather than Type II REases. In database searches they show clear similarity to Type III Res subunits and they are genetically associated with homologs of Type III MTase subunits.

R.SauN315ORF189P: members of this family show significant sequence similarity and similar domain organization to Type I REase HsdR proteins [e.g. e-value 3.5e-47 for a HHSEARCH alignment with the N terminus of R subunit of Type I restriction enzyme (HSDR_N) profile from the PFAM database].

R.EcoCH14P: the sequence of this short protein (95 aa) is similar (HHPRED e-value 4e-05) to a C-terminal helical domain found in Type I REase HsdR proteins and implicated in binding to the Type I MTase complex rather than in the nuclease activity.

Distribution of 3D folds among confirmed and putative REases

From the aforementioned examples it is quite clear that the correctness of our estimated 3D fold distribution among REases is influenced not only by the quality of bioinformatic methods and the confidence in individual predictions or the availability of experimental data to support structural predictions, but also by the confidence in assignment of a given protein as a REase candidate. In particular, our analysis revealed a number of protein families comprising REases, in which some (or even most) members are most likely not REases, but fulfill some other function. Therefore, it is interesting to compare the distribution of 3D fold assignments in sets of experimentally validated Type II REases versus the expanded dataset comprising also putative enzymes.

To this end, we divided all sequences of Type II REases and their homologs into classes on the basis of their source:

  1. CONFIRMED set: all sequences from REBASE with nuclease activity confirmed experimentally;
  2. PREDICTED set: sequences from REBASE without direct experimental confirmation, excluding the data from environmental DNA sequencing projects;
  3. NR set: homologs of sequences from sets 1–2 that are not present in REBASE, but were identified by us in the nr database at the NCBI; and
  4. ENV set: putative REases in REBASE predicted from environmental DNA sequencing projects and identified by us in the environmental samples database (env_nr).

For each of these classes, we additionally created a ‘purged’ variant, from which we removed sequences exhibiting ≥90% sequence identity to a retained sequence. We used the following hierarchy of importance (from the most important to the least important): CONFIRMED set > PREDICTED set > NR set > ENV set. Thus, we removed all sequences from environmental samples not present in REBASE that exhibited ≥90% sequence identity to any of the sequences from ‘higher classes’; then the same was applied to all putative REases from nr, and so on. Finally, if several genuine REases exhibited ≥90% sequence identity to each other, only one of them was retained. We have also considered an additional PUTATIVE set, which is the sum of the PREDICTED, NR and ENV sets and thus contains all sequences that have NOT been experimentally confirmed to function as REases. Table 2 shows the number of sequences present in each of the original and purged datasets and in each fold. The fractions of enzymes assigned to different folds for the CONFIRMED set, the PREDICTED set and the PUTATIVE set, purged at maximum 90% identity, are shown in Figure 8.
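The hierarchical purging can be sketched in Python as follows. This is only an illustration under stated assumptions: the `identity` callable stands in for whatever pairwise identity measure is used, `toy_identity` is a hypothetical placeholder that works on equal-length pre-aligned strings, and the choice of the first-encountered sequence as the representative within a set is our assumption, since the text does not specify it.

```python
def toy_identity(a, b):
    """Hypothetical stand-in for a real identity measure: fraction of
    identical positions in two equal-length, pre-aligned strings."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def purge(sets_by_priority, identity, threshold=0.90):
    """Remove redundant sequences across prioritized sets.

    sets_by_priority: list of (set_name, {seq_id: sequence}) pairs ordered
    from the most to the least important set.
    A sequence is dropped if it reaches `threshold` identity to any sequence
    of a higher-priority set, or to a sequence already kept in its own set.
    Returns {set_name: [kept seq_ids]}.
    """
    purged = {}
    higher = []  # all sequences from higher-priority sets
    for set_name, seqs in sets_by_priority:
        kept = []  # (seq_id, sequence) pairs retained in this set
        for seq_id, seq in seqs.items():
            redundant = (
                any(identity(seq, h) >= threshold for h in higher)
                or any(identity(seq, k) >= threshold for _, k in kept)
            )
            if not redundant:
                kept.append((seq_id, seq))
        purged[set_name] = [seq_id for seq_id, _ in kept]
        higher.extend(seqs.values())
    return purged

# Toy sequences, purely illustrative.
toy_sets = [
    ("CONFIRMED", {"c1": "AAAAAAAAAA", "c2": "AAAAAAAAAC"}),
    ("PREDICTED", {"p1": "AAAAAAAACC", "p2": "CCCCCCCCCC"}),
]
result = purge(toy_sets, toy_identity)
print(result)
```

Processing the sets from the most to the least important one ensures that, whenever two similar sequences come from different sets, the one with the stronger experimental support is retained.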

Figure 8.
Fraction of enzymes assigned to different folds, purged at maximum 90% identity. (A) Confirmed REases from REBASE; (B) putative REases from REBASE; (C) putative REases from REBASE and all homologs found in the nonredundant (nr) and environmental samples (env_nr) ...
Table 2.
Number of endonucleases exhibiting different folds and different sources

In all datasets analyzed in this work, the largest number of structurally classifiable enzymes always belongs to the PD-(D/E)XK superfamily. The PD-(D/E)XK superfamily is overrepresented in the CONFIRMED set (68%) compared to the PREDICTED and PUTATIVE sets (60 and 48%, respectively). This is because this superfamily is the most intensively studied [e.g. almost all enzymes with structures solved by X-ray crystallography belong to the PD-(D/E)XK fold]. In contrast, the HNH superfamily, the second largest in all datasets, is overrepresented in the PUTATIVE set (30%) compared to the CONFIRMED and PREDICTED sets (9 and 8%, respectively). As mentioned earlier, this might be because some of the genuine REases from the HNH superfamily (e.g. R.HphI) exhibit similarity to putative nucleases that are in fact unlikely to function as REases, which distorts the PUTATIVE set by the inclusion of potential false positives. In the case of the R.HphI family, only 20% of R.HphI homologs had detectable MTase neighbors within 5000 bp (49). On the other hand, virtually all experimentally characterized, ‘orthodox’ Type II REases encoded in completely sequenced genomes, whose sequences are available in REBASE (including all experimentally characterized members of the R.HphI family), do possess an MTase neighbor (8).
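The MTase-neighbor criterion mentioned above can be illustrated with a short Python sketch. The gene records, their keys and the way the distance is measured (between the nearest ends of two genes on the same contig) are assumptions made for this example; the cited analysis (49) may have defined the 5000-bp window differently.

```python
def has_mtase_neighbor(gene, genes, max_gap=5000):
    """Return True if an MTase gene on the same contig lies within
    `max_gap` bp of `gene`.

    Each gene is a dict with keys 'contig', 'start', 'end' (start <= end)
    and 'is_mtase' (bool); this record layout is hypothetical.
    """
    for other in genes:
        if other is gene or not other["is_mtase"]:
            continue
        if other["contig"] != gene["contig"]:
            continue
        # Distance between the nearest ends; 0 if the genes overlap.
        gap = max(other["start"] - gene["end"],
                  gene["start"] - other["end"], 0)
        if gap <= max_gap:
            return True
    return False

# Toy records, purely illustrative.
rease = {"contig": "c1", "start": 1000, "end": 2000, "is_mtase": False}
near = {"contig": "c1", "start": 6000, "end": 7000, "is_mtase": True}
far = {"contig": "c1", "start": 8000, "end": 9000, "is_mtase": True}
elsewhere = {"contig": "c2", "start": 2100, "end": 3000, "is_mtase": True}

print(has_mtase_neighbor(rease, [rease, near]))
print(has_mtase_neighbor(rease, [rease, far, elsewhere]))
```

Applying such a check to every homolog in a family gives the fraction of members with a detectable MTase neighbor, which is the quantity compared between the R.HphI homologs and the experimentally characterized enzymes.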

Distribution of DNA cleavage preferences among folds of REases

An interesting question is whether REases from particular folds exhibit preferences for certain DNA sequences and/or cleavage patterns (length of 3′ or 5′ overhangs). Should that be the case, the experimental characterization of cleavage products could aid the prediction of folds (structure) or vice versa. To answer this question, we have manually aligned DNA recognition sequences for all Type II REases from the ‘CONFIRMED’ set (see Supplementary Table 1). The features of DNA sequences taken into account were, in order of importance: the cleavage pattern (in some cases with a tolerance of up to 1 bp), the distance between the recognition site and the cleavage site, the site of methylation by a cognate MTase (if known) and the DNA sequence. We also made a histogram of cleavage patterns for REases from different folds (Figure 9). It shows that the preferred cleavage patterns are indeed different for REases from different folds. REases from the PD-(D/E)XK superfamily show a high preference for 5′ overhangs or blunt ends, while REases from the HNH superfamily prefer to generate 1-nt or 4-nt 3′ ends or 4-nt 5′ ends. Interestingly, in our dataset there is not a single case of REases with the same recognition sequence and cleavage pattern that would have different folds, while the probability of such a situation in the case of a random distribution of known cleavage patterns among Type II REases from different families is <10⁻⁶ (data not shown). This finding suggests that knowledge of the target sequence and cleavage pattern could be used as a predictor for the 3D fold assignment. Interestingly, one of the enzymes for which we failed to predict the structure using bioinformatic methods, i.e. R.HpaII (87), cleaves the same DNA sequence (C′CG,G) as another enzyme of known structure, namely R.HinP1I from the PD-(D/E)XK superfamily (88). Neither secondary structure prediction nor ‘sequence gazing’ allowed us to propose any reliable candidate for the PD-(D/E)XK motif in R.HpaII; therefore, we propose it as a valuable target for experimental structure determination by X-ray crystallography.
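The two tallies described in this section, the histogram of cleavage patterns per fold and the check for identical specificities shared between folds, can be sketched in Python. The record layout and the toy enzymes below are hypothetical; real input would come from a table such as Supplementary Table 1.

```python
from collections import Counter, defaultdict

def overhang_histogram(enzymes):
    """Count enzymes per (fold, overhang type, overhang length).

    Each enzyme is a dict with 'fold', 'site', 'overhang' ("5'", "3'" or
    "blunt") and 'length' in nt (0 for blunt); this layout is assumed.
    """
    return Counter((e["fold"], e["overhang"], e["length"]) for e in enzymes)

def conflicting_specificities(enzymes):
    """Return the (site, overhang, length) combinations that occur in
    enzymes of more than one fold, mapped to the set of folds involved."""
    folds = defaultdict(set)
    for e in enzymes:
        folds[(e["site"], e["overhang"], e["length"])].add(e["fold"])
    return {key: f for key, f in folds.items() if len(f) > 1}

# Toy records with made-up sites, purely illustrative.
toy = [
    {"name": "enz1", "fold": "PD-(D/E)XK", "site": "SITE1", "overhang": "5'", "length": 4},
    {"name": "enz2", "fold": "PD-(D/E)XK", "site": "SITE1", "overhang": "5'", "length": 4},
    {"name": "enz3", "fold": "PD-(D/E)XK", "site": "SITE2", "overhang": "blunt", "length": 0},
    {"name": "enz4", "fold": "HNH", "site": "SITE3", "overhang": "3'", "length": 4},
]
hist = overhang_histogram(toy)
conflicts = conflicting_specificities(toy)
print(hist)
print(conflicts)
```

An empty `conflicts` dictionary corresponds to the situation reported here, in which no two enzymes with the same recognition sequence and cleavage pattern belong to different folds.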

Figure 9.
Number of Type II REases from different folds leaving 5′ or 3′ overhangs of different length or blunt ends.

CONCLUSIONS

The results of our bioinformatics analysis provide the very first classification of all Type II REase sequences into families and superfamilies, and a comprehensive structural census. We believe that our results will be very useful for experimental researchers. First, a number of particularly interesting candidates for crystallographic analyses are proposed, with two priorities in mind: (i) high-resolution structural characterization of folds that are either completely new or at least have not been reported among Type II REases, and (ii) maximization of structural coverage (availability of structural templates for confident modeling of the largest possible number of sequences significantly related to these templates). Second, our delineation of sequence-related groups of REases that exhibit differences in substrate specificity suggests that detailed comparative analyses (which are beyond the scope of this article) could provide insight into the molecular basis of different specificities. Such groups of nucleases appear to require a smaller number of mutations to change the substrate preference, and therefore they may be particularly useful targets for experimental protein engineering aimed at the development of enzymes with new specificities. Finally, the observed correlation between the structural folds and the patterns of cleavage (length of ends) provides evidence to support the earlier prediction that the phenotypes of REases may correlate with their evolutionary relationships (89). Thus, structural predictions for putative REases (e.g. those identified by genome sequencing) may aid in the prediction of their cleavage patterns and thereby simplify the planning of experiments to characterize them functionally. Conversely, functional characterization of enzymes with unknown structure may provide hints as to their 3D folds. Indeed, the recently characterized REase R.PabI, with an unusual DNA recognition sequence and cleavage pattern (39), turned out to exhibit a completely new type of structure. Although these correlations should by no means be taken as a rule, they may help experimentalists prioritize experiments aimed at identifying and characterizing proteins with particular features of interest.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Rich Roberts, Alfred Pingoud, Virgis Siksnys, Matthias Bochtler, Mikihiko Kawai, Ichizo Kobayashi and members of the Bujnicki laboratory (in particular Jan Kosinski) for stimulating discussions on structural, functional and evolutionary classification of Type II REases and contributing various unpublished materials during the work on this article. We also thank Jan Kosinski for critical reading of the manuscript. This analysis was funded by the NIH (Fogarty International Center grant R03 TW007163-01). Funding to pay the Open Access publication charges for this paper has been waived by Oxford University Press—NAR Editorial Board members are entitled to one free paper per year in recognition of their work on behalf of the journal.

Conflict of interest statement. None declared.

REFERENCES

1. Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal RM, Degtyarev S, Dryden DT, Dybvig K, et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 2003;31:1805–1812. [PMC free article] [PubMed]
2. Skowronek KJ, Bujnicki JM. In: Industrial Enzymes: Structure, Function and Applications. Polaina J, MacCabe AP, editors. Springer; 2007. Chapter 21.
3. Williams RJ. Restriction endonucleases: classification, properties, and applications. Mol. Biotechnol. 2003;23:225–243. [PubMed]
4. Pingoud AM. Restriction Endonucleases. Berlin, Heidelberg: Springer; 2004.
5. Bickle TA, Kruger DH. Biology of DNA restriction. Microbiol. Rev. 1993;57:434–450. [PMC free article] [PubMed]
6. Sistla S, Rao DN. S-adenosyl-L-methionine-dependent restriction enzymes. Crit. Rev. Biochem. Mol. Biol. 2004;39:1–19. [PubMed]
7. Bourniquel AA, Bickle TA. Complex restriction enzymes: NTP-driven molecular motors. Biochimie. 2002;84:1047–1059. [PubMed]
8. Roberts RJ, Vincze T, Posfai J, Macelis D. REBASE–enzymes and genes for DNA restriction and modification. Nucleic Acids Res. 2007;35:D269–D270. [PMC free article] [PubMed]
9. Greene PJ, Gupta M, Boyer HW, Brown WE, Rosenberg JM. Sequence analysis of the DNA encoding the Eco RI endonuclease and methylase. J. Biol. Chem. 1981;256:2143–2153. [PubMed]
10. Newman AK, Rubin RA, Kim SH, Modrich P. DNA sequences of structural genes for Eco RI DNA restriction and modification enzymes. J. Biol. Chem. 1981;256:2131–2139. [PubMed]
11. Kroger M, Hobom G, Schutte H, Mayer H. Eight new restriction endonucleases from Herpetosiphon giganteus–divergent evolution in a family of enzymes. Nucleic Acids Res. 1984;12:3127–3141. [PMC free article] [PubMed]
12. Mullings R, Bennett SP, Brown NL. Investigation of sequence homology in a group of type-II restriction/modification isoschizomers. Gene. 1988;74:245–251. [PubMed]
13. Wilson GG, Murray NE. Restriction and modification systems. Annu. Rev. Genet. 1991;25:585–627. [PubMed]
14. Kim YC, Grable JC, Love R, Greene PJ, Rosenberg JM. Refinement of Eco RI endonuclease crystal structure: a revised protein chain tracing. Science. 1990;249:1307–1309. [PubMed]
15. Winkler FK, Banner DW, Oefner C, Tsernoglou D, Brown RS, Heathman SP, Bryan RK, Martin PD, Petratos K, Wilson KS. The crystal structure of EcoRV endonuclease and of its complexes with cognate and non-cognate DNA fragments. EMBO J. 1993;12:1781–1795. [PMC free article] [PubMed]
16. Venclovas C, Timinskas A, Siksnys V. Five-stranded beta-sheet sandwiched with two alpha-helices: a structural link between restriction endonucleases EcoRI and EcoRV. Proteins. 1994;20:279–282. [PubMed]
17. Kovall RA, Matthews BW. Type II restriction endonucleases: structural, functional and evolutionary relationships. Curr. Opin. Chem. Biol. 1999;3:578–583. [PubMed]
18. Pingoud A, Fuxreiter M, Pingoud V, Wende W. Type II restriction endonucleases: structure and mechanism. Cell Mol. Life Sci. 2005;62:685–707. [PubMed]
19. Aggarwal AK. Structure and function of restriction endonucleases. Curr. Opin. Struct. Biol. 1995;5:11–19. [PubMed]
20. Bujnicki JM. In: Restriction Endonucleases. Pingoud A, editor. Vol. 14. Berlin: Springer; 2004. pp. 63–87.
21. Niv MY, Ripoll DR, Vila JA, Liwo A, Vanamee ES, Aggarwal AK, Weinstein H, Scheraga HA. Topology of Type II REases revisited; structural classes and the common conserved core. Nucleic Acids Res. 2007;35:2227–2237. [PMC free article] [PubMed]
22. Bujnicki JM, Rychlewski L. Grouping together highly diverged PD-(D/E)XK nucleases and identification of novel superfamily members using structure-guided alignment of sequence profiles. J. Mol. Microbiol. Biotechnol. 2001;3:69–72. [PubMed]
23. Kosinski J, Feder M, Bujnicki JM. The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function. BMC Bioinformatics. 2005;6:172. [PMC free article] [PubMed]
24. Newman M, Strzelecka T, Dorner LF, Schildkraut I, Aggarwal AK. Structure of restriction endonuclease BamHI and its relationship to EcoRI. Nature. 1994;368:660–664. [PubMed]
25. van der Woerd MJ, Pelletier JJ, Xu S, Friedman AM. Restriction enzyme BsoBI-DNA complex: a tunnel for recognition of degenerate DNA sequences and potential histidine catalysis. Structure. 2001;9:133–144. [PubMed]
26. Bozic D, Grazulis S, Siksnys V, Huber R. Crystal structure of Citrobacter freundii restriction endonuclease Cfr10I at 2.15 A resolution. J. Mol. Biol. 1996;255:176–186. [PubMed]
27. Pingoud V, Kubareva E, Stengel G, Friedhoff P, Bujnicki JM, Urbanke C, Sudina A, Pingoud A. Evolutionary relationship between different subgroups of restriction endonucleases. J. Biol. Chem. 2002;277:14306–14314. [PubMed]
28. Tamulaitiene G, Jakubauskas A, Urbanke C, Huber R, Grazulis S, Siksnys V. The crystal structure of the rare-cutting restriction enzyme SdaI reveals unexpected domain architecture. Structure. 2006;14:1389–1400. [PubMed]
29. Feder M, Bujnicki JM. Identification of a new family of putative PD-(D/E)XK nucleases with unusual phylogenomic distribution and a new type of the active site. BMC Genomics. 2005;6:21. [PMC free article] [PubMed]
30. Orlowski J, Boniecki M, Bujnicki JM. I-Ssp6803I: the first homing endonuclease from the PD-(D/E)XK superfamily exhibits an unusual mode of DNA recognition. Bioinformatics. 2007;23:527–530. [PubMed]
31. Newman M, Lunnen K, Wilson G, Greci J, Schildkraut I, Phillips SE. Crystal structure of restriction endonuclease BglI bound to its interrupted DNA recognition sequence. EMBO J. 1998;17:5466–5476. [PMC free article] [PubMed]
32. Bujnicki JM. Crystallographic and bioinformatic studies on restriction endonucleases: inference of evolutionary relationships in the "midnight zone" of homology. Curr. Protein Pept. Sci. 2003;4:327–337. [PubMed]
33. Sapranauskas R, Sasnauskas G, Lagunavicius A, Vilkaitis G, Lubys A, Siksnys V. Novel subtype of type IIs restriction enzymes. BfiI endonuclease exhibits similarities to the EDTA-resistant nuclease Nuc of Salmonella typhimurium. J. Biol. Chem. 2000;275:30878–30885. [PubMed]
34. Aravind L, Makarova KS, Koonin EV. Survey and summary: Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res. 2000;28:3417–3432. [PMC free article] [PubMed]
35. Bujnicki JM, Radlinska M, Rychlewski L. Polyphyletic evolution of type II restriction enzymes revisited: two independent sources of second-hand folds revealed. Trends Biochem. Sci. 2001;26:9–11. [PubMed]
36. Grazulis S, Manakova E, Roessle M, Bochtler M, Tamulaitiene G, Huber R, Siksnys V. Structure of the metal-independent restriction enzyme BfiI reveals fusion of a specific DNA-binding domain with a nonspecific nuclease. Proc. Natl Acad. Sci. USA. 2005;102:15797–15802. [PMC free article] [PubMed]
37. Saravanan M, Bujnicki JM, Cymerman IA, Rao DN, Nagaraja V. Type II restriction endonuclease R.KpnI is a member of the HNH nuclease superfamily. Nucleic Acids Res. 2004;32:6129–6135. [PMC free article] [PubMed]
38. Ibryashkina EM, Zakharova MV, Baskunov VB, Bogdanova ES, Nagornykh MO, Den'mukhamedov MM, Melnik BS, Kolinski A, Gront D, Feder M, et al. Type II restriction endonuclease R.Eco29kI is a member of the GIY-YIG nuclease superfamily. BMC Struct. Biol. 2007;7:48. [PMC free article] [PubMed]
39. Ishikawa K, Watanabe M, Kuroita T, Uchiyama I, Bujnicki JM, Kawakami B, Tanokura M, Kobayashi I. Discovery of a novel restriction endonuclease by genome comparison and application of a wheat-germ-based cell-free translation assay: PabI (5′-GTA/C) from the hyperthermophilic archaeon Pyrococcus abyssi. Nucleic Acids Res. 2005;33:e112. [PMC free article] [PubMed]
40. Miyazono K, Watanabe M, Kosinski J, Ishikawa K, Kamo M, Sawasaki T, Nagata K, Bujnicki JM, Endo Y, Tanokura M, et al. Novel protein fold discovered in the PabI family of restriction enzymes. Nucleic Acids Res. 2007;35:1908–1918. [PMC free article] [PubMed]
41. Azarinskas A, Maneliene Z, Jakubauskas A. Hin4II, a new prototype restriction endonuclease from Haemophilus influenzae RFL4: Discovery, cloning and expression in Escherichia coli. J. Biotechnol. 2006;123:288–296. [PubMed]
42. Pingoud V, Conzelmann C, Kinzebach S, Sudina A, Metelev V, Kubareva E, Bujnicki JM, Lurz R, Luder G, Xu SY, et al. PspGI, a type II restriction endonuclease from the extreme thermophile Pyrococcus sp.: structural and functional studies to investigate an evolutionary relationship with several mesophilic restriction enzymes. J. Mol. Biol. 2003;329:913–929. [PubMed]
43. Pingoud V, Sudina A, Geyer H, Bujnicki JM, Lurz R, Luder G, Morgan R, Kubareva E, Pingoud A. Specificity changes in the evolution of Type II restriction endonucleases: a biochemical and bioinformatic analysis of restriction enzymes that recognize unrelated sequences. J. Biol. Chem. 2005;280:4289–4298. [PubMed]
44. Kriukiene E, Lubiene J, Lagunavicius A, Lubys A. MnlI–The member of H-N-H subtype of Type IIS restriction endonucleases. Biochim. Biophys. Acta. 2005;1751:194–204. [PubMed]
45. Armalyte E, Bujnicki JM, Giedriene J, Gasiunas G, Kosinski J, Lubys A. Mva1269I: a monomeric type IIS restriction endonuclease from Micrococcus varians with two EcoRI- and FokI-like catalytic domains. J. Biol. Chem. 2005;280:41584–41594. [PubMed]
46. Chmiel AA, Radlinska M, Pawlak SD, Krowarsch D, Bujnicki JM, Skowronek KJ. A theoretical model of restriction endonuclease NlaIV in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis and circular dichroism spectroscopy. Protein Eng. Des. Sel. 2005;18:181–189. [PubMed]
47. Pawlak SD, Radlinska M, Chmiel AA, Bujnicki JM, Skowronek KJ. Inference of relationships in the ‘twilight zone' of homology using a combination of bioinformatics and site-directed mutagenesis: a case study of restriction endonucleases Bsp6I and PvuII. Nucleic Acids Res. 2005;33:661–671. [PMC free article] [PubMed]
48. Skowronek KJ, Kosinski J, Bujnicki JM. Theoretical model of restriction endonuclease HpaI in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis. Proteins. 2006;63:1059–1068. [PubMed]
49. Cymerman IA, Obarska A, Skowronek KJ, Lubys A, Bujnicki JM. Identification of a new subfamily of HNH nucleases and experimental characterization of a representative member, HphI restriction endonuclease. Proteins. 2006;65:867–876. [PubMed]
50. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
51. Frickey T, Lupas A. CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics. 2004;20:3702–3704. [PubMed]
52. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. [PMC free article] [PubMed]
53. Pei J, Grishin NV. MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Res. 2006;34:4364–4374. [PMC free article] [PubMed]
54. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–3066. [PMC free article] [PubMed]
55. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15:330–340. [PMC free article] [PubMed]
56. Marchler-Bauer A, Anderson JB, Cherukuri PF, DeWeese-Scott C, Geer LY, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, et al. CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res. 2005;33:D192–D196. [PMC free article] [PubMed]
57. Soding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005;33:W244–W248. [PMC free article] [PubMed]
58. Kurowski MA, Bujnicki JM. GeneSilico protein structure prediction meta-server. Nucleic Acids Res. 2003;31:3305–3307. [PMC free article] [PubMed]
59. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 1999;292:195–202. [PubMed]
60. Rost B, Yachdav G, Liu J. The PredictProtein server. Nucleic Acids Res. 2004;32:W321–W326. [PMC free article] [PubMed]
61. Ouali M, King RD. Cascaded multiple classifiers for secondary structure prediction. Protein Sci. 2000;9:1162–1176. [PMC free article] [PubMed]
62. Adamczak R, Porollo A, Meller J. Combining prediction of secondary structure and solvent accessibility in proteins. Proteins. 2005;59:467–475. [PubMed]
63. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 2000;40:502–511. [PubMed]
64. Meiler J, Baker D. Coupled prediction of protein secondary and tertiary structure. Proc. Natl Acad. Sci. USA. 2003;100:12105–12110. [PMC free article] [PubMed]
65. Pollastri G, McLysaght A. Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics. 2005;21:1719–1720. [PubMed]
66. Cheng J, Randall AZ, Sweredoski MJ, Baldi P. SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res. 2005;33:W72–W76. [PMC free article] [PubMed]
67. Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins. 2003;53(Suppl. 6):491–496. [PubMed]
68. Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21:951–960. [PubMed]
69. Tomii K, Akiyama Y. FORTE: a profile-profile comparison tool for protein fold recognition. Bioinformatics. 2004;20:594–595. [PubMed]
70. Kelley LA, MacCallum RM, Sternberg MJ. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 2000;299:499–520. [PubMed]
71. Fischer D. Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pacific Symp. Biocomp. 2000:119–130. [PubMed]
72. Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 2001;310:243–257. [PubMed]
73. Jones DT. GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 1999;287:797–815. [PubMed]
74. Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins. 2004;55:1005–1013. [PubMed]
75. Lundstrom J, Rychlewski L, Bujnicki J, Elofsson A. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci. 2001;10:2354–2362. [PMC free article] [PubMed]
76. Kosinski J, Kubareva E, Bujnicki JM. A model of restriction endonuclease MvaI in complex with DNA: a template for interpretation of experimental data and a guide for specificity engineering. Proteins. 2007;68:324–336. [PubMed]
77. Dutta R, Inouye M. GHKL, an emergent ATPase/kinase superfamily. Trends Biochem. Sci. 2000;25:24–28. [PubMed]
78. Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM. A “FRankenstein's monster” approach to comparative modeling: merging the finest fragments of Fold-Recognition models and iterative model refinement aided by 3D structure evaluation. Proteins. 2003;53(Suppl. 6):369–379. [PubMed]
79. Kosinski J, Gajda MJ, Cymerman IA, Kurowski MA, Pawlowski M, Boniecki M, Obarska A, Papaj G, Sroczynska-Obuchowicz P, Tkaczuk KL, et al. FRankenstein becomes a cyborg: the automatic recombination and realignment of fold recognition models in CASP6. Proteins. 2005;61(Suppl. 7):106–113. [PubMed]
80. Sasin JM, Bujnicki JM. COLORADO3D, a web server for the visual analysis of protein structures. Nucleic Acids Res. 2004;32:W586–W589. [PMC free article] [PubMed]
81. Dunin-Horkawicz S, Feder M, Bujnicki JM. Phylogenomic analysis of the GIY-YIG nuclease superfamily. BMC Genomics. 2006;7:98. [PMC free article] [PubMed]
82. Xia Y, Burbank DE, Van Etten JL. Restriction endonuclease activity induced by NC-1A virus infection of a Chlorella-like green alga. Nucleic Acids Res. 1986;14:6017–6030. [PMC free article] [PubMed]
83. Wolanin PM, Thomason PA, Stock JB. Histidine protein kinases: key signal transducers outside the animal kingdom. Genome Biol. 2002;3:REVIEWS3013. [PMC free article] [PubMed]
84. Skirgaila R, Grazulis S, Bozic D, Huber R, Siksnys V. Structure-based redesign of the catalytic/metal binding site of Cfr10I restriction endonuclease reveals importance of spatial rather than sequence conservation of active centre residues. J. Mol. Biol. 1998;279:473–481. [PubMed]
85. Anton BP, Raleigh EA. Transposon-mediated linker insertion scanning mutagenesis of the Escherichia coli McrA endonuclease. J. Bacteriol. 2004;186:5699–5707. [PMC free article] [PubMed]
86. Tkaczuk KL, Dunin-Horkawicz S, Purta E, Bujnicki JM. Structural and evolutionary bioinformatics of the SPOUT superfamily of methyltransferases. BMC Bioinformatics. 2007;8:73. [PMC free article] [PubMed]
87. Kulakauskas S, Barsomian JM, Lubys A, Roberts RJ, Wilson GG. Organization and sequence of the HpaII restriction-modification system and adjacent genes. Gene. 1994;142:9–15. [PubMed]
88. Yang Z, Horton JR, Maunus R, Wilson GG, Roberts RJ, Cheng X. Structure of HinP1I endonuclease reveals a striking similarity to the monomeric restriction enzyme MspI. Nucleic Acids Res. 2005;33:1892–1901. [PMC free article] [PubMed]
89. Jeltsch A, Kroger M, Pingoud A. Evidence for an evolutionary relationship among type-II restriction endonucleases. Gene. 1995;160:7–16. [PubMed]
90. Athanasiadis A, Vlassi M, Kotsifaki D, Tucker PA, Wilson KS, Kokkinidis M. Crystal structure of PvuII endonuclease reveals extensive structural homologies to EcoRV. Nat. Struct. Biol. 1994;1:469–475. [PubMed]
91. Newman M, Strzelecka T, Dorner LF, Schildkraut I, Aggarwal AK. Structure of BamHI endonuclease bound to DNA: partial folding and unfolding on DNA binding. Science. 1995;269:656–663. [PubMed]
92. Wah DA, Bitinaite J, Schildkraut I, Aggarwal AK. Structure of FokI has implications for DNA cleavage. Proc. Natl Acad. Sci. USA. 1998;95:10564–10569. [PMC free article] [PubMed]
93. Deibert M, Grazulis S, Janulaitis A, Siksnys V, Huber R. Crystal structure of MunI restriction endonuclease in complex with cognate DNA at 1.7 Å resolution. EMBO J. 1999;18:5805–5816. [PMC free article] [PubMed]
94. Lukacs CM, Kucera R, Schildkraut I, Aggarwal AK. Understanding the immutability of restriction enzymes: crystal structure of BglII and its DNA substrate at 1.5 Å resolution. Nat. Struct. Biol. 2000;7:134–140. [PubMed]
95. Deibert M, Grazulis S, Sasnauskas G, Siksnys V, Huber R. Structure of the tetrameric restriction endonuclease NgoMIV in complex with cleaved DNA. Nat. Struct. Biol. 2000;7:792–799. [PubMed]
96. Huai Q, Colandene JD, Chen Y, Luo F, Zhao Y, Topal MD, Ke H. Crystal structure of NaeI-an evolutionary bridge between DNA endonuclease and topoisomerase. EMBO J. 2000;19:3110–3118. [PMC free article] [PubMed]
97. Zhou XE, Wang Y, Reuter M, Mucke M, Kruger DH, Meehan EJ, Chen L. Crystal structure of type IIE restriction endonuclease EcoRII reveals an autoinhibition mechanism by a novel effector-binding fold. J. Mol. Biol. 2004;335:307–319. [PubMed]
98. Xu QS, Kucera RB, Roberts RJ, Guo HC. An asymmetric complex of restriction endonuclease MspI on its palindromic DNA recognition site. Structure. 2004;12:1741–1747. [PubMed]
99. Kachalova GS, Rogulin EA, Artyukh RI, Perevyazova TA, Zheleznaya LA, Matvienko NI, Bartunik HD. Crystallization and preliminary crystallographic analysis of the site-specific DNA nickase Nb.BspD6I. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 2005;61:332–334. [PMC free article] [PubMed]
100. Hashimoto H, Shimizu T, Imasaki T, Kato M, Shichijo N, Kita K, Sato M. Crystal structures of type II restriction endonuclease EcoO109I and its complex with cognate DNA. J. Biol. Chem. 2005;280:5605–5610. [PubMed]
101. Joshi HK, Etzkorn C, Chatwell L, Bitinaite J, Horton NC. Alteration of sequence specificity of the type II restriction endonuclease HincII through an indirect readout mechanism. J. Biol. Chem. 2006;281:23852–23869. [PubMed]
102. Kaus-Drobek M, Czapinska H, Sokolowska M, Tamulaitis G, Szczepanowski RH, Urbanke C, Siksnys V, Bochtler M. Restriction endonuclease MvaI is a monomer that recognizes its target sequence asymmetrically. Nucleic Acids Res. 2007;35:2035–2046. [PMC free article] [PubMed]
103. Kong H. Analyzing the functional organization of a novel restriction modification system, the BcgI system. J. Mol. Biol. 1998;279:823–832. [PubMed]
104. Cao W, Barany F. Identification of TaqI endonuclease active site residues by Fe2+-mediated oxidative cleavage. J. Biol. Chem. 1998;273:33002–33010. [PubMed]
105. Dahai T, Ando S, Takasaki Y, Tadano J. Site-directed mutagenesis of restriction endonuclease HindIII. Biosci. Biotechnol. Biochem. 1999;63:1703–1707. [PubMed]
106. Rimseliene R, Janulaitis A. Mutational analysis of two putative catalytic motifs of the type IV restriction endonuclease Eco57I. J. Biol. Chem. 2001;276:10492–10497. [PubMed]
107. Sukackaite R, Lagunavicius A, Stankevicius K, Urbanke C, Venclovas C, Siksnys V. Restriction endonuclease BpuJI specific for the 5′-CCCGT sequence is related to the archaeal Holliday junction resolvase family. Nucleic Acids Res. 2007;35:2377–2389. [PMC free article] [PubMed]
108. Xu SY, Zhu Z, Zhang P, Chan SH, Samuelson JC, Xiao J, Ingalls D, Wilson GG. Discovery of natural nicking endonucleases Nb.BsrDI and Nb.BtsI and engineering of top-strand nicking variants from BsrDI and BtsI. Nucleic Acids Res. 2007;35:4608–4618. [PMC free article] [PubMed]
109. Rodicio MR, Quinton-Jager T, Moran LS, Slatko BE, Wilson GG. Organization and sequence of the SalI restriction-modification system. Gene. 1994;151:167–172. [PubMed]
110. Siksnys V, Timinskas A, Klimasauskas S, Butkus V, Janulaitis A. Sequence similarity among type-II restriction endonucleases, related by their recognized 6-bp target and tetranucleotide-overhang cleavage. Gene. 1995;157:311–314. [PubMed]
111. Stankevicius K, Lubys A, Timinskas A, Vaitkevicius D, Janulaitis A. Cloning and analysis of the four genes coding for Bpu10I restriction-modification enzymes. Nucleic Acids Res. 1998;26:1084–1091. [PMC free article] [PubMed]
112. Advani S, Roy KB. Properties and secondary structure analysis of BanI endonuclease: identification of putative active site. Biochem. Biophys. Res. Commun. 2000;279:11–16. [PubMed]
113. Madsen A, Josephsen J. The LlaGI restriction and modification system of Lactococcus lactis W10 consists of only one single polypeptide. FEMS Microbiol. Lett. 2001;200:91–96. [PubMed]
114. Friedhoff P, Lurz R, Luder G, Pingoud A. Sau3AI, a monomeric type II restriction endonuclease that dimerizes on the DNA and thereby induces DNA loops. J. Biol. Chem. 2001;276:23581–23588. [PubMed]
115. Cesnaviciene E, Petrusyte M, Kazlauskiene R, Maneliene Z, Timinskas A, Lubys A, Janulaitis A. Characterization of AloI, a restriction-modification system of a new type. J. Mol. Biol. 2001;314:205–216. [PubMed]
116. O’Driscoll J, Heiter DF, Wilson GG, Fitzgerald GF, Roberts R, van Sinderen D. A genetic dissection of the LlaJI restriction cassette reveals insights on a novel bacteriophage resistance system. BMC Microbiol. 2006;6:40. [PMC free article] [PubMed]
117. Jakubauskas A, Giedriene J, Bujnicki JM, Janulaitis A. Identification of a single HNH active site in Type IIS restriction endonuclease Eco31I. J. Mol. Biol. 2007;370:157–169. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
