NIH Public Access; Author Manuscript; accepted for publication in a peer-reviewed journal.
Science. Author manuscript; available in PMC Aug 30, 2012.
Published in final edited form as:
PMCID: PMC3431203

De Novo Computational Design of Retro-Aldol Enzymes


The creation of enzymes capable of catalyzing any desired chemical reaction is a grand challenge for computational protein design. Using new algorithms that rely on hashing techniques to construct active sites for multistep reactions, we designed retro-aldolases that use four different catalytic motifs to catalyze the breaking of a carbon-carbon bond in a nonnatural substrate. Of the 72 designs that were experimentally characterized, 32, spanning a range of protein folds, had detectable retro-aldolase activity. Designs that used an explicit water molecule to mediate proton shuffling were significantly more successful, with rate accelerations of up to four orders of magnitude and multiple turnovers, than those involving charged side-chain networks. The atomic accuracy of the design process was confirmed by the x-ray crystal structure of active designs embedded in two protein scaffolds, both of which were nearly superimposable on the design model.

Enzymes are excellent catalysts, and the ability to design new active enzymes could have applications in drug production (1), green chemistry (2), and bioremediation of xenobiotic pollutants (3). To date, most enzyme design efforts have used selection methodologies to retrieve very rare active catalysts from large libraries of candidate protein variants (4-7). Recent advances in computational protein design have made it possible to design new protein folds (8) and binding interactions (9) and have opened the door to the possibility of computationally designing enzymatic catalysts for any chemical reaction. Despite recent progress (10, 11), creating enzymes for chemical transformations not efficiently catalyzed by naturally occurring enzymes remains a major challenge. Here, we describe (i) general computational methods for constructing active sites for multistep reactions consisting of superimposed reaction intermediates and transition states (TS) surrounded by protein functional groups in orientations optimal for catalysis (Fig. 1) and (ii) the use of this methodology to design novel catalysts for a retro-aldol reaction in which a carbon-carbon bond is broken in a nonnatural (i.e., not found in biological systems) substrate: 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone (Fig. 2A) (12).

Fig. 1
Computational enzyme design protocol for a multistep reaction. The first step is to generate ensembles of models of each of the key intermediates and transition states in the reaction pathway in the context of a specific catalytic motif composed of protein ...
Fig. 2
Retro-aldol reaction and active-site motifs. (A) The retro-aldol reaction. (B) General description of the aldol reaction pathway with a nucleophilic lysine and general acid-base chemistry. Several of the proton transfer steps are left out for brevity. ...

The first step in the computational design of an enzyme is to define one or more potential catalytic mechanisms for the desired reaction. For the retro-aldolase reaction, we focused on mechanisms involving enamine catalysis by lysine via a Schiff base or imine intermediate (13, 14). As shown in simplified form in Fig. 2B, the reaction proceeds in several distinct steps, involving acid-base catalysis by either amino acid side chains or water molecules. First, nucleophilic attack of lysine on the ketone of the substrate forms a carbinolamine intermediate, which eliminates a water molecule to form the imine/iminium species. Next, carbon-carbon bond cleavage is triggered by the deprotonation of the β-alcohol, with the iminium acting as an electron sink. Finally, the enamine tautomerizes to an imine that is then hydrolyzed to release the covalently bound product and free the enzyme for another round of catalysis.

The second step of the design process is the identification of protein scaffolds that can accommodate the designed TS ensemble described above. To account for the multistep reaction pathway, we extended our enzyme design methodology (15) to allow the design of composite TS sites that are simultaneously compatible with multiple TS and reaction intermediates (16). Using this method, we generated design models using the four catalytic motifs shown schematically in Fig. 2C, which apply different constellations of catalytic residues to facilitate carbinolamine formation and water elimination, carbon-carbon bond cleavage, and release of bound product.

Because the probability of accurately reconstructing a given three-dimensional (3D) active site in an input protein scaffold is extremely small, it is essential to consider a very large set of active-site possibilities. We generated such a set by simultaneously varying (i) the internal degrees of freedom of the composite TS (fig. S1B), (ii) the orientation of the catalytic side chains with respect to the composite TS (fig. S3), within ranges that are consistent with catalysis, and (iii) the conformations of the catalytic side chains (fig. S3). For example, in a representative calculation for motif III, we searched for placements of a total of 1.4 × 10^18 possible 3D active sites (table S3) at all triples or quadruples of backbone positions surrounding binding pockets in 71 different protein scaffolds (table S4). This combinatorial matching resulted in a total of 181,555 distinct solutions for the placement of the composite TS and the surrounding catalytic residues. Through extensive pruning at multiple levels, and by breaking the combinatorial explosion via hashing, the RosettaMatch algorithm (15) is able to eliminate, rapidly and at very little computational cost, most active-site possibilities in a given scaffold that are unfavorable as a result of poor catalytic geometry or significant steric clashes. After optimization of the composite TS rigid body orientation and the identities and conformations of the surrounding residues, a total of 72 designs with 8 to 20 amino acid identity changes in 10 different scaffolds were selected for experimental characterization based on the predicted TS binding energy, the extent of satisfaction of the catalytic geometry, the packing around the active lysine, and the consistency of side-chain conformation after side-chain repacking in the presence and absence of the TS model (16). Genes encoding the designs were synthesized and the proteins were expressed and purified from Escherichia coli; soluble purified protein was obtained for 70 of the 72 expressed designs.
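The role of hashing in the matching step can be illustrated with a minimal sketch. It is not the RosettaMatch implementation: it assumes each candidate catalytic side chain (a backbone position plus a rotamer) implies a TS placement, reduces that placement to an x, y, z position, and bins it on a grid. The function names, the labels, the bin width, and the coordinates are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def bin_key(placement, bin_width=1.0):
    """Discretize a TS placement (here only x, y, z) into a hashable grid cell."""
    return tuple(int(c // bin_width) for c in placement)


def match(placements_per_residue, bin_width=1.0):
    """Find combinations of catalytic side chains that agree on the TS placement.

    placements_per_residue holds one list per catalytic residue; each entry is
    (label, placement), where placement is the TS position implied by building
    that residue at some backbone position in some rotamer (hypothetical data).
    """
    tables = []
    for entries in placements_per_residue:
        table = defaultdict(list)
        for label, placement in entries:
            table[bin_key(placement, bin_width)].append(label)
        tables.append(table)

    # Only grid cells populated by every catalytic residue can hold a match,
    # so the full cross product of candidates is never enumerated.
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)

    matches = []
    for key in sorted(shared):
        matches.extend(product(*(table[key] for table in tables)))
    return matches


# Hypothetical candidates for a three-residue motif.
lys = [("pos12/rot3", (1.2, 0.4, 2.9)), ("pos30/rot1", (7.5, 7.1, 0.2))]
his = [("pos88/rot7", (1.8, 0.1, 2.3)), ("pos91/rot2", (4.0, 4.0, 4.0))]
asp = [("pos40/rot5", (1.1, 0.9, 2.6))]
hits = match([lys, his, asp])
print(hits)
```

Each residue is processed once and only shared grid cells are expanded, so the work grows roughly with the sum of the candidate lists rather than their product. A simple grid like this misses placements that lie close together but fall on opposite sides of a cell boundary, which a production method would need to handle.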

Retro-aldolase activity was monitored via a fluorescence-based assay of product formation (12) for each of the designs, and the results are summarized in Table 1. Our initial 12 designs used the first active site shown in Fig. 2C, which involves a charged side-chain (Lys-Asp-Lys)–mediated proton transfer scheme resembling that in D-2-deoxyribose-5-phosphate aldolase (13). Of these designs, two showed slow enaminone formation with 2,4-pentanedione (17), which is indicative of a nucleophilic lysine, but none displayed retro-aldolase activity (16). Ten designs were made for the second, much simpler active site shown in Fig. 2C, which involves a single imine-forming lysine in a hydrophobic pocket similar to aldolase catalytic antibodies; of these designs, one formed the enaminone, but none were catalytically active. The third active site incorporates a His-Asp dyad as a general base to abstract a proton from the β-alcohol; of the 14 designs tested, 10 exhibited stable enaminone formation, and 8 had detectable retro-aldolase activity. In the final active site, we experimented with the explicit modeling of a water molecule, positioned via side-chain hydrogen-bonding groups, which shuttles between stabilizing the carbinolamine and abstracting the proton from the hydroxyl. Of the 36 designs tested, 20 formed the enaminone and 23 (with 11 distinct positions for the catalytic lysine) had significant retro-aldolase activity, with rate enhancements up to four orders of magnitude over the uncatalyzed reaction (18).
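The per-motif counts quoted in the paragraph above can be tabulated to compare success rates. The short sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
# (designs tested, formed enaminone, retro-aldolase active), as quoted in the text
motifs = {
    "I": (12, 2, 0),
    "II": (10, 1, 0),
    "III": (14, 10, 8),
    "IV": (36, 20, 23),
}


def active_fraction(tested, active):
    """Fraction of tested designs with detectable retro-aldolase activity."""
    return active / tested


for name, (tested, enaminone, active) in motifs.items():
    print(f"motif {name}: {enaminone}/{tested} enaminone, "
          f"{active}/{tested} active ({active_fraction(tested, active):.0%})")

total_tested = sum(tested for tested, _, _ in motifs.values())
print("designs tested:", total_tested)
```

On these counts, motifs III and IV each yield active designs in more than half of the cases tested, whereas motifs I and II yield none.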

Table 1
Enaminone formation and enzyme activity for different active-site motifs. NC, not considered.

The active designs occur on five different protein scaffolds belonging to the triose phosphate isomerase (TIM)–barrel and jelly-roll folds. The most active designs exhibited multiple turnover kinetics; the linear progress curves for designs RA60 and RA61, for example, continue unchanged for more than 20 turnovers. Progress curves [Fig. 3A and supporting online material (SOM)] show a range of kinetic behaviors: In some cases (RA45), there is a pronounced lag phase, likely associated with slow imine formation, whereas in others (RA61), there is little or no lag, and for a third set, there is an initial burst followed by a slower steady-state rate (RA22). Notably, simple linear kinetics are observed for the designs in the relatively open jelly-roll scaffold, whereas more complex kinetics are observed for the TIM-barrel designs, which have more enclosed active-site pockets that may restrict substrate access and product release. To obtain kcat and KM estimates for several of the best enzymes (Fig. 3B), we extracted reaction velocities from the steady-state portions of the progress curves and assumed simple Michaelis-Menten kinetics. Given the simplifications, these values are best viewed as phenomenological; future characterization will be required to define rate constants in a particular kinetic model. The apparent kcat and KM values are given in Table 2; kuncat was determined from measurements of the reaction progression in the absence of enzyme and is close to previously determined values (18). kcat/kuncat for the most active designs is 2 × 10^4. The catalytic proficiency of the designs, which have a kcat/KM of about 1 M^−1 s^−1 (Table 2), is far from that of naturally occurring enzymes; the very low kcat value is probably associated with low reactivity of the imine-forming lysine. Rates for all active designs with 270 µM substrate are reported in table S1. For each of the 11 catalytic lysine positions, a “knockout” mutation to methionine dramatically decreased the activity or, more commonly, abolished catalysis completely, verifying that the observed activity was due to the designed active site.
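As an illustration of how apparent kcat and KM values follow from steady-state velocities under simple Michaelis-Menten kinetics, the sketch below fits a Lineweaver-Burk (double-reciprocal) line to synthetic data and then forms kcat/KM and kcat/kuncat. The fitting procedure actually used for Table 2 is not described in this text, so the method is an assumption, and every number is a made-up placeholder rather than a measured value.

```python
def mm_velocity(substrate, kcat, km, enzyme):
    """Michaelis-Menten steady-state velocity: v = kcat*[E]*[S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def fit_lineweaver_burk(substrates, velocities, enzyme):
    """Least-squares line through (1/[S], 1/v).

    The slope is KM/Vmax and the intercept is 1/Vmax, with Vmax = kcat*[E].
    Returns (kcat, KM).
    """
    xs = [1.0 / s for s in substrates]
    ys = [1.0 / v for v in velocities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    return vmax / enzyme, slope * vmax


# Made-up placeholder parameters (not from Table 2): s^-1, M, M, s^-1.
TRUE_KCAT, TRUE_KM, ENZYME, KUNCAT = 1e-4, 5e-4, 1e-5, 1e-8
substrates = [50e-6, 100e-6, 200e-6, 400e-6]  # M
velocities = [mm_velocity(s, TRUE_KCAT, TRUE_KM, ENZYME) for s in substrates]

kcat, km = fit_lineweaver_burk(substrates, velocities, ENZYME)
efficiency = kcat / km        # kcat/KM in M^-1 s^-1
acceleration = kcat / KUNCAT  # kcat/kuncat, dimensionless
print(kcat, km, efficiency, acceleration)
```

The double-reciprocal fit is used here only because it needs nothing beyond the standard library; with noisy measurements a direct nonlinear fit of the Michaelis-Menten equation is usually preferred, because taking reciprocals amplifies errors at low substrate concentration.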

Fig. 3
Experimental characterization of active enzyme designs. (A) Progress curves for RA61, RA61 K176M, RA22, RA22 S210A, RA22 K159M, RA45, RA45 E233T, and RA45 K180M. The enzymes were tested with 540 µM of the racemic substrate; the reaction was followed ...
Table 2
Kinetic parameters of selected designs. b, burst phase; s, steady state.

Design models for several of the most active designs with catalytic motif IV are shown in Fig. 4, A to C. Design RA60 (Fig. 4A) is on a jelly-roll scaffold, and RA45 (Fig. 4C) and RA46 (Fig. 4B) are on a TIM-barrel scaffold. The imine-forming lysine, the hydrogen-bonding residues coordinating the bridging water molecules, and the designed hydrophobic pocket (which binds the aromatic portion of the substrate) are clearly evident in all three designs.

Fig. 4
Structures of designed enzymes. (A to C) Examples of design models for active designs highlighting groups important for catalysis. The nucleophilic imine-forming lysine is in orange, the TS model is in yellow, the hydrogen-bonding groups are in light ...

To evaluate the accuracy of the design models, we solved the structures of two of the designs by x-ray crystallography (Fig. 4, D and E). The 2.2 Å resolution structure of the Ser210→Ala210 (S210A) variant of RA22 (Fig. 4D) (19) shows that the designed catalytic residues Lys159, His233, and Asp53 superimpose well on the original design model, and the remainder of the active site is nearly identical to the design. The 1.9 Å resolution structure of the M48K variant of RA61 likewise reveals an active site very close to that of the design model, with only His46 and Trp178 in alternative rotamer conformations, perhaps resulting from the absence of substrate in the crystal structure (Fig. 4E). Both crystal structures differ most significantly from the designs in the loops surrounding the active site; explicitly incorporating backbone flexibility in these regions during the design process could yield improved enzymes in the future.
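One common way to quantify how closely a crystal structure matches a design model is the root-mean-square deviation (RMSD) over corresponding atoms. The text above does not say which measure was used, so the sketch below is only an illustration. It assumes the two coordinate sets have already been superimposed and list the same atoms in the same order; the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms for three corresponding atoms.
design_model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
crystal = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]
deviation = rmsd(design_model, crystal)
print(f"{deviation:.2f} A")
```

In practice the two structures would first be optimally superimposed, and the RMSD could be reported separately for the catalytic side chains and for the surrounding loops, where the text notes the largest differences.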

Each proposed catalytic mechanism can be treated as a hypothesis to be tested experimentally through multiple independent design experiments. Our lack of success with the first two active sites that were tested contrasts markedly with our relatively high success rate with the active site in which proton shuffling is carried out by a bound water molecule rather than by amino acid side chains acting as acid-base catalysts. The charged polar networks in highly optimized naturally occurring enzymes require exquisite control over functional group positioning and protonation states, as well as the satisfaction of the hydrogen-bonding potential of the buried polar residues, which leads to still more extended hydrogen-bond networks. Computational design of such extended polar networks is exceptionally challenging because of the difficulty of accurately computing the free energies of buried polar interactions, particularly the influence of polarizability on electrostatic free energies and the delicate balance between the cost of desolvation and the gain in favorable intraprotein electrostatic and hydrogen-bonding interactions. The sampling problem also becomes increasingly formidable for more complex sites: The side-chain identity and conformation combinatorics dealt with by hashing in RosettaMatch become intractable for sites consisting of five or more long polar side chains, which for accurate representation may require as many as 1000 rotamer conformations each. At the other extreme, bound water molecules offer considerable versatility, because they can readily reorient to switch between acting as hydrogen-bond acceptors and donors and involve neither delicate free-energy tradeoffs nor intricate interaction networks.
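The scale of the sampling problem can be made concrete with a small calculation. The sketch below counts the rotamer combinations for a site if every catalytic side chain is sampled independently, using the upper figure of 1000 rotamers per residue quoted above. It counts rotamers only, ignoring the choice of backbone positions and the TS degrees of freedom, and the function name is illustrative.

```python
def naive_rotamer_combinations(n_residues, rotamers_per_residue=1000):
    """Rotamer combinations when each residue is sampled independently."""
    return rotamers_per_residue ** n_residues


for n in range(3, 7):
    print(f"{n} catalytic residues: {naive_rotamer_combinations(n):.0e} combinations")
```

Each additional long polar side chain multiplies the count by another factor of 1000, so a five-residue site already has 10^15 rotamer combinations before backbone positions are considered.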

It is tempting to speculate that our computationally designed enzymes resemble primordial enzymes more than they resemble highly refined modern-day enzymes. The ability to design simultaneously only three to four catalytic residues parallels the infinitesimal probability that, early in evolution, more than three to four residues would have happened to be positioned appropriately for catalysis; some of the functions played by exquisitely positioned side chains in modern enzymes may have been played by water molecules earlier in evolutionary history.

Although our results demonstrate that novel enzyme activities can be designed from scratch and indicate the catalytic strategies that are most accessible to nascent enzymes, there is still a significant gap between the activities of our designed catalysts and those of naturally occurring enzymes. Narrowing this gap presents an exciting prospect for future work: What additional features have to be incorporated into the design process to achieve catalytic activities approaching those of naturally occurring enzymes? The close agreement between the two crystal structures and the design models gives credence to our strategy of testing hypotheses about catalytic mechanisms by generating and testing the corresponding designs; indeed, almost any idea about catalysis can be readily tested by incorporation into the computational design procedure. Determining what is missing from the current generation of designs and how it can be incorporated into a next generation of more active designed catalysts will be an exciting challenge that should unite the fields of enzymology and computational protein design in the years to come.


Kinetic parameters of the designs reported here were determined at the University of Washington. For selected designs, the kinetic parameters were confirmed by independent experiments performed at the Scripps Research Institute. We thank R. Fuller for technical assistance. Thorough testing of the four catalytic motifs was made possible through gene synthesis by Codon Devices. We thank Rosetta@Home participants for their valuable contributions of computer time. E.A.A. is funded by a Ruth L. Kirschstein National Research Service Award. This work was supported by the Defense Advanced Research Projects Agency and HHMI. Coordinates and structure factors for the crystal structures of RA22 variant S210A and RA61 variant M48K were deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank (PDB) under the accession numbers 3B5V and 3B5L, respectively. The xyz coordinates of the designs RA22, RA34, RA45, RA46, RA60, and RA61 are included with the SOM as a zipped archive.


Supporting Online Material


Materials and Methods

SOM Text

Figs. S1 to S8

Tables S1 to S8


Design Model Coordinates in PDB Format

References and Notes

1. Ro DK, et al. Nature. 2006;440:940.
2. Kirk O, Borchert TV, Fuglsang CC. Curr. Opin. Biotechnol. 2002;13:345.
3. Janssen DB, Dinkla IJ, Poelarends GJ, Terpstra P. Environ. Microbiol. 2005;7:1868.
4. Hilvert D. Annu. Rev. Biochem. 2000;69:751.
5. Seelig B, Szostak JW. Nature. 2007;448:828.
6. Arnold FH, Volkov AA. Curr. Opin. Chem. Biol. 1999;3:54.
7. Khersonsky O, Roodveldt C, Tawfik DS. Curr. Opin. Chem. Biol. 2006;10:498.
8. Kuhlman B, et al. Science. 2003;302:1364.
9. Looger LL, Dwyer MA, Smith JJ, Hellinga HW. Nature. 2003;423:185.
10. Bolon DN, Mayo SL. Proc. Natl. Acad. Sci. U.S.A. 2001;98:14274.
11. Kaplan J, DeGrado WF. Proc. Natl. Acad. Sci. U.S.A. 2004;101:11566.
12. Tanaka F, Fuller R, Shim H, Lerner RA, Barbas CF III. J. Mol. Biol. 2004;335:1007.
13. Heine A, et al. Science. 2001;294:369.
14. Fullerton SW, et al. Bioorg. Med. Chem. 2006;14:3002.
15. Zanghellini A, et al. Protein Sci. 2006;15:2785.
16. Materials and methods are available as supporting material on Science Online.
17. Wagner J, Lerner RA, Barbas CF III. Science. 1995;270:1797.
18. Tanaka F, Barbas CF III. J. Am. Chem. Soc. 2002;124:3510.
19. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.
20. Dantas G, Kuhlman B, Callender D, Wong M, Baker D. J. Mol. Biol. 2003;332:449.
21. Meiler J, Baker D. Proteins. 2006;65:538.
22. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in FORTRAN: The Art of Scientific Computing. 2nd ed. Cambridge: Cambridge Univ. Press; 1992.
23. Clemente FR, Houk KN. J. Am. Chem. Soc. 2005;127:11294.
24. Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004;32:D129.
25. Zhong G, et al. Angew. Chem. Int. Ed. Engl. 1998;37:2481.