Proc Natl Acad Sci U S A. Sep 3, 2002; 99(18): 11589–11592.
Published online Aug 19, 2002. doi: 10.1073/pnas.162369099
PMCID: PMC129313
Physics, Biochemistry

Protein–DNA computation by stochastic assembly cascade


The assembly of RecA on single-stranded DNA is measured and interpreted as a stochastic finite-state machine that is able to discriminate fine differences between sequences, a basic computational operation. RecA filaments efficiently scan DNA sequence through a cascade of random nucleation and disassembly events that is mechanistically similar to the dynamic instability of microtubules. This iterative cascade is a multistage kinetic proofreading process that amplifies minute differences, even a single base change. Our measurements suggest that this stochastic Turing-like machine can compute certain integral transforms.

Every computation requires reliable recognition of its input data. Any computational scheme based on protein–DNA binding must achieve this recognition within the physical limits of the interaction: its specificity, affinity, and cooperativity. These properties define biochemical networks such as those the cell uses to process information received from stimuli and to compute its response. The resulting computations are inherently stochastic because of the “noisy” nature of biochemical pathways, which resemble a probabilistic pinball machine more than a deterministic desktop PC.

So far, artificial, in vitro biomolecular computing strategies have relied mainly on Watson–Crick complementarity of DNA (or RNA). These schemes, which were used to solve a certain class of hard-to-compute problems, are almost deterministic because of the relatively high hybridization energy. The typical algorithm, combinatorial search, encodes potential solutions as a DNA sequence library and then selects the correct solution(s) by parallel filtration, eliminating wrong solutions through manipulations based on complementarity (1, 2). Another approach constructs finite-state computing machines whose internal states are encoded in DNA sequence (3).

Here we report an in vitro stochastic biomolecular computation based on low-specificity protein–DNA binding: an assembly cascade of RecA proteins on single-stranded DNA can discriminate between similar sequences, thus fulfilling a basic computational task that may be one stage in a more complex computation. The assembly process overcomes the error-prone nature of single-protein binding by constructing a multistage cascade, similar to kinetic proofreading (4), in which many proteins bind and unbind collectively. We find that the dynamics of the cascade is mechanistically similar to the dynamic instability of microtubules, which the living cell uses as an efficient space-search algorithm (5). It also resembles a stochastic counter (6), an imperfect digital apparatus that registers the number of certain events (think of a voting machine). The collective, nonlinear mode of operation of the cascade enables sensitive discrimination of minute length and sequence differences, including a single base change.

The hardware of our molecular machine comprises a test tube filled with a solution of single-stranded DNA molecules, RecA proteins, and ATP molecules that fuel the assembly cascade. When the concentration of RecA monomers exceeds an onset value, they start to form helical filaments, one RecA monomer per base triplet, that envelop and stretch the DNA (7). A filament first forms when a nucleus, a single RecA monomer, binds to a random site along the DNA; it then extends rapidly by polymerization to the 3′ end of the empty strand. When bound to DNA, RecA hydrolyzes ATP and changes its conformation into a less stable state. The RecA closest to the 5′ end, which has only one neighboring monomer, tends to dissociate back into the solution when it hydrolyzes ATP (8). The resulting assembly–disassembly cascade is asymmetric: nucleation events extend the filament by long chunks, whereas disassembly removes monomers one by one. A graphical manifestation of this stochastic asymmetry is the irregular saw-tooth time course of the filament length (or machine state) (Fig. 1A).

Figure 1
(A) Simulation of a 26-state stochastic assembly cascade. State Qn is a RecA filament of length 26 − n. The assembly machine advances to higher states through protein disassembly at the filament end (open circles) and to lower states by nucleation (solid ...

Rather than further describing the extensively studied biochemistry of RecA assembly (7), we focus on the computational features of this protein–DNA molecular machine, its “software.” We use the notion of “machine” here in the sense of a physical realization of an abstract computation, sequence discrimination in our case. Nucleation and disassembly are the two basic operations of this machine. They change the machine's internal state, which is determined by the current length of the RecA filament. To describe the machine dynamics, we use the traditional state-transition diagram, where circles represent states and arrows represent transitions between states (Fig. 1B). In state Qn, n of the N binding sites along the DNA are vacant, and the RecA filament length is therefore N − n (Q0 is a fully covered DNA, and QN is an empty strand). This is thus a finite-state machine whose number of states is set by the number of binding sites (N + 1 states, Q0, …, QN, for N sites). The symbol on each arrow represents the probability per unit time that the transition occurs, given that the machine is in the state at the tail of the arrow. Disassembly takes the machine from state Qn to the next state of the cascade, Qn+1, at rate κ−, whereas a nucleation event makes the machine jump from Qn to any of the lower states Qm, m < n, at rate κ+. We also need an output device that reports the machine's current state. In the experiment, the molecular machine “reports” its state through a change in the rotational motion of the DNA molecule, which is directly related to the number of bound RecA monomers and is measured by fluorescence anisotropy (9).
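
The transition rules above are simple enough to simulate directly. The following is a minimal Python sketch, not the authors' simulation code, that draws a stochastic trajectory of the cascade with the standard Gillespie algorithm, assuming uniform rates κ− (`k_minus`) and κ+ (`k_plus`); the function name `simulate_cascade` and the parameter values are illustrative. Plotting the returned states against time gives a saw-tooth trace of the kind shown in Fig. 1A: slow one-step climbs by disassembly, interrupted by sudden drops by nucleation.

```python
import random


def simulate_cascade(N, k_minus, k_plus, t_max, n_start=None, seed=0):
    """Gillespie simulation of the N-site assembly cascade (uniform rates).

    State n counts vacant sites: Q0 is fully covered DNA, QN is an empty strand.
    Disassembly Qn -> Qn+1 occurs at rate k_minus (for n < N); nucleation
    Qn -> Qm occurs at rate k_plus for each m < n (total rate n * k_plus).
    Returns a list of (time, state) pairs, starting from n_start (default N).
    """
    rng = random.Random(seed)
    n = N if n_start is None else n_start
    t = 0.0
    trajectory = [(t, n)]
    while True:
        rate_dis = k_minus if n < N else 0.0
        rate_nuc = n * k_plus
        total = rate_dis + rate_nuc
        if total == 0.0:                      # no transition possible: stop
            break
        t += rng.expovariate(total)           # exponential waiting time to next event
        if t > t_max:
            break
        if rng.random() * total < rate_dis:
            n += 1                            # end monomer dissociates: Qn -> Qn+1
        else:
            # nucleation at one of the n vacant sites; polymerization to the 3' end
            # makes the new state uniform over Q0..Q(n-1)
            n = rng.randrange(n)
        trajectory.append((t, n))
    return trajectory


trajectory = simulate_cascade(N=26, k_minus=1.0, k_plus=0.003, t_max=500.0)
```

Time is in units of 1/κ−; the arbitrary choice κ+/κ− = 0.003 puts (κ+/κ−)N² of order one for N = 26.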

The stochastic state-transition diagram can be expressed as a set of N differential equations for the probabilities pn that the machine is at state Qn. Summing the incoming (first two terms) and outgoing (last two terms) transitions at each state of the diagram we obtain

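$$\frac{dp_n}{dt} = \kappa^- p_{n-1} + \kappa^+ \sum_{m=n+1}^{N} p_m - \kappa^- p_n - n\kappa^+ p_n. \qquad (1)$$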

The state-transition diagram couples each polymerization state Qn with all the lower states Qm, m < n (Fig. 1), and the equivalent master equation (Eq. 1) is therefore integro-differential, with the boundary and normalization conditions $dp_N/dt = \kappa^- p_{N-1} - N\kappa^+ p_N$ and $\sum_{n=0}^{N} p_n = 1$. To reduce the connectivity of the state-transition diagram, we express it in terms of the cumulative probability $P_n = \sum_{m=n}^{N} p_m$, the probability of finding the machine at a state higher than or equal to Qn. It is also the probability that the filament is no longer than N − n, that is, that site n is empty. Two processes can alter Pn: (i) disassembly fronts vacate site n at a rate proportional to the front-position distribution, $\kappa^- p_{n-1} = \kappa^-(P_{n-1} - P_n)$, and (ii) nucleation at any of the n vacant sites fills site n, at a total rate $n\kappa^+ P_n$. The dynamics is simpler because nucleation at a site beyond n, although possible from a higher state Qm, m > n, leaves Pn unchanged. The resulting master equation is local, $dP_n/dt = \kappa^-(P_{n-1} - P_n) - \kappa^+ n P_n$, with the normalization P0 = 1. Technically, one can obtain this result directly by summing Eq. 1 from n to N and using the boundary conditions.
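
As a numerical check of the step from Eq. 1 to the local equation for the cumulative probability, one can build the rate matrix of the state-transition diagram, solve for its stationary distribution, and verify that the cumulative probabilities satisfy the steady-state balance κ−(Pn−1 − Pn) = κ+nPn. The sketch below assumes uniform rates and uses NumPy; `rate_matrix` and `steady_state` are hypothetical helper names, and the rate values are arbitrary.

```python
import numpy as np


def rate_matrix(N, k_minus, k_plus):
    """Rate matrix W of the cascade, so that dp/dt = W @ p (index n = state Qn)."""
    W = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            W[n + 1, n] += k_minus      # disassembly: Qn -> Qn+1
        for m in range(n):
            W[m, n] += k_plus           # nucleation: Qn -> Qm for every m < n
        W[n, n] = -W[:, n].sum()        # total outflow; each column sums to zero
    return W


def steady_state(W):
    """Stationary distribution: solve W p = 0 with one row replaced by sum(p) = 1."""
    A = W.copy()
    A[-1, :] = 1.0
    b = np.zeros(len(W))
    b[-1] = 1.0
    return np.linalg.solve(A, b)


N, k_minus, k_plus = 13, 1.0, 0.01
p = steady_state(rate_matrix(N, k_minus, k_plus))
P = np.cumsum(p[::-1])[::-1]            # P[n] = sum over m >= n of p[m]
n = np.arange(1, N + 1)
# Steady state of the local equation: k_minus (P[n-1] - P[n]) = k_plus * n * P[n]
print(np.allclose(k_minus * (P[:-1] - P[1:]), k_plus * n * P[1:]))  # prints True
```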

Treating n as a spatial coordinate, we approximate the discrete master equation for Pn(t) by a “drift” equation for the continuous cumulative probability P(n, t):

$$\frac{\partial P}{\partial t} = -\kappa^- \frac{\partial P}{\partial n} - \kappa^+ n P. \qquad (2)$$

The disassembly term is approximated by a gradient, neglecting higher-order derivatives in the Kramers–Moyal expansion (10). In particular, we omit the familiar second-order diffusive term, which plays a minor role as long as disassembly is much faster than nucleation, in the regime (κ+/κ−)N ≪ 1. This regime includes the large-fluctuation regime, (κ+/κ−)N² ≈ 1, where the RecA-assembly cascade is most sensitive.
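
The gradient approximation can be made explicit with a Taylor expansion of the discrete disassembly term in the site index n (a standard step, written out here for illustration):

$$\kappa^-\left[P(n-1) - P(n)\right] = -\kappa^-\frac{\partial P}{\partial n} + \frac{\kappa^-}{2}\frac{\partial^2 P}{\partial n^2} - \frac{\kappa^-}{6}\frac{\partial^3 P}{\partial n^3} + \cdots$$

Keeping only the first-order (drift) term yields ∂P/∂t = −κ−∂P/∂n − κ+nP, the drift equation; the second-order term is the diffusive correction that is omitted.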

Master equations such as Eqs. 1 and 2 are generic in stochastic transition processes, especially in chemical kinetics (10). What makes the computing-machine terminology natural in our case is that the RecA-binding cascade processes information encoded in the DNA sequence. This may be clarified by considering a concrete machine model of the cascade. This time we think of a Turing-like device, a deterministic machine coupled to an infinite tape through a reading head (Fig. 1C). The internal states of the machine are the same binding states Qn. The noisy Brownian dynamics of the cascade is embedded in the tape, which is produced by the following procedure. Time is divided into an infinite series of short, equal segments that correspond to the squares on the tape. To each square we randomly assign a symbol with a probability that matches the transition rates. We denote disassembly from state Qn by dn and nucleation to state Qn by en, and in the remaining squares we write x to denote that nothing happens during the corresponding time interval. The machine reads the squares sequentially from, say, left to right and responds to the symbol written in the current square. If the machine is at state Qn, it responds according to a simple set of rules: (i) if it reads dn, it moves to state Qn+1; (ii) if it reads em with m < n, it moves to state Qm; (iii) in all other cases, that is, if it reads x, dm with m ≠ n, or em with m ≥ n, it stays at state Qn. After its state is determined, the reading head moves one square to the right.
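
The tape construction and the three reading rules translate almost line by line into code. In the minimal Python sketch below, which is illustrative and assumes uniform rates, each square independently holds dn or en (for n = 0, …, N − 1) with probability κ−Δt or κ+Δt, respectively, and x otherwise. A machine in state Qn then reads dn with probability κ−Δt and some em, m < n, with probability nκ+Δt, matching the transition rates. The function names and the encoding of symbols as tuples are hypothetical choices.

```python
import random


def make_tape(N, k_minus, k_plus, dt, length, seed=0):
    """Random tape of `length` squares, each ('d', n), ('e', n) or ('x', None).

    d_n (disassembly from Qn) and e_n (nucleation to Qn) are drawn with
    probabilities k_minus*dt and k_plus*dt for each n = 0..N-1; x fills the rest.
    """
    symbols = [("d", n) for n in range(N)] + [("e", n) for n in range(N)] + [("x", None)]
    weights = [k_minus * dt] * N + [k_plus * dt] * N
    weights.append(1.0 - sum(weights))
    if weights[-1] < 0.0:
        raise ValueError("dt too large: N * (k_minus + k_plus) * dt must not exceed 1")
    return random.Random(seed).choices(symbols, weights=weights, k=length)


def run_machine(tape, n_start):
    """Deterministic reading of the tape; returns the state after each square."""
    n = n_start
    states = [n]
    for kind, m in tape:
        if kind == "d" and m == n:
            n += 1        # rule (i): d_n moves Qn -> Qn+1
        elif kind == "e" and m < n:
            n = m         # rule (ii): e_m with m < n moves Qn -> Qm
        # rule (iii): x, d_m with m != n, or e_m with m >= n leave the state unchanged
        states.append(n)
    return states


states = run_machine(make_tape(13, 1.0, 0.02, 0.01, 5000), n_start=13)
```

All randomness lives in `make_tape`; `run_machine` is fully deterministic, which is the point of the Turing-like picture.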

Stochastic automata have been natural models of information processing ever since they emerged in Shannon's classical study of communication channels (14, 15). The notion of stochastic computers was introduced to the molecular realm in Bennett's discussion of DNA translation and replication, where the computational task is sequence copying (16). We show below that, rather than copying the sequence as RNA and DNA polymerases do, the RecA cascade carries out another type of computation: the discrimination of closely related sequences. Sequence information is encoded in the random tape through the dependence of the probabilities of disassembly (dn) and nucleation (en) events at a given site n on the specific base triplet. This information can be equivalently encoded as sequence-specific transition rates, κ−(n) and κ+(n), in the state-transition diagram and the corresponding master equation. Although RecA is a nonspecific binding protein with similar affinities for many possible triplets, our measurements show (9) that the collective assembly cascade constructed from these low-specificity components is a highly specific detector that can amplify and discriminate even minute sequence differences (17).

The “sequence-detector” machines we construct are assembly cascades on single-stranded DNAs 39 or 78 bases long (13- and 26-stage machines). Any measurement that tries to “look inside” such a stochastic machine, that is, to infer its internal dynamics from its observable output, has to rely on statistical analysis (18). One must collect a sufficient set of observations to overcome the noisiness of the output. We resolve this difficulty by simultaneously measuring many identical machines, ≈10⁵–10⁶ fluctuating DNA–RecA complexes that produce a very smooth ensemble-average signal. An alternative approach would be time averaging of a single-molecule signal (19). Our DNAs carry a fluorescent dye attached to their 3′ end. The fluorescence anisotropy of the dye reports RecA binding, since binding slows down the rotational motion of the DNA (9). We examine the response of the cascade as we tune the interplay between nucleation and disassembly by changing the amount of RecA available in the solution. The nucleation rate at any vacant site increases with RecA concentration, R, as κ+(n) = κT(n)·R (where κT(n) is the triplet-specific rate constant), whereas the disassembly rate remains constant because the ATP it consumes is kept at a saturating level. As RecA is added to the test tube, more monomers bind the DNA through the nucleation–polymerization process, and the chance of finding occupied sites increases. The binding curve is sigmoidal, typical of collective chemical kinetics (Fig. 2 Inset).
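
To see how a binding curve arises from the cascade, one can compute the mean fraction of covered sites in steady state as a function of R. This sketch iterates the discrete steady-state balance κ−(Pn−1 − Pn) = κ+nPn, that is, Pn = Pn−1/(1 + nκ+/κ−) with P0 = 1, and uses the tail-sum identity ⟨n⟩ = Σn≥1 Pn for the mean number of vacant sites. It assumes a uniform sequence, κ+ = κT·R, and an anisotropy signal proportional to the covered fraction; the rate constants and concentrations are hypothetical, and this is not the analysis behind Fig. 2.

```python
def cumulative_steady_state(N, ratio):
    """P[n], n = 0..N, from k_minus (P[n-1] - P[n]) = k_plus * n * P[n], P[0] = 1.

    ratio = k_plus / k_minus (uniform sequence).
    """
    P = [1.0]
    for n in range(1, N + 1):
        P.append(P[-1] / (1.0 + ratio * n))
    return P


def coverage(N, ratio):
    """Mean covered fraction 1 - <n>/N, using the tail sum <n> = sum_{n>=1} P[n]."""
    P = cumulative_steady_state(N, ratio)
    return 1.0 - sum(P[1:]) / N


def binding_curve(N, k_T, k_minus, concentrations):
    """Covered fraction versus RecA concentration R, with k_plus = k_T * R."""
    return [coverage(N, k_T * R / k_minus) for R in concentrations]


R_values = [0.001 * 2 ** i for i in range(15)]    # hypothetical, arbitrary units
for N in (13, 26):
    print(N, [round(c, 3) for c in binding_curve(N, 1.0, 1.0, R_values)])
```

In this model the coverage depends on R roughly through the combination κT·R·N²/κ−, so the curve for the longer sequence rises at lower R.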

Figure 2
Increase of fluorescence anisotropy, A, upon RecA binding to (TAC)13, measured in steady state as a function of RecA concentration R (Inset). Thermal rotation of naked DNAs decreases the intrinsic anisotropy of the dye, Am ≈ 0.25–0.31, ...

Our measurement suggests an exponential sensitivity of the assembly cascade to sequence length and to the RecA-binding rate constants κ− and κ+ (Fig. 2). To understand how such amplified sensitivity is accomplished, we reexamine the master equation (Eq. 2). Motivated by our measurements, which indicate fast relaxation of the assembly cascade, we study its steady state. For a uniform sequence, κ−(n) = κ−, κ+(n) = κ+, the steady-state probability distribution is Gaussian,

$$P(n) = \exp\left(-\frac{\kappa^+}{2\kappa^-}\,n^2\right).$$

It follows that even a slight difference between the transition rates of two uniform sequences is exponentially amplified as the cascade advances to its higher states (large n). The maximal enhancement grows exponentially with the square of the number of states, that is, of the number of DNA base triplets. Similarly, the cascade can discriminate between the lengths, N1 and N2, of two sequences made of the same triplets, since the probability ratio at the highest states is $P(N_1)/P(N_2) = \exp[-(\kappa^+/2\kappa^-)(N_1^2 - N_2^2)]$. The exponential amplification results from the iterative, multistage structure of the cascade. It is the same design principle that underlies industrial distillation (20) and the kinetic proofreading pathway of protein synthesis (4). The exponential amplification of the cascade is evident in the normalized fluorescence anisotropy of uniform triplet repeats in the saturated regime (Fig. 2). In this regime of high nucleation rate, it becomes harder for the machine to climb up to higher states through successive disassembly steps (Fig. 1B). However, this helps the cascade to discriminate lengths, because shorter sequences need fewer disassembly steps to reach higher states; indeed, the curve for the longer (TAC)26 triplet-repeat sequence is steeper than that of the half-size (TAC)13.
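
The Gaussian steady state makes the amplification easy to quantify. The sketch below evaluates P(n) = exp(−κ+n²/2κ−) for two uniform sequences whose rate ratios κ+/κ− differ slightly, and the length-discrimination ratio P(N1)/P(N2). The 10% rate difference in the example is an arbitrary illustration, not a measured value, and the function names are hypothetical.

```python
import math


def gaussian_P(n, ratio):
    """Continuum steady state P(n) = exp(-(k_plus / (2 k_minus)) n^2); ratio = k_plus/k_minus."""
    return math.exp(-0.5 * ratio * n * n)


def rate_discrimination(n, ratio_a, ratio_b):
    """P_a(n) / P_b(n) for two uniform sequences differing only in k_plus/k_minus."""
    return gaussian_P(n, ratio_a) / gaussian_P(n, ratio_b)


def length_discrimination(N1, N2, ratio):
    """P(N1) / P(N2) = exp[-(ratio / 2)(N1^2 - N2^2)] for sequences of the same triplet."""
    return gaussian_P(N1, ratio) / gaussian_P(N2, ratio)


# Hypothetical example: a 10% difference in k_plus/k_minus between two sequences.
for n in (13, 26):
    print(n, rate_discrimination(n, 0.011, 0.010))
```

Doubling n multiplies the logarithm of the discrimination ratio by four, which is the n² dependence behind the exponential amplification.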

We test the sequence-discrimination capability of the cascade by comparing the binding curves of two uniform single-stranded DNA molecules made of very similar triplet repeats, TAC and TCA (Fig. 3A). The difference between the two binding signals behaves similarly to the relative entropy of the two probability distributions (sometimes called “information for discrimination”; ref. 21) and therefore gives a good indication of their distinguishability. For both lengths, N = 13 and 26, we find that the difference peaks at a certain rate ratio κ+/κ− that corresponds to the maximal slope of the binding curves (Fig. 2 Inset), where cooperativity is highest (Eq. 2). The peak indicates optimal tuning of the back-and-forth scanning motion that the stochastic machine uses to “read” the sequence (a process that was mapped onto sequential reading of a random tape). Think of the serrated time series (Fig. 1A) as a “sentence” composed of an N-state alphabet printed by a stochastic typewriter (something like … Q10Q11Q8Q9Q3Q3Q3Q4… ); the appearance of the “letter” QN then corresponds to a completed scan. Interpreting QN as a “space bar,” the maximal rate of completed scans corresponds to the most informative reading, with the maximal rate of “words.” This occurs at the “working point” of the cascade, t+/t− ~ 1, where the time interval between nucleation events, t+ ~ 1/(Nκ+), matches the time required to climb back to state QN, t− ~ N/(2κ−). With the rates measured independently by kinetic assays, we find that optimal separation indeed occurs in this regime, t+/t− ~ 1–3.
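
The working-point condition can be turned around to predict where separation should peak. Using the order-of-magnitude estimates t+ ~ 1/(Nκ+) and t− ~ N/(2κ−), the condition t+/t− = c gives κ+/κ− = 2/(cN²). The short sketch below (function names hypothetical) evaluates this for the 13- and 26-triplet machines; it reproduces only the trend, a lower optimal ratio for the longer sequence, not the measured peak positions.

```python
def timescale_ratio(N, ratio):
    """t_plus / t_minus, with t_plus ~ 1/(N k_plus), t_minus ~ N/(2 k_minus); ratio = k_plus/k_minus."""
    t_plus = 1.0 / (N * ratio)   # in units of 1/k_minus
    t_minus = N / 2.0            # in units of 1/k_minus
    return t_plus / t_minus


def working_point(N, c=1.0):
    """Rate ratio k_plus/k_minus at which t_plus/t_minus equals c."""
    return 2.0 / (c * N * N)


for N in (13, 26):
    # c = 3 and c = 1 bracket the range t_plus/t_minus ~ 1-3 quoted in the text
    print(N, working_point(N, 3.0), working_point(N, 1.0))
```

Because the optimal ratio scales as 1/N², doubling the sequence length lowers it fourfold.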

Figure 3
(A) The difference between the fluorescence anisotropy signals of (TAC)N and (TCA)N for N = 13, 26. Sequence separation peaks at a lower rate ratio, κ+/κ−, for longer sequences, in agreement with the cascade model. (B ...

The dynamics of the protein assembly cascade can also detect localized differences in nonuniform sequences. A stringent test for our machine is the discrimination of a single base change. We therefore introduced a C → G change at the seventh triplet of the two uniform (TAC)N sequences and measured the discrimination (Fig. 3B). As with the uniform sequences, the difference in binding between a sequence and its variant peaks at an optimal κ+/κ− that is lower for the longer sequence, consistent with the working point.

The machine's ability to discriminate localized changes suggests a basis for certain mathematical computations, namely integral transforms. Consider an ensemble of uniform sequences made of N RecA-favored triplets (relatively high κ+; ref. 9). Within each sequence, we encode a “defect” in the form of a single unfavorable triplet placed at one of the N possible sites. Let w(n) designate the fraction of sequences with the defect at site n. A test tube with a mixture of all these sequences encodes the vector [w(1), w(2), …, w(N)]. A monodisperse solution with the defect only at site n is one of the N unit base vectors that span our sequence space. As shown below, the signal from such a base vector is exponential in the position of the defect, S(n, k) ~ exp(−kn), where k = Δ(κ+/κ−) is the difference in the ratio of reaction rates at the defect. Since the fluorescence anisotropy is an ensemble average, the signal of the mixture is a Laplace-like transform,

$$S(k) = \sum_{n=1}^{N} w(n)\,S(n,k) \sim \sum_{n=1}^{N} w(n)\,e^{-kn}.$$
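
The encoding step can be sketched numerically. Assuming the base-vector signal is exactly S(n, k) = exp(−kn) (the text gives only the scaling), the mixture signal is a finite sum of exponentials, a discrete Laplace transform of w(n). Given S at N distinct values of k, the weights follow from a linear system with entries exp(−kj n). This decoding step is our own illustration; how k would be varied experimentally is not addressed here, and the system becomes ill-conditioned quickly as N grows. The weights and k values below are hypothetical.

```python
import numpy as np


def mixture_signal(w, k):
    """S(k) = sum_n w(n) exp(-k n), n = 1..N (assumes S(n, k) = exp(-k n) exactly)."""
    n = np.arange(1, len(w) + 1)
    return float(np.dot(w, np.exp(-k * n)))


def decode_weights(ks, signals):
    """Recover w from S measured at N distinct k: solve V w = S, V[j, n-1] = exp(-k_j n)."""
    n = np.arange(1, len(ks) + 1)
    V = np.exp(-np.outer(ks, n))
    return np.linalg.solve(V, signals)


w = np.array([0.1, 0.2, 0.3, 0.4])          # hypothetical defect-position weights
ks = np.array([0.1, 0.5, 1.0, 1.5])          # hypothetical k values
S = np.array([mixture_signal(w, k) for k in ks])
print(decode_weights(ks, S))                 # close to w
```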

To account for the nonuniformity of a DNA sequence with site-dependent nucleation and disassembly rates, κ+(n) and κ−(n), we modify the continuous master equation to

$$\frac{\partial P}{\partial t} = -\kappa^-(n)\,\frac{\partial P}{\partial n} - \kappa^+(n)\,n\,P,$$

with the inhomogeneous steady-state solution

$$P(n) = \exp\left[-\int_0^n \frac{\kappa^+(n')}{\kappa^-(n')}\,n'\,dn'\right].$$

A mutation at site n0 implies a localized change of the reaction rates by Δκ+ and Δκ−. When the mutation is in a formerly uniform sequence, the steady-state profile beyond the mutation (n > n0) changes by an amount that depends on the position of the site, ΔP(n)/P(n) ≃ −kn0, where the “wave number” k is the difference in the ratio of reaction rates, $k = \Delta(\kappa^+/\kappa^-)$. Equivalently, P(n) beyond the mutation is multiplied by a factor ≈ exp(−kn0), which depends exponentially on the position of the mutation, n0. Integrating over P(n), we find that the anisotropy signal scales like S(n0, k) ~ exp(−kn0).
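
The effect of a single defect can be checked with a discretized steady state. Assuming the local form ln P(n) = −∫0^n (κ+/κ−) n′ dn′, replaced here by the Riemann sum ln P(n) = −Σm≤n m·r(m) with r(m) = κ+(m)/κ−(m) the per-site rate ratio, the sketch below changes r at a single site n0 by k. It shows that ln P is unchanged for n < n0 and shifted by exactly −kn0 for n ≥ n0, so P changes by the factor exp(−kn0). The parameter values and the function name are hypothetical.

```python
import math


def log_P(rate_ratio):
    """ln P(n) for n = 0..N, with rate_ratio[m-1] = k_plus(m)/k_minus(m) at site m."""
    out = [0.0]                           # ln P(0) = 0, i.e. P(0) = 1
    for m, r in enumerate(rate_ratio, start=1):
        out.append(out[-1] - m * r)
    return out


N, n0, base, k = 26, 7, 0.01, 0.004       # hypothetical values
uniform = [base] * N
mutant = list(uniform)
mutant[n0 - 1] += k                       # defect at site n0 shifts the rate ratio by k
shift = [b - a for a, b in zip(log_P(uniform), log_P(mutant))]
print(shift[n0 - 1], shift[n0], -k * n0)  # 0 before the defect, about -k*n0 from n0 on
```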

By choosing other types of sequence base vectors, the stochastic cascade machinery, through the ensemble measurement, can encode and decode mixtures in terms of other transforms. It is tempting to speculate that, with additional sequence-manipulation operations at hand, such as recombination, one could construct a molecular architecture for more complex computations. Whether RecA assembly is used for natural computation remains to be tested in vivo (22).


We thank K. Adzuma and B. Shraiman for discussions and suggestions and D. Thaler for a fruitful and inspiring collaboration.


We note that the assembly dynamics differs essentially from the Langevin dynamics of a particle diffusing in a one-dimensional random force field (11) or the related asymmetric exclusion process (12): whereas a diffusing particle travels continuously, the filament end can abruptly jump to a new site by nucleation. The master equation therefore does not lead to the familiar Fokker–Planck equation, and an equivalent Langevin formulation would require infinite stochastic forces to enable the nucleation jumps (10). The effect of diffusion remains minor for sufficiently short inhomogeneous sequences, such as the sequence with the point mutation measured in the experiment. In contrast, assembly on longer sequences, (κ+/κ−)N ≈ 1, is predicted to exhibit a striking randomness effect of both sequence and diffusion, which may lead to anomalous motion and phase transitions (13).


1. Adleman L M. Science. 1994;266:1021–1024.
2. Landweber L F, Kari L. BioSystems. 1999;52:3–13.
3. Winfree E, Liu F, Wenzler L A, Seeman N C. Nature (London). 1998;394:539–544.
4. Hopfield J J. Proc Natl Acad Sci USA. 1974;71:4135–4139.
5. Mitchison T, Kirschner M. Nature (London). 1984;312:232–242.
6. Killeen P R, Taylor T J. Psychol Rev. 2000;107:430–459.
7. Kowalczykowski S C, Dixon D A, Eggleston A K, Lauder S D, Rehrauer W M. Microbiol Rev. 1994;58:401–465.
8. Shan Q, Bork J M, Webb B L, Inman R B, Cox M M. J Mol Biol. 1997;265:519–540.
9. Bar-Ziv R, Libchaber A. Proc Natl Acad Sci USA. 2001;98:9068–9073.
10. van Kampen N G. Stochastic Processes in Physics and Chemistry. Amsterdam: Elsevier; 2002.
11. Bouchaud J P, Comtet A, Georges A, Le Doussal P. Ann Phys. 1990;201:285–341.
12. Schutz G M. J Stat Phys. 1997;88:427–452.
13. Lubensky D K, Nelson D R. Phys Rev Lett. 2000;85:1572–1575.
14. Shannon C E. Bell Syst Tech J. 1948;27:379–423.
15. Shannon C E. Bell Syst Tech J. 1948;27:623–656.
16. Bennett C H. Int J Theor Phys. 1982;21:905–940.
17. von Neumann J. In: Automata Studies. Shannon C E, McCarthy J, editors. Princeton: Princeton Univ. Press; 1956. pp. 43–98.
18. Grenander U. In: Research Papers in Statistics: Festschrift for J. Neyman. 1966. pp. 107–123.
19. Hegner M, Smith S B, Bustamante C. Proc Natl Acad Sci USA. 1999;96:10109–10114.
20. Lord Rayleigh. Philos Mag. 1896;42:493–498.
21. Cover T M, Thomas J A. Elements of Information Theory. New York: Wiley; 1991.
22. Matic I, Rayssiguier C, Radman M. Cell. 1995;80:507–515.
