- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

# Protein–DNA computation by stochastic assembly cascade

^{*}To whom reprint requests should be addressed. E-mail: libchbr/at/mail.rockefeller.edu.

## Abstract

The assembly of RecA on single-stranded DNA is measured and interpreted as a stochastic finite-state machine that is able to discriminate fine differences between sequences, a basic computational operation. RecA filaments efficiently scan DNA sequence through a cascade of random nucleation and disassembly events that is mechanistically similar to the dynamic instability of microtubules. This iterative cascade is a multistage kinetic proofreading process that amplifies minute differences, even a single base change. Our measurements suggest that this stochastic Turing-like machine can compute certain integral transforms.

Every computation requires a reliable recognition of its input data. Any scheme for computation based on protein–DNA binding must attain this recognition within the physical properties of this interaction, its specificity, affinity, and cooperativity. These properties define biochemical networks such as those used by the cell to process information received from stimuli and to compute its response. The resulting computations are inherently stochastic due to the “noisy” nature of biochemical pathways that resemble more a probabilistic pinball machine than a deterministic desktop PC.

So far, artificial, *in vitro* biomolecular computing strategies relied mainly on Watson–Crick complementarity of DNA (or RNA). These schemes, which were used to solve a certain class of hard-to-compute problems, are almost deterministic due to the relatively high hybridization energy. The typical algorithm, combinatorial search, encodes potential solutions as DNA sequence library and then selects correct solution(s) via parallel filtration, eliminating the wrong solutions by manipulations based on complementarity (1, 2). Another approach constructs finite-state computing machines, the internal states of which are encoded in DNA sequence (3).

Here we report an *in vitro* stochastic biomolecular computation based on low-specificity protein–DNA binding: An assembly cascade of RecA proteins on single-stranded DNA can discriminate between similar sequences, thus fulfilling a basic computational task that may be one stage in a more complex computation. The assembly process overcomes the error-prone nature of the single protein binding by constructing a multistage cascade, similar to kinetic proofreading (4), in which many proteins bind and unbind collectively. We find that the dynamics of the cascade is mechanistically similar to the dynamic instability of microtubules, which is used as an efficient space search algorithm within the living cell (5). It also resembles a stochastic counter (6), an imperfect digital apparatus that registers the number of certain events (think of a voting machine). The collective, nonlinear mode of operation of the cascade enables sensitive discrimination of minute length and sequence differences including a single base change.

The hardware of our molecular machine comprises a test-tube filled with a solution of single-stranded DNA molecules, RecA proteins, and ATP molecules that fuel the assembly cascade. When the concentration of RecA monomers exceeds some onset value, they start to form helical filaments, one RecA monomer per each base triplet, that envelope and stretch the DNA (7). A filament first forms when a nucleus, a RecA monomer, binds to a random site along the DNA and then extends rapidly by polymerization to the 3′ end of the empty strand. When bound to DNA, RecAs hydrolyze ATP and change their conformation into a less stable state. The RecA that is closest to the 5′ end, with only one neighboring monomer, tends to disassemble back into the solution when hydrolyzing ATP (8). The resulting assembly–disassembly cascade is asymmetric; while nucleation events extend the filament by long chunks, disassembly removes monomers one by one. A graphical manifestation of this stochastic asymmetry is the irregular saw-tooth form of the filament length (or machine state) dependence on time (Fig. (Fig.11*A*).

*A*) Simulation of a 26-state stochastic assembly cascade. State

*Q*

_{n}is a RecA filament of length 26

*n*. The assembly machine advances to higher states through protein disassembly at the filament end (open circles) and to lower states by nucleation (solid

**...**

Rather than further describing the extensively studied biochemistry of RecA assembly (7) we focus on the computational features of this protein–DNA molecular machine, its “software.” We use here the notion of “machine” in the sense of certain physical realization of an abstract computation, sequence discrimination in our case. Nucleation and disassembly are the two basic operations of this machine. They change the machine's internal state, which is determined by the current length of RecA filament. To describe the machine dynamics, we use the traditional state-transition diagram, where circles represent states, and arrows represent transitions between states (Fig. (Fig.11*B*). In state *Q _{n}*,

*n*binding sites out of total

*N*sites along the DNA are vacant, and the RecA filament length is therefore

*N*−

*n*(

*Q*

_{0}is a fully covered DNA, and

*Q*is an empty strand). Clearly, this is a

_{N}*finite-state machine*with the number of states equal to the number of binding sites,

*N*. The symbols on each arrow represent the probability per unit of time that such transition occurs given that the machine is in the state at the tail of the arrow. Disassembly can take the machine from state

*Q*to the next state of the cascade

_{n}*Q*

_{n+1}at rate κ

_{−}whereas at nucleation events the machine jumps from

*Q*to any of the lower states

_{n}*Q*,

_{m}*m*<

*n*, at rate κ

_{+}. We also need an output device that will report the machine's current state. In the experiment, the molecular machine “reports” its state through a change in the rotational motion of the DNA molecule, which is directly related to the number of bound RecA monomers and measured by fluorescence anisotropy (9).

The stochastic state-transition diagram can be expressed as a set of *N* differential equations for the probabilities *p _{n}* that the machine is at state

*Q*. Summing the incoming (first two terms) and outgoing (last two terms) transitions at each state of the diagram we obtain

_{n} The state-transition diagram couples each polymerization state *Q _{n}* with all the lower states

*Q*,

_{m}*m*<

*n*(Fig. (Fig.1),1), and the equivalent master equation (Eq. 1) is therefore integro-differential, with boundary and normalization conditions

*dp*

_{N}/

*dt*= κ

_{−}

*p*

_{N−1}−

*Np*

_{N}, ∑

*p*

_{n}= 1. To reduce the connectivity of the state-transition diagram, we express it in terms of the cumulative

*P*

_{n}= ∑

*p*

_{m}, the probability to find the machine at a stage higher or equal to

*Q*. It is also the probability that the filament is shorter than

_{n}*N*−

*n*and that site

*n*is empty. Two processes can alter

*P*: (

_{n}*i*) disassembly fronts vacate site

*n*at a rate proportional to the front position distribution, −

*p*−

_{n}= P_{n}*P*

_{n+1}, and (

*ii*) nucleation at any of the

*n*vacant sites fills site

*n*. The dynamics is simpler, since any possible nucleation from a higher state,

*Q*,

_{m}*m*>

*n*leaves

*P*unchanged. The resulting master equation is local,

_{n}*dP*/

_{n}*dt*= −κ

_{−}(

*P*

_{n+1}−

*P*) − κ

_{n}_{+}

*nP*, with the normalization

_{n}*P*

_{0}= 1. Technically, one can obtain this result directly by summing Eq. 1 from

*n*to

*N*with the boundary conditions.

Thinking of *n* as a spatial coordinate, we approximate the discreet master equation for *P _{n}*(

*t*) by a “drift” equation for the continuous cumulative probability

*P*(

*n*,

*t*).

The disassembly term is approximated by a gradient, neglecting higher order derivatives in the Kramers–Moyal expansion (10). In particular, we omit the familiar second-order diffusive term that plays a minor role as long as disassembly is much faster than nucleation rate, in the regime, (κ_{+}/κ_{−})*N* 1, which includes the large fluctuation regime, (κ_{+}/κ_{−})*N*^{2} ≈ 1, where the RecA-assembly cascade is the most sensitive.^{†}

Master equations such as **1** and **2** are generic in stochastic transition processes, especially in chemical kinetics (10). What makes the computing-machines terminology natural in our case is the understanding that the *RecA-binding cascade processes information encoded in the DNA sequence*. This may be clarified if one considers a concrete machine model of the cascade. This time we think of a Turing-like device, a deterministic machine that is coupled to an infinite tape through a reading head (Fig. (Fig.11*C*). The internal states of the machine are the same *N* binding states *Q _{n}*. The noisy Brownian dynamics of the cascade is embedded in the tape, which is produced by the following procedure: Time is divided into an infinite series of short equal segments that correspond to the squares on the tape. To each square we randomly assign a symbol with a probability that matches the transition rates. We denote disassembly from state

*Q*by

_{n}*d*, nucleation to state

_{n}*Q*by

_{n}*e*, and in the rest of the squares we write

_{n}*x*to denote that nothing happens during the corresponding time duration. The machine reads the squares sequentially from, say, left to right and responds according to the symbol written in the current square. Suppose that the machine is at state

*Q*, then it responds according to a simple set of rules. (

_{n}*i*) If it reads

*d*, it moves to state

_{n}*Q*

_{n+1}. (

*ii*) If it reads

*e*and

_{m}*m*<

*n*the machine moves to state

*Q*. (

_{m}*iii*) In all other cases, if it reads

*x*or

*d*with

_{m}*m*≠

*n*, or

*e*with

_{m}*m*≠

*n*, then it stays at state

*Q*. After its state is determined, the reading head moves one square to the right.

_{n}Stochastic automata are natural to information processes ever since they emerged in Shannon's classical study of communication channels (14, 15). The notion of stochastic computers was introduced to the molecular realm in Bennett's discussion of DNA translation and replication, where the computational task is sequence copying (16). We show below that rather than Xeroxing the sequence like RNA- and DNA-polymerase, the RecA cascade carries out another type of computation, the discrimination of close-by sequences. Sequence information is encoded in the random tape through the dependence of the probabilities for disassembly (*d _{n}*) and nucleation (

*e*) events at a certain site

_{n}*n*on the specific base triplet. This information can be equivalently encoded as sequence-specific transition rates, κ

_{−}(

*n*) and κ

_{+}(

*n*), in the state-transition diagram and the corresponding master equation. Although RecA is a nonspecific binding protein with similar affinities for many possible triplets, our measurements show (9) that the collective assembly cascade constructed from these low-specificity components is a highly specific detector that can amplify and discriminate even minute sequence differences (17).

The “sequence-detector” machines we construct are assembly cascades on single-stranded DNAs, 39 or 78 bases long (13- and 26-stage machines). Any measurement that tries to “look inside” such a stochastic machine, that is to infer its internal dynamics from observable output, has to rely on statistical analysis (18). One must collect a sufficient set of observations to overcome the noisiness of the output. We resolve this difficulty by simultaneously measuring many identical machines, ≈10^{5}–10^{6} fluctuating DNA–RecA complexes that produce a very smooth ensemble-average signal. An alternative approach could be time averaging over a single-molecule signal (19). Our DNAs carry a fluorescent dye attached to their 3′ end. The fluorescence anisotropy of the dye reports RecA binding as it slows down the rotational motion of the DNA (9). The response of the cascade is examined as we tune the interplay between nucleation and disassembly by changing the available amount of RecA in the solution. The nucleation rate at any vacant site increases with RecA concentration, *R*, like κ_{+}(*n*) = κ_{T}(*n*)*R* (where κ_{T}(*n*) is the triplet-specific rate constant), while the disassembly rate remains constant as the amount of ATP it consumes is kept at saturation level. When we add RecA to the test tube more monomers bind DNA through nucleation-polymerization process, and the chance to find occupied sites increases. The binding curve is sigmoidal, typical of *collective* chemical kinetics (Fig. (Fig.22 *Inset*).

*A*, upon RecA binding on (TAC)

_{13}measured in steady state as a function of RecA concentration

*R*(

*Inset*). Thermal rotation of naked DNAs decreases the intrinsic anisotropy of the dye,

*A*

_{m}~ 0.25 − 0.31,

**...**

Our measurement suggests an exponential sensitivity of the assembly cascade to sequence length and RecA-binding rate constants, κ_{−} and κ_{+} (Fig. (Fig.2).2). To understand how such amplified sensitivity is accomplished, we reexamine the master equation (Eq. 2). Motivated by our measurements that indicate fast relaxation of the assembly cascade, we study its steady state. For a uniform sequence, κ_{−}(*n*) = κ_{−}, κ_{+}(*n*) = κ_{+}, the steady-state probability distribution is Gaussian,

It follows that even a slight difference in transition rates of two uniform sequences is exponentially amplified as the cascade advances to its higher states (large *n*). The maximal enhancement increases exponentially like the square of the states number, actually the number of DNA base triplets. Similarly, the cascade can discriminate between lengths, *N*_{1} and *N*_{2}, of two sequences made of the same triplets, since the probability ratio at the highest states is *P*(*N*_{1})/*P*(*N*_{2}) = exp[−(κ_{+}/2κ_{−})(*N* − *N*)]. The exponential amplification is the result of the *iterative*, *multistage* structure of the cascade. It is the same design principle that underlies industrial distillation (20) and the kinetic proofreading pathway of protein synthesis (4). The exponential amplification of the cascade is evident from the behavior of the normalized fluorescence anisotropy of uniform triplet repeats at the saturated regime (Fig. (Fig.2).2). In this regime of high nucleation rate, it becomes harder for the machine to climb up to higher states through successive disassembly steps (Fig. (Fig.11*B*). However, this helps the cascade to discriminate lengths, because shorter sequences need less disassembly steps to reach higher states, and indeed the curve for the longer (TAC)_{26} triplet-repeat sequence is steeper than that of the half-size (TAC)_{13}.

We test the sequence-discrimination capability of the cascade by comparing the binding curves of two uniform single-stranded DNA molecules made of very similar triplet repeats, TAC and TCA (Fig. (Fig.33*A*). The difference between the two binding signals behaves similarly to the relative entropy of the two probability distributions (sometimes called “information for discrimination”; ref. 21) and therefore gives a good idea about their distinguishability. For both lengths *n* = 13,26 we find that the difference peaks at a certain rate ratio κ_{+}/κ_{−} that corresponds to the maximal slope of the binding curves (Fig. (Fig.22 *Inset*), where cooperativity is highest (Eq. 2). The peak indicates optimal tuning of the back and forth scanning motion that is used by the stochastic machine to “read” the sequence (a process that was mapped to sequential reading of a random tape). Thinking of the serriform time series (Fig. (Fig.11*A*) as a “sentence” composed of an *N*-state alphabet printed by a stochastic typewriter (something like … *Q*_{10}*Q*_{11}*Q*_{8}*Q*_{9}*Q*_{3}*Q*_{3}*Q*_{3}*Q*_{4}… ), then the appearance of the “letter” *Q _{N}* corresponds to a completed scan. Interpreting

*Q*as a “space bar,” the maximal rate of completed scans corresponds to the most informative reading with the maximal rate of “words.” This occurs at the “working point” of the cascade,

_{N}*t*

_{+}/

*t*

_{−}~ 1, when the time interval between nucleation events,

*t*

_{+}~ 1/(

*N*κ

_{+}), is matched with the time required to climb back to state

*Q*,

_{N}*t*

_{−}~

*N*/

*2*κ

_{−}. With the rates measured independently by kinetic assays we find that optimal separation occurs indeed in the optimal regime,

*t*

_{+}/

*t*

_{−}~ 1–3.

*A*) The difference between the fluorescence anisotropy signal of (TAC)

_{N}and (TCA)

_{N}for

*n*= 13,26. Sequence separation peaks at a lower rate ratio, κ

_{+}/κ

_{−}, for longer sequences in agreement with cascade model. (

*B*

**...**

The protein assembly cascade dynamics can detect also localized differences in nonuniform sequences. A stringent test for our machine is the discrimination of a single base change. We therefore introduced a change C → G at the seventh triplet of the two uniform (TAC)_{N} sequences and measured the discrimination (Fig. (Fig.33*B*). Similar to the uniform sequences, the difference in binding between a sequence and its variant peaks at an optimal κ_{+}/κ_{−} that is lower for the longer sequence, consistent with the working point.

The machine's ability to discriminate localized changes suggests a basis for certain mathematical computations, integral transforms. Consider an ensemble of uniform sequences made of *N* RecA-favored triplets (relatively high κ_{+}; ref. 9). Within each sequence, we encode a “defect” in the form of a single unfavorable triplet placed at one of the *N* possible sites. Let *w*(*n*) designate the fraction of sequences with defect at site *n*. A test tube with a mixture of all these sequences encodes the vector [*w*(*1*), *w*(*2*),… *w*(*N*)]. A monodisperse solution with defect only at site *n* is one of the *N*-unit base vectors that span our sequence space. As shown below, the signal from such a base vector is exponential in the position of the defect, *S*(*n*,*k*) ~ exp(−*k n*), where *k =* Δ(κ_{+}/κ_{−}) is the difference in the ratio of reaction rates at the defect. Since the fluorescence anisotropy is an ensemble average, the signal of mixture is a Laplace-like transform,

To account for the nonuniformity of a DNA sequence with site-dependent nucleation and disassembly rates, κ_{+}(*n*) and κ_{−}(*n*), we modify the continuous master equation to

with the inhomogeneous steady-state solution

A mutation at site *n*_{0} implies a localized change of reaction rates by Δκ_{+} and Δκ_{−.} When the mutation is in a formerly uniform sequence, variation of the steady-state profile exhibits a change that depends on the position of the site as Δ*P*(*n*)/*P*(*n*) − κ*n*_{0}, where the “wave number,” *k*, is the difference in the reaction rates ratio, The resulting relative change in *P*(*n*) depends exponentially on the position of the mutation *n*_{0}. Integrating over *P*(*n*) we find that the anisotropy signal scales like *S*(*n*_{0}, *k*) ~ exp(−*kn*_{0}).

By choice of other types of sequence base vectors, the stochastic cascade machinery, through the ensemble measurement, can encode and decode mixtures in terms of other transforms. It is tempting to speculate that with additional operations to manipulate sequences at hand, such as recombination, one could construct a molecular architecture for more complex computations. The question of whether RecA assembly is used for natural computation requires *in vivo* testing (22).

## Acknowledgments

We thank K. Adzuma and B. Shraiman for discussions and suggestions and D. Thaler for a fruitful and inspiring collaboration.

## Footnotes

^{†}We note that the assembly dynamics differs essentially from the Langevin dynamics of a particle diffusing in a one-dimensional random force field (11) or the related asymmetric exclusion process (12): While a diffusing particle travels continuously, the filament end can abruptly jump to a new site by nucleation. The master equation, therefore, does not lead to the familiar Fokker–Planck equation, and an equivalent Langevin formulation would require infinite stochastic forces to enable the nucleation jumps (10). The effect of diffusion remains minor for short enough inhomogeneous sequences such as the sequence with the point mutation measured in the experiment. In contrast, assembly on longer sequences, (κ_{+}/κ_{−})*N* ≈ 1, is predicted to exhibit a striking randomness effect of both sequence and diffusion, which may lead to anomalous motion and phase transitions (13).

## References

*Res. Pap. Statist. Testchrift J. Neyman*107–123.

**National Academy of Sciences**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (202K)

- Quantitative analysis of the kinetics of end-dependent disassembly of RecA filaments from ssDNA.[J Mol Biol. 1999]
*Arenson TA, Tsodikov OV, Cox MM.**J Mol Biol. 1999 May 7; 288(3):391-401.* - RecA protein filaments disassemble in the 5' to 3' direction on single-stranded DNA.[J Biol Chem. 2001]
*Bork JM, Cox MM, Inman RB.**J Biol Chem. 2001 Dec 7; 276(49):45740-3. Epub 2001 Sep 26.* - Effects of DNA sequence and structure on binding of RecA to single-stranded DNA.[Proc Natl Acad Sci U S A. 2001]
*Bar-Ziv R, Libchaber A.**Proc Natl Acad Sci U S A. 2001 Jul 31; 98(16):9068-73. Epub 2001 Jul 24.* - Geometry of the DNA strands within the RecA nucleofilament: role in homologous recombination.[Q Rev Biophys. 2003]
*Prévost C, Takahashi M.**Q Rev Biophys. 2003 Nov; 36(4):429-53.* - Right or left turn? RecA family protein filaments promote homologous recombination through clockwise axial rotation.[Bioessays. 2008]
*Wang TF, Chen LT, Wang AH.**Bioessays. 2008 Jan; 30(1):48-56.*

- An autonomous molecular computer for logical control of gene expression[Nature. 2004]
*Benenson Y, Gil B, Ben-Dor U, Adar R, Shapiro E.**Nature. 2004 May 27; 429(6990)10.1038/nature02551* - Speed, dissipation, and error in kinetic proofreading[Proceedings of the National Academy of Scie...]
*Murugan A, Huse DA, Leibler S.**Proceedings of the National Academy of Sciences of the United States of America. 2012 Jul 24; 109(30)12034-12039* - Biocomputers: from test tubes to live cells[Molecular bioSystems. 2009]
*Benenson Y.**Molecular bioSystems. 2009 Jul; 5(7)675-685* - Synthesis of crystals with a programmable kinetic barrier to nucleation[Proceedings of the National Academy of Scie...]
*Schulman R, Winfree E.**Proceedings of the National Academy of Sciences of the United States of America. 2007 Sep 25; 104(39)15236-15241* - Fueling protein-DNA interactions inside porous nanocontainers[Proceedings of the National Academy of Scie...]
*Cisse I, Okumus B, Joo C, Ha T.**Proceedings of the National Academy of Sciences of the United States of America. 2007 Jul 31; 104(31)12646-12650*

- Protein–DNA computation by stochastic assembly cascadeProtein–DNA computation by stochastic assembly cascadeProceedings of the National Academy of Sciences of the United States of America. Sep 3, 2002; 99(18)11589PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...