
# Efficiency of DNA replication in the polymerase chain reaction

## Abstract

A detailed quantitative kinetic model for the polymerase chain reaction (PCR) is developed, which allows us to predict the probability of replication of a DNA molecule in terms of the physical parameters involved in the system. The important issue of the determination of the number of PCR cycles during which this probability can be considered to be a constant is solved within the framework of the model. New phenomena of multimodality and scaling behavior in the distribution of the number of molecules after a given number of PCR cycles are presented. The relevance of the model for quantitative PCR is discussed, and a novel quantitative PCR technique is proposed.

**Keywords:** polymerization reaction, branching processes, kinetic model, quantitative polymerase chain reaction

The polymerase chain reaction (PCR) is one of the most widely used
techniques in modern molecular biology. It was devised (1) as a method
for amplifying specific DNA sequences (targets), and the scope of its
applications stretches from medicine (2), through *in vitro*
evolution (3), to molecular computers (4, 5). In spite of its ubiquity
in biology, theoretical discussions of PCR are rare. Although kinetic
models of the enzyme-mediated polymerization of single-stranded DNA
have been reported (6–10), none of them were applied to model PCR, and
only recently has a treatment of the rate of mutations arising in PCR
been considered (11, 12).

The main object of our study is the probability that one molecule will
be replicated in one PCR cycle, the so-called efficiency *p*.
In the second section we present a detailed kinetic model of the
polymerization and find *p* as a function of the physical
parameters of the system. This allows us to discuss the range of
validity of the assumption of constant probability of replication, on
which statistical considerations have been based (11, 12). Within that
range, we apply the theory of branching processes in the third section,
Statistical Analysis, to show the existence of new phenomena: the
probability density function (pdf) of the number of molecules after a
given number of cycles of PCR displays scaling behavior, and under some
conditions, this pdf is multimodal. In the fourth section, a novel
method for quantitative PCR is presented, based on the statistical
considerations of the previous sections. In the final section we
summarize our work.

One cycle of PCR consists of three steps. (For a more detailed account
of the PCR technique see, e.g., ref. 13.) In the *denaturing*
step, the two strands of the parent DNA molecule in solution are
separated into single-stranded (ss) templates by raising the
temperature to about 95°C to disrupt the hydrogen bonds. In the
*annealing* step, the solution is cooled down to approximately
50°C to allow the *primers*, present in a high
concentration, to hybridize with the ssDNA. The primers are two
(different) 20- to 30-nucleotide-long molecules which are Watson–Crick
complementary to the 3′ flanking extreme of the templates. Once the
primer-template heteroduplex is formed, it acts as the *initiation
complex* for the *DNA
polymerase*^{*} to recognize
and bind to. This step is crucial for the specificity of the
amplification: only those molecules that have sequences complementary
to the primers will be amplified. The last step is a polymerization
reaction, in which the solution is heated to 72°C, the optimal
working temperature for *Thermus aquaticus* (*Taq*)
DNA polymerase. This enzyme catalyzes the binding of complementary
nucleotides to the template, in the direction that goes from the primer
to the other extreme.^{†} Notice that if
this polymerization proceeded to its end, at the end of the third step
we would have twice as many DNA molecules as we had at the beginning of
step 1. These three steps constitute one cycle of the PCR, which is
usually 30 s to 2 min long. The cycles are repeated a number of
times (typically 30) by varying the temperature in the solution, in
such a way that the DNA molecules that were synthesized in a given
cycle are used as templates in the following one. In this way one gets
an extremely efficient amplification mechanism for DNA.

### Kinetic Model

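We will represent the last two steps of a typical cycle of PCR by
means of a kinetic model. Our species will be the primers
(*pr*, of length *L*_{p} nucleotides), the single-stranded DNA (*ss*, consisting of *L*_{p} + *N* nucleotides), the heteroduplexes (*h*_{i}, formed by one complete *ss* and the partially assembled complementary strand consisting of the primer and the next *i* nucleotides), the nucleotides (*n*, which will be considered identical), the polymerase (*q*), and the heteroduplexes *h*_{i} with the polymerase attached to them (*qh*_{i}). Denoting by κ_{2j−1} and κ_{2j} the forward and backward chemical reaction rates, the chemical equations are

$$
\begin{aligned}
ss + pr &\underset{\kappa_2}{\overset{\kappa_1}{\rightleftharpoons}} h_0, \\
h_i + q &\underset{\kappa_4}{\overset{\kappa_3}{\rightleftharpoons}} qh_i \quad (0 \le i \le N-1), \\
qh_i + n &\underset{\kappa_6}{\overset{\kappa_5}{\rightleftharpoons}} qh_{i+1} \quad (0 \le i \le N-1), \\
h_N + q &\underset{\kappa_8}{\overset{\kappa_7}{\rightleftharpoons}} qh_N.
\end{aligned}
\tag{1}
$$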

Other recognizable species might be present in the chain reaction.
This can occur because of substitutions, additions, or deletions of
nucleotides by the polymerase, or because of the presence of
sequence-dependent structures. These will not be taken into account in
our model, for the sake of simplicity. Assuming that the effects of
inhomogeneities in density and temperature are irrelevant, it is well
known that Eqs. **1** lead to a corresponding system of
first-order nonlinear differential equations for the concentrations of
the different species as functions of time, which we are not going to
write here (see, for example, ref. 15). In the above reactions, one
should assign a given duration to step 2 and another to step 3. For the
sake of simplicity, however, we shall consider both step 2 and step 3
as running simultaneously in the simulations to be presented below.
This is a mild simplification which does not alter the conclusions to
be drawn.

The definition of the efficiency (or probability of replication)
*p* implies that it is simply the ratio between the number of
*ss* molecules that were completely replicated at the end of a
given cycle and the initial number of *ss* molecules in that
cycle:

$$
p(t) = \frac{[h_N](t) + [qh_N](t)}{[ss](0)}. \tag{2}
$$

Fig. 1 shows plots of the probability of
replication *p* as a function of time *t* (which is
to be interpreted as the duration of step 3 in a typical PCR cycle),
for different polymerization lengths *N*. Since to the best of
our knowledge the chemical reaction constants κs have not been
measured for *Taq* polymerase, we have assumed some values for
these constants to exemplify the principal characteristics of our
model. It should be stressed, however, that the equivalent to some of
these constants have been measured for other polymerases such as T4
phage DNA polymerase (6), T7 phage DNA polymerase (7), and
*Escherichia coli* DNA polymerase I (Klenow fragment) (8–10).
The values of the chemical reaction constants κ and the initial
conditions used in the simulation of Eqs. **1** are detailed in
the legend to Fig. 1. The main features of the curves in Fig. 1 can be
quantitatively understood. It can be observed that the larger
*N*, the flatter the behavior at small times. Indeed it can be
shown from the dynamic equations that *p* ~
*t*^{(2N+1)/3} for *t* sufficiently small. The
time at which *p* has reached about half its asymptotic value,
as well as the width of the rise-time, can be estimated from a further
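simplification of our model. Assuming that the time constants
associated with the backward reactions in Eqs. **1** are large
enough, and that the concentrations of primers, polymerase, and
nucleotides are sufficiently large that their relative concentrations
can be considered as constants (or more precisely as slowly varying
parameters) during the process, we can rewrite the reaction as

$$
ss \xrightarrow{\;\kappa_1 [pr]\;} h_0 \xrightarrow{\;\kappa_3 [q]\;} qh_0 \xrightarrow{\;\kappa_5 [n]\;} qh_1 \xrightarrow{\;\kappa_5 [n]\;} \cdots \xrightarrow{\;\kappa_5 [n]\;} qh_N. \tag{3}
$$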

The time τ needed for this reaction to be completed is simply
the sum of the times corresponding to each link of the chain, τ =
τ_{κ1} + τ_{κ3} + τ_{κ5,1} + … + τ_{κ5,N}, where τ_{κ1}
and τ_{κ3} are the times associated with the first
two reactions in Eq. **3**, and τ_{κ5,j} is the time
associated with the reaction *qh*_{j−1} → *qh*_{j}, in which one
nucleotide is added with rate κ_{5}[*n*]. These τs are independent,
exponentially distributed random variables, whose mean values are
τ̄_{κ1} = (κ_{1}[*pr*])^{−1}, τ̄_{κ3} = (κ_{3}[*q*])^{−1}, and
τ̄_{κ5,j} = (κ_{5}[*n*])^{−1}. Therefore, it can be
readily seen that the mean and the standard deviation of τ are

$$
\bar{\tau} = \frac{1}{\kappa_1 [pr]} + \frac{1}{\kappa_3 [q]} + \frac{N}{\kappa_5 [n]}, \tag{4}
$$

$$
\sigma_\tau = \sqrt{\frac{1}{(\kappa_1 [pr])^2} + \frac{1}{(\kappa_3 [q])^2} + \frac{N}{(\kappa_5 [n])^2}}, \tag{5}
$$

where we used the fact that the variance of an exponentially distributed
random variable is the square of its mean. The mean of τ and
σ_{τ} can be used as estimates of the mean rise-time and the
rise-time width about the mean for the complete reaction. These
estimates are shown in Fig. 1. The abscissa of the solid square on each
curve corresponds to the value predicted by Eq. **4**, and the
arrowheads indicate the values of τ ± σ_{τ}. It
can be safely concluded that Eqs. **4** and **5**,
computed from the simplified chain reactions of Eq. **3**, are
good estimates of the mean rise-time and the rise-time width
corresponding to the full set of reactions.
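
As a minimal sketch of the estimate above, the following Python snippet adds up the independent exponential waiting times of the simplified chain and compares a Monte Carlo sample with the closed-form mean and standard deviation. The function names and the effective first-order rates passed to them (products such as κ_{1}[*pr*]) are illustrative assumptions, not measured values.

```python
import math
import random


def rise_time_stats(rate_primer, rate_polymerase, rate_nucleotide, n_steps):
    """Mean and standard deviation of the total completion time.

    rate_primer     = kappa_1 * [pr]  (1/s), primer annealing
    rate_polymerase = kappa_3 * [q]   (1/s), polymerase binding
    rate_nucleotide = kappa_5 * [n]   (1/s), one nucleotide addition
    n_steps         = N, number of nucleotides to add
    """
    mean = 1.0 / rate_primer + 1.0 / rate_polymerase + n_steps / rate_nucleotide
    # For an exponential variable the variance is the square of the mean,
    # and the variances of independent variables add.
    var = (1.0 / rate_primer ** 2 + 1.0 / rate_polymerase ** 2
           + n_steps / rate_nucleotide ** 2)
    return mean, math.sqrt(var)


def sample_rise_time(rate_primer, rate_polymerase, rate_nucleotide, n_steps, rng):
    """Draw one completion time as a sum of exponential waiting times."""
    total = rng.expovariate(rate_primer) + rng.expovariate(rate_polymerase)
    for _ in range(n_steps):
        total += rng.expovariate(rate_nucleotide)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    rates = (2.0, 4.0, 10.0)  # hypothetical effective rates, in 1/s
    mean, std = rise_time_stats(*rates, n_steps=10)
    draws = [sample_rise_time(*rates, 10, rng) for _ in range(20000)]
    print("predicted mean and width:", mean, std)
    print("sampled mean:", sum(draws) / len(draws))
```

Because the nucleotide term enters the mean with weight *N* but the variance only with weight *N* over the squared rate, the relative width of the rise shrinks as the template gets longer.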

Fig. 1. *p*(*t*) as a function of time *t*, for different template lengths *N* (in number of nucleotides, not including the primers), arising from a numerical simulation of Eqs. **1**, with parameters: κ_{1} = 10^{9} M^{−1}·s **...**

The last important feature to be extracted from Fig. 1 is the tendency
of *p*(*t*) towards an asymptote *p*_{∞},
which corresponds to the equilibrium of the chemical system. This value
is of importance in PCR, and thus it is worth computing it in terms of
the parameters of our model. The detailed balance equilibrium
conditions for the reactions of Eqs. **1** demand that

$$
\begin{aligned}
\frac{[ss]_{eq}}{[h_0]_{eq}} &= \frac{\kappa_2}{\kappa_1 [pr]_{eq}} \equiv \alpha_1, \\
\frac{[h_i]_{eq}}{[qh_i]_{eq}} &= \frac{\kappa_4}{\kappa_3 [q]_{eq}} \equiv \alpha_3 \quad (0 \le i \le N-1), \\
\frac{[qh_i]_{eq}}{[qh_{i+1}]_{eq}} &= \frac{\kappa_6}{\kappa_5 [n]_{eq}} \equiv \alpha_5 \quad (0 \le i \le N-1), \\
\frac{[h_N]_{eq}}{[qh_N]_{eq}} &= \frac{\kappa_8}{\kappa_7 [q]_{eq}} \equiv \alpha_7.
\end{aligned}
$$

On using Eq. **2** and the conservation
relation [*ss*](*t*) + Σ_{i=0}^{N} {[*h*_{i}](*t*) + [*qh*_{i}](*t*)} = [*ss*](0),
one obtains that

$$
p_\infty = \frac{1 + \alpha_7}{1 + \alpha_7 + (1 + \alpha_3)\sum_{j=1}^{N} \alpha_5^{\,j} + \alpha_1 \alpha_3 \alpha_5^{N}}. \tag{6}
$$

For the purpose of computing *p*_{∞} one
should know the values of [*pr*]_{eq},
[*n*]_{eq}, and [*q*]_{eq}. As
an approximation to these values one can use the initial values of
these species at the beginning of the cycle. This approximation will be
excellent if these initial concentrations are sufficiently large. The
values of *p*_{∞} (computed under this
approximation) corresponding to the conditions of the simulations of
Fig. 1 are 0.87 for *N* = 1 and 0.85 for *N* =
10 and *N* = 45, in perfect agreement with the
complete simulation. It is interesting to notice that from direct
measurements of *p*(*t*) a wealth of information on the rate
constants involved in the polymerization reactions can be inferred
using Eqs. **4–6**.
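
The equilibrium value can be evaluated numerically by chaining the four detailed-balance ratios and normalizing with the conservation relation. The sketch below does this in units of [*qh*_{N}]; it assumes that the efficiency counts both *h*_{N} and *qh*_{N} as completed strands, which is what the cycle-concatenation rule used later in the text suggests. The function name and the numerical ratios in the usage line are hypothetical.

```python
def equilibrium_efficiency(alpha1, alpha3, alpha5, alpha7, n_steps):
    """Asymptotic efficiency p_inf from the four equilibrium ratios.

    alpha1 = [ss]/[h_0], alpha3 = [h_i]/[qh_i] (i < N),
    alpha5 = [qh_i]/[qh_{i+1}], alpha7 = [h_N]/[qh_N], all at equilibrium.
    Concentrations are expressed in units of [qh_N].
    """
    # qh_0 .. qh_N, obtained by stepping back from qh_N with ratio alpha5
    qh = [alpha5 ** (n_steps - i) for i in range(n_steps + 1)]
    # h_0 .. h_{N-1} follow from alpha3, h_N from alpha7
    h = [alpha3 * qh[i] for i in range(n_steps)] + [alpha7 * qh[n_steps]]
    ss = alpha1 * h[0]
    total = ss + sum(h) + sum(qh)  # equals [ss](0) by conservation
    return (h[n_steps] + qh[n_steps]) / total


if __name__ == "__main__":
    # Hypothetical ratios, chosen only to show the weak dependence on N.
    for n_steps in (1, 10, 45):
        print(n_steps, equilibrium_efficiency(0.1, 0.2, 0.05, 0.3, n_steps))
```

When α_{5} is small the sum over intermediate species converges after a few terms, which is consistent with the asymptotic efficiency quoted above being almost the same for *N* = 10 and *N* = 45.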

Of utmost importance in applications of PCR is the number of cycles of
PCR during which the amplifying process is exponential. As will be
discussed later on, the mean number of molecules
*N*_{k+1} at cycle *k* + 1 is
related to the mean number of molecules *N*_{k}
at cycle *k* by the relation *N*_{k+1} =
(1 + *p*_{k})*N*_{k}, where
*p*_{k} is the efficiency during the *k*th
cycle. Therefore the rate of growth will be exponential only when
*p*_{k} is independent of *k*. During how
many cycles can the system maintain *p*_{k} constant?
The answer can be found by noting that during these cycles
primers and nucleotides are consumed in proportion to the exponentially
growing number of templates, and therefore their concentrations at cycle *k*
will be [*pr*]_{k} = [*pr*]_{0} − (1 +
*p*)^{k}[*ss*]_{0} and [*n*]_{k} =
[*n*]_{0} − (1 +
*p*)^{k}*N*[*ss*]_{0}. The mean rise-time and
rise-time width for *p* at cycle *k*,
τ_{k} and σ_{τ,k}, will be given by
Eqs. **4** and **5**, with [*pr*] and
[*n*] replaced by [*pr*]_{k} and
[*n*]_{k}, respectively. If the time for the
reaction is *t*, then the maximum number of cycles ν during
which *p*_{k} can be considered constant will be
given, to a first approximation, by the ν that satisfies
τ_{ν} + σ_{τ,ν} = *t*. This
imposes an equation for ν that can be solved numerically. An
approximation to this solution is

where log_{b} indicates logarithm to the base
*b*. Notice that as *t* becomes larger, the value of
ν predicted in Eq. **7** tends to a constant independent of
*t*, given by the number of cycles that it takes to deplete
the solution of nucleotides or primers, whichever is exhausted first.
Although it might be unrealistic for the conditions used in molecular
biology, it is interesting to notice that if the nucleotides are the
first species to be exhausted, then most of the heteroduplexes will
cease to polymerize before reaching the end, with the outcome that
there will hardly be any complete double helix formed: in this case the
net amplification factor will be close to zero. Fig. 2 shows the efficiency *p*_{k}
at cycle *k* as a function of the number of cycles, for
different times of polymerization and *N* = 45 (the other
parameters are as in Fig. 1), obtained by the integration of the
reactions of Eqs. **1**. In this simulation we concatenated
cycles, assuming a perfect melting step, which was done by hand by
setting [*ss*]_{k+1}(0) in the cycle *k* +
1 equal to [*ss*]_{k}(0) +
[*h*_{N}]_{k}(*t*) +
[*qh*_{N}]_{k}(*t*) of the previous cycle and
[*h*_{i}]_{k+1}(0) =
[*qh*_{i}]_{k+1}(0) = 0 (0 ≤ *i* ≤ *N*).
The dynamics of *pr* and *n*, on the other hand, was
followed exactly. It is clear from Fig. 2 that there is a regime for
which *p*_{k} is roughly constant, and that the
extent of this regime tends to increase with *t*. The values
of ν predicted by the condition τ_{ν} +
σ_{τ,ν} = *t* are 13 for *t* = 0.8 s
and 15 for *t* = 2.0 s (slightly overestimated by Eq.
**7**, whose integer part yields 14 for *t* = 0.8
s and 15 for *t* = 2.0 s), in rough agreement with the
values of about 12 and 14, respectively, obtained from Fig. 2.
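
The condition on ν can be solved by direct iteration over cycles. The sketch below is a minimal illustration: it depletes primers and nucleotides with the expressions given above, recomputes the mean rise-time and its width at every cycle, and returns the last cycle for which their sum still fits within the polymerization time *t*. The function name and all numerical values in the usage block are hypothetical; they are not the parameters of the figures.

```python
import math


def constant_efficiency_cycles(t, p, ss0, pr0, n0, n_steps,
                               kappa1, kappa3, kappa5, q, max_cycles=200):
    """Largest cycle index k with mean rise time + width <= t.

    Returns -1 if even cycle 0 is too slow. Concentrations are in mol/L,
    rate constants in L/(mol*s), and t in seconds.
    """
    last_ok = -1
    for k in range(max_cycles):
        growth = (1.0 + p) ** k
        pr_k = pr0 - growth * ss0            # primers left at cycle k
        n_k = n0 - growth * n_steps * ss0    # nucleotides left at cycle k
        if pr_k <= 0.0 or n_k <= 0.0:
            break                            # a reagent is exhausted
        mean = (1.0 / (kappa1 * pr_k) + 1.0 / (kappa3 * q)
                + n_steps / (kappa5 * n_k))
        width = math.sqrt(1.0 / (kappa1 * pr_k) ** 2 + 1.0 / (kappa3 * q) ** 2
                          + n_steps / (kappa5 * n_k) ** 2)
        if mean + width > t:
            break                            # the reaction no longer completes in time
        last_ok = k
    return last_ok


if __name__ == "__main__":
    # Hypothetical conditions, for illustration only.
    params = dict(p=0.85, ss0=1e-12, pr0=1e-6, n0=1e-4, n_steps=45,
                  kappa1=1e9, kappa3=1e9, kappa5=1e7, q=1e-8)
    for t in (0.3, 2.0, 1e9):
        print(t, constant_efficiency_cycles(t, **params))
```

For very long polymerization times the result saturates at the cycle where the first reagent runs out, which is the constant limit described above.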

At this point a few important considerations are in order. The fraction
1 − *p* of molecules whose replication was incomplete
will give rise to incomplete complementary single strands. Only when
these incomplete replicas are close to completion will they be able to
bind a primer in the next cycle, and thus be replicated. Therefore the
efficiency *p* defined in Eq. **2** is an
underestimation, since *h*_{N−1},
*h*_{N−2},…, *h*_{N−j} as well as
*qh*_{N−1}, *qh*_{N−2},…,
*qh*_{N−j} (for some *j* < *L*_{p},
where *L*_{p} is the length of the primers) will be
part of the pool of templates in subsequent cycles. However, the
dominant process will be the replication of the complete strand, which
justifies the computation of *p* as in Eq. **2**. There
is another issue that needs some discussion. All the complementary
strands arising from both complete and incomplete replication of a
template can anneal to that template in subsequent cycles, and
therefore can act effectively as primers. Strictly speaking, at any
given cycle *k* ≥ 1 there will be a pool of primers of
different lengths. An estimate of the concentration of “primers”
arising from incomplete replication at cycle *k* is
((1 − *p*)/*p*)(1 + *p*)^{k}[*ss*](0). This
amount is always smaller than the concentration (1 +
*p*)^{k}[*ss*](0) of completely replicated single strands
which act also as potential “primers.” As long as the
concentration of incomplete replicas remains much smaller than the
concentration of primers [*pr*], Eqs. **1** will
constitute a good approximation to the PCR process. Recall now that ν
(see Eq. **7**) is equal to or smaller than the number of cycles
required for the concentration of primers [*pr*] to match
the concentration of completely replicated single-stranded molecules.
It follows that the approximation given by Eqs. **1** will break
down only after the number of PCR cycles is bigger than ν, and
therefore our basic conclusions, contained in Eqs. **4–7**, are
not altered.

### Statistical Analysis

As seen above, the efficiency *p* can be assumed to be
constant for a number of cycles of PCR. The statistics of PCR can be
readily computed under this assumption. The basic element in the
analysis is the recursive relation that gives the number of molecules
after cycle number *n* + 1, *N*_{n+1}, in terms of
*N*_{n},

$$
N_{n+1} = N_n + B(N_n; p), \tag{8}
$$

where *B*(*N*_{n};*p*) is a random variable whose
distribution is binomial with parameters *N*_{n} and
*p*. The basis for this relation is that at the (*n*
+ 1)-th cycle there will be not only the *N*_{n}
molecules that were present at the previous cycle but also the number
of successful replications in *N*_{n} Bernoulli
trials (16), each one with probability *p* of success. The
number of molecules in the initial sample will be denoted by
*M*_{0}.
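
The recursion is straightforward to simulate. The following sketch is one possible implementation, in which each cycle adds a binomially distributed number of new copies to the current population; the helper names are illustrative, and the binomial draw is done with explicit Bernoulli trials, which is simple but only practical for modest population sizes.

```python
import random


def binomial(n_trials, p, rng):
    """Number of successes in n_trials Bernoulli trials (simple, O(n))."""
    return sum(1 for _ in range(n_trials) if rng.random() < p)


def simulate_pcr(m0, p, n_cycles, rng):
    """Return [N_0, N_1, ..., N_n] for one realization of the cascade."""
    counts = [m0]
    for _ in range(n_cycles):
        current = counts[-1]
        # Every molecule present is kept; each one is copied with probability p.
        counts.append(current + binomial(current, p, rng))
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    finals = [simulate_pcr(m0=1, p=0.9, n_cycles=10, rng=rng)[-1]
              for _ in range(2000)]
    print("sample mean:", sum(finals) / len(finals))
    print("M0 (1 + p)^n:", 1.9 ** 10)
```

The sample mean over many runs can be compared with *M*_{0}(1 + *p*)^{n}, the growth law implied by the relation between consecutive mean values given earlier.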

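The first two moments of *N*_{n} can be easily
computed from Eq. **8**:

$$
\langle N_n \rangle = M_0 (1 + p)^n, \tag{9}
$$

$$
\langle N_n^2 \rangle - \langle N_n \rangle^2 = M_0 (1 - p)(1 + p)^{n-1}\left[(1 + p)^n - 1\right]. \tag{10}
$$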

Furthermore, using the theory of branching processes (17, 18), a
recursive relation between
*P*_{n}^{M0}(*k*)
(the probability that there are *k* molecules at cycle
*n*, having started with *M*_{0} of them)
and
*P*_{n−1}^{M0}(*k*)
can be obtained

(where *j*_{max} = min
{*M*_{0}2^{n−1}, *k*}, and
[*k*/2] denotes the integer part of
*k*/2), which when supplemented with the initial
condition
*P*_{0}^{M0}(*k*)
= δ_{k,M0} allows us to compute
*P*_{n}^{M0}(*k*)
for any *n*. Fig. 3 shows the form of
these probability functions for *n* = 10 with
*M*_{0} = 1 in Fig. 3*a*, and
*M*_{0} = 50 in Fig. 3*b*, and different
values of *p*. A remarkable resonance-like behavior can be
observed in the curve corresponding to *p* = 0.9 and
*M*_{0} = 1 (wavy curve in Fig. 3*a*). This
phenomenon originates in the discrete nature of the process: if at the
first cycle the system fails in replicating the only original template,
then the subsequent growth of the population will be as if there were
nine cycles instead of ten. The other peaks correspond to the failure
in replication in the first two cycles, three cycles, etc. This trait
is characteristic of values of *p* between, say, 0.8 and 1.
For smaller values of *p* the function looks smoother. A
common feature of the curves in Fig. 3*a* is the existence of
a power law regime in the region of small *N*_{n},
whose origin will be discussed later on. The behavior of the curves
with *M*_{0} = 50 is simpler: they are basically
Gaussian curves, with a mean that increases with *p* and a
variance that first increases and then decreases with *p* (see
Eqs. **9** and **10**).
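
The distribution after *n* cycles can also be computed exactly by propagating probabilities one cycle at a time: if there were *j* molecules before a cycle, reaching *k* molecules requires exactly *k* − *j* successful replications out of *j* trials. The sketch below implements this conditioning argument directly. It is meant as an illustration of the recursion, not as a transcription of the original equation, and the function name is hypothetical.

```python
from math import comb


def pdf_after_cycles(m0, p, n_cycles):
    """Exact distribution of N_n as a dict {k: probability}.

    Suitable for small populations only: the binomial coefficients are
    converted to floats, which overflows when M0 * 2**n is large.
    """
    dist = {m0: 1.0}                       # P_0(k) = delta_{k, M0}
    for _ in range(n_cycles):
        new = {}
        for j, prob_j in dist.items():     # j molecules before the cycle
            for extra in range(j + 1):     # extra = k - j new copies
                w = comb(j, extra) * p ** extra * (1.0 - p) ** (j - extra)
                if w > 0.0:
                    new[j + extra] = new.get(j + extra, 0.0) + prob_j * w
        dist = new
    return dist
```

For example, `pdf_after_cycles(1, 0.9, 10)` gives the probabilities of all final counts between 1 and 1024 for a single starting molecule, which can then be plotted on log–log axes to look for the peaks and the power law regime described above.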

Fig. 3. (*a*) The pdf of the number of molecules after *n* = 10 cycles and *M*_{0} = 1 of a branching process with constant efficiency *p*, in log–log scale. Notice the multimodality for *p* = 0.9, and the power law regimes (straight lines). (*b*) Same as in *a* **...**

To understand the features described above, it is convenient to use the
formalism of generating functions (16). The generating function of
*P*_{n}^{M0}(*N*_{n})
is simply *g*_{n,M0}(*s*) = ⟨*s*^{N_n}⟩, the average of *s*^{N_n} over that distribution. Using Eq. **8**,
it is clear that *g*_{1,1}(*s*) = (1 − *p*)*s* +
*ps*^{2}. It can be shown (17) that for a branching process

$$
g_{n,M_0}(s) = g_{0,M_0}\!\left(g_{1,1}^{(n)}(s)\right) = \left[g_{1,1}^{(n)}(s)\right]^{M_0}, \tag{12}
$$

where we have denoted by *g*_{1,1}^{(n)}(*s*) the
*n*th composition of *g*_{1,1}(*s*) with
itself, and used that *g*_{0,M0}(*s*) =
*s*^{M0} in the last
equality. To proceed, we use the formalism of characteristic functions.
The characteristic function φ_{n,M0}(ω) of
the distribution of *N*_{n} having started with
*M*_{0} molecules, which is by definition the Fourier
transform of
*P*_{n}^{M0}(*N*_{n})
(ref. 16), is simply φ_{n,M0}(ω) =
*g*_{n,M0}(*e*^{iω}). In terms of the
characteristic functions, Eq. **12** implies that

$$
\varphi_{n,M_0}(\omega) = \left[\varphi_{n,1}(\omega)\right]^{M_0}. \tag{13}
$$

The characteristic function of the sum of *M*_{0} *independent* random variables is simply the product of the
characteristic functions of each of them. Therefore, the physical
interpretation of the last equation is that the amplification cascades
produced by each of the *M*_{0} original molecules
proceed independently, without interaction. From this observation and
the central limit theorem it follows that as the number of molecules
*M*_{0} becomes larger, the distribution of
*N*_{n} tends to a Gaussian. This explains the
observed features of the pdfs of Fig. 3*b*.

The behavior of the pdfs for finite *M*_{0} in the
limit of *n* → ∞ is more interesting. In
fact, it is clear from Eq. **13** that it suffices to study the
case *M*_{0} = 1, which we do next. We should stress
that our study of the asymptotically large *n* regime does not
aim at understanding the behavior of PCR when infinitely many cycles
are performed. In fact, we have shown in the previous section that the
efficiency can be considered as constant only for a finite number of
cycles. Rather, the reason for studying this asymptotic regime is that
the convergence of the finite *n* case to the *n* →
∞ case is fast enough that many of the features arising for
finite *n* are well explained by the study of asymptotically
large *n*, most notably the power law behavior of the low
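*N* regime of Fig. 3*a*. It follows from Eq.
**12** that *g*_{n,1}(*s*) =
*g*_{1,1}[*g*_{n−1,1}(*s*)], which in terms of the
characteristic functions φ_{n,1}(ω) = *g*_{n,1}(*e*^{iω}) and of the explicit expression for
*g*_{1,1}(*s*) becomes

$$
\varphi_{n,1}(\omega) = (1 - p)\,\varphi_{n-1,1}(\omega) + p\,\varphi_{n-1,1}^{2}(\omega). \tag{14}
$$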

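Given that we are going to consider the limit of *n* →
∞ and the mean ⟨*N*_{n}⟩ = (1 +
*p*)^{n} (see Eq. **9**) diverges in this limit, it
is convenient to use the random variable Ñ_{n} =
*N*_{n}/⟨*N*_{n}⟩. Denote by
θ_{n,1}(ω) its characteristic function. It is
easy to show that θ_{n,1}(ω) =
φ_{n,1}(ω/(1 + *p*)^{n}), where φ_{n,1} is the characteristic function of *N*_{n}, which on
using Eq. **14** yields

$$
\theta_{n,1}(\omega) = (1 - p)\,\theta_{n-1,1}\!\left(\frac{\omega}{1 + p}\right) + p\,\theta_{n-1,1}^{2}\!\left(\frac{\omega}{1 + p}\right). \tag{15}
$$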

Notice that Eq. **15** can be thought of as a dynamical
system, which maps the point *z*_{n} to
*z*_{n+1} = *f*(*z*_{n}) ≡ (1 −
*p*)*z*_{n} + *pz*_{n}^{2}. The function
θ_{n,1}(ω) (−∞ ≤ ω ≤ ∞) parametrizes a
curve in the complex plane. In fact, the initial condition
*M*_{0} = 1 determines that
θ_{0,1}(ω) = *e*^{iω}, which
parametrizes the unit circle ζ_{0}. Subsequent applications
of the map *f*(*z*) to ζ_{0} produce the new curves
ζ_{1}, ζ_{2},…, which are parameterized
respectively by θ_{1,1}(ω),
θ_{2,1}(ω),…. The study of the limiting behavior of
the pdf of Ñ_{n} is thus associated with the
study of the invariant curves of the map *f*. Notice that the
map *f* has only two fixed points, one at *z* =
0 (stable) and one at *z* = 1 (unstable). Upon
iteration, all the infinitesimally small straight lines with slope λ
passing through the repelling point *z* = 1 will generate
a curve *C*_{λ} which is invariant under
*f*, that is, *f*(*C*_{λ}) =
*C*_{λ}. On the other hand, for any *z* ≠
1 such that |*z*| ≤ 1, |*f*(*z*)| < |*z*|.
Therefore the dynamics of this map brings all the points of
ζ_{0} (except for *z* = 1) to the origin. In
the neighborhood of *z* = 1, ζ_{0} is locally
a straight line with slope λ = ∞, which upon evolution will become
the invariant manifold *C*_{∞}. It follows that
ζ_{∞} coincides with C_{∞}, and
θ_{∞,1} parameterizes the invariant manifold of the map
*f*, that crosses *z* = 1 parallel to the
imaginary axis. Fig. 4*a* shows half
the invariant manifold *C*_{∞} corresponding to
*p* = 0.9 (the other half is its complex conjugate), and
on the same plot the imaginary part vs. the real part of
θ_{15,1}(ω) (for positive ω). To the level of resolution
of the figure no departures between the two curves are observed,
meaning that the pdf of the number of molecules at 15 PCR cycles is
well approximated by the limiting pdf.
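
The curve traced by θ_{n,1}(ω) can be generated numerically by starting on the unit circle at the rescaled frequency ω/(1 + *p*)^{n} and applying the map *f* a total of *n* times. The sketch below is a minimal version of that procedure for *M*_{0} = 1; the function name is an assumption made for illustration.

```python
import cmath


def theta(n_cycles, omega, p):
    """Characteristic function of N_n / (1 + p)**n for M0 = 1."""
    # Start on the unit circle at the rescaled frequency.
    z = cmath.exp(1j * omega / (1.0 + p) ** n_cycles)
    for _ in range(n_cycles):
        z = (1.0 - p) * z + p * z * z      # one application of the map f
    return z


if __name__ == "__main__":
    # Points of the curve for n = 15 and p = 0.9 (real part, imaginary part).
    for omega in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        value = theta(15, omega, 0.9)
        print(omega, value.real, value.imag)
```

Plotting the imaginary part against the real part for a fine grid of positive ω reproduces the kind of curve described above: it leaves *z* = 1 parallel to the imaginary axis and spirals toward the stable fixed point at the origin.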

Fig. 4. (*a*) The invariant manifold that crosses *z* = 1 tangent to the unit disk, of the map *z*_{n+1} = (1 − *p*)*z*_{n} + *pz*_{n}^{2} (with *z* in the complex plane), for *p* = 0.9. It is parametrized by θ_{∞,1}(ω). In the same plot **...**

This dynamical-system way of looking at the characteristic
function of Ñ_{n} is very useful to
understand the power law behavior of the pdfs of
*N*_{n}. The argument goes as follows. Close to
*z* = 0 (or equivalently, for large values of ω) the
quadratic terms in Eq. **15** can be neglected, and the
resulting approximate relation, θ_{∞,1}(ω) =
(1 − *p*)θ_{∞,1}(ω/(1 + *p*)), admits as a
solution the ansatz θ_{∞,1}(ω) ≈
*A*(ln ω) ω^{ln(1 − p)/ln(1 + p)}, where *A*(*x*) is in principle any periodic
function with period ln(1 + *p*). The large ω behavior
of the characteristic function θ_{∞,1}(ω) is then a
power law, with logarithmically periodic modulations. That this
is so is shown in Fig. 4*b*, where we have plotted the
absolute value of θ_{15,1}(ω) for *p* = 0.9.
The power law corresponding to the predicted scaling exponent of
ln(1 − *p*)/ln(1 + *p*) is shown as the
straight line close to the curve in the log–log plot of Fig. 4*b*. The implication of these results for the pdfs can be
readily drawn. Recalling that the characteristic function and pdf are
related through a Fourier transform, and that the Fourier transform of
|ω|^{α} (with an appropriate infrared cutoff) scales
as *x*^{−α−1}, we conclude that the pdf of
Ñ_{n} should exhibit a scaling of the form
*P*_{∞}^{1}(Ñ_{n}) ~
Ñ_{n}^{−ln(1 − p)/ln(1 + p) − 1}. This scaling law is
shown as the straight lines close to the curves plotted in log–log
scale in Fig. 3*a*.
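
As a small worked example, the two exponents can be evaluated for a given efficiency. In the sketch below, `beta` is the exponent of the characteristic function at large ω and `gamma` is the resulting exponent of the pdf at small Ñ_{n}; both names are illustrative. The defining property of `beta` is that (1 + *p*)^{beta} = 1 − *p*, which is what makes the power law a solution of the linearized relation.

```python
import math


def scaling_exponents(p):
    """Return (beta, gamma): theta ~ omega**beta and pdf ~ x**gamma."""
    beta = math.log(1.0 - p) / math.log(1.0 + p)
    gamma = -beta - 1.0
    return beta, gamma


if __name__ == "__main__":
    beta, gamma = scaling_exponents(0.9)
    print("beta:", beta, "gamma:", gamma)
    # The linearized relation requires (1 + p)**beta == 1 - p.
    print("check:", 1.9 ** beta, "vs", 1.0 - 0.9)
```

For *p* = 0.9 this gives a characteristic-function exponent of about −3.59 and a pdf exponent of about 2.59, so the pdf rises steeply with Ñ_{n} in the small-count regime.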

In the following section we use some of the results presented so far to analyze quantitative PCR.

### Quantitative PCR

Although PCR is used mainly in a qualitative fashion, its
potential for becoming an important tool in nucleic acid quantification
in general (19), and in medical research in particular (20), has become
clear in recent years. By quantitative PCR one means the use of the PCR
to measure an unknown initial number of molecules
*M*_{0}. A few techniques have been developed to that
effect in the past, but the most widespread is probably the so-called
competitive PCR (see, e.g., ref. 21). In this technique, the target,
whose initial concentration is unknown, is amplified simultaneously
with a standard, which is flanked by the same primers as the target and
whose initial concentration is known. The standard should have a length
different from that of the target, so that the two can be resolved in
an electrophoretic gel. The basic idea in competitive PCR is that
*if the efficiencies of replication of the target and the standard
are the same* then the ratio of the concentration of target to that
of the standard is constant in the reaction. Measuring that ratio at
cycle *n* (where presumably we have enough concentration to
use densitometric measurements) we can solve for the initial
concentration of target. While this technique is very attractive, the
basic assumption (the equality of the efficiencies in both species) has
some drawbacks (22). Basically, the potential problems arise in the
dependence of the efficiency on the length of the DNA molecule. The
longer molecule will experience a decrease in efficiency before the
shorter one does, as predicted in Eq. **7**. In any case, the
model presented here can be of use to assess the validity of the
assumptions that go into the basics of competitive PCR.

In order for competitive PCR to work, the length of the standards has
to be within a narrow window: it has to be sufficiently different from
the length of the target molecules (to be resolved in a gel) and
sufficiently similar to it in order for the equal efficiency assumption
to work. The design of a good standard requires some ingenuity, and it
has to be done on a case-by-case basis. In what follows we will present
a design for measuring *M*_{0} without the need of a
standard. Suppose we measure the concentration of a given DNA molecule
after a number of PCR cycles on a sample whose
*M*_{0} is unknown. One might think that if we
repeated the same measurement for a reasonable number of times (say
around 100 times, given that PCR equipment with capacity for 96 vials
is not uncommon), so as to measure the mean value and the variance of
the concentration across that number of experiments, we would have two
equations (Eqs. **9** and **10**) that can be solved for
the two unknowns *p* and *M*_{0}. However,
it can be shown that this procedure always yields two possible
solutions for *p* and *M*_{0}, and there is
no possible way, *a priori*, of choosing the right one. The
reason for this is that for *M*_{0} bigger than a few
hundred (which is nonetheless a small number of molecules), the
distribution of *N*_{n} is Gaussian, and therefore
determined only by the mean and the variance, which give the
above-mentioned ambiguous answer.
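
The ambiguity can be made concrete with a short numerical sketch. It assumes the standard branching-process expressions for the moments, namely a mean *M*_{0}(1 + *p*)^{n} and a variance *M*_{0}(1 − *p*)(1 + *p*)^{n−1}[(1 + *p*)^{n} − 1]; whether these match the original equations term by term is an assumption. The variance-to-mean ratio then depends only on *p*, vanishes at *p* = 0 and *p* = 1, and is taken here to have a single peak in between, so a given ratio is reached on both sides of the peak. All function names are hypothetical.

```python
def moments(m0, p, n):
    """Mean and variance of N_n for the cascade N_{k+1} = N_k + B(N_k; p)."""
    growth = (1.0 + p) ** n
    mean = m0 * growth
    var = m0 * (1.0 - p) * (1.0 + p) ** (n - 1) * (growth - 1.0)
    return mean, var


def _ratio(p, n):
    """Variance-to-mean ratio; it does not depend on M0."""
    return (1.0 - p) * ((1.0 + p) ** n - 1.0) / (1.0 + p)


def _bisect(func, lo, hi, target, iters=200):
    """Solve func(x) = target on [lo, hi], with func monotonic on the bracket."""
    increasing = func(hi) > func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_from_moments(mean, var, n, grid=10000):
    """Return the (M0, p) pairs that reproduce the given mean and variance."""
    target = var / mean
    # Locate the peak of the ratio on a grid, then search each side of it.
    peak = max(range(grid + 1), key=lambda i: _ratio(i / grid, n)) / grid
    if target > _ratio(peak, n):
        return []
    solutions = []
    for lo, hi in ((0.0, peak), (peak, 1.0)):
        p = _bisect(lambda x: _ratio(x, n), lo, hi, target)
        solutions.append((mean / (1.0 + p) ** n, p))
    return solutions


if __name__ == "__main__":
    mean, var = moments(m0=1000, p=0.6, n=10)
    for m0, p in solve_from_moments(mean, var, n=10):
        print("M0 =", m0, "p =", p)
```

Both pairs printed by the usage block give the same mean and variance after ten cycles, even though they correspond to very different initial numbers of molecules.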

Consider instead the following scheme. We prepare two sets of samples
*S*_{1} and *S*_{2}, each with
*K* identical preparations and whose initial concentration of
a given double-stranded DNA molecule is unknown. We run (under
conditions for which *p* can be considered approximately
constant from cycle to cycle) *n*_{1} cycles of PCR
on set *S*_{1}, and *n*_{2} cycles
on set *S*_{2}, after which we measure the number of
molecules in every sample. The averages ν_{1} and
ν_{2} over the *K* preparations in
*S*_{1} and *S*_{2} are estimates
of the ensemble averages μ_{n_1} and
μ_{n_2} corresponding to Eq. **9** for
*n* = *n*_{1} and *n* =
*n*_{2}, respectively. We can use that formula to compute

$$
m_0 = \nu_1^{-n_2/(n_1 - n_2)}\, \nu_2^{\,n_1/(n_1 - n_2)}
$$

as an estimate of the real *M*_{0}, and

$$
\rho = \nu_1^{1/(n_1 - n_2)}\, \nu_2^{-1/(n_1 - n_2)} - 1
$$

as an estimate of the real *p*. Of course these estimates make sense only if a
measure of the error involved in the method is provided. It takes a
simple calculation to show that
and

In writing the last two equations we used Eq. **10**. We
tested these expressions in a set of very simple numerical simulations,
whose details we are not going to report here except for saying that
the PCR amplification was represented by the cascade given by Eq.
**8**. Under variations of all the parameters involved, Eqs.
**16**–**18** were in excellent agreement with the
numerical results. To get a flavor of the precision of the method
proposed, assume a simple example with *M*_{0} = 1000,
*p* = 0.8, *n*_{1} = 10, *n*_{2} = 15, and
*K* = 50. Under these conditions the above equations
predict that the estimate of *M*_{0} will be correct
within 0.5% (that is, ±5 molecules) and that of *p* will be
correct within 0.1%! These estimates refer to the purely statistical
errors, and they will be fairly small under typical conditions. In real
experiments they have to be supplemented with the errors involved in
the measurement of the concentrations. If *M*_{0} and
*p* fluctuated from sample to sample (due to inevitable
differences in their preparations), the fact that we are averaging over
*K* samples will screen these fluctuations. In this latter
case, Eqs. **16** will still be in agreement with the average
*M*_{0} and *p*, and Eqs. **17** and
**18**, which can be easily generalized to include these
fluctuations, will give their right order of magnitude.
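
The two estimators are simple enough to state in code. The sketch below takes the sample averages measured after *n*_{1} and *n*_{2} cycles and returns the estimates of *M*_{0} and *p*; it covers only the point estimates, not the error formulas, and the function name is an illustrative choice.

```python
def estimate_m0_and_p(nu1, nu2, n1, n2):
    """Estimate (m0, rho) from sample means nu1, nu2 after n1, n2 cycles."""
    if n1 == n2:
        raise ValueError("n1 and n2 must differ")
    d = n1 - n2
    m0 = nu1 ** (-n2 / d) * nu2 ** (n1 / d)
    rho = (nu1 / nu2) ** (1.0 / d) - 1.0
    return m0, rho


if __name__ == "__main__":
    # Noise-free averages for the example in the text:
    # M0 = 1000, p = 0.8, n1 = 10, n2 = 15.
    nu1 = 1000 * 1.8 ** 10
    nu2 = 1000 * 1.8 ** 15
    print(estimate_m0_and_p(nu1, nu2, 10, 15))
```

With exact ensemble averages the estimators return the true values; with averages over a finite number *K* of preparations they scatter around them, which is what the error expressions discussed above quantify.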

### Summary

We have presented a kinetic model for the PCR, which can be the
basis for a more accurate application of quantitative techniques, as it
provides a dynamical account of the probability of replication as a
function of the physical parameters involved. These include the rate
constants of the different reactions. Conversely, the model allows us
to extract information on these rates from direct measurements of
*p*. From a theoretical point of view, it can also be used in
the description of *in vivo* and *in vitro* enzymatic
polymerization processes (23). The statistical analysis of PCR under
the assumption of constant replication probability shows new
interesting phenomena. The scaling behavior of the pdf is an effect of
the recursivity of the process, whereas the multimodality is related to
failures in replication during the first cycles. Although the latter is
a phenomenon present only for a small number of initial molecules, it
is not far from actual experimental conditions, and might be of
relevance in quantitative applications.

Finally, in the fourth section we have used these statistical
considerations to devise a method for measuring the initial number
*M*_{0} of molecules in a sample (quantitative PCR).

## Acknowledgments

Many of the ideas in this paper are the product of long and
fruitful discussions with P. Kaplan and M. Magnasco. The
chemical-kinetics simulations were written in the *K*
language, created by M. Magnasco and available at
http://tlon.rockefeller.edu. We thank A. Libchaber and E.
Mesri for a careful reading of the manuscript and useful discussions,
and an anonymous referee for helpful suggestions. Support from the
Mathers Foundation is gratefully acknowledged.

## Footnotes

Abbreviation: pdf, probability density function.

^{*}Polymerases are interesting pieces of machinery (14). They
are responsible for the duplication of genetic information (DNA
polymerases) and its transcription into RNA (RNA polymerases).

^{†}The DNA is a *polar* molecule, and the
polymerase can attach new nucleotides only to the 3′ end of the
molecule that is being extended.

## References

**National Academy of Sciences**


Efficiency of DNA replication in the polymerase chain reaction. Proceedings of the National Academy of Sciences of the United States of America. Nov 12, 1996; 93(23):12947.
