NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Sittampalam GS, Coussens NP, Brimacombe K, et al., editors. Assay Guidance Manual [Internet]. Bethesda (MD): Eli Lilly & Company and the National Center for Advancing Translational Sciences; 2004-.




HTS Assay Validation


Last Update: October 1, 2012.


Assays employed in HTS and lead optimization projects in drug discovery should be rigorously validated both for biological and pharmacological relevance and for robustness of assay performance. This chapter addresses the essential statistical concepts and tools needed in assay performance validation as developed in the pharmaceutical industry, specifically for higher-throughput assays run exclusively in 96-, 384- and 1536-well formats using highly automated liquid handling and signal detection systems. In this context, it is assumed that the biological and pharmacological validation of the assay system has already been established, and that the high-throughput performance characteristics require validation. This is an essential chapter for those who design and implement HTS and lead optimization assays to support SAR projects in pre-clinical drug discovery.

1. Overview

The statistical validation requirements for an assay vary depending upon the prior history of the assay. Stability and process studies (Section 2) should be done for all assays prior to the commencement of the formal validation studies. If the assay is new, or has never been previously validated, then full validation is required. This consists of a 3-day Plate Uniformity study (Section 3) and a Replicate-Experiment study (Section 4). If the assay has been previously validated in a different laboratory, and is being transferred to a new laboratory, then a 2-day Plate Uniformity study (Section 3) and a Replicate-Experiment study (Section 4) are required. An assay is considered validated if it has previously been assessed by all the methods in this section, and is being transferred to a new laboratory without undergoing any substantive changes to the protocol. If the intent is to store and use data and results from the previous laboratory, then an assay comparison study (Section 4) should be done as part of the Replicate-Experiment study. Otherwise only the intra-laboratory part of the Replicate-Experiment study (Section 4) is recommended.

If the assay is updated from a previous version run in the same facility then the requirements vary, depending upon the extent of the change in methodology, equipment, operator and reagents. Major changes require a validation study equivalent to a laboratory transfer. Minor changes require bridging studies that demonstrate the equivalence of the assay before and after the change. See Section 5 for examples of major and minor changes.

These techniques are intended to be applied to ≥ 96-well primary and functional assays. For assays that require significant time, resources, or expenditure, discussion with a statistician is necessary to scientifically balance validation requirements against these constraints.

2. Stability and Process Studies

2.1. Reagent Stability and Storage Requirements

It is important to determine the stability of reagents under storage and assay conditions.

  • Use the manufacturer’s specifications if the reagent is a commercial product.
  • Identify conditions under which aliquots of the reagent can be stored without loss of activity.
  • If the proposed assay will require that the reagent be frozen and thawed repeatedly, test its stability after similar numbers of freeze-thaw cycles.
  • If possible, determine the storage-stability of all commercial and in-house reagents.
  • If reagents are combined and aliquoted together, examine the storage-stability of the mixtures.

2.2. Reaction Stability Over Projected Assay Time

Conduct time-course experiments to determine the range of acceptable times for each incubation step in the assay. This information will greatly aid in addressing logistic and timing issues.

2.2.1. Reagent Stability During Daily Operations; Use Of Daily Leftover Reagents

The stability studies will require running assays under standard conditions, but with one of the reagents held for various times before addition to the reaction. The results will be useful in generating a convenient protocol and in understanding the tolerance of the assay to potential delays encountered during screening.

If possible, reagents should be stored in aliquots suitable for daily needs. However, appropriate stability information pertinent to storing leftover reagents (particularly expensive ones) for future assays should be obtained.

New lots of critical reagents should be validated using the bridging studies with previous reagent lots.

2.3. DMSO Compatibility

Test compounds are delivered at fixed concentrations in 100% DMSO, thus the solvent compatibility of assay reagents should be determined. The validated assay should be run in the absence of test compounds and in the presence of DMSO concentrations spanning the expected final concentration. Typically, DMSO concentrations from 0 to 10% are tested. Note that the DMSO compatibility of the assay should be completed early, since the remaining validation experiments, such as the variability studies, should be performed with the concentration of DMSO that will be used in screening. For cell-based assays, it is recommended that the final % DMSO be kept under 1%, unless other experiments demonstrate that higher concentrations can be tolerated.

3. Plate Uniformity and Signal Variability Assessment

3.1. Overview

All assays should have a plate uniformity assessment. For new assays the plate uniformity study should be run over 3 days to assess uniformity and separation of signals, using the DMSO concentration to be used in screening. For the transfer of validated assays to other laboratories, the plate uniformity study can be performed over 2 days to establish that the assay transfer is complete and reproducible (see Section 1 for the definition of an assay transfer). Uniformity assays should be performed at the maximum and minimum signal or response levels to ensure that the signal window is adequate to detect active compounds during the screen.

The variability tests are conducted on three types of signals.

  • “Max” signal: This measures the maximum signal as determined by the assay design. In in-vitro assays that measure receptor-ligand binding or enzyme activity with substrate conversion to products, the maximum signal represents the readout in the absence of test compounds. For cell-based agonist assays this parameter would be the maximal cellular response of an agonist; for potentiator assays the “Max” signal is measured with an EC10 concentration of a standard agonist plus a maximal concentration of a standard potentiator. The observed response is as per the validated assay protocol, and may exceed 10% in some cases. For inhibition assays the “Max” signal would be the response obtained with an EC80 concentration of a standard agonist. Again the observed response is as per protocol, and may not be exactly 80%. For inverse agonist assays the “Max” signal would be the untreated constitutively active response of the cells in the presence of DMSO alone.
  • “Min” signal: This measures the background signal as determined by the assay design. In in-vitro assays that measure receptor-ligand binding or enzyme activity with substrate conversion to products, the minimum signal represents the readout in the absence of test compounds, labeled ligand, or the enzyme substrate. For cell-based agonist assays this is the basal signal. For potentiator assays this is an EC10 concentration of agonist. For inhibitor assays, including receptor-binding assays, this is an EC80 concentration of the standard agonist plus a maximally inhibiting concentration of a standard antagonist (preferred), or the unstimulated reaction.
  • “Mid” signal: This parameter estimates the signal variability at some point between the maximum and minimum signals. Typically, for agonist assays the mid-point is reached by adding an EC50 concentration of a full agonist/activator compound; for potentiator assays it is an EC10 concentration of agonist plus an EC50 concentration of a potentiator; and for inhibitor assays it is an EC80 concentration of an agonist plus an IC50 concentration of a standard inhibitor in each well. In receptor-ligand binding or enzyme-activity-based assays, this parameter represents the mid-point signal measured using an EC50 concentration of a control compound.

    If calibration of the analytical signals is required for compound activity measurements, the Max, Min and Mid signals correspond to calibration curve responses, and not the raw plate reader counts. It is a requirement that the raw signals lie within the range of the calibration curve. Not more than 1-2% of the wells should be outside the calibration range (i.e. above the fitted top or below the fitted bottom of the calibration curve).

Two different plate formats exist for the plate uniformity studies: an Interleaved-Signal format, where all signals are on all plates but varied systematically so that, across the plates run on a given day, each signal is measured in each well location; and a concentration-response curve (CRC) plate format, where a reference compound is tested at multiple concentrations with production control wells (Max and Min). Uniform-signal plates for “Max” and “Min” are also included in the CRC format, where each signal is run uniformly across entire plates. The Interleaved-Signal format can be used in all instances and requires fewer plates. The CRC format is usually easier to run, since it closely conforms to the production process for the assay, including automated pipetting. It is also useful for detecting non-uniform signals, but usually takes more plates in total and the data analysis is more complex. The uniform-signal plates should be interpreted with caution if signals vary across plates on a given day. See Section 3.3.5 for examples and more information.

3.2. Interleaved-Signal Format

3.2.1. Procedure

The following plate layout is recommended; the Excel analysis templates have been developed for it. These layouts combine wells producing “Max”, “Min”, and “Mid” signals on a plate with a proper statistical design. Use the same plate formats on all days of the test. Do not change the concentration producing the mid-point signal over the course of the test. See Section 3.2.4 for a further discussion of midpoint accuracy. The trials should use independently prepared reagents, and preferably be run on separate days. Data analysis templates are available (from the online eBook) for 96- and 384-well plates.

Plate 1


H=Max, M=Mid, L=Min

Plate 2


H=Max, M=Mid, L=Min

Plate 3


H=Max, M=Mid, L=Min

3.2.2. Summary Signal Calculations and Plate Acceptance Criteria

Calculations and acceptance criteria are described below. The overall requirement is that the raw signals are sufficiently tight and that there is significant separation between the “Max” and “Min” signals to conduct screening.


1. Outliers should be flagged with an asterisk in the plate input section. The outliers should be “obvious”, and the rate of outliers should be less than 2 percent (i.e. on average less than 2 wells on a 96-well plate, 8 wells on a 384-well plate).


2. Compute the mean (AVG), SD, and CV (of the mean) for each signal (Max, Min and Mid) on each plate. Note that the CV is calculated taking into account the number of wells per test compound per concentration to be used in the production assay. For example, if in the production assay duplicate wells will be run for each concentration of each test substance, then CV = 100 × (SD/√2)/AVG. More generally, if there will be n wells per test compound per concentration, then CV = 100 × (SD/√n)/AVG. The acceptance criteria are that the CVs of each signal be less than or equal to 20%. Note that the Min signal often fails to meet this criterion, especially for assays whose Min-signal mean is very low. An alternate acceptance criterion for the Min signal is SDmin ≤ both SDmid and SDmax. All plates should pass all signal criteria (i.e. all Max and Mid signals should have CVs less than 20%, and all “Min” signals should pass either the CV criterion or the SD criterion).
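As a sketch of the CV calculation above (hypothetical well values; `cv_of_mean` is an illustrative helper, not part of the Excel templates):

```python
import statistics
from math import sqrt

def cv_of_mean(wells, n_replicates):
    """Percent CV of the mean, with the SD scaled by sqrt(n) to reflect
    the number of replicate wells planned per test compound in production."""
    avg = statistics.mean(wells)
    sd = statistics.stdev(wells)
    return 100.0 * (sd / sqrt(n_replicates)) / avg

# Hypothetical Max-signal wells from one plate; duplicates planned in production.
max_wells = [1050, 980, 1010, 995, 1030, 990, 1005, 1020]
cv = cv_of_mean(max_wells, n_replicates=2)
passes = cv <= 20.0   # acceptance criterion for the Max signal
```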

For each of the “Mid”-signal wells, compute a percent activity for agonist or stimulation assays relative to the means of the Max and Min signals on that plate, i.e.,

%Activity = 100 × (Mid − AVGmin) / (AVGmax − AVGmin).

For inhibition assays compute percent inhibition for each mid-signal well, where %Inhibition = 100 - %Activity.
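The normalization above can be sketched in a few lines (hypothetical well values; `percent_activity` is an illustrative helper, not part of the templates):

```python
import statistics

def percent_activity(mid_wells, max_wells, min_wells):
    """Normalize each Mid-signal well to the on-plate Max/Min means."""
    avg_max = statistics.mean(max_wells)
    avg_min = statistics.mean(min_wells)
    return [100.0 * (w - avg_min) / (avg_max - avg_min) for w in mid_wells]

act = percent_activity(mid_wells=[560, 540, 575, 550],
                       max_wells=[1000, 1020, 980],
                       min_wells=[100, 90, 110])
inh = [100.0 - a for a in act]        # %Inhibition for inhibition assays
sd_mid = statistics.stdev(act)        # acceptance criterion: SDmid <= 20
```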

3. Compute the mean and SD for the “Mid”-signal percent activity values on each plate. The acceptance criterion is SDmid ≤ 20 on all plates.

4. Compute a Signal Window (SW) or Z’ factor (Z’) for each plate, as described below. The acceptance criterion is SW ≥ 2 or Z’ ≥ 0.4 on all plates (either all SW’s ≥ 2 or all Z’ ≥ 0.4).

The formula for the signal window is

SW = [(AVGmax − 3·SDmax/√n) − (AVGmin + 3·SDmin/√n)] / (SDmax/√n),

where n is the number of replicates of the test substance that will be used in the production assay. Instead of the SW, the Z’ factor can be used to evaluate the signal separation, where the only difference is that the denominator (AVGmax − AVGmin) is used instead of SDmax/√n. The complete formula is

Z’ = [(AVGmax − 3·SDmax/√n) − (AVGmin + 3·SDmin/√n)] / (AVGmax − AVGmin)

If one assumes that the SD of the “Max” signal is at least as large as the SD of the “Min” signal, then the Z’ factor will be within a specific range for a given signal window, as illustrated in Figure 1. Note that Z’ values greater than 1 are possible only if AVGmax < AVGmin, and so the templates also check that all Z’ values are less than 1.

Figure 1: Z-Factor interval versus Signal Window.

The recommended acceptance criterion is Z’ factor ≥ 0.4, which is comparable to a SW ≥ 2. Either measure could be used.
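Assuming raw control-well values, the SW and Z’ calculations above can be sketched as follows (hypothetical numbers; the function names are illustrative, not part of the templates):

```python
import statistics
from math import sqrt

def signal_window(max_wells, min_wells, n):
    """SW per Section 3.2.2, with SDs scaled for n production replicates."""
    avg_mx, avg_mn = statistics.mean(max_wells), statistics.mean(min_wells)
    sd_mx, sd_mn = statistics.stdev(max_wells), statistics.stdev(min_wells)
    return ((avg_mx - 3 * sd_mx / sqrt(n))
            - (avg_mn + 3 * sd_mn / sqrt(n))) / (sd_mx / sqrt(n))

def z_factor(max_wells, min_wells, n):
    """Z': same numerator as SW, but divided by (AVGmax - AVGmin)."""
    avg_mx, avg_mn = statistics.mean(max_wells), statistics.mean(min_wells)
    sd_mx, sd_mn = statistics.stdev(max_wells), statistics.stdev(min_wells)
    return ((avg_mx - 3 * sd_mx / sqrt(n))
            - (avg_mn + 3 * sd_mn / sqrt(n))) / (avg_mx - avg_mn)

mx = [1000, 1040, 980, 1010, 970, 1000]   # hypothetical Max wells
mn = [100, 110, 95, 105, 90, 100]         # hypothetical Min wells
sw = signal_window(mx, mn, n=2)
zp = z_factor(mx, mn, n=2)
```

A plate passes if sw ≥ 2 or zp ≥ 0.4; the templates additionally check zp < 1.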

3.2.3. Spatial Uniformity Assessment

A scatter plot (see examples below) can reveal patterns of drift, edge effects and other systematic sources of variability. The response is plotted against well number, where the wells are ordered either by row first, then by column, or by column first, then by row. The overall requirement is that plates do not exhibit material edge or drift effects. In general, drift or edge effects < 20% are considered insignificant. Effects seen only on a single plate or a few plates, and not as a predominant pattern, are also considered insignificant. Some guidelines for detecting and dealing with these problems follow.

No drift or edge effects

Figure 2 (A, B) shows two plots of the same data, illustrating an example where there are no edge effects or drift.

Figure 2 A and B: Example with no drift or edge effects. The data in both plots come from the same set.

Drift

Use the Max and Mid signals to assess for drift. Look for significant trends in the signal from left-to-right and top-to-bottom. If you observe drift that exceeds 20% then you have material drift effects. In Figure 3 (A, B) the mean of column 1 is 10.6, while the mean of column 10 is 13.8, and the overall mean is 12.2. The drift is 26% [(13.8-10.6)/12.2], and therefore the cause of this drift should be investigated.

Figure 3 A and B: Example of material drift effects. The data in both plots come from the same set.

Edge Effects

Edge effects can contribute to variability, and spotting them can be a helpful troubleshooting technique. Edge effects are sometimes due to increased evaporation from outer wells that are incubated for long periods of time. Edge effects can also be caused either by a short incubation time or by plate stacking – these conditions allow the edge wells to reach the desired incubation temperature faster than the inner wells. Edge effects are demonstrated in Figure 4.

Figure 4 A and B: Example of edge effects. The data in both plots come from the same set. N.B., because of the vertical axis scale, problems in the Min and even Mid signals may not be visible; adjusting the scale may be necessary to properly highlight the edge effect.

3.2.4. Inter-Plate and Inter-Day Tests

The normalized Mid signal should not show any significant shift across plates or days. What counts as a “significant shift” depends to a certain extent on the typical slopes encountered in dose-response curves. Thus plate-to-plate or day-to-day variation in the midpoint percent activity needs to be assessed in light of the steepness of the dose-response curves of the assay. For receptor binding assays, and other assays with a slope parameter of 1, a 15% difference can correspond to a two-fold change in potency. The template will translate the mean normalized Mid signal into potency shifts across plates and days. There should not be a potency shift >2 between any two plates within a day, or >2 between any two average day midpoint % activities. For functional assays whose slopes may not equal 1, you can enter a “typical” slope into the template. This should be derived from the slope of a dose-response curve for the substance used to generate the midpoint signal.

For these calculations to have utility the Mid-point % Inhibition/Activity should be “near” the midpoint. Values within the range of 30-70% are ideal. Studies with mean values outside this range should be discussed with a statistician, especially before any studies are repeated solely for this reason. Also note that the conditions used to obtain the midpoint should not be changed over the course of the plate uniformity study.
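Under an assumed Hill model (an illustration of the translation described above, not the template's exact method), the fold shift in potency implied by two midpoint %activity readings at a fixed concentration can be sketched as:

```python
def implied_fold_shift(pct_a, pct_b, slope=1.0):
    """Fold shift in potency implied by two mid-point %activity values
    measured at the same concentration, assuming a Hill model with the
    given slope (slope=1 for receptor binding and similar assays)."""
    odds_a = pct_a / (100.0 - pct_a)   # (c/EC50)^slope for reading A
    odds_b = pct_b / (100.0 - pct_b)   # (c/EC50)^slope for reading B
    ratio = (odds_a / odds_b) ** (1.0 / slope)
    return max(ratio, 1.0 / ratio)     # report as a ratio >= 1

# A roughly 17-point shift from 50% to about 33% activity at slope 1
# implies a two-fold change in potency, in line with the rule of thumb above.
fold = implied_fold_shift(50.0, 33.333)
```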

3.2.5. Summary of Acceptance Criteria


Intra-plate Tests: Each plate should have a
CVmax and CVmid ≤ 20%,
CVmin ≤ 20% or SDmin ≤ min(SDmid, SDmax),
Normalized SDmid ≤ 20,
SW ≥ 2 or Z’ ≥ 0.4.


No material (or insignificant) edge, drift or other spatial effects. Note that the templates do not check this criterion.


Inter-plate and Inter-Day Tests: The normalized average Mid-signal should not translate into a fold shift
> 2 within days,
> 2 across any two days.

3.2.6. 384-well Plate Uniformity Studies

384-well plates contain 16 rows by 24 columns, and one 384-well plate contains the equivalent of four 96-well plates. Two different formats of interleaved plate uniformity templates have been developed. The first layout expands the 96-well plate format into four squares (Figure 5).

Figure 5: Standard Interleaved 384-well Format Plate Layouts.

The second is useful for assays using certain automation equipment such as Tecan and Beckman (Figure 6). For these instruments column 1 of the 96-well plate corresponds to columns 1 and 2 of the 384-well plate, and is laid out in 8 pairs of columns.

Figure 6: HHMMLL 384-well Plate Uniformity Plate Layouts.

The analysis and acceptance criteria are exactly the same as for 96-well format Plate Uniformity Studies. See Section 3.2.5 for a summary of the acceptance criteria.

3.3. Concentration-Response Curve (CRC) plus Uniform-Signal Plate Layouts

CRC plus Uniform-Signal plate layouts are an alternative format for conducting the plate uniformity studies. Their main advantage is easier execution, since the format is more amenable to automated pipetting per the production process for the assay. In the uniform-signal plates, all wells on each plate are exactly the same, and together with heat maps they provide a straightforward assessment of spatial properties. Optionally, the opposite production controls (Max and Min) can be included on the uniform-signal plates to assess within-plate Z’ factors. The CRC plates have a single reference compound serially diluted across every row or every column, along with Max and Min control wells, per the planned production layout.

3.3.1. Procedure

Max and Min signals are prepared as defined in Section 3.1. Two plates are run for each signal. Two CRC plates are also run, for a total of six plates per day. The number of days required is the same as the Interleaved-Signal layout: three days for new assays, two days for transfers of previously validated assays.

3.3.2. Summary Calculations and Plate Acceptance Criterion

The calculations will be performed by the Excel template provided. Details of the calculations are as follows:


1. Compute the mean (AVG), standard deviation (SD) and Coefficient of Variation (CV) for the Max and Min plates (as per the Interleaved-Signal format, the CVs should reflect the number of wells per test condition envisioned in the production assay). Also compute these quantities using the Max and Min control wells on the CRC plates. Requirements are the same as for the Interleaved-Signal format: the CV of each plate should be less than 20%. Alternatively, the Min plates should have SD ≤ the smaller of SDmid and SDmax.


2. For the CRC plates, determine which concentration has mean activity across all plates that is closest to 50%. Designate this concentration as the “Mid” signal.

3. Compute the percent activity for agonist or stimulation assays, and percent inhibition for antagonist or inhibition assays (including binding assays). In this format the calculation is

%Activity = 100 × (well value − AVGmin) / (AVGmax − AVGmin),

where AVGmin is the average of the on-plate Min control wells, and AVGmax is the average of the on-plate Max control wells. Percent Inhibition = 100 − %Activity.

4. Compute the SD of the normalized Mid-signal values on each plate. The acceptance criterion is SD%mid ≤ 20.

5. Compute the Z’ factor and/or the SW for each plate where Max and Min controls are present, and for the Max and Min uniform plates, if the opposite production control wells were included. Z’ factor and SW can also be calculated within a day across plates, pooling all of the Max and Min wells, but this is only useful if the raw signal levels are consistent between plates. The formulas are the same as in Section 3.2.2. The acceptance criterion is either all Z’ ≥ 0.4 or all SW ≥ 2.

3.3.3. Spatial Uniformity Assessment

Heat maps or similar types of plots can be generated for the raw plate data. The criterion for acceptance is the same as for the interleaved format: no drift or edge effects that exceed 20% of the mean. Also as in the Interleaved-Signal format, these effects must appear as the predominant pattern, and not just in a single isolated plate, for the assay to be failed by this criterion.

Using the Max and Min uniform plates, Figure 7 illustrates a spatially uniform result, an edge effect, and a drift effect. Uniform Mid signal plates were used in this example instead of CRC plates. Day 1 shows an acceptably uniform result. Day 2 shows an assay with a significant edge effect (25% from the mean edge value to the mean of the interior), and Day 3 shows an assay with significant drift (25% change in mean value from left to right as compared to the average in the middle). If patterns are similar or worse than those depicted in Day 2 or Day 3 then the assay does not pass the spatially uniform requirement.

Figure 7: A spatially uniform result (Day 1), an edge effect (Day 2), and a drift effect (Day 3). Mid signal plates were used in place of CRC plates.

3.3.4. Inter-Plate and Inter-Day Tests

The Inter-plate and inter-day tests are the same as in Section 3.2.4, except the definitions of % Activity and % Inhibition defined above (Section 3.3.1) are used in the tests. In addition, EC/IC50s can be calculated from the CRC data, and fold shifts in mean EC/IC50 between plates and days can be examined directly to assess variability. In addition, a preliminary Minimum Significant Ratio (MSR) can be calculated from the CRC plates, as follows. An EC/IC50 can be calculated for each row on each plate. For higher density plates (384-wells and above), there could be more than one CRC curve fit per row. Compute the standard deviation (S) of all the Log IC50s across plates and days (Alternatively, a variance components analysis could be used to separate within-plate, within-day and between-day variability). The MSR is 10^(2*sqrt(2)*S), and this value is the smallest ratio between two IC50s that is statistically significant. A common criterion is for the MSR to be less than 3. This MSR should be considered a preliminary and optimistic estimate of the assay MSR, because it is based on a single compound and because the IC50s derived from different rows on the same plate, or even different plates on the same day, will most often not reflect all of the sources of variability in the assay.
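The MSR calculation above can be sketched as follows (hypothetical IC50 values for a single reference compound, one per curve fit; this is the preliminary, optimistic estimate described above):

```python
import statistics
from math import log10, sqrt

def preliminary_msr(ic50s):
    """Preliminary MSR = 10^(2*sqrt(2)*S), where S is the standard
    deviation of the log10 IC50s across rows, plates and days."""
    s = statistics.stdev([log10(x) for x in ic50s])
    return 10 ** (2 * sqrt(2) * s)

ic50s_nm = [95.0, 110.0, 88.0, 105.0, 120.0, 92.0]   # hypothetical, in nM
msr = preliminary_msr(ic50s_nm)
acceptable = msr < 3   # common acceptance criterion
```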

3.3.5. Impact of Plate Variation on Validation Results

Uniform-Signal plates in the CRC format make the assumption that plate variation within each run day is negligible. If this assumption is not correct, then some of the diagnostic tests described here will be misleading. In that case, one should either include opposite production control wells (Max and Min) on the uniform-signal plates, or use the Interleaved-Signal format instead. In particular, Z’ factors and/or Signal Windows computed across plates or across days may be incorrect in either direction, and the Inter-plate and Inter-Day tests could unnecessarily fail acceptable assays.

The following example illustrates the problem. The raw signals of one day of an Interleaved-Signal format Plate Uniformity Study are shown in Figure 8. The Max and Mid raw signals vary across the 3 plates (Figure 8, Plates 1-3), but note that the % Activity is very stable across the 3 plates (Figure 9, Plates 1-3). The maximum fold shift across plates is 1.2.

Figure 8: Raw data values for 3 plates of an Interleaved-Signal Plate Uniformity Study. Plates 1-3 show the actual plate values obtained on one day of the test.


Figure 9: Normalized midpoint values for 3 plates of an Interleaved-Signal Plate Uniformity Study. Plates 1-3 show the actual plate midpoints normalized to the on-plate controls. Plates 4-6 show the same midpoints all normalized to the Plate 3 Min and Max controls.

The Midpoint Percent Activity plot (Figure 9) shows what happens when on-plate Max and Min controls are not used. The three left-hand panels show the plates normalized to their own controls, while, to mimic the Uniform-Signal protocol with its off-plate controls, the three right-hand panels of Figure 9 show each plate’s mid signal normalized to the plate 3 controls:

Plate 4 shows the plate 1 mid signal data normalized to the plate 3 Max and Min signals,

Plate 5 shows the plate 2 mid signal data normalized to the plate 3 Max and Min signals and

Plate 6 is the same as plate 3.

In the presence of variation across the uniform-signal plates, off-plate controls do not effectively normalize the assay. As Figure 9 shows, plate-to-plate variation in the raw signals can create the appearance of significant mid-point variation when in fact there is little variation in signals properly normalized to on-plate controls. In this example, using off-plate controls, Plates 1-3 have a maximum fold shift of 2.0, which does not pass the inter-plate acceptance criterion.
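The normalization pitfall described above can be sketched with hypothetical plates whose raw signals drift: normalizing each plate's Mid wells to its own controls keeps %activity stable, while borrowing another plate's controls induces spurious shifts (all values invented for illustration):

```python
import statistics

def pct_activity(mid_wells, avg_max, avg_min):
    """%Activity of each Mid well against the given control averages."""
    return [100.0 * (w - avg_min) / (avg_max - avg_min) for w in mid_wells]

# Hypothetical plates: the Mid wells track each plate's own Max/Min drift.
plates = [
    {"mid": [600, 610], "max": 1100, "min": 100},   # plate 1
    {"mid": [480, 470], "max": 880,  "min": 80},    # plate 2
    {"mid": [540, 550], "max": 1000, "min": 100},   # plate 3
]

# On-plate normalization: each plate uses its own controls.
on_plate = [statistics.mean(pct_activity(p["mid"], p["max"], p["min"]))
            for p in plates]

# Off-plate normalization: every plate borrows plate 3's controls.
off_plate = [statistics.mean(pct_activity(p["mid"],
                                          plates[2]["max"], plates[2]["min"]))
             for p in plates]

on_spread = max(on_plate) - min(on_plate)     # small: assay is actually stable
off_spread = max(off_plate) - min(off_plate)  # large: artifact of off-plate controls
```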

4. Replicate-Experiment Study

4.1. Overview

It is important to verify that the assay results are reproducible, i.e. that the variability of the key endpoints of the assay is acceptably low. In addition, if the assay is to report results previously reported by another assay, then it is necessary to verify that the two assays produce equivalent results. In this section, we define how to quantify assay variability and determine assay equivalence. It is important to read the entire section below to understand the rationale for the statistical methods employed in calculating the reproducibility of potency and efficacy. We strongly recommend consultation with a statistician before designing the experiments to estimate variability described below.

4.1.1. Rationale

Replicate-Experiment studies are used to formally evaluate the within-run assay variability and formally compare the new assay to the existing (old) assay. They also allow a preliminary assessment of the overall or between-run assay variability, but two runs are not enough to adequately assess overall variability. Post-production methods (Section 3) are used to formally evaluate the overall variability in the assay. Note that the Replicate-Experiment study is a diagnostic and decision tool used to establish that the assay is ready to go into production by showing that the endpoints of the assay are reproducible over a range of potencies. It is not intended as a substitute for post-production monitoring or to provide an estimate of the overall Minimum Significant Ratio (MSR).

It may seem counter-intuitive to call the differences between two independent assay runs “within-run” variability. However, the terminology results from how assay runs are defined. Experimental variation is categorized into two distinct components: between-run and within-run sources. Consider the following examples:

  • If there is variation in the concentrations of buffer components between 2 runs, then the assay results could be affected. However, assuming that the same buffer is used with all compounds within one run, each compound will be equally affected and so the difference will only show up when comparing one run to another run, i.e. in two runs, one run will appear higher on average than the other run. This variation is called between-run variation.
  • If the concentration of a compound in the stock plate varies from the target concentration then all wells where that compound is used will be affected. However, wells used to test other compounds will be unaffected. This type of variation is called within-run as the source of variation affects different compounds in the same run differently.
  • Some sources of variability affect both within- and between-run variation. For example, in a FLIPR assay cells are plated and then incubated for 24-72 hours to achieve a target cell density, taking into account the doubling time of the cells. For example, if the doubling time equals the incubation time, and the target density is 30,000 cells/well, then 15,000 cells/well are plated. But even if exactly 15,000 cells are placed in each well there won’t be exactly 30,000 cells in each well after 24 hours. Some will be lower and some will be higher than the target. These differences are within-run, as not all wells are equally affected. But also suppose in a particular run only 13,000 cells are initially plated. Then the wells will on average have fewer than 30,000 cells after 24 hours, and since all wells are affected this is between-run variation. Thus cell density has both within- and between-run sources of variation.

The total variation is the sum of both sources of variation. When comparing two compounds across runs, one must take into account both the within-run and between-run sources of variation. But when comparing two compounds in the same run, one must only take into account the within-run sources, since, by definition, the between-run sources affect both compounds equally.

In a Replicate-Experiment study the between-run sources of variation cause one run to be on average higher than the other run. However, it would be very unlikely for the difference between the two runs to be exactly the same for every compound in the study. These individual compound “differences from the average difference” are caused by the within-run sources of variation. The higher the within-run variability, the greater the individual compound variation between the assay runs.

The analysis approach used in the Replicate-Experiment study is to estimate and factor out between-run variability, and then estimate the magnitude of within-run variability.
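A toy simulation (all numbers hypothetical) may make the decomposition concrete: a shared random shift per run models the between-run sources, and an independent error per well models the within-run sources. The spread of the per-compound differences between two runs then depends only on the within-run component:

```python
import random
import statistics

# Hypothetical SDs on the log-potency scale for the two variance components
random.seed(1)
SD_BETWEEN, SD_WITHIN = 0.15, 0.10

true_logpot = [random.uniform(-9, -6) for _ in range(1000)]

def run_assay(true_values):
    shift = random.gauss(0, SD_BETWEEN)   # one shift affects every compound in this run
    return [t + shift + random.gauss(0, SD_WITHIN) for t in true_values]

run1, run2 = run_assay(true_logpot), run_assay(true_logpot)
diffs = [a - b for a, b in zip(run1, run2)]

# The mean difference reflects the two between-run shifts; the spread around
# that mean reflects only within-run error (variance ~ 2 * SD_WITHIN**2).
print(statistics.mean(diffs), statistics.stdev(diffs))
```

With these settings the SD of the differences comes out near sqrt(2) × 0.10 ≈ 0.14, regardless of how large the between-run shifts are.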

4.2. Procedure

All assays should have a reproducibility comparison (Steps 1-3). If the assay is to replace an existing assay and the data are to be combined, then an assay comparison study should also be done (Steps 4 and 5).


1. Select 20-30 compounds that have potencies covering the concentration range being tested and, if applicable, efficacy measures that cover the range of interest. The compounds should be well spaced over these ranges.

2. Run all compounds in each of two runs of the assay.

3. Compare the two runs as per Sections 4.3-4.6.

4. Assay comparison: run the same compound set in a single run of the previous assay.

5. Compare the results of the two assays by analyzing the first run of the new assay together with the single run of the previous assay.

4.3. Analysis (Potency)

For the reproducibility comparison, paste potency values from the two runs into the Run-1 and Run-2 data columns. All tests are analyzed and computed by the spreadsheet, which also provides additional plots and diagnostics to assist in judging the results.

For the assay comparison study, paste the potency values for the first run of the new assay into the Run-1 column and the potency values for the (single) run of the previous assay into the Run-2 column. Potency values should be calculated according to the methods of Section 3. A template for the Replicate-Experiment data analysis is available for download from the online eBook.

The points below describe and define the terms used in the template and the acceptance criterion discussed in the Diagnostic Tests section below.


1. Compute the difference in log-potency (= first − second) between the first and second run for each compound. Let d̄ and s be the sample mean and standard deviation of these differences in log-potency. Since ratios of EC50 values (relative potencies) are more meaningful than differences in potency, we take logs in order to analyze ratios as differences. (Note: hypothetical potency values of 1 and 3, 10 and 30, and 100 and 300 have the same ratio but not the same difference.)

2. Compute the Mean-Ratio: MR = 10^d̄. This is the geometric average fold difference in potency between the two runs.

3. Compute the Ratio Limits: RLs = 10^(d̄ ± 2s/√n), where n is the number of compounds. This is the 95% confidence interval for the Mean-Ratio.

4. Compute the Minimum Significant Ratio: MSR = 10^(2s). This is the smallest potency ratio between two compounds that is statistically significant.

5. Compute the Limits of Agreement: LsA = 10^(d̄ ± 2s). Most of the compound potency ratios (approximately 95%) should fall within these limits.

6. For each compound compute the Ratio (= first/second) of the two potencies, and the Geometric Mean potency: GM = √(first × second).

Items 2-6 can be combined into one plot: the Ratio-GM plot. An example is in Figure 10. The points represent the compounds; the blue-solid, green long-dashed and red short-dashed lines represent the MR, RLs and LsA values respectively.

Figure 10: Potency Ratio versus GM Potency. This is a typical example for an acceptable assay: MR=0.90, RLs=(0.78-1.03) [contains the value 1.0], MSR=1.86 [under 3.0], LsA=(0.48-1.67) [between 0.33 and 3.0]. The blue-solid, green long-dashed and red short-dashed lines represent the MR, RLs and LsA values respectively.

Figure 10 shows the desired result of pure chance variation in the difference in activities between runs. The blue solid line shows the geometric mean potency ratio, i.e. the average relationship between the first and second runs. The green long-dashed lines show the 95% confidence limits of the mean ratio. These limits should contain the value 1.0, as they do in this case. The red short-dashed lines indicate the limits of agreement between runs; they indicate the individual compound variation between the first and second runs. All, or almost all, of the points should fall within the red dashed lines. The lower line should be above 0.33 and the upper line below 3.0, which corresponds to at most a 3-fold difference between runs in either direction. The MSR should be less than 3.0, as it is in this example.
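The potency statistics can be sketched in a few lines of Python, assuming the approximate formulas used in the template (MR = 10^d̄, RLs = 10^(d̄ ± 2s/√n), MSR = 10^(2s), LsA = 10^(d̄ ± 2s)); the IC50 values below are hypothetical:

```python
import math
import statistics

def replicate_experiment_potency(run1, run2):
    """MR, RLs, MSR and LsA from paired potency values (e.g. IC50s).

    d is the per-compound difference in log10 potency; dbar and s are its
    sample mean and standard deviation.
    """
    d = [math.log10(a) - math.log10(b) for a, b in zip(run1, run2)]
    n = len(d)
    dbar, s = statistics.mean(d), statistics.stdev(d)
    return {
        "MR":  10 ** dbar,
        "RLs": (10 ** (dbar - 2 * s / math.sqrt(n)),
                10 ** (dbar + 2 * s / math.sqrt(n))),
        "MSR": 10 ** (2 * s),
        "LsA": (10 ** (dbar - 2 * s), 10 ** (dbar + 2 * s)),
    }

# Hypothetical IC50 values (nM) for six compounds in two runs
run1 = [12.0, 55.0, 3.1, 240.0, 18.0, 95.0]
run2 = [10.0, 60.0, 3.5, 210.0, 22.0, 80.0]
stats = replicate_experiment_potency(run1, run2)
print(stats)
```

For this toy data set the MSR comes out well under 3 and the Limits of Agreement bracket the Mean-Ratio, i.e. the acceptance criteria would be met.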

4.4. Diagnostic Tests and Acceptance Criterion (Potency)


If the MSR ≥ 3 then there is poor individual agreement between the two runs. This problem occurs when the within-run variability of the assay is too high (Figure 11A). An assay meets the MSR acceptance criterion if the (within-run) MSR < 3.


If the Ratio Limits do not contain the value 1, then there is a statistically significant average difference between the two runs. Within a lab (Step 3) this is due to high between-run assay variability. Between labs (Step 4), this could be due to a systematic difference between labs, or to high between-run variability in one or both labs. See Figure 11B for an illustration. Note that it is possible with a very “tight” assay (i.e. one with a very low MSR), or with a large set of compounds, to have a statistically significant result for this test that is not material, i.e., the actual MR is small enough to be ignorable. If the result is statistically significant then examine the MR. If it is between 0.67 and 1.5 then the average difference between runs is less than 50% and is deemed immaterial. However, in Figure 11B the MR=2.01, indicating a 101% difference between runs, which is too high to be considered “equivalent”. Note that there is no direct requirement for the MR, but values this extreme are unlikely to pass the Limits of Agreement criterion in point 3 below.


The MR and the MSR are combined into a single interval referred to as the Limits of Agreement. An assay that either has a high MSR and/or an MR different from 1 will tend to have poor agreement of results between the two runs. An assay meets the Limits of Agreement acceptance criterion if both the upper and lower limits of agreement are between 0.33 and 3.0. Note that assays depicted in both Figure 11 A and B do not have Limits of Agreement inside the acceptance region and thus do not meet the acceptance criterion.

Figure 11: Potency Ratio vs. GM Potency. (A) A case where the within-run variability is too large (MR=0.80, RLs=(0.61-1.07), MSR=3.54, LsA=(0.23-2.84)); (B) a case where the LsA are outside the acceptable range because the Mean Ratio is too large.

4.5. Analysis (Efficacy)

The points below describe and define the terms used in the template and the acceptance criterion discussed in the Diagnostic Tests section. Note that the methods described here are intended for functional full/partial agonist assays and non-competitive antagonist assays. Some potentiator assays, as well as assays normalized by fold stimulation, may be best analyzed with the techniques described in the potency section rather than the methods described here. Consult a statistician for the best method of analysis.


1. Compute the difference in efficacy (= first − second) between the first and second run for each compound. Let d̄ and s be the sample mean and standard deviation of the differences in efficacy.

2. Compute the Mean-Difference: MD = d̄. This is the average difference in efficacy between the two runs.

3. Compute the Difference Limits: DLs = d̄ ± 2s/√n, where n is the number of compounds. This is a 95% confidence interval for the Mean-Difference.

4. Compute the Minimum Significant Difference: MSD = 2s. This is the smallest efficacy difference between two compounds that is statistically significant.

5. Compute the Limits of Agreement: LsA = d̄ ± 2s. Most of the compound efficacy differences (approximately 95%) should fall within these limits.

6. For each compound compute the Difference (= first − second) of the two efficacies, and the Mean efficacy (average of first and second).

Items 2-6 can be combined onto one plot: the Difference-Mean plot (not shown). The plot is very similar to the Ratio-GM plot except that both axes are on the linear scale instead of the log scale.
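The efficacy analogues (MD, DLs, MSD and LsA, computed on the linear scale) can be sketched similarly; the efficacy values below are hypothetical:

```python
import math
import statistics

def replicate_experiment_efficacy(run1, run2):
    """MD, DLs, MSD and LsA for paired efficacy values (% activity),
    computed on the linear scale rather than the log scale."""
    d = [a - b for a, b in zip(run1, run2)]
    n = len(d)
    dbar, s = statistics.mean(d), statistics.stdev(d)
    half_ci = 2 * s / math.sqrt(n)
    return {"MD": dbar, "DLs": (dbar - half_ci, dbar + half_ci),
            "MSD": 2 * s, "LsA": (dbar - 2 * s, dbar + 2 * s)}

# Hypothetical % efficacy values for six compounds in two runs
eff1 = [98.0, 45.0, 72.0, 88.0, 30.0, 65.0]
eff2 = [95.0, 50.0, 70.0, 92.0, 28.0, 60.0]
res = replicate_experiment_efficacy(eff1, eff2)
print(res)
```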

4.6. Diagnostic Tests (Efficacy)

Generally the same two problems discussed under potency need to be judged for efficacy as well. However, a general acceptance criterion for efficacy has not been established, as there is no consensus on efficacy standards and for most projects potency is the primary property of interest. As guidelines, the MD should be less than 5 (i.e., less than a 5% average difference between runs) and the MSD should be less than 20 (i.e., 20% activity). More importantly, the MD and MSD should be used to judge the appropriateness of any efficacy CSFs a project may have. For example, if the CSF for efficacy is >80% and the MSD is 30%, then the assay will fail too many efficacious compounds: a 90%-active compound would fall below the CSF 25% of the time. A more appropriate CSF in this situation would be 70% or even 60%.
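The 25% figure above can be checked with a short normal-probability calculation. One illustrative reading (an assumption, not spelled out in the text) treats MSD/2 as the standard deviation of a single efficacy measurement:

```python
from statistics import NormalDist

# Worked check of the CSF example: MSD = 30, true efficacy 90%, CSF at 80%.
MSD, true_efficacy, csf = 30.0, 90.0, 80.0
sd_single = MSD / 2  # assumed SD of a single efficacy measurement

# Probability that a measured value falls below the CSF
p_fail = NormalDist(mu=true_efficacy, sigma=sd_single).cdf(csf)
print(round(p_fail, 2))  # → 0.25
```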

4.7. Summary of Acceptance Criteria


In Step 3, conduct reproducibility and equivalence tests for potency comparing the two runs in the new lab. The assay should pass both tests (MSR < 3 and both Limits of Agreement should be between 0.33 and 3.0).


In Step 5, conduct reproducibility and equivalence tests for potency comparing the first run of the new lab to the single run of the old lab. The assays should pass both tests to be declared equivalent (Limits of Agreement between 0.33 and 3.0).


For full/partial agonist assays and non-competitive antagonist assays, repeat points 1 and 2 for efficacy. Use the informal guidelines discussed above, and project efficacy CSFs to judge acceptability of results.

4.8. Notes


If a project is very new, there may not be 20-30 unique active compounds (where active means some measurable activity above the minimum threshold of the assay). In that case it is acceptable to run compounds more than once to get an acceptable sample size. For example, if there are only 10 active compounds then run each compound twice. However, when doing so, (a) it is important to biologically evaluate them as though they were different compounds, including the preparation of separate serial dilutions, and (b) label the compounds “a”, “b” etc. so that it is clear in the test-retest analyses which results are being compared across runs.


Functional assays need to be compared for both potency (EC50) and efficacy (%maximum response). This may well require a few more compounds in those cases.


In binding assays, it is best to compare Ki’s, and in functional antagonist assays it is best to compare Kb’s.


An assay may pass the reproducibility assessment (Steps 1-3 in the procedure [Section 4.2]) but fail the assay comparison study (Steps 4-5 in the procedure [Section 4.2]). The assay comparison study may fail either because of an MR different from 1 or because of a high “MSR” in the assay comparison study. If it is the former, then there is a potency shift between the assays, and you should assess the values in the assays to ascertain their validity (e.g. which assay's results compare best to those reported in the literature?). If it fails because the “MSR” of the assay comparison study is too large (but the new assay passes the reproducibility study), then the old assay lacks reproducibility. In either case, if the problem is with the old assay, then the team should consider re-running key compounds in the new assay to provide results comparable to compounds subsequently run in the new assay.

5. How to Deal with High Assay Variability

5.1. High Variation in Single Concentration Determinations

Table 1 can be used as a reference to determine the number of replicates necessary for assays with high variability. For a given CV of the raw data values based on 1 well, it shows the number of replicates needed for the CV of a mean to be less than or equal to 10 or 20%. This table does not indicate how the IC50/Ki/Kb variability will be affected (See Section 5.2 for high variation in IC50/Ki/Kb responses).

Adding replicates to reduce variability will also reduce the capacity (i.e., throughput) of the assay to test compounds. Further optimization of the assay could reduce variability and maintain or increase its capacity. The decision to further optimize or add replicates will have to be made for each assay.
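Although Table 1 is not reproduced here, its entries follow from the relationship CV(mean) = CV(single well)/√n. A minimal helper (with hypothetical CV values) shows the calculation:

```python
import math

def replicates_needed(cv_single, cv_target):
    """Number of replicate wells needed so that the CV of their mean falls
    to cv_target, using CV(mean) = CV(single) / sqrt(n)."""
    return math.ceil((cv_single / cv_target) ** 2)

# e.g. a 30% single-well CV requires 9 wells for a <=10% CV of the mean,
# but only 3 wells for a <=20% CV of the mean.
print(replicates_needed(30, 10))
print(replicates_needed(30, 20))
```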

5.2. Excess Variation in Concentration-Response Outcomes (EC50, IC50, Ki, or Kb)

If in Section 4 the assay fails either test (MSR > 3 or Limits of Agreement outside the interval 1/3-3) then the variability of the assay is too high. The following options should be considered to reduce the assay variability:


Optimize the assay to lower the variability in the signal of the raw data values (see Section 6). Check that the concentration range is appropriate for the compound results. Adding more concentrations and/or replicates may improve the results. A minimum of 8 concentrations at half-log intervals is recommended. In general, it is better to have more concentrations (up to 12) rather than more replicates.


Consider adding replicates as discussed below. Note that the impact of adding replication may be minimal, and so the Replicate Experiment Study should be used to assess whether increasing the number of replicates will achieve the objective.


Adopt re-testing as part of the standard protocol. For example, each compound may be tested once per run in 2 or more runs; averaging the results will reduce the assay variability. (Note: in such cases the individual run results are stored in the database, and the data mining/query tools are used to average the results.)

To investigate the impact of adding replicate wells in the concentration-response assay, conduct the Replicate-Experiment study with the maximum number of wells contemplated (typically 3-4 wells/concentration). To examine the impact of replication, compute the MSR versus number-of-replicates curve. To construct this curve, make all data calculations using only the first replicate of each concentration to evaluate the MSR and Limits of Agreement for 1 well per concentration. Then repeat all calculations using the first two replicates per concentration, and so on until all replicates are used. If the assay does not meet the acceptance criterion when all replicates are used, then no feasible amount of replication will improve the assay enough to warrant the replication. If it does meet the criterion using all replicates, ascertain how many replicates are needed by noting the smallest number of replicates required to meet the Replicate-Experiment acceptance criterion. The two examples below illustrate the steps.
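The re-analysis loop can be sketched as follows. This is a schematic, not a definitive implementation: `fit_potency` is a placeholder for whatever concentration-response fitting routine the laboratory uses, and the data layout (compound → {concentration: [replicate responses]}) is an assumption:

```python
import math
import statistics

def msr(run1, run2):
    """MSR = 10**(2*s), with s the SD of paired log10-potency differences."""
    d = [math.log10(a) - math.log10(b) for a, b in zip(run1, run2)]
    return 10 ** (2 * statistics.stdev(d))

def msr_vs_replicates(data_run1, data_run2, fit_potency, max_reps):
    """MSR versus number-of-replicates curve from a Replicate-Experiment study.

    For k = 1..max_reps, potencies are re-fit using only the first k replicate
    wells per concentration, and the MSR is recomputed from the two runs.
    """
    curve = {}
    for k in range(1, max_reps + 1):
        pots = []
        for data in (data_run1, data_run2):
            pots.append([fit_potency({c: wells[:k] for c, wells in comp.items()})
                         for comp in data.values()])
        curve[k] = msr(pots[0], pots[1])
    return curve
```

The smallest k whose MSR (and Limits of Agreement, computed the same way) meets the acceptance criterion is the replication level to adopt.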

Example 1: A binding assay was run using 1 well per concentration and the Replicate-Experiment study did not meet the acceptance criterion. To examine if replication would help, a new Replicate-Experiment study was conducted using 4 wells per concentration. Using only the first replicate from each concentration, the results were normalized, curves fit, and Ki’s were calculated for each concentration-response curve. The MSR and LsA were evaluated. The entire calculation steps were repeated using the first 2 replicates, first 3 replicates and all 4 replicates, with the results listed in Table 2.

From Table 2, it is evident that all 4 replicates are needed to meet the MSR acceptance criterion, and that no amount of replication (up to 4 replicates) will meet the LsA acceptance criterion.

Example 2: In a second study of a pair of uptake inhibition assays (the project had two targets, each measured by one assay), the Plate Uniformity Study indicated that two replicates would be required to meet the Plate Uniformity Signal acceptance criteria in Assay 2. However, plate uniformity criteria concerning replication do not readily translate into dose-response requirements, and so the requirements were investigated in both assays. The Replicate-Experiment Study was conducted using two replicates. The calculations were performed using both replicates, and then re-calculated using just the first replicate. The MSR and LsA are summarized in Table 3.

Using two replicates, both assays meet all acceptance criteria. Using only a single replicate, Assay 1 still meets all criteria, while Assay 2 does not. Note that in this instance both assays benefited from increased replication; however, Assay 1 is a very tight assay, so this benefit is not really needed. In this example the replication requirements were the same for both single-dose screening and concentration-response studies, but in general this will not be the case.

6. Bridging Studies for Assay Upgrades and Minor Changes

6.1. Overview

Sections 3 and 4 cover the validation of entirely new assays, or assays that are intended to replace existing assays. The replacement assays are “different” from the original assay, either because of facility changes, personnel differences, or substantively different reagents, detection and automation equipment. Assay upgrades and changes occur as a natural part of the assay life cycle. Requiring a full validation for every conceivable change is impractical and would serve as a barrier to implementing assay improvements. Hence full validation following every assay change is not recommended. Instead bridging studies or “mini-validation” studies are recommended to document that the change does not degrade the quality of the data generated by the new assay.

The recommended level of validation has three tiers: a small plate uniformity study (Tier I), the assay comparison portion of the Replicate-Experiment study (Tier II), and the full validation package of Sections 3 and 4 (Tier III). Examples of changes within each tier are given below, along with the recommended validation study for that tier. Note that if the study indicates the change will have an adverse impact on assay quality (i.e. the study indicates there are problems), then the cause should be investigated and a full (Tier III) validation should be done. If the results from that study indicate the assays are not equivalent, but the new assay has to be implemented, then the results should not be combined into one data set.

The following applies principally to changes in biological components of the protocol. If changes are made to the data analysis protocol, then these can ordinarily be validated without generating any new data, by comparing the results from the original and new data analysis protocols on a set of existing data. Discuss any changes with a statistician. If changes are made to both the data analysis and biological components of the protocol, then the appropriate tier should be selected according to the severity of the biological change as discussed below. The data analysis changes should be validated on the new validation data, and any additional validation work should be performed as judged by the statistician.

6.2. Tier I: Single Step Changes to the Assay

Tier I modifications are single changes in an assay such as a change to a reagent, instrumentation, or assay condition that is made either to improve the assay quality or increase the capacity without changing the assay quality. Changes can also be made for reasons unrelated to assay throughput or performance (e.g. change of a supplier for cost savings). Examples of such changes are

  • Changes in detection instruments with similar or comparable optics and electronics. E.g.: plate readers, counting equipment, spectrophotometers. A performance check for signal dynamic range, and signal stability is recommended prior to switching instruments.
  • Changes in liquid handling equipment with similar or comparable volume dispensing capabilities. Volume calibration of the new instrument is recommended prior to switching instruments. [Note that plate and pipette tip materials can cause significant changes in derived results (IC50, EC50). This may be due to changes in the adsorption and wetting properties of the plastic material employed by vendors. Under these conditions a full validation may be required].

The purpose of the validation study is to document that the change does not reduce the assay quality (see Figure 12).

Figure 12: Tier I validation study comparing manual pipetting (plates 1 and 2) versus multidrop pipetting (plates 3 and 4) in a GTPγS assay.

6.2.1. Protocol

Conduct a 4-plate Plate Uniformity Study using the layouts in the “2 Plates per Day” tab of the Plate Uniformity Template (the layouts are the same as Plates 1 and 2 of Section 3.2). Plates 1 and 2 should be done using the existing protocol, and Plates 3 and 4 done using the new protocol on the same day using the same reagents and materials (except for the intentional change). Use the 2-Day / 2-Plates per Day template to conduct the analysis.

6.2.2. Analysis

The main analysis is a visual inspection of the “all plates” plots to ensure that the signals have not changed in either magnitude or variability. The mean and SD calculations for each plate can help, but visual inspection is usually sufficient.
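As a supplement to the visual inspection, the per-plate means and SDs can be tabulated directly; the readings below are hypothetical:

```python
import statistics

# Hypothetical mid-signal % activity readings from the four uniformity plates
plates = {
    "plate 1 (existing protocol)": [48.0, 51.2, 49.5, 52.1, 50.3, 47.9],
    "plate 2 (existing protocol)": [50.3, 47.9, 51.0, 49.4, 52.0, 48.6],
    "plate 3 (new protocol)":      [50.1, 49.8, 50.5, 49.9, 50.3, 49.7],
    "plate 4 (new protocol)":      [50.2, 50.0, 49.7, 50.4, 50.1, 49.9],
}

# Summarize each plate; similar means and non-increasing SDs under the new
# protocol support the conclusion that the change did not degrade quality.
summary = {name: (statistics.mean(v), statistics.stdev(v)) for name, v in plates.items()}
for name, (m, sd) in summary.items():
    print(f"{name}: mean = {m:.1f}, SD = {sd:.2f}")
```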

6.2.3. Example

An assay was changed by replacing a manual pipetting step with a multidrop instrument. A 4-plate Plate Uniformity study was run as per the protocol, with manual pipetting on plates 1 and 2 and the multidrop on plates 3 and 4. The results show that the mean percent activity is the same and that the multidrop's variability is superior (i.e. lower) to that of manual pipetting (Figure 12).

6.3. Tier II: Minor Assay Changes

Tier II changes are more substantive than Tier I changes, and have greater potential to directly impact EC50/IC50 results. Examples of such changes are

  • Changes in dilution protocols covering the same concentration range for the concentration–response curves. A bridging study is recommended when dilution protocol changes are required.
  • Lot changes of critical reagents such as a new lot of receptor membranes or a new lot of serum antibodies.
  • Assay moved to a new laboratory without major changes in instrumentation, using the same reagent lots, same operators and assay protocols.
  • Assay transfer to an associate or technician within the same laboratory having substantial experience in the assay platform, biology and pharmacology. No other changes are made to the assay.

6.3.1. Protocol and Analysis

Conduct the assay comparison portion of the Replicate-Experiment Study discussed in Section 4: run 20-30 compounds once in the existing assay format and once under the proposed format, and compare the results. If the compound set used in the original validation is available, then run the same set again under the new assay protocol and compare back to Run-1 of the original Replicate-Experiment Study. The acceptance criterion is the same as for the assay comparison study: both Limits of Agreement should be between 1/3 and 3.0.

6.4. Tier III: Substantive Changes

Substantive changes requiring full assay validation: When substantive changes are made in the assay procedures, measured signal responses, target pharmacology and control compound activity values may change significantly. Under these circumstances, the assay should be re-validated according to methods described in Sections 3 and 4. The following changes constitute substantive changes, particularly when multiple changes in factors listed below are involved:

  • Changes in assay platform: e.g.: Filter binding to Fluorescence polarization for kinase assays.
  • Changes in assay reagents (including lot changes and supplier) that produce significant changes in assay response, pharmacology and control activity values. For example, changes in enzyme substrates, isozymes, cell-lines, label types, control compounds, calibration standards, (radiolabel vs. fluorescent label), plates, tips and bead types, major changes in buffer composition and pH, co-factors, metal ions, etc.
  • Transfer of the assay to a different laboratory location, with distinctly different instrumentation, QC practices or training.
  • Changes in detection instruments with significant difference in the optics and electronics. For example, plate readers, counting equipment, spectrophotometers.
  • Changes in liquid handling equipment with significant differences in volume dispensing capabilities.
  • Changes in liquid handling protocol with significant differences in volume dispensing methods.
  • Changes in assay conditions such as shaking, incubation time, or temperature that produce significant change in assay response, pharmacology and control activity values.
  • Major changes in dilution protocols involving mixed solvents, number of dilution steps and changes in concentration range for the concentration-response curves.
  • Change in analyst/operator running the assay, particularly if new to the job and/or has no experience in running the assay in its current format/assay platform.
  • Making more than one of the above-mentioned changes to the assay protocol at any one time.

Substantive changes require full validation, i.e. a three day Plate Uniformity Study and Replicate Experiment Study. If the intent is to report the data together with the previous assay data then an assay comparison study should be conducted as part of the Replicate Experiment study.

7. References

  1. Sittampalam GS, Iversen PW, Boadt JA, Kahl SD, Bright S, Zock JM, Janzen WP, Lister MD. Design of Signal Windows in High Throughput Screening Assays for Drug Discovery. J Biomol Screen. 1997;2:159–169.
  2. Zhang J-H, Chung TDY, Oldenburg KR. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen. 1999;4:67–73. [PubMed: 10838414]
  3. Taylor PB, Stewart FP, Dunnington DJ, Quinn ST, Schulz CK, Vaidya KS, Kurali E, Tonia RL, Xiong WC, Sherrill TP, Snider JS, Terpstra ND, Hertzberg RP. Automated Assay Optimization with Integrated Statistics and Smart Robotics. J. Biomol Screen. 2000;5:213–225. [PubMed: 10992042]
  4. Iversen PW, Eastwood BJ, Sittampalam GS. A Comparison of Assay Performance Measures in Screening Assays: Signal Window, Z’-Factor and Assay Variability Ratio. J Biomol Screen. 2006;11:247–252. [PubMed: 16490779]
  5. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;I:307–310. [PubMed: 2868172]
  6. Eastwood BJ, Farmen MW, Iversen PW, Craft TJ, Smallwood JK, Garbison KE, Delapp N, Smith GF. The Minimum Significant Ratio: A Statistical Parameter to Characterize the Reproducibility of Potency Estimates from Concentration-Response Assays and Estimation by Replicate-Experiment Studies. J Biomol Screen. 2006;11:253–261. [PubMed: 16490778]
  7. Eastwood, BJ, Chesterfield, AK, Wolff MC, and Felder CC: Methods for the Design and Analysis of Replicate-Experiment Studies to Establish Assay Reproducibility and the Equivalence of Two Potency Assays, in Gad, S (ed): Drug Discovery Handbook, John Wiley and Sons, New York, 2005, 667-688.




Copyright Notice

All Assay Guidance Manual content, except where otherwise noted, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license (CC BY-NC-SA 3.0), which permits copying, distribution, transmission, and adaptation of the work, provided the original work is properly cited and not used for commercial purposes. Any altered, transformed, or adapted form of the work may only be distributed under the same or similar license to this one.

Bookshelf ID: NBK83783; PMID: 22553862

