Genome Res. Nov 2004; 14(11): 2347–2356.
PMCID: PMC525694

A novel, high-performance random array platform for quantitative gene expression profiling


We have developed a new microarray technology for quantitative gene-expression profiling on the basis of randomly assembled arrays of beads. Each bead carries a gene-specific probe sequence. There are multiple copies of each sequence-specific bead in an array, which contributes to measurement precision and reliability. We optimized the system for specific and sensitive analysis of mammalian RNA, and using RNA controls of defined concentration, obtained the following estimates of system performance: specificity of 1:250,000 in mammalian poly(A+) mRNA; limit of detection 0.13 pM; dynamic range 3.2 logs; and sufficient precision to detect 1.3-fold differences with 95% confidence within the dynamic range. Measurements of expression differences between human brain and liver were validated by concordance with quantitative real-time PCR (R2 = 0.98 for log-transformed ratios, and slope of the best-fit line = 1.04, for 20 genes). Quantitative performance was further verified using a mouse B- and T-cell model system. We found published reports of B- or T-cell-specific expression for 42 of 59 genes that showed the greatest differential expression between B- and T-cells in our system. All of the literature observations were concordant with our results. Our experiments were carried out on a 96-array matrix system that requires only 100 ng of input RNA and uses standard microtiter plates to process samples in parallel. Our technology has advantages for analyzing multiple samples, is scalable to all known genes in a genome, and is flexible, allowing the use of standard or custom probes in an array.

Microarray technology has allowed the abundance of thousands of different mRNAs to be measured simultaneously and efficiently from a single biological sample (Schena et al. 1995; Lockhart et al. 1996; Lockhart and Winzeler 2000). As a result, the analysis of individual genes has given way to the analysis of large sets of genes and the discovery of patterns and relationships in their expression. This has spawned a myriad of exciting new applications that are helping to shape the emerging field of systems biology (Marton et al. 1998; Golub et al. 1999; Hughes et al. 2000; Ideker et al. 2001; van't Veer et al. 2002; Yvert et al. 2003). The microarrays that have spurred these advances can be manufactured by a variety of techniques, including spotting (Schena et al. 1995), photolithographic synthesis (Fodor et al. 1991), and inkjet synthesis (Blanchard 1998). In each case, individual probes are placed or synthesized at predefined locations on the substrate. However, conventional arrays can suffer from one or more limitations, including poor data quality, as a result of high intra- and interarray variability, often associated with spotted arrays.

We describe here a powerful and intrinsically robust alternative that substantially overcomes these limitations. Our gene-expression profiling system is based on randomly assembled arrays of beads in wells (Michael et al. 1998). Following random assembly, the location and identity of each bead, bearing an oligonucleotide probe, is determined via a sequential decoding process (Gunderson et al. 2004). An advantage of this approach is that dense packing can be achieved using simple and efficient bulk processes. Furthermore, the technology is intrinsically scalable; the arrays described in this study use beads with diameters of three microns, producing a packing density ~400 times that of a conventional spotted microarray. Elsewhere, packing densities ~40,000 times that of a conventional array have been achieved through the assembly of 300-nm beads (Michael et al. 1998).

The BeadArray technology has previously been shown to be a robust readout platform for single nucleotide polymorphism (SNP) genotyping, where it has demonstrated very high accuracy, call rate, and reproducibility at high multiplexing levels (Fan et al. 2003). It is being used to generate over half the genotyping data for the International HapMap Project (www.hapmap.org), which will derive a detailed map of common genetic variation across the human genome (The International HapMap Consortium 2003). In addition, the BeadArray platform has been effective for gene-expression profiling using PCR-based assays in combination with universal arrays (Yeakley et al. 2002; Fan et al. 2004). Both of these applications made use of universal arrays containing up to 1536 usable capture sequences.

Despite these successes, the use of the BeadArray technology for quantitative gene-expression profiling from complex samples by hybridization to gene-specific probes has not previously been demonstrated. Although some preliminary data suggested that the platform is capable of high sensitivity (Epstein et al. 2002), a number of significant challenges had to be overcome in order to create a robust, quantitative, high-performance system suitable for use with biological samples such as mammalian poly(A+) mRNA. We have now developed such a system. We show that it is capable of accurately and robustly reporting mRNA abundance for hundreds of genes. Our methods can be applied to many thousands of samples, a scale of experimentation that has been impractical with other technologies, and can be extended to develop arrays designed to analyze all known genes in a genome. The technology is also compatible with our SNP genotyping system, enabling genotyping and gene-expression profiling on the same platform.


Design of a gene-expression probe array based on random assembly of beads in wells

The arrays used in the experiments reported here are described in Figure 1. Typically, each array has up to 1536 different bead types, each represented on average by ~30 copies in any array. Each bead has ~700,000 copies of a particular oligonucleotide probe covalently attached to it (Fig. 1). Because the population of beads in an array is a random sampling of a starting bead pool containing 1536 bead types, the representation of the bead types in the array is effectively Poisson. That is, there is a variable number of each of the 1536 bead types both within and between arrays (Gunderson et al. 2004). Thus, two important issues must be addressed to ensure that the random arrays can be used for quantitative measurements of mRNA abundance.

Figure 1.
Design of a randomly assembled gene-specific probe array. (A) Representation of an individual bead lodged in a well. Attached to the bead by its 5′ end is a chimeric oligonucleotide ~75 nucleotides in length, comprising an ~25-nucleotide ...

Firstly, because each array is unique, how can we compare results from array to array? By virtue of the ~30-fold oversampling (50,000 beads/1536 bead types), we can ensure that decoded arrays have greater than or equal to five beads of each type in the array, so that all sequences are represented (Gunderson et al. 2004). Furthermore, the randomness and redundancy provide us with considerable advantages; randomness minimizes the effects of spatially localized artifacts, and redundancy increases measurement precision and robustness. These factors combine to increase measurement accuracy.
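The oversampling argument can be checked with a short calculation. The sketch below assumes an idealized Poisson model with the bead counts quoted above; real arrays may deviate from it, and the function and constant names are illustrative only.

```python
import math

def poisson_cdf(k_max, mean):
    """Return P(X <= k_max) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k)
               for k in range(k_max + 1))

BEADS_PER_ARRAY = 50_000  # beads per array, as quoted in the text
BEAD_TYPES = 1536         # bead types in the pool, as quoted in the text
MIN_BEADS = 5             # minimum representation quoted in the text

# Average number of copies of each bead type (the ~30-fold oversampling).
mean_copies = BEADS_PER_ARRAY / BEAD_TYPES

# Probability that one bead type has fewer than MIN_BEADS copies on an array.
p_under = poisson_cdf(MIN_BEADS - 1, mean_copies)

# Expected number of under-represented bead types on a single array.
expected_under = BEAD_TYPES * p_under

print(f"mean copies per bead type: {mean_copies:.1f}")
print(f"P(fewer than {MIN_BEADS} copies): {p_under:.2e}")
print(f"expected under-represented types per array: {expected_under:.2e}")
```

Under this model the chance that any given bead type falls below five copies is well below one in a million, which is consistent with the statement that all sequences are represented on decoded arrays.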

Secondly, because each probe has associated with it an identifier sequence (Fig. 1), how do we ensure that this sequence doesn't interfere with the analysis of the target mRNA? This is done in two ways. The identifier sequences are computationally screened to avoid similarity to the human and mouse genomes. The probability of cross-hybridization to other genomes is also low, and for the analysis of any particular genome, it is simple to omit a small number of identifier sequences if needed. Also, the identifier sequences are only half the length of the gene-specific probes and have correspondingly lower Tm's (52.0 ± 2.3°C vs. 70.7 ± 1.7°C). By hybridizing labeled total mammalian poly(A+) mRNA samples to arrays containing the identifier sequences but lacking the gene-specific probe sequences, we estimated that the identifier sequences contribute an average of up to five counts over background, with only a few sequences giving higher signals (data not shown). This is a small amount of signal relative to the gene-specific probes and is not expected to have any significant effect on the analysis.

Array formats designed for a variety of gene-expression applications

The experiments described in this study all make use of the Sentrix array matrix format shown in Figure 1. However, the basic concept of placing beads in wells to form a randomly ordered array can be used to create a variety of array formats suitable for a range of applications. In addition to the format shown in Figure 1, which is read using a custom high-resolution reader (Barker et al. 2003), we have developed silicon substrates that have the dimensions of a 2.5 × 7.5-cm microscope slide and can be read on a 5-μm resolution Axon GenePix scanner by virtue of larger well spacing (T. Dickinson, G. Smith, H. Bennett, and R. Barrett, unpubl.). Yet other silicon substrates have been used to develop two designs of whole-genome array, with probes for ~24,000 and ~48,000 gene sequences (G. Wang, G. Smith, S. Barnard, and D. Che, unpubl.; further information is available on www.illumina.com). These higher density arrays can be read on a BeadArray scanner. All of these formats make use of 3-μm silica beads; the same bead pools can be loaded into the different substrates, and give similar quantitative performance. Therefore, results substantially similar to those reported below can be obtained using a variety of bead-based array formats suitable for a range of experimental designs and detection systems.

Dose-response study using spiked mRNAs of known concentration

We designed a dose-response study to estimate the limit of detection, dynamic range, and precision of the 96-array matrix gene-specific probe system for the analysis of a mammalian mRNA sample. We prepared a series of samples that consisted of labeled human liver cell line RNA spiked with known quantities of individually labeled mRNAs synthesized in vitro. This approach has been described previously for microarray performance characterization (Lockhart et al. 1996). We used as spikes nine mRNAs, produced by in vitro transcription (IVT) of cloned bacterial and viral genes whose sequences are absent from the human genome. Twelve samples, representing 12 concentrations, were each replicated eight times to give a total of 96 samples. Each sample contained all nine spikes at a given concentration ranging from zero to 200 pM (Fig. 2).

Figure 2.
Arrangement of spiked samples for hybridization. Each sample was produced by adding labeled spike controls to labeled complex RNA derived from human HepG2 poly(A+) RNA. The spike controls were added at the pM concentrations indicated in the figure. All ...

Each sample was hybridized to eight different arrays in a 96-array matrix. This provided eight technical replicates, sufficient data to allow a statistical analysis of noise in the quantitative readout step. The dose response curves and the resolvable fold change across the tested concentration range, generated for each of the nine genes, are shown in Figure 3.

Figure 3.
Dose-response curves. Data points represent the mean of eight arrays. Signal intensities are plotted in blue vs. target concentration. Error bars represent the two-sided symmetric 90% confidence intervals for a single reading, calculated on the basis ...

Reproducibility of quantitative measurements and dependence on sample input

The dose-response results were reproducible across different manufacturing lots of array matrices and hybridization days. We obtained similar results from 15 independent trials of the experiment, hybridized on seven separate days using a total of 720 arrays manufactured on five different dates (Fig. 4). The quantitative performance of the system based on this significant amount of replication is summarized in Table 1.

Figure 4.
Dynamic range, detectable fold change, and limit of detection for 15 array matrices. The array matrices, manufactured on five separate days, were used to perform dose-response experiments identical to that described above, except in these experiments, ...
Table 1.
Performance metrics

In addition to the probes used to measure the dose responses, the arrays used in the experiments summarized in Figure 4 contained probes for 587 human genes. We analyzed the data generated by these experiments to assess array-to-array hybridization signal variation and how it is influenced by gene intensity. We selected the 380 genes that were reproducibly expressed at detectable levels and plotted their coefficient of variation (standard deviation divided by intensity, abbreviated as CV) as a function of hybridization signal. As shown in Figure 5, and consistent with expectations, the CV increases as gene signals decrease toward the limit of detection. The median CV for background-subtracted, un-normalized intensity across 48 arrays in a representative experiment was 6.5%.
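The CV calculation described above can be sketched as follows. This is a minimal illustration that assumes the "intensity" in the denominator is the mean signal of a gene across arrays. The toy signal values and function names are hypothetical and are not data from this study.

```python
import statistics

def coefficient_of_variation(intensities):
    """Sample standard deviation divided by mean intensity for one gene."""
    mean = statistics.mean(intensities)
    if mean <= 0:
        raise ValueError("CV is undefined for non-positive mean intensity")
    return statistics.stdev(intensities) / mean

def median_cv(signals_by_gene):
    """Median of the per-gene CVs across a set of genes."""
    return statistics.median(
        coefficient_of_variation(values) for values in signals_by_gene.values()
    )

# Hypothetical background-subtracted signals for three genes on three arrays.
toy_signals = {
    "gene_a": [90.0, 100.0, 110.0],
    "gene_b": [950.0, 1000.0, 1050.0],
    "gene_c": [8.0, 10.0, 12.0],
}

for gene, values in toy_signals.items():
    print(gene, f"CV = {coefficient_of_variation(values):.3f}")
print(f"median CV = {median_cv(toy_signals):.3f}")
```

In the toy data the dimmest gene has the largest CV, mirroring the trend described for Figure 5.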

Figure 5.
Array signal variation as a function of gene hybridization intensity. Each blue dot represents a gene and the red line represents a smoothed function for the data on the basis of a robust best-fit function for standard deviation vs. intensity. All values ...

Additional performance measurements of a microarray platform include (1) reproducibility across multiple sample labeling reactions, and (2) sensitivity to sample input variation. To test these aspects of our system's reproducibility, we performed 20 sample labeling reactions, four each, using 10, 20, 50, 150, or 500 ng of total RNA derived from mouse spleen. For the 10- and 20-ng inputs, only three of the four replicates produced adequate material for array hybridization. One microgram of biotinylated cRNA from each successful reaction was hybridized to a separate array in an array matrix. Each array in the matrix contained probes to 540 mouse genes. Each cRNA sample was present at a final concentration of 25 ng/μL.

To obtain a quantitative estimate of reproducibility, linear correlations were calculated for all pairwise combinations of the replicates at each input concentration. The means and ranges of these correlations are plotted in Figure 6A. All correlations (R2) exceeded 0.99. As further evidence of robustness, the scatter plot in Figure 6B shows the correlation for signals between sample labeling replicates using 50 and 500 ng of starting material; the high correlation (R2 > 0.99) demonstrates the reproducibility of the assay even with input material concentrations differing by 10-fold.
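The pairwise comparison of labeling replicates can be sketched as below. This is an illustrative outline, not the analysis code used in the study: it assumes each replicate is a list of signals in the same gene order, and the replicate names and values are hypothetical.

```python
from itertools import combinations

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length signal lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def pairwise_r2(replicates):
    """Map each unordered pair of replicate names to its R2 value."""
    return {
        (a, b): r_squared(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }

def summarize(r2_values):
    """Return (mean, minimum, maximum) of a collection of R2 values."""
    values = list(r2_values)
    return sum(values) / len(values), min(values), max(values)

# Hypothetical signals for five genes in four labeling replicates.
toy_replicates = {
    "rep1": [120.0, 950.0, 4300.0, 60.0, 15000.0],
    "rep2": [115.0, 990.0, 4150.0, 66.0, 15400.0],
    "rep3": [126.0, 930.0, 4420.0, 58.0, 14800.0],
    "rep4": [118.0, 970.0, 4280.0, 63.0, 15100.0],
}

pairs = pairwise_r2(toy_replicates)
mean_r2, min_r2, max_r2 = summarize(pairs.values())
print(f"{len(pairs)} pairs, mean R2 = {mean_r2:.4f}, range = {min_r2:.4f} to {max_r2:.4f}")
```

With four replicates per input amount there are six pairwise comparisons, and with three successful replicates there are three.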

Figure 6.
Sample labeling reproducibility. (A) Twenty sample labeling reactions were processed using our standard conditions with 10, 20, 50, 150, or 500 ng of total mouse spleen RNA as input material (four replicates each). For each input amount, correlation values ...

Concordance with real-time quantitative PCR

The experiments described above measured the quantitative performance of the system and demonstrated that we could obtain quantitative data in a reproducible way. We next wanted to perform measurements on a true biological sample and to evaluate these results by comparison with a different technology. Concordance with measurements obtained using a different technology is a strong indicator that measurements are correct. Therefore, we performed an experiment that compared differential expression patterns obtained on the randomly assembled arrays with those obtained from TaqMan quantitative real-time PCR (qPCR).

The genes selected for this analysis came from a comparison of human liver with human brain. Labeled cRNA from both tissues was hybridized to separate arrays containing probes for 633 human genes. From the hybridization results, we selected a panel of 21 genes for the comparison, using the following criteria: (1) the genes showed liver/brain expression ratios ranging from 0.005 to 175; and (2) every gene was expressed significantly over background, even in the tissue showing the lower amount of expression. This second criterion was necessary to avoid inaccurate expression ratios resulting from the influence of system noise.

For each of these 21 genes, we performed qPCR assays on aliquots of the same starting material. Twenty of the 21 primer pairs gave products, and the log-transformed expression ratios obtained for each of the 20 genes were plotted against the corresponding values obtained on the randomly assembled arrays (Fig. 7). The measurements determined by the two systems showed good correlation (R2 = 0.98 for log-transformed ratios). Furthermore, the slope of the best-fit line was 1.04, indicating that the ratios obtained by the two methods are similar in magnitude. For highly expressed genes, the array produced somewhat compressed fold-change ratios compared with those produced by qPCR. For the five genes whose array intensities exceeded 10,000 counts in either tissue, the array-measured ratio was 0.77 ± 0.24 versus 1.04 ± 0.35 for all genes. This compression is likely due to probe saturation by highly expressed targets, a predicted feature, as the array platform has a dynamic range of ~3 logs compared with ~5 logs for qPCR (Heid et al. 1996). This overall high level of concordance with qPCR validated the performance of the randomly assembled array system.
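The slope and R2 reported above come from a straight-line fit to log-transformed ratios. The sketch below shows one way such a fit could be computed by ordinary least squares. Placing the array values on the x-axis and the qPCR values on the y-axis is an assumption based on the wording above, and the ratios shown are hypothetical.

```python
import math

def fit_log_ratios(array_ratios, qpcr_ratios):
    """Least-squares line through log10 ratios; returns (slope, intercept, r2)."""
    x = [math.log10(r) for r in array_ratios]
    y = [math.log10(r) for r in qpcr_ratios]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2

# Hypothetical liver/brain ratios for six genes measured by both methods.
toy_array = [0.01, 0.2, 1.5, 8.0, 40.0, 150.0]
toy_qpcr = [0.008, 0.25, 1.4, 9.5, 48.0, 190.0]

slope, intercept, r2 = fit_log_ratios(toy_array, toy_qpcr)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R2 = {r2:.3f}")
```

A slope near 1 on the log scale means that a given fold change on one platform corresponds to a similar fold change on the other.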

Figure 7.
Correlation of array matrix data to quantitative real-time PCR. Labeled RNA samples were made from human liver and brain total RNA. These were hybridized to separate array matrices containing probes for 633 human genes. Six technical replicates were included for each ...

Validation of results in a model biological system

Finally, we assessed the ability of the random arrays to generate data consistent with results previously published for a well-characterized biological system. The model system we selected was mouse B and T cells, both of which contain large numbers of cell-type-specific transcripts documented in the biological literature. Our experimental design was to make a series of seven samples containing different ratios of R1.1 (T cell lymphoma) and A20 (B cell lymphoma) mRNA mixed together. This series ranged from 100% B/0% T to 0% B/100% T. Each of the seven samples was independently labeled six times, and the resulting 42 cRNA samples were hybridized to separate arrays of an array matrix, each containing probes to 540 different mouse genes. After hybridization and analysis, we identified 59 genes that were detected in the 100% B cell sample, but not in the 100% T cell sample, or vice versa (Fig. 8). Upon generating this list of 59 genes, we performed literature searches to establish whether there was prior evidence of T- or B-specific expression. Forty-two of the 59 genes had prior literature support for their tissue specificity. We found no genes miscategorized by our array. Table 2 shows a list of all tissue-specific genes identified in our analysis.

Figure 8.
B Cell/T Cell experimental results. Seven RNA samples were prepared containing mixtures of B- and T-cell lymphoma cell-line mRNA. The samples contained 0%, 5%, 25%, 50%, 75%, 95%, and 100% B-cell RNA, with the balance in all cases being T-cell RNA. These samples ...
Table 2.
Array-based determination of tissue-specific gene expression


We developed a powerful and robust new microarray technology for gene-expression profiling on the basis of randomly assembled arrays of beads in wells. The high information density of these arrays (~50,000 beads/~1.4-mm diameter array) reduces sample consumption and makes them well suited for integration into sophisticated systems such as the array matrix device described herein. Each probe is replicated a minimum of five times and on average ~30 times on every array. This built-in redundancy increases measurement precision and makes for an intrinsically robust measurement platform. We optimized the system for hybridization specificity and sensitivity, integrated the various components into a scalable system for gene-expression quantitation, and showed that accurate and reproducible data are generated from complex biological samples.

The 96-array matrix format and associated protocols make it straightforward to analyze many samples with relatively little labor and high reproducibility. We consider this a significant advance because sources of noise and error, such as intra- and interarray variability, process variability, and biological sample variability, can confound microarray experiments (Brody et al. 2002). An effective way of identifying, characterizing, and minimizing variation is to apply well-known statistical tools. Unfortunately, the limited ease of handling and, in many cases, the limited reproducibility of current microarray technologies make it difficult to replicate experiments adequately. This has severely limited the ability to generate and analyze large data sets. As a consequence, the use of microarrays in applications requiring the analysis of large numbers of samples, such as epidemiological, toxicological, and pharmacological screening, has been limited mostly to proof-of-concept studies. Meaningful application of high-throughput microarray technology to large sample sets is now more practical as a result of the system described here.

Samples can be processed in standard micro-plate formats, either manually or robotically. The entire system is designed for compatibility with automation and LIMS tracking, and hence, is suitable for use in applications that require a highly reproducible process with accurate sample tracking throughout. The technology is flexible. It can be used to analyze the expression of hundreds of genes, as described in this study, as well as whole-genome sets of many thousands of genes, which will be described elsewhere. The ability to assemble large numbers of arrays from a single bead pool on the basis of a common chemistry helps to minimize interarray variability. Flexibility in array design is provided by the ability to supplement standard bead pools with sequences of the user's choosing or to make custom bead pools.3

We also developed software for array imaging and gene-expression data analysis (E. Chudin and I. Mikoulitch, unpubl.). Because of the robustness of the system, the user has to pay less attention to the data extraction process than is typical with spotted arrays, and can instead focus on analysis of results. AnEx, a gene-expression data analysis program that organizes sample data and incorporates statistical and visualization tools, is commercially available as part of the gene-expression analysis system. AnEx is MIAME-compliant (www.mged.org) and generates a flat-file format that is accepted by many third-party analysis software applications.

Finally, an advantage of the system we have developed is that it uses the same technology platform as our SNP genotyping system (Fan et al. 2003) and our PCR-based gene-expression assay system (Fan et al. 2004). As a result, SNP genotyping and gene-expression profiling can now be carried out on a single microarray platform, scalable from the analysis of hundreds of genes to all known genes in a genome.



Human brain and liver total RNA were purchased from Ambion (Cat. #7962, Brain; 7960, Liver). Human HepG2 Poly(A+) mRNA was purchased from Ambion (Cat. #7849). Mouse spleen total RNA was purchased from Ambion (Cat. #7920). A20 and R1.1 cell lines were purchased from the American Type Culture Collection (ATCC; A20, Cat. #TIB-208; R1.1, Cat. #TIB-42) and were grown according to supplier's recommendations. A20 cells were grown in RPMI 1640 medium with 2 mM L-glutamine, and supplemented with 1.5 g/L NaHCO3, 1.0 mM Na pyruvate, 10 mM HEPES, and 10% fetal bovine serum (Hyclone). R1.1 cells were grown in DMEM high-glucose medium with glutamate supplemented with 1.5 g/L NaHCO3 and 10% horse serum. Total RNA was harvested from ~10^8 cells using the RNeasy Midi kit (QIAGEN) according to the manufacturer's instructions.


Although our platform is amenable to a number of standard sample labeling techniques, our preferred approach is based on the modified Eberwine protocol (Eberwine et al. 1992), by which messenger RNA is converted to cDNA, followed by an amplification/labeling step mediated by T7 RNA polymerase. The linear amplification step reduces the amount of starting material needed. We adapted the protocol to a microtiter plate format in order to match the array matrix format, which permits 96 array hybridizations to be performed in parallel. Labeling and amplification of the total RNA samples were performed according to the MessageAmp aRNA kit (Ambion Cat. #1750) with the following modifications. Because the hybridization requirements are so modest (1 μg labeled cRNA), the standard reaction was cut down to 1/4 size and total RNA inputs were generally limited to 100 ng. The use of smaller reactions allowed us to perform 80 reactions per kit as opposed to the standard 20 reactions. This necessitated the use of additional cleanup columns for both the RT and IVT steps. QIAquick PCR Purification and RNeasy 96 well kits (QIAGEN) were used according to the manufacturer's instructions for RT and IVT cleanup, respectively. Additionally, all components of the first-strand cDNA synthesis were combined in a single step, because a separate annealing of the T7 oligo(dT) primer was found to be unnecessary (data not shown). During the IVT reaction, a 1:1 ratio of labeled bio-16-UTP (Roche Cat. #1388908) to unlabeled UTP was used with a final combined concentration of 7.5 mM.

Preparation of labeled spikes

Nine bacterial and viral genes were used to prepare RNA controls as follows: bla (pBluescriptSK+; Stratagene), cat (pCAT3-control; Promega), cre (Escherichia coli DH10B-Zip; Life Technologies), e1a (Homo sapiens HEK-293; ATCC), gfp (pEGFP; Clontech), gst (pGEX-5x-3; Amersham-Pharmacia), gus (E. coli GM48), lux (E. coli GM48), and neo (pGT-N28; New England Biolabs). The genes were cloned into the pCRII cloning vector using the TA Cloning kit (Invitrogen, Cat. #K205001-TA). Full-length sense transcripts were generated using the MEGA-script T3 kit from Ambion (Cat. #1338). Labeled antisense targets were then generated using the MessageAmp aRNA kit and were spiked into labeled Human HepG2 cRNA at the 12 concentrations shown in Figure 2.

Hybridization/washing/signal detection

All steps of hybridization, washing, blocking, and signal generation were performed by sequential transfer of a Sentrix array matrix from one 384-well plate (ThermoLab Systems; Cat. #95040000) to the next with the wells of each step containing 40 μL of the appropriate solution. All incubations were carried out without agitation and, with the exception of the hybridization, at room temperature. Amplified, biotin-labeled human or mouse RNA samples were prepared in a solution of Hyb E1 buffer (Illumina, Part #11166381) and 25% (v/v) formamide at a final concentration of 25 ng/μL. An array matrix was then mated to the hybridization plate using a sealed alignment fixture. Hybridization proceeded at 55°C, for 16 to 20 h. After hybridization, the array matrix was washed by a 5-min incubation in Illumina Wash E1 buffer, followed by a 10-min wash in fresh Wash E1 buffer (Illumina, Part #11165898). Arrays were then blocked for 5 min in 1% (w/v) casein-PBS, Hammerstein grade (Pierce, Cat. #37528). Array signal was developed by a 10-min incubation in a 1-μg/mL solution of Streptavidin-Cy3 (Amersham; Cat. #PA43001) in 1% casein-PBS blocking solution. The array matrix was washed a final time for 5 min in Wash E1 buffer. Each array was then dried with an air gun.

Imaging and signal extraction

Arrays were scanned on the BeadArray Reader, a confocal-type imaging system with ~0.8 μm resolution and 532 and 635 nm laser illumination (Barker et al. 2003). Scans were performed in the 532-nm channel. The total scan time per array matrix (i.e., 96 arrays) was 1.5 h, roughly 1 min per array. Image analysis and data extraction software were as described previously (Fan et al. 2003). Briefly, each sequence type is represented by an average of 30 beads on the array. Bead signals were computed with weighted averages of pixel intensities, and local background was subtracted. Array images are registered by a previously described algorithm (Galinsky 2003). This algorithm supplies the position of a bead center that serves as the center of a virtual pixel. To compute the bead signal, we use the four real pixels covering the virtual one and combine their signals in the following way: S = A1S1 + A2S2 + A3S3 + A4S4, where S is the bead signal, Ai is the area of overlap between the ith pixel and the virtual pixel, and Si is the 3 × 3 average taken around the ith pixel after sharpening with the following Laplacian:

equation M1

Here x,y are pixel coordinates and Ix,y are pixel intensities. The choice of the coefficient in front of the Laplacian was made after optimization on data obtained with a calibrated set of Spherotech 3-micron rainbow beads (Cat. #RCP-30-5, Spherotech, Inc.). Finally, we subtract the local background, taken as the average of the five dimmest pixels in the 17 × 17 box centered on the pixel having maximum overlap with the virtual pixel. Sequence-type signal was calculated by averaging the corresponding bead signals with outliers removed (using median absolute deviation).
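The signal-extraction steps described in this section can be outlined in code. The sketch below is an illustration under stated assumptions rather than the software used in the study: the Laplacian sharpening step is omitted because its coefficient is not given in the text, overlap areas are assumed to be fractions of one pixel, windows are clipped at image edges, and the cutoff of three median absolute deviations for outlier removal is a hypothetical choice.

```python
import statistics

def weighted_bead_signal(overlap_areas, smoothed_pixels):
    """S = A1*S1 + A2*S2 + A3*S3 + A4*S4 for the four covering pixels."""
    return sum(a * s for a, s in zip(overlap_areas, smoothed_pixels))

def local_background(image, row, col, box=17, n_dimmest=5):
    """Mean of the n_dimmest pixels in a box x box window around (row, col)."""
    half = box // 2
    rows = range(max(0, row - half), min(len(image), row + half + 1))
    cols = range(max(0, col - half), min(len(image[0]), col + half + 1))
    window = sorted(image[r][c] for r in rows for c in cols)
    return statistics.mean(window[:n_dimmest])

def sequence_type_signal(bead_signals, n_mads=3.0):
    """Average bead signals after dropping outliers by median absolute deviation.

    n_mads is an assumed cutoff; the text does not state the threshold used.
    """
    med = statistics.median(bead_signals)
    mad = statistics.median(abs(s - med) for s in bead_signals)
    if mad == 0:
        return med
    kept = [s for s in bead_signals if abs(s - med) <= n_mads * mad]
    return statistics.mean(kept)

# Hypothetical 20 x 20 image: uniform level of 100 with six dim pixels of 10.
toy_image = [[100.0] * 20 for _ in range(20)]
for r, c in [(8, 8), (8, 9), (9, 8), (11, 12), (12, 11), (12, 12)]:
    toy_image[r][c] = 10.0

raw = weighted_bead_signal([0.25, 0.25, 0.25, 0.25], [900.0, 1000.0, 1100.0, 1000.0])
background = local_background(toy_image, 10, 10)
print("bead signal minus background:", raw - background)
print("sequence-type signal:", sequence_type_signal([100, 102, 98, 101, 99, 1000]))
```

Averaging over roughly 30 beads per sequence type after outlier rejection is what lets a single damaged or mis-registered bead have little effect on the reported signal.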

Data analysis

We developed a suite of algorithms for analysis of gene expression data from microarrays (E. Chudin and I. Mikoulitch, pers. comm.). These have been incorporated into AnEx, a commercial software package for gene-expression data analysis. Array data were normalized using quantiles to fit a cubic spline. The approach is similar to a previously reported method (Workman et al. 2002). Alternatively, a robust least-squares fit (iteratively re-weighted least squares using Tukey's biweight functions) of intensities of a rank invariant set of probes (relative rank change of <0.05) was used. Detection p-values were computed using a dynamically constructed normal model based on intensities of 20 negative controls. To determine minimal resolvable fold change, we used piecewise linear approximation of intensity versus concentration. Concentration levels were considered resolvable if the corresponding one-sided 95% confidence intervals, as computed from the t-distribution, did not overlap. Piecewise linear interpolation was used for both intensities and standard deviations.
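A detection p-value of the kind described above can be illustrated with a simple normal model. The sketch below is one plausible reading, not the AnEx implementation: it assumes the model is fitted to the mean and sample standard deviation of the negative-control intensities and that the p-value is the upper-tail probability. The control values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def detection_p_value(signal, negative_controls):
    """Upper-tail probability of `signal` under a normal fit to the controls."""
    model = NormalDist(mu=mean(negative_controls), sigma=stdev(negative_controls))
    return 1.0 - model.cdf(signal)

# Hypothetical intensities for 20 negative-control bead types on one array.
toy_negative_controls = [
    48.0, 52.0, 50.0, 47.0, 53.0, 51.0, 49.0, 50.0, 46.0, 54.0,
    50.0, 52.0, 48.0, 51.0, 49.0, 50.0, 53.0, 47.0, 50.0, 50.0,
]

for signal in (50.0, 55.0, 60.0, 100.0):
    p = detection_p_value(signal, toy_negative_controls)
    print(f"signal {signal:6.1f} -> detection p = {p:.3g}")
```

A small p-value indicates that a signal is unlikely to have arisen from the background distribution represented by the negative controls.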

Array design

Probes were designed by a custom-built pipeline that will be described in detail elsewhere (P. Rigault, in prep.). Each gene sequence for which probes were to be synthesized was subjected to a filtering process that masked regions unsuitable for probe design, based on complexity and cross-homology thresholds, as determined by DUST (D. Lipman, National Center for Biotechnology Information, pers. comm.) and BLAST (Altschul et al. 1990) algorithms, respectively. All possible 50-mer probes were identified within unmasked regions, and these were ranked by a formula that takes into account distance from the 3′ end of the transcript, melting temperature, and self-complementarity. The two highest scoring probes were then linked to 23-nt identifier sequences by use of a sequence-matching program that minimizes the probability of interactions between the probe and identifier sequence and prevents the creation of junction sequences with cross-homology to the genome in question.
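The enumeration-and-ranking step can be sketched as follows. The actual ranking formula is not given in the text, so `toy_score` and its weights are purely hypothetical: it penalizes distance from the 3′ end and deviation of GC content from 50% as a crude stand-in for melting temperature, and it ignores self-complementarity. Masked bases are assumed to be marked with "N".

```python
PROBE_LEN = 50  # probe length quoted in the text

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def toy_score(window, dist_from_3p):
    """Hypothetical score: higher is better. Not the formula used in the study."""
    return -dist_from_3p / 1000.0 - abs(gc_fraction(window) - 0.5)

def rank_probes(masked_transcript, top_n=2, probe_len=PROBE_LEN):
    """Return the top_n (score, start, sequence) tuples from unmasked windows."""
    n = len(masked_transcript)
    candidates = []
    for start in range(n - probe_len + 1):
        window = masked_transcript[start:start + probe_len]
        if "N" in window:
            continue  # window overlaps a masked region
        dist_from_3p = n - (start + probe_len)
        candidates.append((toy_score(window, dist_from_3p), start, window))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates[:top_n]

# Hypothetical 130-nt transcript (5' to 3') with a 10-nt masked stretch.
toy_transcript = "ACGT" * 25 + "N" * 10 + "ACGT" * 5

for score, start, seq in rank_probes(toy_transcript):
    print(f"start={start:3d} score={score:.3f} {seq}")
```

In this toy version the two best windows overlap heavily. A production pipeline would presumably also enforce spacing between the probes chosen for a gene, which is not modeled here.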

Our use of two probes per gene was based on the results of pilot experiments, in which five informatically chosen probes were synthesized for each of 10 in vitro-synthesized genes. Dose response was determined for each synthetic gene using all five probes or four, three, two, or one arbitrarily selected probes. We found that we could reach our targeted performance metrics (Table 1) with two or more probes per gene, but not one (data not shown). The results of recent functional screening suggest that one probe per gene is sufficient if the probes are selected with a functional screen (T. McDaniel, B. Kermani, S. Baker, S. Oeser, and S. Kruglyak, unpubl.).

Quantitative PCR

Assays-on-Demand quantitative gene expression primers and TaqMan universal PCR master mix (Cat. #4304437) were purchased from Applied Biosystems. All PCR reactions were performed following the manufacturer's instructions.


We thank Steven Barnard, Chanfeng Zhao, Paul Kitabjian, Michael Graige, and Semyon Kruglyak for devising methods to prepare beads with gene-specific probes and for providing bead pools used in these experiments. We also thank Chan Tsan for technical assistance, Lixin Zhou for help with analysis of the B and T cell experiments, and the array manufacturing group at Illumina for providing gene-specific probe arrays.


Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2739104.


³Standard, semicustom, and fully custom bead sets are provided commercially by Illumina. Semicustom bead sets are made by supplementing standard sets with sequences of the customer's choosing. Fully custom bead sets can be made from any desired set of sequences.


  • Abbas, A.K., Lichtman, A.H., and Pober, J.S. 2003. Cellular and molecular immunology. W.B. Saunders, Philadelphia, PA.
  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
  • Barker, D.L., Therault, G., Che, D., Dickinson, T., Shen, R., and Kain, R. 2003. Self-assembled random arrays: High-performance imaging and genomics applications on a high-density microarray platform. Proc. SPIE 4966: 1-11.
  • Blanchard, A. 1998. Synthetic DNA arrays. Plenum Press, New York.
  • Brody, J.P., Williams, B.A., Wold, B.J., and Quake, S.R. 2002. Significance and statistical errors in the analysis of DNA microarray data. Proc. Natl. Acad. Sci. 99: 12975-12978.
  • Dickinson, L.A., Joh, T., Kohwi, Y., and Kohwi-Shigematsu, T. 1992. A tissue-specific MAR/SAR DNA-binding protein with unusual binding site recognition. Cell 70: 631-645.
  • Eberwine, J., Yeh, H., Miyashiro, K., Cao, Y., Nair, S., Finnell, R., Zettel, M., and Coleman, P. 1992. Analysis of gene expression in single live neurons. Proc. Natl. Acad. Sci. 89: 3010-3014.
  • Epstein, J.R., Lee, M., and Walt, D.R. 2002. High-density fiber-optic genosensor microsphere array capable of zeptomole detection limits. Anal. Chem. 74: 1836-1840.
  • Fan, J.-B., Oliphant, A., Shen, R., Kermani, B.G., Garcia, F., Gunderson, K.L., Hansen, M., Steemers, F., Butler, S.L., Deloukas, P., et al. 2003. Highly parallel SNP genotyping. Cold Spring Harbor Symp. Quant. Biol. 68: 69-78.
  • Fan, J.B., Yeakley, J.M., Bibikova, M., Chudin, E., Wickham, E., Chen, J., Doucet, D., Rigault, P., Zhang, B., Shen, R., et al. 2004. A versatile assay for high-throughput gene expression profiling on universal array matrices. Genome Res. 14: 878-885.
  • Fingeroth, J.D. 1990. Comparative structure and evolution of murine CR2. The homolog of the human C3d/EBV receptor (CD21). J. Immunol. 144: 3458-3467.
  • Fodor, S.P.A., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., and Solas, D. 1991. Light-directed, spatially addressable parallel chemical synthesis. Science 251: 767-773.
  • Galinsky, V.L. 2003. Automatic registration of microarray images. II. Hexagonal grid. Bioinformatics 19: 1832-1836.
  • Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al. 1999. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286: 531-537.
  • Gunderson, K., Kruglyak, S., Graige, M.S., Garcia, F., Kermani, B.G., Zhao, C., Che, D., Milewski, M., Yang, R., Siegmund, C., et al. 2004. Decoding randomly ordered arrays. Genome Res. 14: 870-877.
  • Heid, C.A., Stevens, J., Livak, K.J., and Williams, P.M. 1996. Real time quantitative PCR. Genome Res. 6: 986-994.
  • Hermanson, G.G., Eisenberg, D., Kincade, P.W., and Wall, R. 1988. B29: A member of the immunoglobulin gene superfamily exclusively expressed on β-lineage cells. Proc. Natl. Acad. Sci. 85: 6890-6894.
  • Hughes, T.R., Marton, M.J., Jones, A.R., Roberts, C.J., Stoughton, R., Armour, C.D., Bennett, H.A., Coffey, E., Dai, H., He, Y.D., et al. 2000. Functional discovery via a compendium of expression profiles. Cell 102: 109-126.
  • Hughes, T.R., Mao, M., Jones, A.R., Burchard, J., Marton, M.J., Shannon, K.W., Lefkowitz, S.M., Ziman, M., Schelter, J.M., Meyer, M.R., et al. 2001. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat. Biotechnol. 19: 342-347.
  • Ideker, T., Thorsson, V., Ranish, J.A., Christmas, R., Buhler, J., Eng, J.K., Bumgarner, R., Goodlett, D.R., Aebersold, R., and Hood, L. 2001. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292: 929-934.
  • The International HapMap Consortium. 2003. The International HapMap Project. Nature 426: 789-796.
  • Isakov, N. and Altman, A. 2002. Protein kinase C(theta) in T cell activation. Annu. Rev. Immunol. 20: 761-794.
  • Lockhart, D.J. and Winzeler, E.A. 2000. Genomics, gene expression and DNA arrays. Nature 405: 827-836.
  • Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., et al. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14: 1675-1680.
  • Marton, M.J., DeRisi, J.L., Bennett, H.A., Iyer, V.R., Meyer, M.R., Roberts, C.J., Stoughton, R., Burchard, J., Slade, D., Dai, H., et al. 1998. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat. Med. 4: 1293-1301.
  • Michael, K.L., Taylor, L.C., Schultz, S.L., and Walt, D.R. 1998. Randomly ordered addressable high-density optical sensor arrays. Anal. Chem. 70: 1242-1248.
  • Nocentini, G., Giunchi, L., Ronchetti, S., Krausz, L.T., Bartoli, A., Moraca, R., Migliorati, G., and Riccardi, C. 1997. A new member of the tumor necrosis factor/nerve growth factor receptor family inhibits T cell receptor-induced apoptosis. Proc. Natl. Acad. Sci. 94: 6216-6221.
  • Park, C.G., Lee, S.Y., Kandala, G., and Choi, Y. 1996. A novel gene product that couples TCR signaling to Fas(CD95) expression in activation-induced cell death. Immunity 4: 583-591.
  • Schall, T.J., Jongstra, J., Dyer, B.J., Jorgensen, J., Clayberger, C., Davis, M.M., and Krensky, A.M. 1988. A human T cell-specific molecule is a member of a new gene family. J. Immunol. 141: 1018-1025.
  • Scheijen, B., Jonkers, J., Acton, D., and Berns, A. 1997. Characterization of pal-1, a common proviral insertion site in murine leukemia virus-induced lymphomas of c-myc and Pim-1 transgenic mice. J. Virol. 71: 9-16.
  • Schena, M., Shalon, D., Davis, R.W., and Brown, P.O. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470.
  • Siliciano, J.D., Morrow, T.A., and Desiderio, S.V. 1992. itk, a T-cell-specific tyrosine kinase gene inducible by interleukin 2. Proc. Natl. Acad. Sci. 89: 11194-11198.
  • van't Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., et al. 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530-536.
  • Workman, C., Jensen, L.J., Jarmer, H., Berka, R., Gautier, L., Nielser, H.B., Saxild, H.H., Nielsen, C., Brunak, S., and Knudsen, S. 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biol. 3: research0048.
  • Yamanashi, Y., Kakiuchi, T., Mizuguchi, J., Yamamoto, T., and Toyoshima, K. 1991. Association of B cell antigen receptor with protein tyrosine kinase Lyn. Science 251: 192-194.
  • Yeakley, J.M., Fan, J.B., Doucet, D., Luo, L., Wickham, E., Ye, Z., Chee, M.S., and Fu, X.D. 2002. Profiling alternative splicing on fiber-optic arrays. Nat. Biotechnol. 20: 353-358.
  • Yvert, G., Brem, R.B., Whittle, J., Akey, J.M., Foss, E., Smith, E.N., Mackelprang, R., and Kruglyak, L. 2003. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35: 57-64.
  • Zhang, W., Sloan-Lancaster, J., Kitchen, J., Trible, R.P., and Samelson, L.E. 1998. LAT: The ZAP-70 tyrosine kinase substrate that links T cell receptor to cellular activation. Cell 92: 83-92.


Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press