Copyright © 2007 The Author(s)
Improving comparability between microarray probe signals by thermodynamic intensity correction
1Niels Bohr Institute, Blegdamsvej 17, 2100 Copenhagen, Denmark and 2Center for Biological Sequence Analysis, BioCentrum-DTU, Danish Technical University, 2800 Lyngby, Denmark
*To whom correspondence should be addressed. Phone: +45 45252489, Fax: +45 45931585, Email: hbjorn/at/cbs.dtu.dk
Received May 24, 2006; Revised December 1, 2006; Accepted December 1, 2006.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Signals from different oligonucleotide probes against the same target show great variation in intensities. However, detection of differences along a sequence, e.g. to reveal intron/exon architecture or transcription boundaries, as well as simple absent/present calls, depends on comparisons between different probes. It is therefore of great interest to correct for the variation between probes. Much of this variation is sequence dependent. We demonstrate that a thermodynamic model for hybridization of either DNA or RNA to a DNA microarray, which takes the sequence-dependent probe affinities into account, significantly reduces the signal fluctuation between probes targeting the same gene transcript. For a test set of tightly tiled yeast genes, the model reduces the variance to as little as ~1/3 of its raw value. As a consequence of this reduction, the model is shown to yield a more accurate determination of transcription start sites for a subset of yeast genes. In another application, we identify present/absent calls for probes hybridized to the sequenced Escherichia coli strain O157:H7 EDL933.
The model improves the correct calls from 85 to 95% relative to raw intensity measures. The model thus makes applications which depend on comparisons between probes aimed at different sections of the same target more reliable.
INTRODUCTION
Signals from oligonucleotide microarrays have proven highly reproducible, and the great majority of the stochastic variation seen typically originates from differences in the samples measured. The high reproducibility of the signal, however, breaks down when signals from different probes, targeted against the same target, are compared (1,2). Hence, probes measuring the same gene transcript, in the same sample, present on the same oligonucleotide array, typically result in a wide range of signal intensities. The microarray community has largely avoided this problem by restricting comparisons to identical probes. Even where multiple probes are targeted against a given transcript, the comparisons are done probe-wise (3) or they are based on so-called expression index calculations (1,4) that carefully avoid comparisons across different probes. Comparisons between different probes, however, are of great interest because they allow detection of differences along a sequence. Microarray detection of intron/exon architecture, transcription boundaries, the methylation state of genomic regions, etc. depends on such comparisons. Ultimately, probe comparisons will allow absent/present calls. Substantial amounts of data using tiling arrays are available (5–7), as well as data on exon/intron detection (8). At present, analyses of these data have relied on statistical or rule-based approaches, exploiting the continuation of the signal levels along a sequence or elevated signal within a window (5,6). The relatively high signal variation between probes prevents such methods from detecting short stretches of subtle differences. Importantly, much of the probe variation is sequence dependent (9).
Hence, correcting for the sequence-dependent variation among probes should compensate for the intensity fluctuations of probes targeting the same gene. Here we present a thermodynamic model for microarray hybridization that takes the sequence-dependent hybridization affinities into account. We use the model to analyze two different microarray experiments: one based on DNA–RNA hybridization and one based on DNA–DNA hybridization. The main purpose of the present article is to demonstrate how such a model can be used to improve the analysis of experiments which rely on comparisons between individual probes aimed at different sections of the same target. This is because the model takes into account the different binding affinities of the probes, thereby compensating partially for the signal intensity fluctuations of probes with the same target. The model thus has the important advantage that it allows more quantitative comparisons between signals from different probes. We describe two applications of this. First, we use the model to determine the position of transcription start sites (TSS) with greater accuracy than is possible using the raw signals. We then use the model to identify the presence/absence of DNA segments in a cross-strain DNA hybridization between two sequenced Escherichia coli strains. Again, the model yields significantly more reliable results than when using the raw intensities.
MATERIALS AND METHODS
Array design
The genome sequence for Saccharomyces cerevisiae was downloaded from the SGD FTP site (ftp://genome-ftp.stanford.edu/pub/yeast/) on the 18th of September 2004, and all CDSs were extracted (DNA and intron/exon annotation) using the FeatureExtract software (10). Using this sequence information as a basis, the following probe-sets were designed: Up to 20 probes per gene for S. cerevisiae genes (n = 5866) were selected using OligoWiz 2 (11,12).
In addition to the exon probes, probes with a minimum distance of 25 bp were placed targeting the regions 300 bp upstream and 100 bp downstream of each gene (using OligoWiz 2). About 5000 random probes of length 25 bp were generated, using a 25% probability for each of the four nucleotides: A, T, G and C. About 28 genes (12 of these in duplicate, yielding 40 in total) covering the range from low to high expression, according to de Lichtenberg et al. (13), were densely tiled with probes (see table below). Probes of 23, 25 and 27 bp were designed, with 10 bp between the midpoints of the probes, so that all three length variants of a probe are centered on the same position. In total, 18 759 tiling probes were designed for each of the three probe lengths (23, 25 and 27 bp). The data can be found at http://www.cbs.dtu.dk/suppl/probes/.
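The two simplest design rules above (uniform random 25-mers, and length variants sharing midpoints spaced 10 bp apart) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `random_probes` and `tile_probes` are hypothetical, the stand-in gene sequence is random, and strand orientation, complementarity and the OligoWiz selection criteria are ignored.

```python
import random

def random_probes(n, length=25, seed=0):
    """Draw n random probes; each base has probability 1/4 at every position."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ATGC") for _ in range(length)) for _ in range(n)]

def tile_probes(sequence, lengths=(23, 25, 27), step=10):
    """Tile a sequence with odd-length probes sharing midpoints `step` bp apart."""
    half_max = max(lengths) // 2
    tiles = []
    for mid in range(half_max, len(sequence) - half_max, step):
        for length in lengths:
            half = length // 2
            # Odd lengths make the probe symmetric around the midpoint base.
            tiles.append((mid, length, sequence[mid - half: mid + half + 1]))
    return tiles

# Hypothetical stand-in for a real gene sequence (200 random bases).
gene = random_probes(1, length=200, seed=1)[0]
tiles = tile_probes(gene)
```

Each entry of `tiles` is a `(midpoint, length, probe)` tuple, so the three length variants at a given midpoint can be compared directly.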
Experimental procedures
The RNA used in the experimental part of this publication was extracted from a S. cerevisiae CDC15-2 strain 30 min after release from a temperature-induced arrest of the cell cycle in late mitotic phase. See (13) for strain and growth condition details. Total RNA was extracted using the FastRNA pro red kit from Qbiogene, according to the manufacturer's description; for the lysis step the samples were processed for 40 s at speed 6.0 in the FastPrep apparatus. Quality and quantity of total RNA were assessed using spectrophotometer readings at 260 and 280 nm and using an Agilent Bioanalyzer. aRNA was synthesized using the Message Amp II Biotin Enhanced kit (Ambion), using oligo-dT primers, and aRNA fragmentation was done by heating the aRNA to 94°C for 35 min in a MgCl2 buffer. Hybridization was performed according to the standard Affymetrix protocol. Raw probe intensity values for our custom-designed NimbleExpress chip were obtained using the makecdfenv and affy packages from Bioconductor (14).
Intensities were taken from whole chromosomal DNA hybridizations of E. coli strain O157:H7 EDL933 (15,16) and K-12 W3110 (17) to custom-designed NimbleExpress arrays covering seven E. coli genomes including EDL933 and W3110 (18). In short, independent biological triplicates of each strain were grown overnight in Luria–Bertani (LB) broth with continuous agitation (19), and DNA was isolated using the Qiagen Genomic Tip 500/G (Qiagen, Hilden, Germany) and the Genomic DNA Buffer set (Qiagen). Seven micrograms of genomic DNA were fragmented with 0.7 Units of DNase I (Amersham Biosciences, Piscataway, NJ) for 10–12 min at 37°C in 1× One-Phor All Plus buffer (Amersham Biosciences) to obtain fragments of 50–200 bp. Fragmented DNA was labeled according to the manufacturer's instructions for labeling fragmented cDNA derived from mRNA for prokaryotic arrays (Affymetrix Inc., Santa Clara, CA).
The labeled DNA was hybridized to custom-made NimbleExpress arrays (Affymetrix Inc.) for 15–17 h at 45°C. Standard protocols from Affymetrix for hybridization, washing and staining were followed using a hybridization oven, a Fluidics Station 450 and a GeneChip® Scanner 3000 (Affymetrix Inc.). Custom-designed probes were mapped to the EDL933 and W3110 genomes for which probes were included on the array. In this way, we could determine to what extent W3110 probes theoretically should hybridize to the EDL933 samples and vice versa.
Physicochemical model
The results presented in this article are based on a physical model for the binding of fluorescently labeled RNA/DNA strands to the oligonucleotides on the DNA chip. The model is similar to the ones presented in Refs (9,20–22). It is based on equilibrium thermodynamics and assumes that the observed intensity variations between probes for the same gene are due to differences in the binding energies between the probes and the RNA/DNA strands. The model applies to both RNA and DNA strands in solution, but for brevity we refer to the case of RNA strands in the rest of this section. For DNA strands, the model is completely analogous. For simplicity, the model neglects effects such as secondary structures and cross-hybridization and assumes that a given probe is either completely bound to one RNA strand or unbound (free). The basic process for the probe–RNA hybridization is the reversible binding of a free probe to an RNA strand [Equation (1)].
If f(p) ∈ [0,1] denotes the fraction of probes with target RNA of type p which are bound to an RNA strand, equilibrium thermodynamics predicts a Langmuir-type dependence of f(p) on the RNA concentration (23) [Equation (2)].
γ(p,p′) with p′ ≠ p. The equilibrium constants are determined by the corresponding binding free energies (23) [Equation (3)].
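The observed intensity I(p) for a given probe is proportional to the fraction of probes f(p) bound to an RNA strand, i.e. I(p) = κ f(p). We can rewrite Equation (2) in terms of the intensity to obtain Equation (4), which contains the probe-dependent parameters a(p), b(p) and c(p).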
A major goal for a model for the hybridization process on DNA chips is to yield information on the concentration of the RNA strands in the solution. This is given by the inverse of Equation (4).
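To make the idea of inverting a Langmuir-type intensity curve concrete, the sketch below uses a generic three-parameter saturating form, I = b + c·x/(a + x). This parameterization is an assumption made for illustration: the exact form of Equation (4) and the precise roles of a(p), b(p) and c(p) are not reproduced in this text, and the function names are hypothetical.

```python
def langmuir_intensity(x, a, b, c):
    """Assumed saturating form: background b plus c*x/(a + x)."""
    return b + c * x / (a + x)

def concentration_from_intensity(intensity, a, b, c):
    """Invert the assumed form for x; valid only for b < intensity < b + c."""
    signal = intensity - b
    if not 0 < signal < c:
        raise ValueError("intensity outside the invertible range (b, b + c)")
    # From signal * (a + x) = c * x it follows that x = a * signal / (c - signal).
    return a * signal / (c - signal)

# Round trip with hypothetical parameter values.
x_true = 2.5
intensity = langmuir_intensity(x_true, a=10.0, b=50.0, c=4000.0)
x_back = concentration_from_intensity(intensity, a=10.0, b=50.0, c=4000.0)
```

Under this assumed form, concentrations well below a give a signal above background of approximately c·x/a, which is the kind of linear regime used later for the yeast data.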
Model for probe intensity parameters
To proceed, we need a model for how the parameters a(p), b(p) and c(p) depend on the probe sequence. We use a position-dependent nearest-neighbor (PD) model to describe the dependence of the hybridization energies on the probe sequence, with nearest-neighbor parameters εXX and position-dependent weights ω(i) [Equation (7)].
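A position-dependent nearest-neighbor model is commonly written as a sum over adjacent base pairs in the probe, each contributing a dinucleotide term scaled by a weight for its position. The sketch below evaluates such a sum. It assumes the form ln a(p) = Σ_i ω(i)·ε(b_i b_{i+1}); the exact expression used in the article is not reproduced in this text, and `pd_nn_value` and the toy parameter values are hypothetical.

```python
import itertools

def pd_nn_value(probe, eps, omega):
    """Sum of position-weighted nearest-neighbor terms along the probe."""
    return sum(omega[i] * eps[probe[i:i + 2]] for i in range(len(probe) - 1))

# Toy parameters: every dinucleotide contributes 1.0, except GC which contributes 2.0.
eps = {"".join(d): 1.0 for d in itertools.product("ACGT", repeat=2)}
eps["GC"] = 2.0
omega = [1.0] * 24                      # one weight per adjacent pair of a 25-mer

probe = "ACGTACGTACGTACGTACGTACGTA"     # hypothetical 25-mer without any GC step
value = pd_nn_value(probe, eps, omega)  # 24 pairs, each contributing 1.0
```

With non-uniform `omega`, the same dinucleotide contributes differently depending on where it sits in the probe, which is what distinguishes this model from a plain nearest-neighbor sum.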
The fitting procedure is based on the least squares method. For instance, to find the εXX and ω(i) parameters for a(p), we minimize the sum over probes of the squared difference between the observed and the modelled values.
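As a minimal sketch of such a least-squares fit, the code below estimates the 16 dinucleotide parameters with the position weights held fixed, which makes the problem linear. This is a simplification and an assumption: the fit described above also determines ω(i), which makes the full problem bilinear and is not attempted here. The names `design_matrix` and `fit_eps` are hypothetical, and the data are synthetic.

```python
import itertools
import numpy as np

DINUCS = ["".join(p) for p in itertools.product("ACGT", repeat=2)]
INDEX = {d: k for k, d in enumerate(DINUCS)}

def design_matrix(probes, omega):
    """One row per probe: position-weighted count of each of the 16 dinucleotides."""
    X = np.zeros((len(probes), 16))
    for r, probe in enumerate(probes):
        for i in range(len(probe) - 1):
            X[r, INDEX[probe[i:i + 2]]] += omega[i]
    return X

def fit_eps(probes, y, omega):
    """Least-squares eps minimizing sum_p (y_p - X_p . eps)^2 for fixed omega."""
    X = design_matrix(probes, omega)
    eps, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return eps, X

# Synthetic, noise-free data generated from known (hypothetical) parameters.
rng = np.random.default_rng(0)
probes = ["".join(rng.choice(list("ACGT"), size=25)) for _ in range(300)]
omega = np.linspace(0.5, 1.5, 24)
true_eps = rng.normal(size=16)
y = design_matrix(probes, omega) @ true_eps
eps_hat, X = fit_eps(probes, y, omega)
```

One way to extend this toward the full fit would be to alternate between solving for ε with ω fixed and for ω with ε fixed, since each sub-problem is linear.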
RESULTS
Fluctuation reduction between probes of the same target
The main goal of the present article is to enable a more reliable comparison between intensity data coming from probes targeting different sections of the same target. In this section, we therefore analyze the model's ability to correct for fluctuations between probes targeted against the same gene transcript. An oligonucleotide microarray holding probes densely tiling 28 yeast genes was hybridized (for details see Materials and methods, Array design section). Here, we analyze the data from the resulting 18 759 tiling probes targeting 28 genes with unknown concentration in the yeast data set. We write p ∈ g for probes p which target an RNA sequence from a gene g. The intensities I(p) vary strongly within such a probe set even though the probes target the same gene. Since in the yeast experiment we do not have (as opposed to the Spike-In data) intensity data for the same probes at different known target concentrations, we cannot use the full non-linear Langmuir form, Equation (4). We therefore linearize Equation (4), obtaining

$$\ln I(p) \simeq \ln\left[\frac{c(p)\,x(p)}{a(p)}\right]. \qquad (9)$$
For probes targeting the same gene, the concentration x(p) is constant and the right-hand side of Equation (9) only depends on the probe sequence through c(p)/a(p). In analogy with the Spike-In analysis (see supplementary notes), we therefore fit (least squares) the observed intensity from these probes to the PD model using Equation (7) with ln a(p) replaced by ln [c(p)x(p)/a(p)]. To minimize the uncertainty of the fitting parameters, we pick the gene targeted by the most probes (489 probes). The result of the fit is shown in Figure 1. For clarity, we plot only the first 100 of the 489 probes.
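Once the fitting parameters determining the probe sequence dependence of ln [c(p)x(p)/a(p)] have been obtained for the probes of the fitted gene [with constant x(p)], one can use Equation (9) to predict the concentration of other genes relative to that of the fitted gene. We rewrite Equation (9) for a probe p′ probing RNA strands from a different gene with unknown concentration x(p′) as

$$\ln I(p') = \ln\left[\frac{c(p')\,x(p)}{a(p')}\right] + \ln\left[\frac{x(p')}{x(p)}\right]. \qquad (10)$$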
The first term on the right-hand side of Equation (10) is the intensity predicted by the fit under the assumption that the gene product targeted by p′ has the same concentration as the product of the gene used for the fitting of Equation (9), i.e. x(p′) = x(p). The difference between the prediction of the fit and the observed intensity should be constant for all probes targeting a given gene product, and from Equation (10) it yields the concentration of that gene product relative to the gene product used for fitting. To illustrate this, we plot in Figure 1 the observed intensity for probes targeting a second gene together with the model prediction for the intensity given the probe sequence. Again, we only show the first ~100 probes. We see that the model prediction for the intensity is consistently lower than the observed intensity. From Equation (10), this corresponds to a higher concentration of this gene product as compared to the gene product used for fitting. Furthermore, the difference between the observed intensity and the prediction is approximately constant, in agreement with Equation (10). By performing a probe average of this difference over the probes targeting the gene, we obtain from Equation (10) an estimate of the concentration of its gene product relative to that of the gene used for fitting. For the specific gene plotted in Figure 1, this relative concentration is 7.6, i.e. the concentration of its gene product is 7.6 times higher than that of the gene product used for fitting. Using this approach, one can obtain the concentration of all the gene products not used for fitting relative to the gene product used for fitting.
As stated above, a major purpose of our theory is to explain the large variations in the raw intensity between probes targeting the same gene. To illustrate this variation, we present in Figure 2 a box plot of the raw intensities for probes targeting a gene not used for determining the fitting parameters. A box plot of the corresponding predicted concentration obtained from the PD model is also shown. We see that the variation of the concentration predictions is significantly lower than for the raw intensities.
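The probe-averaging step and the variance comparison described above can be sketched as follows. The sketch assumes that the model predictions for each probe (made as if the gene had the same concentration as the fitted gene) are already available. All numbers are hypothetical, chosen so that the true ratio is 7.6 with small multiplicative probe noise, and the function names are illustrative.

```python
import math
from statistics import mean, pvariance

def relative_concentration(observed, predicted):
    """Exponential of the probe-averaged difference ln(observed) - ln(predicted)."""
    diffs = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    return math.exp(mean(diffs))

def log_variances(observed, predicted):
    """Variance of ln(intensity) before and after dividing out the predicted affinity."""
    raw = [math.log(o) for o in observed]
    corrected = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    return pvariance(raw), pvariance(corrected)

# Hypothetical model predictions and observations for five probes of one gene.
predicted = [120.0, 450.0, 80.0, 900.0, 300.0]
noise = [1.1, 0.9, 1.0, 1.05, 0.95]
observed = [7.6 * p * f for p, f in zip(predicted, noise)]

ratio = relative_concentration(observed, predicted)
var_raw, var_corrected = log_variances(observed, predicted)
```

Averaging in log space means the estimate is a geometric mean of the per-probe ratios, which is less sensitive to a single very bright probe than an arithmetic mean of the intensities would be.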
A way to improve the performance of the model could be to add random probes with the same base-pair content as the probes targeting the yeast genomes. Provided one can neglect the position dependence of the binding process, such probes would give more information on the background (non-specific) binding contribution to the signal, which could be used by our model.
Application: determining TSS from probe signals
One of the main motivations for this work is to facilitate present/absent calls as well as to determine the boundaries of transcripts along a genomic sequence through hybridization of probes targeted along a genome sequence. Since RNA is not expressed from regions upstream of the transcription start sites (TSS), there is no specific binding to probes targeting these regions. Consequently, we expect the intensity of such probes to be smaller than that of probes targeting the transcribed regions. However, this tendency is often distorted by the large variations in the observed intensity due to affinity differences between probes. As demonstrated above, the physicochemical model presented in this article can partially compensate for these variations. The approach is as follows. We use Equation (10) to extract RNA concentrations relative to the tiled gene for which the model was fitted. We then fit both the observed intensity and the corresponding predicted concentrations [using Equation (10)] from probes around an expected TSS to a hyperbolic tangent functional form.
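A step-like fit of this kind can be sketched with a four-parameter hyperbolic tangent, y = A + B·tanh((pos − x0)/w), where x0 plays the role of the estimated TSS position. This parameterization, the grid-search strategy, the function name `fit_tanh_step` and the synthetic data are all assumptions made for illustration; the article's exact functional form and fitting routine are not reproduced in this text.

```python
import numpy as np

def fit_tanh_step(pos, y, centers, widths):
    """Grid-search x0 and w; solve offset A and amplitude B by linear least squares."""
    pos, y = np.asarray(pos, float), np.asarray(y, float)
    best = None
    for x0 in centers:
        for w in widths:
            basis = np.column_stack([np.ones_like(pos), np.tanh((pos - x0) / w)])
            coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
            chi2 = float(np.sum((basis @ coef - y) ** 2))
            if best is None or chi2 < best[0]:
                best = (chi2, float(x0), float(w), float(coef[0]), float(coef[1]))
    return best  # (chi2, x0, w, A, B)

# Synthetic log-signal with a step at position 40 (hypothetical values).
pos = np.arange(-300, 301, 10)
rng = np.random.default_rng(1)
y = 5.0 + 2.0 * np.tanh((pos - 40) / 25.0) + rng.normal(scale=0.2, size=pos.size)

chi2, x0, w, A, B = fit_tanh_step(
    pos, y, centers=np.arange(-200, 201, 5), widths=[10, 25, 50, 100]
)
```

Because A and B enter linearly, only x0 and w need to be searched, which keeps the fit robust without requiring a nonlinear optimizer. Running the same fit on raw and on affinity-corrected signals and comparing the residual `chi2` mirrors the comparison made in the text.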
The performance of the model is illustrated in Figures 3. In total, 529 TSS were analyzed. Including probes positioned ±300 base positions around the gene translational-start positions, we obtain an average χ
Absent/present call on DNA–DNA hybridization
We now analyze the microarray data from genomic DNA hybridizations of E. coli O157:H7 EDL933 to a custom-designed microarray covering seven E. coli genomes including the K-12 W3110 strain. By mapping to the known sequence of W3110, we identified probes that should, in theory, hybridize to the EDL933 sample. These probes experience specific binding to their target DNA strands and we denote them present-probes. The rest of the probes in general have a lower intensity since they do not experience specific binding; we denote them absent-probes. This is illustrated in Figure 5.
The present model compensates partially for these fluctuations, yielding narrower distributions for the present-/absent-probes with less overlap. This is illustrated in Figure 5.
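One simple way to turn two overlapping distributions into absent/present calls is to pick a threshold and count how many probes land on the correct side. The sketch below does this for a toy data set. The thresholding rule, the function names and the values are assumptions for illustration only, since the calling procedure used in the article is not reproduced in this text.

```python
def call_accuracy(values, is_present, threshold):
    """Fraction of probes called correctly when 'present' means value > threshold."""
    correct = sum((v > threshold) == truth for v, truth in zip(values, is_present))
    return correct / len(values)

def best_threshold(values, is_present):
    """Scan the observed values as candidate thresholds and keep the most accurate."""
    return max(sorted(set(values)), key=lambda t: call_accuracy(values, is_present, t))

# Hypothetical corrected log-signals and their known absent/present status.
values = [1.0, 1.2, 0.8, 3.0, 2.5, 1.1, 2.9]
truth = [False, False, False, True, True, False, True]

threshold = best_threshold(values, truth)
accuracy = call_accuracy(values, truth, threshold)
```

Applying the same procedure to raw and to affinity-corrected signals gives two accuracies that can be compared directly: the less the two distributions overlap, the higher the best achievable accuracy.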
DISCUSSION
DNA microarray hybridization signals are distorted by various factors. A significant part of the distortion can be attributed to the base sequence dependence of the probe affinity. We presented a physicochemical theory for the hybridization process on microarrays using a position-dependent nearest-neighbor model for the binding energies. In this way, we take stacking energies and positional effects within the probes into account when analyzing the hybridization signal. The main purpose of the article is to demonstrate that such a model allows the signals from different probes with the same target to be compared more accurately, as the conversion renders the signal less dependent on the probe affinity. We demonstrated that the model reduces the signal variance by up to 64% for probes with the same target. It thus enables a more quantitative comparison of signals from different probes. Two applications of this were presented. First, we demonstrated that our model provides a more accurate estimate of the position of TSS as compared to using raw intensities. The probe data were fitted to a hyperbolic tangent to model the TSS. Not surprisingly, the reduction in the signal variation by our model improves the fit significantly. This result does not depend on the specific functional form (hyperbolic tangent) used for the fit; others may want to model the TSS, other parts of the gene structure, or the absence of transcription altogether, by other means. We expect that most methods should benefit from using signals corrected for probe affinity effects by a thermodynamic intensity correction similar to the one presented here. Second, as a benchmark for the ability of the model to separate signal from no signal, commonly referred to as an absent/present call, we turned to a data set where the result is known a priori. Genomic DNA from the E.
coli strain EDL933, for which the genomic sequence has previously been determined, was hybridized to a microarray containing probes for another E. coli strain, namely W3110. The correct call (absent/present) could be determined for 85% of the probes when the raw signals were used, whereas 95% correct calls could be made when using probe affinity corrected signals. This demonstrates a very useful application of our model. It also shows that the model works for DNA–DNA hybridization as well as RNA–DNA hybridization. A software implementation of the model presented in this article, together with a description of how to use it, is available at http://www.cbs.dtu.dk/suppl/probes/. In the future, one could improve the performance of the model even further by taking into account additional aspects of the hybridization such as probe and/or target folding and sandwich hybridization.
SUPPLEMENTARY DATA
Supplementary Data is available at NAR online.
ACKNOWLEDGEMENTS
A grant from The Danish Technical Research Council (STVF) for the ‘Systemic Transcriptomics in Biotechnology’ (#26-03-0147) financed this work as well as the Open Access publication charge.
Conflict of interest statement. None declared.
REFERENCES
1. Li C, Wong WH. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl. Acad. Sci. USA. 2001;98:31–36.
2. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4:249–264.
3. Lemon WJ, Liyanarachchi S, You M. A high performance test of differential gene expression for oligonucleotide arrays. Genome Biol. 2003;4:R67.
4. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003;31:e15.
5. Shoemaker DD, Schadt EE, Armour CD, He YD, Garrett-Engele P, McDonagh PD, Loerch PM, Leonardson A, Lum PY, et al. Experimental annotation of the human genome using microarray technology. Nature. 2001;409:922–927.
6. Stolc V, Samanta MP, Tongprasit W, Sethi H, Liang S, Nelson DC, Hegeman A, Nelson C, Rancour D, et al. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc. Natl. Acad. Sci. USA. 2005;102:4453–4458.
7. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, et al. Transcriptional maps of 10 human chromosomes at 5 nucleotide resolution. Science. 2005;308:1149–1154.
8. Clark TA, Sugnet CW, Ares M Jr. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–910.
9. Binder H, Kirsten T, Loeffler M, Stadler PF. Sensitivity of microarray oligonucleotide probes: variability and effect of base composition. J. Phys. Chem. B. 2004;108:18003. Binder H, Kirsten T, Hofacker IL, Stadler PF, Loeffler M. Interactions in oligonucleotide hybrid duplexes on microarrays. J. Phys. Chem. B. 2004;108:18015.
10. Wernersson R. FeatureExtract–extraction of sequence annotation made easy. Nucleic Acids Res. 2005;33:W567–W569.
11. Wernersson R, Nielsen HB. OligoWiz 2.0–integrating sequence feature annotation into the design of microarray probes. Nucleic Acids Res. 2005;33:W611–W615.
12. Nielsen HB, Wernersson R, Knudsen S. Design of oligonucleotides for microarrays and perspectives for design of multi-transcriptome arrays. Nucleic Acids Res. 2003;31:3491–3496.
13. de Lichtenberg U, Wernersson R, Jensen TS, Nielsen HB, Fausboll A, Schmidt P, Hansen FB, Knudsen S, Brunak S. New weakly expressed cell cycle-regulated genes in yeast. Yeast. 2005;22:1191–1201.
14. Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy–analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20:307–315.
15. O'Brien AD, Newland JW, Miller SF, Holmes RK, Smith HW, Formal SB. Shiga-like toxin-converting phages from Escherichia coli strains that cause hemorrhagic colitis or infantile diarrhea. Science. 1984;226:694–696.
16. Perna NT, Plunkett G III, Burland V, Mau B, Glasner JD, Rose DJ, Mayhew GF, Evans PS, Gregor J, et al. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature. 2001;409:529–533.
17. Hayashi K, Morooka N, Yamamoto Y, Fujita K, Isono K, Choi S, Ohtsubo E, Baba T, Wanner BL, et al. Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W3110. Mol. Syst. Biol. 2006;2:2006.0007.
18. Willenbrock H, Petersen A, Sekse C, Kiil K, Wasteson Y, Ussery DW. Design of a 7 Escherichia coli Genomes Microarray for Comparative Genomic Profiling. J. Bacteriol. 2006;188:7713–7721. The data is available from the Gene Expression Omnibus database (GEO: http://www.ncbi.nlm.nih.gov/geo/) with the series accession number GSE4690.
19. Sambrook J, Fritsch EF, Maniatis T. Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1989.
20. Held GA, Grinstein G, Tu Y. Modeling of DNA microarray data by using physical properties of hybridization. Proc. Natl. Acad. Sci. USA. 2003;100:7575.
21. Held GA, Grinstein G, Tu Y. Relationship between gene expression and observed intensities in DNA microarrays – a modeling study. Nucleic Acids Res. 2006;34:e70.
22. Hekstra D, Taussig AR, Magnasco M, Naef F. Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays. Nucleic Acids Res. 2003;31:1962.
23. Silbey RJ, Alberty RA. Physical Chemistry. West Sussex, UK: John Wiley; 2000.
24. Available from http://www.netaffx.com.
25. Zhang L, Miles MF, Aldape FD. A model of molecular interactions on short oligonucleotide microarrays. Nature Biotechnol. 2003;21:818.
26. Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F, Spencer F. A model based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 2004;99:909.
27. Wu Z, Irizarry RA. Nat. Biotechnol. 2004;22:656.
28. Choe SE, Boutros M, Michelson AM, Church GM, Halfon MS. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 2005;6:R16.