Proc Natl Acad Sci U S A. Jan 26, 2010; 107(4): 1482–1487.
Published online Jan 8, 2010. doi:  10.1073/pnas.0913883107
PMCID: PMC2824413
Genetics

Optimized detection of sequence variation in heterozygous genomes using DNA microarrays with isothermal-melting probes

Abstract

The use of DNA microarrays to identify nucleotide variation is almost 20 years old. A variety of improvements in probe design and experimental conditions have brought this technology to the point that single-nucleotide differences can be efficiently detected in unmixed samples, although developing reliable methods for detection of mixed sequences (e.g., heterozygotes) remains challenging. Surprisingly, a comprehensive study of the probe design parameters and experimental conditions that optimize discrimination of single-nucleotide polymorphisms (SNPs) has yet to be reported, so the limits of this technology remain uncertain. By targeting 24,549 SNPs that differ between two Saccharomyces cerevisiae strains, we studied the effect of SNPs on hybridization efficiency to DNA microarray probes of different lengths under different hybridization conditions. We found that the critical parameter for optimization of sequence discrimination is the relationship between probe melting temperature (Tm) and the temperature at which the hybridization reaction is performed. This relationship can be exploited by varying probe length to design microarrays containing probes of equal Tm. Using such a microarray, we demonstrate detection of >90% of homozygous SNPs and >80% of heterozygous SNPs with the SNPScanner algorithm. The optimized design and experimental parameters determined in this study should guide DNA microarray designs for applications that require sequence discrimination such as mutation detection, genotyping of unmixed and mixed samples, and allele-specific gene expression. Moreover, designing microarray probes with optimized sensitivity to mismatches should increase the accuracy of standard microarray applications such as copy-number variation detection and gene expression analysis.

Keywords: DNA/DNA hybridization, sequence discrimination, single-nucleotide polymorphisms, melting temperature, probe design

The original motivation for the development of DNA microarrays by the group of Edwin Southern was the identification of DNA sequence variation (1). Early studies by Southern and others showed that when short single-stranded DNA probes are affixed to a solid surface, the efficiency with which they form duplexes with single-stranded DNA in free solution is sensitive to the presence of single-base-pair mismatches. This made it feasible to detect the presence of single-nucleotide polymorphisms (SNPs) in a DNA sample on the basis of hybridization efficiency to DNA probes of known sequence. The ability to discriminate DNA sequence using microarrays of many hundreds of thousands of oligonucleotide probes underpins a number of DNA microarray applications, including multiplex genotyping of SNPs (2), mutation detection (3–5), and resequencing by hybridization (6). Mutation detection using microarrays remains a cheap and simple means of characterizing nucleotide variation in small genomes (1, 4); however, the extent to which this approach is extensible to more complex genomes or the detection of heterozygous mutations remains unclear. Additional emerging applications make use of SNP-specific DNA probes including global studies of allele-specific gene expression (7) and quantitative genotyping of pooled samples for bulk-segregant genetic mapping (8–10).

Despite the myriad applications of DNA sequence discrimination using DNA microarrays, a comprehensive empirical study of the parameters important for optimizing sequence discrimination on microarrays has not been performed. Furthermore, probe design rules that are relevant for DNA microarrays intended for other uses, such as gene expression analysis (11), DNA barcode measurements (12), or detection of copy-number variation, are not necessarily relevant for sequence discrimination applications.

A large body of literature regarding the thermodynamics of duplex formation (13–18) is relevant to DNA microarray design. The free energy of duplex formation (ΔG°) is best estimated by a nearest-neighbor (NN) model, which assumes that the stability of a given base pair depends on the identity and orientation of the neighboring base pair (17). Enthalpic (ΔH°) and entropic (ΔS°) values have been determined empirically for all 10 NNs, and therefore ΔG° is readily determined using the relationship ΔG° = ΔH° − TΔS°. The ability to discriminate DNA sequence on the basis of hybridization requires that the free energy of hybridization of a perfectly matched duplex (ΔG°PM) be significantly lower than the free energy of hybridization of mismatched DNA (ΔG°MM). For duplex formation, a useful metric is the melting temperature of the duplex (Tm), which is the temperature at which half the DNA strands are in a double-helix state. The Tm of a given sequence is calculated according to the relationship Tm = ΔH° × 1000/(ΔS° + R × ln(CT/x)) − 273.15, where R is the gas constant (1.9872 cal/K mol), CT is the total molar strand concentration, and x = 4 for non-self-complementary duplexes (19). Thermodynamic parameters have been determined for all possible mismatches, the majority of which are destabilizing of duplex formation (19) and thus decrease the Tm of the mismatched duplex. Thus, in principle, it should be possible to estimate the ideal probe design such that the Tm of the matched duplex is much greater than the Tm of the mismatched duplex. In practice, however, for multiplex scenarios, other factors must be considered: in particular the specificity of the probe within a genomic context and the fact that there are additional reactions competing with the bimolecular reaction necessary for duplex formation. Furthermore, the vast majority of thermodynamic studies of duplex formation have been performed in solution, whereas microarrays involve one strand affixed to a solid substrate and one strand in free solution. The effects of this asymmetry, and the importance of hybridization conditions, substrate concentration, and signal to noise, require empirical determination.
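
The Tm relationship quoted above can be illustrated with a minimal Python sketch. The function name and the ΔH° and ΔS° values in the usage lines are hypothetical and chosen only for illustration; in practice the totals would come from summing published NN parameters over the probe sequence, which is not attempted here.

```python
import math

R_CAL = 1.9872  # gas constant in cal/(K mol), as quoted in the text

def melting_temperature(delta_h_kcal, delta_s_cal, strand_conc_molar, x=4):
    """Tm in degrees Celsius from total duplex enthalpy and entropy.

    delta_h_kcal: total enthalpy of duplex formation, kcal/mol
    delta_s_cal: total entropy of duplex formation, cal/(K mol)
    strand_conc_molar: total strand concentration CT, mol/L
    x: 4 for non-self-complementary duplexes
    """
    denominator = delta_s_cal + R_CAL * math.log(strand_conc_molar / x)
    return delta_h_kcal * 1000.0 / denominator - 273.15

# Illustrative (hypothetical) totals for a short duplex.
tm_dilute = melting_temperature(-180.0, -500.0, 0.6e-12)
tm_concentrated = melting_temperature(-180.0, -500.0, 1e-6)
print(round(tm_dilute, 1), round(tm_concentrated, 1))
```

With the same ΔH° and ΔS°, a higher strand concentration gives a higher computed Tm, which is one reason the strand concentration assumed in the calculation matters when comparing Tm values between studies.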

The purpose of this study was to identify the optimized microarray design and hybridization conditions for discriminating sequence variation on microarrays. We sought especially to determine whether an optimized microarray design makes it feasible to detect heterozygous mutations in a diploid genome on the basis of hybridization efficiency. We made use of two fully sequenced yeast genomes that contain 24,549 sequence-verified SNPs to test the effect of single-nucleotide mismatches on hybridization efficiency. We identified a relationship between the hybridization temperature and the Tm of the probe, regardless of probe length, that maximizes the sensitivity of a DNA probe to mismatches. We used this finding to guide construction of a DNA microarray in which probes are designed to have a homogeneous Tm of ≈57 °C by varying their length between 16 and 35 nucleotides. Using this isothermal microarray design, we demonstrate the sensitivity of the SNPScanner algorithm for the detection of homozygous and heterozygous mutations. The optimized design parameters identified in this study should prove useful for guiding future microarray design for a variety of sequence-specific applications.

Results

To study the parameters that are important for sequence discrimination using DNA microarrays, we made use of the complete genome sequences available for two strains of Saccharomyces cerevisiae: the S288c reference sequence (hereafter, the reference genome) and the RM11-1a sequence (hereafter, the nonreference genome). Our previous analysis had identified 24,549 sequence-verified SNPs between these two strains that are separated by at least 25 nucleotides (4). To study the effect of microarray probe length on sensitivity to mismatches, we designed three different test microarrays each containing DNA oligonucleotides of length 20, 25, or 30 bases that were tiled in an overlapping manner across SNP sites. The ~240,000 DNA probes were designed to be perfectly complementary to the reference genome. The position of each probe relative to the SNP was systematically altered so that all possible mismatched positions within the probe were equally represented across the array. In addition, two probes were designed to flank, but not cover, each SNP. These probes covered regions that have identical sequence in the reference and nonreference genomes (see Methods).

Probe Tm Determines Optimal Conditions for Sensitivity to Mismatches.

To test systematically the effect of probe length and hybridization conditions on sequence discrimination, we cohybridized reference (Cy5-labeled) and nonreference (Cy3-labeled) genomic DNA to the three test microarrays containing probes of 20-, 25-, or 30-nucleotide length. Hybridization experiments were performed at 5 °C increments from 45 to 65 °C (Table S1). For all experiments, the set of probes that spanned nonpolymorphic regions of the genome (between 42,867 and 49,587 probes depending on the microarray) was used to normalize the microarrays.

We first performed experiments on test microarrays using DNA from haploid yeast strains. To assess the sensitivity of probes for each microarray to mismatches, we determined the median ratio (expressed as a log2 value) of all probes that contain a polymorphic site in the genomic DNA sample regardless of the exact position of the SNP in the probe (Fig. 1A). These experiments demonstrate that the sensitivity of duplex formation to mismatches increases with decreasing probe length under identical hybridization conditions. Furthermore, performing hybridization reactions at higher temperatures increases the sensitivity of hybridization to mismatches for all probe lengths. We performed the same series of experiments using DNA from a diploid produced by mating the reference and nonreference strains together, which ensures that the genomic DNA is heterozygous at all 24,549 SNP sites. At heterozygous sites, one allele is perfectly complementary to the probe and one allele contains a mismatch. The optimal ratio that can be expected in these cases is 0.5 (log2 = −1). As with haploid DNA, we observed increased sensitivity to mismatches with both decreased probe length and increased hybridization temperature for heterozygous DNA (Fig. 1B).
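
The expected ratio at a heterozygous site can be written out as a short calculation. The sketch below assumes an idealized case in which the perfectly matched allele hybridizes with relative efficiency 1 and the mismatched allele with a residual efficiency that is zero by default; the function and parameter names are hypothetical.

```python
import math

def expected_log2_ratio(mismatch_fraction, mismatch_signal=0.0):
    """Expected log2(sample/reference) ratio for one probe.

    mismatch_fraction: fraction of sample molecules carrying the
        mismatched allele (1.0 for a haploid SNP, 0.5 for a heterozygote)
    mismatch_signal: residual hybridization of the mismatched allele
        relative to a perfect match (0.0 means complete discrimination)
    """
    ratio = (1.0 - mismatch_fraction) + mismatch_fraction * mismatch_signal
    if ratio <= 0.0:
        return float("-inf")  # no signal expected from the sample channel
    return math.log2(ratio)

print(expected_log2_ratio(0.5))        # heterozygote, complete discrimination
print(expected_log2_ratio(0.5, 0.25))  # heterozygote, partial discrimination
```

Under complete discrimination the heterozygous case gives a ratio of 0.5, that is, a log2 ratio of −1. Any residual hybridization of the mismatched allele moves the value toward 0, which is why heterozygous sites are harder to call than haploid ones.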

Fig. 1.
Sequence discrimination depends on probe length and hybridization temperature. (A) When hybridization is performed at a given temperature, probes of length 20 nucleotides (plus signs) exhibit enhanced sensitivity to mismatches over probes of 25 nucleotides ...

Previous studies have found that the position of the mismatch within the probe is a major determinant of its effect on hybridization (4, 20). Namely, more central mismatches have a much greater effect on hybridization than those occurring at the terminal positions. We found that this effect holds for all probe lengths (Fig. S1). In free solution, terminal mismatches have been reported to have a stabilizing effect on duplex formation (19). In contrast, we found that all terminal mismatches on a microarray result in decreased hybridization.

Another well-known parameter affecting duplex formation is the proportion of bases in the probe that are either guanines or cytosines (%GC). This metric is often used as a proxy for probe Tm. We computed the melting temperatures for all probes on the three microarrays using NN parameters (18). Tm and %GC content are correlated for all probe lengths and, in general, probe melting temperatures increase with probe length (Fig. S2). Within each microarray, probe Tms are widely distributed with a standard deviation of ≈5 °C (Table 1).

Table 1.
Melting temperature (Tm) for microarrays

We determined the relationship between sensitivity to mismatches and Tm at different hybridization temperatures for each microarray (Fig. 2). For this purpose we used only those central positions of DNA probes that are most sensitive to mismatches. In general, there is reduced sensitivity to sequence mismatches with increased probe Tm for all hybridization temperatures. We also observed that discrimination is reduced at probe Tms well below the hybridization temperature. The optimal relationship between probe Tm and hybridization temperature occurs where the smoothed curve reaches a minimum. This optimum, which is clearest for hybridization temperatures greater than 50 °C, appears to occur when probes have a melting temperature ≈5 °C lower than the temperature at which the hybridization reaction is performed. This relationship is independent of probe length, as it is observed in microarray experiments using probes of 20- (Fig. 2A), 25- (Fig. 2B), and 30- (Fig. 2C) nucleotide length.

Fig. 2.
The relationship between sensitivity to sequence mismatches and DNA probe melting temperature for probes of length (A) 20, (B) 25, and (C) 30 nucleotides. For each hybridization experiment performed at increasing temperatures from 45 to 65 °C ...

A comparison of hybridization efficiencies for reference and nonreference DNA makes clear the basis of this relationship: for any given hybridization temperature and probe length, as the probe melting temperature increases, total hybridization increases (Fig. S3). The thermodynamic cost of a mismatch is maximized when the hybridization temperature is 5 °C higher than the melting temperature of the probe. This penalty is reduced for probes of higher melting temperature. For probes with Tm much lower than the hybridization temperature, the hybridization efficiency of perfectly matched DNA is reduced, and thus the thermodynamic cost of a mismatch is less pronounced.

Performance of Microarrays with Isothermal-Melting Variable-Length Probes.

To exploit this newly discovered relationship between probe Tm and sensitivity to mismatches, we designed a DNA microarray for which we aimed to establish uniform probe melting temperatures by varying the length of the probes. We designed probes with a target Tm of 57 °C, computed using NN parameters (17), by varying the probe length between 16 and 35 nucleotides (see Methods), and tiled them across the 24,549 SNPs that differ between the reference and nonreference genomes. The modal probe length for this microarray is 24 nucleotides (Fig. S4) and the Tms for all probes on the array are tightly distributed with a standard deviation nearly one order of magnitude less than that of arrays with fixed probe length (Table 1). By cohybridizing reference and nonreference DNA at 55, 60, and 65 °C, we found that the best temperature for hybridization was 60 °C, consistent with our observations using fixed probe length microarrays (Table S2). We infer, from the fixed length data, that hybridization at 62 °C might be slightly better still.
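
The variable-length selection described above can be sketched as a search over candidate lengths. This is an illustration under stated assumptions, not the custom scripts used in the study: the function names are hypothetical, and the default Tm estimate is the simple Wallace rule (2 °C per A/T, 4 °C per G/C), used here only as a stand-in for an NN-based calculation, which can be supplied through `tm_fn`.

```python
def wallace_tm(seq):
    """Rough Tm estimate (Wallace rule); a stand-in for an NN-based Tm."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2.0 * at + 4.0 * gc

def pick_isothermal_window(genome, start, target_tm=57.0,
                           min_len=16, max_len=35, tm_fn=wallace_tm):
    """Return the window starting at `start` whose estimated Tm is closest
    to target_tm, trying lengths from min_len to max_len.

    The window is the target sequence; the probe itself would be its
    reverse complement. Ties keep the shorter window.
    """
    best = None  # (absolute Tm difference, window sequence)
    for length in range(min_len, max_len + 1):
        if start + length > len(genome):
            break
        window = genome[start:start + length]
        diff = abs(tm_fn(window) - target_tm)
        if best is None or diff < best[0]:
            best = (diff, window)
    return None if best is None else best[1]

for name, seq in [("mixed", "ACGT" * 10), ("GC-rich", "GC" * 20), ("AT-rich", "AT" * 20)]:
    window = pick_isothermal_window(seq, 0)
    print(name, len(window), wallace_tm(window))
```

Even with this crude Tm estimate, GC-rich regions yield the shortest windows and AT-rich regions the longest, matching the trend reported for the shortest and longest probes on the isothermal array.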

We performed four hybridization experiments using haploid DNA and four hybridizations using heterozygous diploid DNA at 60 °C. Experimental results were highly reproducible (pairwise correlations > 0.87). We confirmed that the relationship between a mismatched position within a probe and sensitivity holds for an isothermal probe design for both haploid (Fig. S5A) and heterozygous (Fig. S5B) DNA. Consistent with an optimization of sequence discrimination by isothermal probes, ratios are very close to the theoretical optimum of log2 ratios of −1 for heterozygous genomes (Fig. S5B). For both haploid and heterozygous samples, probes behave predictably for lengths between 19 and 30 nucleotides. Shorter probes (16–18 nucleotides) have extremely high %GC, whereas longer probes (31–35 nucleotides) have extremely high %AT content (Fig. S6A), and in both cases behave less predictably. Using our Tm-matched design parameters with a target Tm of 57 °C, 225,100 (95.5%) probes are between 19 and 30 nucleotides in length.

We examined the effect of each possible mismatched base pairing on hybridization efficiency (Fig. 3). The maximal effect of a mismatch occurs when a mutation in the sample DNA, present in free solution, results in a mismatch with a cytosine in the probe. C-C mismatches have the greatest effect, followed by C-T and C-A mismatches, which have a greater effect than all other possible mismatches. This is consistent with known thermodynamic properties of mismatches in which mismatches with C are the weakest (19). This effect is not symmetrical, as T-C and A-C mismatches in which the A or T is in the DNA probe result in significantly less perturbation of hybridization. In fact, this is true of all pairs of symmetrical mismatches, because the computed ratio depends on both the identity of the mismatched base pair and the perfectly matched base pair determined by the base in the probe. The smallest effect of mismatches is observed when a T or A is mismatched with G. This is consistent with the known promiscuity of G, as it forms the strongest mismatches (19).

Fig. 3.
Differential effect of mismatches on annealing efficiency at isothermal probes. The median ratio for the interquartile region of probes of all lengths is plotted for each possible mismatch. The first nucleotide is the base present in the probe. The second ...

Microarrays with Isothermal-Melting Probes Efficiently Detect Heterozygous Mutations.

Previously, we developed the SNPScanner algorithm, which accurately detects the presence of >85% of SNPs in haploids using an array of fixed probe length (25 nucleotides) with an average of 4-base-pair spacing between probes (4). We tested the performance of the SNPScanner algorithm on isothermal arrays by performing holdout analyses from individual hybridization experiments. We performed multiple tests in which we trained a model using data from 23,549 randomly selected SNPs and tested the detection ability of the algorithm on sets of 1,000 random test SNPs held out from the training set. The SNPScanner algorithm computes the likelihood that a site in the genome is polymorphic. For ratiometric data that are log2-transformed, the likelihood calculation reduces to

equation image

The variance of ratios differs for different probe lengths for cohybridized identical DNA sequences on isothermal arrays (Fig. S6B). Therefore, we employed a probe-length-specific variance (σ²) measure for the likelihood calculation determined from a microarray to which reference DNA had been hybridized in both channels. The likelihood is computed for site k in the genome using the experimentally determined intensity (x) in probe i and summed for all probes containing site k. μp is the modeled value for a SNP in probe i complementary to site k. For each training/test set, we estimated our false negative rate to be the fraction of the 1,000 test SNPs that we failed to detect. To estimate our false positive rate, we used the same holdout procedure for training but tested detection of the 1,000 SNPs in a cohybridization experiment of differentially labeled reference DNA in which we expect to detect no SNPs.
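
Because the likelihood equation itself is not reproduced in this text, the following Python sketch shows only one plausible form consistent with the description above. It assumes Gaussian errors, an expected log2 ratio of 0 at nonpolymorphic sites, and a log10 likelihood ratio summed over the probes covering a site. All names are hypothetical, and the exact expression used by SNPScanner may differ.

```python
import math

def log10_likelihood_ratio(x, mu_p, sigma2):
    """Sum over probes of the Gaussian log10 likelihood ratio
    (polymorphic vs. nonpolymorphic) for one genomic site.

    x: observed log2 ratios, one per probe covering the site
    mu_p: modeled log2 ratio for each probe if the site is polymorphic
    sigma2: variance for each probe (e.g., specific to its length)
    Assumption: the nonpolymorphic expectation is a log2 ratio of 0.
    """
    total = 0.0
    for xi, mi, s2 in zip(x, mu_p, sigma2):
        total += (xi ** 2 - (xi - mi) ** 2) / (2.0 * s2)
    return total * math.log10(math.e)  # convert natural log to log10

# Two probes that behave as the polymorphic model predicts ...
print(log10_likelihood_ratio([-2.0, -2.0], [-2.0, -2.0], [0.25, 0.25]))
# ... and two probes showing no deviation from a log2 ratio of 0.
print(log10_likelihood_ratio([0.0, 0.0], [-2.0, -2.0], [0.25, 0.25]))
```

In this form each additional probe adds an independent term, so the total score grows with the number of probes covering a site, in line with the observation that fewer probes give smaller likelihood values.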

We performed 10 independent tests per hybridization experiment on 4 replicate hybridizations. Our 40 tests used an average of 165,744 probes to train the algorithm and an average of 9,033 probes to predict the presence of 1,000 SNPs. Of these probes, 2,000 flanked but did not cover an SNP. Therefore, our test set comprised an average of 7 probes per SNP, which is the same probe density as the genome-wide tiling array used in our initial study (4). We observed an average true positive rate of 92.3%. Over 90% of the SNPs were detected with a log10 likelihood score greater than 2 and the magnitude of likelihoods ranged to values over 100 (red line in Fig. 4). These results held for data from four independent hybridization experiments (Fig. S7A).

Fig. 4.
SNPScanner performance using ratiometric data from an isothermal microarray. Using an average of 7 probes per SNP site, the SNPScanner algorithm accurately predicts 93% of haploid SNPs, with a log10 likelihood value greater than 0 (red line). Decreasing ...

We performed the same holdout procedure in which we applied the trained algorithm to random selections of 1,000 known SNP sites from a self-self hybridization. From 10 tests of 1,000 sites, we found one example of a log10 likelihood value greater than 0, indicating that our false positive rate is of the order 10⁻⁴. Using a threshold log10 likelihood value of 2 results in 0 false positives and >90% true positives.

We sought to determine the required number of probes for accurate detection of SNPs by excluding subsets of probe data from the test set of SNPs. For this purpose, the training set contained the full complement of probe information. As expected, reducing the number of independent measurements reduced the magnitude of the total likelihood values (Fig. 4). However, we found that we were still able to detect over 90% of test SNPs using as few as two probes per SNP. To assess the effect of the placement of SNPs within probes on the detection quality, we constrained mismatched sites for test SNPs to increasingly terminal regions of probes. We found that a significant decrease in the fraction of SNPs detected only occurs once mismatches are constrained to the outer 30th percentile of probes (83% true positives; 836/1,000 SNPs predicted; see Fig. 4 Inset).
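
The positional constraint can be made concrete with a small check of whether a site falls in the central part of a probe. The sketch assumes that the "outer 30th percentile" means the outermost 15% of the probe length at each end, which is an interpretation rather than a definition given in the text; the function names and the coordinate convention (0-based probe start) are likewise hypothetical.

```python
def in_inner_fraction(probe_start, probe_len, site, inner=0.70):
    """True if `site` lies within the central `inner` fraction of the probe.

    probe_start: 0-based genomic coordinate of the first probe base
    probe_len: probe length in nucleotides
    site: 0-based genomic coordinate of the interrogated base
    """
    offset = site - probe_start
    if not 0 <= offset < probe_len:
        return False  # the probe does not cover the site at all
    margin = (1.0 - inner) / 2.0 * probe_len
    center = offset + 0.5  # midpoint of the base along the probe
    return margin <= center <= probe_len - margin

def site_is_well_covered(site, probes, inner=0.70):
    """probes: iterable of (start, length) pairs."""
    return any(in_inner_fraction(s, n, site, inner) for s, n in probes)

# A 20-nt probe starting at position 100: bases at offsets 3..16 are central.
print([off for off in range(20) if in_inner_fraction(100, 20, 100 + off)])
```

A tiling check of this kind could be run over every position of interest to confirm that overlapping probes leave no site interrogated only by probe termini.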

Previously, our attempts to detect heterozygous SNPs with the SNPScanner algorithm using a tiling array with a fixed probe length of 25 nucleotides had proven unsuccessful. We investigated whether the increased specificity of Tm-matched probes makes it feasible to detect the presence of heterozygous SNPs. We performed the same test procedure by withholding data for 1,000 SNPs and training the algorithm with the rest of the data. We were able to predict an average of 829.5/1,000 SNPs from 10 independent tests each from four independent hybridizations. Likelihood scores were much smaller in magnitude than haploid SNP prediction (Fig. S7B), consistent with the decreased difference in log2 ratio between expected polymorphic and nonpolymorphic values. Whereas over 80% of heterozygous SNPs are predicted from a single hybridization, the false positive rate when the same method is applied to self-self hybridization data is around 8% (Fig. 5). The false positive rate can be reduced by imposing higher cutoffs: Using a cutoff score of 2 results in 75% true positive calls with a 2.3% false positive rate. Although the false positive rate is prohibitively high on a genome-wide scale, the use of additional heuristic criteria to filter SNP calls such as those used in our original report of SNPScanner (4) can potentially reduce the total number of SNP calls to a more manageable number when applied on a whole-genome scale.

Fig. 5.
The SNPScanner algorithm is able to correctly predict the presence of 83% heterozygous SNPs, with a 7.2% false positive rate. The number of false positives rapidly decreases with increasing log10 likelihood cutoffs (numerical values adjacent to points). ...

Discussion

We have discovered that the sensitivity of a DNA probe to a single mismatch is maximized when hybridization is performed at a temperature ≈2–5 °C higher than the probe Tm. We used this discovery to guide the design and construction of an isothermal probe design in which probe length is varied between 16 and 35 nucleotides to ensure a homogeneous melting temperature of duplex DNA. We have shown that this array design universally increases the sensitivity of DNA probes, enabling accurate SNP detection in haploid and heterozygous DNA samples.

Our study is not the first to make use of isothermal probe designs. Previously, mutation detection using isothermal microarrays has been attempted in Helicobacter pylori (5), Escherichia coli (21), and Plasmodium falciparum (22); however, these studies have been characterized by high false negative rates (22, 23). The findings from this study make clear why that is the case; first, for comprehensive mutation detection it is essential that probes overlap, as the ability to detect mismatches that occur at the termini of probes is poor, and second, it is essential to consider the relationship between probe Tm and hybridization temperature. Previous studies (22) using “isothermal” probe designs have allowed for a far greater distribution of probe Tms (60–80 °C), and hybridization has been performed at a temperature (42 °C) far from the optimal relationship discovered in this study.

The thermodynamics of duplex formation are affected by other factors, including salt concentration and the presence of denaturants such as formamide or urea. Therefore, it should be noted that the details of the relationship identified in our study might only apply to the specific composition of hybridization solution used. We expect, however, that the general relationship should hold, although the optimal conditions may have to be refined empirically for other buffer compositions.

As well as identifying the appropriate relationship between probe Tm and hybridization temperature, we discovered asymmetries in the effect of mismatches. The most extreme of these is the effect of mismatches with C, which is greatest when the C occurs in the probe. This discovery has practical implications for designing microarrays that interrogate double-stranded DNA for applications such as genotyping: Where possible, a probe containing a C should be preferred over a probe containing a G.

Our isothermal microarray was designed with a target Tm of 57 °C and provided best discrimination when DNA samples were hybridized at 60 °C. Although it remains untested, it seems probable that designing microarrays with a higher target probe Tm and hybridizing at temperatures 2–5 °C higher should provide equal sensitivity. This has the advantage of allowing the design of longer probes, which increases their specificity in a genomic context. Hybridization temperatures above 65 °C are generally avoided due to limitations of standard hybridization ovens and, thus, this is likely to be the upper bound. Further enhancements of stringency by using denaturants may make it possible to increase sensitivity without increasing the hybridization temperature.

Design Guidelines for Isothermal Microarrays.

The design and experimental guidelines derived from this study that are relevant to either genotyping or mutation detection microarrays can be summarized as follows:

  1. Design probes with a target Tm of 57 °C and perform hybridization experiments at 60–62 °C.
  2. Exclude probes that are shorter than 19 bp or longer than 30 bp.
  3. When assaying double-stranded DNA for genotyping or other applications, use the relevant strand such that:
    1. C, not G, occurs in the probe.
    2. An A-C mismatch is formed instead of T-G.
    3. A T-C mismatch is formed instead of A-G.
  4. For mutation detection arrays, overlap probes so that every nucleotide position falls within the inner 70th percentile of at least one probe.
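
Guideline 3 can be sketched as a choice between the two strands of a double-stranded target. The sketch assumes the notation of Fig. 3, in which the first base of a mismatch is in the probe and the second is in the sample; on the opposite strand both bases are replaced by their complements. The ranking values and function names are hypothetical and encode only the three preferences listed above.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_rank(probe_base, sample_base):
    """Lower is better. Encodes: C in the probe is best, G in the probe is
    worst; otherwise prefer a mismatch with C in the sample over one with G."""
    if probe_base == "C":
        return 0
    if probe_base == "G":
        return 4
    if sample_base == "C":
        return 1
    if sample_base == "G":
        return 3
    return 2

def choose_strand(probe_base, sample_base):
    """Return the (probe base, sample base) pair for the preferred strand.

    Ties keep the strand that was passed in.
    """
    if sample_base == COMPLEMENT[probe_base]:
        raise ValueError("bases are complementary; no mismatch is formed")
    same = (probe_base, sample_base)
    other = (COMPLEMENT[probe_base], COMPLEMENT[sample_base])
    return other if mismatch_rank(*other) < mismatch_rank(*same) else same

print(choose_strand("G", "A"))  # switch strands so that C is in the probe
print(choose_strand("T", "G"))  # switch strands: A-C instead of T-G
print(choose_strand("A", "G"))  # switch strands: T-C instead of A-G
```

Mismatches such as A-A and T-T are not covered by the guidelines, so the sketch leaves the strand unchanged in those cases.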

These rules should be employed in conjunction with standard probe design rules including the use of unique sequences and an absence of repetitive sequences and sequences with predicted secondary structures (11). Clearly, for genotyping microarrays, inclusion of isothermal probes perfectly complementary to each allele for a given SNP will ensure high-confidence genotyping. Using this approach, cross-hybridization of the two alleles will be minimized, enabling accurate determination of the proportion of each allele in the sample for applications such as bulk-segregant mapping and allele-specific expression. It is possible that using these design guidelines will also improve the accuracy of quantification of copy-number variation and gene expression, as cross-hybridization to off-target DNA should be greatly reduced.

Comprehensive mutation detection using microarrays enables the global analysis of large numbers of samples to study intraspecific variation (24), the products of evolution experiments (25), and genetic selections (26). Dense SNP genotyping has enabled high-resolution global studies of recombination (27) and allele-specific expression (7). Although it is conventional to believe that high-throughput sequencing will overtake DNA microarrays for all applications (28, 29), we believe that optimized microarrays designed following our guidelines will find many applications. One reason may be cost, but others include the possibility of accurate determination of allele frequencies in mixed-DNA samples for applications such as bulk-segregant mapping (8) and allele-specific expression. These methods and others that require quantitative allele-specific information should be greatly enhanced by the optimized design parameters identified in this study.

Methods

Microarray Design and Manufacture.

Probes for DNA microarrays were designed complementary to genomic loci containing the 24,549 SNPs that differ between the S288c (reference) and RM11-1a (nonreference) genomes and are spaced at least 25 nucleotides apart. Probes were tiled across each SNP with their position relative to the SNP systematically varied. For each SNP, two additional probes were designed to flank, but not cover, the SNP. To design isothermal microarrays, we used custom scripts to calculate Tms using NN parameters (17). Stilts of either 6 or 10 monomeric dT were added to each probe.

Hybridization Conditions.

Genomic DNA was fragmented using a sonicator and labeled with Cy3 or Cy5 using random primed Klenow enzyme labeling at 24 °C, resulting in labeled fragments of ≈100 bases. For initial experiments using test microarrays we used 2000 ng, but all subsequent isothermal array hybridizations were performed using 200 ng of each labeled DNA sample corresponding to ≈50 amoles of DNA. Each microarray feature is about 60 amoles. As there are approximately five features competing for each target, the molar ratio of probe:target is ≈3:1. Samples were cohybridized in Agilent Hi-rpm 2× hybridization buffer, with a final concentration of 750 mM Li+. Microarrays were hybridized at the specified temperature for 16 h. Arrays were washed with a low-stringency buffer followed by a high-stringency buffer and finally by immersion in acetonitrile. Microarrays were scanned using an Agilent DNA microarray scanner at 5 μm pixel size using the XDR setting.

Probe Melting Temperature Calculations.

Tm was calculated using the relationship Tm = ΔH° × 1000/(ΔS° + R × ln(CT/x)) − 273.15. For enthalpic calculations, we used the NN parameters of ref. 18 and then computed Tm using R = 1.9872 cal/K mol, x = 4, and a strand concentration of 0.6 × 10⁻¹² M.

Data Processing.

Microarrays were normalized using the set of ~48,000 probes that targeted identical sequences in the reference and nonreference genomes. A linear-lowess normalization method implemented in the Agilent Feature Extractor software was used.

SNPScanner Algorithm.

The SNPScanner algorithm was implemented in R. For each probe length, we modeled the log2 ratio for a SNP at each site in each probe as

[equation image pnas.0913883107i1.jpg]

The coefficients are the position of the mismatch in the probe (α), the %GC of the probe, and the identity of the base in the probe (A, C, T, or G). These parameters differ from those used in our original implementation of the SNPScanner algorithm, in which we included the triplet sequence at each site and the intensity measure at the corresponding mismatched probe on the Affymetrix tiling microarray (4). We partitioned the data according to probe length and applied this same model for each subset of data. We did not include interaction terms.

Acknowledgments

Research was supported by an NIH research grant (GM046406) and the NIGMS Center for Quantitative Biology (GM071508). All microarray data have been deposited in GEO under series number GSE19319.

Footnotes

Conflict of interest statement: B.C., D.B.G. and L.B. are, as indicated, Agilent employees. They performed research as part of this employment.

Database deposition: The sequence reported in this paper has been deposited in the Gene Expression Omnibus (GEO) at NCBI under series number GSE19319.

This article contains supporting information online at www.pnas.org/cgi/content/full/0913883107/DCSupplemental.

References

1. Gresham D, Dunham MJ, Botstein D. Comparing whole genomes using DNA microarrays. Nat Rev Genet. 2008;9:291–302.
2. Lipshutz RJ, et al. Using oligonucleotide probe arrays to access genetic diversity. Biotechniques. 1995;19:442–447.
3. Winzeler EA, et al. Direct allelic variation scanning of the yeast genome. Science. 1998;281:1194–1197.
4. Gresham D, et al. Genome-wide detection of polymorphisms at nucleotide resolution with a single DNA microarray. Science. 2006;311:1932–1936.
5. Albert TJ, et al. Mutation discovery in bacterial genomes: Metronidazole resistance in Helicobacter pylori. Nat Methods. 2005;2:951–953.
6. Wong CW, et al. Tracking the evolution of the SARS coronavirus using high-throughput, high-density resequencing arrays. Genome Res. 2004;14:398–405.
7. Gagneur J, et al. Genome-wide allele- and strand-specific expression profiling. Mol Syst Biol. 2009;5:274.
8. Borevitz JO, et al. Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003;13:513–523.
9. Segrè AV, Murray AW, Leu JY. High-resolution mutation mapping reveals parallel experimental evolution in yeast. PLoS Biol. 2006;4:e256.
10. Brauer MJ, Christianson CM, Pai DA, Dunham MJ. Mapping novel traits by array-assisted bulk segregant analysis in Saccharomyces cerevisiae. Genetics. 2006;173:1813–1816.
11. Hu G, Llinás M, Li J, Preiser PR, Bozdech Z. Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy. BMC Bioinformatics. 2007;8:350.
12. Xu Q, Schlabach MR, Hannon GJ, Elledge SJ. Design of 240,000 orthogonal 25mer DNA barcode probes. Proc Natl Acad Sci USA. 2009;106:2289–2294.
13. Alemayehu S, et al. Influence of buffer species on the thermodynamics of short DNA duplex melting: Sodium phosphate versus sodium cacodylate. J Phys Chem B. 2009;113:2578–2586.
14. Fish DJ, et al. DNA multiplex hybridization on microarrays and thermodynamic stability in solution: A direct comparison. Nucleic Acids Res. 2007;35:7197–7208.
15. Fish DJ, Horne MT, Searles RP, Brewood GP, Benight AS. Multiplex SNP discrimination. Biophys J. 2007;92:L89–L91.
16. Horne MT, Fish DJ, Benight AS. Statistical thermodynamics and kinetics of DNA multiplex hybridization reactions. Biophys J. 2006;91:4133–4153.
17. SantaLucia J, Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA. 1998;95:1460–1465.
18. SantaLucia J, Jr, Allawi HT, Seneviratne PA. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry. 1996;35:3555–3562.
19. SantaLucia J, Jr, Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004;33:415–440.
20. Ronald J, et al. Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res. 2005;15:284–291.
21. Herring CD, et al. Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat Genet. 2006;38:1406–1412.
22. Tan JC, et al. Optimizing comparative genomic hybridization probes for genotyping and SNP detection in Plasmodium falciparum. Genomics. 2009;93:543–550.
23. Herring CD, Palsson BØ. An evaluation of comparative genome sequencing (CGS) by comparing two previously-sequenced bacterial genomes. BMC Genomics. 2007;8:274.
24. Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature. 2009;458:342–345.
25. Gresham D, et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 2008;4:e1000303.
26. Ho CH, et al. A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat Biotechnol. 2009;27:369–377.
27. Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature. 2008;454:479–485.
28. Kahvejian A, Quackenbush J, Thompson JF. What would you do if you could sequence everything? Nat Biotechnol. 2008;26:1125–1133.
29. Shendure J. The beginning of the end for microarrays? Nat Methods. 2008;5:585–587.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences