Nucleic Acids Res. 2005; 33(17): e142.
Published online Sep 30, 2005. doi:  10.1093/nar/gni142
PMCID: PMC1240117

PPC: an algorithm for accurate estimation of SNP allele frequencies in small equimolar pools of DNA using data from high density microarrays

Abstract

Robust estimation of allele frequencies in pools of DNA has the potential to reduce genotyping costs and/or increase the number of individuals contributing to a study where hundreds of thousands of genetic markers need to be genotyped in very large population sample sets, such as genome wide association studies. In order to make accurate allele frequency estimations from pooled samples, a correction for unequal allele representation must be applied. We have developed the polynomial based probe specific correction (PPC), a novel correction algorithm for accurate estimation of allele frequencies in data from high-density microarrays. This algorithm was validated through comparison of allele frequencies from a set of 10 individually genotyped DNAs and frequencies estimated from pools of these 10 DNAs using GeneChip 10K Mapping Xba 131 arrays. Our results demonstrate that when using the PPC to correct for allelic biases the accuracy of the allele frequency estimates increases dramatically.

INTRODUCTION

Mapping the genetic basis underlying common multifactorial diseases such as cancer through whole genome association studies has attracted much attention in recent years. The discovery and characterization of millions of single nucleotide polymorphism (SNP) markers throughout the human genome (1) and the development of genotyping technology mean that genome wide association studies are becoming technically possible. Estimates hold that whole genome association studies using SNPs require genotyping of hundreds of thousands to a million markers in large groups of cases and controls (2,3). The number of cases and controls needed varies depending on the disease of interest and the level of linkage disequilibrium in the study population, but to achieve adequate power, sample sizes will be counted in the hundreds or thousands even in the most favorable scenarios. While genotyping costs are decreasing, the high costs associated with genotyping large numbers of individuals for this large number of markers may prove to be rate limiting for this type of study (4,5). As a means of reducing the effort and costs involved, estimating allele frequencies in pools of equimolar amounts of DNA has been explored as an alternative to individual genotyping (6,7). This strategy has successfully been used in a number of candidate gene case-control studies using both SNPs (8,9) and microsatellites (10). Many different strategies for pooling have been suggested (11), but most researchers view pooling in combination with whole genome analysis as a first screening tool to identify markers of potential interest that can be chosen for subsequent individual genotyping (4,12,13). The simplest strategy is the two-pool design, where all cases are collected in the first pool and all controls are collected in a second pool, but other more elaborate pooling schemes have also been suggested. 
For instance, creating sets of sub-pools allows stratification, not only on the basis of the disease trait but also on secondary and tertiary traits as well. This might for instance capture effects of environmental factors that are known to affect the disease in question (14).

High density microarrays capable of parallel genotyping of tens to hundreds of thousands of SNPs currently provide the strongest candidate technology for large scale genome wide genotyping (15–17). Recently, four loci associated with mild mental impairment were identified in the first pool based genome wide screen using GeneChip 10K Mapping Xba 131 arrays (13). Although these microarrays are primarily designed for individual genotyping, we and others (9,18–21) have successfully explored the possibility of using the quantitative nature of the signal hybridization intensities from such microarrays for making allele frequency estimations in pooled DNA. To improve the accuracy of allele frequency estimates from pooled DNA, corrections have to be made to account for biases in allelic representation (22,23). This allelic representation bias is mainly caused by allele specific preferential amplification of the genomic DNA and/or differences in hybridization properties for the different probe sequences. The most common correction method is k-correction, which uses a correction factor k that is empirically derived from the signal intensity pattern of heterozygote individuals (24,25). k-correction was recently adapted for high-density microarray data, resulting in a substantial improvement in accuracy of the allele frequency estimates (20). Here, we describe a novel correction algorithm, which we call the polynomial based probe specific correction (PPC), and we show that it further increases the accuracy of the allele frequency estimates compared with previously described algorithms. PPC is based on a probe pair specific hybridization profile that was empirically derived from studying unique probe responses in a reference set of 26 GeneChip 10K Mapping Xba 131 arrays. The algorithm was subsequently utilized to estimate the allele frequencies in a pool of 10 individuals that had been genotyped previously using GeneChip 10K Mapping Xba 131 arrays.

MATERIALS AND METHODS

DNA samples

The DNA samples are fully described in L. M. FitzGerald, J. Stankovich, G. Price, J. Brohede, S. Quinn, R. Thomson, D. Challis, M. Challis, C. R. Wilkinson, J. Slavin, A. Banks, K. Hazelwood, D. Mackey, G. N. Hannan, T. Dwyer, J. L. Dickinson, D. Venter and J. D. McKay (manuscript submitted). Written informed consent was obtained from all participating individuals and ethics approval was obtained from the Southern Tasmanian Human Research Ethics Committee.

Genotyping

Genotyping of individual DNA samples and DNA pools was performed using the GeneChip 10K Mapping Xba 131 assay according to the GeneChip Mapping Assay Manual (Affymetrix), and all reagents were supplied by the manufacturer if not stated otherwise. Briefly, 250 ng of DNA was digested by Xba I (New England Biolabs) and Xba adaptors were subsequently ligated to the ends of all fragments using T4 DNA ligase (New England Biolabs). This was used as template in a PCR amplification using AmpliTaqGold (Applied Biosystems) and a single primer complementary to the adaptor sequence. PCR products were purified from excess primer and salts by QIAquick spin-columns (QIAGEN) and a 20 µg aliquot was fragmented using DNase I. An aliquot of the fragmented DNA was separated and visualized in a 2% agarose gel in 1× TBE buffer to ensure that the bulk of the product had been properly fragmented to a size <200 bp. The fragmented samples were end-labeled with biotin using terminal deoxynucleotidyl transferase before each sample was allowed to hybridize to a GeneChip 10K Mapping Xba 131 array for 16 h.

Following hybridization, the arrays were washed and stained using an Affymetrix Fluidics Station 450. The most stringent wash was 0.6× SSPE, 0.01% Tween-20 at 45°C, and the samples were stained with R-phycoerythrin (Molecular Probes). Imaging of the microarrays was performed using either a GeneArray (Agilent) or a GCS3000 (Affymetrix) high-resolution scanner. Genotype calls and probe intensity data were extracted with the GeneChip DNA Analysis Software (GDAS) (Affymetrix) using default parameters.

DNA pooling

Pools were constructed from equal amounts of DNA from 10 individuals that had been genotyped previously by the GeneChip 10K Mapping Xba 131 assay. To ensure that equimolar amounts of DNA were pooled, accurate quantifications were made using the PicoGreen assay (Molecular Probes) against a standard curve of λDNA.

To assess the variability in the pool construction, the DNA quantification, dilution, pooling and GeneChip assay steps were performed independently three times, creating pooled samples or ‘true replicas’: p10_rep1, p10_rep2 and p10_rep3. To capture the variation introduced by the GeneChip Mapping assay alone, technical replicas were made by independently amplifying and hybridizing the pooled DNAs of p10_rep1 two additional times (creating pooled samples or ‘technical replicas’: p10_rep1_tech_2 and p10_rep1_tech_3). For the purpose of evaluating the results, a measure of accuracy was defined as the absolute difference between the allele frequency deduced from the individual genotyping and the corresponding estimate made using pooled DNA, averaged over all available SNPs.
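The accuracy measure defined above is a mean absolute difference, so smaller values indicate better estimates. A minimal sketch in Python follows; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
from typing import Sequence


def mean_abs_difference(deduced: Sequence[float], estimated: Sequence[float]) -> float:
    """Average |deduced - estimated| over all SNPs (the accuracy measure)."""
    if len(deduced) != len(estimated) or len(deduced) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(d - e) for d, e in zip(deduced, estimated)) / len(deduced)


# Hypothetical frequencies for three SNPs (illustration only).
deduced = [0.50, 0.20, 0.85]      # from individual genotypes
estimated = [0.45, 0.30, 0.85]    # from a pooled array
accuracy = mean_abs_difference(deduced, estimated)
```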

Rationale for algorithm

The basis for the PPC algorithm is to correct the raw signal intensity data in a probe pair specific manner. The algorithm only utilizes the signal intensity values from the 20 probes that constitute perfect matches for every SNP that can be interrogated on a GeneChip 10K Mapping Xba 131 array. These 20 probes are divided into 10 probe pairs, where each pair consists of one probe that perfectly matches the A allele (PMA) and one probe that perfectly matches the B allele (PMB). The 10 pairs differ in which strand (sense versus antisense) is interrogated and in the position of the polymorphism relative to the centre of the 25mer that constitutes a probe (15). Considering the signal intensity from one probe pair j for a given SNP, let:

$$x_j = \frac{A_j}{A_j + B_j}, \tag{1}$$

where Aj and Bj are the observed signal intensity values for PMA and PMB, respectively. However, the hybridization affinities of any probe pair will be unique owing to sequence specific hybridization properties. Variation in the amplification efficiency between the two different alleles and background hybridization are two additional factors that must be taken into account to make an accurate allele frequency estimate. Together this suggests that all probe pairs have distinctly different hybridization profiles, which is our rationale for making a unique correction for each probe pair. In mathematical terms, the hybridization profile for any given probe pair could be best described by a second-degree polynomial. In order to obtain the unique hybridization profile for the majority of probe pairs on the GeneChip 10K Mapping Xba 131 array, we capitalized on a set of 26 reference microarrays that had been used for individual genotyping. For a second-degree polynomial to be derived for a particular probe pair, the minimum requirement was that at least one individual homozygous for the A allele (AA), one individual homozygous for the B allele (BB) and one heterozygote (AB) individual were present among the 26 reference microarrays. The true allele frequency (defined to be 1.0 for genotype AA, 0.5 for genotype AB and 0.0 for genotype BB) was plotted against the corresponding allele frequency estimates made using Equation 1. A script designed in R derived the coefficients for the second-degree polynomial that describes the relationship between the true allele frequency and the estimated allele frequency. In this way the second-degree polynomial coefficients for 80 660 probe pairs (10 for each of 8066 SNPs) were successfully obtained.
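The derivation of a probe pair specific polynomial can be illustrated with a short sketch. The original scripts were written in R and Perl and are not reproduced here; the Python code below is an independent illustration that assumes an ordinary least-squares fit, and the reference intensities are invented for the example.

```python
from typing import List, Sequence, Tuple

# True allele A frequency for each genotype, as defined in the text.
TRUE_FREQ = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def raw_ratio(a: float, b: float) -> float:
    """Equation 1: uncorrected allele A fraction for one probe pair."""
    return a / (a + b)


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m: List[List[float]] = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


# Hypothetical reference arrays for one probe pair: (PMA, PMB, genotype call).
reference = [(900, 100, "AA"), (880, 120, "AA"), (600, 400, "AB"),
             (620, 380, "AB"), (150, 850, "BB")]
xs = [raw_ratio(a, b) for a, b, _ in reference]
ys = [TRUE_FREQ[g] for _, _, g in reference]
beta0, beta1, beta2 = fit_quadratic(xs, ys)
```

The requirement for at least one AA, one AB and one BB individual corresponds to needing three distinct regions of the x axis; with fewer, a second-degree polynomial is not constrained.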

Allele frequency estimates in pools of DNA

To estimate the allele frequencies from the microarrays hybridized with pooled DNA, signal intensity values from all probes were extracted by the export function of GDAS 2.0 (Affymetrix). Using only the perfect match probes, the frequency of allele A was estimated for each probe pair individually by Equation 1 followed by correction by its unique second-degree polynomial:

$$f(A_j) = \beta_0 + \beta_1 x_j + \beta_2 x_j^2. \tag{2}$$

Then a median value of the 10 estimates corresponding to one particular SNP was used to represent the allele frequency for that SNP.
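The per-SNP estimate described above (Equation 1, then Equation 2, then the median over probe pairs) can be sketched as follows. This is an illustration in Python rather than the authors' script; the function name, intensities and coefficients are hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple

Coeffs = Tuple[float, float, float]  # (beta0, beta1, beta2) for one probe pair


def ppc_estimate(pairs: Sequence[Tuple[float, float]], coeffs: Sequence[Coeffs]) -> float:
    """Median of the polynomial-corrected allele A fractions for one SNP.

    pairs  -- (PMA, PMB) intensities, one tuple per probe pair
    coeffs -- the matching second-degree polynomial for each probe pair
    """
    if len(pairs) != len(coeffs) or len(pairs) == 0:
        raise ValueError("need one coefficient triple per probe pair")
    corrected = []
    for (a, b), (b0, b1, b2) in zip(pairs, coeffs):
        x = a / (a + b)                              # Equation 1
        corrected.append(b0 + b1 * x + b2 * x * x)   # Equation 2
    return median(corrected)


# Hypothetical example with three probe pairs (the array uses 10 per SNP).
pairs = [(700.0, 300.0), (650.0, 350.0), (800.0, 200.0)]
coeffs = [(-0.05, 1.1, 0.0), (0.0, 1.0, 0.0), (-0.1, 1.0, 0.1)]
estimate = ppc_estimate(pairs, coeffs)
```

Using the median rather than the mean limits the influence of a single poorly behaved probe pair on the SNP-level estimate.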

RESULTS

Validation of PPC

Equimolar amounts of DNA from 10 individuals were pooled before being assayed on a GeneChip 10K Mapping Xba 131 array according to the manufacturer's instructions. Signal intensity data were extracted and allele frequencies were estimated according to the algorithm described above. All measurements of DNA concentration, all pooling procedures and all GeneChip assays were replicated, independently, three times in order to study the variation introduced by the pooling procedure. To study the assay variation, technical replicas were made from one of the pools by repeating the GeneChip assays and hybridizations three times for that pool of DNA. In order to evaluate the accuracy of the allele frequency estimations from these pooling experiments, all 10 DNA samples were individually genotyped. This provided a more accurate measure of the alleles that made up the pools than using population frequencies. Using only SNPs for which genotype information was available for all 10 individuals, the expected allele frequencies in the pools could be deduced for 8180 SNPs. Accuracy was defined as the absolute difference between the allele frequency estimate and the allele frequency deduced from individual genotyping. An average accuracy was calculated for each replica and the results are summarized in Table 1. Computer scripts for deriving the second-degree polynomials and the subsequent allele frequency estimation were written in Perl (http://www.perl.org) or R (http://www.r-project.org) and are available at http://www.bioinformatics.csiro.au/publications.shtml.
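The expected pool frequency for a SNP follows directly from the individual genotype calls. The sketch below, in Python, assumes genotypes are coded as the strings "AA", "AB" and "BB" and that any other value is a missing call; both the coding and the function name are assumptions made for illustration.

```python
from typing import Optional, Sequence

VALID_CALLS = ("AA", "AB", "BB")


def deduced_frequency(genotypes: Sequence[str]) -> Optional[float]:
    """Allele A frequency in an equimolar pool of the given individuals.

    Returns None when any individual lacks a genotype call, mirroring the
    restriction to SNPs typed in every pooled individual.
    """
    if len(genotypes) == 0 or any(g not in VALID_CALLS for g in genotypes):
        return None
    a_alleles = sum(g.count("A") for g in genotypes)
    return a_alleles / (2 * len(genotypes))


# Hypothetical calls for one SNP in a pool of 10 individuals.
calls = ["AA", "AA", "AA", "AB", "AB", "AB", "AB", "BB", "BB", "BB"]
expected = deduced_frequency(calls)
```

With 10 diploid individuals the deduced frequency can only take values in steps of 0.05, which is the resolution against which the pooled estimates are judged.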

Table 1
Accuracy measures for allele frequency estimates using PPC

True replicas versus technical replicas

For the pooled sample GeneChip replicas, the average accuracy was 0.054 and ranged from 0.043 to 0.067 (Table 1), equivalent to or better than that reported by other published pooling studies (9,19,20). We also observed an average correlation (r²) of 0.904 for the true replicas and 0.959 for the technical replicas (Table 2). The range of accuracy for the three true replicas (0.048, 0.043 and 0.062) was no different than that for the technical replicas (0.048, 0.053 and 0.067). However, when creating a replica average, by averaging the individual SNP by SNP frequency estimates from each of the ‘true’ and ‘technical’ replicas before comparing it with the corresponding deduced allele frequency, the true replicas responded differently from the technical replicas. When averaging in this way the average accuracy improved to 0.029 for the true replicas and to 0.050 for the technical replicas. This improvement in accuracy for the true replicas was also reflected in the other parameters shown in Table 1. For instance, the percentages of estimates that differed by >0.1 for the technical replicas 1, 2 and 3 were 10.3, 14.2 and 22.4%, respectively, while the figure for the average of the technical replicas was 12.1%. Corresponding percentages for the true replicas were within the same range (10.3, 7.6 and 19.3% for replicas 1, 2 and 3, respectively) but showed an exceptional improvement for the replica average, that is 1.8%. This effect was probably largely owing to the canceling out of variation in the pooling procedure (for instance pipetting inaccuracies) for the true replicas, whereas the corresponding variation in the technical replicas was introduced only once and therefore cannot cancel out. Table 3 shows the accuracy in relation to the deduced allele frequency for the average of the true replicas.
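The replica averaging and the two summary statistics used in this comparison can be sketched as follows. The Python code is illustrative only; the function names and the frequencies are hypothetical and chosen so that replica errors point in different directions.

```python
from typing import List, Sequence


def average_replicas(replicas: Sequence[Sequence[float]]) -> List[float]:
    """SNP-by-SNP mean of the estimates from several replica arrays."""
    return [sum(values) / len(values) for values in zip(*replicas)]


def mean_abs_difference(deduced: Sequence[float], estimated: Sequence[float]) -> float:
    """Average absolute difference between deduced and estimated frequencies."""
    return sum(abs(d - e) for d, e in zip(deduced, estimated)) / len(deduced)


def fraction_beyond(deduced: Sequence[float], estimated: Sequence[float],
                    threshold: float = 0.1) -> float:
    """Share of SNPs whose estimate differs from the deduced value by more than threshold."""
    misses = sum(1 for d, e in zip(deduced, estimated) if abs(d - e) > threshold)
    return misses / len(deduced)


# Hypothetical data: three SNPs, three replicas with errors of mixed sign.
deduced = [0.50, 0.20, 0.85]
replicas = [
    [0.58, 0.15, 0.80],
    [0.44, 0.27, 0.93],
    [0.51, 0.19, 0.84],
]
averaged = average_replicas(replicas)
per_replica = [mean_abs_difference(deduced, r) for r in replicas]
combined = mean_abs_difference(deduced, averaged)
```

When replica errors are independent and of mixed sign, as assumed in this toy data, the averaged estimate lies closer to the deduced value than most individual replicas; errors shared by all replicas would not cancel in this way.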

Table 2
r² values from allele frequency estimates in all pools of 10 individuals
Table 3
Average accuracy in different allele frequency intervals for the average replica sample p10average true replicas

Comparison with previously described algorithms

To further evaluate the algorithm described here, scripts in Perl and R were designed to estimate allele frequencies from the three true p10 replicas described above using previously described algorithms for identical (19,20) or very similar types of microarrays (9). All comparisons shown in Table 4 and Figure 1 are based on averaging the individual estimates of the three true p10 replicas as described above.

Figure 1
The figure shows the relationship between the allele frequency deduced from individual genotyping and the allele frequency estimated with (A) the PPC described here, (B) the algorithm described in (20), (C) the algorithm described in (19) and (D) the ...
Table 4
Comparisons of the accuracy using different algorithms

DISCUSSION

Estimating allele frequencies from pooled data, or allelotyping (13), has been suggested as an alternative to individual genotyping in large scale projects as a way to bring down the costs. Although pooling can be used for directly identifying disease associated markers, most scientists view pooling as a first screen in order to identify markers for a subsequent targeted genotyping on an individual level. To demonstrate the utility of PPC on the estimation of allele frequencies in DNA pools, our study used pools of 10 individuals with three true replicas, resulting in a one-third study wide saving compared with individual genotyping and only a modest budget benefit. Optimal pool size for allelotyping will ultimately depend on a cost/benefit analysis specific to the individual study design, allelotyping methodology and budget constraints. Pfeiffer et al. (26) explore pool sizes of 3–10, while Barratt et al. (27) advocate an optimal pool size of 50 individuals and Le Hellard et al. (28) argue for pool sizes of several hundred individuals. While there appear to be clear benefits in the pool sizes described here, the effect a larger pool size has on the average accuracy when allelotyping with the PPC remains to be tested. On the basis of our results, it appeared that multiple true replicas made a significant difference to the allele frequency estimation accuracy; therefore, when considering the cost benefits of a pooling approach, we suggest retaining adequate replicas for each pool regardless of size.

Differences in SNP hybridization characteristics make it difficult to make accurate allele frequency estimations in high-density microarray data. We have tackled this problem by empirically deriving a specific correction formula for each probe pair. The underlying mathematical relationship for the probe response has been accurately described with a second-degree polynomial. During the course of this project, logistic regression and robust regression using Huber's M estimator (29) were also assessed as allelotyping algorithms, and although they also performed fairly well in preliminary studies they were always outperformed by the second-degree polynomial, which is why they were not explored further (data not shown).

The PPC is an ideal correction formula for standardized genotyping platforms like the Affymetrix GeneChip Mapping array system, where large sets of reference microarrays can easily be gathered for the purpose of deriving the probe specific hybridization profiles. In addition to standardized microarrays, the Affymetrix GeneChip platform employs standardized hybridization, washing and scanning equipment, which further works to keep the hybridization profile constant between experiments. Our results show that this technology platform has a very high level of consistency, as reflected in the high r² values for the technical replicas (Table 2). Moreover, our results from averaging of replicas highlight the importance of making true replicas rather than technical replicas, where only the amplification to hybridization steps are replicated. In technical replicas all biases would be re-amplified and would show up on all replica microarrays, while random variation introduced in the pooling procedure would work in different directions and therefore improve the accuracy measure for true replicas. Under the conditions described here the accuracy improved from an average of 0.050 for the technical replicas to 0.029 for the true replicas. While this level of accuracy is comparable with most other technologies that have been used in conjunction with pooled DNA (Table 5), the mass-parallelism of high-density microarrays distinguishes them as an ideal tool for genome wide genetic scans.

Table 5
Previously described accuracy measures of SNP estimates in pooled DNA using non-array based technologies

When comparing the results from the previously published algorithms, it is important to note that the algorithm described in (9) was not developed for the GeneChip 10K Mapping Xba 131 arrays, as were the other algorithms. The arrays used in that study were similar in the way they were manufactured and the probes were also 25mers, but one main difference is that there were 80 probes per SNP rather than the 40 present on the GeneChip. The 40 extra probes were an additional set of mismatch probes that would make the estimates of background signal more accurate for the chips used in (9) compared with the GeneChip 10K Mapping Xba 131 arrays used in this study. Since the performance of their algorithm was highly dependent on the subtraction of background, it is probable that the estimates presented here (using the less comprehensive background estimate from our GeneChip 10K Mapping Xba 131 array data) would have been more accurate if we had used microarrays similar in design to those used in (9). Furthermore, they used their algorithm to estimate differences in allele frequency between cases and controls rather than for accurate allelotyping. While there are differences between the microarrays used in this article and the ones used in (9), we believe that they are similar enough to make an interesting comparison with the other algorithms in Table 4.

The algorithm described in (19) was based on the relative allele signal (RAS) value calculated by genotype scoring using the Modified Partitioning Around Medoids (MPAM) algorithm implemented in the GDAS software. These authors stress the difference between cases and controls as the main outcome rather than accurate estimates for each DNA pool. Simpson et al. (20) developed this algorithm further and implemented k-correction. The k-correction uses the pattern of heterozygotes to correct the average RAS values for allelic biases. Their paper also showed the need to correct for allelic biases, particularly for rare alleles. Meaburn et al. (21) reported that the accuracy improved from 0.077 to 0.036 when a k-correction was applied to GeneChip 10K Mapping Xba 131 data in a subset of 104 SNPs in a pool consisting of DNAs from 100 individuals. This was in fairly good agreement with the corresponding figures in this article (0.070 improved to 0.053 with k-correction). When applying the probe specific correction equations based on second-degree polynomials described in this paper the accuracy further improved to 0.029. Moreover, allelotyping using PPC occasionally resulted in estimates outside the 0–1 range, which is in contrast to the algorithms that were based on the RAS values. When examining the results from p10average true replicas the extent of this was very low, with only 16 markers (0.28%) having estimates outside the 0–1 range, further supporting the high accuracy of PPC.
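For comparison, a k-correction applied to RAS values can be sketched as below. The formula shown is a commonly used form of k-correction (k taken as the mean A to B signal ratio in heterozygotes); it is an assumption made here for illustration, and the exact implementation in (20) may differ. Function names and values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_k(heterozygote_ras: Sequence[float]) -> float:
    """Mean A:B signal ratio observed in heterozygous individuals.

    A RAS value r = A / (A + B) corresponds to the ratio A / B = r / (1 - r).
    An unbiased probe set would give k = 1.
    """
    return mean(r / (1.0 - r) for r in heterozygote_ras)


def k_correct(pool_ras: float, k: float) -> float:
    """Allele A frequency after scaling the B signal by k: A / (A + k * B)."""
    return pool_ras / (pool_ras + k * (1.0 - pool_ras))


# Hypothetical SNP whose heterozygotes read 0.6 instead of the ideal 0.5.
k = estimate_k([0.58, 0.60, 0.62])
corrected = k_correct(0.60, k)
```

For any positive k this form maps RAS values in the 0–1 range to estimates in the 0–1 range, which is consistent with the observation that only the polynomial correction produced out-of-range estimates.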

While the algorithms described by Butcher et al. (19) and Hinds et al. (9) have an advantage in that no prior knowledge about the probe response is required, the lower levels of accuracy might limit their usefulness. In contrast, both the PPC and the k-correction produce highly accurate estimates but depend on prior knowledge of the behavior of the probe specific hybridization profile, which limits the number of markers available for allelotyping. In particular the PPC algorithm was affected by this, since a reliable second-degree polynomial could only be derived after assessing at least one individual homozygous for the reference allele, one heterozygote and one homozygote for the alternative allele. This could easily be overcome by increasing the size of the databases from which the probe specific hybridization profiles are derived. This has for instance been addressed by others (20), who have designed a public database where anyone can deposit GeneChip data to constantly improve and access the k-correction coefficients.

A GeneChip 10K Mapping Xba 131 array suffices for performing linkage analysis in a family, but to perform association studies in a complex population we will need 100 000, or maybe even 500 000 to a million, SNPs (3). Even given that the cost of genotyping is coming down to <1 cent per SNP, an association study involving hundreds of cases and controls will be very expensive. It will be a great advantage to have new algorithms developed to reduce cost while not affecting the power to locate disease genes in study populations, particularly with the recent release of the Affymetrix 500K SNP arrays.

Acknowledgments

The authors thank Diana Brookes and Glenn Brown for technical assistance. We also wish to thank the Tasmanian Cancer council and Royal Hobart Hospital for their financial assistance to enable sample collection. Funding to pay the Open Access publication charges for this article was provided by CSIRO.

Conflict of interest statement. None declared.

REFERENCES

1. Collins F.S., Guyer M.S., Chakravarti A. Variations on a theme: cataloging human DNA sequence variation. Science. 1997;278:1580–1581. [PubMed]
2. Risch N., Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. [PubMed]
3. Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genet. 1999;22:139–144. [PubMed]
4. Risch N., Teng J. The relative power of family-based and case–control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. Genome Res. 1998;8:1273–1288. [PubMed]
5. Pharoah P.D., Dunning A.M., Ponder B.A., Easton D.F. Association studies for finding cancer-susceptibility genetic variants. Nature Rev. Cancer. 2004;4:850–860. [PubMed]
6. Arnheim N., Strange C., Erlich H. Use of pooled DNA samples to detect linkage disequilibrium of polymorphic restriction fragments and human disease: studies of the HLA class II loci. Proc. Natl Acad. Sci. USA. 1985;82:6970–6974. [PMC free article] [PubMed]
7. Michelmore R.W., Paran I., Kesseli R.V. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc. Natl Acad. Sci. USA. 1991;88:9828–9832. [PMC free article] [PubMed]
8. Butcher L.M., Meaburn E., Dale P.S., Sham P., Schalkwyk L.C., Craig I.W., Plomin R. Association analysis of mild mental impairment using DNA pooling to screen 432 brain-expressed single-nucleotide polymorphisms. Mol. Psychiatry. 2005;10:384–392. [PubMed]
9. Hinds D.A., Seymour A.B., Durham L.K., Banerjee P., Ballinger D.G., Milos P.M., Cox D.R., Thompson J.F., Frazer K.A. Application of pooled genotyping to scan candidate regions for association with HDL cholesterol levels. Hum. Genomics. 2004;1:421–434. [PMC free article] [PubMed]
10. Kirov G., Williams N., Sham P., Craddock N., Owen M.J. Pooled genotyping of microsatellite markers in parent-offspring trios. Genome Res. 2000;10:105–115. [PMC free article] [PubMed]
11. Sham P., Bader J.S., Craig I., O'Donovan M., Owen M. DNA Pooling: a tool for large-scale association studies. Nature Rev. Genet. 2002;3:862–871. [PubMed]
12. König I.R., Ziegler A. Analysis of SNPs in pooled DNA: a decision theoretic model. Genet. Epidemiol. 2004;26:31–43. [PubMed]
13. Butcher L.M., Meaburn E., Knight J., Sham P.C., Schalkwyk L.C., Craig I.W., Plomin R. SNPs, microarrays and pooled DNA: identification of four loci associated with mild mental impairment in a sample of 6000 children. Hum. Mol. Genet. 2005;14:1315–1325. [PubMed]
14. Law G.R., Rollinson S., Feltbower R., Allan J.M., Morgan G.J., Roman E. Application of DNA pooling to large studies of disease. Stat. Med. 2004;23:3841–3850. [PubMed]
15. Matsuzaki H., Loi H., Dong S., Tsai Y.Y., Fang J., Law J., Di X., Liu W.M., Yang G., Liu G., et al. Parallel genotyping of over 10 000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res. 2004;14:414–425. [PMC free article] [PubMed]
16. Kennedy G.C., Matsuzaki H., Dong S., Liu W.M., Huang J., Liu G., Su X., Cao M., Chen W., Zhang J., et al. Large-scale genotyping of complex DNA. Nat. Biotechnol. 2003;21:1233–1237. [PubMed]
17. Di X., Matsuzaki H., Webster T.A., Hubbell E., Liu G., Dong S., Bartell D., Huang J., Chiles R., Yang G., et al. Dynamic model based algorithms for screening and genotyping over 100K SNPs on oligonucleotide microarrays. Bioinformatics. 2005;21:1958–1963. [PubMed]
18. Uhl G.R., Liu Q.R., Walther D., Hess J., Naiman D. Polysubstance abuse-vulnerability genes: genome scans for association, using 1004 subjects and 1494 single-nucleotide polymorphisms. Am. J. Hum. Genet. 2001;69:1290–1300. [PMC free article] [PubMed]
19. Butcher L.M., Meaburn E., Liu L., Fernandes C., Hill L., Al-Chalabi A., Plomin R., Schalkwyk L., Craig I.W. Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav. Genet. 2004;34:549–555. [PubMed]
20. Simpson C.L., Knight J., Butcher L.M., Hansen V.K., Meaburn E., Schalkwyk L.C., Craig I.W., Powell J.F., Sham P.C., Al-Chalabi A. A central resource for accurate allele frequency estimation from pooled DNA genotyped on DNA microarrays. Nucleic Acids Res. 2005;33:e25. [PMC free article] [PubMed]
21. Meaburn E., Butcher L.M., Liu L., Fernandes C., Hansen V., Al-Chalabi A., Plomin R., Craig I., Schalkwyk L.C. Genotyping DNA pools on microarrays: tackling the QTL problem of large samples and large numbers of SNPs. BMC Genomics. 2005;6:e52. [PMC free article] [PubMed]
22. Barcellos L.F., Klitz W., Field L.L., Tobias R., Bowcock A.M., Wilson R., Nelson M.P., Nagatomi J., Thomson G. Association mapping of disease loci, by use of a pooled DNA genomic screen. Am. J. Hum. Genet. 1997;61:734–747. [PMC free article] [PubMed]
23. Perlin M.W., Lancia G., Ng S.K. Toward fully automated genotyping: genotyping microsatellite markers by deconvolution. Am. J. Hum. Genet. 1995;57:1199–1210. [PMC free article] [PubMed]
24. Hoogendoorn B., Norton N., Kirov G., Williams N., Hamshere M.L., Spurlock G., Austin J., Stephens M.K., Buckland P.R., Owen M.J., et al. Cheap, accurate and rapid allele frequency estimation of single nucleotide polymorphisms by primer extension and DHPLC in DNA pools. Hum. Genet. 2000;107:488–493. [PubMed]
25. Moskvina V., Norton N., Williams N., Holmans P., Owen M., O'Donovan M. Streamlined analysis of pooled genotype data in SNP-based association studies. Genet. Epidemiol. 2005;28:273–282. [PubMed]
26. Pfeiffer R.M., Rutter J.L., Gail M.H., Struewing J., Gastwirth J.L. Efficiency of DNA pooling to estimate joint allele frequencies and measure linkage disequilibrium. Genet. Epidemiol. 2002;22:94–102. [PubMed]
27. Barratt B.J., Payne F., Rance H.E., Nutland S., Todd J.A., Clayton D.G. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann. Hum. Genet. 2002;66:393–405. [PubMed]
28. Le Hellard S., Ballereau S.J., Visscher P.M., Torrance H.S., Pinson J., Morris S.W., Thomson M.L., Semple C.A., Muir W.J., Blackwood D.H., et al. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res. 2002;30:e74. [PMC free article] [PubMed]
29. Huber P.J. Robust Statistics. Wiley, NY: 1981.
30. Germer S., Holland M.J., Higuchi R. High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res. 2000;10:258–266. [PMC free article] [PubMed]
31. Chen J., Germer S., Higuchi R., Berkowitz G., Godbold J., Wetmur J.G. Kinetic polymerase chain reaction on pooled DNA: a high-throughput, high-efficiency alternative in genetic epidemiological studies. Cancer Epidemiol. Biomarkers Prev. 2002;11:131–136. [PubMed]
32. Lavebratt C., Sengul S., Jansson M., Schalling M. Pyrosequencing-based SNP allele frequency estimation in DNA pools. Hum. Mutat. 2004;23:92–97. [PubMed]
33. Wasson J., Skolnick G., Love-Gregory L., Permutt M.A. Assessing allele frequencies of single nucleotide polymorphisms in DNA pools by pyrosequencing technology. Biotechniques. 2002;32:1144–1150. [PubMed]
34. Norton N., Williams N.M., Williams H.J., Spurlock G., Kirov G., Morris D.W., Hoogendoorn B., Owen M.J., O'Donovan M.C. Universal, robust, highly quantitative SNP allele frequency measurement in DNA pools. Hum. Genet. 2002;110:471–478. [PubMed]
35. Giordano M., Mellai M., Hoogendoorn B., Momigliano-Richiardi P. Determination of SNP allele frequencies in pooled DNAs by primer extension genotyping and denaturing high-performance liquid chromatography. J. Biochem. Biophys. Methods. 2001;47:101–110. [PubMed]
36. Downes K., Barratt B.J., Akan P., Bumpstead S.J., Taylor S.D., Clayton D.G., Deloukas P. SNP allele frequency estimation in DNA pools and variance components analysis. Biotechniques. 2004;36:840–845. [PubMed]
37. Sasaki T., Tahira T., Suzuki A., Higasa K., Kukita Y., Baba S., Hayashi K. Precise estimation of allele frequencies of single-nucleotide polymorphisms by a quantitative SSCP analysis of pooled DNA. Am. J. Hum. Genet. 2001;68:214–218. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press