BMC Genet. 2008; 9: 36. PMCID: PMC2387159

# PGA: power calculator for case-control genetic association analyses

^{1}Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, MD, USA

^{2}Department of Mathematics and Statistics, Concordia University, Montréal, Québec, Canada

Corresponding author.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## Abstract

### Background

Statistical power calculations inform the design and interpretation of genetic association studies, but few programs are tailored to case-control studies of single nucleotide polymorphisms (SNPs) in unrelated subjects.

### Results

We have developed the "Power for Genetic Association analyses" (PGA) package, which comprises algorithms and graphical user interfaces for sample size and minimum detectable risk calculations using SNP or haplotype effects under different genetic models and study constraints. The software accounts for linkage disequilibrium and statistical multiple comparisons. The results are presented in graphs or tables and can be printed or exported in standard file formats.

### Conclusion

PGA is user-friendly software that can facilitate decision making for association studies of candidate genes, fine-mapping studies, and whole-genome scans. Stand-alone executable files and a Matlab toolbox are available for download at: http://dceg.cancer.gov/bb/tools/pga

## Background

Case-control genetic association studies are increasingly being used to study the genetic basis of human complex traits [1-3]. Statistical power analyses constitute a key step in the design of these studies. Power calculations elucidate the actual sample size needed to find a true genotype-phenotype correlation under the study constraints [4]. Indeed, most grant applications for genetic association studies require a power analysis section to justify the research proposal. Alternatively, power analysis can be used to explore possible reasons for equivocal or negative results. Thus, it is an indispensable procedure both for *a priori* and *a posteriori* analyses in genetic association studies.

The principles of power calculation can be found in standard statistical textbooks. Moreover, the scientific literature describes the mathematics of power analyses for a variety of specialized experimental designs [4-6]. Yet, there is limited computer software to assist scientists in this task [7]. Many commonly used computational tools for genetic studies are oriented towards family-based studies [8-11], and only a few have been developed to handle power calculations for case-control studies of single nucleotide polymorphisms (SNPs) in unrelated subjects [12-14]. Since the latter approach is increasingly used, we have developed algorithms and graphical user interfaces (GUIs) to calculate the sample size and the minimum detectable relative risk in genetic case-control studies for dominant, co-dominant, and recessive models of SNPs and SNP haplotypes.

## Implementation

The "Power for Genetic Association Analyses" (PGA) package was developed in Matlab and consists of a toolbox of command line functions and three unifying graphical user interfaces (GUIs). Users with Matlab can run the three GUIs or the command line functions in the Matlab environment. Users without a Matlab license can download and install the compiled versions of the three GUIs, which run as stand-alone applications under the Windows XP or Vista operating systems.

The program assumes that SNPs are biallelic and in Hardy-Weinberg equilibrium. All statistical tests are two-sided. The GUIs called PGA1 and PGA2 can display up to 9 scenarios simultaneously. Hence, they can be used to identify a robust choice of sample size. The graphs produced by each GUI can be printed or exported as TIF files, and tables of numerical results can be exported as HTML or csv files.
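The Hardy-Weinberg and biallelic assumptions determine the genotype frequencies that any power calculation starts from. The following is a minimal Python sketch, not PGA's Matlab code: the function names are hypothetical, and the coding of the genetic models (in particular a multiplicative per-allele effect for the co-dominant model) is an assumption made here for illustration.

```python
def hwe_genotype_freqs(q):
    """Population frequencies of genotypes with 0, 1, 2 risk alleles under HWE."""
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def genotype_relative_risks(model, rr):
    """Relative risks for 0, 1, 2 risk alleles (coding assumed for this sketch)."""
    if model == "dominant":
        return (1.0, rr, rr)
    if model == "recessive":
        return (1.0, 1.0, rr)
    if model == "co-dominant":
        return (1.0, rr, rr * rr)  # assumption: multiplicative per-allele effect
    raise ValueError("unknown model: " + model)


def case_control_genotype_freqs(q, rr, prevalence, model):
    """Genotype frequencies among cases and among controls."""
    pop = hwe_genotype_freqs(q)
    rrs = genotype_relative_risks(model, rr)
    # The population-average relative risk fixes the baseline penetrance.
    mean_rr = sum(g * r for g, r in zip(pop, rrs))
    penetrance = [prevalence * r / mean_rr for r in rrs]
    if max(penetrance) > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    cases = tuple(g * f / prevalence for g, f in zip(pop, penetrance))
    controls = tuple(
        g * (1.0 - f) / (1.0 - prevalence) for g, f in zip(pop, penetrance)
    )
    return cases, controls


if __name__ == "__main__":
    cases, controls = case_control_genotype_freqs(0.05, 2.0, 0.07, "co-dominant")
    print("cases:   ", [round(x, 4) for x in cases])
    print("controls:", [round(x, 4) for x in controls])
```

The difference between the case and control frequencies is the signal that an association test has to detect; the remaining inputs (sample size, significance level, LD) determine how reliably it is detected.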

## Results

The GUI called PGA1 provides a computational and graphical interface for the relation between statistical power and sample size for dominant, co-dominant, and recessive SNP or haplotype effects (Figure 1A). The genotyped markers can include the causative SNP or be in linkage disequilibrium (LD) with the causal SNP at a given level. Multiple hypothesis testing can be accounted for by adjusting the effective degrees of freedom (EDF) or the alpha level. For example, in a fine-mapping study of 200 effective tests (see below), the sample size required to detect an overall 2-fold increase in risk (assuming a co-dominant model with 1 df) with 90% power, a false positive rate of 5%, disease prevalence of 7%, disease allele frequency of 5%, and complete LD between the genotyped marker and the causative SNP (r^{2} = 1.0) is 800 cases and 800 controls (Figure 1A). PGA1 allows one to explore the impact of different parameters. For example, reducing the genotype relative risk from 2-fold to 1.7-fold in the same study increases the required sample size from 800 to 1400 cases and controls. PGA1 is also designed to execute power calculations for haplotype data. For example, using the same parameters as in the example above and assuming 12 common haplotypes in an LD block within the region, the required sample sizes are 600 and 1100 cases and controls to attain 90% power for relative risks of 2 and 1.7, respectively (Figure 1A).
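To make the relation between power and sample size concrete, the sketch below uses a textbook normal approximation for an allele-frequency comparison between equal numbers of cases and controls. It is an illustration under stated assumptions, not PGA's algorithm: a multiplicative co-dominant model, a Bonferroni-style division of alpha by the number of effective tests, and the common approximation that marker-to-causal-SNP LD scales the effective sample size by r^{2}. All function names are hypothetical, and the resulting numbers need not match those reported by PGA.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allele_freqs(q, rr, prevalence):
    """Risk-allele frequency in cases and controls (multiplicative model, HWE)."""
    mean_rr = (1.0 + q * (rr - 1.0)) ** 2  # population-average relative risk
    if prevalence * max(1.0, rr * rr) / mean_rr > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    p_cases = q * rr / (1.0 + q * (rr - 1.0))
    # The population frequency is a prevalence-weighted mix of cases and controls.
    p_controls = (q - prevalence * p_cases) / (1.0 - prevalence)
    return p_cases, p_controls


def power(n, q, rr, prevalence, alpha=0.05, n_tests=1, r2=1.0):
    """Approximate two-sided power with n cases and n controls."""
    p1, p0 = allele_freqs(q, rr, prevalence)
    n_alleles = 2.0 * n * r2  # assumption: LD shrinks effective size by r^2
    p_bar = 0.5 * (p1 + p0)
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar) / n_alleles)
    se_alt = sqrt((p1 * (1.0 - p1) + p0 * (1.0 - p0)) / n_alleles)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / n_tests / 2.0)
    return STD_NORMAL.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


def required_sample_size(target, q, rr, prevalence, **kwargs):
    """Smallest n (cases = controls) whose approximate power reaches target."""
    hi = 1
    while power(hi, q, rr, prevalence, **kwargs) < target:
        hi *= 2
    lo = hi // 2
    while hi - lo > 1:  # bisection: power is increasing in n
        mid = (lo + hi) // 2
        if power(mid, q, rr, prevalence, **kwargs) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for rr in (2.0, 1.7):
        n = required_sample_size(0.9, 0.05, rr, 0.07, n_tests=200)
        print("relative risk", rr, "-> cases and controls needed:", n)
```

As in the PGA1 example, lowering the relative risk or the r^{2} between marker and causal SNP raises the required sample size, which is why displaying several scenarios side by side helps in choosing a robust design.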

**Figure 1. Graphical user interfaces for statistical power calculations.** (A) PGA1 – statistical power is calculated and plotted for different sample sizes and various genetic and statistical parameters. Input variables (e.g. 'Genetic mode of inheritance',

**...**

The GUI PGA2 has a similar interface to PGA1, but it is designed to calculate and plot the minimum detectable relative risk (MDRR) for genetic loci, given a fixed number of cases and controls, according to their minor allele frequencies (MAFs). The MDRR is the smallest relative risk that can be detected, with the sample in hand, at the target level of power. Hence, PGA2 can assist in designing fine-mapping studies of prominent genomic loci identified from familial linkage analyses or genome-wide association studies. For example, multiple markers along a 600-kb segment on human chromosome 8q24 have recently been associated with prostate cancer susceptibility [15-17]. Consequently, one may want to genotype additional SNPs in this region, aiming to find the most strongly associated markers as a prelude to functional or comparative studies. Given a fixed sample size, there is a detection limit such that one is under-powered to detect true associations with SNPs whose MAF is below a certain threshold. Considerable resources can be saved by excluding SNPs with MAF below the detection threshold. For example, the PGA2 tool reveals that with a sample size of 500 cases and controls and an effective number of tests (effective degrees of freedom – EDF) of 500, there is no justification (power < 90%) to genotype SNPs with MAF < 0.08, assuming a modest relative risk of ~2-fold as implied by the preliminary studies [15-17] (Figure 1B).
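The MDRR can be found numerically by searching for the smallest relative risk whose power reaches the target at a given MAF. The sketch below is a simplified illustration rather than PGA2's implementation: it assumes a multiplicative per-allele effect, takes the control allele frequency to equal the population MAF (a rare-disease simplification), and uses a normal approximation for an allele-frequency comparison. The function names and the search bound `rr_max` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(rr, maf, n, alpha):
    """Approximate two-sided power with n cases and n controls."""
    p0 = maf  # assumption: controls resemble the general population
    p1 = maf * rr / (1.0 + maf * (rr - 1.0))  # risk-allele frequency in cases
    n_alleles = 2.0 * n
    p_bar = 0.5 * (p0 + p1)
    se_null = sqrt(2.0 * p_bar * (1.0 - p_bar) / n_alleles)
    se_alt = sqrt((p1 * (1.0 - p1) + p0 * (1.0 - p0)) / n_alleles)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


def mdrr(maf, n, alpha, target=0.9, rr_max=20.0, tol=1e-4):
    """Smallest relative risk in (1, rr_max] reaching the target power, or None."""
    if power(rr_max, maf, n, alpha) < target:
        return None  # not detectable within the search range
    lo, hi = 1.0, rr_max
    while hi - lo > tol:  # bisection: power increases with rr above 1
        mid = 0.5 * (lo + hi)
        if power(mid, maf, n, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n, effective_tests = 500, 500
    for maf in (0.02, 0.05, 0.08, 0.10, 0.20):
        value = mdrr(maf, n, 0.05 / effective_tests)
        print("MAF", maf, "-> MDRR about", round(value, 2))
```

Plotting such values against MAF gives the kind of curve PGA2 displays: SNPs whose MDRR exceeds the relative risk one can plausibly expect are candidates for exclusion from genotyping.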

An important utility for PGA1 and PGA2 is the GUI EDF, which calculates the effective degrees of freedom (EDF) for a particular set of SNP genotypes in linkage disequilibrium. This tool allows the user to assess the extent of multiple testing, which is often overestimated or underestimated in naive power analyses. The EDF calculator accepts as input genotype data files from HapMap [18] or tab-delimited text files. It calculates and maps the linkage disequilibrium patterns (r^{2}) among the SNPs in the dataset, and from these data computes a summary measure of the EDF [19] (Figure 2). The value of EDF can then be used in PGA1 and PGA2 to calibrate the calculations precisely to the specific SNPs under consideration in a given study. Other methods that account for linkage disequilibrium between genetic markers, as well as other approaches to multiple testing adjustment, can also be incorporated into the PGA calculations (see Additional file 1).
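One way to turn an r^{2} matrix into an effective number of tests is the eigenvalue-based formula of Nyholt [19], M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M SNP correlation matrix. Because the eigenvalues sum to M and their squares sum to the total of all pairwise r^{2} values, the formula reduces to M_eff = M + 1 - (Σ r^{2}_{ij})/M, so no eigen-decomposition is needed. The sketch below assumes this formula and assumes r^{2} is taken as the squared Pearson correlation of 0/1/2 genotype codes; whether the EDF GUI computes its summary in exactly this way is not stated here, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long genotype vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("monomorphic SNP: correlation is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def ld_r2_matrix(genotypes):
    """Pairwise r^2 for a list of SNPs, each a list of 0/1/2 codes per subject."""
    m = len(genotypes)
    r2 = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            value = pearson_r(genotypes[i], genotypes[j]) ** 2
            r2[i][j] = r2[j][i] = value
    return r2


def effective_tests(r2):
    """Nyholt-style effective number of tests from a full r^2 matrix."""
    m = len(r2)
    return m + 1.0 - sum(sum(row) for row in r2) / m


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 1, 0, 2, 1, 0],
        [0, 1, 2, 1, 0, 2, 1, 1],  # nearly identical to the first SNP
        [2, 0, 1, 0, 2, 1, 0, 1],
    ]
    print("effective tests:", round(effective_tests(ld_r2_matrix(snps)), 3))
```

The two limiting cases behave as expected: fully independent SNPs give M effective tests, and SNPs in perfect LD with one another give a single effective test.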

**Figure 2. Effective degrees of freedom calculator.** (A) HapMap SNP genotype data from human chromosome 8q24 (chr8:128100000-128700000) are used as input. The calculated EDF for SNPs with MAF > 0.05 in this dataset is 608. (B) LD map for the selected SNPs

**...**

All the procedures included in the PGA GUIs are available in a single Matlab toolbox and can be executed at the Matlab command line. This allows Matlab users to call the incorporated functions from their own Matlab scripts. For example, calculating the EDF for 100 different regions with 80 SNPs each took ~176 sec on a Windows XP workstation with dual 3.19 GHz Intel Xeon processors.

## Discussion

The PGA package is well suited for power calculations where relatively small genomic regions are scanned for disease susceptibility loci. However, it can also be used to assess larger regions and even genome-wide association studies, via appropriate specification of the false positive rate, i.e. α/m, where m is the number of genotyped markers in the study. Like other popular software in this field [12-14], PGA incorporates basic power and sample size calculations for various genetic models and presents the results 'on the fly' in graphs and tables. In addition, it offers unique power analyses for haplotype data using the method of Chen et al. [20]. Another novel feature is the calculation of the minimum detectable relative risk over a range of marker allele frequencies, implemented in the PGA2 GUI. This tool may become extremely important in the current phase of genetic association studies, where a large number of disease-susceptibility genomic loci are being revealed by genome-wide association studies (GWAS) [21-23]. These regions are expected to be investigated further at higher resolution, using a denser set of markers, in efforts to identify the actual predisposing genetic variation for these diseases. In this realm, PGA2 would facilitate the design of these studies by assessing power at the lower allele frequency threshold under consideration. Finally, the assessment of effective degrees of freedom for a particular genomic region or set of SNPs, as implemented in the GUI EDF, provides power calculation for procedures such as the minP test [20] that are more powerful than the conservative Bonferroni procedure. The incorporation of other methods for multiple testing adjustments (e.g. false discovery rate [24]) in automatic power calculation tools is more complex and requires specification of parameters such as the number of associated versus null SNPs and the magnitude of any effects. These calculations might be useful, especially for genome-wide association studies, but they are currently not within the scope of PGA.
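The α/m specification mentioned above is simple to compute, and it shows how much stricter the per-test threshold becomes as the number of tests grows. The sketch below is only an illustration: the marker count of 500,000 is a hypothetical genome-wide panel size, 608 is the EDF reported in Figure 2, and the function names are made up for this example.

```python
from statistics import NormalDist


def per_test_alpha(alpha, m):
    """Per-test false positive rate for m tests (Bonferroni-style alpha / m)."""
    return alpha / m


def critical_z(alpha, m):
    """Two-sided critical value of a standard normal test statistic."""
    return NormalDist().inv_cdf(1.0 - per_test_alpha(alpha, m) / 2.0)


if __name__ == "__main__":
    scenarios = (
        ("single test", 1),
        ("EDF from Figure 2", 608),
        ("hypothetical genome-wide panel", 500000),
    )
    for label, m in scenarios:
        print(label, per_test_alpha(0.05, m), round(critical_z(0.05, m), 2))
```

Using an effective number of tests in place of the raw marker count gives a less stringent threshold when markers are correlated, which is the motivation for the EDF calculation.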

Other freely available software packages have features that are complementary to PGA (see Additional file 2). The novel features of PGA are especially relevant to studies of candidate genes and fine-mapping efforts.

## Conclusion

The PGA package assembles a broad spectrum of statistical power calculations for genetic association studies in a single Matlab toolbox and three stand-alone GUIs. The software offers user-friendly tools for advanced calculations of statistical power and sample size and presents the results 'on the fly' in graphs and tables. Hence, PGA may significantly facilitate decision making and interpretation of association studies of candidate genes, fine-mapping studies, and genome-wide scans.

## Availability and requirements

• **Project name**: Power for genetic association analyses (PGA).

• **Project home page**: http://dceg.cancer.gov/bb/tools/pga

• **Operating system(s)**: Windows XP & Vista.

• **Programming language**: Matlab.

• **Other requirements**: To run the stand-alone GUIs, users without Matlab licenses should first install the MATLAB Component Runtime (MCR), which is available on the PGA home page.

• **Any restrictions to use by non-academics**: None

• **Reviewers' access to the software**: Reviewers can download the software in a way that preserves their anonymity through the following links:

Readme file: http://dceg.cancer.gov/bb/tools/pga/readme

PGA.exe file: http://dceg.cancer.gov/PGA/pga.exe

MCRinstaller file: http://dceg.cancer.gov/PGA/MCRInstaller.exe

## Authors' contributions

IM drafted the manuscript and assisted in the design and implementation of the software. PSR conceived of the study, assisted in the design and implementation of the software and in drafting the manuscript. BEC developed the software and helped draft the manuscript.

## Supplementary Material

**Additional file 2:**

Table 1. Major features of four commonly used power software for case-control association studies.

^{(74K, pdf)}

## Acknowledgements

This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics.

## References

1. Cardon LR, Bell JI. Association study designs for complex diseases. Nature reviews. 2001;2:91–99. doi: 10.1038/35052543.
2. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048. doi: 10.1126/science.8091226.
3. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516.
4. Gordon D, Finch SJ. Factors affecting statistical power in the detection of genetic association. The Journal of clinical investigation. 2005;115:1408–1418. doi: 10.1172/JCI24756.
5. Lubin JH, Gail MH. On power and sample size for studying features of the relative odds of disease. American journal of epidemiology. 1990;131:552–566.
6. De La Vega FM, Gordon D, Su X, Scafe C, Isaac H, Gilbert DA, Spier EG. Power and sample size calculations for genetic case/control studies using gene-centric SNP maps: application to human chromosomes 6, 21, and 22 in three populations. Human heredity. 2005;60:43–60. doi: 10.1159/000087918.
7. Knight J. A survey of current software for genetic power calculations. Human genomics. 2004;1:225–227.
8. S.A.G.E. - Statistical Analysis for Genetic Epidemiology http://darwin.cwru.edu/sage/
9. Lange C, DeMeo D, Silverman EK, Weiss ST, Laird NM. PBAT: tools for family-based association studies. American journal of human genetics. 2004;74:367–369. doi: 10.1086/381563.
10. Ploughman LM, Boehnke M. Estimating the power of a proposed linkage study for a complex genetic trait. American journal of human genetics. 1989;44:543–551.
11. Weeks DE, Ott J, Lathrop GM. SLINK: a general simulation program for linkage analysis. American journal of human genetics. 1990;47:A204.
12. Purcell S, Cherny SS, Sham PC. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics (Oxford, England) 2003;19:149–150. doi: 10.1093/bioinformatics/19.1.149.
13. Gordon D, Haynes C, Blumenfeld J, Finch SJ. PAWE-3D: visualizing power for association with error in case-control genetic studies of complex traits. Bioinformatics (Oxford, England) 2005;21:3935–3937. doi: 10.1093/bioinformatics/bti643.
14. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature genetics. 2006;38:209–213. doi: 10.1038/ng1706.
15. Gudmundsson J, Sulem P, Manolescu A, Amundadottir LT, Gudbjartsson D, Helgason A, Rafnar T, Bergthorsson JT, Agnarsson BA, Baker A, Sigurdsson A, Benediktsdottir KR, Jakobsdottir M, Xu J, Blondal T, Kostic J, Sun J, Ghosh S, Stacey SN, Mouy M, Saemundsdottir J, Backman VM, Kristjansson K, Tres A, Partin AW, Albers-Akkers MT, Godino-Ivan Marcos J, Walsh PC, Swinkels DW, Navarrete S, Isaacs SD, Aben KK, Graif T, Cashy J, Ruiz-Echarri M, Wiley KE, Suarez BK, Witjes JA, Frigge M, Ober C, Jonsson E, Einarsson GV, Mayordomo JI, Kiemeney LA, Isaacs WB, Catalona WJ, Barkardottir RB, Gulcher JR, Thorsteinsdottir U, Kong A, Stefansson K. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nature genetics. 2007;39:631–637. doi: 10.1038/ng1999.
16. Haiman CA, Patterson N, Freedman ML, Myers SR, Pike MC, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald GJ, Greenway SC, Stram DO, Le Marchand L, Kolonel LN, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore AS, Cooney KA, John EM, Ingles SA, Altshuler D, Henderson BE, Reich D. Multiple regions within 8q24 independently affect risk for prostate cancer. Nature genetics. 2007;39:638–644. doi: 10.1038/ng2015.
17. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, Minichiello MJ, Fearnhead P, Yu K, Chatterjee N, Wang Z, Welch R, Staats BJ, Calle EE, Feigelson HS, Thun MJ, Rodriguez C, Albanes D, Virtamo J, Weinstein S, Schumacher FR, Giovannucci E, Willett WC, Cancel-Tassin G, Cussenot O, Valeri A, Andriole GL, Gelmann EP, Tucker M, Gerhard DS, Fraumeni JF, Jr., Hoover R, Hunter DJ, Chanock SJ, Thomas G. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nature genetics. 2007;39:645–649. doi: 10.1038/ng2022.
18. International HapMap Project http://www.hapmap.org/
19. Nyholt DR. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. American journal of human genetics. 2004;74:765–769. doi: 10.1086/383251.
20. Chen BE, Sakoda LC, Hsing AW, Rosenberg PS. Resampling-based multiple hypothesis testing procedures for genetic case-control association studies. Genetic epidemiology. 2006;30:495–507. doi: 10.1002/gepi.20162.
21. Witte JS. Multiple prostate cancer risk variants on 8q24. Nature genetics. 2007;39:579–580. doi: 10.1038/ng0507-579.
22. Manolio TA, Rodriguez LL, Brooks L, Abecasis G, Ballinger D, Daly M, Donnelly P, Faraone SV, Frazer K, Gabriel S, Gejman P, Guttmacher A, Harris EL, Insel T, Kelsoe JR, Lander E, McCowin N, Mailman MD, Nabel E, Ostell J, Pugh E, Sherry S, Sullivan PF, Thompson JF, Warram J, Wholley D, Milos PM, Collins FS. New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nature genetics. 2007;39:1045–1051. doi: 10.1038/ng2127.
23. Frayling TM. Genome-wide association studies provide new insights into type 2 diabetes aetiology. Nature reviews. 2007;8:657–662.
24. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995;57:289–300.
