Bioinformatics. 2010 Sep 1; 26(17): 2190–2191.
Published online 2010 Jul 8. doi:  10.1093/bioinformatics/btq340
PMCID: PMC2922887

METAL: fast and efficient meta-analysis of genomewide association scans

Abstract

Summary: METAL provides a computationally efficient tool for meta-analysis of genome-wide association scans, a commonly used approach for improving power in gene mapping studies of complex traits. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats.

Availability and implementation: METAL, including source code, documentation, examples, and executables, is available at http://www.sph.umich.edu/csg/abecasis/metal/

Contact: goncalo@umich.edu

1 INTRODUCTION

Meta-analysis is becoming an increasingly important tool in genome-wide association studies (GWAS) of complex genetic diseases and traits (de Bakker et al., 2008). Meta-analysis provides an efficient and practical strategy for detecting variants with modest effect sizes (Skol et al., 2007). We, and others, have used METAL for performing meta-analysis of GWAS to identify loci reproducibly associated with a variety of traits, such as type 2 diabetes (Scott et al., 2007; Zeggini et al., 2008), lipid levels (Kathiresan et al., 2008, 2009; Willer et al., 2008), BMI (Willer et al., 2009), blood pressure (Newton-Cheh et al., 2009) and fasting glucose levels (Prokopenko et al., 2009).

Meta-analysis of genome-wide association summary statistics, in contrast to direct analysis of pooled individual-level data, alleviates common concerns with privacy of study participants and avoids cumbersome integration of genotype and phenotypic data from different studies. Meta-analysis allows for custom analyses of individual studies to conveniently account for population substructure, the presence of related individuals, study-specific covariates and many other ascertainment-related issues. It has been shown that meta-analysis of summary statistics is as efficient (in terms of statistical power) as pooling individual-level data across studies, but much less cumbersome (Lin and Zeng, 2009). Since GWAS routinely examine evidence for association at millions of directly genotyped and imputed SNPs across dozens or even hundreds of individual studies, it is important to use a fast and flexible tool to perform meta-analysis.

2 METHODS

The basic principle of meta-analysis is to combine the evidence for association from individual studies, using appropriate weights. METAL implements two approaches. The first approach converts the direction of effect and P-value observed in each study into a signed Z-score, such that large negative Z-scores indicate a small P-value and an allele associated with lower disease risk or quantitative trait levels, whereas large positive Z-scores indicate a small P-value and an allele associated with higher disease risk or quantitative trait levels. Z-scores for each allele are combined across samples in a weighted sum, with weights proportional to the square root of the sample size for each study (Stouffer et al., 1949). In a study with unequal numbers of cases and controls, we recommend that the effective sample size be provided in the input file, where Neff = 4/(1/Ncases + 1/Nctrls). This approach is very flexible and allows results to be combined even when effect size estimates are not available or the β-coefficients and standard errors from individual studies are in different units. The second approach implemented in METAL weights the effect size estimates, or β-coefficients, by the inverse of their estimated variances (squared standard errors). This second approach requires effect size estimates and their standard errors to be in consistent units across studies. Asymptotically, the two approaches are equivalent when the trait distribution is identical across samples (such that standard errors are a predictable function of sample size). Key formulae for both approaches are in Table 1.
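The two weighting schemes described above can be illustrated with a short Python sketch. This is not METAL's source code: the function names are hypothetical, P-values are assumed to be two-sided, and the direction of effect is assumed to be coded as +1 or -1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def effective_sample_size(n_cases, n_controls):
    """Neff = 4 / (1/Ncases + 1/Nctrls), as recommended in the text."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def signed_z(p_value, direction):
    """Convert a two-sided P-value (0 < p <= 1) and a direction (+1/-1) to a signed Z-score."""
    z = abs(_STD_NORMAL.inv_cdf(p_value / 2.0))
    return math.copysign(z, direction)


def two_sided_p(z):
    """Two-sided P-value for a Z-score."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


def sample_size_meta(p_values, directions, sample_sizes):
    """Sample-size-weighted scheme: weights are sqrt(N) for each study."""
    weights = [math.sqrt(n) for n in sample_sizes]
    zs = [signed_z(p, d) for p, d in zip(p_values, directions)]
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(sum(w * w for w in weights))
    return z, two_sided_p(z)


def inverse_variance_meta(betas, std_errs):
    """Standard-error-based scheme: weights are 1/SE^2 for each study."""
    weights = [1.0 / se ** 2 for se in std_errs]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, two_sided_p(beta / se)
```

For example, `sample_size_meta([0.05, 0.05], [+1, +1], [100, 100])` combines two equally sized studies that agree in direction, while passing `[+1, -1]` makes the two Z-scores cancel.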

Table 1.
Formulae for meta-analysis
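
The body of Table 1 is not reproduced in this copy. As a stand-in, the standard forms of the two schemes described in Section 2 can be written as follows. The notation is an assumption made here for illustration (study index $i$, two-sided P-value $P_i$, direction of effect $\Delta_i$, sample size $N_i$, effect estimate $\beta_i$ with standard error $SE_i$) and may differ from the published table.

```latex
% Sample-size-based scheme
Z_i = \Phi^{-1}\!\left(1 - \tfrac{P_i}{2}\right)\,\operatorname{sign}(\Delta_i),
\qquad w_i = \sqrt{N_i},
\qquad Z = \frac{\sum_i w_i Z_i}{\sqrt{\sum_i w_i^{2}}},
\qquad P = 2\,\Phi\!\left(-\lvert Z \rvert\right)

% Inverse-variance-based scheme
w_i = \frac{1}{SE_i^{2}},
\qquad \beta = \frac{\sum_i w_i \beta_i}{\sum_i w_i},
\qquad SE = \sqrt{\frac{1}{\sum_i w_i}},
\qquad Z = \frac{\beta}{SE},
\qquad P = 2\,\Phi\!\left(-\lvert Z \rvert\right)
```

Here $\Phi$ denotes the standard normal cumulative distribution function.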

3 RESULTS

3.1 Implementation

In implementing our software for meta-analysis, a primary consideration was to facilitate identification and resolution of common problems in meta-analysis. A secondary consideration was the ability to specify custom headers and delimiters so as to combine input files with varying formats generated from a variety of statistical packages. METAL tries to resolve or flag common problems that result from an inconsistent choice of allele labels or genomic strand across studies, or the presence of invalid P-values or test statistics at a subset of markers (due to numerical errors). METAL allows data to be filtered according to quality control measures, and can handle very large data sets (typically totaling several GB in size) on workstations with no more than 2 GB of memory.
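
To make the allele-label and strand problem concrete, the following Python sketch aligns one study's effect estimate to a reference allele pair. It is an illustration under stated assumptions, not METAL's implementation: the function names are hypothetical, alleles are single nucleotides, and for A/T and C/G SNPs (where a strand flip looks identical to an allele swap) the labels are simply taken at face value.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(allele):
    """Return the complementary base on the opposite strand."""
    return COMPLEMENT[allele.upper()]


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose alleles are complements of each other."""
    return flip_strand(a1) == a2.upper()


def harmonize(ref_alleles, study_alleles, effect):
    """Align a study's effect to the reference (effect allele, other allele) pair.

    Returns the effect with its sign adjusted, or None when the allele
    labels cannot be reconciled and the marker should be flagged.
    """
    ref = tuple(a.upper() for a in ref_alleles)
    obs = tuple(a.upper() for a in study_alleles)
    if obs == ref:
        return effect                    # same alleles, same order
    if obs == ref[::-1]:
        return -effect                   # alleles swapped: reverse direction
    flipped = tuple(flip_strand(a) for a in obs)
    if flipped == ref:
        return effect                    # opposite strand, same order
    if flipped == ref[::-1]:
        return -effect                   # opposite strand and swapped
    return None                          # incompatible labels
```

For instance, a study reporting alleles C/T for a marker whose reference pair is A/G is on the opposite strand with the alleles swapped, so `harmonize(("A", "G"), ("C", "T"), 0.2)` reverses the sign of the effect.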

3.2 Usage

METAL has been used extensively by many groups since its initial release in January 2008. This field testing enabled not only thorough debugging but also improvements in error-detection methods. METAL can be run interactively or with a command script as input. Input files are processed one at a time and used to update intermediate statistics stored in memory. METAL implements Cochran's Q-test for heterogeneity (Cochran, 1954), and the corresponding statistics are calculated if requested by the user. METAL was designed for flexible formatting of input files, and allows users to customize labels for key columns, input field delimiters and other characteristics of each input file. Information on genomic strand is used if available; when it is unavailable, METAL automatically resolves strand mismatches for markers where strand is obvious (e.g. all SNPs except those with A/T and C/G alleles). METAL has an option to estimate a genomic control parameter (Devlin and Roeder, 1999) for each input file and apply an appropriate genomic control correction to input statistics prior to performing meta-analysis. To facilitate the detection of allele labels that may have been mis-specified by the user, which is critical for the correct determination of the direction of effect, METAL implements an option to output the mean, variance, minimum and maximum of the allele frequencies for each marker. METAL will track custom statistics, such as cumulative sample size, even when the standard error-weighted meta-analysis is performed. METAL can read gzipped files to allow for efficient use of disk space and optionally allows for subsets of markers to be analyzed. Full documentation of all options is available at http://www.sph.umich.edu/csg/abecasis/metal/.
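
Two of the statistics mentioned above, the genomic control parameter and Cochran's Q, can be sketched in a few lines of Python. The function names are hypothetical, and the choices made here (estimating lambda from the median chi-square statistic, leaving statistics unchanged when lambda is at most 1) are common conventions assumed for illustration rather than a description of METAL's internals.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(z_scores):
    """Estimate lambda as median observed chi-square over its expected median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def apply_genomic_control(z_scores, lam):
    """Deflate Z-scores by sqrt(lambda); no correction is applied when lambda <= 1."""
    if lam <= 1.0:
        return list(z_scores)
    scale = math.sqrt(lam)
    return [z / scale for z in z_scores]


def cochran_q(betas, std_errs):
    """Cochran's Q for one marker, with its degrees of freedom (studies - 1).

    Q is the inverse-variance-weighted sum of squared deviations of each
    study's effect from the pooled effect.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, len(betas) - 1
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with the returned degrees of freedom; the P-value calculation is omitted here to keep the sketch within the standard library.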

3.3 Performance

METAL was written in C++ and is freely available for download. METAL compiles and runs on most Unix and Linux systems, and on Windows and Mac workstations. We recently performed a meta-analysis of GWAS for BMI (Willer et al., 2009). The analysis included 15 studies, each with association statistics at 2.2–2.5 million SNPs (average file size 225 MB), for a total of 36 million association statistics and a set of input files totaling 3.4 GB. This analysis required <6 min computing time and 790 MB of memory on a 2.83 GHz Intel processor. Runtime scales linearly with the number of studies examined: a meta-analysis including 74 input files (each with >2.5 million SNPs) took 36 min and 1 GB of memory.
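
Runtime that grows linearly with the number of studies, together with modest memory use, is what one would expect from a design that reads each input file once and keeps only per-marker running sums in memory. The Python sketch below illustrates that idea for the standard error-weighted scheme. It is not METAL's C++ code: the function names, the column labels `MARKER`, `BETA` and `SE`, and the tab delimiter are all assumptions chosen for the example.

```python
import csv
import gzip
import math
from collections import defaultdict


def open_text(path):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def meta_analyze(paths, marker="MARKER", effect="BETA", stderr="SE", delimiter="\t"):
    """Process input files one at a time, updating per-marker running sums.

    Only two numbers are stored per marker (sum of w*beta and sum of w,
    with w = 1/SE^2), so memory depends on the number of markers rather
    than on the number of input files.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for path in paths:
        with open_text(path) as handle:
            for row in csv.DictReader(handle, delimiter=delimiter):
                se = float(row[stderr])
                if not se > 0.0:
                    continue  # skip invalid standard errors (zero, negative, NaN)
                weight = 1.0 / se ** 2
                acc = sums[row[marker]]
                acc[0] += weight * float(row[effect])
                acc[1] += weight
    # Pooled effect and standard error for every marker seen in any file.
    return {m: (swb / sw, math.sqrt(1.0 / sw)) for m, (swb, sw) in sums.items()}
```

Each row is touched exactly once, so the total work is proportional to the total number of association statistics across all input files.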

ACKNOWLEDGEMENTS

The authors thank Michael Boehnke, Hyun Min Kang and Anne Jackson for reviewing early versions of this article. We are also grateful to numerous collaborators in the GIANT Consortium, the Global Lipids Genetic Consortium and the DIAGRAM Consortium for testing METAL and providing many useful suggestions.

Funding: G.R.A. was supported in part by the National Human Genome Research Institute (HG0002651 and HG0005214) and the National Institute of Mental Health (MH084698). C.J.W. was supported by a Pathway to Independence Award from the National Heart, Lung and Blood Institute (K99HL094535). Y.L. was supported by the National Institute for Diabetes and Digestive and Kidney Diseases (DK078150-03, PI Mohlke) and the National Cancer Institute (CA082659-11S1, PI Lin).

Conflict of Interest: none declared.

REFERENCES

  • Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–129.
  • de Bakker PI, et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet. 2008;17:R122–R128.
  • Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004.
  • Kathiresan S, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat. Genet. 2008;40:189–197.
  • Kathiresan S, et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet. 2009;41:56–65.
  • Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet. Epidemiol. 2009;34:60–66.
  • Newton-Cheh C, et al. Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 2009;41:666–676.
  • Prokopenko I, et al. Variants in MTNR1B influence fasting glucose levels. Nat. Genet. 2009;41:77–81.
  • Scott LJ, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–1345.
  • Skol AD, et al. Optimal designs for two-stage genome-wide association studies. Genet. Epidemiol. 2007;31:776–788.
  • Stouffer SA, et al. Adjustment During Army Life. Princeton, NJ: Princeton University Press; 1949.
  • Willer CJ, et al. Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat. Genet. 2008;40:161–169.
  • Willer CJ, et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat. Genet. 2009;41:25–34.
  • Zeggini E, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat. Genet. 2008;40:638–645.

Articles from Bioinformatics are provided here courtesy of Oxford University Press