NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Nat Genet. Author manuscript; available in PMC Apr 1, 2013.
Published in final edited form as:
Published online Apr 25, 2010. doi: 10.1038/ng.572
PMCID: PMC3612983

Meta-analysis and imputation refines the association of 15q25 with smoking quantity

Jason Z. Liu,1 Federica Tozzi,2 Dawn M. Waterworth,3 Sreekumar G. Pillai,3 Pierandrea Muglia,2 Lefkos Middleton,4 Wade Berrettini,5 Christopher W. Knouff,6 Xin Yuan,3 Gérard Waeber,7,8 Peter Vollenweider,7,8 Martin Preisig,7,9 Nicholas J Wareham,10 Jing Hua Zhao,10 Ruth J.F. Loos,10 Inês Barroso,11 Kay-Tee Khaw,12 Scott Grundy,13 Philip Barter,14 Robert Mahley,15,16 Antero Kesaniemi,17,18 Ruth McPherson,19 John B. Vincent,20 John Strauss,20 James L. Kennedy,20 Anne Farmer,21 Peter McGuffin,21 Richard Day,22 Keith Matthews,22 Per Bakke,23 Amund Gulsvik,23 Susanne Lucae,24 Marcus Ising,24 Tanja Brueckl,24 Sonja Horstmann,24 H.-Erich Wichmann,25,26,27 Rajesh Rawal,25 Norbert Dahmen,28 Claudia Lamina,25,29 Ozren Polasek,30 Lina Zgaga,31 Jennifer Huffman,32 Susan Campbell,32 Jaspal Kooner,33 John C Chambers,34 Mary Susan Burnett,35 Joseph M. Devaney,35 Augusto D. Pichard,35 Kenneth M. Kent,35 Lowell Satler,35 Joseph M. Lindsay,35 Ron Waksman,35 Stephen Epstein,35 James F. Wilson,31 Sarah H. Wild,31 Harry Campbell,31 Veronique Vitart,32 Muredach P. Reilly,36,37 Mingyao Li,38 Liming Qu,38 Robert Wilensky,36 William Matthai,36 Hakon H. Hakonarson,39 Daniel J. Rader,36,37 Andre Franke,40 Michael Wittig,40 Arne Schäfer,40 Manuela Uda,41 Antonio Terracciano,42 Xiangjun Xiao,43 Fabio Busonero,41 Paul Scheet,43 David Schlessinger,42 David St Clair,44 Dan Rujescu,45 Gonçalo R. Abecasis,46 Hans Jörgen Grabe,47 Alexander Teumer,48 Henry Völzke,49 Astrid Petersmann,50 Ulrich John,51 Igor Rudan,52,31 Caroline Hayward,32 Alan F. Wright,32 Ivana Kolcic,30 Benjamin J Wright,53 John R Thompson,53 Anthony J. Balmforth,54 Alistair S. Hall,54 Nilesh J. Samani,55 Carl A. Anderson,11 Tariq Ahmad,56 Christopher G. Mathew,57 Miles Parkes,58 Jack Satsangi,59 Mark Caulfield,60 Patricia B. Munroe,60 Martin Farrall,61 Anna Dominiczak,62 Jane Worthington,63 Wendy Thomson,63 Steve Eyre,63 Anne Barton,63 The Wellcome Trust Case Control Consortium, Vincent Mooser,3 Clyde Francks,2,64 and Jonathan Marchini,1


Smoking is a leading global cause of disease and mortality1. We performed a genome-wide meta-analytic association study of smoking-related behavioral traits in a total sample of 41,150 individuals drawn from 20 disease, population, and control cohorts. Our analysis confirmed an effect on smoking quantity (SQ) at a locus on 15q25 (P=9.45e-19) that includes three genes encoding neuronal nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, CHRNB4). We then used data from the 1000 Genomes Project to impute genotypes across the region, which allowed analysis of virtually all common variants there and offered a five-fold increase in coverage over HapMap. This broadened the set of potentially causal single nucleotide polymorphisms (SNPs) to include a novel SNP, rs55853698, located within the promoter region of CHRNA5, which showed the most significant association. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.

Smoking behavior and nicotine dependence (ND) are multifactorial traits with substantial genetic influences2. There is an urgent need to better understand the molecular neurobiology of ND in order to design targeted, more effective therapies3. Recently, genome-wide association scans (GWAS) have established one locus for ND and smoking quantity (SQ) on chromosome 15q25 (refs. 4–8), which implicates a cluster of three genes encoding neuronal nicotinic acetylcholine receptor subunits: CHRNA5, CHRNA3 and CHRNB4. The locus is also associated with lung cancer7,9,10, peripheral arterial disease7, and chronic obstructive pulmonary disease and lung function11.

We initially performed a genome-wide meta-analysis of smoking-related traits in a total sample of 41,150 individuals of white European descent, drawn from multiple disease, population and control cohorts (Table 1, Supplementary Table 1, Online Methods). Because the cohorts were genotyped on a variety of genome-wide SNP arrays (Table 1, Supplementary Table 1), we first imputed genotypes in every dataset12 for all SNPs in HapMap release 22 (ref. 13).

Table 1
Summary information for the cohorts used in meta-analysis. Further details are given in Online Methods and Supplementary Table 1.

The main focus of our analysis was SQ in current or past smokers, treated as a semi-quantitative trait derived from self-reported cigarettes per day (CPD)7. We performed association analysis separately within each cohort under an additive model, including age, sex, disease case/control status where applicable, and other cohort-specific covariates (Supplementary Table 1). The meta-analysis was then carried out by combining study-specific β-estimates using a fixed-effects model14. In total, 15,574 subjects reported CPD values >0 and were used for meta-analysis of SQ (Table 1, Supplementary Table 1). We followed up our most promising association findings by comparing them with results from two concurrent GWAS meta-analyses of smoking: the ENGAGE study of 46,481 subjects15 and the TAG study of 74,035 subjects16. We also made our meta-analysis results available to the authors of those studies so that they could check their top findings for replication.

Our meta-analysis of SQ identified the CHRNA5/CHRNA3 locus on 15q25 as the single outstanding locus in the genome (Figure 1, Table 2, Supplementary Table 2), with a minimum P=9.45e-19 for rs1051730, a SNP commonly reported in previous studies (refs. 4–8), and very low P values for many other SNPs in the region (Supplementary Figure 1, Supplementary Table 2). All cohorts in the analysis contributed at least modestly to the 15q25 association (Supplementary Figure 1). Each copy of the ‘high-smoking’ A allele (34% frequency) had an effect on SQ of 0.079 (95% CI 0.070–0.088), which is in line with previous estimates7. Joint analysis of our total dataset together with TAG and ENGAGE yielded P=1.71E-66 for rs1051730 (Table 2).

Figure 1
Manhattan plot showing the significance of association of all SNPs in genome-wide SQ meta-analysis. SNPs are plotted on the x-axis according to their positions on each chromosome, against association with SQ on the y-axis (−log10 P-value). SNPs ...
Table 2
Summary information for selected SNPs at 15q25 from meta-analysis of association with the Smoking Quantity (SQ) phenotype. Our study is referred to as OX-GSK. Information for all SNPs spanning the 15q25 locus in our genomewide analysis is given in Supplementary ...

Multiple variants at the 15q25 locus have been suggested to underlie its effect, including a non-synonymous SNP in CHRNA5 together with variants that affect mRNA expression levels (refs. 17–19). We used our large sample, in combination with data from the 1000 Genomes Project (see URL list), to perform fine mapping and modeling of the 15q25 locus in relation to SQ. We reasoned that, with the near-complete information on common variants derived from 1000 Genomes, it might be possible to pinpoint a variant, or combination of variants, that explains all of the association signal at 15q25. We used 108 estimated CEU haplotypes from the April 2009 release of the 1000 Genomes Pilot 1 data. This release contained 2189 SNPs in our region of interest (see Online Methods), approximately a five-fold increase in coverage compared with the 437 SNPs in release 22 of the HapMap. By imputing genotypes for all SNPs across this locus from 1000 Genomes and repeating the meta-analysis, we found that the most significant association was with a novel, previously untested SNP that is not in the HapMap, located within the 5' untranslated region (UTR) of CHRNA5, which makes it a candidate for affecting mRNA transcription (rs55853698, P=1.31E-16; Figure 2). The P value for the commonly reported SNP rs1051730 in this analysis was similar but slightly higher (P=1.47E-15). (P values in our 1000 Genomes analysis are generally higher than in our HapMap-based analysis because not all cohorts were included in the 1000 Genomes imputation; see Online Methods.) SNP rs55853698 is a G/T substitution in which the G allele has a frequency ranging from 0.313 to 0.378 across cohorts.

Figure 2
Chromosome 15q25 signal plots. Top: Signal plot based on 1000 Genomes imputation and meta-analysis of SQ association. SNPs are plotted by their positions on the chromosome, against association with SQ (−log10 p-value) on the left Y-axis. The five ...

To investigate whether the association at 15q25 can be explained completely by rs55853698, we carried out tests of association for all SNPs spanning the CHRNA5/CHRNA3 locus conditional upon this SNP (Figure 2). Residual association was still detected at many SNPs in the region, with the most significant signal at rs6495308 (P=3.96E-05), located within an intron of CHRNA3 (Figure 2). In the unconditioned meta-analysis, rs6495308 has a marginal association of P=3.30E-10. Further conditioning on rs6495308, in addition to rs55853698, leaves no obvious signal of association in the region (Supplementary Figure 2), suggesting that these two SNPs together could be sufficient to explain this genetic effect.

Wang et al.18 suggested that the non-synonymous SNP rs16969968 in CHRNA5 is functional for ND risk (and lung cancer risk), but also that variants causing high expression of CHRNA5 mRNA, tagged by SNP rs588765, independently increase the risk for ND. The marginal P values of rs16969968 and rs588765 in our meta-analysis were P=1.64E-18 and P=1.74E-03, respectively. Conditional analysis on rs16969968 within our cohorts still left residual association within the region (Supplementary Figure 2), with the most significant signal again at rs6495308 (P=1.54E-05). Conditioning on both rs16969968 and rs588765, i.e. the combination proposed by Wang et al.18, leaves no obvious signal of association (Supplementary Figure 2). To further investigate which pair of SNPs best explains the association signal, we used the Bayesian Information Criterion (BIC) as a measure of model fit20. For the model of Wang et al.18, i.e. conditioning on both rs16969968 and rs588765, we obtained BIC = 22719.87 (posterior probability 0.15). For the model conditioning on the novel promoter SNP rs55853698 and on rs6495308, we obtained BIC = 22716.49 (posterior probability 0.85); this lower BIC indicates a better model fit.
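The posterior probabilities quoted for the two competing models can be related to their BIC values. The following is a minimal sketch, assuming equal prior probabilities for the two models and the standard approximation in which each model's posterior probability is proportional to exp(−BIC/2); the text does not state exactly how the reported probabilities were computed, and the helper name `bic_posterior_probs` is hypothetical.

```python
import math
from typing import List, Sequence


def bic_posterior_probs(bics: Sequence[float]) -> List[float]:
    """Approximate posterior probabilities of competing models from their BIC values.

    Assumes equal prior probability for every model, so that
    P(model m | data) is proportional to exp(-BIC_m / 2).
    """
    best = min(bics)
    # Subtract the smallest BIC before exponentiating: exp(-22716 / 2) would underflow to 0.
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


# BIC values reported in the text: Wang et al. model first, rs55853698 + rs6495308 second.
probs = bic_posterior_probs([22719.87, 22716.49])
print([round(p, 3) for p in probs])  # [0.156, 0.844]
```

With the rounded BIC values reported above, this approximation gives about 0.16 and 0.84, close to the reported 0.15 and 0.85; the small difference may reflect rounding of the published BIC values.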

Examination of the LD structure among the SNPs considered here shows that rs1051730, rs16969968 and rs55853698 are all close proxies of one another (all pairwise R² > 0.96). These variants tag, or cause, the principal risk for high SQ attributable to the 15q25 locus, but the high LD makes it difficult to assign causality. The ‘residual association’ SNPs rs588765 and rs6495308 are in low LD with each other (R² = 0.21), and both are only in modest LD with the principal SNPs (maximum R² = 0.47). It is therefore not clear that this locus can be completely understood in the way proposed by Wang et al.18. While the non-synonymous SNP in CHRNA5, rs16969968, may be important, we have identified a novel and potentially functional SNP in the 5' UTR of this gene that is a close LD proxy for the non-synonymous SNP but shows a slightly more significant association in our meta-analysis. Moreover, although rs588765 can explain much of the secondary or residual association at this locus, we find that a largely independent variant within CHRNA3, rs6495308, is the best tagger of the residually associated variation; it also contributes to a better-fitting two-SNP model and has a much stronger marginal association in unconditioned analysis (P=3.30E-10 for rs6495308 compared with P=1.74E-03 for rs588765).
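Pairwise LD values such as those quoted above are commonly summarised as r². The text does not say how they were computed, so the following is only a minimal sketch assuming the usual shortcut: the squared Pearson correlation between two SNPs' allele dosages (0, 1 or 2 copies, or expected counts for imputed SNPs) in the same individuals, which approximates the haplotype-based r² when phase is unknown. The function name `dosage_r2` is hypothetical.

```python
from typing import Sequence


def dosage_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two SNPs' allele dosages in the same individuals.

    Dosages are 0, 1 or 2 copies of an allele (or expected counts for imputed SNPs).
    With unphased genotypes this approximates the haplotype-based LD r^2.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length dosage vectors with at least 2 individuals")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return sxy * sxy / (sxx * syy)


print(dosage_r2([0, 1, 2, 1], [0, 1, 2, 1]))  # 1.0 (identical dosages)
print(dosage_r2([0, 2, 0, 2], [0, 0, 2, 2]))  # 0.0 (uncorrelated)
```

Squaring the correlation makes the measure independent of which allele is counted, so perfectly anti-correlated dosages also give r² = 1.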

Our analysis has, for the first time, surveyed virtually all of the common variants in the 15q25 region, and provides one of the first examples of how data from the 1000 Genomes Project can contribute new information to mapping and characterizing loci for complex traits. We recommend that further analysis of this locus should not focus exclusively on CHRNA5, nor specifically on the common non-synonymous SNP rs16969968. It is notoriously difficult to identify functional variation in the context of high LD across a region21. There are numerous ways in which variants can be functional, including regulatory changes affecting the expression of nearby or distant genes, epigenetic changes, splicing effects, alterations to microRNA binding sites, or non-coding RNAs21. It is also conceivable that association with common variants can arise through the effects of multiple rarer variants that happen to be relatively restricted to specific haplotype backgrounds.

The second strongest association with SQ in our genome-wide meta-analysis was at a locus on 8p21 that received modest support from the TAG and ENGAGE studies (Supplementary Table 2, Supplementary Figure 3; P=5.26E-07 for rs11782673). This locus does not reach genome-wide significance after correction for multiple testing, although it is noteworthy that it spans another neuronal nicotinic acetylcholine receptor subunit gene, CHRNA2.

In addition to our analysis of SQ, we also tested genome-wide for allelic differences between those who reported currently smoking or smoking in the past and those who said they had never been smokers (the EVER/NEVER phenotype; sample sizes in Table 1, Supplementary Table 1), in order to identify genetic effects on the establishment of a smoking habit. No locus achieved genome-wide significance, and none of the top 15 loci showed evidence for replication (Supplementary Table 2, Supplementary Figure 4). Likewise, no consistent results emerged when we tested for allelic differences between those who reported currently smoking and those who had smoked in the past but had stopped at the time of interview (Supplementary Table 2, Supplementary Figure 4). After adjustment for age, this comparison is a rough measure of smoking cessation.

Our study identified association at some loci which, while not reaching genome-wide significance in our own meta-analysis, supported findings from the concurrent TAG and ENGAGE studies15,16. These include novel loci on chromosomes 8 and 19 for SQ, 11 for EVER/NEVER, and 9 for Current/Non-Current15,16. These findings provide further novel insights into the biology of smoking behavior.


Study samples

Study collections and their basic characteristics are listed in Table 1 and Supplementary Table 1. Subjects used in our analysis were adults of white European descent. Summary descriptions of the collections are given below, together with primary citations that describe the collections fully. Data were used in accordance with the ethical permissions and consents relating to each collection.

GEMS22: The Genetic Epidemiology of Metabolic Syndrome (GEMS) study consists of dyslipidaemic cases (age 20–65 years) matched with normolipidaemic controls by sex and recruitment site, drawn from non-Mediterranean subjects of the Genetic Epidemiology of Metabolic Syndrome study (Finland, Switzerland, Canada, Australia, USA).

CoLaus23: The Cohorte Lausannoise (CoLaus) is a single-center, cross-sectional population-based study, including individuals aged 35 to 75 years randomly selected from the list of residents of the city of Lausanne (Switzerland).

GSK COPD11: This collection includes cases with chronic obstructive pulmonary disease diagnosed according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria, and unaffected controls recruited from Bergen, Norway.

GSK UPD24: This collection includes cases with recurrent major depression according to DSM-IV criteria and age- and gender-matched non-affected controls, recruited at the Max-Planck Institute of Psychiatry in Munich, Germany; patients were also recruited at two satellite recruiting hospitals (BKH Augsburg and Klinikum Ingolstadt) in the Munich area.

GSK Bipolar25: The Bipolar collection includes DSM-IV Bipolar cases and controls from subjects recruited at 3 study sites: the Institute of Psychiatry (IOP) in London, U.K.; the Centre for Addiction and Mental Health in Toronto, Canada; and the University of Dundee, U.K.

GSK Lolipop26: The London Life Sciences Prospective Population (LOLIPOP) study is a population-based study including Indian Asian and European white men and women recruited from the lists of 58 General Practitioners in West London.

GSK Medstar27: The MedStar cohort includes cases with acute coronary syndrome or chronic coronary artery disease from Washington DC, and unaffected controls.

PennCath27: The Penn-CATH cohort is a University of Pennsylvania Medical Center based angiographic study, from which cases with coronary artery disease (CAD) and controls with no evidence of CAD at the coronary angiography were derived.

EPIC28: The EPIC-Obesity cohort is a case-control cohort for obesity drawn from the EPIC-Norfolk cohort, which includes white European men and women aged 39–79 years recruited in Norfolk, UK.

KORA29: The Co-operative Health Research in the Region of Augsburg (KORA) study is an epidemiological survey of the general population living in the city of Augsburg, Southern Germany, and two adjacent counties.

WTCCC HT30: The WTCCC-HT collection comprises severely hypertensive probands ascertained from families with multiple affected members in the UK as part of the BRIGHT study.

WTCCC CAD, WTCCC CD, WTCCC RA30: These collections include patients with coronary artery disease, Crohn’s disease and rheumatoid arthritis from the Wellcome Trust Case Control Consortium study.

POPGEN study31: The Population Genetic Cohort (POPGEN) comprises cross-sectional epidemiological surveys of regional populations in Schleswig-Holstein, northern Germany.

SHIP Study32: The Study of Health in Pomerania (SHIP) is a longitudinal, population-based survey from West Pomerania, Germany. Data from the baseline cohort were used for this study.

VIS Study33: This study includes unselected Croatians, aged 18–93 years, recruited from the villages of Vis and Komiza on the Dalmatian island of Vis.

ORCADES Study34: The Orkney Complex Disease Study is a family-based, cross-sectional study that seeks to identify genetic factors influencing cardiovascular and other disease risk in the population isolate of the Orkney Isles in northern Scotland.

KORCULA Study35: The KORCULA study includes healthy volunteers aged 18 and over from the villages of Lumbarda, Žrnovo, and Račišće on the Island of Korcula, Croatia.

SardiNIA Study36: SardiNIA is a population-based longitudinal cohort study that includes related male and female individuals, aged 14 years and above, from a cluster of four towns in the Ogliastra province of Sardinia, Italy.

Genotyping, quality control and imputation

Supplementary Table 1 lists the various genotype platforms used for each cohort, genotype calling algorithms, SNP and sample quality control, and details of the imputation and association analysis software used. The quality control measures from previous analyses of each cohort were adopted for this study and are detailed in the table. We used NCBI Build 36 co-ordinates for SNP base-pair positions so that all the cohorts could be combined seamlessly.

We imputed all SNPs reported in the CEU sample in HapMap Phase II using various imputation algorithms12,37 (see the URL section for a link to the software ProbABEL). Imputations were performed after excluding samples and SNPs that did not meet the study-specific quality control criteria. Genotypes were imputed for SNPs not present in the genome-wide arrays or for those where genotyping had failed to meet the QC criteria.

Only imputed SNPs with good imputation quality were included in the meta-analysis. This was defined as proper_info≥0.5 (for studies analysed with IMPUTE/SNPTEST12), rsq-hat≥0.5 (for studies analysed using MACH37) or Imp_info≥0.5 (for studies analysed using ProbABEL).
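The quality filter above applies the same 0.5 threshold to a different metric depending on the software used. The following is a minimal sketch of that rule with hypothetical names (`INFO_METRIC`, `passes_imputation_qc`); the metric labels mirror the text, and the code does not parse the real output files of IMPUTE/SNPTEST, MACH or ProbABEL.

```python
from typing import Dict, Mapping

INFO_THRESHOLD = 0.5

# Imputation-quality metric checked for each analysis software, as named in the text.
INFO_METRIC: Dict[str, str] = {
    "IMPUTE/SNPTEST": "proper_info",
    "MACH": "rsq-hat",
    "ProbABEL": "Imp_info",
}


def passes_imputation_qc(software: str, metrics: Mapping[str, float]) -> bool:
    """Return True if the SNP's software-specific quality metric is at least 0.5.

    A SNP with no recorded metric is treated as failing.
    """
    key = INFO_METRIC[software]
    return metrics.get(key, 0.0) >= INFO_THRESHOLD


# Illustrative values only: keep the SNPs from a MACH-analysed cohort that pass the filter.
mach_snps = {"snp_a": {"rsq-hat": 0.98}, "snp_b": {"rsq-hat": 0.31}}
kept = [snp for snp, m in mach_snps.items() if passes_imputation_qc("MACH", m)]
print(kept)  # ['snp_a']
```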

Derivation of smoking phenotypes

We used the categorical SQ levels defined by Thorgeirsson et al.7: 0 (1–10 cigarettes per day), 1 (11–20), 2 (21–30) and 3 (31 or more). Each increment represents an increase in SQ of 10 cigarettes per day. Most of the cohorts in our study recorded maximal CPD for each subject, but a few collected average CPD (Supplementary Table 1). We examined the distributions of CPD across cohorts and found no large differences between cohorts with average and those with maximal CPD. The mean and standard deviation of CPD in each cohort are given in Supplementary Table 1. The Ever/Never and Current/Non-current phenotypes used were those collected by the individual cohorts. Not all cohorts collected all three phenotypes; precise details of the phenotypes collected in each cohort are given in Supplementary Table 1. Assessment was typically questionnaire-based, following a structure such as:

  • Tick the option that best describes you:
      - I smoke now
      - I don’t smoke now. I have stopped for … years.
      - I have never smoked
  • About how many cigarettes do you or did you smoke per day?
  • Put the number of years you have smoked.
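The SQ coding and the two smoking-status phenotypes can be derived from questionnaire answers like those above. The following is a minimal sketch, assuming CPD has already been extracted as a number and the first question has been coded as one of the hypothetical strings "current", "former" or "never"; the actual coding differed between cohorts (Supplementary Table 1). Never smokers are left missing for Current/Non-current, since that comparison is between current and former smokers.

```python
from typing import Dict, Optional


def sq_level(cpd: float) -> Optional[int]:
    """Map cigarettes per day (CPD) to the SQ levels of Thorgeirsson et al.

    0 = 1-10, 1 = 11-20, 2 = 21-30, 3 = 31 or more cigarettes per day.
    Returns None when CPD <= 0, because only subjects reporting CPD > 0
    enter the SQ analysis. Fractional (average) CPD values are assumed to
    fall into the first level whose upper bound they do not exceed.
    """
    if cpd <= 0:
        return None
    for level, upper in enumerate((10, 20, 30)):
        if cpd <= upper:
            return level
    return 3


def smoking_status(answer: str) -> Dict[str, Optional[bool]]:
    """Derive EVER/NEVER and CURRENT/NON-CURRENT from a coded status answer.

    `answer` is a hypothetical coding of the first questionnaire item:
    "current" (I smoke now), "former" (I have stopped) or "never".
    """
    if answer not in {"current", "former", "never"}:
        raise ValueError(f"unrecognised smoking status: {answer!r}")
    ever = answer != "never"
    # Current/Non-current compares current with former smokers only.
    current = (answer == "current") if ever else None
    return {"ever": ever, "current": current}


print([sq_level(c) for c in (5, 10, 15, 25, 40)])  # [0, 0, 1, 2, 3]
print(smoking_status("former"))  # {'ever': True, 'current': False}
```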

Statistical Analysis and Meta-analysis

Each cohort was analyzed separately for each of the 3 phenotypes considered. Most of the analysis was carried out on the raw genotype data in Oxford, but some cohorts (SardiNIA, VIS, KORCULA, ORCADES, SHIP) carried out their own analysis and submitted results for the meta-analysis. For the binary traits (Ever/Never, Current/Non-Current), tests for additive genetic effects on the log-odds scale were carried out using logistic regression. For the categorical SQ phenotype, tests for additive genetic effects were carried out on a linear scale using linear regression. The programs SNPTEST, ProbABEL and MERLIN were used on the various cohorts to fit these models, taking account of the genotype uncertainty at imputed SNPs. All tests included sex and age as covariates, and for some cohorts other covariates (self-reported ancestry, country of origin or PCA-derived covariates) were also included (a complete list is given in Supplementary Table 1). A genomic control (GC) lambda estimate was calculated for each phenotype and each cohort (Supplementary Table 3).
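The GC lambda is conventionally estimated as the median of the observed 1-degree-of-freedom association chi-square statistics divided by the median expected under the null (about 0.455). The text does not give the formula used, so the following is a minimal sketch of that standard estimator, taking per-SNP Z statistics (β-estimate divided by its standard error) as input; the function name `gc_lambda` is hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import Iterable

# Median of a chi-square distribution with 1 degree of freedom: if Z ~ N(0, 1),
# P(Z^2 <= m) = 0.5 exactly when sqrt(m) is the 75th percentile of N(0, 1).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def gc_lambda(z_scores: Iterable[float]) -> float:
    """Genomic-control inflation factor for one cohort and phenotype.

    Ratio of the median observed 1-df chi-square statistic (z^2, where z is the
    beta-estimate divided by its standard error) to its expected null median.
    """
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


print(round(gc_lambda([0.2, -1.1, 0.7, 1.9, -0.5]), 3))  # 1.077
```

In practice lambda would be computed over all SNPs genome-wide; a value above 1 indicates inflation of the test statistics, for example from residual population structure.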

The meta-analysis was carried out by combining study-specific β-estimates using a fixed-effects model14, weighting the contribution of each study by the inverse of the variance of its β-estimate. The variance of each cohort’s β-estimate was multiplied by the GC lambda estimate to correct for observed inflation38. Specifically,

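$$\beta_{META} = \frac{\sum_i \beta_i / (\lambda_i \sigma_i^2)}{\sum_i 1/(\lambda_i \sigma_i^2)}, \qquad \sigma_{META} = \sqrt{\frac{1}{\sum_i 1/(\lambda_i \sigma_i^2)}}, \qquad Z_{META} = \frac{\beta_{META}}{\sigma_{META}},$$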

where $\beta_i$, $\sigma_i^2$ and $\lambda_i$ are the β-estimate, the variance of the β-estimate and the GC lambda estimate for the ith cohort. This method is appropriate when the same phenotype and measurement scale are used in each cohort, and has the advantage that effect sizes and their standard errors can be calculated (for the binary traits, $e^{\beta}$ is an estimate of the odds ratio of the risk allele). We also repeated the analysis of SQ by combining Z-scores from each cohort weighted by their sample size38 and obtained almost identical results. All meta-analyses were carried out using the SNPMETA program (see URL list). After each meta-analysis, the overall lambda estimate for each phenotype was: SQ 1.0145, Ever/Never 1.002, Current/Non-Current 0.998. For each SNP we also calculated a P value for heterogeneity across the studies38.
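The GC-corrected inverse-variance formulas, and the sample-size-weighted Z alternative, can be sketched in a few lines. This is not the SNPMETA program: the function names are hypothetical, and the sample-size weighting (each cohort's Z multiplied by the square root of its sample size) is the common form of that approach, assumed here because the text does not spell it out. P values use the two-sided normal tail.

```python
import math
from typing import Sequence, Tuple


def gc_inverse_variance_meta(betas: Sequence[float], variances: Sequence[float],
                             lambdas: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fixed-effects inverse-variance meta-analysis with GC-inflated variances.

    Each cohort's weight is 1 / (lambda_i * sigma_i^2). Returns
    (beta_meta, sigma_meta, z_meta, two-sided P value).
    """
    weights = [1.0 / (lam * var) for var, lam in zip(variances, lambdas)]
    sum_w = sum(weights)
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / sum_w
    sigma_meta = math.sqrt(1.0 / sum_w)
    z_meta = beta_meta / sigma_meta
    p = math.erfc(abs(z_meta) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta_meta, sigma_meta, z_meta, p


def sample_size_weighted_z(z_scores: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Alternative combination: sum_i sqrt(N_i) * Z_i / sqrt(sum_i N_i)."""
    numerator = sum(math.sqrt(n) * z for z, n in zip(z_scores, sample_sizes))
    return numerator / math.sqrt(sum(sample_sizes))


# Illustrative numbers only: two cohorts with identical estimates and no inflation.
print(gc_inverse_variance_meta([0.1, 0.1], [0.01, 0.01], [1.0, 1.0]))
# approximately (0.1, 0.0707, 1.414, 0.157)
```

Multiplying each variance by its lambda down-weights cohorts with inflated test statistics and widens the combined standard error accordingly.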

SNP selection for replication

In collaboration with two other groups carrying out similar meta-analyses of smoking-related traits (ENGAGE15 and TAG16), we agreed on an in-silico replication strategy in which, for each phenotype (SQ, EVER/NEVER, CURRENT/NON-CURRENT), each group would select 15 regions of the genome showing evidence for association, and summary data (P values, β-estimates, β-estimate variances, GC lambda estimates and sample sizes) would be shared across groups to facilitate replication. We selected the top 15 regions for each phenotype based on the P values obtained in our own meta-analysis. We excluded regions to which only a small number of cohorts contributed (because the imputation information measure at the SNPs was below our thresholds in the other cohorts), and regions where the heterogeneity between studies was high. Each selected region consisted of several SNPs showing evidence of association in our meta-analysis, with P values below 1e-5. For each of the three phenotypes, the results from all the cohorts in all three concurrent studies were combined using the same GC-corrected inverse-variance meta-analysis method described above. A full list of the selected regions and the summary information for all 3 phenotypes is given in Supplementary Table 2.
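The region-selection step can be pictured as a greedy pass over the meta-analysis results. The following is a minimal sketch under stated assumptions: the window size, minimum number of contributing cohorts and heterogeneity cutoff are hypothetical values (the text gives none), the exclusion rules are applied per SNP for simplicity rather than per region, and only the lead SNP of each region is returned. `SnpResult` and `select_regions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SnpResult:
    chrom: str
    pos: int        # base-pair position (NCBI Build 36)
    p: float        # meta-analysis P value
    n_cohorts: int  # cohorts whose imputation quality passed the threshold at this SNP
    het_p: float    # P value for heterogeneity across cohorts


def select_regions(results: Sequence[SnpResult], n_regions: int = 15,
                   p_threshold: float = 1e-5, window_bp: int = 250_000,
                   min_cohorts: int = 10, min_het_p: float = 1e-3) -> List[SnpResult]:
    """Greedily pick lead SNPs for the top association regions.

    window_bp, min_cohorts and min_het_p are hypothetical defaults, not the
    values used in the study.
    """
    eligible = [r for r in results
                if r.p < p_threshold and r.n_cohorts >= min_cohorts and r.het_p >= min_het_p]
    eligible.sort(key=lambda r: r.p)  # strongest association first
    leads: List[SnpResult] = []
    for snp in eligible:
        # Skip SNPs inside a region already represented by a stronger lead SNP.
        if any(snp.chrom == lead.chrom and abs(snp.pos - lead.pos) <= window_bp
               for lead in leads):
            continue
        leads.append(snp)
        if len(leads) == n_regions:
            break
    return leads
```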

1000 Genomes imputation analysis of the 15q25-associated region for SQ

We used 108 estimated CEU haplotypes from the April 2009 release of the 1000 Genomes Pilot 1 data to carry out our fine-mapping experiments at the 15q25 locus (see the URL list for a link to the data source). We used these haplotypes to carry out imputation in the interval 76.4–77.0Mb on chr15 in 12 of the cohorts (GSK-Bipolar, GSK-Unipolar, GSK-COPD, KORA, POPGEN, Lausanne, GSK-Lolipop, GSK-GEMS, Medstar, SHIP, WTCCC-CAD and WTCCC-HT) using the program IMPUTE12. This release contains 2189 SNPs in this interval, compared with 437 SNPs in release 22 of the HapMap. Meta-analysis of the imputed data was then carried out in the same way as described above. An important technical detail when carrying out imputation using the 1000 Genomes haplotype data is how to align it with the genotype data from genome-wide studies. The program IMPUTE aligns SNPs between the haplotype and genotype data based on base-pair position (not on SNP identifiers such as rs IDs), so as long as the same coordinate system is used for both the haplotype and genotype data, the alignment is automatic.
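Because IMPUTE matches reference-panel and study SNPs on base-pair position, the alignment step amounts to a join on position within the imputed interval. The following is a minimal sketch, assuming both datasets are on NCBI Build 36 chromosome 15 and are held as hypothetical position-to-alleles dictionaries; the allele check is an extra sanity step added for illustration, not a description of what IMPUTE does internally, and `align_by_position` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Alleles = Tuple[str, str]


def align_by_position(panel: Dict[int, Alleles], study: Dict[int, Alleles],
                      start_bp: int = 76_400_000, end_bp: int = 77_000_000
                      ) -> Tuple[List[int], List[int], List[int]]:
    """Join reference-panel SNPs to study SNPs on base-pair position within an interval.

    Returns (shared, allele_mismatch, to_impute) lists of positions:
    shared          - typed in the study and present in the panel with the same alleles;
    allele_mismatch - same position but different alleles (e.g. strand or build problems);
    to_impute       - present only in the panel, so genotypes are imputed in the study.
    """
    shared, allele_mismatch, to_impute = [], [], []
    for pos in sorted(panel):
        if not start_bp <= pos <= end_bp:
            continue  # outside the imputed interval
        if pos not in study:
            to_impute.append(pos)
        elif set(panel[pos]) == set(study[pos]):
            shared.append(pos)  # allele order may differ; compare as sets
        else:
            allele_mismatch.append(pos)
    return shared, allele_mismatch, to_impute


# Hypothetical positions and alleles, not real 15q25 SNPs.
panel = {76_500_000: ("A", "G"), 76_600_000: ("C", "T"),
         76_700_000: ("G", "T"), 80_000_000: ("A", "C")}
study = {76_500_000: ("G", "A"), 76_600_000: ("A", "G")}
print(align_by_position(panel, study))  # ([76500000], [76600000], [76700000])
```

This is why a shared coordinate system matters: if the panel and study used different genome builds, the same SNP would sit at different positions and silently fail to match.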

Conditional analysis and modeling

The analysis conditional upon SNPs was carried out using all of the centrally analyzed cohorts (Bipolar, Unipolar, COPD, KORA, POPGEN, Lausanne, LOLIPOP, GEMS, MEDSTAR, SHIP, WTCCC-CAD and WTCCC-HT). At the SNP being conditioned upon we used expected genotype counts, as this allowed us to combine data from cohorts that had imputed the SNP with cohorts that had genotyped it. These expected counts were included in the baseline null model as an additional covariate, along with the other covariates such as age, sex and covariates coding for population structure. The same method was used when conditioning upon two SNPs. The model selection analysis of the two pairs of SNPs in the 15q25 region was also carried out using the expected genotype counts. Analysis was carried out using the R statistical package.
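Conditioning amounts to adding the expected genotype count (dosage) of the conditioning SNP(s) to the regression as extra covariates and then testing each remaining SNP. The study did this in R; the following is only a minimal Python/NumPy sketch of the same idea for a linear SQ model, with hypothetical function names and simulated data. It also computes a Gaussian-likelihood BIC (n·ln(RSS/n) + k·ln n, up to a constant shared by models fitted to the same outcome), one standard way to compare SNP-pair models; whether this matches the exact BIC computation used in the paper is an assumption. P values use a normal approximation to the t statistic.

```python
import math

import numpy as np


def ols(y: np.ndarray, X: np.ndarray):
    """Ordinary least squares: coefficients, their standard errors and the residual sum of squares."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (n - k)  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, rss


def conditional_test(y, covariates, cond_dosages, test_dosage):
    """Test one SNP's additive effect, adjusting for covariates and for the
    expected genotype counts of the SNP(s) being conditioned upon."""
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, cond_dosages, test_dosage])
    beta, se, _ = ols(y, X)
    z = beta[-1] / se[-1]
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal approximation
    return float(beta[-1]), float(se[-1]), p


def gaussian_bic(y, X):
    """BIC of a linear model, up to a constant shared by models fitted to the same y."""
    n, k = X.shape
    _, _, rss = ols(y, X)
    return n * math.log(rss / n) + k * math.log(n)


# Simulated data for illustration only (a continuous outcome stands in for SQ).
rng = np.random.default_rng(1)
n = 2000
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n).astype(float)
snp_causal = rng.binomial(2, 0.35, n).astype(float)
snp_other = rng.binomial(2, 0.2, n).astype(float)
sq = 0.3 * snp_causal + 0.01 * age + rng.normal(0, 1, n)
covs = np.column_stack([age, sex])

print(conditional_test(sq, covs, snp_other, snp_causal))
```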

Supplementary Material


GlaxoSmithKline plc (GSK), a pharmaceutical company interested in developing novel smoking cessation therapies, funded a postdoctoral fellowship for JL at Oxford University. GSK also funded the collection, characterization, and in some cases the genotyping and genotype data preparation for several of the cohorts used in this study. Allen Roses and Paul Matthews played crucial roles in establishing and funding the Medical Genetics activities at GSK. Acknowledgements that are specific to individual cohorts are given in the Supplementary Note.



JZL carried out most of the analysis for this study. JM and CF conceived and directed this study and wrote the manuscript. FT, DMW and VM were involved in study design and helped to coordinate the inclusion of many of the GSK cohorts. SGP, PM, LM, WB, CWK, XY, GW, PV, MP, NJW, JHZ, RJFL, IB, KK, SG, PB, RM, AK, RM, JBV, JS, JLK, AF, PM, RD, KM, PB, AG, SL, MI, TB, SH, HEW, RR, ND, CL, OP, LZ, JH, SC, JK, JCC, MSB, JMD, ADP, KMK, LS, JML, RW, SE, JFW, SHW, HC, VV, MPR, ML, LQ, RW, WM, HHH, DJR, AF, MW, AS, MU, AT, XX, FB, PS, DS, DStC, DR, GRA, HJG, AT, HV, AP, UJ, IR, CH, AFW, IK, BJW, JRT, AJB, ASH, NJS, CAA, TA, CGM, MP, JS, MC, PBM, MF, AD, JW, WT, SE, AB and the WTCCC prepared and shared datasets and, in some cases, cohort-specific results from their own primary analyses.


FT, CF, DMW, VM, PM, SGP, CWK are/were full time employees of the company GlaxoSmithKline (GSK). GSK also funded several aspects of the study as detailed in ACKNOWLEDGEMENTS. There were no competing interests arising from GSK’s involvement in this study.


ProbABEL software: http://mga.bionet.nsc.ru/yurii/ABEL/

SNPMETA software: http://www.stats.ox.ac.uk/~marchini/software/gwas/snpmeta.html

1000 Genomes Project: http://www.1000genomes.org/

April 2009 release of the 1000 Genomes Pilot 1 data: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/release/2009_04/

UCSC Genome Browser: http://genome.ucsc.edu/


1. Ezzati M, Lopez AD, Rodgers A, Vander Hoorn S, Murray CJ. Selected major risk factors and global and regional burden of disease. Lancet. 2002;360:1347–1360.
2. Li MD. The genetics of nicotine dependence. Curr Psychiatry Rep. 2006;8:158–164.
3. Benowitz NL. Neurobiology of nicotine addiction: implications for smoking cessation treatment. Am J Med. 2008;121:S3–S10.
4. Berrettini W, et al. α-5/α-3 nicotinic receptor subunit alleles increase risk for heavy smoking. Mol Psychiatry. 2008;13:368–373.
5. Bierut LJ, et al. Novel genes identified in a high-density genome wide association study for nicotine dependence. Hum Mol Genet. 2007;16:24–35.
6. Li MD. Identifying susceptibility loci for nicotine dependence: 2008 update based on recent genome-wide linkage analyses. Hum Genet. 2008;123:119–131.
7. Thorgeirsson TE, et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature. 2008;452:638–642.
8. Caporaso N, et al. Genome-wide and candidate gene association study of cigarette smoking behaviors. PLoS One. 2009;4:e4653.
9. Amos CI, et al. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet. 2008;40:616–622.
10. Hung RJ, et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature. 2008;452:633–637.
11. Pillai SG, et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet. 2009;5:e1000421.
12. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–913.
13. Frazer KA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
14. Normand SL. Meta-analysis: formulating, evaluating, combining, and reporting. Stat Med. 1999;18:321–359.
15. ENGAGE Smoking Consortium. Sequence variants at CHRNB3-CHRNA6 and CYP2A6 influence smoking behavior. (Submitted).
16. Tobacco and Genetics Consortium. Meta-analyses of genomewide association studies implicate loci on chromosomes 9, 11 and 15 for smoking behavior. (Submitted).
17. Falvella FS, et al. Transcription deregulation at the 15q25 locus in association with lung adenocarcinoma risk. Clin Cancer Res. 2009;15:1837–1842.
18. Wang JC, et al. Risk for nicotine dependence and lung cancer is conferred by mRNA expression levels and amino acid change in CHRNA5. Hum Mol Genet. 2009;18:3125–3135.
19. Wang JC, et al. Genetic variation in the CHRNA5 gene affects mRNA levels and is associated with risk for alcohol dependence. Mol Psychiatry. 2008.
20. Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
21. Ioannidis JP, Thomas G, Daly MJ. Validating, augmenting and refining genome-wide association signals. Nat Rev Genet. 2009;10:318–329.
22. Stirnadel H, et al. Genetic and phenotypic architecture of metabolic syndrome-associated components in dyslipidemic and normolipidemic subjects: the GEMS Study. Atherosclerosis. 2008;197:868–876.
23. Firmann M, et al. The CoLaus study: a population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc Disord. 2008;8:6.
24. Muglia P, et al. Genome-wide association study of recurrent major depressive disorder in two European case-control cohorts. Mol Psychiatry. 2008.
25. Scott LJ, et al. Genome-wide association and meta-analysis of bipolar disorder in individuals of European ancestry. Proc Natl Acad Sci U S A. 2009;106:7501–7506.
26. Chahal NS, et al. Ethnicity-Related Differences in Left Ventricular Function, Structure and Geometry: A Population Study of UK Indian Asians and European Whites. Heart. 2009.
27. Kathiresan S, et al. Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variants. Nat Genet. 2009;41:334–341.
28. Day N, et al. EPIC-Norfolk: study design and characteristics of the cohort. European Prospective Investigation of Cancer. Br J Cancer. 1999;80(Suppl 1):95–103.
29. Wichmann HE, Gieger C, Illig T. KORA-gen--resource for population genetics, controls and a broad spectrum of disease phenotypes. Gesundheitswesen. 2005;67:S26–S30.
30. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
31. Krawczak M, et al. PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet. 2006;9:55–61.
32. John U GB, Hensel E, Lüdemann J, Piek M, Sauer S, Adam C, Born G, Alte D GE, Haertel U, Hense H-W, Haerting J, Willich S, Kessler C. Study of Health in Pomerania (SHIP): a health examination survey in an east German region: objectives and design. Soz-Präventivmed. 2001;46:186–194.
33. Vitart V, et al. SLC2A9 is a newly identified urate transporter influencing serum urate concentration, urate excretion and gout. Nat Genet. 2008;40:437–442.
34. McQuillan R, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83:359–372.
35. Zemunik T, et al. Genome-wide association study of biochemical traits in Korcula Island, Croatia. Croat Med J. 2009;50:23–33.
36. Pilia G, et al. Heritability of cardiovascular and personality traits in 6,148 Sardinians. PLoS Genet. 2006;2:e132.
37. Li Y, Abecasis GR. Mach 1.0: Rapid Haplotype Reconstruction and Missing Genotype Inference. Am J Hum Genet. 2006;S79:2290.
38. de Bakker PI, et al. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17:R122–R128.