Am J Hum Genet. 2009 Nov 13; 85(5): 679–691.
PMCID: PMC2775843

A Genome-wide Association Study of Lung Cancer Identifies a Region of Chromosome 5p15 Associated with Risk for Adenocarcinoma


Three genetic loci for lung cancer risk have been identified by genome-wide association studies (GWAS), but inherited susceptibility to specific histologic types of lung cancer is not well established. We conducted a GWAS of lung cancer and its major histologic types, genotyping 515,922 single-nucleotide polymorphisms (SNPs) in 5739 lung cancer cases and 5848 controls from one population-based case-control study and three cohort studies. Results were combined with summary data from ten additional studies, for a total of 13,300 cases and 19,666 controls of European descent. Four studies also provided histology data for replication, resulting in 3333 adenocarcinomas (AD), 2589 squamous cell carcinomas (SQ), and 1418 small cell carcinomas (SC). In analyses by histology, rs2736100 (TERT), on chromosome 5p15.33, was associated with risk of adenocarcinoma (odds ratio [OR] = 1.23, 95% confidence interval [CI] = 1.13–1.33, p = 3.02 × 10−7), but not with other histologic types (OR = 1.01, p = 0.84 and OR = 1.00, p = 0.93 for SQ and SC, respectively). This finding was confirmed in each replication study and overall meta-analysis (OR = 1.24, 95% CI = 1.17–1.31, p = 3.74 × 10−14 for AD; OR = 0.99, p = 0.69 and OR = 0.97, p = 0.48 for SQ and SC, respectively). Other previously reported association signals on 15q25 and 6p21 were also refined, but no additional loci reached genome-wide significance. In conclusion, a lung cancer GWAS identified a distinct hereditary contribution to adenocarcinoma.

Main Text

Recently, three genome-wide association studies (GWAS) of lung cancer and subsequent pooled GWAS analyses identified inherited susceptibility variants on chromosome 15q25,1–3 5p15,4–6 and 6p21.5 Lung cancer is classified into two main histologic groups: small cell lung cancer (SC) and non-small cell lung cancer; the latter includes adenocarcinoma (AD) and squamous cell carcinoma (SQ), along with rarer subtypes. Worldwide, adenocarcinoma is the most frequently identified histologic type, and the relative proportion of lung cancer due to this histology has steadily risen. Demographic, etiologic, clinical, and molecular characteristics of the lung cancer subtypes have been reported.7 Although family history of lung cancer has been associated with histologic subtypes,8–11 the inherited susceptibility factors that affect specific histologies are unknown.

We conducted a GWAS in 5739 lung cancer cases and 5848 controls (National Cancer Institute [NCI] GWAS) to search for overall susceptibility variants and variants associated with specific histologic types and smoking status. We also conducted a meta-analysis of the NCI GWAS with summary data from ten additional studies, for a total of 13,300 primary lung cancer cases and 19,666 controls, all of European descent. Four of the ten studies provided information on histology for replication analyses; 3333 AD, 2589 SQ, and 1418 SC cases were analyzed overall.

The 11,587 subjects in the NCI GWAS were drawn from one population-based case-control study and three cohort studies (Table 1); specifically: the Environment and Genetics in Lung Cancer Etiology (EAGLE),12 a population-based case-control study including 2100 primary lung cancer cases and 2120 healthy controls enrolled in Italy between 2002 and 2005; the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC),13 a randomized primary prevention trial including 29,133 male smokers enrolled in Finland between 1985 and 1993; the Prostate, Lung, Colon, Ovary Screening Trial (PLCO),14 a randomized trial including 150,000 individuals enrolled in ten U.S. study centers between 1992 and 2001; and the Cancer Prevention Study II Nutrition Cohort (CPS-II),15 including over 183,000 subjects enrolled by the American Cancer Society between 1992 and 2001 across all U.S. states. Analyses stratified by histology in the NCI GWAS included 1730 AD cases, 1400 SQ cases, 678 SC cases, and groups of other histological types or of mixed histologies. These studies were approved by the individual institutional review boards of each location, and each subject gave his or her informed consent for participation.

Table 1
Studies Included in the Genome-wide Association Analysis of Lung Cancer

The meta-analysis included all of the NCI GWAS data plus summary data from ten additional studies contributing 7561 cases and 13,818 controls (Table 1): (1) the UK study from the Institute for Cancer Research,5 including lung cancer cases from the Genetic Lung Cancer Predisposition Study established in 1999 and controls from the 1958 birth cohort;16 (2) the International Agency for Research on Cancer (IARC) study in central Europe,1 a hospital-based case-control study conducted in the Czech Republic, Hungary, Poland, Romania, Russia, and Slovakia between 1998 and 2002; (3) the Texas case-control study,2 including cases newly diagnosed at the University of Texas M.D. Anderson Cancer Center since 1991 and controls from the Kelsey-Seybold clinics (the GWAS included only smokers and cases with non-small cell lung cancer); (4) the population-based case-control study from deCODE Genetics in Iceland,3 including all Icelandic subjects originally recruited for different genetic studies between 1996 and 2007 at deCODE Genetics and lung cancer cases recruited from the Icelandic Cancer Registry since 1998; (5) the Helmholtz-Gemeinschaft Deutscher Forschungszentren (HGF) lung cancer GWA study,17 including lung cancer cases diagnosed at ≤ 50 years from the LUng Cancer in the Young (LUCY) study, a multicenter study within 31 German hospitals, and the Heidelberg lung cancer study, a hospital-based case-control study conducted by the German Cancer Research Center (DKFZ) (controls were selected from the Cooperative Health Research in the Region of Augsburg [KORA]); (6) the Carotene and Retinol Efficacy Trial (CARET) cohort,18 including smokers with a smoking history of at least 20 pack-years enrolled in six U.S. centers between 1983 and 1994; (7) the HUNT2/Tromsø study, including lung cancer cases and controls from the North Trondelag Health Study (HUNT 2),19 a population-based study conducted between 1995 and 1997 in North Trondelag County, and the Tromsø IV population-based study conducted in Tromsø County between 1994 and 1995; (8) the lung cancer study from Canada,1 including lung cancer cases recruited at the University of Toronto and the Samuel Lunenfeld Research Institute between 1997 and 2002 and GWAS controls randomly selected from family medicine clinics; (9) the lung cancer study from France,20 a hospital-based case-control study including smoking cases and controls recruited between 1988 and 1992 in ten French hospitals; and (10) the lung cancer study from Estonia, a hospital-based case-control study including lung cancer cases enrolled between 2002 and 2006 in Estonian hospitals and controls randomly selected from the Estonian Genome Project population-based cohort.21

Three studies (the Texas,2 deCODE,3 and HGF German17 studies) also contributed summary data from genome-wide scans stratified by histology, including 1138 AD, 578 SQ, and 210 SC cases. The UK study5 contributed data on the top single nucleotide polymorphisms (SNPs) of chromosome 5p15.33 by histology. These four studies contributed 1603 AD, 1189 SQ, and 740 SC cases to the meta-analysis by histology for this locus.

In both the NCI GWAS and the studies in the meta-analysis, the lung cancer diagnosis was based on clinical criteria and confirmed by pathology reports from surgery, biopsy, or cytology samples in approximately 95% of cases and on clinical history and imaging for the remaining 5%. Tumor histology was coded according to the International Classification of Diseases for Oncology. In analyses stratified by histology, only adenocarcinoma, squamous cell carcinoma, and small cell carcinoma cases were included. All mixed subtypes or other histologies were excluded. Overall, between 10% and 50% of all diagnoses from the NCI GWAS were centrally reviewed by expert lung pathologists from NCI.

The NCI GWAS scan was conducted at two institutions: the Center for Inherited Disease Research (CIDR), which genotyped all EAGLE and 1675 PLCO subjects, and the Core Genotyping Facility (CGF), NCI, which genotyped ATBC, CPS-II, and the remaining PLCO subjects. Controls from the Cancer Genetic Markers of Susceptibility (CGEMS) prostate cancer scan22 were also included.

EAGLE samples and 1675 PLCO samples were genotyped at CIDR, as part of the Gene Environment Association Studies Initiative (GENEVA) funded through the National Human Genome Research Institute, with the use of Illumina HumanHap550v3_B BeadChips (Illumina, San Diego, CA, USA). Data were released for 5620 of 5727 (98%) samples, including 32 blind duplicates (concordance was 99.993%); these were genotyped with 124 HapMap controls (66 CEU; 58 YRI). Allele cluster definitions per SNP were determined with the use of the Illumina BeadStudio Genotyping Module version 3.1.14 and the combined intensity data from 95% of the samples. The resulting cluster definitions were used on all samples. Genotypes were not called if the quality threshold (Gencall score) was below 0.15. Genotypes were released by CIDR for 560,505 (99.83% of attempted) SNPs. Genotypes were not released for SNPs not called by BeadStudio or for those with call rates less than 85%, more than one HapMap replicate error, more than a 3% (autosomal) or 5% (X chromosome) difference in call rate between genders, or more than 0.5% male AB frequency for the X chromosome. The mean non-Y chromosome SNP call rate and mean sample call rate were each 99.8% for the CIDR data set.

Similar procedures were followed at CGF for the ATBC, CPS-II, and PLCO cohorts with the use of three Illumina platforms: the HumanHap550K, the HumanHap610, and HumanHap 1 Million chips. All genotyped samples passed quality control metrics at CGF. After removal of assays and loci with low completion rates, genotypes for each sample that appeared in duplicate were merged to form consensus genotypes for each subject. There were 12,111 study subjects available for subsequent analysis. Table S1, available online, shows the distribution of subjects by study and phenotype after application of quality control (QC) metrics. Figure S1 shows the cluster plot for the most notable SNP, rs2736100.

A total of 221 pairs of samples were identified with a genotype concordance rate >70%. Among them, 189 pairs were expected duplicates and had genotype concordance rates >99.9%. There were 12 unexpected duplicates (across or within studies) with concordance rates >99.97%. We evaluated the pairwise concordance on the basis of the entire set and observed 40 pairs of subjects with genotype concordance >60%. Exclusions are listed in Table S2. Deviations from Hardy-Weinberg proportions (HWP) were assessed in controls. Expected and observed p values were calculated with the use of the uniform distribution for all loci and the exact test, respectively. Autosomal SNPs with minor allele frequencies (MAFs) >5% and completion rates >95% were included. Deviation from HWP was minimal, and only loci with extremely low p values (p < 10−7) for each QC group were excluded from further analyses (Table S3). A quantile-quantile (Q-Q) plot of the p values per study is shown in Figure S2.
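The Hardy-Weinberg filter described above can be illustrated with a minimal sketch. The function name `hwe_exact_p`, the constant `HWP_THRESHOLD`, and the genotype counts are hypothetical; the enumeration shown is one standard form of the exact test and is not necessarily the implementation used for the NCI GWAS.

```python
import math

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg p value from genotype counts at a biallelic SNP.

    Conditions on the observed allele counts and sums the probabilities of
    all heterozygote counts that are no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2)
                + math.lgamma(n + 1)
                - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1)
                + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of the allele count.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: math.exp(log_prob(h)) for h in het_counts}
    p_obs = probs[n_ab]
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Exclusion threshold quoted in the text (p < 10^-7), applied to controls only.
HWP_THRESHOLD = 1e-7

# Hypothetical control genotype counts (AA, AB, BB) for two SNPs.
print(hwe_exact_p(25, 50, 25) < HWP_THRESHOLD)  # counts match HWP: SNP kept
print(hwe_exact_p(50, 0, 50) < HWP_THRESHOLD)   # no heterozygotes: SNP excluded
```

Working on the log scale with `math.lgamma` keeps the factorial terms stable for sample sizes in the thousands.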

To assess population structure, we estimated imputed continental ancestry by using the STRUCTURE program,23 with a set of 12,898 autosomal SNPs with low local background linkage disequilibrium (LD) (pairwise r2 < 0.004 measured in the population of European ancestry for any pair of SNPs less than 500 kb apart)24 (Figure S3). Genotypes from the three HapMap populations (Build 22 for HapMap II with MAF > 5%)25 were used as reference populations. The number of inferred clusters (“K” parameter) was set to 3 for CEU, YRI, and JPT+CHB samples representing populations of European, African, and Asian origin, respectively. Eighteen subjects were detected as having less than 80% European ancestry and were excluded.

Principal component analysis (PCA) for each study group (excluding subjects with less than 80% European ancestry, unexpected duplicates, and potential relative pairs) was performed with the same informative 12,898 SNPs with the use of the EIGENSTRAT program26 (Figures S4A–S4D). After adjustment for significant principal components (PCs) in each study, comparison of observed and expected distributions showed no evidence for large-scale inflation of the association test statistics (inflation factor λ = 1.03, 1.03, 1.01, and 1.01 in EAGLE, PLCO, CPS-II, and ATBC, respectively), excluding the possibility of significant hidden population substructure. Q-Q plots for each NCI study are shown in Figures S5A–S5D.
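For readers unfamiliar with the inflation factor, a minimal sketch follows. It assumes the common definition of λ as the median observed 1-df chi-square statistic divided by the median expected under the null; the function name `genomic_inflation` and the simulated p values are hypothetical, and the text does not state exactly how λ was computed.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a 1-df chi-square distribution (about 0.455): P(Z^2 <= m) = 0.5
# is equivalent to Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Inflation factor from two-sided association p values in (0, 1)."""
    chi2 = [_STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced p values mimic the null distribution, so lambda is close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))
```

Values of λ close to 1, such as those reported for the four NCI studies, indicate that the bulk of the test statistics follow the null distribution.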

After excluding 183 subjects for the reasons described above (summarized in Table S2) and 337 subjects with incomplete phenotype data, we report analyses on 515,922 SNPs in 5739 lung cancer cases and 5848 controls (NCI GWAS, Table 1).

Comparable QC procedures were conducted at each institution that provided summary results for the meta-analysis.1–5

For the genome-wide analysis of the NCI GWAS, we used unconditional logistic regression to derive a per-allele odds ratio (OR) and an associated 1 degree of freedom (df) association test adjusted for age in five-year intervals (defined as age at diagnosis or interview for the case-control study and as baseline age for cohort studies), gender, study (EAGLE, PLCO, ATBC, CPS-II), and four PCs for population stratification within studies (see description of PC analysis below). In additional analyses, we adjusted for smoking status (current, former, never), cigarettes smoked per day (≤ 10, 11–20, 21–30, 31–40, 41+), duration in 10-year intervals, and number of years since quitting (1–5, 6–10, 11–20, 21–30, 30+) for former smokers (subjects who quit smoking at least 6 months before participating in the study). The analyses with single and multiple SNPs stratified by histology, smoking status, and decade of birth were conducted with the use of the same models. Tests for interaction between a SNP (coded as a continuous variable) and smoking status or birth decade (coded with the use of dummy variables) were performed with Wald tests with the use of multiple dfs.
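The per-allele model can be sketched in a few lines. This is a deliberately reduced illustration: it fits only an intercept and the allele count by Newton-Raphson and omits the age, gender, study, PC, and smoking covariates used in the actual analysis. The function name `per_allele_or` and the toy data are hypothetical.

```python
import math

def per_allele_or(genotypes, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    genotypes: minor-allele counts (0, 1, or 2); is_case: 1 = case, 0 = control.
    Returns (odds ratio per allele, standard error of the log odds ratio).
    """
    def score_and_information(b0, b1):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for g, y in zip(genotypes, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * g
            i00 += w
            i01 += w * g
            i11 += w * g * g
        return s0, s1, i00, i01, i11

    b0 = b1 = 0.0
    for _ in range(n_iter):
        s0, s1, i00, i01, i11 = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det

    # Variance of b1 is the matching diagonal entry of the inverse information.
    _, _, i00, i01, i11 = score_and_information(b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return math.exp(b1), se_b1

# Toy data: 60 subjects with 0 minor alleles (20 cases), 60 with 1 (40 cases).
genotypes = [0] * 60 + [1] * 60
is_case = [0] * 40 + [1] * 20 + [0] * 20 + [1] * 40
odds_ratio, se = per_allele_or(genotypes, is_case)
wald_z = math.log(odds_ratio) / se  # 1-df Wald statistic for the SNP term
print(odds_ratio, se, wald_z)
```

With only two genotype classes the fitted OR reduces to the cross-product ratio of the 2 × 2 table, which gives a simple check on the fit.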

For the meta-analysis with other studies, we obtained per-allele ORs and standard errors from each study. Because only summary data were available, we conducted the meta-analysis in two separate groups: “Set 1 SNPs” included a core of 279,698 SNPs that were available across all studies; and “Set 2 SNPs” included 197,647 SNPs that were available only for a subset of the studies that used the HumanHap500 or denser genomic platforms or provided summary data on imputed SNPs. We obtained meta-analysis estimates of per-allele ORs and associated p values by using the weighted Z-score method under a fixed effect model.27 Tests for heterogeneity by study were performed with the use of the QE statistics, assuming a random effect model. For testing of heterogeneity across histologic subtypes, we reported the smallest p values obtained from pairwise case-case analyses between the subtypes after adjustment for multiple testing with the use of the Bonferroni correction. All odds ratios were reported with respect to the minor allele in the pooled set of controls from all studies that contributed to the meta-analysis.
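A minimal sketch of how per-study summary data can be combined is given below. It assumes inverse-variance weights, under which the weighted Z score (weights 1/SE) equals the pooled log OR divided by its standard error; the weights actually used in the study are not stated here. The function names and the three per-study estimates are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect pooling of per-allele log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * pooled_se),
               math.exp(pooled + 1.96 * pooled_se)),
        "se": pooled_se,
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal p value
        # Cochran's Q: weighted squared deviations from the pooled estimate.
        "q": sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors)),
    }

def weighted_z(z_scores, weights):
    """Weighted Z-score combination: sum(w * z) / sqrt(sum(w^2))."""
    num = sum(w * z for w, z in zip(weights, z_scores))
    return num / math.sqrt(sum(w * w for w in weights))

# Hypothetical per-study estimates for one SNP.
log_ors = [math.log(1.20), math.log(1.30), math.log(1.10)]
ses = [0.05, 0.08, 0.10]

result = fixed_effect_meta(log_ors, ses)
z_combined = weighted_z([b / s for b, s in zip(log_ors, ses)],
                        [1.0 / s for s in ses])
print(result["or"], result["ci"], result["p"], result["q"])
print(abs(z_combined - result["z"]) < 1e-9)
```

Under the null of no between-study heterogeneity, Q is compared with a chi-square distribution with (number of studies − 1) degrees of freedom.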

For adjustment of population stratification, we used the same set of 12,898 autosomal informative SNPs24 used for QC. We conducted PCA in each of the four study groups (EAGLE, PLCO, ATBC, and CPS-II) separately.27 For each study group, we identified among the top ten PCs the ones on which lung cancer cases and controls were not distributed evenly (Wilcoxon rank-sum test p < 0.1), and adjusted for them as continuous covariates nested within each study group. We selected two PCs for PLCO and one each for EAGLE and ATBC. No PCs were selected for the CPS-II study. Of note, we replicated the major analyses by using PCs obtained from all SNPs and found virtually identical results (data not shown).
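The PC selection rule can be sketched as follows. The rank-sum p value here uses the normal approximation without a tie correction, which is an assumption of this sketch and not a detail given in the text; the function names and the toy PC scores are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def select_pcs(pc_scores, is_case, alpha=0.1):
    """Indices of PCs whose scores differ between cases and controls."""
    selected = []
    for idx, scores in enumerate(pc_scores):
        cases = [s for s, c in zip(scores, is_case) if c]
        controls = [s for s, c in zip(scores, is_case) if not c]
        if rank_sum_p(cases, controls) < alpha:
            selected.append(idx)
    return selected

# Toy example: 10 cases followed by 10 controls, two candidate PCs.
is_case = [1] * 10 + [0] * 10
pc_shifted = list(range(10, 20)) + list(range(0, 10))         # cases score higher
pc_balanced = list(range(0, 20, 2)) + list(range(1, 20, 2))   # interleaved
print(select_pcs([pc_shifted, pc_balanced], is_case))
```

Only the selected PCs would then enter the regression as continuous covariates for that study group.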

Genome-wide analysis of the NCI GWAS confirmed previously reported findings on 5p15, 15q25, and 6p21, but did not conclusively identify novel susceptibility loci for lung cancer risk. Similar results were obtained in the overall meta-analysis combining the NCI GWAS with results from the ten additional studies, for a total of over 30,000 subjects (Figure 1 and Table S5).

Figure 1
“Manhattan Plot” of Meta-Analysis Results for Lung Cancer Susceptibility Loci

The analysis of the locus on the 5p15.33 region4–6 provided notable findings. In the overall analysis of the NCI GWAS, the most prominent SNP was rs4635969 (OR = 0.87, 95% confidence interval [CI] = 0.82–0.93, p = 9.80 × 10−5), followed by rs31489 in CLPTM1L (also named CRR9 [MIM 612585]) (OR = 0.90, 95% CI = 0.86–0.95, p = 2.80 × 10−4) and rs2736100 in TERT (MIM 187270) (OR = 1.09, 95% CI = 1.03–1.15, p = 0.001) (Table 2). rs2736100 showed discrepant results by smoking status (Table 2), which reflected different associations across histologies. In fact, this locus showed remarkable differences by histology (Figure 2). The rs2736100 SNP was associated only with AD (OR = 1.23, 95% CI = 1.13–1.33, p = 3.02 × 10−7), but not the other histologic types (OR = 1.01, p = 0.84 and OR = 1.00, p = 0.93, for SQ and SC, respectively; p = 0.001, test for heterogeneity across histologies corrected for multiple comparisons). The risk estimates were not altered by the adjustment for smoking and were consistent across categories of smoking intensity and decade of birth (data not shown). The analysis by smoking status (Table 2) revealed that the SNP was significantly associated with risk of adenocarcinoma in both smokers and those who had never smoked.

Figure 2
Results of Association Analyses and LD Block for the 5p15.33 Locus by Histology
Table 2
Association of Selected SNPs on 5p15 and 15q25 Loci with Lung Cancer Risk Overall and by Histologic Types and Smoking Status

In the NCI controls, rs2736100 is not in LD with other tested SNPs in the region (Figure 2), with the exception of rs2853676, which is in modest LD (r2 = 0.25, D′ = 0.82) and showed a similar pattern of association (OR = 1.16, p = 3.44 × 10−4; OR = 0.95, p = 0.30; and OR = 1.06, p = 0.41 for AD, SQ, and SC, respectively). In contrast, rs4635969 and rs31489 were associated with lung cancer risk across histology groups (Table 2). The previously reported SNPs in CLPTM1L, e.g., rs4975616, rs402710, and rs401681, which are in LD with rs31489 (r2 = 0.74, 0.63, and 0.85, respectively), showed associations similar to rs31489 (data not shown). rs31489 and rs4635969 are moderately correlated (r2 = 0.35, D′ = 0.95), but not in LD with rs2736100 (r2 = 0.03, D′ = 0.20 for rs31489; r2 = 0.07, D′ = 0.49 for rs4635969). In a multivariate model including the main effects of these three SNPs simultaneously, we found statistically independent associations with AD for both rs4635969 and rs2736100 (OR = 0.81, p = 0.002; OR = 1.21, p = 2.60 × 10−5) but not for rs31489 (OR = 1.03, p = 0.42). Only rs31489 was associated with SQ (OR = 0.82, p = 8.50 × 10−4) and no SNPs were associated with SC (data not shown) in the multivariate model.
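Because the paragraph above reports both r2 and D′, and the two can diverge sharply, a small worked sketch may help. It uses the standard definitions based on haplotype and allele frequencies; the function name `ld_stats` and the frequencies are hypothetical and do not correspond to any SNP pair in this study.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r^2, |D'|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2; p_a, p_b: the two allele frequencies.
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime

# A rare allele (10%) that always sits on the same background as a common
# allele (50%): D' is at its maximum while r^2 stays low.
r2, d_prime = ld_stats(p_ab=0.10, p_a=0.10, p_b=0.50)
print(round(r2, 3), round(d_prime, 3))
```

This is why a pair such as rs31489 and rs4635969 can show high D′ with only moderate r2: D′ is scaled by the maximum disequilibrium possible given the allele frequencies, whereas r2 also penalizes differences in those frequencies.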

In the meta-analysis, rs2736100 was the most notable SNP for lung cancer risk overall (OR = 1.12, 95% CI = 1.08–1.16, p = 1.60 × 10−10), followed by the same set of SNPs observed in the NCI GWAS (Table 2, larger list in Table S5). In analyses by histology, risk associated with rs2736100 was accounted for by cases with adenocarcinoma in each replication study (p < 0.05 in each study; p = 0.59, test for heterogeneity across studies) (Figure 3). Other histologies exhibited no association. In the meta-analysis combining the NCI GWAS data with the data from the Texas, deCODE, HGF, and UK studies, this SNP had OR = 1.24, 95% CI = 1.17–1.31, p = 3.74 × 10−14 for AD versus OR = 0.99, p = 0.69 and OR = 0.97, p = 0.48, for SQ and SC, respectively (Table 2).
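As a rough consistency check, a reported OR and 95% CI can be converted back to an approximate standard error, Z score, and p value. The sketch below assumes a Wald-type interval that is symmetric on the log scale; the function name `p_from_or_ci` is hypothetical. Because the published estimates are rounded to two decimals, only order-of-magnitude agreement with the reported p values should be expected.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate (SE of log OR, Z, two-sided p) from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# rs2736100 and adenocarcinoma in the combined meta-analysis, as quoted above.
se, z, p = p_from_or_ci(1.24, 1.17, 1.31)
print(se, z, p)
```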

Figure 3
Forest Plot Showing Associations between the rs2736100 SNP and Lung Cancer Risk Both Overall and by Histologic Types

These findings provide evidence that the previously identified association of the rs2736100 variant on 5p15.33 with lung cancer risk4,5 is confined to one histologic type, adenocarcinoma. rs2736100 is located in intron 2 of TERT and, on the basis of the ESPERR score, lies within a putative regulatory region.28 TERT is a ribonucleoprotein that extends TTAGGG nucleotide repeats at the telomere, which shorten with each cell division. Telomere shortening is associated with increased genomic instability and, consequently, increased risk of cancer development.29 In cancer cells, reactivated TERT is linked to cellular proliferation and abnormal telomere maintenance.30

Other studies have suggested that perturbations of TERT may contribute to the pathogenesis of lung adenocarcinoma. TERT expression has been reported to be significantly lower in adenocarcinoma than in any other histological type,31 and its reexpression may indicate progression from bronchiolo-alveolar carcinoma to adenocarcinoma.32,33 TERT has been reported to promote epithelial proliferation,34 whereas telomere maintenance has been implicated in the progression from KRAS-activated adenoma to adenocarcinoma in murine models.35 rs2736100 was recently found in association with lung cancer risk in two populations, with suggestive stronger associations in adenocarcinoma cases, female cases, and case individuals who had never smoked.36,37 Interestingly, rs2736100 has been reported to be associated with susceptibility to sporadic idiopathic pulmonary fibrosis (IPF),38,39 whereas rare mutations in TERT and consequent shorter telomeres are responsible for familial IPF.40 IPF is a lethal disease characterized by massive fibrotic changes and thickening of the alveolar walls in the lung. Members of IPF families frequently develop lung adenocarcinoma, suggesting that these two diseases share a common etiology.41 Furthermore, telomerase dysfunction has been linked to leukemia, bone marrow failure, and cancer predisposition,42–44 and other SNPs in the TERT-CLPTM1L region have been found in association with multiple cancer types,6 suggesting that this region possesses a fundamental key for cancer development. rs2736100 and other SNPs in the locus are close to the location of known mutations that compromise telomerase activity (Figure 4) and may be in LD with mutations that have yet to be identified. Complete sequencing of this locus in adenocarcinoma patients would enable a more comprehensive understanding of the variants associated with this signal.

Figure 4
Location of SNPs and Mutations in the TERT Gene

The NCI GWAS also refined other previously identified susceptibility loci.1–3,5 The strongest evidence for association with lung cancer risk overall and with each histology group was observed for the 15q25.1 locus1–3 (nicotinic acetylcholine receptor genes), beginning with rs12914385 in CHRNA3 (MIM 118503) (OR = 1.34, 95% CI = 1.27–1.42, p = 5.24 × 10−27) (Table 2; top SNPs with p < 0.0001 in each histologic group are shown in Tables S6–S8). In the meta-analysis including the NCI GWAS and summary results from ten studies, the SNP with the lowest p value was rs1051730 in CHRNA5 (MIM 118505) (OR = 1.31, 95% CI = 1.27–1.36, p = 1.91 × 10−51), whereas rs12914385 had an OR = 1.30, 95% CI = 1.25–1.36, p = 2.75 × 10−38 (Table 2; a longer SNP list is given in Table S5). The same SNPs were also strongly associated with all major histology groups (Table 2).

There were 20 SNPs at this locus with p < 10−7 and 15 after adjustment for multiple smoking variables, suggesting that the association of these SNPs with lung cancer risk is not entirely explained by the association with smoking or that residual confounding by smoking cannot be completely ruled out. In the analysis of the NCI GWAS stratified by smoking status (Table 2), SNPs on 15q25 showed a very strong association in current and former smokers but no association in those who had never smoked, as previously observed in some,2,45 but not all,1 studies. The interaction between these SNPs and smoking status was borderline significant (p = 0.07, p = 0.07, and p = 0.04 for rs12914385, rs1051730, and rs8034191, respectively).

We conducted an analysis stratified by decade of birth (<1930, 1930–1939, 1940–1949, and >1950) to explore whether the complex changes in cigarette composition and population trends in cigarette smoking affected the association between this locus and lung cancer risk. Most subjects in our study were born in the 1930s–1940s and began smoking approximately 15–20 years after birth. Thus, smokers were affected by the progressive adoption of cigarette filters and alterations in tar, nicotine, and nitrosamine content in cigarettes that began in the 1950s.46 For the top SNPs on 15q25, we observed a trend of increasing lung cancer risk with more recent decades of birth both overall and in each histology group, with the exception of the SQ cases (Table 3). Because widespread cigarette smoking generally began a few years later in Europe than in the U.S., we stratified the analyses between American (PLCO and CPS-II) and European (EAGLE and ATBC) studies. As expected, the American studies showed a slightly stronger association (data not shown).

Table 3
Association of Selected SNPs on 15q25 Loci with Lung Cancer Risk by Decade of Birth in the NCI GWAS

One possible explanation for the increasing risk in more recent decades of birth is that the association of chromosome 15q25 SNPs with lung cancer risk is mediated in part by the increasing nitrosamine content in cigarettes over the second half of the 20th century.47,48 Carcinogenic nitrosamines are thought to bind to the nicotinic receptor,49 and carriers of the 15q25 locus are exposed to a higher internal dose of nitrosamines per cigarette than are noncarriers.50 No clear trend by decade of birth was observed for squamous cell carcinoma, possibly because this histologic type is more strongly associated with polycyclic aromatic hydrocarbons than with nitrosamines.47,48,51 However, these findings may be affected by small numbers in subgroups and should be further explored.

A recent study from the UK reported that the 6p21.33 locus is associated with lung cancer risk,5 and a suggestive association for another SNP in the region (rs4324798 on 6p22.1) had been previously shown.1 In the meta-analysis, SNPs at this locus were associated with lung cancer risk (Table S4). However, the association varied by study and was weak in the NCI GWAS and in the studies used in replicating the analysis by histology (Table S4). These SNPs are located near the major histocompatibility complex and thus could be markers for differences in population substructure.

In this study including over 30,000 subjects, no new genomic regions associated with lung cancer risk were identified. This contrasts with studies of other cancer types, such as breast, colon, and prostate cancers, which observed multiple common susceptibility loci in similar or smaller sample sizes.52–54 It is plausible that additional genetic variation for smoking-related cancers will be revealed through the investigation of interaction with tobacco smoking or structural variants. Our findings are based on subjects of European descent only; further work is necessary for the exploration of whether additional loci or different variants in the same loci are associated with lung cancer risk in distinct populations.

In conclusion, we have established that a locus on chromosome 5p15.33 is distinctly associated with risk of lung adenocarcinoma and not with the other major histologic types, providing evidence of histology-specific germ-line susceptibility for lung cancer risk. Defining the distinct hereditary contribution to histological subtypes and their interplay with tobacco smoking is a step on the pathway to the molecular characterization of cancer and could lead to improved measures for early detection or prevention.


The authors wish to thank: the participants and collaborators of EAGLE (listed on the EAGLE website); the staff of the Core Genotyping Facility—specifically Chenwei Liu, Amy Hutchinson, Aurelie Vogt, and Belynda Hicks; the National Center for Biotechnology for assistance with data cleaning; Justin Paschall and Mike Feolo for data manipulation; Teri Manolio and Bruce Weir for the GENEVA initiative; Adam Risch of Information Management Services, Inc. for database support; all participants in the CPS-II Nutrition Cohort; the LUCY-consortium (detail in Sauter et al. 200817); the KORA study group and H. Dienemann, P. Drings, and the staff at the Thoraxklinik Heidelberg for the German study; the participants of the Nord Trondelag Study and the Tromso Study in Norway; and Steve Narod and Frances Shepherd for the Canadian study. K.S., P.S., G.T., and T.R. are employees and shareholders of deCODE Genetics. A list of funding agencies can be found in the Supplemental Data.

Supplemental Data

Document S1. Supplemental Acknowledgments, Five Figures, and Four Tables
Table S5. Results of Meta-Analysis Including NCI GWAS Data and Data from Ten Additional GWASs
Table S6. Top SNPs in the Adenocarcinoma Cases from the NCI GWAS
Table S7. Top SNPs in the Squamous Cell Carcinoma Cases from the NCI GWAS
Table S8. Top SNPs in the Small Cell Carcinoma Cases from the NCI GWAS



1. Hung R.J., McKay J.D., Gaborieau V., Boffetta P., Hashibe M., Zaridze D., Mukeria A., Szeszenia-Dabrowska N., Lissowska J., Rudnai P. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature. 2008;452:633–637. [PubMed]
2. Amos C.I., Wu X., Broderick P., Gorlov I.P., Gu J., Eisen T., Dong Q., Zhang Q., Gu X., Vijayakrishnan J. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat. Genet. 2008;40:616–622. [PMC free article] [PubMed]
3. Thorgeirsson T.E., Geller F., Sulem P., Rafnar T., Wiste A., Magnusson K.P., Manolescu A., Thorleifsson G., Stefansson H., Ingason A. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature. 2008;452:638–642. [PubMed]
4. McKay J.D., Hung R.J., Gaborieau V., Boffetta P., Chabrier A., Byrnes G., Zaridze D., Mukeria A., Szeszenia-Dabrowska N., Lissowska J. Lung cancer susceptibility locus at 5p15.33. Nat. Genet. 2008;40:1404–1406. [PMC free article] [PubMed]
5. Wang Y., Broderick P., Webb E., Wu X., Vijayakrishnan J., Matakidou A., Qureshi M., Dong Q., Gu X., Chen W.V. Common 5p15.33 and 6p21.33 variants influence lung cancer risk. Nat. Genet. 2008;40:1407–1409. [PMC free article] [PubMed]
6. Rafnar T., Sulem P., Stacey S.N., Geller F., Gudmundsson J., Sigurdsson A., Jakobsdottir M., Helgadottir H., Thorlacius S., Aben K.K. Sequence variants at the TERT-CLPTM1L locus associate with many cancer types. Nat. Genet. 2009;41:221–227. [PubMed]
7. Gabrielson E. Worldwide trends in lung cancer pathology. Respirology. 2006;11:533–538. [PubMed]
8. Gao Y., Goldstein A.M., Consonni D., Pesatori A.C., Wacholder S., Tucker M.A., Caporaso N.E., Goldin L., Landi M.T. Family history of cancer and nonmalignant lung diseases as risk factors for lung cancer. Int. J. Cancer. 2009;125:146–152. [PMC free article] [PubMed]
9. Li X., Hemminki K. Inherited predisposition to early onset lung cancer according to histological type. Int. J. Cancer. 2004;112:451–457. [PubMed]
10. Ambrosone C.B., Rao U., Michalek A.M., Cummings K.M., Mettlin C.J. Lung cancer histologic types and family history of cancer. Analysis of histologic subtypes of 872 patients with primary lung cancer. Cancer. 1993;72:1192–1198. [PubMed]
11. Sellers T.A., Elston R.C., Atwood L.D., Rothschild H. Lung cancer histologic type and family history of cancer. Cancer. 1992;69:86–91. [PubMed]
12. Landi M.T., Consonni D., Rotunno M., Bergen A.W., Goldstein A.M., Lubin J.H., Goldin L., Alavanja M., Morgan G., Subar A.F. Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case-control study of lung cancer. BMC Public Health. 2008;8:e203. [PMC free article] [PubMed]
13. The ATBC Cancer Prevention Study Group. The alpha-tocopherol, beta-carotene lung cancer prevention study: design, methods, participant characteristics, and compliance. Ann. Epidemiol. 1994;4:1–10. [PubMed]
14. Hayes R.B., Sigurdson A., Moore L., Peters U., Huang W.Y., Pinsky P., Reding D., Gelmann E.P., Rothman N., Pfeiffer R.M. Methods for etiologic and early marker investigations in the PLCO trial. Mutat. Res. 2005;592:147–154. [PubMed]
15. Calle E.E., Rodriguez C., Jacobs E.J., Almon M.L., Chao A., McCullough M.L., Feigelson H.S., Thun M.J. The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer. 2002;94:2490–2501. [PubMed]
16. Power C., Elliott J. Cohort profile: 1958 British birth cohort (National Child Development Study). Int. J. Epidemiol. 2006;35:34–41. [PubMed]
17. Sauter W., Rosenberger A., Beckmann L., Kropp S., Mittelstrass K., Timofeeva M., Wolke G., Steinwachs A., Scheiner D., Meese E. Matrix metalloproteinase 1 (MMP1) is associated with early-onset lung cancer. Cancer Epidemiol. Biomarkers Prev. 2008;17:1127–1135. [PubMed]
18. Omenn G.S., Goodman G., Thornquist M., Grizzle J., Rosenstock L., Barnhart S., Balmes J., Cherniack M.G., Cullen M.R., Glass A. The beta-carotene and retinol efficacy trial (CARET) for chemoprevention of lung cancer in high risk populations: smokers and asbestos-exposed workers. Cancer Res. 1994;54(7, Suppl):2038s–2043s. [PubMed]
19. Holmen J.M.K., Kruger O., Langhammer A., Lingaas Holmen T., Bratberg G.H. The Nord-Trøndelag Health Study 1995-97 (HUNT 2): Objectives, contents, methods and participation. Norweg. J. Epidemiol. 2003;13:19–32.
20. Feyler A., Voho A., Bouchardy C., Kuokkanen K., Dayer P., Hirvonen A., Benhamou S. Point: myeloperoxidase -463G→A polymorphism and lung cancer risk. Cancer Epidemiol. Biomarkers Prev. 2002;11:1550–1554. [PubMed]
21. Nelis M., Esko T., Magi R., Zimprich F., Zimprich A., Toncheva D., Karachanak S., Pischakova T., Balascak I., Peltonen L. Genetic structure of Europeans: a view from the North-East. PLoS ONE. 2009;4:e5472. [PMC free article] [PubMed]
22. Yeager M., Orr N., Hayes R.B., Jacobs K.B., Kraft P., Wacholder S., Minichiello M.J., Fearnhead P., Yu K., Chatterjee N. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat. Genet. 2007;39:645–649. [PubMed]
23. Pritchard J.K., Stephens M., Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. [PMC free article] [PubMed]
24. Yu K., Wang Z., Li Q., Wacholder S., Hunter D.J., Hoover R.N., Chanock S., Thomas G. Population substructure and control selection in genome-wide association studies. PLoS ONE. 2008;3:e2551. [PMC free article] [PubMed]
25. The International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–796. [PubMed]
26. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. [PubMed]
27. Higgins J.P., Thompson S.G. Quantifying heterogeneity in a meta-analysis. Stat. Med. 2002;21:1539–1558. [PubMed]
28. Taylor J., Tyekucheva S., King D.C., Hardison R.C., Miller W., Chiaromonte F. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 2006;16:1596–1604. [PMC free article] [PubMed]
29. Feldser D.M., Hackett J.A., Greider C.W. Telomere dysfunction and the initiation of genome instability. Nat. Rev. Cancer. 2003;3:623–627. [PubMed]
30. Fernandez-Garcia I., Ortiz-de-Solorzano C., Montuenga L.M. Telomeres and telomerase in lung cancer. J. Thorac. Oncol. 2008;3:1085–1088. [PubMed]
31. Lantuejoul S., Soria J.C., Moro-Sibilot D., Morat L., Veyrenc S., Lorimier P., Brichon P.Y., Sabatier L., Brambilla C., Brambilla E. Differential expression of telomerase reverse transcriptase (hTERT) in lung tumours. Br. J. Cancer. 2004;90:1222–1229. [PMC free article] [PubMed]
32. Lantuejoul S., Salon C., Soria J.C., Brambilla E. Telomerase expression in lung preneoplasia and neoplasia. Int. J. Cancer. 2007;120:1835–1841. [PubMed]
33. Aviel-Ronen S., Coe B.P., Lau S.K., da Cunha S.G., Zhu C.Q., Strumpf D., Jurisica I., Lam W.L., Tsao M.S. Genomic markers for malignant progression in pulmonary adenocarcinoma with bronchioloalveolar features. Proc. Natl. Acad. Sci. USA. 2008;105:10155–10160. [PMC free article] [PubMed]
34. Choi J., Southworth L.K., Sarin K.Y., Venteicher A.S., Ma W., Chang W., Cheung P., Jun S., Artandi M.K., Shah N. TERT promotes epithelial proliferation through transcriptional control of a Myc- and Wnt-related developmental program. PLoS Genet. 2008;4:e10. [PMC free article] [PubMed]
35. Sweet-Cordero A., Tseng G.C., You H., Douglass M., Huey B., Albertson D., Jacks T. Comparison of gene expression and DNA copy number changes in a murine model of lung cancer. Genes Chromosomes Cancer. 2006;45:338–348. [PubMed]
36. Jin G., Xu L., Shu Y., Tian T., Liang J., Xu Y., Wang F., Chen J., Dai J., Hu Z. Common genetic variants on 5p15.33 contribute to risk of lung adenocarcinoma in a Chinese population. Carcinogenesis. 2009;30:987–990. [PubMed]
37. Broderick P., Wang Y., Vijayakrishnan J., Matakidou A., Spitz M.R., Eisen T., Amos C.I., Houlston R.S. Deciphering the impact of common genetic variation on lung cancer risk: a genome-wide association study. Cancer Res. 2009;69:6633–6641. [PMC free article] [PubMed]
38. Mushiroda T., Wattanapokayakit S., Takahashi A., Nukiwa T., Kudoh S., Ogura T., Taniguchi H., Kubo M., Kamatani N., Nakamura Y. A genome-wide association study identifies an association of a common variant in TERT with susceptibility to idiopathic pulmonary fibrosis. J. Med. Genet. 2008;45:654–656. [PubMed]
39. Tsakiri K.D., Cronkhite J.T., Kuan P.J., Xing C., Raghu G., Weissler J.C., Rosenblatt R.L., Shay J.W., Garcia C.K. Adult-onset pulmonary fibrosis caused by mutations in telomerase. Proc. Natl. Acad. Sci. USA. 2007;104:7552–7557. [PMC free article] [PubMed]
40. Armanios M.Y., Chen J.J., Cogan J.D., Alder J.K., Ingersoll R.G., Markin C., Lawson W.E., Xie M., Vulto I., Phillips J.A., III. Telomerase mutations in families with idiopathic pulmonary fibrosis. N. Engl. J. Med. 2007;356:1317–1326. [PubMed]
41. Wang Y., Kuan P.J., Xing C., Cronkhite J.T., Torres F., Rosenblatt R.L., DiMaio J.M., Kinch L.N., Grishin N.V., Garcia C.K. Genetic defects in surfactant protein A2 are associated with pulmonary fibrosis and lung cancer. Am. J. Hum. Genet. 2009;84:52–59. [PMC free article] [PubMed]
42. Vulliamy T.J., Marrone A., Knight S.W., Walne A., Mason P.J., Dokal I. Mutations in dyskeratosis congenita: their impact on telomere length and the diversity of clinical presentation. Blood. 2006;107:2680–2685. [PubMed]
43. Yamaguchi H., Calado R.T., Ly H., Kajigaya S., Baerlocher G.M., Chanock S.J., Lansdorp P.M., Young N.S. Mutations in TERT, the gene for telomerase reverse transcriptase, in aplastic anemia. N. Engl. J. Med. 2005;352:1413–1424. [PubMed]
44. Calado R.T., Regal J.A., Hills M., Yewdell W.T., Dalmazzo L.F., Zago M.A., Lansdorp P.M., Hogge D., Chanock S.J., Estey E.H. Constitutional hypomorphic telomerase mutations in patients with acute myeloid leukemia. Proc. Natl. Acad. Sci. USA. 2009;106:1187–1192. [PMC free article] [PubMed]
45. Spitz M.R., Amos C., Dong Q., Lin J., Wu X. The CHRNA5-A3 region on chromosome 15q24-25.1 is a risk factor both for nicotine dependence and for lung cancer. J. Natl. Cancer Inst. 2008;100:1552–1556. [PMC free article] [PubMed]
46. Hoffmann D., Djordjevic M.V., Hoffmann I. The changing cigarette. Prev. Med. 1997;26:427–434. [PubMed]
47. Hecht S.S. Tobacco smoke carcinogens and lung cancer. J. Natl. Cancer Inst. 1999;91:1194–1210. [PubMed]
48. Thun M.J., Lally C.A., Flannery J.T., Calle E.E., Flanders W.D., Heath C.W., Jr. Cigarette smoking and changes in the histopathology of lung cancer. J. Natl. Cancer Inst. 1997;89:1580–1586. [PubMed]
49. Schuller H.M., Orloff M. Tobacco-specific carcinogenic nitrosamines. Ligands for nicotinic acetylcholine receptors in human lung cancer cells. Biochem. Pharmacol. 1998;55:1377–1384. [PubMed]
50. Le Marchand L., Derby K.S., Murphy S.E., Hecht S.S., Hatsukami D., Carmella S.G., Tiirikainen M., Wang H. Smokers with the CHRNA lung cancer-associated variants are exposed to higher levels of nicotine equivalents and a carcinogenic tobacco-specific nitrosamine. Cancer Res. 2008;68:9137–9140. [PMC free article] [PubMed]
51. Le Marchand L., Sivaraman L., Pierce L., Seifried A., Lum A., Wilkens L.R., Lau A.F. Associations of CYP1A1, GSTM1, and CYP2E1 polymorphisms with lung cancer suggest cell type specificities to tobacco carcinogens. Cancer Res. 1998;58:4858–4863. [PubMed]
52. Thomas G., Jacobs K.B., Kraft P., Yeager M., Wacholder S., Cox D.G., Hankinson S.E., Hutchinson A., Wang Z., Yu K. A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nat. Genet. 2009;41:579–584. [PMC free article] [PubMed]
53. Houlston R.S., Webb E., Broderick P., Pittman A.M., Di Bernardo M.C., Lubbe S., Chandler I., Vijayakrishnan J., Sullivan K., Penegar S. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 2008;40:1426–1435. [PMC free article] [PubMed]
54. Eeles R.A., Kote-Jarai Z., Giles G.G., Olama A.A., Guy M., Jugurnauth S.K., Mulholland S., Leongamornlert D.A., Edwards S.M., Morrison J. Multiple newly identified loci associated with prostate cancer susceptibility. Nat. Genet. 2008;40:316–321. [PubMed]

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics