
Study Description

This sub-study, phs000877 Meta Analysis, contains genotype data, sequence data, and selected phenotype data for subjects available from the phs000877 study. Summary-level phenotypes for the NCI Lung Cancer Transdisciplinary Research Cohort study participants can be viewed at the top-level study page, phs000876 Lung Cancer Transdisciplinary Research Cohort. Individual-level phenotype data and molecular data for the Lung Cancer Transdisciplinary Research Cohort top-level study and all sub-studies are available by requesting Authorized Access to the NCI Lung Cancer Transdisciplinary Research Cohort phs000876 study.

The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is a part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium, and associated with the International Lung Cancer Consortium (ILCCO).

All participants provided written informed consent. All studies were reviewed and approved by institutional ethics review committees at the involved institutions.

Genome-wide association studies
The meta-analysis was based on data from six previously reported lung cancer GWAS of European populations: the MD Anderson Cancer Center lung cancer study (MDACC-GWAS) [1]; the UK lung cancer GWAS from the Institute for Cancer Research (ICR-GWAS) [2]; the NCI lung cancer GWAS (NCI-GWAS) [3]; the IARC lung cancer GWAS (IARC-GWAS) [4]; the LUCY and KORA studies from Germany; and a hospital-based case-control study from the Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital and the University of Toronto. In each of the studies, SNP genotyping had been performed using Illumina HumanHap 317, 317+240S, 370, 550, 610 or 1M arrays.

The IARC-GWAS [4] comprised 3,062 lung cancer cases and 4,455 controls derived from five case-control studies: (i) the Carotene and Retinol Efficacy Trial (CARET) cohort [5]; (ii) the Central Europe multicenter hospital-based case-control study [6,7]; (iii) the hospital-based case-control study from France [7]; (iv) the hospital-based case-control lung cancer study from Estonia [8,9]; and (v) the population-based HUNT2/Tromsø IV lung cancer studies [10]. Patient and control DNAs were derived from EDTA-venous blood samples. The lung cancer patients were classified according to ICD-O-3 (SQ: 8070/3, 8071/3, 8072/3, 8074/3; AD: 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, 8560/3, 8251/3, 8490/3, 8570/3, 8574/3), and tumours with overlapping histologies were classified as mixed. After applying standardized quality control procedures, 2,533 cases and 3,791 controls were included in the current analysis (Table 1).
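The ICD-O-3 histology grouping described above can be illustrated with a small lookup. This is only a sketch: the code lists are taken from the text, but the function name, the "other" label, and the rule that a tumour carrying codes from both lists counts as "mixed" are assumptions made for illustration, not the studies' documented procedure.

```python
# ICD-O-3 morphology codes as listed in the study description.
SQ_CODES = {"8070/3", "8071/3", "8072/3", "8074/3"}
AD_CODES = {"8140/3", "8250/3", "8260/3", "8310/3", "8480/3",
            "8560/3", "8251/3", "8490/3", "8570/3", "8574/3"}


def classify_histology(codes):
    """Return 'SQ', 'AD', 'mixed' or 'other' for a tumour's morphology codes.

    `codes` is an iterable of ICD-O-3 morphology strings such as "8070/3".
    Treating a tumour with codes from both lists as 'mixed' is an assumed
    reading of "overlapping histologies"; 'other' is a hypothetical label.
    """
    codes = set(codes)
    has_sq = bool(codes & SQ_CODES)
    has_ad = bool(codes & AD_CODES)
    if has_sq and has_ad:
        return "mixed"
    if has_sq:
        return "SQ"
    if has_ad:
        return "AD"
    return "other"
```

For example, `classify_histology(["8070/3"])` yields "SQ", while a tumour recorded with both "8070/3" and "8140/3" is labelled "mixed" under the assumption above.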

NCI-GWAS: Details of the NCI-GWAS have been previously reported. Briefly, the study comprised samples from four series: (i) The Environment and Genetics in Lung cancer Etiology (EAGLE) study, a population-based case-control study of 2,100 lung cancer cases and 2,120 healthy controls enrolled in Italy between 2002 and 2005 [11]; cancers were classified according to ICD-O, and the histology of ~10% of tumours was confirmed. (ii) The Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC), a randomized primary prevention trial of 29,133 male smokers enrolled in Finland between 1985 and 1993 [12]; ICD-O-2 and ICD-O-3 were used to classify tumours. Cases diagnosed between 1985 and 1999 had histology reviewed by at least one pathologist. After 1999, histological coding (ICD-O-2 and ICD-O-3) was derived from the Finnish Cancer Registry. (iii) The Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO), a randomized trial of 150,000 individuals enrolled in ten U.S. study centers between 1992 and 2001 [13]; ICD-O-2 was used to classify tumors, and quality assurance measures included reabstraction of 50 lung cancer diagnoses per year. (iv) The Cancer Prevention Study II Nutrition Cohort (CPS-II), a cohort study of approximately 184,000 individuals enrolled by the American Cancer Society between 1992 and 1993 in 21 U.S. states, of whom 109,379 provided a blood (36%) or buccal (64%) sample between 1998 and 2003 [14,15]. Tumour histology was abstracted by Certified Tumor Registrars and coded using WHO ICD-O-2 and ICD-O-3. Quality assurance was done by re-abstracting 10% of all cancer diagnoses per year. After initial quality control, the NCI-GWAS included 5,739 cases and 5,848 controls; however, an additional 26 cases and 112 controls were excluded due to changes in case status and further quality control filtering. The current meta-analysis included 5,713 lung cancer cases and 5,736 controls from the NCI-GWAS (Table 1).

ICR-GWAS: This comprised 1,952 cases (1,166 male; mean age at diagnosis 57 years, SD 6) with pathologically confirmed lung cancer ascertained through the Genetic Lung Cancer Predisposition Study (GELCAPS), conducted between March 1999 and July 2004 [16]. All cases were British residents and self-reported to be of European ancestry. To ensure that data and samples were collected from bona fide lung cancer cases and to avoid survivorship bias, only incident cases with histologically or cytologically (only if not AD) confirmed primary disease were ascertained. Tumours from patients were classified according to ICD-O-3; specifically, SQ: 8070/3, 8071/3, 8072/3, 8074/3; AD: 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, 8560/3, 8251/3, 8490/3, 8570/3, 8574/3; tumours with overlapping histologies were classified as mixed. Patient DNA was derived from EDTA-venous blood samples using conventional methodologies. Genotype frequencies were compared with publicly accessible data generated by the UK Wellcome Trust Case-Control Consortium 2 (WTCCC2) study [17] of individuals from the 1958 British Birth Cohort (58BC) and the blood service, typed using Illumina Human1.2M-Duo Custom_v1 Array BeadChips.

MDACC-GWAS: Cases and controls were ascertained from a case-control study at the U.T. M.D. Anderson Cancer Center conducted between 1997 and 2007 [1]. Cases were newly diagnosed patients with histologically confirmed lung cancer presenting at M.D. Anderson Cancer Center who had not previously received treatment other than surgery. Clinical and pathological data were abstracted from patient medical records, and lung cancer histology was coded according to major histological groups. As per ICD-O-2, these groups were SQ: 8070/3; AD: 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, 8251/3 and 8490/3. Only patients with predominantly or wholly AD or SQ cancers were included; those with mixed histology or unspecified lung cancers were excluded from the study. Controls were healthy individuals seen for routine care at Kelsey-Seybold Clinics in the Houston metropolitan area. Controls were frequency matched to cases according to smoking behaviour, age in 5-year categories, ethnicity, and sex. Former smoking controls were further frequency matched to former smoking cases according to the number of years since smoking cessation (in 5-year categories). After applying quality control, data were available on 1,150 cases and 1,134 controls.

HGF Germany: The HGF GWA study was made up of three independent German studies, as detailed below. In total, 506 incident lung cancer cases (LUCY study: n=305; Heidelberg lung cancer case-control study: n=201) were compared to 480 population controls (KORA surveys). After excluding individuals with missing values and potentially related individuals, 487 cases and 480 controls entered the data analysis for the TRICL meta-analysis project.

LUCY (LUng Cancer in the Young) is a multicenter study with 31 recruiting hospitals in Germany [18,19]. The study is conducted by the Institute of Epidemiology, Helmholtz Zentrum Muenchen, and the Department of Genetic Epidemiology, Medical School, University of Göttingen. The LUCY study provides access to a nationwide, population-based family sample and a case-control sample (control population KORA, described below) of lung cancer patients aged 50 years or younger at diagnosis. Detailed epidemiologic data have been collected, including data on medical history, education, family history of cancer and smoking exposure by phase assessment. Blood samples are taken, and DNA and lymphoblastoid cell lines are prepared for all cases and controls and for some of the relatives. Phenotype data of 847 young patients with primary lung cancer and 5,524 relatives have been collected.

The Heidelberg lung cancer case-control study is an ongoing hospital-based case-control study [19,20]. The German Cancer Research Center (DKFZ) has recruited over 2,000 lung cancer cases at and in collaboration with the Thoraxklinik Heidelberg, including 300 lung cancer cases with onset of disease at the age of ≤ 50. Approximately 750 hospital-based controls have also been recruited. Data on occupational exposure, tobacco smoking, educational status, and, for a subgroup, family history of lung cancer, assessed by a self-administered questionnaire, are available. Blood samples have been taken, and DNA has been extracted.

The KORA (Cooperative Health Research in the Region of Augsburg) survey is a population-based platform established by the Helmholtz Center Munich [21]. In total, four population-based health surveys were conducted between 1984/85 and 1999/2001. Overall, 18,000 participants aged between 25 and 74 years at first interview were recruited. Detailed information on demographic characteristics, medical history, history of tobacco consumption and lifetime occupation, together with biological materials, was collected for more than 16,000 probands.

The Toronto Study: The Toronto study was conducted in the Greater Toronto Area between 1997 and 2002. Cases were recruited at the hospitals in the network of the University of Toronto and the Lunenfeld-Tanenbaum Research Institute. At the time of recruitment in the clinical setting, provisional diagnoses of lung carcinoma were first assigned based on clinical criteria. Diagnoses for all included cases were histologically confirmed by the reference pathologist, a specialist in pulmonary pathology, based on review of pathology reports from surgery, biopsy or cytology samples in 100% of cases. Diagnostic classification was done initially according to ICD-9, ICD-10, and ICD-O-2, and subsequently converted to ICD-O-3. Tumors were grouped into the major categories included in this analysis according to primary cancer type based on the ICD-O-3 definitions. Controls were randomly selected from individuals visiting family medicine clinics and from Ministry of Finance Municipal Tax Tapes. All subjects were interviewed using a standard questionnaire, and information on lifestyle risk factors, occupational history, and medical and family history was collected. Blood samples were collected from more than 85% of the subjects. After applying the standardized quality control procedures and restricting to study participants with European ancestry, 331 cases and 499 controls were included in the TRICL meta-analysis (7).

Quality control of GWAS datasets: Standard quality control was performed on all scans, excluding individuals with low call rate (<90%) or extremely high or low heterozygosity (i.e. P < 1.0 × 10^-4), as well as all individuals evaluated to be of non-European ancestry (using the HapMap version 2 CEU, JPT/CHB and YRI populations as a reference; Supplementary Table 1). For apparent first-degree relative pairs, we removed the control from a case-control pair; otherwise, we excluded the individual with the lower call rate.
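The sample-level exclusions and the relative-pair rule above can be expressed as a short sketch. The thresholds (call rate below 90%, heterozygosity P below 1.0 × 10^-4) come from the text; the `Sample` record, its field names, and the assumption that the heterozygosity P-value and ancestry call have already been computed upstream are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record; field names are illustrative, not from the study.
    sample_id: str
    is_case: bool
    call_rate: float   # fraction of SNPs called, between 0 and 1
    het_p: float       # heterozygosity test P-value (assumed precomputed)
    european: bool     # ancestry assignment (assumed precomputed)


def passes_sample_qc(s, min_call_rate=0.90, het_p_min=1.0e-4):
    """Keep a sample only if it meets all three criteria from the text."""
    return s.call_rate >= min_call_rate and s.het_p >= het_p_min and s.european


def drop_from_related_pair(a, b):
    """Return the sample to exclude from an apparent first-degree pair.

    In a case-control pair the control is removed; otherwise the sample
    with the lower call rate is removed (ties fall to `b`, an arbitrary
    choice made for this sketch).
    """
    if a.is_case != b.is_case:
        return b if a.is_case else a
    return a if a.call_rate < b.call_rate else b
```

A pipeline would typically apply `passes_sample_qc` to every sample first and then resolve related pairs among the survivors; that ordering is also an assumption here.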

  • Study Design:
    • Case-Control
  • Study Type:
    • Case-Control
Study Inclusion/Exclusion Criteria

All cases had to have received a diagnosis of pathologically confirmed lung cancer. Tumors from patients were classified as adenocarcinomas (AD), squamous carcinomas (SQ), large-cell carcinomas (LCC), mixed adenosquamous carcinomas (MADSQ) and other non-small cell lung cancer (NSCLC) histologies following either the International Classification of Diseases for Oncology (ICD-O) or World Health Organization (WHO) coding. Tumors with overlapping histologies were classified as mixed.

All cases and controls were reported to be of European ancestry. During PLINK analysis, cases or controls that clustered more than 6 standard deviations from the centroid of the population were removed.
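The text does not say which PLINK procedure or which coordinates were used for the 6-standard-deviation rule. As a rough sketch of the idea only, the function below assumes each sample already has scores on a few ancestry components (for example from a multidimensional scaling or principal component analysis) and flags any sample lying more than 6 standard deviations from the population mean on any component. The function name, the input layout, and the per-component reading of "distance from the centroid" are all assumptions.

```python
import statistics


def flag_outliers(coords, n_sd=6.0):
    """Return the ids of samples more than `n_sd` SDs from the centroid.

    `coords` maps a sample id to a tuple of component scores (hypothetical
    input layout). The check is applied per component, which is one
    possible interpretation of the rule described in the text.
    """
    ids = list(coords)
    n_components = len(coords[ids[0]])
    outliers = set()
    for k in range(n_components):
        values = [coords[i][k] for i in ids]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # no spread on this component, nothing to flag
        for i in ids:
            if abs(coords[i][k] - mean) > n_sd * sd:
                outliers.add(i)
    return outliers
```

Because the outlier itself inflates the standard deviation, a single extreme sample can only exceed 6 SD when the cohort is reasonably large (several dozen samples or more), which is comfortably the case for the studies described here.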

Controls were collected at each site according to site-specific matching schemes. The M.D. Anderson Cancer Center site only collected ever-smoking cases matched to ever-smoking controls. Data on epidemiological risk factors were not available from the UK/ICR-GWAS, as its controls originated from the 1958 Birth Cohort and the Wellcome Trust Case Control Consortium, for which data on epidemiological risk factors were not collected.

Molecular Data
Type | Source | Platform | Number of Oligos/SNPs | SNP Batch Id | Comment
Whole Genome Genotyping | Illumina | HumanHap550v3.0 | 561466 | 51468 |
Whole Genome Sequencing | Complete Genomics | Assembler Version 1.2.0; File Format Version: July 2009 | N/A | N/A |
Whole Genome Genotyping | Illumina | HumanHap300v1.1 | 317503 | 33879 |
Whole Genome Genotyping | Illumina | HumanCNV370v1 | 370404 | 1047132 |
Whole Genome Genotyping | Illumina | Human610_Quadv1_B | 601273 | 1048904 |
Whole Genome Genotyping | Illumina | ILLUMINA_Human_1M | 1069796 | 52075 |
Study History

This study brings together 6 previously completed studies. The results for 6 of these studies were previously published with imputation to the HapMap 2, March 2012 release 22, while data from four of the studies (ICR-UK, IARC, NCI, and MD Anderson Cancer Center) were published with imputation to 1000 Genomes. The current release includes two additional studies, from Germany and Toronto, that have not been published as of March 2015.

Study Attribution
  • Principal Investigator
    • Christopher Amos. National Institutes of Health, Bethesda, MD, USA.
  • Funding Source
    • U19 CA148127. National Institutes of Health, Bethesda, MD, USA.