Hum Mol Genet. Jun 15, 2010; 19(12): 2507–2515.
Published online Mar 23, 2010. doi:  10.1093/hmg/ddq122
PMCID: PMC2876886

Fine scale mapping of the breast cancer 16q12 locus


Recent genome-wide association studies have identified a breast cancer susceptibility locus on 16q12 with an unknown biological basis. We used a set of single nucleotide polymorphism (SNP) markers to generate a fine-scale map and narrowed the region of association to a 133 kb DNA segment containing the largely uncharacterized hypothetical gene LOC643714, a short intergenic region and the 5′ end of TOX3. Re-sequencing this segment in European subjects identified 293 common polymorphisms, including a set of 26 highly correlated candidate causal variants. By evaluation of these SNPs in five breast cancer case–control studies involving more than 23 000 subjects from populations of European and Southeast Asian ancestry, all but 14 variants could be excluded at odds of <1:100. Most of the remaining variants lie in the intergenic region, which exhibits evolutionary conservation and open chromatin conformation, consistent with a regulatory function. African-American case–control studies exhibit a different pattern of association suggestive of an additional causative variant.


In our initial genome-wide association study (GWAS) (1) two single nucleotide polymorphisms (SNPs), rs12443621 and rs8051542, in TOX3 were significantly associated with increased risk of breast cancer. TOX3 (also called TNRC9 or CAGF9) encodes a high mobility group box nuclear protein, involved in mediating calcium-dependent transcription (3). Increased expression of TOX3 has been reported to predict breast cancer metastasis to bone (4). Frequent loss of heterozygosity of the chromosome 16q arm is observed in breast tumours; however, the location of the critical region of loss, containing a putative tumour suppressor gene, remains undefined (5). A second, independent GWAS (2), using a different SNP set, also found a significant association with a correlated SNP rs3803662, within LOC643714. This work presents the results of a strategy to identify the causative variant directly responsible for the observed associations. Towards this end we have pursued fine-scale mapping of the region using case–control studies of European, East and Southeast Asian and African-American descent. In addition, we have sought to determine whether candidate SNPs reside in regions of open chromatin conformation or are associated with differences in expression of genes close to the locus. Candidate SNPs were further evaluated with in silico examination of evolutionary sequence conservation and putative transcription factor binding sites.


The initial GWAS (1,2) identified three SNPs (rs8051542, rs12443621 and rs3803662) in 16q12 associated with increased risk of breast cancer. These SNPs resided in a 133 kb linkage disequilibrium (LD) block containing the 5′ end of TOX3 gene and the entire hypothetical gene LOC643714. Initial refinement was performed on a larger 170 kb region which included this 133 kb block as well as two additional LD blocks that together contained the remainder of the TOX3 gene. The LD blocks were delimited by inspection of D′ plots using data from the CEU population of the International HapMap database (6). Nineteen SNPs were chosen to tag the 101 common SNPs in this region listed on the International HapMap database (r2 > 0.8) using Haploview (7) (Fig. 1A). Eleven of these tags showed no association with breast cancer in 2165 cases and 2278 controls from the European SEARCH case–control study (Table 1, Supplementary Material, Table S1) leaving eight significantly associated SNPs (P-trend < 0.05) which tagged the 133 kb region (Fig. 1A). The strongest association was observed with tag SNP rs3803662, and none of the other tags, including the original GWAS hits, maintained a significant association after adjusting for this SNP. Analysis of rs3803662 and the other two SNPs identified in the GWAS, rs12443621 and rs8051542, in 21 860 cases and 22 578 controls by the Breast Cancer Association Consortium also showed that only rs3803662 was independently significant. Furthermore, haplotype analysis of rs3803662 with nine correlated SNPs (D′ ≥ 0.5) revealed multiple haplotypes, all carrying the minor allele of rs3803662 and all associated with an increased risk of breast cancer (Supplementary Material, Table S2). Taken together these analyses suggest strongly that the association is mediated through a causative variant, strongly correlated with SNP rs3803662, and thus common in European subjects.
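The distinction between D′ (used above to delimit LD blocks) and r2 (used to select tag SNPs) can be illustrated with a minimal sketch. The function name `ld_stats` and the input frequencies are hypothetical and are not taken from the study; the formulas are the standard two-locus definitions.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two allele indicators
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def is_tagged(r2, threshold=0.8):
    """Tagging criterion of the kind quoted in the text (r2 > 0.8)."""
    return r2 > threshold


# Illustrative (made-up) frequencies: allele A only ever occurs with allele B
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5)
print(d_prime, r2, is_tagged(r2))
```

With these illustrative inputs D′ is 1 while r2 is only 0.25, which is why two SNPs can sit in the same D′-defined block and still fail to tag one another at r2 > 0.8.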

Table 1.
Breast cancer associations of 19 SNPs tagging TOX3 and LOC643714
Figure 1.
(A) LD blocks (D′ in greyscale) showing the 19 variants selected to tag TOX3 and LOC643714. Highlighted SNPs are those identified in GWAS by Easton et al. (1) and Stacey et al. (2). TagSNPs significantly associated with breast cancer risk are ...

The 133 kb region was re-sequenced in 42–45 individuals of European ancestry from the Centre d'Etude du Polymorphisme Humain (CEPH) collection (https://cgwb.nci.nih.gov/). No structural rearrangements have been identified in this interval. Four hundred and twenty-three variants were identified, of which 245 had minor allele frequency (MAF) > 0.05. Twenty-five variants were well correlated (r2 > 0.5) with the best tag SNP, rs3803662 and could thus be considered strong candidates for being the causative variant (Table 2, Supplementary Material, Table S3). Recently, the 1000 Genomes Project (http://www.1000genomes.org) released sequence data from 57 CEPH individuals. A comparison of the SNPs identified by this project and our own work is shown in Supplementary Material, Table S4. The 1000 Genomes Project identified 108 novel variants in this 133 kb region, bringing the total number of variants to 531, of which 293 had MAF ≥ 0.05. One of these newly discovered variants was also well correlated (r2 > 0.5) with our best tag SNP, rs3803662, increasing the number of candidate causative variants to 26 (Fig. 1B, Table 2). All other confirmed SNPs identified by the 1000 Genomes Project which were correlated with rs3803662 (0.2 ≤ r2 < 0.5) were better tagged by one of our other tagSNPs (which were no longer significant after adjusting for rs3803662, as discussed above) and thus were not investigated further.

Table 2.
Likelihood ratios for 26 variants identified as candidate causative variants

We hypothesized that the weaker LD between candidate causative variants in Asian and African-American populations (Fig. 2, Supplementary Material, Table S5) would increase the power to eliminate candidates, thereby improving resolution to locate causative alleles. Thus, we aimed to genotype these 26 candidates in 27 578 subjects from case–control studies of European, Asian and African-American ancestry (Supplementary Material, Fig. S1, Table S8 and Methods). Four of the variants could not be genotyped using high-throughput techniques, and so genotypes for these were determined by bidirectional sequencing of a subset of subjects, followed by imputation in the remaining European and Asian subjects (see Materials and Methods).

Figure 2.
Haplotype blocks of candidate SNPs genotyped by TaqMan in three study populations. Twenty-one SNPs all highly associated with breast cancer risk were genotyped in the SEARCH UK study as well as in the Seoul Breast Cancer Project. All but rs8045285 were ...

As anticipated, all 26 SNPs were significantly associated with breast cancer in the European studies (P-value < 10−8). Sixteen of these 26 (SNPs A, B, G, J, L-Q, S-U, W-Z) were also significantly associated (P-value < 0.05) in the Asian studies and allelic risks were in the same direction (Supplementary Material, Fig. S1). In the African-American studies, however, the significantly associated SNPs (P-value < 0.05) had effects in the opposite direction—the risk allele in Europeans and Asians was protective in African-Americans and vice-versa (Supplementary Material, Fig. S1). A similar phenomenon was described in Stacey et al. (2) for SNP rs3803662 in a subset of the African-American samples from the Multiethnic Cohort (MEC) Study, also included in this present study. The opposite direction of effect for SNPs in African-Americans was further explored by haplotype analysis of the genotyped candidate SNPs in the three ethnic groups (Table 3). In Europeans there are just two common haplotypes: haplotype #1, containing all the non-risk associated alleles, and haplotype #2, containing all the risk alleles, which has 1.3-fold increased risk relative to haplotype #1. Asians exhibit the same two haplotypes, with similar risks, as well as two further common haplotypes, #5 and #6. In contrast in African-Americans, haplotype #2, which is relatively uncommon, appears to be associated with similar risk to haplotype #1, although the confidence limits on this are wide. The second most common African-American haplotype is #5, which is associated with a clear reduction in risk relative to #1. Taken together, these results are consistent with the presence of a single, common causative variant in both the European and Asian populations, but suggest a different pattern of association in African-Americans. This different pattern in African-Americans meant these data could not be used to refine the mapping of the putative causative variant in Europeans and Asians.

Table 3.
Haplotype-specific breast cancer risks by ethnic group

On the basis of the assumption that there is a single disease-causing allele in European and Asian subjects, the likelihood of each candidate being causative was estimated (Table 2, Supplementary Material, Table S3). We computed a likelihood ratio for each candidate compared with the most strongly associated SNP rs4784227. Twelve of the variants had likelihood ratios 100-fold worse than rs4784227 and so could reasonably be excluded from further consideration. The 14 remaining SNPs span three distinct, potentially functional, genetic elements. Three candidates (A, B and J) reside within intron 1 of TOX3; eight (M, N, O, P, Q, S, T and U) are clustered within a 3 kb segment of the intergenic region; and the remaining three lie within LOC643714: (W) is a synonymous change in a Ser residue encoded by putative exon 4, (X) is in putative intron 3 and the current strongest candidate (Y, rs4784227) is in putative intron 2 (Fig. 1B and Table 2).
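The exclusion rule described above can be sketched as follows, assuming each candidate has a maximized log-likelihood under the model in which it is the single causal variant. The function name, the SNP labels and the log-likelihood values are hypothetical placeholders, not results from the study, and the treatment of a ratio of exactly 1:100 is an assumption.

```python
import math


def exclude_candidates(log_likelihoods, threshold=100.0):
    """Split candidates into retained and excluded sets.

    log_likelihoods : dict mapping SNP label -> maximized log-likelihood
    threshold       : a candidate is excluded when its likelihood is more
                      than `threshold`-fold lower than that of the best SNP
    """
    best = max(log_likelihoods, key=log_likelihoods.get)
    best_ll = log_likelihoods[best]
    # Likelihood ratio of each candidate relative to the best candidate
    ratios = {snp: math.exp(ll - best_ll) for snp, ll in log_likelihoods.items()}
    retained = sorted(s for s, r in ratios.items() if r >= 1.0 / threshold)
    excluded = sorted(s for s, r in ratios.items() if r < 1.0 / threshold)
    return best, ratios, retained, excluded


# Made-up log-likelihoods for three hypothetical candidates
example = {"snp1": -1000.0, "snp2": -1002.0, "snp3": -1006.0}
best, ratios, retained, excluded = exclude_candidates(example)
print(best, retained, excluded)
```

Working on the log scale avoids numerical underflow, since the likelihoods themselves are extremely small for studies of this size.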

We have used several approaches to further elucidate the likely functionality of the remaining candidates. We examined the chromatin conformation around each candidate using hypersensitivity to DNase I digestion in breast, prostate and colon cell lines as well as in primary human T-cells. There is a region of open chromatin conformation in the non-coding region between TOX3 and LOC643714 (Fig. 1B and Supplementary Material, Table S3), and examination of this region at higher resolution indicates that five candidates (SNPs M, N, O, P and Q) are in highly accessible chromatin (Supplementary Material, Fig. S2). The DNA sequence of this intergenic region is also highly conserved across mammalian species (Fig. 1B). Furthermore bioinformatic analysis using PReMod (http://genomequebec.mcgill.ca/PReMod/) identified two regulatory modules overlapping three of our candidates (SNPs M, N & O, Supplementary Material, Fig. S2) in open chromatin conformation as well as a third overlying SNPs W and X, but not in open chromatin. We also used multiple search algorithms to predict transcription factor binding sites containing the candidate causative SNPs (see Materials and Methods). Three of the predicted transcription factors binding to sites overlying SNPs N, O and Q also have the potential to interact with transcription factors predicted to be part of the two PReMod regulatory modules in open chromatin conformation (Supplementary Material, Table S6). The risk associated (T) allele of our current strongest candidate after genetic mapping, SNP Y (rs4784227), is predicted to create a C/EBP alpha binding site; however, this SNP resides in closed chromatin conformation.

A plausible hypothesis from the above findings is that the causative variant(s) may regulate gene expression. We first explored the possibility that the risk associated SNPs may alter expression levels of either TOX3 or LOC643714. TOX3 is expressed in both normal breast and breast tumour cells at similar levels (data not shown). We therefore examined the association between tag SNP rs3803662 genotype and TOX3 mRNA levels in 38 normal breast samples and in 77 breast tumours, and found no significant associations (regression P-trend = 0.83 and 0.66, respectively). Two predicted transcripts of LOC643714 (Ensembl:ENSESTT00000054674 and ENSESTT00000054675) have been detected, at very low levels, in the breast cancer cell line SUM190PT (data not shown), but levels were negligible in both normal breast and breast tumours, so that similar association studies with LOC643714 mRNA were not possible. Thus, as yet, we have no convincing evidence that either of these genes is regulated by the putative breast cancer association variant within this locus (Table 4).

Table 4.
Association of SNP rs3803662 with TOX3 and RBL2 mRNA levels

Working on the principle that regulatory variants may alter expression of distant genes in cis (8), we tested whether the 16q12 locus altered expression of surrounding genes (both confirmed and hypothetical) using expression data, where available, from lymphocytes (8,9) and breast tumours (Supplementary Material, Table S7). Expression of both TOX3 and LOC643714 is negligible in lymphocytes so could not be reliably assessed using these data. Of the 11 genes lying within 1 Mb of SNP rs3803662 with appropriate expression data, significant association with genotype was observed only with mRNA levels from RBL2 (Retinoblastoma-like gene 2, Supplementary Material, Fig. S3 and Table S7). Dose-dependent associations of the breast cancer risk allele with increasing levels of RBL2 mRNA were observed in lymphocytes from 210 HapMap subjects (P-trend = 0.01). TagSNPs across the 16q12 locus show moderate correlations between their associations with breast cancer risk and associations with RBL2 expression in HapMap samples (Supplementary Material, Fig. S4 and Table S8). However, no similar significant association of rs3803662 and RBL2 levels was observed in 77 breast tumours (P-value = 0.8), although this tumour set had limited power to detect such an association. RBL2, a member of the retinoblastoma (Rb) gene family (10), is involved in cell cycle regulation and is frequently deleted in breast tumours (11).


This study serves to illustrate the complexities of identifying causal disease-susceptibility variants, even within a locus with very clear evidence of association. Using genetic epidemiology, we have been able to reduce the 293 common variants (MAF ≥ 0.05) found by re-sequencing to 14 strong candidates. Larger Asian case–control studies, when available, may eliminate more. However, four of these candidates (N, P, T and U) are too strongly correlated (r2 > 0.96) in both European and Asian studies, to be eliminated by epidemiological studies. The African-American data, with a different pattern of association, further adds to this complexity. It also remains possible that the causal variant we are seeking was not detected during re-sequencing.

The pattern of association in African-Americans, markedly different from that in Europeans and Asians, is puzzling. It is possible that the observed inverted allelic effect is a chance finding due to a lack of power. Indeed, the African-American studies are the smallest studies utilized in this analysis, and the ORs for SNPs differed across studies, as may be seen in Supplementary Material, Figure S1. Furthermore, African-Americans are of mixed ethnicity, and it has not been possible to assess the ancestral composition of the study subjects. It is thus possible that admixture has influenced these results, causing SNP allele frequencies in cases to differ from those in controls simply because the proportions of European and African ancestries differ between cases and controls. Our findings could also be explained by the existence of an additional risk variant, carried on a subset of haplotype #1 in the African-American population in addition to a variant shared by all populations. This possibility is supported by the fact that four of the remaining variants (A, J, Q and Y), including the current strongest candidate (Y, rs4784227), share the same risk allele in all three ethnic groups, although these associations did not reach statistical significance in the African-American case–control studies. Therefore, analysis of additional African and African-American studies will be necessary to clarify our findings. If these findings can be replicated, further re-sequencing and genotyping in African-American studies will be required to determine whether an additional risk variant underlies the different pattern of association.

The 14 remaining variants are all strong candidates for being causally important for breast cancer risk. We have explored these SNPs further using analysis of (i) chromatin conformation, (ii) evolutionary conservation and (iii) transcription factor binding site motifs. It is important to note that although rs4784227, in LOC643714, is the most significant SNP of those we tested, the other 13 remaining candidates could not be excluded at 100:1 odds, and any one of these may be the causative variant that we are seeking. The three analyses, listed above, are not definitive but they hint that the causative variant could be one of the five candidates (SNPs M, N, O, P and Q) located within open chromatin in the conserved, intergenic region.

We have, additionally, used the currently available data to search for evidence that this locus may regulate expression levels of neighbouring genes. In this analysis, we focused on the association of gene mRNA levels with SNP rs3803662, the best initial tagSNP of the locus and strongly correlated (r2 > 0.8) with the other remaining candidates in Europeans. It is unlikely that any of the other 13 SNP associations with expression would differ substantially from those observed for rs3803662 in breast tissues from European subjects; association of rs4784227 with RBL2 expression in the HapMap lymphoblastoid cell lines gave similar findings to rs3803662 (data not shown). These limited data have raised the intriguing hypothesis that this breast cancer locus might act via regulation of the RBL2 gene. However, this will need confirmation in larger datasets when they become available, as the risk of a false-positive finding is quite high, given the number of mRNAs examined in the existing small sample sets. An alternative hypothesis, that this locus regulates TOX3 and/or LOC643714, would be highly plausible, but this is not apparent from breast tissue or lymphocyte expression levels, perhaps because the relevant transcript or time-point was not examined. It also remains possible that this locus could regulate more distant genes in cis or even in trans.

Our combined evidence thus indicates a likely gene-regulatory function for this locus, but the gene or genes under regulation are not easily identified. Other tests of function will be required to evaluate the 14 variants that remain candidates after exhaustive evaluation by epidemiological studies.


Study populations

Initial associations were detected in the SEARCH breast cancer study, a population-based study in East Anglia (12). Eight additional studies were included in this fine-scale mapping work, all containing cases diagnosed with invasive breast cancer and cancer-free controls (see Supplementary Material, Methods). Briefly, there were 7536/7710 cases/controls of European ethnicity from the SEARCH (6704/6840) and KARBAC (832/870) studies; 4268/3868 cases/controls of Asian ethnicity from the MEC (Japanese 447/394), LAABC (Japanese 447/394, Chinese 263/375, Filipino 304/297), SEBCS (2159/1548) and TBCS (920/940) studies; and 1398/1323 cases/controls of African-American ethnicity from the CARE-5 Cities (452/435), MEC (428/654) and LIFE combined with CARE-LA (518/234) studies.


In order to create a full catalogue of common SNPs (MAF ≥ 0.05), DNA samples from CEPH individuals were sequenced across the 133 kb region of linkage. Two hundred and sixty-four overlapping PCR amplicons were designed from positions 51 074 000 to 51 206 999 of chromosome 16 (average amplicon size 660 bp, 160 bp overlap). M13-tagged PCR products were bidirectionally sequenced using Big Dye 3.0 (Applied Biosystems) and processed using automated trace analysis through the Cancer Genome Workbench (cgwb.nci.nih.gov). The sequencing was done in two stages, with 108 kb in the first stage sequenced on 45 CEPH subjects and 25 kb in the second on 42 of the same individuals. In the first stage, 67% of nucleotides across the region could be scored for polymorphisms in at least 80% of subjects. In the second stage, 93% of nucleotides could be scored for polymorphisms in at least 80% of 43 subjects. This gave a >98% probability of detecting a variant with an MAF > 5%. A total of 423 variants were identified in this region with 245 at MAF ≥ 5%. SNP data from 57 CEPH individuals sequenced through the 1000 Genomes Project (http://www.1000genomes.org) identified an additional 48 variants with MAF ≥ 5%. A comparison of the SNPs identified by these two different sources is shown in Supplementary Material, Table S4.
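How the probability of detecting a variant depends on the number of subjects successfully sequenced can be sketched under simple assumptions. The calculation below treats each subject as contributing two independent chromosomes and asks for the probability of seeing the minor allele at least once. This is a common approximation and is not necessarily the exact calculation used in the study; the function name is hypothetical.

```python
def detection_probability(maf, n_subjects):
    """Probability that a variant with minor allele frequency `maf`
    is observed at least once among 2 * n_subjects chromosomes,
    assuming independent sampling of chromosomes."""
    chromosomes = 2 * n_subjects
    return 1.0 - (1.0 - maf) ** chromosomes


# Compare full coverage of 45 subjects with 80% of them (36 subjects)
for n in (36, 45):
    print(n, round(detection_probability(0.05, n), 4))
```

Under these assumptions the detection probability for a 5% variant is roughly 97.5% with 36 scored subjects and roughly 99% with 45, which brackets the figure quoted above.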


For 22 of the 26 candidate SNPs, genotyping was performed in individual centres by TaqMan 5′ nuclease assay on 10 ng template DNA in a 384 well format containing No Template Controls and duplicate samples in each plate to ensure quality control. Genotypes were determined using the ABI PRISM 7900HT Sequence Detection System according to the manufacturer's instructions. Primers and probes were obtained from Applied Biosystems (http://www.appliedbiosystems.com/) as Assays-by-Design (distributed by one centre).

For four variants, which could not be assayed with TaqMan, genotyping was performed via bidirectional sequencing. These four variants included an insertion/deletion (rs45538731 [O], also known as rs45492607 and rs1362549) and three SNPs that failed to design as Assays-by-Design (rs35668161 [B], rs12600239 [C] and rs7500427 [D]). PCR conditions were: 95°C for 10 min, 40 cycles of 94°C for 30 s, 60°C (or 64°C) for 30 s, 72°C for 45 s and a final extension step at 72°C for 10 min. For up to 46 SEARCH samples and 69 Korean samples, 5–10 ng genomic DNA were used in 5–10 µl PCR reactions. PCR products were treated using the ExoSAP-IT method (USB Corporation, Cleveland, OH, USA) and bidirectional sequencing performed using BigDye Terminator v3.1 Cycle Sequencing Kits (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. Samples were run on a 3100 Genetic Analyzer (ABI) using Run 3100 Data Collection v2.0 and Sequencing Analysis Version 5.1.1 software.

DNase I hypersensitivity

Three breast cell lines, MCF-7, PMC42 and MDA231, two prostate lines, LnCapC4b and RWPE-1 and one colon cancer cell line, HCT116, were obtained from the Cambridge Research Institute (CRI) culture collection. All cell lines were maintained in RPMI with 10% foetal calf serum, except RWPE-1 which was maintained in keratinocyte serum free medium (Gibco, UK) supplemented with 0.05 mg/ml bovine pituitary extract and 5 ng/ml epidermal growth factor (both from Sigma, UK). Primary human T-cells were isolated from filters obtained from the Cambridge Blood Transfusion Service (REC reference number: 04/Q0108/21). Filters were flushed and cells purified using MACS® Separators (Miltenyi Biotec) and cultured for 24 h in RPMI medium supplemented with 20% foetal calf serum and 2% PHA-M (Sigma, UK). Cells were harvested while in the exponential growth phase. DNase I hypersensitivity experiments were carried out as described in Follows et al. (13) with amendments as published (14). The array data were corrected using Loess normalization and analyzed by the CRI Computational Biology group using ACME (15), with a 95% cut-off and a sliding window size of 500 bp. For each cell line at least two hybridizations were carried out. Results obtained with cell lines from the same tissue were averaged. The data were visualized using the Affymetrix Integrated Genome Browser.

Transcription factor binding sites searches

Searches were performed for the 100 base-pair surrounding sequences of the remaining candidate causal SNPs in the following transcription factor motif search engines: AliBaba 2.1 (16) (http://darwin.nmsu.edu/~molb470/fall2003/Projects/solorz/aliBaba_2_1.htm), TFSEARCH (17) (http://www.cbrc.jp/research/db/TFSEARCH.html) and Genomatix (18) (http://www.genomatix.de/). Scores >0.85 using at least two search engines were considered indicative of genuine prediction. Potential regulatory modules were identified using a fourth program, PReMod (19,20) (http://genomequebec.mcgill.ca/PReMod), which predicts regulatory regions on the basis of the clustering of TF binding sites. The potential of SNP binding proteins to interact with these modules was assessed using BioGRID, General Repository for Interaction Datasets (http://www.thebiogrid.org).
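The consensus rule described above (a score >0.85 from at least two search engines) can be sketched as a simple filter. The tuple layout, the function name, the factor labels `TF_A` and `TF_B` and the scores are hypothetical; the sketch also assumes that scores from the different engines have been placed on a comparable 0 to 1 scale, which the text does not specify.

```python
from collections import defaultdict


def consensus_factors(predictions, min_score=0.85, min_engines=2):
    """Return factors predicted with score > min_score by at least
    `min_engines` distinct search engines.

    predictions : iterable of (engine, factor, score) tuples
    """
    engines_by_factor = defaultdict(set)
    for engine, factor, score in predictions:
        if score > min_score:
            engines_by_factor[factor].add(engine)
    return sorted(
        factor
        for factor, engines in engines_by_factor.items()
        if len(engines) >= min_engines
    )


# Made-up predictions for one SNP-containing sequence
example = [
    ("AliBaba", "TF_A", 0.91),
    ("TFSEARCH", "TF_A", 0.88),
    ("Genomatix", "TF_A", 0.80),
    ("TFSEARCH", "TF_B", 0.95),
    ("Genomatix", "TF_B", 0.85),  # not strictly above the threshold
]
print(consensus_factors(example))
```

Using a set of engine names per factor ensures that several hits from the same engine are not counted as independent support.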

Expression analyses

DNA from 77 breast tumours from the Nottingham City Cohort (21), and 38 normal breast samples were genotyped for SNP rs3803662 using fluorescent 5′ exonuclease assay (TaqMan, Applied Biosystems). Normal breast tissue was collected at Addenbrooke's Hospital, from women undergoing aesthetic surgery, for reasons not related to cancer. The samples were analysed by a histopathologist, to ensure that they were free of dysplasia. Ethical approval was obtained for the collection and research use of all blood and breast samples used in this study.

Analysis of comparative TOX3 expression was performed on total RNA from a subset of 11 breast tumour and 12 normal breast samples. cDNA was prepared with the TaqMan Reverse Transcription Reagents kit (Applied Biosystems) using random hexamers, according to the manufacturer's instructions. Expression levels were determined using TaqMan Gene Expression assay Hs00300355_m1 (Applied Biosystems) for TOX3 and primers specific to LOC643714 predicted transcript ENSESTT00000054674 (Ensembl database) with SYBRgreen mix (ABI), and normalized to two different housekeeping genes. All samples were run in triplicate.

Associations between TOX3 expression and rs3803662 genotype were assessed using linear regression. Expression levels of the 38 normal breast samples were determined using TaqMan Gene Expression assay Hs00300355_m1 (Applied Biosystems) for TOX3. Microarray expression data for the breast tumours were available using the Illumina platform (22). For analyses involving breast tumours, we incorporated in the regression model a covariate for copy number based on array-comparative genome hybridization data (21) using the CGH probe closest to each gene expression probe location.

The relationship between SNP genotype and gene expression was also analysed using publicly available expression data generated from Epstein-Barr virus-transformed lymphoblastoid cell lines (8,9).

Statistical methods

Each of the 22 SNPs genotyped by TaqMan was assessed for association with disease status using a likelihood ratio test. Subjects missing more than 25% of the genotyped variants were excluded from analyses. Per-allele odds ratios (ORs) and confidence intervals (CIs) were estimated by logistic regression stratified by study centre using Intercooled Stata version 8.2. Some SNPs were not genotyped by all study centres (see Supplementary Material, Fig. S1), and genotypes of these SNPs were imputed (see below).
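A per-allele OR of the kind described above can be sketched with a small logistic regression fitted by Newton-Raphson on grouped genotype counts. This is an unstratified toy version for a single study: the analyses in the text were stratified by study centre and run in Stata, which this sketch does not reproduce. The function name and the counts are hypothetical, and the confidence interval is a Wald interval.

```python
import math


def per_allele_or(cases, controls, n_iter=25):
    """Fit logit P(case) = b0 + b1 * x, where x is the number of copies
    (0, 1, 2) of the allele of interest, and return (OR, lower, upper)
    with a 95% Wald confidence interval.

    cases, controls : sequences of three counts indexed by x
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x in (0, 1, 2):
            n = cases[x] + controls[x]
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = cases[x] - n * p      # score contribution
            w = n * p * (1.0 - p)         # information weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 * gradient (2 x 2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se = math.sqrt(h00 / det)  # standard error of b1 from the inverse information
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Made-up counts in which the odds of being a case double with each allele
or_, lower, upper = per_allele_or(cases=[100, 200, 400], controls=[100, 100, 100])
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

The counts are constructed so that the log-odds are exactly linear in allele count, so the fitted per-allele OR is 2.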

Sampling weights were developed for the CARE-5 Cities data to account for the non-random selection of subjects from the study population described in greater detail in (23). Estimates were very similar using both weighted and unweighted analysis (data not shown). For simplicity and consistency with the fine-scale mapping analyses, unweighted analyses are presented in the main text.

Haplotype frequencies were estimated using the haplo.stats package in S-plus (24), separately for the European and Asian populations, using the data from the case–control subjects on whom the tagSNPs were typed plus the 115 individuals on whom all SNPs were typed. The haplotype frequencies were used to impute genotype probabilities for each SNP in each individual. An Expectation-maximization (EM) algorithm was then used to fit a logistic regression model allowing for uncertainty in the genotypes of the untyped SNPs and assuming that each SNP in turn was the causal variant. Thus, we calculated the likelihood that each SNP was the causal variant (1).
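The idea of estimating haplotype frequencies from unphased genotypes by EM can be illustrated with a minimal two-SNP sketch. This is a toy version only: the study used the haplo.stats package across many SNPs, and the function name and example genotypes here are hypothetical. With two biallelic SNPs, the only phase-ambiguous genotype is the double heterozygote.

```python
from collections import Counter


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies by EM.

    genotypes : list of (g1, g2), each the count (0, 1, 2) of allele 1
                at SNP 1 and SNP 2 in one individual
    Returns a dict keyed by haplotype (a1, a2) with a1, a2 in {0, 1}.
    """
    counts = Counter(genotypes)
    n_hap = 2 * len(genotypes)
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = counts.get((1, 1), 0)
    for (g1, g2), n in counts.items():
        if (g1, g2) == (1, 1):
            continue
        # Phase is unambiguous: read off the two haplotypes directly
        hap_a = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
        hap_b = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
        base[hap_a] += n
        base[hap_b] += n

    freqs = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 (cis)
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: add expected haplotype counts and renormalize
        new = dict(base)
        new[(0, 0)] += n_double_het * w
        new[(1, 1)] += n_double_het * w
        new[(0, 1)] += n_double_het * (1 - w)
        new[(1, 0)] += n_double_het * (1 - w)
        new = {h: c / n_hap for h, c in new.items()}
        delta = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if delta < tol:
            break
    return freqs


# Made-up genotypes: two SNPs in strong LD plus two double heterozygotes
example = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
print(em_haplotype_freqs(example))
```

The estimated frequencies can then be used to assign each individual a probability for each haplotype pair, which is the basis for imputing genotype probabilities at untyped SNPs.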

Haplotype analysis was conducted using an in-house program based on the TagSNPs program (25). Breast cancer risk was assessed for common haplotypes (frequencies >0.01) composed of the 20 SNPs genotyped by TaqMan. Rare haplotypes were pooled. Haplotype frequencies and subject-specific expected haplotype indicators were calculated separately for each study using the EM algorithm to account for the haplotype uncertainty given the unphased genotype data. Subjects missing >50% of genotype data were excluded from the analysis. Logistic regression was used to generate haplotype specific risks with respect to the baseline, chosen as the most common haplotype.


The genotyping and analysis of this study, and the conduct of the SEARCH study, was funded by Cancer Research UK grants (C20/A3084, C1287/A10118, C490/A11021, C8197/A10123, C1287/A7497, C8197/A10865) and COGS EU FP7 Health-F2-2009-223175. A.M.D., P.D.P. and H.F. were supported by Cancer Research UK. This research was supported in part by the Intramural Research Programs of the National Cancer Institute and National Human Genome Research Institute, NIH, U.S. Department of Health and Human Services. M.U. was supported by the NIH-Oxford/Cambridge PhD program. The MEC Study was supported by the US National Cancer Institute (CA 54281, CA 63464, CA132839). The LAABC study was supported by the California Breast Cancer Research Program (1RB-0287, 3PB-0102, 5PB-008, 10PB-0098). The CARE study was supported by the National Institute of Child Health and Human Development, with additional support from the National Cancer Institute, through contracts with Emory University (N01 HD 3-3168), Fred Hutchinson Cancer Research Center (N01 HD 2-3166), Karmanos Cancer Institute at Wayne State University (N01 HD 3-3174), University of Pennsylvania (N01 HD 3-3176), University of Southern California (N01 HD 3-3175) and through an intra-agency agreement with the Centers for Disease Control and Prevention (Y01 HD 7022). General support through SEER contracts [N01-PC-67006 (Atlanta), N01-CN-65064 (Detroit), N01-CN-67010 (Los Angeles) and N01-CN-0532 (Seattle)] are also acknowledged. B.A.J.P. is Li Ka Shing Professor of Oncology and we acknowledge Hutchison Whampoa Limited. KARBAC thank the Swedish Cancer Society, The Jubilee Foundation, and Bert von Kantzow foundation. The Seoul Breast Cancer Project is supported by the Ministry of Health & Welfare, ROK (01-PJ3-PG6-01GN07-0004).



The authors thank all the women who participated in this research as well as Caroline Baynes, Don Conroy, Craig Luccarini, Hannah Munday and Mitul Shah for work within the SEARCH study.

Conflict of Interest statement. None declared.


1. Easton D.F., Pooley K.A., Dunning A.M., Pharoah P.D., Thompson D., Ballinger D.G., Struewing J.P., Morrison J., Field H., Luben R., et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447:1087–1093. doi:10.1038/nature05887.
2. Stacey S.N., Manolescu A., Sulem P., Rafnar T., Gudmundsson J., Gudjonsson S.A., Masson G., Jakobsdottir M., Thorlacius S., Helgason A., et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat. Genet. 2007;39:865–869. doi:10.1038/ng2064.
3. Yuan S.H., Qiu Z., Ghosh A. TOX3 regulates calcium-dependent transcription in neurons. Proc. Natl. Acad. Sci. USA. 2009;106:2909–2914. doi:10.1073/pnas.0805555106.
4. Smid M., Wang Y., Klijn J.G., Sieuwerts A.M., Zhang Y., Atkins D., Martens J.W., Foekens J.A. Genes associated with breast cancer metastatic to bone. J. Clin. Oncol. 2006;24:2261–2267. doi:10.1200/JCO.2005.03.8802.
5. Rakha E.A., Green A.R., Powe D.G., Roylance R., Ellis I.O. Chromosome 16 tumor-suppressor genes in breast cancer. Genes Chromosomes Cancer. 2006;45:527–535. doi:10.1002/gcc.20318.
6. The International HapMap Project. Nature. 2003;426:789–796. doi:10.1038/nature02168.
7. Barrett J.C., Fry B., Maller J., Daly M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265. doi:10.1093/bioinformatics/bth457.
8. Stranger B.E., Forrest M.S., Clark A.G., Minichiello M.J., Deutsch S., Lyle R., Hunt S., Kahl B., Antonarakis S.E., Tavare S., et al. Genome-wide associations of gene expression variation in humans. PLoS Genet. 2005;1:e78. doi:10.1371/journal.pgen.0010078.
9. Dixon A.L., Liang L., Moffatt M.F., Chen W., Heath S., Wong K.C., Taylor J., Burnett E., Gut I., Farrall M., et al. A genome-wide association study of global gene expression. Nat. Genet. 2007;39:1202–1207. doi:10.1038/ng2109.
10. Mayol X., Grana X., Baldi A., Sang N., Hu Q., Giordano A. Cloning of a new member of the retinoblastoma gene family (pRb2) which binds to the E1A transforming domain. Oncogene. 1993;8:2561–2566.
11. Naylor T.L., Greshock J., Wang Y., Colligon T., Yu Q.C., Clemmer V., Zaks T.Z., Weber B.L. High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res. 2005;7:R1186–R1198. doi:10.1186/bcr1356.
12. Lesueur F., Pharoah P.D., Laing S., Ahmed S., Jordan C., Smith P.L., Luben R., Wareham N.J., Easton D.F., Dunning A.M., et al. Allelic association of the human homologue of the mouse modifier Ptprj with breast cancer. Hum. Mol. Genet. 2005;14:2349–2356. doi:10.1093/hmg/ddi237.
13. Follows G.A., Janes M.E., Vallier L., Green A.R., Gottgens B. Real-time PCR mapping of DNaseI-hypersensitive sites using a novel ligation-mediated amplification technique. Nucleic Acids Res. 2007;35:e56. doi:10.1093/nar/gkm108.
14. Udler M.S., Meyer K.B., Pooley K.A., Karlins E., Struewing J.P., Zhang J., Doody D.R., MacArthur S., Tyrer J., Pharoah P.D., et al. FGFR2 variants and breast cancer risk: fine-scale mapping using African American studies and analysis of chromatin conformation. Hum. Mol. Genet. 2009;18:1692–1703. doi:10.1093/hmg/ddp078.
15. Scacheri P.C., Crawford G.E., Davis S. Statistics for ChIP-chip and DNase hypersensitivity experiments on NimbleGen arrays. Methods Enzymol. 2006;411:270–282. doi:10.1016/S0076-6879(06)11014-9.
16. Grabe N. AliBaba2: context specific identification of transcription factor binding sites. In Silico Biol. 2002;2:S1–S15.
17. Heinemeyer T., Wingender E., Reuter I., Hermjakob H., Kel A.E., Kel O.V., Ignatieva E.V., Ananko E.A., Podkolodnaya O.A., Kolpakov F.A., et al. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res. 1998;26:362–367. doi:10.1093/nar/26.1.362.
18. Cartharius K., Frech K., Grote K., Klocke B., Haltmeier M., Klingenhoff A., Frisch M., Bayerlein M., Werner T. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 2005;21:2933–2942. doi:10.1093/bioinformatics/bti473.
19. Blanchette M., Bataille A.R., Chen X., Poitras C., Laganiere J., Lefebvre C., Deblois G., Giguere V., Ferretti V., Bergeron D., et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 2006;16:656–668. doi:10.1101/gr.4866006.
20. Ferretti V., Poitras C., Bergeron D., Coulombe B., Robert F., Blanchette M. PReMod: a database of genome-wide mammalian cis-regulatory module predictions. Nucleic Acids Res. 2007;35:D122–D126. doi:10.1093/nar/gkl879.
21. Chin S.F., Teschendorff A.E., Marioni J.C., Wang Y., Barbosa-Morais N.L., Thorne N.P., Costa J.L., Pinder S.E., van de Wiel M.A., Green A.R., et al. High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer. Genome Biol. 2007;8:R215. doi:10.1186/gb-2007-8-10-r215.
22. Blenkiron C., Goldstein L.D., Thorne N.P., Spiteri I., Chin S.F., Dunning M.J., Barbosa-Morais N.L., Teschendorff A.E., Green A.R., Ellis I.O., et al. MicroRNA expression profiling of human breast cancer identifies new markers of tumor subtype. Genome Biol. 2007;8:R214. doi:10.1186/gb-2007-8-10-r214.
23. Malone K.E., Daling J.R., Doody D.R., Hsu L., Bernstein L., Coates R.J., Marchbanks P.A., Simon M.S., McDonald J.A., Norman S.A., et al. Prevalence and predictors of BRCA1 and BRCA2 mutations in a population-based study of breast cancer in white and black American women ages 35 to 64 years. Cancer Res. 2006;66:8297–8308. doi:10.1158/0008-5472.CAN-06-0503.
24. Schaid D.J., Rowland C.M., Tines D.E., Jacobson R.M., Poland G.A. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 2002;70:425–434. doi:10.1086/338688.
25. Stram D.O., Haiman C.A., Hirschhorn J.N., Altshuler D., Kolonel L.N., Henderson B.E., Pike M.C. Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum. Hered. 2003;55:27–36. doi:10.1159/000071807.

Articles from Human Molecular Genetics are provided here courtesy of Oxford University Press