Copyright © 2008 Kendrick et al; licensee BioMed Central Ltd.

Transcriptome analysis of mammary epithelial subpopulations identifies novel determinants of lineage commitment and cell fate

1 Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research, 237 Fulham Road, London, SW3 6JB, UK
2 Breakthrough Breast Cancer Research Unit, Guy's Hospital, London, SE1 9RT, UK

Corresponding author. Howard Kendrick: howard.kendrick/at/icr.ac.uk; Joseph L Regan: joseph.regan/at/icr.ac.uk; Fiona-Ann Magnay: fiona-ann.magnay/at/icr.ac.uk; Anita Grigoriadis: anita.grigoriadis/at/kcl.ac.uk; Costas Mitsopoulos: konstantinos.mitsopoulos/at/icr.ac.uk; Marketa Zvelebil: marketa.zvelebil/at/icr.ac.uk; Matthew J Smalley: matthew.smalley/at/icr.ac.uk

Received September 3, 2008; Accepted December 8, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background Understanding the molecular control of cell lineages and fate determination in complex tissues is key to not only understanding the developmental biology and cellular homeostasis of such tissues but also for our understanding and interpretation of the molecular pathology of diseases such as cancer. The prerequisite for such an understanding is detailed knowledge of the cell types that make up such tissues, including their comprehensive molecular characterisation. In the mammary epithelium, the bulk of the tissue is composed of three cell lineages, namely the basal/myoepithelial, luminal epithelial estrogen receptor positive and luminal epithelial estrogen receptor negative cells. However, a detailed molecular characterisation of the transcriptomic differences between these three populations has not been carried out.
Results A whole transcriptome analysis of basal/myoepithelial cells, luminal estrogen receptor negative cells and luminal estrogen receptor positive cells isolated from the virgin mouse mammary epithelium identified 861, 326 and 488 genes as highly differentially expressed in the three cell types, respectively. Network analysis of the transcriptomic data identified a subpopulation of luminal estrogen receptor negative cells with a novel potential role as non-professional immune cells. Analysis of the data for potential paracrine interacting factors showed that the basal/myoepithelial cells, remarkably, expressed over twice as many ligands and cell surface receptors as the other two populations combined. A number of transcriptional regulators were also identified that were differentially expressed between the cell lineages. One of these, Sox6, was specifically expressed in luminal estrogen receptor negative cells and functional assays confirmed that it maintained mammary epithelial cells in a differentiated luminal cell lineage. Conclusion The mouse mammary epithelium is composed of three main cell types with distinct gene expression patterns. These suggest the existence of a novel functional cell type within the gland, that the basal/myoepithelial cells are key regulators of paracrine signalling and that there is a complex network of differentially expressed transcription factors controlling mammary epithelial cell fate. These data will form the basis for understanding not only cell fate determination and cellular homeostasis in the normal mammary epithelium but also the contribution of different mammary epithelial cell types to the etiology and molecular pathology of breast disease. Background The function of complex tissues, such as the mammary epithelium, is a product of the interactions between their constituent cell types. 
In such tissues, disease states like cancer are essentially a failure of this cellular homeostasis and are characterised by insensitivity of cells to external regulatory factors and aberrant cell fate choices. Understanding the molecular regulation of the individual cell types in complex tissues is, therefore, a prerequisite for understanding disease states. Furthermore, advances in molecular pathology have demonstrated that different disease phenotypes correlate with different gene expression profiles [1]. In a complex tissue, composed of different cell types with different molecular characteristics, the gene expression profiles of different diseases may reflect the contribution of different cell types to that disease. Therefore, a detailed molecular characterisation of the cell types in a complex tissue is essential for the interpretation of the molecular pathology of its diseases. The resting adult mammary epithelium consists of two main structures, alveoli (which develop into milk-secreting lobulo-alveolar structures upon pregnancy) and ducts (which carry the milk from the lobulo-alveolar structures to the nipple) [2]. These two structures are themselves composed of two main epithelial cell layers, basal cells and luminal cells. The basal cell layer mainly consists of myoepithelial cells which contract in response to oxytocin release during lactation to force milk down the ducts to the nipple. Recent work has demonstrated that the basal cell layer also contains the mammary epithelial stem cell compartment [3-7]. The luminal cell layer has been shown to be composed of two functionally distinct lineages defined by expression of the cell surface proteins CD24 and Sca-1. 
CD24+/High Sca-1+ luminal cells express estrogen receptor alpha (ER), as well as receptors for prolactin and progesterone (the luminal ER+ compartment), while CD24+/High Sca-1- luminal cells (the luminal ER- compartment) express genes (at low levels) for milk proteins even in the virgin and likely include alveolar progenitors [5,7-9]. Although it is known that the stem cells can generate all the myoepithelial, luminal ER- and luminal ER+ daughter cell types [5], the mechanisms which control cellular homeostasis, fate determination and lineage commitment in the mammary epithelium are poorly understood. They are likely to be a product of complex interactions between cell extrinsic paracrine influences, cell intrinsic transcriptional regulators and epigenomic modifications [10]. Some progress has been made towards understanding some of the cell intrinsic factors involved. For instance, Gata3 was recently identified as a transcription factor important in specifying commitment in the general luminal lineage [11,12] and Elf5 was shown to be a specifier of alveolar cell fate [13]. A number of the cell extrinsic (paracrine) factors operating within the mammary epithelium have also been characterised, such as Wnt-4, which acts downstream of progesterone signalling in ductal side-branching [14] and the EGF-family member Amphiregulin [15,16], which is produced by ER+ cells in response to estrogen and stimulates mammary stem cell activity (most likely acting indirectly via non-epithelial cells and additional paracrine factors) [17]. The Notch signalling pathway has also been shown to be an important determinant of luminal cell fate [18,19]. However, the full extent and nature of paracrine interactions in the mammary gland, and the degree to which the different lineages contribute to them, and are defined by them, is still not fully understood. 
Gene expression patterns have been previously examined in the mouse mammary gland, either as changes in gene expression across the whole tissue in developmental timecourses [20,21] or as comparisons between the total epithelium and the mammary stroma [12]. In one report, gene expression patterns were examined in mouse luminal and myoepithelial cells purified by flow sorting as well as in stem cell enriched basal cell populations [6]. However, these stem cell gene signatures were found to be not significantly different from the myoepithelial signatures, suggesting they were derived from cell populations dominated by myoepithelial cells. The purity of the basal stem cell populations remains a persistent problem due to the difficulty of isolating pure (as opposed to enriched) stem cell fractions. A number of gene expression studies have also been carried out on human breast cells. The response of human breast epithelium to estrogen has been analysed at the gene expression level in breast cancer cell lines in vitro and as xenografts [22-24], in normal breast tissue maintained as xenografts [25] and in normal human ER+ breast cells isolated by transduction of primary breast epithelial cells with a virus carrying an estrogen response element driving GFP expression [26]. The comparative gene expression profiles of normal human myoepithelial [27,28], basal non-myoepithelial (with a cell surface phenotype CD10- CD44+) [29] and luminal epithelial cells [27-29] have also been examined, as have the gene expression profiles of different in vitro progenitor (colony-forming) subpopulations of normal breast epithelial cells [18].
However, to date no genome-wide transcriptome study has made a direct comparison between the two luminal epithelial populations (ER- and ER+) and the basal/myoepithelial cells, confounding the molecular characterisation of the luminal cells and preventing the analysis of the lineage commitment of, and interactions between, the two luminal cell types and the other cell types in the gland. The aim of this study, therefore, was to carry out the first comprehensive gene expression study which examined gene expression patterns in the three distinct mouse mammary epithelial populations, basal/myoepithelial, luminal ER- and luminal ER+. The analysis was to concentrate on three specific areas. First, characterising cell-type specific patterns of gene expression which defined cell identity. Second, establishing a broad overview of the likely extent and nature of paracrine interactions amongst the populations. Last, defining cell intrinsic factors which may be important in determining lineage commitment and cell fate in the mammary epithelium. The results of these analyses have, first, identified a novel potential function for a subpopulation of mammary luminal epithelial cells as non-professional immune cells; second, provided information on the large number of paracrine interactions yet to be fully characterised in the mammary gland and the likely complexity of their interactions; third, identified population-specific transcription factors which may have a role in lineage determination and fate specification in the mammary epithelium. In particular, we have used in vitro and in vivo functional analyses to demonstrate that Sox6 is a determinant of luminal cell fate.

Results

Identification of population-specific gene expression patterns in the virgin mouse mammary epithelium

To carry out a comprehensive, whole genome gene expression analysis of the epithelial cell populations in the virgin mammary gland (Figure 1A)
To identify genes whose expression characterised the three subpopulations, the list of genes was split into three sets on the basis of relative abundance of expression. Any gene with a relative abundance of 2 or higher in a population was considered as population-specific. If a gene was represented by more than one probe set, the average of all probe sets was used for further analysis. This analysis identified 861, 326 and 488 genes as characteristic of basal/myoepithelial cells, luminal ER- cells and luminal ER+ cells, respectively [see Additional files 4, 5, 6]. To confirm our approach, a subset of genes specific for each of the three populations, as well as some which were common to two of the populations, were selected for qPCR validation. Furthermore, a number of genes previously shown to be relevant in mammary biology were included in this analysis, as was the data collected on Krt14, Krt18 and Esr1 expression in the populations [see Additional file 1], giving 58 genes in total (Figure 3).
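The selection rule described above (average the probe sets for each gene, then keep genes whose relative abundance is 2 or higher in a population) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the population labels, the input layout and the gene names and values in the example are all hypothetical, and relative abundance is assumed to have been computed already for each probe set.

```python
from collections import defaultdict
from statistics import mean

# Population labels are illustrative names, not identifiers from the paper.
POPULATIONS = ("basal", "luminal_ER_neg", "luminal_ER_pos")
THRESHOLD = 2.0  # relative abundance cut-off stated in the text


def population_specific_genes(probe_rows, threshold=THRESHOLD):
    """Group probe sets by gene, average them, and apply the cut-off.

    probe_rows: iterable of (gene, {population: relative_abundance}),
    one entry per probe set (assumed input layout).
    """
    per_gene = defaultdict(list)
    for gene, abundances in probe_rows:
        per_gene[gene].append(abundances)

    specific = {pop: [] for pop in POPULATIONS}
    for gene, rows in sorted(per_gene.items()):
        for pop in POPULATIONS:
            # A gene with several probe sets is represented by their average.
            if mean(row[pop] for row in rows) >= threshold:
                specific[pop].append(gene)
    return specific


# Hypothetical genes and values, for illustration only.
example_rows = [
    ("GeneA", {"basal": 3.0, "luminal_ER_neg": 0.4, "luminal_ER_pos": 0.3}),
    ("GeneA", {"basal": 2.0, "luminal_ER_neg": 0.5, "luminal_ER_pos": 0.5}),
    ("GeneB", {"basal": 0.2, "luminal_ER_neg": 0.6, "luminal_ER_pos": 2.5}),
    ("GeneC", {"basal": 2.4, "luminal_ER_neg": 1.0, "luminal_ER_pos": 1.0}),
    ("GeneC", {"basal": 1.2, "luminal_ER_neg": 1.0, "luminal_ER_pos": 1.0}),
]
example_result = population_specific_genes(example_rows)
```

In this toy input, GeneC has one probe set above the cut-off and one below it; because the two are averaged first, the gene is not called population-specific.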
Comparisons with previously published datasets We compared our data with previous studies which have used a cell separation approach to isolate mouse [6] or human [27-29] mammary epithelial basal/myoepithelial and luminal subpopulations for gene expression profiling, to identify genes found in common between these datasets and our own data [see Additional files 8, 9, 10]. There was good concordance between the genes identified in our data and that of Stingl and colleagues [6] in their study of separated mouse epithelium. They identified 128 probes, corresponding to 80 well-annotated genes, significantly upregulated in their myoepithelial/mammary stem cell enriched cell fraction compared to their mammary colony forming cells (corresponding to the luminal cell fraction). Of these 80 genes, 49 (61%) were significantly enriched in our basal/myoepithelial cell dataset, three were enriched in both the basal/myoepithelial cells and the luminal ER+ cells and two were enriched in both the basal/myoepithelial cells and the luminal ER- cells [see Additional files 8 and 9]. Conversely, Stingl and colleagues identified 102 probes, corresponding to 66 well-annotated genes, significantly downregulated in their myoepithelial/mammary stem cell enriched cell fraction compared to their mammary colony forming cells. Of these, which would correspond to genes enriched in the luminal epithelial fraction, none were enriched in our basal/myoepithelial population but 28 (42%) were enriched in both our luminal populations, five were enriched only in the luminal ER- cells and four only in the luminal ER+ cells [see Additional files 8 and 10]. However, correspondence of our data with previously published datasets from separated human cells [27-29] was lower, although it tended to be better between the mouse myoepithelial and human myoepithelial data sets than between the luminal data sets from the different species [see Additional files 8, 9, 10]. 
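The overlap figures quoted above are simple proportions of a reference gene list recovered in another list. The sketch below, in Python, checks the two reported percentages from the counts given in the text and shows how such an overlap could be computed from two gene lists. The helper names and the toy gene identifiers are hypothetical and are not taken from either study.

```python
def percent_of(count, total):
    """Percentage of `total` represented by `count`, rounded to a whole number."""
    return round(100 * count / total)


def overlap_summary(reference_genes, our_genes):
    """Return (number of shared genes, percentage of the reference list recovered)."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene list is empty")
    shared = reference & set(our_genes)
    return len(shared), percent_of(len(shared), len(reference))


# Counts reported in the text.
basal_pct = percent_of(49, 80)    # 49 of 80 upregulated genes
luminal_pct = percent_of(28, 66)  # 28 of 66 downregulated genes

# Hypothetical gene identifiers, for illustration only.
toy_reference = ["g1", "g2", "g3", "g4"]
toy_ours = ["g2", "g3", "g9"]
toy_result = overlap_summary(toy_reference, toy_ours)
```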
Interaction mapping of genes differentially expressed in mammary epithelial subpopulations identifies key processes and novel functions To get a better understanding of the key biological processes occurring in each of the three cell types, we generated network interaction maps for the differentially expressed genes [see Additional files 11, 12, 13]. Interaction data derived from studies on human orthologues of the genes identified were used to create the network maps, as there is not enough data currently available purely from studies of mouse genes to make such an analysis meaningful. When a basal/myoepithelial interaction map was constructed using both the differentially expressed genes and genes interpolated by the network mapping program (which allow connections to be extended and elaborated), the resulting network was extraordinarily complex (data not shown). For ease of interpretation, therefore, a basal/myoepithelial network was constructed using only direct interactions between genes characteristic of this cell population, with no interpolations [see Additional file 11]. This network identified two major interaction 'modules' and three minor ones. Such interaction modules are indicative of important processes occurring within a cell, as they are composed of cell-type specific genes and defined by multiple interactions between those genes. The two major interaction modules can be broadly characterised as an extracellular matrix module including multiple collagen genes and a cytoskeletal module including genes for the keratins, vimentin, and genes whose protein products are involved in regulation of cell shape, movement and contractility, such as the actin binding proteins MYH10 [32] and TPM2 [33] and smooth muscle gamma actin ACTG2 [34]. The minor modules indicated that the basal/myoepithelial cell population also has important processes based around Platelet-Derived Growth Factor (PDGF), Ephrin and Insulin-Like Growth Factor (IGF1) signalling. 
Ephrins are mediators of contact-dependent communication between cells [35] whereas both PDGF and IGF1 signalling are involved in paracrine cell-cell communication [36,37]. The luminal interaction maps were built using both cell-specific genes and interpolated genes [see Additional files 12 and 13]. As a result, it was less straightforward to define cell-specific interaction modules which would indicate the key cellular processes occurring in these cell types. We therefore developed a mathematical approach to defining the modules which required the assignment of network hubs, node clusters and contiguous differentially expressed network paths. To define network hubs (nodes having multiple interactions), nodes were ranked by descending connectivity. The minimum node connectivity in the top 10% of nodes was five for either network and this was therefore set as a threshold for identifying hubs. Differentially expressed hubs for the luminal ER- and ER+ networks are listed in Tables 1 and 2 respectively. There was limited overlap between the luminal ER- and ER+ network hubs with only three differentially expressed hubs being shared (ERBB3, KRT18 and CD82). The majority of hubs in both networks had a high content of physical interactions with the exception of TNF, SP1, FAS, NFKB1, CREB1, EGR1 and SPI1, which are almost exclusively transcriptional hubs. In the luminal ER- network TLR4, LY96, ERBB3, MUC1 and CD82 were differentially expressed hubs with significant clustering character, although the strongest clustering was seen for the non-differentially expressed hubs EGFR and ERBB2. Clustering was less pronounced in the luminal ER+ network with PGR, ERBB3, FGG, FGB, CD82 and COL8A1 forming significantly clustered differentially expressed hubs. With the exception of TLR4 and LY96 in the luminal ER- network, clustering and connectivity were inversely correlated, with high connectivity nodes such as ESR, PTN, CCL5, TNF and BCL2 exhibiting very little or no clustering.
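The hub definition above (rank nodes by descending connectivity, take the minimum connectivity found in the top 10% of nodes as the threshold) can be sketched in Python as follows. This is only an illustration under stated assumptions: the network is taken to be an undirected edge list, connectivity is taken to be the number of distinct neighbours, and the function name and toy node names are hypothetical.

```python
from collections import defaultdict


def find_hubs(edges, top_percent=10):
    """Return (connectivity threshold, sorted list of hub nodes).

    edges: iterable of (node_a, node_b) pairs from an undirected network.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)

    degree = {node: len(nbrs) for node, nbrs in neighbours.items()}
    ranked = sorted(degree.values(), reverse=True)  # descending connectivity

    # Number of nodes in the top fraction (at least one node).
    n_top = max(1, (len(ranked) * top_percent) // 100)
    threshold = ranked[n_top - 1]  # minimum connectivity within that fraction

    hubs = sorted(node for node, d in degree.items() if d >= threshold)
    return threshold, hubs


# Toy network with 20 nodes: two highly connected nodes and 18 leaves.
toy_edges = [("H1", f"a{i}") for i in range(10)] + [("H2", f"b{i}") for i in range(8)]
toy_threshold, toy_hubs = find_hubs(toy_edges)
```

In the toy network the top 10% corresponds to two nodes, so the threshold is the connectivity of the second-ranked node and both highly connected nodes qualify as hubs.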
To identify modules within the networks, a three-pass approach was adopted [see Additional file 14]. Initially all differentially expressed hubs were identified and, where direct links existed between them in the network, these were used to provide the backbone for each module. In the second pass, modules were allowed to expand by the addition of differentially expressed nodes with lower connectivity (such that they did not qualify as hubs), if they were directly linked to differentially expressed hubs. In the third pass, non-differentially expressed hubs were incorporated, if they were connecting at least two differentially expressed hubs. This step allowed for bridging between modules, providing further information about their proximity and global topology within the luminal ER- and luminal ER+ networks [see Additional files 15 and 16]. Following the initial two passes, four distinct modules were established for the luminal ER- network [see Additional files 14 and 15]: the TLR (nodes TLR4, LY96 and CD14), KIT (nodes KIT, ERBB3, LYN, TEC, GRB7, MUC1 and CD82), KRT (nodes KRT18 and KRT8) and BCL2 (nodes BCL2, BIK and BNIPL) modules (Table 3). Nodes TNF, FAS, CCL5, CCL2 and RPS6KA5 were integrated in the BCL2 subnet only at the third pass through a network of transcriptional interactions, forming the TNF/FAS module [see Additional file 14]. In addition, the KRT module merged into the KIT module at this stage. The TLR and KIT modules contained predominantly physical interactions whereas the TNF/FAS module displayed a very strong transcriptional character (82%). The three modules exhibit topological proximity, defining a single subnetwork with 39 connections and 27 nodes [see Additional files 14 and 15]. Compared to the luminal ER- network overall, this subnetwork showed higher clustering and, whilst maintaining the average network connectivity, it exhibited a significantly shorter mean shortest path and a very low power exponent. 
Removing this subnetwork from the overall luminal ER- network yielded a highly fragmented graph ('non-module subnetwork') with no clustering and very low connectivity [see Additional file 17].
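The three-pass procedure described above can be sketched in Python on a toy network. This is a hedged illustration and not the authors' implementation: it assumes an undirected edge list, a precomputed set of hubs and a set of differentially expressed (DE) nodes, and it treats a module as a connected group reached only through links that touch a DE hub. All function and node names are hypothetical.

```python
from collections import defaultdict


def _components(nodes, adj, de_hubs):
    """Connected groups within `nodes`, following only links that touch a DE hub."""
    nodes = set(nodes)
    unvisited = set(nodes)
    components = []
    while unvisited:
        component, stack = set(), [min(unvisited)]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            for other in adj.get(node, set()) & nodes:
                if node in de_hubs or other in de_hubs:
                    stack.append(other)
        unvisited -= component
        components.append(frozenset(component))
    return components


def three_pass_modules(edges, de_nodes, hubs):
    """Return (modules after passes 1 and 2, modules after pass 3)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Pass 1: DE hubs form the backbone of each module.
    de_hubs = set(hubs) & set(de_nodes)
    # Pass 2: add DE nodes that are not hubs but link directly to a DE hub.
    members = de_hubs | {n for n in set(de_nodes) - set(hubs) if adj[n] & de_hubs}
    after_two = _components(members, adj, de_hubs)
    # Pass 3: add non-DE hubs that connect at least two DE hubs.
    bridges = {h for h in set(hubs) - set(de_nodes) if len(adj[h] & de_hubs) >= 2}
    after_three = _components(members | bridges, adj, de_hubs)
    return after_two, after_three


# Toy network: B bridges two modules; C touches only one DE hub and is excluded.
toy_edges = [("H1", "H2"), ("H1", "d1"), ("H3", "d2"),
             ("B", "H1"), ("B", "H3"), ("H2", "x"), ("C", "H3")]
toy_de = {"H1", "H2", "H3", "d1", "d2"}
toy_hubs = {"H1", "H2", "H3", "B", "C"}
toy_two, toy_three = three_pass_modules(toy_edges, toy_de, toy_hubs)
```

In the toy example two separate modules exist after the first two passes, and the third pass merges them through the non-differentially expressed hub B, mirroring the merging of modules at the third pass reported for the luminal ER- network.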
For the luminal ER+ network, six distinct modules were observed following the initial two passes [see Additional files 14 and 16]: the ESR (nodes ESR1, PGR, GADD45G, PTN, CRELD1, GIPC2, KRT19, MYCBP, STC2 and WFDC2), ERBB3 (nodes ERBB3, CD82 and GRB7), MLLT4 (nodes MLLT4, EPHB6 and NRXN3), COL8A1 (nodes COL8A1 and EFEMP2), KRT (nodes KRT18 and KRT8) and FGG (nodes FGG and FGB) modules (Table 3). After the third pass, modules ESR, ERBB3, KRT, COL8A1 and MLLT4 were merged into a large subnetwork with differentially expressed hubs TGFB1, AREG, MBP, CAV1, EPS15 and PCDHA4 playing an interconnecting role [see Additional file 14]. Unlike the luminal ER- network, where all modules were proximal to each other, more fragmentation was observed for the luminal ER+ network, with module FGG and differentially expressed hubs HIST1H4H, HIST1H4I and LNX1 not interconnected with the module subnetwork via any other hubs. The luminal ER+ subnetwork also had a much stronger physical interaction character than the luminal ER- subnetwork. Compared to the overall luminal ER+ network, its subnetwork showed high clustering and, whilst maintaining the average network connectivity, exhibited a significantly shorter mean shortest path. As with the luminal ER- subnetwork, removing the luminal ER+ subnetwork from the overall luminal ER+ network gave a highly fragmented graph with no clustering and very low connectivity [see Additional file 17]. The luminal ER- and luminal ER+ modules, their component nodes and potential functions are summarised in Table 3. They suggest that the major processes occurring in the luminal ER- cells are tyrosine kinase cell signalling pathways in association with cellular cytoskeletal components (the keratins) as well as passive immune signalling and inflammatory response processes. The picture in the luminal ER+ cells is less straightforward. 
There is no obvious unifying theme to the major ESR module, which leaves five minor modules involved in signal transduction, the cytoskeleton, cell adhesion and maintaining the extracellular matrix (although compared to the basal/myoepithelial cells, the role of the luminal ER+ cells in this is minor). The TLR module within the luminal ER- network was of particular interest (Figure 4)
To determine whether CD14 expression, and thus the potential to be able to respond to LPS, was a general property of all luminal ER- cells or only a subfraction, freshly isolated primary cells were stained with anti-CD24 and anti-Sca-1 antibodies, to enable the three main cell compartments to be identified, as well as with anti-CD14 and anti-CD61 (proposed as a marker of progenitor cells) [11] antibodies. The results (Figure 6A)
The luminal ER+ gene expression pattern is not an 'estrogen-responsive' gene expression signature

Upon examination of the luminal ER+ gene network, we noted that few of the genes enriched in the luminal ER+ population were directly linked to ESR1 by transcriptional interactions, with the exception of the progesterone receptor (PGR) [39] and the cytoskeletal protein keratin 19 (KRT19) [40]. This suggested that the gene signature of the luminal ER+ cells was not an 'estrogen responsive' gene signature. Rather, it was more likely to represent an underlying stable gene expression pattern characteristic of this differentiated cell population. To investigate this further, we compared lists of estrogen responsive genes reported in studies of estrogen-stimulated normal human mammary epithelial cells [25,26] and breast cancer cell lines [22-24] with our lists of genes expressed in the epithelial subpopulations. The results [see Additional file 18] confirmed that there was little correlation between 'estrogen-responsive' signatures and the genes enriched in the luminal ER+ population. Indeed, many of the genes whose expression was stimulated by estrogen in the breast cancer cell lines were found to be enriched in our basal/myoepithelial population. However, it should be noted that the well-known directly estrogen responsive genes KRT19 and PGR were not found to be upregulated in most of the published datasets.

Identification of basal/myoepithelial cells as key mediators of paracrine signalling

Having identified key processes likely to be occurring within each of the three populations, we next investigated how the populations might be interacting with each other. Mammary epithelial biology is characterised by the conversion of systemic hormone signals into local growth factor signals which stimulate stem cell proliferation and differentiation of daughter cell types.
Whilst some of these paracrine interactions have been studied in depth [15,17,41], the broad extent of paracrine networks within the mammary epithelium remains unknown. We therefore queried the gene expression array data for genes potentially involved in paracrine signalling, either as receptors or ligands (where there was a conflict between the gene expression array and qPCR data, the distribution predicted by the qPCR data was favoured). A number of genes were identified which fulfilled these criteria [see Additional file 19]. Remarkably, they showed that the basal/myoepithelial cells have more than double the complement of cell-surface receptors and ligands than either of the two luminal populations. Of particular interest, the basal/myoepithelial cells expressed the genes for the Notch family ligands, Jag1, Jag2 and Dll1 [42] whilst the gene for the Notch family receptor Notch3 [43] was expressed in both the luminal populations. Wnt family ligands were expressed by all three populations, although each cell type expressed a different complement of Wnt genes and only the basal/myoepithelial cells expressed the genes for the Frizzled receptors [44]. It is well established that Egf receptor family members and Egf family ligands are important for mammary gland development and breast cancer [45,46]. In particular, the paracrine role of Amphiregulin (Areg) is well described [15,41,47,48]. Our analysis confirmed that luminal ER+ mammary epithelial cells expressed the Areg gene but also showed that the genes for two other family members, Betacellulin (Btc) [49] and Epigen (Epgn) [50], were expressed in this cell type and Btc was also expressed in the luminal ER- cells. Interestingly, only one Egf receptor family member, Erbb3 [51], was found by gene expression array and qPCR analysis to be differentially expressed in the normal mammary epithelium. It was present in both the luminal epithelial populations but not in the basal/myoepithelial cells. 
This analysis extends previous findings of paracrine signalling within the mammary epithelium and emphasises the complexity of the interacting signalling networks. These include Wnt and Notch signalling, the Egf family, Fgf signalling, other receptor tyrosine kinases, G-protein coupled receptors, ligands for such receptors, integrins and ephrins/Eph receptors, all of which are differentially expressed between the cell populations. In particular, the numbers of ligands and receptors expressed by the basal/myoepithelial cells indicate that this population is a key mediator of the paracrine signalling networks within the mammary epithelium.

Identification of differentially regulated transcription factors within the mammary epithelium

Transcriptional regulators are key cell-intrinsic factors in lineage selection and cell fate decisions in stem – differentiated cell hierarchies [52,53]. Therefore, the gene expression array data set was interrogated to identify differentially expressed factors which may regulate transcription [see Additional file 19]. The expression of a subset of these was analysed by qPCR (Figure 6B).

In common lineage progenitors, functional mutual repression and auto-stimulation by transcription factors can facilitate bilineage cell fate decisions [54]. Once the cell fate decisions have been executed, continued function of different subsets of the factors active in the progenitors is required to maintain the different differentiated cell lineages in stable fates [53]. Thus, modelling interactions between lineage-specific transcription factors can elucidate cell fate decision processes occurring in progenitor cells. Therefore, to begin to understand how interactions between lineage specific transcription factors may influence cell fate choices in the mammary epithelium, an interaction network was built. Transcription factors identified as lineage-specific but for which no interaction data exists were excluded.
The resulting network (Figure 7)
Sox6 is a determinant of luminal cell fate

To demonstrate that the lineage specific transcription factors we identified can indeed influence cell fate and differentiation, we chose to investigate in more detail the function of Sox6 [52], the expression of which was only detectable in one population, the luminal ER- cells (Figure 6B).

For in vitro analysis, cells were cultured for one week and then harvested, GFP+ cells were separated by flow cytometry and RNA isolated from them. The expression of Sox6, Krt14, Krt18 and Esr1 was examined by qPCR and compared to expression levels in cultured primary cells which had not been transduced with virus. The data (Figure 8A)
For in vivo analysis, cells transduced with either GFP-only or Sox6-GFP virus were transplanted into cleared mouse mammary fat pads. After eight weeks, the transplanted fat pads were examined (Figure 9A).
It was not possible to determine at the wholemount level whether fat pads which did not have extensive GFP+ outgrowths contained scattered GFP-labelled cells incorporated into non-GFP epithelium. Therefore, transplanted fat pads were processed to single cells, labelled with anti-Sca-1 and anti-CD24 antibodies and analysed by flow cytometry to identify GFP+ cells and determine which of the mammary epithelial cell populations they segregated with (Figure 9B, C).

Discussion

We have described a comprehensive transcriptome analysis of three distinct mammary epithelial cell populations, basal/myoepithelial cells, estrogen receptor negative luminal epithelial cells and estrogen receptor positive luminal epithelial cells [5,8]. These data provide new support for the distinct identities of these three populations and, in particular, justify distinguishing between two major subpopulations within the luminal epithelium. We have termed these luminal ER- and luminal ER+ cells as it is expression of the estrogen receptor which makes them most easily distinguishable in tissue sections (Figure 1). Comparison of our data set with other published data sets for separated myoepithelial and luminal cells [6,26,27,29] showed good concordance between genes previously identified as enriched in mouse mammary myoepithelial and luminal cells [6]. However, there was less agreement with genes previously identified as enriched in human myoepithelial and luminal cells. A number of factors are likely to contribute to these differences. Clearly, species differences could be important. For instance, it is known that while K14 is a basal/myoepithelial cell marker and K18 a luminal epithelial cell marker throughout the mouse mammary epithelium and in the ducts of the human breast, in the Terminal Ductal Lobuloalveolar Units of the human breast, K14 can be expressed by the luminal epithelial cells [63]. Furthermore, technical differences could influence the outcome of the analyses.
In particular, it should be noted that Stingl and colleagues used the same Affymetrix platform as ourselves for their mouse analysis [6]. Finally, comparing three distinct populations against each other, rather than just two, improves the contrasts between the populations and enables more population specific genes to be detected. For example, a gene which is present in the basal/myoepithelial cells and one of the luminal populations, but not the other, may not be detected as being significantly differentially expressed when only myoepithelial and total luminal cells are compared. However, when all three populations are compared against each other, the contrast with the luminal population from which the gene is absent enables the differential expression of the gene to be detected.

A novel functional cell type in the mammary epithelium

The nature of the luminal ER- population as a discrete entity has been confirmed by its uniform staining profile with the 33A10 antibody [8]. However, this population appeared to contain within it further distinct functional cell types. Use of network interaction mapping on the transcriptomic profiles of the luminal ER- cells identified a Toll-like receptor (TLR) signalling module including genes for the three components of the bacterial lipopolysaccharide (LPS) receptor, Tlr4, CD14 and Ly96, as well as downstream transducers of Toll-like receptor signalling such as Irak2 and the pro-inflammatory cytokine Tnf, which is a TLR signalling target [38] (Figure 4).

The function of the double negative cells remains unclear. However, given that the expression of all components of the LPS receptor can be found in the luminal ER- population, it is likely that the CD14+ cells have a distinct function within the mammary epithelium as non-professional immune cells. Note that CD45+ cells were excluded from this analysis, so these are unlikely to be contaminating haematopoietic cells.
Milk is an excellent growth medium for bacteria and it would be evolutionarily advantageous to have a cell type present in the breast capable of indicating the presence of bacterial contamination through Toll-like receptor signalling pathways. However, it is also likely that over-stimulation of this pathway in CD14+ mammary epithelial cells would lead to serious inflammation and would therefore be the cause of mastitis.

The presence of the distinct subpopulations within the luminal ER- cells indicates that the gene expression profile of this population is derived from a mixture of different cell types. However, as the luminal ER- cells do all share at least one distinctive marker (high levels of expression of the epitope bound by the 33A10 antibody) [8] it is likely that the luminal ER- profile includes genes whose expression is common to all the cells of this subpopulation, as well as genes expressed in subsets of its cells. The basal/myoepithelial cell population is also a mixed population. However, > 90% of these cells express both keratin 14 and α-isoform smooth muscle actin and are thus differentiated myoepithelial cells [5]. Therefore, the gene expression profile of this population is largely a myoepithelial cell profile. The luminal ER+ population is also unlikely to represent a completely uniform cell population but as the majority of the cells are strongly keratin 18 positive and express the estrogen receptor [5], the gene expression profile of this population will be dominated by this single cell type.

Myoepithelial cells are key regulators of paracrine signalling

Transcriptome and network interaction analysis of the basal/myoepithelial cells identified key processes of these cells as cytoskeletal function and extracellular matrix production and interactions.
Given what is known about the contractile function of these cells in lactation and their position in the mammary gland between the luminal epithelium and the basement membrane around the ducts and lobulo-alveolar structures, these results were reassuring. However, what was remarkable was that the genes characteristic of this population included more than double the number of genes for proteins involved in cell-cell signalling than were characteristic of the two luminal populations combined. This suggests a key role for myoepithelial cells in mediating paracrine and juxtacrine cell-cell interactions in the mammary epithelium. Of particular interest were the expression patterns of Wnt and Notch signalling pathway components, known to be important in mammary gland development [64,65], which suggested a directionality of Notch signalling from basal to luminal and a directionality of Wnt signalling from luminal to basal. Notably, Notch signalling was recently shown to be important for determining luminal cell fate [19], possibly through regulation of luminal progenitors [18]. Activated Wnt signalling has also been shown to increase the stem/progenitor fraction of basal mammary epithelial cells in MMTV-Wnt1 transgenic mice [3]. Another gene from a family important for mammary development that, like Notch3, was expressed in both the luminal populations but not the basal/myoepithelial cells was Erbb3. A member of the epidermal growth factor receptor family, Erbb3 most effectively binds the neuregulins Nrg1 and Nrg2, but it has no intrinsic signalling activity of its own. It must therefore operate as a heterodimer with other family members. However, signalling complexes containing Erbb3 have a strong propensity to activate the phosphoinositide-3-kinase (PI3K) signalling pathway due to the presence of six binding sites for the p85 SH3 adaptor subunit of PI3K [51].
Erbb3 knockout animals were embryonic lethal but reduction of Erbb3 expression in the mammary epithelium caused a reduction in terminal end bud numbers, branching and ductal density [51,66]. Previous reports of Erbb3 expression have been inconsistent [67,68], most likely due to variable antibody quality. However, the ductal outgrowth defects that occur when Erbb3 expression is reduced, together with the observation that implantation of Nrg1-soaked pellets induced ductal elongation at puberty [69], support a role for Erbb3 in pubertal mammary development. Whether or not Erbb3 activation has the same consequences in both the luminal ER+ and luminal ER- cells remains to be determined and may depend on differential expression of dimerisation partners not detected by the microarray assays.

Mammary epithelial cell subpopulations have distinct transcription factor profiles

Mutual interactions between transcription factors associated with different cell lineages and involving positive and negative feedback loops have been demonstrated to be able to maintain haematopoietic cells in a small number of particular cell fates ('stable states') when an apparently large number of potential intermediate fates are available [54]. Transcription factors in the mammary epithelium which have the potential to interact but which are apparently expressed in different populations are therefore of particular interest in the regulation of mammary epithelial cell fate. We built a gene network to predict such interactions and identified a number of these which are potentially important. The most obvious is between Esr1 (luminal ER+) [56], Trp63 (basal) [70] and Myc (basal and luminal ER-) [55] and, given the large number of transcription targets they share, it is likely that these three are key factors in determining cell fate in the mammary gland.
However, there are also other interacting factors of interest such as the Runx2 (basal) [71] – Msx2 (luminal ER+) [72] pair, the Eya2 (basal) – Six4 (luminal ER-) pair [73] and the Foxa1 (luminal ER+) [57] – AR (basal and luminal ER+) [74] – Etv5 (basal) [75] triplet. Modelling these interactions in order to make predictions about how transcription factor behaviour determines mammary cell fate is an important challenge for the future.

Sox6 is a determinant of luminal cell fate in the mammary gland

In order to model these interactions, functional data on each individual factor will be required. Given the large number of factors of interest, a relatively rapid throughput assay will be required, ruling out the use of transgenic or knockout mice. Therefore, to demonstrate that functional information on the role in determining mammary cell fate of the transcriptional regulators identified in this study can be relatively rapidly generated, and to provide at the same time the first data required for this modelling, we examined the function of Sox6. A member of the SoxD group of the Sry-related, high mobility group box transcription factor family, Sox6 has two dimerisation domains and the HMG box domain, but no transactivation or transrepression domains [52]. Its action as an activator or repressor of transcription depends, therefore, on its binding partners [52] and it has been shown both to repress specification and differentiation of oligodendrocytes during gliagenesis [76] and to promote differentiation and maturation of chondrocytes during skeletogenesis [52,77]. Sox6 has been shown to be upregulated in the mammary gland by 2-methoxyestradiol treatment [78] but its role in mammary differentiation has not been previously investigated. In this study, Sox6 was specifically expressed in the luminal ER- cells and was undetectable in the basal and luminal ER+ cells.
Over-expression of Sox6 in vitro caused an increase in Krt18 luminal marker gene expression and a slight, but significant increase in Esr1 expression. It did not change Krt14 gene expression. However, staining of the cells with antibodies to either K14 or K18 showed that while control primary mammary cells in short-term culture promiscuously expressed both K14 and K18 (as previously described) [59-61], Sox6 over-expressing cells were K18 positive but K14 negative. Therefore, Sox6 over-expression maintained the mammary epithelial cells in the luminal phenotype and prevented promiscuous K14 protein expression. Cleared fat pad transplant of Sox6 over-expressing primary cells mixed with wild-type cells failed to generate extensive GFP+ outgrowths, suggesting that Sox6 over-expression may block transplantation activity. However, rare GFP+ cells could be detected in cells isolated from Sox6 transplanted fat pads and analysed by flow cytometry. The phenotype of these rare cells was biased toward the luminal ER+ population. It is unlikely that this was due to transduction of different cell populations by the control and Sox6 viruses, as viral tropism is determined by envelope proteins and these are coded by the viral packaging plasmids, which were identical for the two viruses. The small numbers of cells which could be detected and the caveats associated with over-expression studies mean that caution must be exercised in interpreting these data. However, they are consistent with the in vitro data that Sox6 is involved in promoting or maintaining a differentiated luminal phenotype, a corollary of which is that it blocks stem cell behaviour (transplantation). A more detailed understanding of the function and mechanism of Sox6 action in the mammary epithelium must await knockdown and inducible over-expression studies. Nevertheless, our data are the first to suggest that Sox6 has a role in cell fate determination in the mammary epithelium. 
Conclusion

This transcriptome analysis of mammary epithelial cell subpopulations has provided a framework for future studies of normal mammary epithelial cell homeostasis and the molecular pathology of breast disease. First, it has confirmed the existence of distinct luminal epithelial cell lineages with distinct gene expression patterns. Second, it has identified a novel functional specialisation within the mammary epithelium, that of non-professional immune cell. Third, it has highlighted the complexity of the potential paracrine interactions occurring within the mammary gland. Fourth, it has identified cell-type specific transcriptional regulators with potential roles in mammary epithelial cell lineage specification and fate determination and has shown how these factors are likely to operate in a complex network. Last, it has shown that one of the factors identified, Sox6, may be a determinant of luminal cell fate in the mammary epithelium. Future studies will use these data to explore the contribution of the three epithelial cell types to different tumour phenotypes. They will also focus on the role of the transcription factor network in cell fate choice and cellular homeostasis to model how perturbations in this network may lead to cancer.

Methods

Preparation and flow cytometry of single mammary cell suspensions

All animal work was carried out under UK Home Office project and personal licences following local ethical approval and in accordance with local and national guidelines. Fourth mammary fat pads were harvested from 10-week-old virgin FVB mice. Single mammary cell suspensions were prepared as described [4,5]. Mammary cell suspensions at 10^6 cells/ml were stained with anti-CD24-FITC (clone M1/69, BD Biosciences, Oxford, UK, 0.5 μg/ml), anti-CD45-PE-Cy5 (clone 30-F11, BD Biosciences, 0.25 μg/ml) and anti-Sca-1-PE (clone D7, BD Biosciences, 0.5 μg/ml) as described [4,5].
For anti-CD14 and CD61 staining, cells were stained with anti-CD14-PE (clone rmC5-3, BD Biosciences, 1.0 μg/ml), anti-CD61-FITC (clone 2C9.G2, BD Biosciences, 2.5 μg/ml), anti-CD24-PE-Cy5 (clone M1/69, eBioscience, Insight Biotechnology Limited, London, UK, 0.6 μg/ml), anti-Sca-1-APC (clone D7, eBioscience, 1.0 μg/ml) and anti-CD45-PE-Cy7 (clone 30-F11, BD Biosciences, 1.0 μg/ml). For analysis of fat pads transplanted with lentivirus-transduced cells, anti-CD24-PE-Cy5, anti-Sca-1-PE and anti-CD45-PE-Cy7 were used. Cells were sorted at low pressure (20 psi using a 100 μm nozzle) on a FACSAria (Becton Dickinson, Oxford, UK) equipped with violet (404 nm), blue (488 nm), green (532 nm), yellow (561 nm) and red (635 nm) lasers. Both cell sample and collection tubes were maintained at 4°C. Single stained samples were used as compensation controls. Dead cells, CD45+ leukocytes and non-single cells were excluded as described [4,5].

cDNA microarray gene expression analysis on freshly isolated mammary epithelial cells

Freshly sorted mammary epithelial subpopulations were resuspended in RLT buffer (Qiagen, Crawley, West Sussex, UK) and stored at -80°C until required for RNA extraction. Total RNA was extracted using an RNeasy MinElute Kit (Qiagen), according to the manufacturer's instructions, from CD24+/Low Sca-1-, CD24+/High Sca-1- and CD24+/High Sca-1+ cells isolated from three independent preparations of virgin mammary tissue. RNA quantity and purity were tested in an Agilent 2100 Bioanalyser (Agilent, Wokingham, Berkshire, UK). RNA was converted to cDNA using an oligo d(T) primer, amplified and biotin labeled using the Ambion MessageAmp II Biotin Enhanced kit (Applied Biosystems, Warrington, Cheshire, UK). The samples were fragmented to 35–200 bp and hybridized to Affymetrix Mouse Expression MOE430 2.0 arrays (http://www.affymetrix.com) for 16 hours at 45°C.
The arrays were washed, labeled using an antibody bound to phycoerythrin and scanned according to the manufacturer's protocols. Primary array data, SAM outputs and normalised data complying with MIAME standards are available through ROCK [79].

Bioinformatic analysis

Expression data were normalised and summarised by robust multi-array analysis (RMA) using the Affymetrix package in the statistical environment R 2.5 [80]. Probe sets with a standard deviation > 0.25 were used for a multiclass Significance Analysis of Microarrays (SAM; version 3 Excel add-in) [30] to determine if their mean expression was different across the three subpopulations. Genes determined by this analysis to have a relative abundance of 2 or more in a population were considered characteristic of that population. Clustering analysis was carried out using CLADIST [81]. Data mining for genes of interest in paracrine signalling interactions or as transcription factors was carried out by uploading the lists of genes differentially expressed in the cell subpopulations into the DAVID Bioinformatics Resource [82] and searching the SP_PIR_KEYWORDS and GOTERM_BP_ALL lists. Network interaction maps were produced by uploading gene lists into a web-based in-house bioinformatics package and database, ROCK, developed based on the pSTIING server [81]. Interaction maps generated were manually curated to ensure no interpolated connecting genes were inappropriately added (so that, for instance, ESR1 did not appear as a connecting gene in the basal/myoepithelial network map).

Quantitative PCR analysis

For quantitative PCR-based gene expression analysis, cDNA synthesis was carried out using a Sensiscript RT kit (Qiagen). Up to 50 ng of RNA was transcribed into cDNA using an oligo dTn primer (Promega, Southampton, Hampshire, UK) per reaction. 0.5 μl of cDNA was used per qPCR reaction.
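The probe-set filter and the abundance threshold described under Bioinformatic analysis can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which used R and the SAM Excel add-in); the function names and example values are hypothetical, and it assumes a sample standard deviation computed on log-scale RMA values.

```python
from statistics import stdev

def filter_variable_probes(expression, sd_cutoff=0.25):
    """Keep probe sets whose standard deviation across samples exceeds the cutoff.

    expression: dict mapping probe ID -> list of normalised expression values.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    return {probe: values for probe, values in expression.items()
            if stdev(values) > sd_cutoff}

def characteristic_populations(abundance_scores, threshold=2.0):
    """Return the populations in which a gene scores at or above the threshold.

    abundance_scores: dict mapping population name -> relative abundance score.
    """
    return [pop for pop, score in abundance_scores.items() if score >= threshold]

# Hypothetical example values, for illustration only.
expression = {
    "probe_flat": [7.00, 7.10, 7.05],      # nearly constant, filtered out
    "probe_variable": [5.0, 9.0, 6.0],     # variable, retained for testing
}
retained = filter_variable_probes(expression)
populations = characteristic_populations(
    {"basal": 14.2, "luminal_ER_neg": 3.1, "luminal_ER_pos": -9.0}
)
```

In this sketch a gene can be characteristic of more than one population, which matches the observation in the text that some genes were expressed in two populations.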
Each analysis reaction was performed in triplicate on fresh RNA samples collected separately from those used for the microarray analysis. β-Actin was used as an endogenous control throughout all experimental analyses. Gene expression analysis was performed using TaqMan Gene Expression Assays (Applied Biosystems, Warrington, Cheshire, UK) on an ABI Prism 7900HT sequence detection system (Applied Biosystems) [see Additional file 21]. Analysis was performed using the Δ-ΔCt method to determine fold changes ± 95% confidence limits in gene expression across three independently isolated samples relative to a comparator in a 'round robin' method in which each population was used in turn as the comparator. With this method, the data were separately plotted for each of the two non-comparator populations against the comparator. The non-comparator population was used to order the dataset on each graph in descending levels of relative expression from left to right. The point after which the differences in expression level between the two populations ceased to be significant (when the confidence intervals of one population overlap the mean of the other) [58] was plotted (vertical dotted line). All comparator population genes which fall to the right hand side of the vertical line in both graphs have similar or elevated levels of expression in the comparator population compared to both the non-comparator populations. Such genes were considered to be characteristic of the comparator population. Note that in cases where expression of the gene being analysed could not be detected in the comparator sample, an artificial Ct value of 40 was assigned purely to make the comparisons. The gene was still recorded as undetectable in the presented data.

Lentivirus production

Sox6 cDNA was kindly provided by Veronique Lefebvre (Cleveland Clinic Lerner Research Institute, Cleveland, Ohio) in pcDNA3.1 and was subcloned into pWPI lentivirus expression vector (Tronolab) [83] by PmeI digest.
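As an aside on the Δ-ΔCt calculation described under Quantitative PCR analysis, the fold change of a target gene in one population relative to a comparator is commonly computed as 2^-(ΔΔCt), where ΔCt = Ct(target) - Ct(endogenous control) and ΔΔCt = ΔCt(sample) - ΔCt(comparator). The Python sketch below is illustrative only: it assumes 100% amplification efficiency, uses hypothetical Ct values and function names, and does not reproduce the confidence-limit calculation, whose details are not given in the text.

```python
UNDETECTED_CT = 40.0  # artificial Ct assigned when the target is not detected

def delta_ct(ct_target, ct_reference):
    """Ct of the target gene normalised to the endogenous control (here β-actin)."""
    if ct_target is None:          # None marks an undetectable target (assumption)
        ct_target = UNDETECTED_CT
    return ct_target - ct_reference

def fold_change(sample, comparator):
    """Fold change of sample relative to comparator.

    Each argument is a (ct_target, ct_reference) tuple.
    """
    ddct = delta_ct(*sample) - delta_ct(*comparator)
    return 2.0 ** (-ddct)

def round_robin(cts):
    """Use each population in turn as the comparator for the other populations.

    cts: dict mapping population name -> (ct_target, ct_reference).
    Returns {comparator: {other_population: fold_change}}.
    """
    return {
        comp: {pop: fold_change(cts[pop], cts[comp]) for pop in cts if pop != comp}
        for comp in cts
    }

# Hypothetical Ct values, for illustration only.
cts = {
    "basal": (24.0, 18.0),
    "luminal_ER_neg": (26.0, 18.0),
    "luminal_ER_pos": (None, 18.0),   # target undetectable in this population
}
results = round_robin(cts)
```

With these made-up values, the basal population shows a 4-fold higher level than the luminal ER- comparator, because its ΔCt is two cycles lower.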
Viral supernatants were generated by co-transfection of the expression vector and two packaging vectors (psPAX2 and pMD2.G) into HEK293T cells. Cells were refed with fresh medium (Dulbecco's Modified Eagle's Medium, DMEM; Invitrogen, Paisley, UK) plus 10% foetal calf serum (FCS; PAA Laboratories, Yeovil, Somerset, UK) after 24 hours. Supernatants were harvested 48 and 72 hours after transfection and checked for absence of replication-competent virus. Supernatants were stored at -80°C until use.

Mammary epithelial cell transduction and transplantation

Freshly isolated primary mouse mammary epithelial cells were resuspended at 1 × 10^6 cells/ml in viral supernatant and plated at 1 ml/well in ultra-low attachment 24-well plates (Corning, Fisher Scientific, Loughborough, Leicestershire, UK) [84]. After 16 hours, the cells (now in clumps) were washed and replated in 1:1 DMEM: Ham's F12 medium (Invitrogen) with 10% foetal calf serum, 10 μg/ml insulin (Sigma), 100 ng/ml epidermal growth factor (Sigma) and 10 ng/ml cholera toxin (Sigma) (growth medium) [60] and transferred to ultra-low attachment 6-well plates (Corning). After a further 24 hours, the majority of the cell clumps were injected into cleared fat pads of 3-week-old syngeneic FVB mice as described [4]. A proportion, however, were transferred to glass coverslips and/or normal tissue culture plastic and maintained in growth medium under low oxygen culture conditions [59,60] for one week. After this time, cells on plastic were trypsinized and flow sorted to isolate GFP+ cells for qPCR analysis.
Cells on coverslips were fixed in 4% paraformaldehyde in PBS, stained for either keratin 14 (clone LLOO2; Abcam, Cambridge, UK) or keratin 18 (clone Ks18.04; Progen, Heidelberg, Germany) expression by standard techniques [59], counterstained with DAPI and then examined on a TCS SP2 confocal microscope with an Acousto-Optical Beam Splitter and lasers exciting at 405, 488, 555 and 633 nm (Leica Microsystems, Milton Keynes, UK).

Analysis of transplanted fat pads

Transplanted fat pads were analysed after eight weeks. Fat pads were stretched on glass slides and then examined under epifluorescent illumination and photographed. Fat pads injected with control transduced cells or cells transduced with Sox6 virus were then processed as separate batches to single cells as described [4,5], stained for CD45, CD24 and Sca-1 expression and analysed by flow cytometry.

Multiple immunofluorescence staining of mouse mammary gland sections

The protocol for multicolour immunostaining of paraffin embedded tissue has been recently described [85]. In brief, mammary fat pads from ten-week-old virgin female FVB mice were fixed in 4% buffered formalin overnight. Following standard processing, antigen retrieval was carried out on 4 μm paraffin-embedded sections by boiling in 0.01 M citrate buffer, pH 6, for 18 minutes in a microwave (900 W). Sections were then blocked for 1 hour in MOM mouse Ig blocking reagent (Vector Laboratories, Peterborough, UK; 9 μl stock MOM Ig blocking reagent in 250 μl TBS) and 30 minutes in DAKO protein block (DAKO, Ely, Cambridgeshire, UK). Sections were stained with antibodies against K14 (0.26 μg/ml; mouse IgG3 clone LLOO2; Abcam, Cambridge, UK) and K18 (diluted 1:2 from ready-to-use solution; mouse IgG1 clone Ks18.04; Progen, Heidelberg, Germany) or keratin 14 and ER (mouse IgG1 clone 1D5; 1:40 dilution; DAKO), overnight at 4°C.
They were then stained with isotype-specific goat anti-mouse secondary antibodies conjugated to Alexa Fluor 488 or 555 fluorochromes, counterstained with DAPI and mounted in Vectashield (H1000; Vector Laboratories) mounting medium. Sections were examined at room temperature on the TCS SP2 confocal microscope. Multicolour images were collected sequentially in three channels. Images were captured using the Leica system and Leica TCS image acquisition software. Co-localisation overlays were generated using TCS software. Control single stained sections in which either the primary antibody was left out or the primary antibody was combined with the wrong secondary antibody showed no staining.

Abbreviations

APC: Allophycocyanin; DAPI: 4',6-diamidino-2-phenylindole dihydrochloride; DMEM: Dulbecco's Modified Eagle's Medium; EMT: Epithelial-mesenchymal transition; ER: Estrogen Receptor; FCS: Foetal Calf Serum; FITC: Fluorescein Isothiocyanate; K14: Keratin 14; K18: Keratin 18; L15: Leibovitz L15 medium; LPS: Lipopolysaccharide; PE: Phycoerythrin; PE-Cy5: Phycoerythrin-Cy5; PE-Cy7: Phycoerythrin-Cy7; qPCR: Quantitative real time rtPCR; TLR: Toll-Like Receptor.

Authors' contributions

HK collected primary mammary cell samples, isolated RNA, carried out qPCR analyses, helped analyse microarray data, carried out fat pad transplants and analysed transplant results. JR prepared virus and assisted with fat pad transplants. FM developed immunostaining protocols and stained and analysed tissue sections. AG carried out SAM analysis on array data. MZ and CM carried out clustering and network analysis on array data. MS designed the study, collected primary mammary cell samples, analysed microarray data, stained primary mouse cells in vitro and wrote the manuscript.

Additional file 1

qPCR analysis of Krt14, Krt18 and Esr1 expression in mammary epithelial subpopulations.
The data describe qPCR analysis of expression of Krt14, Krt18 and Esr1 in triplicate independent samples of CD24+/Low Sca-1- cells, CD24+/High Sca-1- cells and CD24+/High Sca-1+ cells. Each data point is the mean level of expression, ± 95% confidence intervals, across the three samples of that population relative to the comparator sample. A 'round robin' comparison was used as described in the Methods. Genes considered to be characteristic of the comparator population are indicated next to each pair of graphs. Krt14 expression was undetectable in the CD24+/High Sca-1- and CD24+/High Sca-1+ cells. Krt18 expression was undetectable in the CD24+/Low Sca-1- cells. (File: 520K, tiff)

Additional file 2

Relative expression levels of 2182 Affymetrix probes across three virgin mouse mammary epithelial subpopulations. The spreadsheet gives the relative expression levels for all differentially expressed probes across all three populations. Expression levels are indicated by a relative abundance score for each population. A high positive value indicates expression at a high level; a low negative score indicates very low expression levels. The Affymetrix probe ID, Gene Symbol and q-value (indicating the % false discovery rate) are also indicated. (File: 1.4M, xls)

Additional file 3

Full heat map and hierarchical cluster analysis of differential gene expression across virgin mammary epithelial populations. The image shows heat map clustering of differentially expressed genes across the three cell populations. Red indicates high expression, green indicates low expression. (File: 5.6M, tiff)

Additional file 4

Genes characteristic of basal/myoepithelial cells. The table shows all 861 genes in the basal/myoepithelial population with an abundance score of 2 or more when the 1427 differentially expressed gene set was sorted by descending abundance scores in the basal/myoepithelial population.
Such genes were considered characteristic of the population. Where differential gene expression was indicated by more than one probe, an average value for each of the contrasts across the probes was calculated. The number of probes is indicated. (File: 669K, xls)

Additional file 5

Genes characteristic of luminal ER- cells. The table shows all 326 genes in the luminal ER- population with an abundance score of 2 or more when the 1427 differentially expressed gene set was sorted by descending abundance scores in the luminal ER- population. Such genes were considered characteristic of the population. Where differential gene expression was indicated by more than one probe, an average value for each of the contrasts across the probes was calculated. The number of probes is indicated. (File: 323K, xls)

Additional file 6

Genes characteristic of luminal ER+ cells. The table shows all 488 genes in the luminal ER+ population with an abundance score of 2 or more when the 1427 differentially expressed gene set was sorted by descending abundance scores in the luminal ER+ population. Such genes were considered characteristic of the population. Where differential gene expression was indicated by more than one probe, an average value for each of the contrasts across the probes was calculated. The number of probes is indicated. (File: 424K, xls)

Additional file 7

Summarised gene expression microarray and qPCR gene expression analysis for 58 test genes. The table shows a comparison between the gene expression patterns determined by microarray analysis and those determined by qPCR. The gene expression microarray relative abundance scores are summarised as follows: --- = -32 to -22, -- = -22 to -12, - = -12 to -2, +/- = -2 to +2, + = 2 to 12, ++ = 12 to 22, +++ = 22 to 32, ++++ = 32 to 42.
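The score bands listed above can be expressed as a small lookup, sketched here in Python. The description of Additional file 7 does not say how scores falling exactly on a boundary were assigned, so this sketch assumes each band includes its lower bound and excludes its upper bound (with 42 itself counted in the top band); the function names are hypothetical. Averaging across identifiers follows the description of this file.

```python
# (lower bound, upper bound, symbol); bands taken from the Additional file 7 key.
SCORE_BANDS = [
    (-32, -22, "---"),
    (-22, -12, "--"),
    (-12, -2, "-"),
    (-2, 2, "+/-"),
    (2, 12, "+"),
    (12, 22, "++"),
    (22, 32, "+++"),
    (32, 42, "++++"),
]

def summarise_score(score):
    """Map a relative abundance score to its summary symbol.

    Assumption: bands are half-open [lower, upper); a score of exactly 42
    is placed in the top band.
    """
    for lower, upper, symbol in SCORE_BANDS:
        if lower <= score < upper:
            return symbol
    if score == 42:
        return "++++"
    raise ValueError(f"score {score} is outside the range covered by the key")

def summarise_gene(identifier_scores):
    """Summarise a gene with several identifiers using the mean of their scores."""
    mean_score = sum(identifier_scores) / len(identifier_scores)
    return summarise_score(mean_score)
```

For example, under these assumptions a gene with two identifiers scoring 10.0 and 16.0 has a mean of 13.0 and is summarised as "++".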
Where more than one identifier for a gene scored as differentially expressed, the mean score across all the identifiers was used to determine the summarised microarray score. The summarised score was in turn used to define the array-based expression pattern, with a score of +, ++, +++ or ++++ indicating that a gene was expressed in a particular population. In some cases, the genes were expressed in more than one population. The summarised qPCR expression pattern is based upon the patterns of gene expression determined from Figure 3. (File: 24K, xls)

Additional file 8

Comparison of numbers of genes identified in common between basal/myoepithelial, luminal ER- and luminal ER+ cells and previously published datasets. The table compares previously published datasets with the subpopulation specific genes identified in the current analysis. Published lists of genes [27,28], probes [6] or SAGE tags [29] significantly differentially expressed between mouse basal mammary stem cell enriched/myoepithelial cells compared to in vitro colony forming cells (luminal cells) [6], human CD10- CD44+ basal cells compared to CD24+ luminal cells [29] or human CD10+ myoepithelial cells compared to EMA+ luminal cells [27,28] were condensed to remove multiple probes or tags against the same gene and to identify only well-annotated genes. The distribution of the differentially expressed genes in the basal/myoepithelial, luminal ER- and luminal ER+ gene lists from the current study was then determined. (File: 25K, xls)

Additional file 9

List of genes common to basal/myoepithelial cells and previously published basal or myoepithelial datasets. The table lists those basal/myoepithelial genes found in previously published datasets which were also found in the basal/myoepithelial population in the current study. (File: 40K, xls)

Additional file 10

List of genes common to luminal cell subpopulations and previously published luminal datasets.
The table lists those luminal genes found in previously published datasets which were also found in the luminal populations in the current study. (File: 28K, xls)

Additional file 11

Network interaction map for basal/myoepithelial specific genes. Interaction data between basal/myoepithelial specific genes based on physical interactions (black lines) and interactions in complexes (brown lines) with no interpolated genes used. The nodes are colour coded to indicate relative strengths of expression of the gene within the cell population. Brighter reds indicate highest levels of expression. Darker reds indicate genes less strongly expressed (although still with enriched expression within the population compared to the other cell types). (File: 3.9M, tiff)

Additional file 12

Network interaction map for luminal ER- specific genes. Interaction data for luminal ER- specific genes based on physical interactions (solid lines) and transcriptional interactions (dashed lines). The nodes are colour coded to indicate relative strengths of expression of the gene within the cell population. Brighter reds indicate highest levels of expression. Darker reds indicate genes less strongly expressed (although still with enriched expression within the population compared to the other cell types). White nodes indicate interpolated genes used by the network mapping software to extend and link the network. (File: 3.3M, tiff)

Additional file 13

Network interaction map for luminal ER+ specific genes. Interaction data for luminal ER+ specific genes based on physical interactions (solid lines) and transcriptional interactions (dashed lines). The nodes are colour coded to indicate relative strengths of expression of the gene within the cell population. Brighter reds indicate highest levels of expression. Darker reds indicate genes less strongly expressed (although still with enriched expression within the population compared to the other cell types).
White nodes indicate interpolated genes used by the network mapping software to extend and link the network. (File: 4.3M, tiff)

Additional file 14

Identification of prominent modules of differentially expressed genes in the luminal ER- and luminal ER+ networks. The results of the first, second and third pass analyses for network modules in the luminal ER- and luminal ER+ networks are shown. Rectangular nodes are first pass nodes, octagonal nodes are second pass nodes and small green oval nodes are third pass nodes. Thick red lines are first pass connections, medium size red lines are second pass connections and thin red lines are third pass connections. Black rectangles indicate differentially expressed hubs for which no modules could be built. Coloured rectangles indicate module groupings of differentially expressed genes. Solid lines indicate physical interactions, dotted lines transcriptional interactions. (File: 550K, tiff)

Additional file 15

Topology of interaction modules within the luminal ER- network. Luminal ER- interaction modules are shown projected on to the luminal ER- network. First, second and third pass nodes are indicated as coloured rectangular, octagonal and oval nodes respectively. First, second and third pass connections are indicated as thick, medium and thin red lines respectively. Solid lines indicate physical interactions, dotted lines transcriptional interactions. The different colourings of the first and second pass nodes indicate module groupings of differentially expressed genes. Third pass nodes are coloured green. (File: 3.2M, tiff)

Additional file 16

Topology of interaction modules within the luminal ER+ network. Luminal ER+ interaction modules are shown projected on to the luminal ER+ network. First, second and third pass nodes are indicated as coloured rectangular, octagonal and oval nodes respectively.
First, second and third pass connections are indicated as thick, medium and thin red lines respectively. Solid lines indicate physical interactions, dotted lines transcriptional interactions. The different colourings of the first and second pass nodes indicate module groupings of differentially expressed genes. Third pass nodes are coloured green. Black rectangles indicate differentially expressed hubs for which no modules could be built. Click here for file(4.5M, tiff) Additional file 17 Network metrics for the luminal ER- and luminal ER+ module subnetworks. The table gives the values for the parameters describing the luminal ER- and luminal ER+ networks both with and without the identified modules. Network manipulations were performed in Cytoscape [111], and the average network clustering <C>, average connectivity <k>, power exponent γ and mean shortest path <l> were derived with the Cytoscape Random Networks plug-in. Note that due to the highly fragmented nature of the non-module subnetwork, the mean shortest path does not constitute a reliable metric. Click here for file(20K, xls) Additional file 18 List of genes common to mammary cell subpopulations and previously published datasets of estrogen-responsive genes. The table lists estrogen-responsive genes identified in previously published datasets which were also found in the mammary epithelial cell subpopulations in the current study. Click here for file(28K, xls) Additional file 19 Genes with potential roles in lineage selection/cell fate determination through paracrine signalling or as transcriptional regulators. The table lists those genes from the three populations whose protein products have potential roles in intercellular signalling or as transcriptional regulators. *Distribution confirmed by qPCR. Click here for file(41K, xls) Additional file 20 Sox6 over-expression maintains luminal differentiation in mammary epithelial cells in vitro. 
Additional images of immunofluorescence staining for keratin 14 (A, B) and keratin 18 (C, D) expression in primary mouse mammary epithelial cells transduced with lentivirus expressing Sox6 and GFP. A, C, bars = 30 μm. B, D, bars = 60 μm. Arrows in A indicate rare Sox6-GFP cells which are also weakly K14 positive. The majority of Sox6-GFP cells are K14 negative. Click here for file(4.7M, tiff) Additional file 21 Genes examined by qPCR analysis. The table gives the gene name, symbol, TAQMAN Assays on Demand (Applied Biosystems) assay reference and Unigene ID for all qPCR probes used. Click here for file(31K, xls) Acknowledgements The authors would like to thank Veronique Lefebvre for the gift of the Sox6 plasmid. They would also like to thank Fredrik Wallberg of the Institute of Cancer Research Flow Cytometry facility and Nipurna Jina of the Institute of Child Health Microarray facility for technical assistance. This work was funded by Breakthrough Breast Cancer. We acknowledge NHS funding to the NIHR Biomedical Research Centre. References