Physiol Genomics. Mar 2009; 37(1): 12–22.
Published online Nov 18, 2008. doi: 10.1152/physiolgenomics.90341.2008
PMCID: PMC2661101

Gene expression in the human mammary epithelium during lactation: the milk fat globule transcriptome


The molecular physiology underlying human milk production is largely unknown because of limitations in obtaining tissue samples. Determining gene expression in normal lactating women would be a potential step toward understanding why some women struggle with or fail at breastfeeding their infants. Recently, we demonstrated the utility of RNA obtained from breast milk fat globule (MFG) to detect mammary epithelial cell (MEC)-specific gene expression. We used MFG RNA to determine the gene expression profile of human MEC during lactation. Microarray studies were performed using Human Ref-8 BeadChip arrays (Illumina). MFG RNA was collected every 3 h for 24 h from five healthy, exclusively breastfeeding women. We determined that 14,070 transcripts were expressed and represented the MFG transcriptome. According to GeneSpring GX 9, 156 ontology terms were enriched (corrected P < 0.05), which include cellular (n = 3,379 genes) and metabolic (n = 2,656) processes as the most significantly enriched biological process terms. The top networks and pathways were associated primarily with cellular activities most likely involved with milk synthesis. Multiple sampling over 24 h enabled us to demonstrate core circadian clock gene expression and the periodicity of 1,029 genes (7%) enriched for molecular functions involved in cell development, growth, proliferation, and cell morphology. In addition, we found that the MFG transcriptome was comparable to the metabolic gene expression profile described for the lactating mouse mammary gland. This paper is the first to describe the MFG transcriptome in sequential human samples over a 24 h period, providing valuable insights into gene expression in the human MEC.

Keywords: microarray, circadian clock genes, breastfeeding

Despite all the real and purported benefits of breastfeeding, only 11% of women are still exclusively breastfeeding at 6 mo (1), with milk insufficiency being the most cited reason (10, 30). Because of ethical and practical issues that underlie obtaining mammary tissue during lactation, the molecular physiology and regulation of lactation remain to be fully elucidated in humans. In addition, in vitro studies of metabolism in human mammary tissue are complicated by the presence of multiple cell types (e.g., adipose and stromal tissue, epithelial tissue, macrophages, and lymphocytes) within the gland. Clearly, a noninvasive method would be desirable to analyze gene expression in the mammary epithelial cell (MEC) in lactating women.

During lactation, MECs are essentially “biofactories” of lipids, proteins, and carbohydrates for milk (15). Milk fat is secreted via a budding (apocrine) mechanism and brings with it a crescent of the MEC cytoplasm enveloped in plasma membrane (12). From 1 to 38% of milk fat globules (MFG) contain these crescents (24), which contain cytoplasmic organelles but no nuclei. We have previously demonstrated that substantial quantities of high-quality RNA can be isolated from this unique source. Utilizing RNA isolated from MFG, we have confirmed the expression of α-lactalbumin (LALBA), which codes for the milk protein, LALBA, a mammary epithelial-specific protein. In addition, we identified other gene transcripts, including housekeeping genes, several milk protein genes, and the insulin-like growth factor-1 receptor (IGF-1R) by qRT-PCR (17). Thus, the RNA isolated from the MFG most likely comes from the MEC. With this convenient sample source and the availability of microarray technology, it becomes feasible to determine the expression of literally thousands of genes simultaneously.

While gene expression profiling of mammary gland development has been performed extensively in mice (3, 27–29), no data are available from humans. In the present studies, we sought to characterize the MFG RNA transcriptome and compare it to the MEC expression profile during lactation described in published literature for the mouse. This transcriptome will serve as a framework for future studies of MEC response to specific hormone signals in vivo and the ontogeny of gene expression from parturition to mammary gland involution.



Following approval from the Institutional Review Board and the Scientific Advisory Committee of the General Clinical Research Center (GCRC) at Baylor College of Medicine (Houston, TX), written informed consent was obtained from six healthy lactating women, five of whom completed the study. This was part of a larger, 96 h study investigating the effect of recombinant human growth hormone (rhGH) administration on the expression of milk protein and metabolic genes. Since the purpose of this paper is to describe the MFG transcriptome, only data from the first 24 h (day 1), with no rhGH treatment, were included in the analysis. Prior to inclusion, participants underwent screening tests to exclude diabetes or impaired glucose tolerance, anemia, renal or hepatic dysfunction, and current pregnancy. All women were 18–35 yr old and between 6 and 12 wk postpartum. They had uncomplicated singleton pregnancies, delivered at term (≥37 wk), and had a body mass index (BMI) ≤27 kg/m2. Their infants were healthy and being exclusively breastfed at the time of the study.

Study Design

Following admission to the GCRC, subjects were maintained on a regular diet (35 kcal/kg/day) divided into three meals and two snacks. Water and calorie-free drinks were available ad libitum. The women and their infants were admitted to the Texas Children's Hospital GCRC on the day of the study (day 1, 0800 h). After a light breakfast, the mothers breastfed their infants to obtain a similar baseline starting point for all women. An intravenous line was inserted into the antecubital vein under Ela-Max cream analgesia (Ferndale Laboratories, Ferndale, MI) and was infused with 0.9% NaCl at low rates to keep the vein open for blood sampling. Blood samples (2.5 ml) were collected at 1130 h and every 3 h (30 min after the start of breast pumping) until 0830 h on day 3. Blood was centrifuged at 3,000 rpm for 10 min at 4°C; the plasma was separated and transferred to a new tube then stored at −80°C. At 1100 h and every 3 h until 0800 h on day 3, breast milk was collected (see below). Thereafter, the frequencies of milk and blood sample collection were decreased to every 6 h until 0800 h of day 4. Plasma prolactin (PRL) was measured using an electrochemiluminescence assay (Elecsys 1010; Roche Diagnostics, Indianapolis, IN).

Milk Collection

Breast milk (10 ml) was collected simultaneously from both breasts via a standard breast pump (Playtex Embrace, Dover, DE) and immediately placed on ice. The infants then breastfed (~10–12 min each side), after which milk collection was resumed into the same bottles until the breasts were emptied. The milk bottles were weighed before and after milk collection on a Mettler AE50 balance (Mettler-Toledo, Greifensee, Switzerland) and kept on ice until finally processed. Approximately 10 ml milk was transferred into sterile, RNase-free tubes, tightly sealed, and then centrifuged (Sorvall Legend T) at 3,000 rpm for 10 min at 4°C. The supernatant fat layer was transferred using a sterile spatula to a new tube. TRIzol (1 ml; Invitrogen Life Technology, Carlsbad, CA) was added prior to storage at −80°C.

Milk Samples

RNA isolation.

Total RNA was isolated from TRIzol-treated milk fat following the manufacturer's suggested procedures. Total RNA concentration was measured using a NanoDrop spectrophotometer (NanoDrop Technologies). RNA quality was assessed using the Experion™ RNA StdSens Analysis Kit (Bio-Rad Laboratories, Hercules, CA).

Microarray Study Design

cRNA amplification and expression microarray.

The method of sample preparation was identical to that previously described (17). In brief, cRNA amplification and labeling with biotin were performed using Illumina TotalPrep RNA amplification kit (Ambion, Austin, TX) on an aliquot of 440 ng of total RNA as input material. In vitro transcription reaction of cDNA to cRNA was performed overnight (14 h) including biotin-11-dUTP for labeling of the cRNA product. cRNA yields were quantified with NanoDrop spectrophotometer. We hybridized 750 ng of labeled cRNAs to the Human Ref 8 V2 Beadchip arrays (Illumina, San Diego, CA) at 55°C overnight (~16 h) following the Illumina Whole-Genome Gene Expression Protocol for BeadStation (doc. #111226048 Rev. B, Illumina) and stained with 1 μg/ml streptavidin-Cy3 (Amersham Biosciences, Piscataway, NJ) for visualization. Gene expression analysis was performed using the Sentrix BeadChip and BeadStation system from Illumina. Over the 24 h study, 40 samples [8 time points × 5 subjects (biological replicates)], resulting in 40 microarrays, were analyzed. The human Ref-8 BeadChips contain sequences representing ~22,000 curated genes. Quality standards for hybridization, labeling, staining, background signal, and basal level of housekeeping gene expression for each chip were verified. After scanning the probe array, we analyzed the resulting image using the BeadStudio software (Illumina).

Data analysis.

Raw intensity data from BeadStudio were exported to GeneSpring GX 9 (Agilent Technologies, Santa Clara, CA). Signal data were q-spline normalized and then filtered on the criterion that each gene be flagged as present in all five subjects at one or more time points of sample collection. Of the ~22,000 gene probes, 14,070 were present in the MFG (hereafter called the MFG gene list), and all further analyses were based on this list. The data discussed in this publication have been deposited in the National Center for Biotechnology Information's Gene Expression Omnibus (GEO) (5) and are accessible through GEO Series accession number GSE12669 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE12669).
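
As an illustration only, the presence filter described above can be sketched in Python. The data layout (`flags[gene][subject][time]`), the function names, and the toy genes are assumptions made for this sketch; the actual filtering was done inside GeneSpring GX 9, whose internals are not described here.

```python
# Hypothetical detection flags: flags[gene][subject][time] is True when the
# probe was flagged "present" for that subject at that time point.

def passes_filter(flags_for_gene):
    """Return True if, at some time point, every subject has a present flag."""
    n_times = len(flags_for_gene[0])
    return any(
        all(subject_flags[t] for subject_flags in flags_for_gene)
        for t in range(n_times)
    )


def build_gene_list(flags):
    """Keep the gene identifiers whose flags satisfy the filter."""
    return [gene for gene, gene_flags in flags.items() if passes_filter(gene_flags)]


# Toy example with 3 subjects and 2 time points (the study used 5 and 8).
toy_flags = {
    "GENE_A": [[True, False], [True, True], [True, False]],  # all present at t=0
    "GENE_B": [[True, False], [False, True], [True, True]],  # never all present
}
toy_gene_list = build_gene_list(toy_flags)
print(toy_gene_list)
```

A gene that is present in only some subjects at every time point is dropped under this rule, however strongly it is expressed in those subjects.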

Principal pathways and ontology.

The gene ontology (GO) option on GeneSpring GX9 was utilized to determine the most significant biological processes (corrected P < 0.05) represented in the MFG transcriptome. The MFG gene list containing gene accession numbers and symbols was uploaded into the Ingenuity Pathways Analysis (IPA, Ingenuity Systems, www.ingenuity.com) application. Each identifier was mapped to its corresponding gene object in the Ingenuity knowledge base. Networks were algorithmically generated based on their connectivity. Canonical pathways analysis identified pathways from the Ingenuity Pathways Analysis library of canonical pathways that were most significant (P < 0.05) to the dataset. The representation of each pathway was expressed as a ratio: the number of genes from the dataset that met the expression value cutoff and map to the pathway, divided by the total number of genes in that canonical pathway. A Fisher's exact test was then used to calculate a P value for the probability that the association between the genes in the dataset and the canonical pathway is explained by chance alone. For our analyses, a P value <0.05 was considered significant. In addition, we performed a gene set enrichment analysis (GSEA), as developed by Subramanian et al. (32) from the Broad Institute. The gene sets were downloaded from the Broad Institute website (http://www.broad.mit.edu/gsea/msigdb/downloads.jsp) and imported into GeneSpring GX 9. The (C2) functional gene set and (C5) ontology gene set were used for comparison with the MFG gene list, which was analyzed with each time point compared with the baseline sample. The designated cutoffs were a minimum of 15 genes and a q value cutoff of 0.3.
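
The pathway ratio and the Fisher's exact test can be illustrated with a minimal Python sketch. It assumes a right-tailed test computed from the hypergeometric distribution; the reference set and exact procedure used by IPA are not stated here, and the function names and all counts in the example are invented for illustration.

```python
from math import comb


def pathway_ratio(hits, pathway_size):
    """Genes from the list that map to the pathway, over all genes in the pathway."""
    return hits / pathway_size


def fisher_right_tail(hits, pathway_size, list_size, background_size):
    """Right-tailed Fisher's exact test: P(X >= hits) when list_size genes are
    drawn without replacement from background_size genes, of which
    pathway_size belong to the pathway (hypergeometric tail)."""
    max_hits = min(pathway_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Invented toy numbers: 20 background genes, a 5-gene pathway,
# an 8-gene list, 4 of which fall in the pathway.
toy_ratio = pathway_ratio(4, 5)
toy_p = fisher_right_tail(hits=4, pathway_size=5, list_size=8, background_size=20)
print(toy_ratio, toy_p)
```

The ratio and the P value answer different questions: the ratio describes how much of a pathway is covered, whereas the P value also depends on the size of the gene list and of the background.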


Utilizing the clustering options of GeneSpring GX9, self-organizing maps using Euclidean metrics yielded 12 independent clusters. A cluster consisting of 559 genes with the highest expression values was considered as the most descriptive of the MEC during lactation. To further show trends in the data across time, a principal component analysis (PCA)-based clustering with mean centering and scaling was applied to the MFG gene list.

Genes relevant to lactation.

Rudolph et al. (28) categorized metabolic genes according to function in the mammary glands of lactating mice. These functions included glycolysis, citric acid cycle, fatty acid synthesis and oxidation, and lactose synthesis. Lemay et al. (15) compiled a gene set of all genes in the literature associated with the biological process “lactation.” We compared our results to these gene sets and, in addition, confirmed the expression of well-known milk protein genes (21). For comparative purposes, the milk synthesis genes were ranked according to raw fluorescence intensity, and the mouse data were compared with the human MFG data to qualitatively assess the expression profiles of the mouse mammary gland and the MFG.

Genes that change expression over time.

The MFG gene list was analyzed for expression changes over time by using repeated-measures ANOVA with Benjamini-Hochberg correction for multiple testing on GeneSpring GX9. Significant genes had a P < 0.05. The resulting gene list was further analyzed using hierarchical clustering [correlation, distance = (1 − P)] on the EDGE software (14), and its ontology and function were determined using GeneSpring GX9, GSEA, and the DAVID database (4). In addition, we sought out the expression of the core circadian clock genes in the MFG gene list to determine their expression profiles within 24 h.
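
For readers unfamiliar with the Benjamini-Hochberg correction, a minimal Python sketch of the standard step-up adjustment follows. This is a generic implementation and is not claimed to reproduce GeneSpring GX9 exactly; the function name and the toy P values are assumptions for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented toy P values, one per gene.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_adjusted = benjamini_hochberg(toy_p_values)
toy_significant = [p < 0.05 for p in toy_adjusted]
print(toy_adjusted, toy_significant)
```

In this toy case two raw P values below 0.05 (0.039 and 0.041) no longer pass after correction, which is the intended effect when thousands of genes are tested at once.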


Clinical Parameters

Five women, 25 ± 3 yr (mean ± SE), with a BMI of 24.0 ± 0.5 kg/m2 were included in the study. Total milk volume collected during the 24 h period was 880 ± 123 ml (mean ± SE). The percentage of milk fat by weight was 6 ± 1% per 10 ml milk aliquot. The mean plasma PRL concentration (suckling-induced) was 251 ± 8 ng/ml with a nadir of 226 ± 49 ng/ml at 0800 h and a peak of 284 ± 49 ng/ml at 2300 h (Fig. 1).

Fig. 1.
Suckling-induced plasma prolactin concentration, taken 30 min after the start of breast pumping over 24 h, shows a peak during the late evening collection. The mean plasma prolactin concentration was 251 ± 8 ng/ml with a nadir of 226 ± ...

Global Gene Expression in the MFG


The MFG gene list (14,070 genes present) was analyzed for the most significant biological processes using the GO option of GeneSpring GX9 yielding 156 significant GO terms (corrected P < 0.05). The top biological process (Fig. 2) terms were 1) cellular process (corrected P = 0.0008, 3,379 genes), including cell cycle and cellular component and biogenesis; 2) metabolic process (corrected P = 0.0, 2,656 genes), including biosynthetic and macromolecule metabolic process; and 3) biological regulation (P = 0.0033, 1,302 genes). No terms were directly related to lactation or synthesis of milk proteins. For a complete list, see Supplement 1. By GSEA, carbohydrate biosynthetic process and exopeptidase activity gene sets were significantly enriched in the 891 gene sets containing at least 15 matching genes.

Fig. 2.
Gene Ontology (GO) output from GeneSpring GX 9 showing the biological processes for which the milk fat globule (MFG) gene list is enriched. GO terms enriched were cellular process, metabolic process, and biological regulation.

Networks and canonical pathways.

Using the IPA software, we found the five networks most highly associated with the MFG gene list to be: 1) cellular function and maintenance, cell signaling, and nucleic acid metabolism (Fig. 3); 2) cancer, cell cycle, and respiratory disease; 3) DNA replication, recombination, and repair, nucleic acid metabolism, and small molecule biochemistry; 4) DNA replication, recombination, and repair, cancer, and cell cycle; and 5) protein synthesis, gene expression, and RNA trafficking. The top five metabolic canonical pathways represented in the MFG gene list were 1) purine metabolism [P = 9.88 E-08, ratio (R) = 0.54]; 2) oxidative phosphorylation (P = 3.74 E-05, R = 0.68); 3) pyrimidine metabolism (P = 3.98 E-05, R = 0.62); 4) inositol phosphate metabolism (P = 2.13 E-04, R = 0.54); and 5) propanoate metabolism (P = 2.13 E-04, R = 0.54). Ratio refers to the number of genes in the MFG gene list over the total number of genes in the respective canonical pathway. The top five signaling canonical pathways were 1) mitochondrial dysfunction (P = 3.25 E-06, R = 0.65); 2) ERK/MAPK signaling (P = 9.05 E-06, R = 0.84); 3) insulin-receptor signaling (P = 6.4 E-05, R = 0.72); 4) ephrin receptor signaling (P = 6.4 E-05, R = 0.72); and 5) NRF-2-mediated oxidative stress response (P = 1.22 E-04, R = 0.78). For a complete list of networks and canonical pathways, see Supplement 2a–c. GSEA did not yield significantly enriched functional gene sets in the 997 gene sets containing at least 15 matching genes.

Fig. 3.
Diagram of the top network enriched in the MFG gene list: cellular function and maintenance, cell signaling, and nucleic acid metabolism as analyzed by Ingenuity Pathways Analysis.


A partial list of the 559 genes belonging to the cluster with the highest expression (from the self-organizing map clustering method) is shown in Table 1. Ten of the top 50 genes were involved in milk synthesis. Among these, the principal milk protein genes, CSN and LALBA, were the two most highly expressed genes. Ribosomal protein genes made up half of this gene list, while four genes known to be involved in immunological defense, namely osteopontin, lysozyme, CD81, and CD36, were also highly expressed. See Supplement 3 for a complete list. From this cluster, the most significant networks focused on 1) posttranslational modification and protein degradation; 2) cancer, GI disease, and cell-cell signaling; 3) lipid metabolism and small molecule biochemistry; 4) cell cycle and cancer; and 5) molecular transport and small molecule biochemistry. The most significant canonical pathways associated with this top cluster were 1) oxidative phosphorylation (P = 1.27 E-08, R = 0.24), 2) pyruvate metabolism (P = 3.32 E-05, R = 0.10), 3) glycolysis/gluconeogenesis (P = 5.02 E-05, R = 0.11), 4) pentose phosphate pathway (P = 9.75 E-05, R = 0.08), and 5) ubiquinone biosynthesis (P = 1.20 E-03, R = 0.12). PCA-based clustering into three clusters was performed to further characterize the general trends in gene expression across time in the MFG gene list. There was little variance in gene expression across time, and this clustering method separated the data set predominantly according to fluorescence intensity (data not shown).

Table 1.
Partial list of genes belonging to the cluster of the most highly expressed genes in the milk fat globule gene list

Genes relevant to lactation.

A comparison of the 98 milk protein (21) and metabolic genes described by Rudolph et al. (28) in mice to the human MFG transcriptome showed that all the milk protein genes and most of the metabolic genes are common to both gene lists (Figs. 4, 5, and 6). Although no direct comparisons may be made between the different platforms used, we ranked the MFG raw fluorescence data according to expression intensity within each category and inferred observations from this list. In both mice and the human MFG, the milk protein genes are the most highly expressed genes. Most plasma membrane transporter genes were expressed in the human MFG, except for SLC2A4 and SLC6A6. Genes for glycolysis, pentose phosphate shunt, gluconeogenesis, citric acid cycle, fatty acid degradation and synthesis, and triglyceride and cholesterol synthesis were also detected in the MFG RNA. Of the 456 genes found to be associated with lactation in PubMed by Lemay et al. (15), 250 (55%) were found in the MFG RNA. See Supplement 4 for a complete list. By ranking gene expression among the milk synthesis genes, we found that 8 of the top 10 genes in the MFG were for milk protein synthesis (CSN3, LALBA, FASN, CEL, CSN2, XDH, MFGE8, MUC1, LTF, BTN1A1); while in the mouse 3 of 10 were milk protein genes (LALBA, CSN2, CSN3) and six were involved in fatty acid and triglyceride synthesis (FABP3, SCD9, FASN, ACLY, LPL, ADFP).

Fig. 4.
Milk protein genes and metabolic genes associated with milk synthesis based on mice studies by Rudolph et al. (28). Mice L9 values are taken from Rudolph et al. (28), raw expression values taken from the mammary glands of FVB mice on the 9th day of lactation. ...
Fig. 5.
Milk protein genes and metabolic genes associated with milk synthesis based on mice studies by Rudolph et al. (28). For details see Fig. 4 legend. NOC, not on chip.
Fig. 6.
Milk protein genes and metabolic genes associated with milk synthesis based on mice studies by Rudolph et al. (28). For details see Fig. 4 legend.

Genes that change expression over time.

Utilizing the MFG gene list, repeated-measures ANOVA analysis resulted in 1,029 genes that changed over time (P < 0.05). The top molecular functions (P < 0.005) associated with this gene list were 1) cell development (n = 74 genes), 2) growth and proliferation (n = 225 genes), 3) morphology (n = 93 genes), 4) assembly and organization (n = 56 genes), and 5) cell movement (n = 119 genes). Two canonical pathways were significantly enriched and associated with 1) chondroitin sulfate biosynthesis (P = 6.19 E-03, R = 0.07) and 2) fatty acid elongation in mitochondria (P = 4.69 E-02, R = 0.07). For a complete list of pathways and functions, see Supplement 5. No ontology terms were significantly enriched (P < 0.05) by GeneSpring and GSEA analysis. DAVID functional annotation analysis yielded the following enriched functions: 1) cell part/ cytoplasm (score = 4.33), 2) apoptosis/cell differentiation (score = 3.84), 3) developmental process (score = 3.63), 4) intracellular signaling cascade (score = 3.26), and 5) cellular homeostasis (score = 3.21).

Hierarchical clustering [correlation, distance = (1 − P)] of these 1,029 genes resulted in two distinct clusters (heat map, Fig. 7A). One cluster was downregulated in the middle of the day (Fig. 7A), and the other cluster was upregulated from early afternoon to the early evening hours. Circadian clock genes, which have been identified in peripheral tissues, including the mouse mammary gland, were also examined in the MFG. The period genes (Per1, Per2, Per3), cryptochrome genes (Cry1, Cry2), circadian locomotor output cycles kaput (CLOCK), aryl hydrocarbon receptor nuclear translocator-like (ARNTL), casein kinase 1 epsilon (CSNK1ε), and thyrotroph embryonic factor (TEF) were among the genes that changed expression over time. Figure 7, B and C, shows the nine core clock genes and the changes in expression during a 24 h period.
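
A correlation-based distance of the kind used for the hierarchical clustering can be sketched as follows. The sketch assumes that the distance is one minus the Pearson correlation between two expression profiles, which is one common reading of the bracketed notation; the EDGE implementation is not described here, and the profiles below are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_distance(x, y):
    """Distance in [0, 2]: 0 for identical shapes, 2 for mirrored shapes."""
    return 1.0 - pearson(x, y)


# Invented profiles over 8 time points.
profile = [1, 1, 2, 4, 6, 4, 2, 1]
same_shape = [2 * v + 3 for v in profile]  # same shape, different scale
mirrored = [-v for v in profile]           # opposite shape

d_same = correlation_distance(profile, same_shape)
d_mirrored = correlation_distance(profile, mirrored)
print(d_same, d_mirrored)
```

Because this distance ignores absolute intensity, genes group by the timing of their rise and fall rather than by how strongly they are expressed, which is consistent with a split into two clusters with opposite daily patterns.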

Fig. 7.
A: heat map of 1,029 genes whose expression significantly changed across time generated by RM ANOVA from the MFG gene list of 14,070 genes. Each gene is represented by a row of colored boxes, and each time point is represented by a column. Red indicates ...


Because of the virtual lack of any data on the molecular physiology of human lactation, we have initially set out to describe global gene expression in the MEC. Perhaps the biggest barrier to global gene expression analysis, as far as human lactation is concerned, is the availability of the lactating tissue (34). Breast milk not only is ideal nutrition for the infant but contains a wealth of gene expression information (17). Laser capture microdissection and immunomagnetic sorting are among the technologies that have been used to isolate mammary epithelium from surrounding tissue (2, 33). Using microarray studies, we are the first to utilize RNA from MFG in humans to determine the molecular events in the MEC. This eliminates the need for breast tissue biopsies and the separation of the mammary epithelium from the rest of the mammary gland constituents for gene expression studies. Our group has validated the Illumina microarray data against TaqMan qRT-PCR (Supplement 6).

Although many of the protein activities in milk have been co-opted from other physiological processes, the LALBA and casein (CSN) proteins are only expressed in the lactating mammary gland. Each diploid cell contains two copies of these genes, yet they are expressed only in the MEC of women in response to specific input of a number of hormones (34). We have previously demonstrated the expression of LALBA in the MFG by qRT-PCR (17). In addition, CSN3 and LALBA are the two most highly expressed genes of the 14,070 genes expressed (of 22,184 present in the array). Lemay et al. (15) identified a lactation literature gene set consisting of 456 genes that have been reported to be associated with lactation. Presumably, a large portion of the genes reported in the literature set, but that we did not find, may well be derived from the other cells found in the mammary gland since many published lactation studies deal with whole mammary gland explants or biopsies. Jones et al. (13) performed microarray studies to determine the difference in gene expression between luminal (milk-secreting) and myoepithelial cells in the mammary gland, using samples from human reduction mammoplasty and separating the cell types by immunomagnetic sorting. Genes considered specific for the luminal epithelium, e.g., LCN2 (lipocalin), CLDN4 (claudin 4), and MUC1 (mucin), were expressed in our MFG samples (8). A few myoepithelial-specific genes such as TIMP1 and TIMP3 (TIMP metallopeptidase inhibitor-1 and -3) and SERPINB5 (serpin peptidase inhibitor) were expressed as well. Of the 60 gene probes for collagens in the Human Ref 8 V2 array, 14 were expressed in MFG RNA. Adipocyte-specific genes (27) [resistin (RETN), adipocyte-specific protein adipoQ (AdipoQ), and insulin-activated amino acid transporter (Slc1a7)] were not expressed in the MFG RNA. Altogether, this is compelling evidence that the RNA obtained from the MFG comes from the MEC.

The MFG gene list comprises 14,070 genes present in all five subjects in at least one of the eight time points (within 24 h) when milk was collected. With this gene list, we are able to describe not only the genes present in the MFG, but also the changes that occur throughout the day. The principal significant ontologies referred to cellular and metabolic processes (Fig. 2). This is not surprising since the MEC is a highly synthetic cell during lactation. Self-organizing maps clustered the 14,070 genes according to expression intensity. Table 1 shows the top 50 genes based on fluorescence expression intensity. It is worth noting that 10 of these genes are involved in milk synthesis and 24 are ribosomal proteins, providing a validation of the array. Supporting this too is the enrichment of the carbohydrate biosynthetic process gene set by GSEA. Thus, the MEC centers its synthetic activities on the production of milk components. Osteopontin, lysozyme, CD36, and CD81 are genes that are highly expressed and have been reported to possibly play a role in the infant's immunologic defense (11, 20, 35). Actin, tubulin, and thymosins are genes that encode for cytoskeletal proteins and are highly expressed in the MFG possibly for transporting and trafficking products for eventual secretion in milk. The most significant ontologies associated with this top cluster involved translation, biosynthetic processes, and ribosomal constituents. However, no terms specific for lactation were found. Lemay et al. (15) have pointed out the problem of lack of annotation for lactation-specific functions. We further analyzed the data set according to the most significant networks and canonical pathways.

The top networks (Fig. 3) were associated with cellular function and maintenance, cell cycle, DNA replication, and protein synthesis, all linked with the high metabolic and synthetic activity of the MEC. Of the 129 analyzed pathways, 46 were found to be significantly enriched in the MFG gene list; the most significant being purine metabolism, supporting the highly synthetic processes in the mammary epithelium. None of the functional gene sets by GSEA were significantly enriched using the MFG gene list despite 997 gene sets with at least 15 matching genes. ERK/MAPK signaling pathway, known to be activated by several growth factors, was also highly significant. In mice, PI3-AKT, integrin, and ubiquitination pathways were the top pathways found to be significant during lactation (15). These were also found to be significantly overrepresented in our data set. The ERK/MAPK signaling pathway is thought not to be activated by insulin, IGF-1, or PRL. These hormones are known to influence mammary differentiation and synthesis of milk components primarily via PI3-AKT and STAT signaling (9, 21). That the ERK/MAPK pathway is significantly enriched in the MFG gene list is very interesting and may support findings by Finlay et al. (6) of its importance in MEC survival in the presence of insulin and cell-extracellular matrix (ECM) association. This ECM-determined MAPK-dependent mammary epithelial cell survival pathway may be independent of the PI3 kinase-AKT associated intracellular signaling pathway. Furthermore, epidermal growth factor (EGF), a ligand that acts through ERK/MAPK signaling is central in the top network, shown in Fig. 3. In contrast to MEC survival, Zhao et al. (36, 37) have found a role for MAPK in mammary involution and apoptosis in the presence of IL-6.

Of the 14,070 expressed genes in the MFG, analysis by RM ANOVA yielded 1,029 (7%) genes that were significantly changed throughout the day (P < 0.05). A heat map of these 1,029 genes shows that the genes cluster into two groups, those that are highly expressed late in the evening until early morning and a set that is initially underexpressed and is turned on later in the day (Fig. 7A). These genes were involved in cell development, growth, and proliferation, apoptosis, and intracellular signaling cascade, suggesting that the expression of these genes changes within a 24 h period. The pathway for fatty acid elongation in the mitochondria was enriched, suggesting that there may be diurnal variation in mitochondrial fatty acid metabolism. PRL, a lactogenic hormone whose role in established lactation is still unclear, had a peak concentration late at night (Fig. 1), supporting other studies that show that PRL circadian rhythm persists in lactation (31). The mammary gland has long been known to be influenced by hormones, some of which are secreted rhythmically during the day, e.g., PRL, growth hormone, and cortisol. These hormones are known to influence gene expression (7, 16, 26). However, in this study, we are unable to demonstrate whether PRL influences gene expression directly.

Recently, circadian clock gene expression has been reported in various peripheral tissues, including the mammary gland in mice (19). These genes are endogenous oscillators that generate transcriptional rhythms for daily timing of physiological processes. We demonstrated the expression of the core clock genes (CLOCK, ARNTL, PER1-3, CRY1, CRY2, CSNK1ε, and TEF) and have shown their expression profiles to cycle within 24 h in the MEC. It is known that CLOCK and ARNTL dimerize and drive the rhythmic transcription of three Period and two Cryptochrome genes (19). Figure 7, B and C, is consistent with this in that CLOCK and ARNTL cluster together and the rest of the circadian clock genes are grouped together. This is the first report of the expression of peripheral clock genes in the human mammary epithelium during lactation. The importance of these genes during lactation remains to be determined.

Because of the lack of annotation for genes associated with lactation, we made use of mouse data by Rudolph et al. (28) to study the expression of milk protein and metabolic genes involved in milk synthesis in the MFG. Milk protein genes are the most highly expressed genes in both the mouse data and the human MFG. CSN expression ranks higher in the mouse than in humans, which reflects the lower CSN fraction of human milk (~0.2% by weight, compared with as much as 12% of rodent milk) (22). Milk lactose content is species specific, and human breast milk is known to contain the highest concentration of lactose (25). When the milk synthesis genes are ranked, human MFG expresses milk protein genes to a higher extent than genes involved in other metabolic processes. The mouse expression data showed that more genes involved in fatty acid and triglyceride synthesis were highly expressed. Mouse milk is known to have a higher fat concentration (23), which may explain the expression data. Most genes involved in glycolysis, gluconeogenesis, the citric acid cycle, fatty acid degradation, fatty acid and triglyceride synthesis, and cholesterol synthesis were expressed in the MFG, with a few exceptions: fatty acid desaturase 1 (FADS1), phosphoenolpyruvate carboxykinase 1 (PCK1), glucose transporter 4 (SLC2A4), solute carrier family 6, neurotransmitter transporter, taurine (SLC6A6), and fatty acid binding proteins 2 and 5 (FABP2, FABP5). Others, such as fatty acid binding protein 9 (FABP9), acyl-coenzyme A thioesterase 1 (ACOT1), and stearoyl-CoA desaturase 2 (SCD2), were not represented on the chip. The absence of SLC2A4 expression is consistent with reports that this transporter is expressed by adipocytes in the mammary gland (28). PCK2 but not PCK1 was expressed, and we have previously confirmed this by qRT-PCR (data not shown). FADS1 was not classified as expressed in the MFG, but a closer look at our data showed that it was present in three of five subjects. It was excluded because the criterion we set for data refinement using GeneSpring GX 9 was that each gene be present in all subjects in at least one time point. The human mammary epithelium has been reported to synthesize medium-chain fatty acids de novo (18, 29). The main function of FADS1 is to catalyze the desaturation of long-chain fatty acids. Using the milk fat for RNA isolation precluded further analysis of the fatty acid chain lengths in the milk samples. We are currently studying the de novo synthesis of fatty acids in the mammary gland through different techniques.
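The data-refinement criterion described above can be sketched in a few lines. This is an illustrative reading of the rule, not the GeneSpring implementation: it assumes that "present in all subjects in at least one time point" means that at least one time point has a detection call in every subject. The data layout, gene labels, and function names are hypothetical.

```python
from typing import Dict, List

# detected[gene][subject][timepoint] -> bool
# (hypothetical layout; not the GeneSpring export format)
Calls = Dict[str, List[List[bool]]]


def present_in_all_subjects_at_some_timepoint(calls: List[List[bool]]) -> bool:
    """Return True if at least one time point has a detection call in every subject."""
    # zip(*calls) regroups the per-subject rows into per-time-point columns.
    return any(all(column) for column in zip(*calls))


def refine(detected: Calls) -> List[str]:
    """Return the genes that satisfy the criterion, in sorted order."""
    return sorted(gene for gene, calls in detected.items()
                  if present_in_all_subjects_at_some_timepoint(calls))


N_TIMEPOINTS = 8  # assumes sampling at 0, 3, ..., 21 h

toy_calls: Calls = {
    # Detected in only three of five subjects, as described for FADS1.
    "GENE_A": [[True] * N_TIMEPOINTS] * 3 + [[False] * N_TIMEPOINTS] * 2,
    # Detected in all five subjects, but only at the third time point.
    "GENE_B": [[t == 2 for t in range(N_TIMEPOINTS)] for _ in range(5)],
}

kept_genes = refine(toy_calls)
```

Under this reading, a gene detected in only three of five subjects is dropped no matter how consistently those three subjects express it, which is the situation reported for FADS1. An alternative reading, in which each subject must show the gene at some time point of its own, would exclude such a gene as well.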

In summary, MFG RNA is a unique, rich, and easily obtainable sample source for exploring changes in gene expression in humans. We have demonstrated the expression of metabolic genes involved in milk synthesis that have been previously reported from animal studies. Finally, data collection over 24 h demonstrated the expression of core clock genes and showed that many genes change expression over time. We believe that analysis of the MFG will provide important insights into gene regulation, not only as it relates to human lactation but also to other human tissues. We advocate better annotation and the design of studies that take into account the variation of gene expression throughout the day.


This project was supported by National Institute of Diabetes and Digestive and Kidney Diseases Grant 5 RO1 DK-055478. This work is a publication of the United States Department of Agriculture (USDA)/Agricultural Research Service, CNRC, Department of Pediatrics, Baylor College of Medicine, Houston, TX. The contents of this publication do not necessarily reflect the views or policies of the USDA, nor does mention of trade names, commercial products, or organizations imply endorsement from the US Government. The authors declare that there is no conflict of interest that would prejudice the impartiality of scientific work.

Supplementary Material

[Supplemental Tables]


The authors thank the following for invaluable help: Karen Jones, William Jeong, Michael Kueht, Brent Manning, Susan Sharma, Amy Pontius, Cindy Bryant, Linda Peasant, and the nursing staff of the GCRC.


Address for reprint requests and other correspondence: M. W. Haymond, Dept. of Pediatrics - Nutrition, Children's Nutrition Research Center #7062, Baylor College of Medicine, 1100 Bates, Houston, TX 77030 (E-mail: mhaymond@bcm.tmc.edu).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.


1 The online version of this article contains supplemental material.


1. Centers for Disease Control and Prevention (CDC). Breastfeeding trends and updated national health objectives for exclusive breastfeeding—United States, Birth Years 2000–2004. In: MMWR Morb Mortal Wkly Rep 2007, p. 760–763. [PubMed]
2. Balogh GA, Heulings R, Mailo DA, Russo PA, Sheriff F, Russo IH, Moral R, Russo J. Genomic signature induced by pregnancy in the human breast. Int J Oncol 28: 399–410, 2006. [PubMed]
3. Clarkson RW, Wayland MT, Lee J, Freeman T, Watson CJ. Gene expression profiling of mammary gland development reveals putative roles for death receptors and immune mediators in post-lactational regression. Breast Cancer Res 6: R92–R109, 2004. [PMC free article] [PubMed]
4. Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4: P3, 2003. [PubMed]
5. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207–210, 2002. [PMC free article] [PubMed]
6. Finlay D, Healy V, Furlong F, O'Connell FC, Keon NK, Martin F. MAP kinase pathway signalling is essential for extracellular matrix determined mammary epithelial cell survival. Cell Death Differentiation 7: 302–313, 2000. [PubMed]
7. Golden KL, Rillema JA. Effects of prolactin on galactosyl transferase and alpha-lactalbumin mRNA accumulation in mouse mammary gland explants. Proc Soc Exp Biol Med 209: 392–396, 1995. [PubMed]
8. Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, Jongeneel CV, Valgeirsson H, Fenwick K, Iravani M, Leao M, Simpson AJ, Strausberg RL, Jat PS, Ashworth A, Neville AM, O'Hare MJ. Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data. Breast Cancer Res 8: R56, 2006. [PMC free article] [PubMed]
9. Hadsell DL, Olea W, Lawrence N, George J, Torres D, Kadowaki T, Lee AV. Decreased lactation capacity and altered milk composition in insulin receptor substrate null mice is associated with decreased maternal body mass and reduced insulin-dependent phosphorylation of mammary Akt. J Endocrinol 194: 327–336, 2007. [PubMed]
10. Hamlin B, Brooker S, Oleinikova K, Wands S. Infant Feeding 2000. The Stationery Office, 2002.
11. Hennart PF, Brasseur DJ, Delogne-Desnoeck JB, Dramaix MM, Robyn CE. Lysozyme, lactoferrin, and secretory immunoglobulin A content in breast milk: influence of duration of lactation, nutrition status, prolactin status, and parity of mother. Am J Clin Nutr 53: 32–39, 1991. [PubMed]
12. Huston GE, Patton S. Factors related to the formation of cytoplasmic crescents on milk fat globules. J Dairy Sci 73: 2061–2066, 1990. [PubMed]
13. Jones C, Mackay A, Grigoriadis A, Cossu A, Reis-Filho JS, Fulford L, Dexter T, Davies S, Bulmer K, Ford E, Parry S, Budroni M, Palmieri G, Neville AM, O'Hare MJ, Lakhani SR. Expression profiling of purified normal human luminal and myoepithelial breast cells: identification of novel prognostic markers for breast cancer. Cancer Res 64: 3037–3045, 2004. [PubMed]
14. Leek JT, Monsen E, Dabney AR, Storey JD. EDGE: extraction and analysis of differential gene expression. Bioinformatics 22: 507–508, 2006. [PubMed]
15. Lemay DG, Neville MC, Rudolph MC, Pollard KS, German JB. Gene regulatory networks in lactation: identification of global principles using bioinformatics. BMC Syst Biol 1: 56, 2007. [PMC free article] [PubMed]
16. Lkhider M, Petridou B, Aubourg A, Ollivier-Bousquet M. Prolactin signalling to milk protein secretion but not to gene expression depends on the integrity of the Golgi region. J Cell Sci 114: 1883–1891, 2001. [PubMed]
17. Maningat PD, Sen P, Sunehag AL, Hadsell DL, Haymond MW. Regulation of gene expression in human mammary epithelium: effect of breast pumping. J Endocrinol 195: 503–511, 2007. [PubMed]
18. Mather IH, Keenan TW. Origin and secretion of milk lipids. J Mammary Gland Biol Neoplasia 3: 259–273, 1998. [PubMed]
19. Metz RP, Qu X, Laffin B, Earnest D, Porter WW. Circadian clock and cell cycle gene expression in mouse mammary epithelial cells and in the developing mouse mammary gland. Dev Dyn 235: 263–271, 2006. [PMC free article] [PubMed]
20. Nagatomo T, Ohga S, Takada H, Nomura A, Hikino S, Imura M, Ohshima K, Hara T. Microarray analysis of human milk cells: persistent high expression of osteopontin during the lactation period. Clin Exp Immunol 138: 47–53, 2004. [PMC free article] [PubMed]
21. Naylor MJ, Oakes SR, Gardiner-Garden M, Harris J, Blazek K, Ho TW, Li FC, Wynick D, Walker AM, Ormandy CJ. Transcriptional changes underlying the secretory activation phase of mammary gland development. Mol Endocrinol 19: 1868–1883, 2005. [PubMed]
22. Neville MC. Milk secretion: an overview. Denver, CO: http://mammary.nih.gov/Reviews/lactation/Neville001/index.html, 1998.
23. Oftedal OT, Iverson S. Comparative Analysis on Nonhuman Milks. New York: Academic, 1995.
24. Patton S, Huston GE. Incidence and characteristics of cell pieces on human milk fat globules. Biochim Biophys Acta 965: 146–153, 1988. [PubMed]
25. Picciano MF. Comparative Lactation-Humans. (2007). http://classes.ansci.uiuc.edu/ansc438/Lactation/humans.html (21 May 2008).
26. Rosen JM, Wyszomierski SL, Hadsell D. Regulation of milk protein gene expression. Annu Rev Nutr 19: 407–436, 1999. [PubMed]
27. Rudolph MC, McManaman JL, Hunter L, Phang T, Neville MC. Functional development of the mammary gland: use of expression profiling and trajectory clustering to reveal changes in gene expression during pregnancy, lactation, and involution. J Mammary Gland Biol Neoplasia 8: 287–307, 2003. [PubMed]
28. Rudolph MC, McManaman JL, Phang T, Russell T, Kominsky DJ, Serkova NJ, Stein T, Anderson SM, Neville MC. Metabolic regulation in the lactating mammary gland: a lipid synthesizing machine. Physiol Genomics 28: 323–336, 2007. [PubMed]
29. Rudolph MC, Neville MC, Anderson SM. Lipid synthesis in lactation: diet and the fatty acid switch. J Mammary Gland Biol Neoplasia 12: 269–281, 2007. [PubMed]
30. Segura-Millan S, Dewey KG, Perez-Escamilla R. Factors associated with perceived insufficient milk in a low-income urban population in Mexico. J Nutr 124: 202–212, 1994. [PubMed]
31. Stern JM, Reichlin S. Prolactin circadian rhythm persists throughout lactation in women. Neuroendocrinology 51: 31–37, 1990. [PubMed]
32. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102: 15545–15550, 2005. [PMC free article] [PubMed]
33. Thompson PA, Kadlubar FF, Vena SM, Hill HL, McClure GH, McDaniel LP, Ambrosone CB. Exfoliated ductal epithelial cells in human breast milk: a source of target tissue DNA for molecular epidemiologic studies of breast cancer. Cancer Epidemiol Biomarkers Prev 7: 37–42, 1998. [PubMed]
34. Ward RE, German JB. Understanding milk's bioactive components: a goal for the genomics toolbox. J Nutr 134: 962S–967S, 2004. [PubMed]
35. Yabe U, Sato C, Matsuda T, Kitajima K. Polysialic acid in human milk. CD36 is a new member of mammalian polysialic acid-containing glycoprotein. J Biol Chem 278: 13875–13880, 2003. [PubMed]
36. Zhao L, Hart S, Cheng J, Melenhorst JJ, Bierie B, Ernst M, Stewart C, Schaper F, Heinrich PC, Ullrich A, Robinson GW, Hennighausen L. Mammary gland remodeling depends on gp130 signaling through Stat3 and MAPK. J Biol Chem 279: 44093–44100, 2004. [PubMed]
37. Zhao L, Melenhorst JJ, Hennighausen L. Loss of interleukin 6 results in delayed mammary gland involution: a possible role for mitogen-activated protein kinase and not signal transducer and activator of transcription 3. Mol Endocrinol 16: 2902–2912, 2002. [PubMed]

Articles from Physiological Genomics are provided here courtesy of American Physiological Society
