NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Cancer Res. Author manuscript; available in PMC May 1, 2011.
Published in final edited form as:
PMCID: PMC2866518

Proteomic Characterization of Novel Alternative Splice Variant Proteins in HER2/neu-induced Breast Cancers


Multifaceted alternative splicing in cancer cells greatly diversifies protein structure independently of genome changes, but characterization of cancer-associated splice variants is quite limited. In this study, we used mass spectrometric data to interrogate a custom-built database created with three-frame translations of mRNA sequences from Ensembl and ECgene to find alternative splice variant proteins. In mass spectrometric files from LC-MS/MS analyses of normal mouse mammary glands or mammary tumors derived from MMTV-Her-2/neu transgenic mice, we identified a total of 608 alternative splice variants, of which peptides from 216 proteins were found only in the tumor sample. Among the 608 splice variants were 68 novel proteins that were not completely matched to any known protein sequence in mice, for which we found known functional motifs. Biological process enrichment analysis of the splice variants identified suggested involvement of these proteins especially in cell motility and translation initiation. The cancer-associated differentially-expressed splice variant proteins offer novel biomarker candidates that may function in breast cancer progression or metastasis.

Keywords: Alternative splice variants, proteomics, peptides, novel, breast cancer, biomarker candidates


Introduction

By means of alternative splicing and post-translational modifications, one gene can generate a variety of proteins. Alternative splice events that affect the protein coding region of the mRNA will give rise to proteins which differ in their sequence and activities. Alternative splicing within the non-coding regions of the RNA can result in changes in regulatory elements such as translation enhancers or RNA stability domains, which may dramatically influence protein expression (1). Alternative splicing has been associated with such diseases as growth hormone deficiency, Fraser syndrome, cystic fibrosis, spinal muscular atrophy, and myotonic dystrophy (2, 3). In cancers, there are examples of every kind of alternative splicing, including alternative individual splice sites, alternative exons, and alternative introns (4).

We have devised a proteomic informatics approach to identify alternative splice variants of both known and novel proteins. Briefly, we search mass spectrometric data against a custom-built, non-redundant database created with three-frame translations of mRNA sequences from the ECgene and Ensembl databases (5, 6). The ECgene database is a large publicly-available alternative splice variant database. The peptide sequences identified are analyzed using BLAST and BLAT searches and integrated into distinct proteins.

We are analyzing proteomic datasets from mouse models for several human cancers. Our study of the KRas G12D /Ink4a-Arf mouse model of human pancreatic ductal adenocarcinoma identified known and novel alternative splice variants from mouse plasma and found significant differential expression in proteins of the mutant compared with normal (6). Here we present the alternative splice variant analysis of data from LC-MS/MS analyses of tumor and normal mammary tissue from a mouse model of Her-2/neu-driven breast cancer (7). Many novel and known splice variants were detected only in the tumor sample. These variants may affect mechanisms of cancer progression and metastasis.

Materials and Methods

Whiteaker et al (7) performed LC-MS/MS of tumor and normal mammary tissue from a conditional HER2/neu-driven mouse model of breast cancer, identifying 6758 peptides representing >700 proteins. We downloaded their mzXML files containing the spectral information from the PeptideAtlas (8). The original study reported that cancerous and normal tissues were harvested from 5 doxycycline-inducible, MMTV-rtTA/TetO-NeuNT mice and 5 normal mice, respectively, processed separately into tissue lysates. Two pools were prepared, containing equal mass of protein, and digested by trypsin for mass spectrometric analysis. The mzXML files were searched against our modified ECgene database for alternative splice variant analysis using X!Tandem software (9). The modified ECgene database was constructed by combining Ensembl 40 and ECgene databases (mm8, build 1), as described previously (6).
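As an illustration of the three-frame translation on which such a database is based, the following minimal Python sketch translates a nucleotide sequence in its three forward reading frames using the standard genetic code. The function names are hypothetical, and the sketch does not reproduce the actual construction of the modified ECgene database, which is described in (6).

```python
# Minimal sketch: three forward-frame translation of an mRNA/cDNA sequence.
# Illustrative only; this is not the pipeline used to build the modified ECgene database.

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translate(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames (offsets 0, 1, 2)."""
    return [translate(seq[offset:]) for offset in range(3)]
```

For example, the forward tubb2c primer quoted in the Results, 'gagtacccagaccgcatcat', translates in frame 0 to EYPDRI, in agreement with the annotation given there.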

Figure 1 summarizes the analytical workflow, which was slightly modified from the method used previously that employed both Trans-Proteomic Pipeline (TPP) and Michigan Peptide-to-Protein Integration (MPPI) analyses; the TPP Q3Ratio and XPRESS applications were necessary for the quantitation of acrylamide-labeled plasma samples (6). No such labeling was performed in the Her2/neu breast cancer analysis, so MPPI was sufficient. Peptides that were identified by X!Tandem search with false discovery rate (FDR) < 1% (based on peptides identified from reverse sequences) were used in our first MPPI analysis. To give FDR < 1%, peptides had to be identified either with X!Tandem expect value < 0.001 or by three or more spectra with expect value < 0.01. We applied a threshold to the final integrated alternative splice proteins so that each protein had to be identified by two or more distinct peptides or, if identified by a single peptide, by three or more spectra with expect value < 0.01.
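The peptide-level and protein-level thresholds described above can be sketched as two simple predicates. This is a minimal illustration under stated assumptions, not the MPPI implementation: the `Peptide` container and function names are hypothetical, each distinct peptide sequence is assumed to appear once with the expect values of all its matched spectra, and the FDR estimation from reverse sequences is not shown.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    sequence: str
    expect_values: list[float]  # one X!Tandem expect value per matched spectrum


def spectra_below(peptide: Peptide, cutoff: float) -> int:
    """Number of spectra for this peptide with expect value below the cutoff."""
    return sum(1 for e in peptide.expect_values if e < cutoff)


def passes_peptide_filter(peptide: Peptide) -> bool:
    # Accepted if any spectrum has expect < 0.001,
    # or if three or more spectra have expect < 0.01.
    return spectra_below(peptide, 0.001) >= 1 or spectra_below(peptide, 0.01) >= 3


def passes_protein_filter(peptides: list[Peptide]) -> bool:
    # Accepted if two or more distinct peptides pass the peptide filter,
    # or if a single peptide is supported by three or more spectra with expect < 0.01.
    accepted = [p for p in peptides if passes_peptide_filter(p)]
    distinct = {p.sequence for p in accepted}
    if len(distinct) >= 2:
        return True
    return len(distinct) == 1 and any(spectra_below(p, 0.01) >= 3 for p in accepted)
```

Under this reading, a protein supported by one peptide with a single very confident spectrum (expect < 0.001) passes the peptide filter but not the protein filter.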

Figure 1
Flow chart of the analytical workflow applied to the X!Tandem search results for the identification of alternative splice variant proteins.

qRT-PCR Validations of Novel Peptides

Quantitative RT-PCR (qRT-PCR) was performed to observe the differential mRNA expression of novel peptides in tumor versus normal samples. The mRNA samples from the mammary tissues of tumor-bearing and normal mice were provided by the laboratory of Dr. Amanda Paulovich at the Fred Hutchinson Cancer Research Center in Seattle, WA. RNA purification and the qRT-PCR methods were the same as described (6).

Among the 45 novel peptides identified only in the tumor sample, we were able to design primer pairs with optimal properties for 32; these primers amplified the novel mRNA sequences corresponding to each of those 32 peptides (Supplementary Table 1). The expression of rodent glyceraldehyde 3-phosphate dehydrogenase (gapdh) was determined as an internal control. The analyses were done on tissue samples from 5 pairs of normal and tumor-bearing mice, and the mean mRNA expression values and standard deviations were calculated accordingly. According to Whiteaker et al (7), tumor-bearing and normal mice were paired at weaning and were matched with respect to age, sex, litter, cage, and treatment protocols.

Annotation of Novel Peptides

To characterize the novel peptides identified, we used tools including ELM and Motif Scan. ELM is a resource for predicting functional sites in eukaryotic proteins (10). Motif Scan scans a sequence against protein profile databases (11). The parameters used for searching functional motifs in Motif Scan were frequent match producers (“prosite patterns”) and prosite profiles. We also used the Berkeley Drosophila Genome Project’s Splice Site Prediction by Neural Network (12) to predict alternative splice sites which may have generated these novel peptides.

Differential Expression of Known Alternative Splice Variants

The MPPI method integrates the peptide identifications into a set of proteins and reduces the final set of proteins while retaining all of the peptide identifications (6). The integrated list of proteins is given in Supplementary Table 2, combining those identified with unique peptides and those based on the total number of distinct peptides and spectra (6). For the analysis of differential expression, we focused on the proteins that were identified by unique peptides. If a known variant with unique peptides was identified in only one sample type, we considered it differentially expressed if its unique peptide was identified by three or more spectra with X!Tandem expect value < 0.01.

A spectral counting method was used to determine the differential expression of proteins based on peptides that were identified in both tumor and normal tissue samples. The total number of spectra identified with X!Tandem expect value < 0.01 for a particular protein was used as its spectral count. The normalized spectral counts for the proteins were statistically evaluated using the G test (13), a likelihood ratio test for goodness-of-fit. The calculated G-value was then compared against the chi-square distribution with one degree of freedom to assess whether the protein was differentially expressed: proteins with G larger than 3.84 were considered differentially expressed at P < 0.05.
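A minimal sketch of a two-sample G test on spectral counts is given below. The exact normalization used in the study follows (13) and is not spelled out here, so the sketch makes an explicit assumption: the expected counts are obtained by splitting a protein's pooled spectral count in proportion to the total number of spectra in each sample, and G = 2 Σ O ln(O/E) over the two samples. Function and parameter names are hypothetical.

```python
import math

CHI2_CRITICAL_DF1_P05 = 3.84  # chi-square critical value, 1 degree of freedom, P = 0.05


def g_statistic(tumor_count: int, normal_count: int,
                tumor_total: int, normal_total: int) -> float:
    """G = 2 * sum(O * ln(O / E)) for one protein observed in two samples.

    Assumption: expected counts split the pooled count in proportion to the
    total number of spectra acquired for each sample.
    """
    pooled = tumor_count + normal_count
    grand_total = tumor_total + normal_total
    observed = (tumor_count, normal_count)
    expected = (pooled * tumor_total / grand_total,
                pooled * normal_total / grand_total)
    g = 0.0
    for obs, exp in zip(observed, expected):
        if obs > 0:  # the term 0 * ln(0) is taken as 0
            g += obs * math.log(obs / exp)
    return 2.0 * g


def is_differentially_expressed(tumor_count: int, normal_count: int,
                                tumor_total: int, normal_total: int) -> bool:
    g = g_statistic(tumor_count, normal_count, tumor_total, normal_total)
    return g > CHI2_CRITICAL_DF1_P05
```

With equal totals in both samples, hypothetical counts of 30 versus 10 spectra give G of about 10.5 and exceed the 3.84 threshold, whereas 10 versus 10 gives G = 0.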

Protein Interactions

Ontology enrichment analysis was performed to assess overall functional character of the tumor-associated variants. The gene symbols of the tumor-associated variants, including the differentially-expressed variants and the novel variants found only in the tumor sample, were uploaded into MetaCore™, a systems biology pathway analysis tool (14). MetaCore™ enrichment analysis matches the dataset to terms in GeneGO functional ontologies, providing a ranked representation of ontologies that are most saturated or "enriched" with the input data. For this sample set, we used a general enrichment category, GeneGO Biological Processes. This ontology represents prebuilt networks of manually curated protein-protein, protein-compound/metabolite, or protein-nucleic acid interactions, assembled by GeneGO scientific annotators based on curated literature evidence. Each GeneGO Process Network represents a comprehensive biological process with a specific functional theme. Similar analysis was done using the gene symbols of all alternative splice variants identified in the tumor sample.
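The scoring used by MetaCore™ is not described in this section. As a generic illustration of what "enrichment" means, the sketch below computes a one-sided hypergeometric P value, a commonly used statistic for over-representation of a gene list in an ontology term. It is offered as an assumption-laden example, not as the MetaCore™ algorithm, and all names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, overlap: int) -> float:
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the ontology term."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)
```

Terms can then be ranked by ascending P value, which is one way to obtain a ranked list of the kind described above.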

Direct protein interactions were displayed by the Cytoscape MiMI plugin (15) using the 460 parent gene symbols of the 505 alternative splice variants identified from the tumor sample. The Cytoscape MiMI plugin connects to the MiMI database and displays the interactions. The Michigan Molecular Interactions database (MiMI) gathers data from well-known protein interaction databases and deep-merges the information. We used the protein interactions from the Human Protein Reference Database (HPRD) in MiMI. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data.
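Restricting an interaction database to direct interactions among the input genes amounts to keeping only those edges whose two endpoints are both in the input list. The sketch below illustrates this with placeholder gene symbols; it does not use the MiMI plugin or its API, the function names are hypothetical, and the exclusion of self-interactions is an assumption of the example.

```python
def direct_interactions(edges, input_genes):
    """Keep undirected edges whose two endpoints are both input genes.

    Gene symbols are compared case-insensitively; self-interactions are dropped
    (an assumption made for this illustration).
    """
    wanted = {g.lower() for g in input_genes}
    kept = set()
    for a, b in edges:
        a, b = a.lower(), b.lower()
        if a != b and a in wanted and b in wanted:
            kept.add(tuple(sorted((a, b))))
    return kept


def interacting_genes(kept_edges):
    """Input genes that take part in at least one retained interaction."""
    return {gene for edge in kept_edges for gene in edge}
```

The size of `interacting_genes(...)` relative to the input list corresponds to the number of input gene symbols that appear in the displayed network.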


Results

Summary of Alternative Splice Variant Identifications

The numbers of alternative splice variant proteins identified by the X!Tandem search analyses (Fig 1) are summarized in Table 1. The total of 608 distinct alternative splice variants includes 540 known proteins and 68 novel proteins with peptides that did not align to any known mouse protein sequence; details of each protein are given in Supplementary Table 2. Based on one or more distinct peptide identifications, we found 216 more distinct proteins in the tumor sample than in the normal sample (505 versus 289). A similar ratio was found when the threshold for protein identification based on peptides was more stringent (Table 1).

Table 1
Summary of total number of Alternative Splice proteins after the analyses shown in Figure 1. The novel protein identifications are given in parentheses.

Novel Protein Identifications

Detailed peptide sequence analysis showed 68 proteins (54 proteins from the tumor sample and 23 proteins from the control sample) with peptide sequences that did not align with known mouse protein sequences. These proteins are considered novel identifications. There were 9 novel proteins found in common from both samples. Among these 68 novel proteins we found splice variants resulting from new translation start sites, new splice sites, extension or shortening of exons, deletion or switch of exons, intron retention, and translation in an alternative reading frame (Supplementary Table 3).
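The counts in this paragraph are related by inclusion-exclusion, which the short check below makes explicit. The variable names are hypothetical, and the tumor-only and normal-only figures are derived here from the three reported numbers.

```python
# Reported counts of novel proteins per sample
tumor_novel = 54
normal_novel = 23
shared_novel = 9  # novel proteins found in both samples

# Inclusion-exclusion: proteins in both samples are counted once
total_novel = tumor_novel + normal_novel - shared_novel

# Derived counts of novel proteins seen in only one sample
tumor_only_novel = tumor_novel - shared_novel
normal_only_novel = normal_novel - shared_novel

print(total_novel, tumor_only_novel, normal_only_novel)
```

The total is 68, and the tumor-only count of 45 corresponds to the novel identifications selected for primer design.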

qRT-PCR Validations of Novel Variants

As described in Methods, we were able to design optimal primers for 32 of the 45 novel peptides found only in the tumor sample. RT-PCR successfully amplified each of the mRNAs corresponding to these 32 peptide sequences (Fig 2a) in both normal and tumor samples. The novel mRNA for eukaryotic translation initiation factor 4B (eIf4b) amplified only in the tumor sample in the sample pair shown in Fig 2a; three of the other four sample pairs assayed for this eIf4b novel mRNA did amplify in the normal sample. Although the novel peptide from the lethal (3) malignant brain tumor like protein 3 (l3mbt3) variant shows weak amplification in the gel image, faint bands of the correct size were visible in the gel under ultraviolet light.

Figure 2
a: Electrophoresis gel images showing the RT-PCR amplifications. The original picture was cropped using Adobe Photoshop. RT-PCR analyses of 31 novel peptides in breast tissue lysates from Her2/neu mice with breast tumors and normal mice. The amplifications ...

Curiously, the RT-PCR gel showed that the primers we designed for the novel peptide ‘EYPDRIMNTFSLTTPTYGDLNHLVSATMSGVTTCLR’ of tubulin, beta 2c (tubb2c) amplified a product of 201 bp instead of the expected 63 bp. The primers were ‘gagtacccagaccgcatcat’ (EYPDRI) and ‘tggttgagatcgccataggt’ (TYGDLN). These primer sequences also occur at the 5’ and 3’ ends of a peptide sequence (67 aa) of a known variant of the tubb2c gene; the 201 bp band we observed was actually amplified from the mRNA sequence that translates to this known peptide. Hence, we validated only 31 instead of 32 novel peptides at the mRNA level.
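The two product sizes follow from codon arithmetic: an amplicon that spans a stretch of coding sequence is three nucleotides long per encoded residue. The sketch below, with a hypothetical helper name, computes the expected size from the peptide and the residues covered by the two primers; it assumes the amplicon runs from the first residue of the forward motif to the last residue of the reverse motif.

```python
def expected_amplicon_bp(peptide: str, forward_motif: str, reverse_motif: str) -> int:
    """Length in bp of coding sequence from the start of forward_motif
    to the end of reverse_motif (3 nucleotides per residue)."""
    start = peptide.index(forward_motif)
    end = peptide.index(reverse_motif, start) + len(reverse_motif)
    return 3 * (end - start)


novel_peptide = "EYPDRIMNTFSLTTPTYGDLNHLVSATMSGVTTCLR"
novel_bp = expected_amplicon_bp(novel_peptide, "EYPDRI", "TYGDLN")

# The known tubb2c peptide flanked by the same primers is 67 residues long.
known_bp = 3 * 67

print(novel_bp, known_bp)
```

The novel peptide spans 21 residues between the primer ends (63 bp), whereas the 67-residue known peptide corresponds to the observed 201 bp product.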

Figure 2b shows the relative mRNA expression corresponding to the 31 novel peptides by qRT-PCR. Except for peptides from clathrin heavy polypeptide (cltc) and superoxide dismutase 1 soluble (sod1), the mean mRNA expression values of all peptides, along with their standard deviations, from the sets of 5 pairs of samples indicated increased mRNA expression in the tumor samples.

LC-MS/MS data can be of low accuracy; however, the validation of 31 of the 32 novel peptides by qRT-PCR provides confidence in our findings. Supplementary Figure 1 contains the MS/MS spectra of the following 16 novel peptides.

Annotation of Novel Peptides Up-regulated in Tumor Sample

ELM and Motif Scan analyses were used (see Methods) to find functional motifs in the novel peptides. The following 16 novel peptides identified only in the tumor sample by proteomic analysis, with increased mRNA expression by PCR, had functional motif annotations that may be potentially significant in cancers.

2 variants with interesting annotations for BRCA

  • The peptide sequence ‘FSRAEAEGPGQACPPRPFPC’ is in the second intronic region of leucine-zipper-containing LZF (rogdi) gene. Using Splice Site Prediction by Neural Network, we found a predicted donor splice site ‘gactgaggtgaggtg’ where the novel peptide was identified as coding sequence, with a Splice Site Prediction score 0.93. Many functional motifs were identified in this section of intronic sequence including LIG_BRCT_BRCA1_1, a phosphopeptide motif which directly interacts with low affinity with the BRCT (carboxy-terminal) domain of the breast cancer gene BRCA1.
  • UCSC Blat analysis showed that the peptide ‘GSGLVPTLGRGAETPVSGAGATRGLSR’ identified from tumor sample aligns to the first intronic region of transcription factor sox7. Predicted splice sites and the same LIG_BRCT_BRCA1_1 motif were found in this intronic region.

2 variants annotated with tyrosine-based sorting signal motif

Tyrosine-based sorting signal motif mediates rapid internalization from the cell surface (16). Tyrosine-based internalization of the neu proto-oncogene product (17) makes the following 2 novel variants interesting.

  • A novel variant of tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein (ywhah) was identified by peptide ‘RARLAEQASAMKAVTELNEP’; this peptide has 7 amino acids missing when compared to the known ywhah peptide sequence ‘RARLAEQAERYDDMASAMKAVTELNEP’. The missing ‘ERYDDMA’ sequence has the tyrosine-based sorting signal motif.
  • Peptide ‘IYYSFGALKLGCFNFPLLKFL’, identified by three distinct spectra in tumor samples, did not align to any known mouse gene region by UCSC Blat, but aligned perfectly to a region in mouse chromosome 7. Blat alignment showed sequence conservation in dog, horse, human, orangutan, and rat. ELM functional motifs TRG_ENDOCYTIC_2 and LIG_MAPK_2 were found in this peptide. TRG_ENDOCYTIC_2 is a tyrosine-based sorting signal responsible for interaction with the mu subunit of AP (Adaptor Protein) complex. LIG_MAPK_2 is a MAP kinase docking motif.
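The two observations above can be reproduced with a short sketch: locating the residues deleted from the known ywhah sequence, and scanning for a tyrosine-based sorting signal. The regular expression `Y..[LMVIF]` is a simplified YxxΦ consensus used here as an assumption; the ELM entry for TRG_ENDOCYTIC_2 should be consulted for the authoritative pattern. Function names are hypothetical.

```python
import re

# Simplified YxxPhi consensus (assumption): Tyr, any two residues, bulky hydrophobic residue
YXX_PHI = re.compile(r"Y..[LMVIF]")


def deleted_segment(known: str, variant: str) -> str:
    """Return the contiguous segment of `known` that is missing from `variant`.

    The longest common prefix is matched first, then the common suffix of the remainder.
    """
    prefix = 0
    while prefix < len(variant) and known[prefix] == variant[prefix]:
        prefix += 1
    suffix = 0
    max_suffix = len(variant) - prefix
    while suffix < max_suffix and known[-1 - suffix] == variant[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(variant):
        raise ValueError("variant is not the known sequence with one contiguous deletion")
    return known[prefix:len(known) - suffix]


known_ywhah = "RARLAEQAERYDDMASAMKAVTELNEP"
novel_ywhah = "RARLAEQASAMKAVTELNEP"
missing = deleted_segment(known_ywhah, novel_ywhah)

print(missing, YXX_PHI.search(missing).group())
print(YXX_PHI.search("IYYSFGALKLGCFNFPLLKFL").group())
```

With these inputs the missing segment is ERYDDMA, which contains YDDM, and the chromosome 7 peptide contains YYSF; the novel ywhah peptide itself contains no tyrosine and therefore no such motif.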

12 variants with casein kinase II (CK2) phosphorylation, protein kinase C (PKC) phosphorylation, and N-myristoylation sites (listed alphabetically)

Increased level of phosphorylation of proteins related to breast cancers has been observed in many studies (18, 19).

  • Peptide ‘AICPLVPPLPGQVIHHCQSLS’ aligned to the 5’UTR region of the Rho GDP dissociation inhibitor (GDI) alpha (arhgdia) gene; this region contains CK2 phosphorylation and N-myristoylation sites.
  • A variant of casein alpha s1 (csn1s1) was identified by peptide ‘SEEQAMASAQEAMTP’, translated from end of exon 8 (ENMUSE00000597730), exon 9 (ENMUSE00000695124), and end of intron 9. This sequence contains a CK2 phosphorylation site, USP7 NTD domain binding motif variant, and major TRAF2-binding consensus motif.
  • Peptide ‘RHSPSVNFHPDSTFD’ aligned to the first intronic region of the CTD small phosphatase-like protein (ctdspl) gene. Glutamine amidotransferase type 1 domain and a CK2 phosphorylation site were found in this peptide.
  • A variant of ATP-dependent RNA helicase (ddx17) was identified by ‘DSAAPAAAPTAEAPPPPSVITRPEPQALPSSVIR’ that aligned to the 5’UTR region. In contrast to peptides identified from the known Ensembl variant ENSMUSP00000055535, this peptide was translated in a different frame and contains a CK2 phosphorylation site.
  • Amino acids ‘YPPSSAGERGGFNKPG’ of the peptide ‘YPPSSAGERGGFNKPGGPMDEGPDL’ aligned to the end of intron 9 of RNA-binding protein EWS (ewsr1), and the remaining part of the novel peptide matched the exon 10 sequence. The part of the peptide from the intronic region has a CK2 phosphorylation site.
  • Amino acids ‘DAP’ of the peptide ‘ITFDDHKNGSCGVSYIAQEPDAP’ are from intron 40 of filamin beta (flnb); a CK2 phosphorylation site is found at this intronic region. In addition, the amino acid sequence ‘AQEPDAP’ is a motif recognized by SH3 domains with a non-canonical class I recognition specificity.
  • Peptide ‘EARSLSDGGPADSVEAAK’, which identified the novel variant of nucleosome assembly protein 1-like 4 (nap1l4), is translated from the Ensembl exon1-exon 3 junction of nap1l4 gene. We found that the ‘SLSD’ amino acids match a CK2 phosphorylation site.
  • A novel variant of pyruvate kinase muscle (pkm2) was identified by the peptide ‘GHPGPEVWGGAGCGHGVCIFPAAVGAVEASFK’; ‘GHPGPEVWGGAGCGHGVCIF’ is from the middle section of exon 6 (ENMUSE00000218447) and ‘PAAVGAVEASFK’ is from the middle section of exon 9 (ENMUSE00000533471). Two N-myristoylation and one PKC phosphorylation sites were found in this peptide sequence.
  • A variant of ferritin heavy chain 1 (fth1) was identified by ‘ATETARLLPGTALAEAQSPLRRLTLTQAPPR’, which aligned to the 5’UTR sequence; PKC phosphorylation and N-myristoylation sites are present here.
  • Peptide ‘PPPSLSLLAPSPSLLALGALAAAWASAAGPLSGRFSMVIDNGIV’ is from the translation of the entire intron 5 of the peroxiredoxin-5 (prdx5) gene; N-myristoylation and PKC phosphorylation sites are found in this peptide sequence.
  • Peptide ‘GTRGDGEGDGGDPVTAR’ is translated from Ensembl exon 1 (ENMUSE00000468886) of Ras-related protein Rap-1b (rap1b). Currently annotated rap1b protein does not contain the exon 1 sequence. Exon 1 of rap1b contains CK2 phosphorylation, N-myristoylation, and PKC phosphorylation sites.
  • Peptide ‘RGQKPPAMPQPVPTA’ aligned to known ribosomal protein S3 (rps3) with a deletion of 87 amino acids between ‘RGQ’ and ‘PPAMPQPVPTA’. Unlike the peptides annotated above, this peptide shows deletions of 2 CK2 and one PKC phosphorylation sites.
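The three site types annotated in this list correspond to short PROSITE patterns: CK2 phosphorylation site (PS00006, [ST]-x(2)-[DE]), PKC phosphorylation site (PS00005, [ST]-x-[RK]), and N-myristoylation site (PS00008, G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}). The sketch below scans a peptide with regular-expression equivalents of these patterns. It is an illustration only and is not Motif Scan: it reports every overlapping match, so the number of hits may differ from the counts a dedicated tool reports. The function name is hypothetical.

```python
import re

# Regular-expression equivalents of the PROSITE patterns
PATTERNS = {
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",           # PS00006: [ST]-x(2)-[DE]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",              # PS00005: [ST]-x-[RK]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # PS00008
}


def scan_sites(peptide: str) -> dict[str, list[tuple[int, str]]]:
    """Return, per pattern, all (1-based start position, matched residues) hits.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = {}
    for name, pattern in PATTERNS.items():
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in lookahead.finditer(peptide)]
    return hits


nap1l4_hits = scan_sites("EARSLSDGGPADSVEAAK")
pkm2_hits = scan_sites("GHPGPEVWGGAGCGHGVCIFPAAVGAVEASFK")
print(nap1l4_hits["CK2_PHOSPHO_SITE"], pkm2_hits["PKC_PHOSPHO_SITE"])
```

For the nap1l4 peptide the scan returns the ‘SLSD’ CK2 site noted above, and for the pkm2 peptide it returns a single PKC site, ‘SFK’.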

Differentially-Expressed Known Alternative Splice Variants

We found 53 known splice variants that were identified by unique peptides as differentially expressed by our analysis (Table 2; Supplementary Table 4 lists these variants with their unique peptides).

Table 2
Differentially-expressed known Alternative Splice Variants that were identified by unique peptides

Motif Scan Analysis

Supplementary Table 5 shows all the motifs identified in differentially expressed known splice variants using Motif Scan with prosite patterns and prosite profiles as search parameters. Table 3a shows the top 5 frequently occurring prosite patterns in the 53 known differentially expressed splice variants. CK2 phosphorylation, PKC phosphorylation and N-myristoylation sites were found 1.5 times more frequently in differentially expressed variants than in 53 randomly selected normal proteins. We refer to these 53 known alternative splice variants along with the 45 novel proteins found only in tumor sample as “tumor-associated variants”.

Protein Interaction Networks

Statistically significant biological process networks from the MetaCore™ analyses of the tumor-associated variants and of all variants identified in the tumor sample are given in Table 3b and Supplementary Table 6, respectively. Cytoskeleton rearrangement, integrin-mediated cell adhesion, and translation initiation are found in common among the top ranking networks from both analyses. Figure 3 shows the direct protein interactions displayed by the Cytoscape MiMI plugin. This figure shows 177 of 460 input gene symbols interacting. The gene names in bold denote the differentially expressed alternative splice variants, including many of the variants annotated above. Nevertheless, these interaction analyses are limited because we used only the alternative splice variants identified in the tumor sample, which are a small subset of the total proteome in this tissue.

Figure 3
Protein Interaction Network displayed by MiMI-Cytoscape plugin. The parent gene symbols of the alternative splice variants found only in the tumor sample were used as the input gene list. Only the direct interactions between the input genes are shown. ...


Discussion

Alternative splicing allows a single gene to generate multiple mRNAs, which can be translated into functionally and structurally diverse proteins (20). In this study of the Her2/neu mouse model of human breast cancers, we identified a total of 608 distinct splice variants from tumor and control samples; 68 of these were proteins with novel peptides that were not previously reported in protein databases. With qRT-PCR, the mRNA sequences of 31 novel peptide sequences selected for validation were successfully amplified from tumor and control tissue lysates. In the original study by Whiteaker et al (7), a quantitative MRM-MS analysis was performed to confirm over-expression of 15 biomarker candidates in tumor tissue lysate. We were able to identify 10 of these proteins in our splice variant analysis (Supplementary Table 7). Overall, we found 216 more distinct proteins in the tumor sample than in the normal tissue. Even though differential detection of a peptide in two such samples may be due to sampling issues in LC-MS/MS experiments, the large difference observed here is probably due to a higher abundance of many cellular proteins and greater cellularity in tumor tissue versus normal.

Functional motif analyses of the differentially expressed known variants and novel sequences showed frequent occurrences of N-myristoylation, PKC and CK2 phosphorylation sites. Elevated levels of casein kinase II have long been associated with increased cell growth and proliferation both in normal and cancer cells (21). The field of protein myristoylation is still in its infancy; however, Shrivastav et al reported a potent inhibitor for N-myristoylation as a novel molecular target for cancer (22). PKC is a family of serine/threonine kinases that is involved in the transduction of signals for cell proliferation and differentiation. Mackay et al (23) suggested PKCs as potential targets for anti-breast cancer drugs.

The direct protein-protein interactions of 179 proteins identified in the tumor sample displayed by the Cytoscape plugin (Fig 3) point to the possible involvement of these variants in Her2/neu breast cancer mechanisms. The interactions between differentially expressed variants of cell division cycle 42 (cdc42), radixin (rdx), Rho GDP dissociation inhibitor (GDI) alpha (arhgdia), and methionyl aminopeptidase (metap2) are noteworthy (Fig 3). Known and novel peptides of arhgdia were identified by our analysis. Arhgdia directly interacts with the ezrin/radixin/moesin (ERM)-CD44 system, initiating the activation of the Rho subfamily members including cdc42, which then regulate reorganization of actin filaments (24). Actin reorganization is a key step in metastasis (25). The novel peptide of arhgdia is from the 5’UTR sequence, which contains a CK2 phosphorylation site and an N-myristoylation site. Arhgdia, cdc42, and rdx have previously been implicated in breast cancer mechanisms (26–28), but the role of metap2 is not yet clear. Tucker et al (29) analyzed the expression of metap2 in cancer patient samples by immunohistochemistry; moderate-to-high staining was identified in the majority of breast, colon, lung, ovarian, and prostate carcinomas examined. In our analysis, the peptide that identified the variant for metap2 is a novel peptide from the currently annotated 5’UTR region. The qRT-PCR analysis showed increased mRNA expression of this novel peptide in the tumor sample (Fig 2b).

The common top ranking biological processes by GeneGO enrichment analysis of tumor-associated and all variants identified in the tumor sample were cytoskeleton rearrangement, integrin-mediated cell adhesion, and translation initiation (Table 3b). Cytoskeleton rearrangement and integrin-mediated cell adhesion are essential parts of cell motility processes (30). We found biologically interesting functional annotations for variants involved in cell motility and translation initiation processes.

Cell Motility

Motility and invasiveness of breast cancer cells are the result of a number of cell activities: directional migration underpinned by the dynamic organization of cytoskeletal components (actin microfilaments and microtubules), establishment and disruption of cell-matrix and homotypic/heterotypic cell-cell adhesions, and extracellular proteolysis (31). GeneGO enrichment analysis showed that cdc42, fibulin 1 (fbln1), rdx, talin (tln), vinculin (vcl), and zyxin (zyx) are involved in cell motility (Table 3b).

CDC42 is a small GTPase of the Rho-subfamily, which regulates signaling pathways that control diverse cellular functions including cell morphology, migration, endocytosis and cell cycle progression (32) and is known to be increased in human breast tumors (26). Phosphorylation activates cdc42; a potential site for protein kinase-mediated phosphorylation, corresponding to serine185, has been identified in the amino acid sequence of Cdc42 (33). The Ensembl variant of cdc42, ‘ENSMUSP00000054634’, which was over-expressed in tumor, had the same protein length of 191 amino acids as its other known variant ‘ENSMUSP00000030417’. However, ‘ENSMUSP00000054634’ has an additional protein kinase phosphorylation site (aa 185–187).

Fbln1 is known to be up-regulated in breast cancer (34). By Motif Scan search we found that the fbln1 variant ENSMUSP00000054583, identified in tumor, did not have the cAMP- and cGMP-dependent protein kinase phosphorylation site reported for the other known Ensembl variant of fbln1, ENSMUSP00000105058. Instead, this variant had one additional CK2 and three additional PKC phosphorylation sites. cGMP-dependent protein kinase II inhibits cell proliferation (35), whereas CK2 and PKC phosphorylations are known to activate many oncoproteins (36, 37).

The proline-rich motifs in specific variants of rdx (ENSMUSP00000000590), vcl (ENSMUSP00000022369), and zyx (ENSMUSP00000070427) participate in delivering actin monomers to specific cellular locations where actin-rich membrane protrusions, such as ruffles, filopodia and microspikes, are formed. These protrusions are necessary for cell motility (38). Interestingly, the other known variants of these genes did not contain the proline-rich region. The ENSMUSP00000070427 variant of zyx had two more CK2 phosphorylation sites than the other known variant of zyx.

We found ENSMUSP00000103533, the shortest variant of tln1 (121 aa long), only in the tumor sample, whereas the peptides from the longer variant, ENSMUSP00000030187 (2541 aa long), were found in both samples. The longer variant is up-regulated in tumor (Table 2). Talin is involved in cytoskeleton rearrangement and integrin-mediated cell adhesion (Table 3b). Talin's amino-terminal head, which consists of a FERM domain, binds an NPxY motif within the cytoplasmic tail of most integrin subunits and strengthens integrin adhesion to the extra-cellular matrix. According to Huang et al (39), the Cdk5 phosphorylation of the longer variant of tln1 at Ser425 controls its turnover, adhesion stability, and, ultimately, cell migration. The absence of FERM domain and cdk5 binding sites in the shorter variant suggests an alternative role of this protein in cell adhesion and cell migration.

Translation Initiation

Translation initiation was another top ranking process in the GeneGO enrichment analysis. Ribosomal proteins and translational regulation have been implicated in control of cellular transformation, tumor growth, aggressiveness, and metastasis (40), including differential expression in breast cancer (41–43). We identified differentially-expressed known and novel variants of ribosomal proteins and eukaryotic translation initiation factors (Supplementary Tables 3 and 4). We identified novel peptides from 5’UTR regions of 12 genes (Supplementary Table 3), which suggests alternative translation start sites in these genes. The identification of 13 eukaryotic translation initiation factors and 19 ribosomal proteins in the tumor sample, compared to 4 eukaryotic translation initiation factors and 4 ribosomal proteins in the control sample (Supplementary Table 2), indicates more complex RNA translation mechanisms in this Her2/neu cancer model. The expression of ENSMUSP00000032992, the longer eIf3c variant, is increased in the tumor sample. This protein contains a bipartite nuclear localization signal (Supplementary Table 5), which suggests that it may be transported to the nucleus. There is some evidence of eIf3c occurring in the nucleus, consistent with reports of intranuclear protein translation and regulation of protein translation by interaction with the COP9 signalosome (44). A novel variant of eukaryotic translation initiation factor 4 subunit b (eIf4b) was up-regulated almost 6-fold in tumor, as shown in Figure 2b by qRT-PCR; furthermore, its amplification failed in 2 control samples of the 5 sample pairs assayed. eIf4b is an RNA-binding protein that greatly enhances the activity of eIf4a.

A known variant of ribosomal protein S3 (rps3), ENSMUSP00000032998, identified by a unique peptide, showed increased expression in tumor. This variant has a proline-rich region which is absent in the other known variant of the same gene. RpS3 is critically involved in translation as a component of the 40S ribosomal subunit and participates in the processing of DNA damage, functioning as a damage DNA endonuclease (45). A novel variant of the same gene identified by ‘RGQKPPAMPQPVPTA’, with increased mRNA expression by qRT-PCR, has a deletion of 87 aa when compared to the known variant. The absence of two CK2 and one PKC phosphorylation sites in this novel variant may influence its function in translation or DNA repair.

Differentially-Expressed Novel Variants

With the exception of those from cltc and sod1, all 29 other novel peptides validated by qRT-PCR showed increased mRNA expression in the tumor sample (Fig 2b). These novel peptides were identified only in the tumor sample by proteomic analysis, which suggests potential roles in breast cancer mechanisms. The inverse correlation observed between mRNA and protein expression for the cltc and sod1 peptides might be due to numerous factors, including negative feedback by the protein variant on mRNA synthesis, post-transcriptional control of protein translation, protein modifications, and different time-course features. Several studies have shown varying correlation between mRNA and protein abundance ratios (46, 47).

Identification of a novel peptide from the pyruvate kinase (pkm2) gene is of interest, as pkm2 is known to play metabolic roles in breast cancer (48). Two N-myristoylation sites and a PKC phosphorylation site were found in this peptide sequence. The novel peptides of arhgdia, fth1, pkm2, prd5, and rap1b had N-myristoylation sites. The novel peptide of TAR DNA-binding protein (tardbp) differed from the canonical peptide sequence by 4 amino acids (‘GSMQ’ instead of ‘ACGL’). Tardbp is a DNA- and RNA-binding protein which regulates transcription and splicing. The substitution of these amino acids in the novel variant may affect its DNA-binding function.

We found tumor-associated peptides from the non-coding intronic regions of genes; these variants may be involved in breast cancer mechanisms. We noted in Results the occurrence of functional motifs that bind with the breast cancer gene brca1 in the intronic regions of the rogdi and sox7 genes, where we identified novel peptides. The novel peptides from the rogdi and sox7 genes showed increased mRNA expression in tumor samples (Fig 2b).

Novel peptides aligned to 5’UTR regions suggest alternative translation initiation sites. Variants of ddx17 and l3mbt3 were identified by such peptides. Ddx17 (p72) is reported to be part of the transcriptional complex that binds to many Sp1 sites on the BRCA2 promoter to activate its transcription by inducing histone acetylation (49). The human-l(3)mbt3 gene has been mapped to chromosome 20q12. Major genetic alterations of this region, including large deletions and translocations, have not been reported; however, amplification of 20q11-13 is common in breast cancers and correlates with poor prognosis (50).

In summary, this study has identified known and novel splice variants, including many with higher expression in tumor tissue from mice with doxycycline-inducible, MMTV-rtTA/TetO-NeuNT-mediated breast cancer than in wild-type mice. These tumor-associated splice variant proteins may have roles in many mechanisms related to breast cancer progression and metastasis. The functional motif analyses of novel peptides show the presence or absence of many relevant functional sites. These data suggest that alternative splice variant proteins are a potential source of candidate biomarkers for Her2/neu and other breast cancers. Further studies will be necessary to elucidate the splice mechanisms, to delineate major subtypes of breast cancer, and to evaluate these variant proteins as potential biomarkers in humans.

Supplementary Material


The authors thank Dr. Amanda Paulovich and Mr. Travis Lorentzen for providing the tissue samples for RT-PCR validations. We thank Dr. Ram Menon for providing access to his laboratory to perform RT-PCRs. Finally, we thank Dr. David States for early encouragement in this series of studies.

Supported by NCI/SAIC contract N01-CO-12400/Sub-k 23XS110, MTTC GR 687, U54 DA021519 National Center for Integrative Biomedical Informatics


1. Bracco L, Kearsey J. The relevance of alternative RNA splicing to pharmacogenomics. Trends in Biotechnology. 2003;21:346–353. [PubMed]
2. Faustino NA, Cooper TA. Pre-mRNA splicing and human disease. Genes & Development. 2003;17:419–437. [PubMed]
3. Garcia-Blanco MA, Baraniak AP, Lasda EL. Alternative splicing in disease and therapy. Nat Biotech. 2004;22:535–546. [PubMed]
4. Venables JP, Klinck R, Bramard A, et al. Identification of Alternative Splicing Markers for Breast Cancer. Cancer Res. 2008;68:9525–9531. [PubMed]
5. Fermin D, Allen B, Blackwell T, et al. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics. Genome Biology. 2006;7:R35. [PMC free article] [PubMed]
6. Menon R, Zhang Q, Zhang Y, et al. Identification of Novel Alternative Splice Isoforms of Circulating Proteins in a Mouse Model of Human Pancreatic Cancer. Cancer Res. 2009;69:300–309. [PMC free article] [PubMed]
7. Whiteaker JR, Zhang H, Zhao L, et al. Integrated Pipeline for Mass Spectrometry-Based Discovery and Confirmation of Biomarkers Demonstrated in a Mouse Model of Breast Cancer. Journal of Proteome Research. 2007;6:3962–3975. [PubMed]
8. PeptideAtlas. 2009. Available from: http://www.peptideatlas.org/repository/
9. Craig R, Beavis RC. TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004;20:1466–1467. [PubMed]
10. The Eukaryotic Linear Motif resource for Functional Sites in Proteins: Functional site prediction. 2009. Available from: http://elm.eu.org/
11. MyHits: MotifScan. 2009. Available from: http://myhits.isb-sib.ch/cgi-bin/motif_scan
12. Berkeley Drosophila Genome Project: Splice Site Prediction by Neural Network. 2008. Available from: http://www.fruitfly.org/seq_tools/splice.html
13. Ambatipudi KS, Lu B, Hagen FK, Melvin JE, Yates JR. Quantitative Analysis of Age Specific Variation in the Abundance of Human Female Parotid Salivary Proteins. Journal of Proteome Research. 2009;8:5093–5102. [PMC free article] [PubMed]
14. Nikolsky Y, Ekins S, Nikolskaya T, Bugrim A. A novel method for generation of signature networks as biomarkers from complex high throughput data. Toxicology Letters. 2005;158:20–29. [PubMed]
15. Gao J, Ade AS, Tarcea VG, et al. Integrating and annotating the interactome using the MiMI plugin for cytoscape. Bioinformatics. 2009;25:137–138. [PMC free article] [PubMed]
16. Bonifacino JS, Dell'Angelica EC. Molecular bases for the recognition of tyrosine-based sorting signals. J Cell Biol. 1999;145:923–926. [PMC free article] [PubMed]
17. Gilboa L, Ben-Levy R, Yarden Y, Henis YI. Roles for a Cytoplasmic Tyrosine and Tyrosine Kinase Activity in the Interactions of Neu Receptors with Coated Pits. Journal of Biological Chemistry. 1995;270:7061–7067. [PubMed]
18. Cicenas J, Urban P, Vuaroqueaux V, et al. Increased level of phosphorylated akt measured by chemiluminescence-linked immunosorbent assay is a predictor of poor prognosis in primary breast cancer overexpressing ErbB-2. Breast Cancer Research. 2005;7:R394–R401. [PMC free article] [PubMed]
19. Lee MY, Joung YH, Lim EJ, et al. Phosphorylation and activation of STAT proteins by hypoxia in breast cancer cells. The Breast. 2006;15:187–195. [PubMed]
20. Wang H, Hubbell E, Hu J-s, et al. Gene structure-based splice variant deconvolution using a microarray platform. Bioinformatics. 2003;19:i315–i322. [PubMed]
21. Trembley J, Wang G, Unger G, Slaton J, Ahmed K. Protein Kinase CK2 in Health and Disease. Cellular and Molecular Life Sciences. 2009;66:1858–1867. [PubMed]
22. Shrivastav A, Pasha MK, Selvakumar P, et al. Potent inhibitor of N-myristoylation: a novel molecular target for cancer. Cancer Res. 2003;63:7975–7978. [PubMed]
23. Mackay HJ, Twelves CJ. Protein kinase C: a target for anticancer drugs? Endocr Relat Cancer. 2003;10:389–396. [PubMed]
24. Takahashi K, Sasaki T, Mammoto A, et al. Direct interaction of the Rho GDP dissociation inhibitor with ezrin/radixin/moesin initiates the activation of the Rho small G protein. J Biol Chem. 1997;272:23371–23375. [PubMed]
25. Daisuke Y, Shusaku K, Tadaomi T. Regulation of cancer cell motility through actin reorganization. Cancer Science. 2005;96:379–386. [PubMed]
26. Fritz G, Brachetti C, Bahlmann F, Schmidt M, Kaina B. Rho GTPases in human breast tumours: expression and mutation analyses and correlation with clinical parameters. Br J Cancer. 2002;87:635–644. [PMC free article] [PubMed]
27. Fernando HMT, Douglas-Jones A, Kynaston HG, Mancel RE, Jiang WG. Expression of the ERM family members (ezrin, radixin and moesin) in breast cancer. Experimental and Therapeutic Medicine. 2010;1:153–160. [PMC free article] [PubMed]
28. Ellenbroek S, Collard J. Rho GTPases: functions and association with cancer. Clinical and Experimental Metastasis. 2007;24:657–672. [PubMed]
29. Tucker LA, Zhang Q, Sheppard GS, et al. Ectopic expression of methionine aminopeptidase-2 causes cell transformation and stimulates proliferation. Oncogene. 2008;27:3967–3976. [PubMed]
30. Lauffenburger DA, Horwitz AF. Cell migration: a physically integrated molecular process. Cell. 1996;84:359–369. [PubMed]
31. Bracke ME, Maeseneer D, Marck V, et al. Cell motility and breast cancer metastasis. Metastasis of Breast Cancer. 2007:47–75.
32. Vega FM, Ridley AJ. Rho GTPases in cancer cell biology. FEBS Letters. 2008;582:2093–2101. [PubMed]
33. Forget MA, Desrosiers RR, Gingras D, Beliveau R. Phosphorylation states of Cdc42 and RhoA regulate their interactions with Rho GDP dissociation inhibitor and their extraction from biological membranes. Biochem J. 2002;361:243–254. [PMC free article] [PubMed]
34. Greene LM, Twal WO, Duffy MJ, et al. Elevated expression and altered processing of fibulin-1 protein in human breast cancer. Br J Cancer. 2003;88:871–878. [PMC free article] [PubMed]
35. Swartling FJ, Ferletta M, Kastemar M, Weiss WA, Westermark B. Cyclic GMP-dependent protein kinase II inhibits cell proliferation, Sox9 expression and Akt phosphorylation in human glioma cell lines. Oncogene. 2009;28:3121–3131. [PMC free article] [PubMed]
36. Luscher B, Kuenzel EA, Krebs EG, Eisenman RN. Myc oncoproteins are phosphorylated by casein kinase II. Embo J. 1989;8:1111–1119. [PMC free article] [PubMed]
37. Boyle DM, van der Walt LA. Enhanced phosphorylation of progesterone receptor by protein kinase C in human breast cancer cells. Journal of Steroid Biochemistry. 1988;30:239–244. [PubMed]
38. Holt MR, Koffer A. Cell motility: proline-rich proteins promote protrusions. Trends in Cell Biology. 2001;11:38–46. [PubMed]
39. Huang C, Rajfur Z, Yousefi N, Chen Z, Jacobson K, Ginsberg MH. Talin phosphorylation by Cdk5 regulates Smurf1-mediated talin head ubiquitylation and cell migration. Nat Cell Biol. 2009;11:624–630. [PMC free article] [PubMed]
40. Zhu Y, Lin H, Li Z, Wang M, Luo J. Modulation of expression of ribosomal protein L7a (rpL7a) by ethanol in human breast cancer cells. Breast Cancer Research and Treatment. 2001;69:29–38. [PubMed]
41. Sharma P, Sahni N, Tibshirani R, et al. Early detection of breast cancer based on gene-expression patterns in peripheral blood cells. Breast Cancer Research. 2005;7:R634–R644. [PMC free article] [PubMed]
42. Henry JL, Coggin DL, King CR. High-Level Expression of the Ribosomal Protein L19 in Human Breast Tumors That Overexpress erbB-2. Cancer Res. 1993;53:1403–1408. [PubMed]
43. Al-Maghrebi MAY, Anim JT, Olalu AA. Up-regulation of Eukaryotic Elongation Factor-1 Subunits in Breast Carcinoma. Anticancer Research. 2005;25:2573–2577. [PubMed]
44. Yahalom A, Kim TH, Winter E, Karniol B, von Arnim AG, Chamovitz DA. Arabidopsis eIF3e (INT-6) associates with both eIF3c and the COP9 signalosome subunit CSN7. J Biol Chem. 2001;276:334–340. [PubMed]
45. Kim T-S, Kim HD, Kim J. PKCδ-dependent functional switch of rpS3 between translation and DNA repair. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research. 2009;1793:395–405. [PubMed]
46. Gygi SP, Rochon Y, Franza BR, Aebersold R. Correlation between Protein and mRNA Abundance in Yeast. Mol Cell Biol. 1999;19:1720–1730. [PMC free article] [PubMed]
47. Washburn MP, Koller A, Oshiro G, et al. Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences of the United States of America. 2003;100:3107–3112. [PMC free article] [PubMed]
48. Luftner D, Mesterharm J, Akrivakis C, et al. Tumor type M2 pyruvate kinase expression in advanced breast cancer. Anticancer Res. 2000;20:5077–5082. [PubMed]
49. Jin W, Chen Y, Di GH, et al. Estrogen receptor (ER) beta or p53 attenuates ERalpha-mediated transcriptional activation on the BRCA2 promoter. J Biol Chem. 2008;283:29671–29680. [PMC free article] [PubMed]
50. Koga H, Matsui S, Hirota T, Takebayashi S, Okumura K, Saya H. A human homolog of Drosophila lethal(3)malignant brain tumor (l(3)mbt) protein associates with condensed mitotic chromosomes. Oncogene. 1999;18:3799–3809. [PubMed]