Human basal-like breast cancer is represented by one of the two mammary tumor subtypes in dogs

Background About 20% of breast cancers in humans are basal-like, a subtype that is often triple negative and difficult to treat. An effective translational model for basal-like breast cancer (BLBC) is currently lacking and urgently needed. To determine if spontaneous mammary tumors in pet dogs could meet this need, we subtyped canine mammary tumors and evaluated the dog-human molecular homology at the subtype level. Methods We subtyped 236 canine mammary tumors from 3 studies by applying various subtyping strategies on their RNA-seq data. We then performed PAM50 classification with canine tumors alone, as well as with canine tumors combined with human breast tumors. We investigated differential gene expression, signature gene set enrichment, expression association, mutational landscape, and other features for dog-human subtype comparison. Results Our independent genome-wide subtyping consistently identified two molecularly distinct subtypes among the canine tumors. One subtype is mostly basal-like and clusters with human BLBC in cross-species PAM50 classification, while the other subtype does not cluster with any human breast cancer subtype. Furthermore, the canine basal-like subtype recaptures key molecular features (e.g., cell cycle gene upregulation, TP53 mutation) and gene expression patterns that characterize human BLBC. It is enriched histological subtypes that match human breast cancer, unlike the other canine subtype. However, about 33% of canine basal-like tumors are estrogen receptor negative (ER−) and progesterone receptor positive (PR+), which is rare in human breast cancer. Further analysis reveals that these ER-PR+ canine tumors harbor additional basal-like features, including upregulation of genes of interferon-γ response and of the Wnt-pluripotency pathway. Interestingly, we observed an association of PGR expression with gene silencing in all canine tumors, and with the expression of T cell exhaustion markers (e.g., PDCD1) in ER-PR+ canine tumors. Conclusions We identify a canine mammary tumor subtype that molecularly resembles human BLBC overall, and thus could serve as a vital spontaneous animal model of this devastating breast cancer subtype. Our study also sheds light on the dog-human difference in the mammary tumor histology and the hormonal cycle.


1
The effective use of the canine model is, however, complicated by issues including the 2 dog-human hormonal cycle difference, e.g., the luteal phase lasts ~14 days for humans 3 but ~2 months for dogs. Another difference is histology. About 50% of canine 4 mammary tumors are complex or mixed, with multiple cell lineages (e.g., epithelial and 5 myoepithelial cells) proliferating [14,19,20]. These histologies (e.g., 6 adenomyoepithelioma) are, however, are very rare (<1%) in human breast cancers [14, 7 21, 22]. It remains unknown how these differences shape the molecular homology and 8 difference between canine and human mammary tumors. 9

19
For dog-human comparison, our previous study indicates that one histological subtype, 20 simple carcinoma, molecularly resembles hBLBC in cross-species PAM50 classification 21 with human and canine tumors [14]. However, this study is limited by its small sample 22 size. Another group has performed PAM50 classification on the RNA-seq data recently 23 stats, ConsensusClusterPlus, and pvclust respectively [44][45][46] were used to subtype the 1 143 samples using the top 10% most variable genes. The same process was repeated 2 to subtype the samples in the validation set. was then cut at the minimum number of clusters that maximally separate hBLBC tumors 18 from hLumA tumors using the R package dendextend [47]. The number of tumors of 19 each canine or human subtype in each cluster was counted. If a cluster contains the 20 majority of tumors of a human subtype as well as the majority of a canine subtype, the 21 human and canine subtypes were considered matched. This process was repeated 100 22 times to ensure each human tumor was sampled at least once. 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint 1 Multidimensional scaling on the Euclidean distance matrix was performed for each of 2 the 100 random samplings, from which the mahalanobis distance between the centers 3 of any two subtypes was calculated using the R package 'GenAlgo' v2.2.0 [48]. 4 5 The above process was repeated for microarray studies [29][30][31], except that only 40 6 canine homologues of the 50 PAM50 genes were identified, and 30 tumors per subtype 7 were randomly sampled from the human dataset [29].  repeated with all 13,416 genes that are expressed in at least one tumor, as well as with 10 the top 5000, 2000, 1000, and 500 most variable genes among the 13,416 expressed 11 genes. The analysis consistently clustered 143 of 179 tumors into two subtypes (Fig.  12 1). To validate this, we also subtyped these tumors using other popular strategies, 13 including K-means, consensus clustering, and hierarchical clustering via multiscale 14 bootstrap resampling [44,45]. These strategies consistently identified the same two 15 subtypes as the NMF approach. 16

17
We then performed the same subtyping analyses on the other large study of canine 18 mammary tumors (n=57) (the validation set), whose RNA-seq data are single-end [27], 19 differing from the discovery set. These tumors also clustered into two subtypes, 20 consistent with the discovery set. 21

22
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The two subtypes differ in tumor histology and invasiveness. One subtype is 1 significantly ( = 0.007) enriched in simple adenomas/carcinomas (where only one cell 2 lineage proliferates prominently), while the other subtype is enriched in complex or 3 mixed adenomas/carcinomas ( < 0.001) (where more than one cell lineages are 4 proliferating prominently) (Fig. 1) [14,19,25]. Moreover, the simple 5 adenomas/carcinomas-enriched subtype contains significantly ( < 10 !" ) more cases 6 with lymph node invasion ( Fig. 1). Interestingly, the other subtype contains significantly 7 ( = 0.0014) more Maltese dogs (Fig. 1). 8 9 The two subtypes display distinct molecular features. Among the top most mutated 10 genes in this cohort [33], one subtype is enriched in PIK3CA hotspot mutation 11 H1047R/L ( = 0.0034) and KAT6B mutation ( = 0.036), while the other subtype 12 (simple adenomas/carcinomas-enriched and with more lymph node invasion) is 13 enriched in TP53 mutation ( = 0.043) (Fig. 1). However, we did not observe any 14 significant difference in KRAS mutation and TMB between the two subtypes ( Fig. 1). 15 For pathways, the two subtypes differ in mutations and copy number alterations of 16 genes in PI3K signaling ( = 0.0039) (Fig. 1), the most altered pathway in canine 17 mammary tumors [33]. 18 19 Other molecular differences between the two subtypes include PAM50 classification, 20 ER and PR expression status, and others, and will be described in more details below. 21

22
Canine and human basal-like tumors cluster together in PAM50 classification 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made We performed PAM50 classification [55, 56] on the 179 canine tumors from the 1 discovery set. About 74% of the tumors were classified as either the basal-like (n=62, 2 35%) or luminal A (n=71, 39%) subtype ( Fig. 2A; Table 1). Importantly, 90% of the 3 basal-like tumors belong to one of the two subtypes shown in Fig. 1, while 90% of the 4 luminal A tumors belong to the other subtype. For this reason and reasons described 5 below, the two subtypes shown in Fig. 1 are named canine basal-like mammary tumor 6 (cBLMT) and canine non-basal-like mammary tumor (cNBLMT) respectively. 7 8 To quantitatively assess the canine-human homology at the subtype level, we 9 performed cross-species PAM50 classification as described [14]. Briefly, we randomly shown by the example provided in Fig. 2D. We then calculated the mahalanobis 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint distance [48] between the centers of canine and/or human subtypes for each of the 100 1 random sampling analyses. The distributions clearly indicate that the mahalanobis 2 distances between hBLBC and cBLMT are significantly shorter than those between 3 hBLBC and human luminal A (hLumA) or cNBLMT, as well as those between hLumA 4 and cBLMT or cNBLMT (Fig. 2E). These results support that cBLMT molecularly 5 resembles hBLBC, but cNBLMT molecularly differs from hLumA. 6 7 To validate this finding, we attempted to conduct the same analyses on the 57 tumors 8 from the validation set [27]. PAM50 analysis of these canine tumors alone indeed 9 classified a majority of the tumors as basal-like (n=21) or luminal A (n=17), consistent 10 with the discovery set ( Fig. 2A). Moreover, many of the basal-like canine tumors have a 11 PAM50 gene expression patten that closely matches hBLBC. However, likely due to 12 having single-end RNA-seq data, these canine tumors could not co-cluster with human 13 breast tumors (whose RNA-seq data are paired-end) in cross-species PAM50 14 classification even after batch correction. The cBLMT subtype contains ER-PR-, ER-PR+, and ER+PR+ tumors 2 hBLBCs consist of approximately 70% ER-PR-(expressing neither ER nor PR) and 3 30% ER+PR-tumors, with ER-PR+ and ER+PR+ tumors extremely rare, based on the 4 ESR1 and PGR transcript abundance levels (Figs. 4A-B). However, among cBLMTs, 5 ER-PR+ and ER+PR+ tumors are significantly more frequent, accounting for 33% and 6 29% respectively, while ER-PR-and ER+PR-tumors only make up 29% and 9% 7 respectively (Figs. 4C-D; Table 1). Notably, about 78% of ER-PR+ cBLMTs were 8 classified as basal-like in PAM50 analysis, similar to the percentage for ER-PR-cBLMT 9 (85%) ( Table 1). PAM50 classification also categorized 30% of ER+PR+ cBLMTs as 10 basal-like, a proportion significantly higher than in cNBLMTs (7%) (96% of cNBLMTs 11 are ER+PR+) ( Table 1) We compared each of ER-PR-, ER-PR+, and ER+PR+ cBLMT subgroups to cNBLMT 20 (the ER+PR-subgroup contains only 6 samples and was thus excluded from the 21 analysis; see Table 1). We found that the DE genes are enriched in functions that 22 characterize hBLBC [3, 7, 58] (Figs. 5A-C; Table 1). Briefly, cell cycle and Wnt 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  ER-PR+ cBLMTs than in ER+PR+ cBLMTs (Fig. 6A). This indicates that T cell 22 exhaustion may occur in ER-PR+ cBLMTs [59]. To investigate this possibility, we 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint examined the expression of 8 known T cell exhaustion markers [61], but did not find a 1 significant difference among the three subgroups (Fig. 6A). 2 3 As 6 out of 8 T cell exhaustion markers, including PDCD1 (encoding PD-1), express 4 higher in ER-PR+ and/or ER+PR+ cBLMTs (Fig. 6A), we examined the association of 5 each marker with PGR or ESR1 in expression. We found that four markers, including 6 PDCD1, HAVCR2, CTLA4, and TIGIT, have a significant positive association with PGR 7 in ER-PR+ cBLMTs (Fig. 6B). No such associations were found for PGR in other 8 cBLMT subgroups or the cNBLMT subtype, or for ESR1 in any cBLMT subgroup or 9 cNBLMT (Figs. 6C). 10 11 PGR expression is associated with gene silencing 12 To further understand PR in canine tumors, we investigated genes that correlate with 13 PGR or ESR1 in transcript abundance in both human and canine tumors. The same as 14 in hLumA tumors, over 1000 genes were found to be positively correlated with ESR1 in 15 cNBLMTs (Fig. 7A). Importantly, both sets of genes are enriched in the same functions, 16 including cell cycle (Fig. 7A). For PGR, about 2.7 times as many (776 versus 289) 17 positively correlated genes were identified for cNBLMT than hLumA (Fig. 7A). (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint independent (not using any known biomarkers) subtyping of this cancer common in 1 bitches that are intact or spayed late. The study identifies two subtypes, and further 2 shows that one subtype molecularly resembles hBLBC, while the other subtype appears 3 not to match any human breast cancer subtypes. This conclusion is consistent with our 4 previous study [14], but differs from a recent publication reporting that canine and 5 human luminal A tumors have more molecular homology than canine and human basal- (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint and ER+PR+ tumors, which are rare in hBLBC. Moreover, while ER+PR-tumors are 1 common in hBLBC and other breast cancer subtypes, ER-PR+ tumors are nearly 2 nonexistent in any human breast cancer subtype. This is because that in human breast 3 cells, PR is induced by ER [62, 63] and thus, without ER, PR will not be expressed. 4 One notable difference between the estrous cycle in bitches and the menstrual cycle in 5 women is the luteal phase, which lasts 14 days for humans but 2 months for dogs. As 6 the canine mammary glands are constantly exposed to a high level of progesterone 7 during the luteal phase [64, 65], it is possible that the PGR gene is still actively 8 transcribed after the ESR1 gene is silenced in dogs, resulting in the ER-PR+ tumors. 9 More studies are needed to investigate this possibility. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint One notable finding from our study is that in ER-PR+ cBLMTs, interferon-g response 1 genes are upregulated, but the interferon-g gene (IFNG) itself is downregulated. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made We identify two molecular subtypes in spontaneous canine mammary tumors. One 1 subtype, cBLMT, molecularly and histologically resembles hBLBC, a breast cancer 2 subtype that lacks an effective treatment and has the worst clinical outcomes. The 3 other subtype, cNBLMT, appears not to match any human breast cancer subtype 4 molecularly and histologically. While cBLMTs also consist of ER-PR+ and ER+PR+ 5 tumors (which may be related to the long luteal phase of the estrous cycle in dogs), a 6 difference from hBLBC, we note that these tumors capture the key molecular features of 7 hBLBCs, the same as ER-PR-cBLMTs. Thus, cBLMTs could serve as a much-needed 8 spontaneous animal model for hBLBC, filling a critical gap in breast cancer research. 9 Moreover, while much more studies are needed, ER-PR+ cBLMTs may provide a 10 valuable system to study T cell exhaustion, as well as estrogen/ER-independent roles of 11 progesterone and PR in gene silencing.

Availability of data and materials 19
The RNA-seq data generated from this study are submitted to the SRA database at 20 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA912710. All other RNA-seq and 21 microarray data sets analyzed in this study are obtained from the SRA and GEO 22 databases at: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA489087/, 23 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Competing interests 7
The authors declare that they have no competing interests. (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  identified from the NMF analysis (see Methods), are ordered by hierarchical clustering. 7 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Lymph invasion is defined as having tumor cells in peritumoral lymphatic vessels and/or 1 regional lymph nodes. Tumor mutation burden (TMB), defined as the number of 2 somatic base substitutions and small indels per megabase (Mb) of callable coding 3 sequence, is obtained from a previous publication [33]. For ER/PR status, a tumor is 4 considered "negative" or "positive" if its ESR1/PGR has a FPKM value of ≤ 1 or > 1, 5 respectively. For HER2 enrichment, a sample is considered "not enriched" or "enriched" 6 if its ERBB2 has a FPKM value of ≤ 35 or >35, respectively. "Other PIK3CA" 7 represents all non-H1047 coding mutations in PIK3CA. "No data" represents samples 8 with no mutation data. Annotation row titles marked with a "*" indicate a significant ( ≤ 9 0.05) difference in enrichment between the subtypes. 10 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  The p-values were obtained from Wilcoxon tests 17 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made  value of < 0.05 and an expression-fold change of > 2. The heatmaps are presented as 7 described in Fig. 3A, along with TMB and the most mutated genes indicated as 8 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint described in Fig. 1. A tumor was classified ER+ or PR+ if its ESR1 or PGR gene has a 1 FPKM value of > 1, respectively; otherwise, the tumor was classified ER-or PR-. 2 D-F. Heatmaps of DE genes identified between ER+/-PR+/-cBLMT subgroups, 3 presented as described in A-C. 4 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted March 3, 2023. ; https://doi.org/10.1101/2023.03.02.530622 doi: bioRxiv preprint 1 Fig. 7. PGR is associated with gene silencing in canine tumors but not in human tumors. 1

A.
Venn diagrams indicating the number of genes that are positively associated with ESR1 and/or PGR in mRNA 2 expression in each human or canine subtype specified. These genes were identified as those having correlation 3 coefficient ≥ 0.3 and BH-adjusted ≤ 0.05 in both Pearson and Spearman correlation analyses. Indicated are also the 4 top enriched functions of genes that are correlated with only ESR1 or PGR. 5

B.
Venn diagrams for genes negatively correlated with ESR1 and/or PGR, identified with ≤ −0.3 and other cutoffs 6 and presented as described in A. 7