Transcriptomic Analysis of the Anthocyanin Biosynthetic Pathway Reveals the Molecular Mechanism Associated with Purple Color Formation in Dendrobium Nestor

Dendrobium nestor is a famous orchid species in the Orchidaceae family. There is a diversity of flower colorations in the Dendrobium species, but knowledge of the genes involved and molecular mechanism underlying the flower color formation in D. nestor is less studied. Therefore, we performed transcriptome profiling using Illumina sequencing to facilitate thorough studies of the purple color formation in petal samples collected at three developmental stages, namely—flower bud stage (F), half bloom stage (H), and full bloom stage (B) in D. nestor. In addition, we identified key genes and their biosynthetic pathways as well as the transcription factors (TFs) associated with purple flower color formation. We found that the phenylpropanoid–flavonoid–anthocyanin biosynthesis genes such as phenylalanine ammonia lyase, chalcone synthase, anthocyanidin synthase, and UDP-flavonoid glucosyl transferase, were largely up-regulated in the H and B samples as compared to the F samples. This upregulation might partly account for the accumulation of anthocyanins, which confer the purple coloration in these samples. We further identified several differentially expressed genes related to phytohormones such as auxin, ethylene, cytokinins, salicylic acid, brassinosteroid, and abscisic acid, as well as TFs such as MYB and bHLH, which might play important roles in color formation in D. nestor flower. Sturdy upregulation of anthocyanin biosynthetic structural genes might be a potential regulatory mechanism in purple color formation in D. nestor flowers. Several TFs were predicted to regulate the anthocyanin genes through a K-mean clustering analysis. Our study provides valuable resource for future studies to expand our understanding of flower color development mechanisms in D. nestor.


Introduction
The Orchidaceae family comprises orchids from 736 genera and represents the second largest family of flowering plants [1,2]. Dendrobium species represent important orchids that are either epiphytic or lithophytic and have frequent buds [1,3]. The species is distributed and cultivated in tropical Asia, Australasia, and Australia [4]. More than 74 species and two widely cultivated genotypes of this genus were reported in China, some of which are famed for their flowers during Father's Day in Asia [4]. Dendrobium nestor is derived from a cross between Dendrobium parishii x D. anosmum, and its flowers present color diversity, spanning from white, red, to purple [5]. The color variations in the D. nestor flower contribute to the release of fragrance as attractants for pollinators [6]. Color variations in orchids account for their aesthetic and commercial values in the ornamental industry [2]. The purple color variant in D. nestor substantially accounts for the visual appeal and market value of the flowers. Dendrobium orchids are also highly employed for their medicinal values. Several bioactive compounds and metabolites housed in these species are useful for human health, including nourishing the kidney, enhancing the body's immunity, and resisting cancer [4].
3 three different stages of flower development, i.e., the flower bud stage (F, 5 mm long); half blooming stage (H, 10 mm long) and blooming stage (B, ˃11 mm long), and mixed to constitute a biological replicate (Figure 1). Sampling was conducted in three biological replicates at each flower developmental stage. Samples were immediately lyophilized in liquid nitrogen and stored at -80 °C and used in future experiments.

RNA Extraction, Sequencing Libraries, and RNA-Seq
Total RNA was extracted from each sample (3 samples × 3 biological repeats) of D. nestor using Trizol reagents (Invitrogen, Carlsbad, CA, USA). The transcriptome library was constructed using the Illumina TruSeq Stranded total RNA with Ribo-Zero Globin (Illumina, San Diego, CA, USA), according to the manufacturer's instructions. All cDNA libraries were subjected to high throughput sequencing using the Illumina HiSeq™ 4000 platform at Biomarker Technologies (Beijing, China). The raw sequencing data were cleaned and assembled using the Trinity software [32]. Gene expressions were normalized to the fragments per kilobase of transcript per million (FPKM) by HTSeq [33]. The expressions of genes were heatmapped and visualized using the Toolkit for Biologists, integrating various biological data handling tools [34].

Sequenced Data Filtering, Assembly, and Functional Annotation
The raw sequenced data with low-quality linker contaminants and reads with too high and unknown base contents were filtered using the FastQC software version 0.11.9 [35] to ensure reliability of results. The clean reads were used as candidates for the transcriptome assembly. The polymerase chain reaction (PCR) duplicates were removed with the Trinity program (version 2.8.4) [32]. The assembled transcripts were then clustered and a cluster deduplication was performed to obtain the final unigenes. The unigenes were named according to the cluster number, followed by its serial order. The assembled unigenes were annotated using the Clusters of Orthologous Groups of proteins (COG) [36], Gene Ontology (GO) (https://www.geneontology.org [37]), Kyoto Encyclopedia of Genes and Genomes (KEGG) [38], Pfam [39], translated European Molecular Biology Laboratory (TrEMBL; [40]), the NCBI non-redundant protein sequence database (Nr) (https://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz; [41]), and Eukaryotic Clusters of the Orthologous Groups (KOG, https://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi). Principal Components Analysis (PCA) and Pearson correlation were performed on the basis of gene expression profiles obtained from FPKM, among the nine samples, in R [42].

Detection of Differentially Expressed Genes and Functional Enrichment Analysis
The uniquely mapped reads were retained for the unigene expression analysis. Differential expression analysis between treatments was performed based on FPKM, using the DESeq R package (1.10.1) with the Benjamini and Hochberg's correction [43]. The unigenes with a false discovery rate (FDR) of ≤ 0.001 and a fold change of ≥ 2.0 or ≤ 0.5 in any

Sequenced Data Filtering, Assembly, and Functional Annotation
The raw sequenced data with low-quality linker contaminants and reads with too high and unknown base contents were filtered using the FastQC software version 0.11.9 [35] to ensure reliability of results. The clean reads were used as candidates for the transcriptome assembly. The polymerase chain reaction (PCR) duplicates were removed with the Trinity program (version 2.8.4) [32]. The assembled transcripts were then clustered and a cluster deduplication was performed to obtain the final unigenes. The unigenes were named according to the cluster number, followed by its serial order. The assembled unigenes were annotated using the Clusters of Orthologous Groups of proteins (COG) [36], Gene Ontology (GO) (https://www.geneontology.org [37]), Kyoto Encyclopedia of Genes and Genomes (KEGG) [38], Pfam [39], translated European Molecular Biology Laboratory (TrEMBL; [40]), the NCBI non-redundant protein sequence database (Nr) (https://ftp.ncbi. nih.gov/blast/db/FASTA/nr.gz; [41]), and Eukaryotic Clusters of the Orthologous Groups (KOG, https://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi). Principal Components Analysis (PCA) and Pearson correlation were performed on the basis of gene expression profiles obtained from FPKM, among the nine samples, in R [42].

Detection of Differentially Expressed Genes and Functional Enrichment Analysis
The uniquely mapped reads were retained for the unigene expression analysis. Differential expression analysis between treatments was performed based on FPKM, using the DESeq R package (1.10.1) with the Benjamini and Hochberg's correction [43]. The unigenes with a false discovery rate (FDR) of ≤ 0.001 and a fold change of ≥ 2.0 or ≤ 0.5 in any pairwise comparison (F_vs_H, F_vs_B and H_vs_B) were considered as differentially expressed unigenes, which were designated as differentially expressed genes (DEGs). The DEGs were clustered by the Short Time-series Expression Miner (STEM) clustering analysis (STEM software), by setting 30 as the maximum number of model profiles, and 1 as the maximum unit change in model profiles between time-points [44]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg) was used for the functional annotation of specific purple color-conferring DEGs with the BLAST software [45] to identify anthocyanin biosynthetic pathways among the samples, in pairwise comparisons with a minimum threshold of statistical significance at p < 0.05.

Identification of Transcription Factors and Simple Sequence Repeats
To predict transcription factors (TFs) involved in purple color formation in D. nestor, we utilized the getorf database (mini-size 150) to find the open reading frame (ORF) [46] and then used the HMM search database (version 3.0) to align the ORFs to the TF protein domain [47]. The aligned sequences were described according to the TF families available on the PlantTF database version 3.0 [48]. The online Perl script program, MIcroSAtellite identification tool (MISA; http://pgrc.ipk-gatersleben.de/misa/) was employed to identify SSRs in D. nestor using the default settings. Primer3 software (ver 2.2.2) was used to design primers for the detected SSRs [49].

Real-Time Quantitative Polymerase Chain Reaction Analysis
Primers were designed using the Primer 4.0 tool (Additional File 8: Table S6). The qRT-PCR was performed using the LightCycler 480 with Unique AptamerTM qPCR SYBR®Green Master Mix (Novogene Technology Co. Ltd., Beijing, China), according to the manufacturer's instructions. The gene Actin (JX294908) was used as the housekeeping gene. All reactions were performed in biological triplicates and technical triplicates. Relative expression values were computed using the 2 −∆∆ Ct method [50]. Student t-test was used to separate the means, thus, the difference was considered to be statistically significant at p < 0.05.

Sequencing Summary, Assembly, and Unigene Annotation
Flower samples collected in triplicates at three stages of development (F = flower bud stage; H = half blooming stage; and B = full blooming stage) from D. nestor plants were used for RNA-seq ( Figure 1). The purple coloration increased gradually from F to B ( Figure 1). The RNA-seq was conducted on nine cDNA libraries (3 samples × 3 repeats) and generated, on average, 50,047,108 (F), 48,759,280 (H), and 51,171,054 (B) raw reads, (Table 1). After filtering the contaminants, the average clean reads ranged from 46,903,317 (95.05% of the raw reads) to 49,217,131 (96.28% of the raw reads) corresponding to 7.03-7.38 Gb clean reads. Each sample had a Q30 and GC content above 93% and 46%, respectively ( Table 1).
The clean libraries were assembled with the program Trinity [32]. The assembler generated a total of 220,258 transcripts, out of which 59,986 (27.23%) had a length of 200-300 bp (Additional File 1: Figure S1). Subsequently, a cluster deduplication was performed to obtain the final unigene for analyses. As a result, a total of 161,228 unigenes were obtained, of which 26,998 (16.75%) had a mean length ≥2000 bp (Additional File 1: Figure S1). The overall distribution of the gene expression was based on the fragments per kilobase of transcript per million fragments mapped (FPKM) procedure ( Figure 2a). The use of relative unigene expression obtained from FPKM for principal component analysis (PCA) showed 47.49% variability among the three samples (F, H, and B) ( Figure 2b). However, biological replicates had a very strong correlation co-efficient (r ≥ 0.79), as evidenced in Figure 2c. Overall, we detected 46,770, 45,739, and 38,601 expressed unigenes in F, H, and B samples, respectively.
From above, 112,581 unigenes (representing 69.83%) were successfully annotated to at least one of the six public gene functional annotation databases, including KEGG, Nr, SWISS Prot, KOG, GO, and Pfam via BLAST. Figure 2d gives annotation across the six databases. The high-quality reads, variability among the sample groups, and similarity within biological replicates confirmed the reliability of our data for further analyses.
Life 2020, 10, x FOR PEER REVIEW 5 of 19

5
From above, 112,581 unigenes (representing 69.83%) were successfully annotated to at least one of the six public gene functional annotation databases, including KEGG, Nr, SWISS Prot, KOG, GO, and Pfam via BLAST. Figure 2d gives annotation across the six databases. The high-quality reads, variability among the sample groups, and similarity within biological replicates confirmed the reliability of our data for further analyses.    Transcriptome assembly of D. nestor provides the opportunity to detect simple sequence repeats (SSR), which could be useful for a variety of identification and molecular assisted breeding. In total, we identified 41,286 SSR markers, dominated by mono-nucleotide (68.47%), di-nucleotide (17.47%), and tri-nucleotide (12.16%) SSR types (Additional file 2: Table S1, Figure 3).
Life 2020, 10, x FOR PEER REVIEW 6 of 19 6 Transcriptome assembly of D. nestor provides the opportunity to detect simple sequence repeats (SSR), which could be useful for a variety of identification and molecular assisted breeding. In total, we identified 41,286 SSR markers, dominated by mono-nucleotide (68.47%), di-nucleotide (17.47%), and tri-nucleotide (12.16%) SSR types (Additional file 2: Table S1, Figure 3).

Differentially Expressed Genes and Functional Enrichment Analyses
Differentially expressed genes (DEGs) were selected based on a threshold of log2 fold change (log2FC) ≥ 1 and false discovery rate (FDR) ≤ 0.05. We obtained a total of 8,627, 8,993, and 1,418 DEGs in F_vs_H, F_vs_B, and H_vs_B, respectively, from 46,770, 45,739, and 38,601 unigenes (Figure 4a; Additional file 3: Table S2a-c). Based on the DEGs, two main clusters of flower samples were obtained ( Figure 4b). Cluster one mainly comprised F samples, whereas cluster two consisted of H and B samples. This implied that H and B samples had some similarity in transcriptional activity. This similarity might be due to relatively similar anthocyanin synthesis and accumulation at the half blooming stage (H) and full blooming stage (B). The ten DEGs selected for qRT-PCR to validate our RNA-seq data had similar relative expression pattern with the RNA-seq data (Additional file 4: Figure S2).
The DEGs obtained in each pairwise group were employed for the KEGG pathway enrichment analysis. From the KEGG pathway enrichment analysis based on the p-value of significance (Additional File 5: Table S3), the most significant pathways were phenylpropanoid biosynthesis, biosynthesis of secondary metabolites, plant hormone signal transduction, metabolic pathways, carotenoid biosynthesis, and the circadian rhythm of the plant. Given that phenylpropanoid (particularly anthocyanin biosynthesis pathway) and plant hormone signal transduction are well known to modulate color formation in plants, we focused on these as the candidate pathways to elucidate their involvement in petal/flower color formation in D. nestor.

Differentially Expressed Genes and Functional Enrichment Analyses
Differentially expressed genes (DEGs) were selected based on a threshold of log2 fold change (log 2 FC) ≥ 1 and false discovery rate (FDR) ≤ 0.05. We obtained a total of 8627, 8993, and 1418 DEGs in F_vs_H, F_vs_B, and H_vs_B, respectively, from 46,770, 45,739, and 38,601 unigenes (Figure 4a; Additional file 3: Table S2a-c). Based on the DEGs, two main clusters of flower samples were obtained ( Figure 4b). Cluster one mainly comprised F samples, whereas cluster two consisted of H and B samples. This implied that H and B samples had some similarity in transcriptional activity. This similarity might be due to relatively similar anthocyanin synthesis and accumulation at the half blooming stage (H) and full blooming stage (B). The ten DEGs selected for qRT-PCR to validate our RNA-seq data had similar relative expression pattern with the RNA-seq data (Additional file 4: Figure S2).
The DEGs obtained in each pairwise group were employed for the KEGG pathway enrichment analysis. From the KEGG pathway enrichment analysis based on the p-value of significance (Additional File 5: Table S3), the most significant pathways were phenylpropanoid biosynthesis, biosynthesis of secondary metabolites, plant hormone signal transduction, metabolic pathways, carotenoid biosynthesis, and the circadian rhythm of the plant. Given that phenylpropanoid (particularly anthocyanin biosynthesis pathway) and plant hormone signal transduction are well known to modulate color formation in plants, we focused on these as the candidate pathways to elucidate their involvement in petal/flower color formation in D. nestor.

DEGs Involved in Anthocyanins Biosynthesis Pathway
The key determinant of flower color is the composition and concentrations of anthocyanins, carotenoids, and betalains as key pigments [25]. Anthocyanins are responsible for orange, pink, red, purple, blue, and blue-black flower colors [25,51]. Anthocyanins are flavonoids, which are a major offshoot of the highly branched phenylpropanoid pathway with several enzymes involved [52]. The main amino acid precursor for phenylpropanoids is phenylalanine (phenylalanine ammonia lyase, PAL). In our present study, four key genes (Cluster-30752.53926, Cluster-30752.76093, Cluster-30752.77980, and Cluster-30752.49408) linked to PAL were highly up-regulated in the H and B samples, as compared to the F sample, with the exception of Cluster-30752.49408, which had no expression in the P sample ( Figure 5). Similarly, two out of six genes (Cluster-30752.43378 and Cluster-30752.49231) linked to cinnamate 4-hyroxylase (C4H) in converting cinnamic acid to coumaric acid were highly expressed in the H and B samples relative to the F sample. In addition, enzymes: 4-coumarateCoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H) and flavonol synthase (FLS) were associated with 5, 6, 10, 12, 13, and 7 DEGs, respectively. Out of these, most were highly expressed in the H and B samples, as compared to the F sample ( Figure  5). Surprisingly, three genes (Cluster-30752.31801, Cluster-30752.44110, and Cluster-30752.31798) linked to dihydroflavonol reductase (DFR) were more highly expressed in the F sample than either in the H or B sample; this enzyme converts Dihydroquercetin (DHQ) to Leucoanthocyanidins ( Figure 5). Anthocyanidin synthase (ANS) converts the Leucoanthocyanidins to anthocyanidin and we identified five DEGs including, Cluster-30752.47462, Cluster-30752.77085, Cluster-30752.74730, Cluster-30752.74682, and Cluster-30752.79274. These genes were highly expressed in the H and B samples relative to the H sample ( Figure 5). Thirteen DEGs encoding the UDP-flavonoid glucosyl transferase (UFGT) were more highly expressed in the H or B samples than in the F sample ( Figure  5). These results showed that globally, most genes involved in the early and late steps of the anthocyanin biosynthetic pathway were strongly upregulated over the flower development stages in D. nestor and might be critical for the purple color formation.

DEGs Involved in Anthocyanins Biosynthesis Pathway
The key determinant of flower color is the composition and concentrations of anthocyanins, carotenoids, and betalains as key pigments [25]. Anthocyanins are responsible for orange, pink, red, purple, blue, and blue-black flower colors [25,51]. Anthocyanins are flavonoids, which are a major offshoot of the highly branched phenylpropanoid pathway with several enzymes involved [52]. The main amino acid precursor for phenylpropanoids is phenylalanine (phenylalanine ammonia lyase, PAL). In our present study, four key genes (Cluster-30752.53926, Cluster-30752.76093, Cluster-30752.77980, and Cluster-30752.49408) linked to PAL were highly up-regulated in the H and B samples, as compared to the F sample, with the exception of Cluster-30752.49408, which had no expression in the P sample ( Figure 5). Similarly, two out of six genes (Cluster-30752.43378 and Cluster-30752.49231) linked to cinnamate 4-hyroxylase (C4H) in converting cinnamic acid to coumaric acid were highly expressed in the H and B samples relative to the F sample. In addition, enzymes: 4-coumarateCoA ligase (4CL), chalcone synthase (CHS), chalcone isomerase (CHI), flavone 3-hydroxylase (F3H), flavonoid 3 -hydroxylase (F3 H) and flavonol synthase (FLS) were associated with 5, 6, 10, 12, 13, and 7 DEGs, respectively. Out of these, most were highly expressed in the H and B samples, as compared to the F sample ( Figure 5). Surprisingly, three genes (Cluster-30752.31801, Cluster-30752.44110, and Cluster-30752.31798) linked to dihydroflavonol reductase (DFR) were more highly expressed in the F sample than either in the H or B sample; this enzyme converts Dihydroquercetin (DHQ) to Leucoanthocyanidins ( Figure 5). Anthocyanidin synthase (ANS) converts the Leucoanthocyanidins to anthocyanidin and we identified five DEGs including, Cluster-30752.47462, Cluster-30752.77085, Cluster-30752.74730, Cluster-30752.74682, and Cluster-30752.79274. These genes were highly expressed in the H and B samples relative to the H sample ( Figure 5). Thirteen DEGs encoding the UDP-flavonoid glucosyl transferase (UFGT) were more highly expressed in the H or B samples than in the F sample ( Figure 5). These results showed that globally, most genes involved in the early and late steps of the anthocyanin biosynthetic pathway were strongly upregulated over the flower development stages in D. nestor and might be critical for the purple color formation.

Plant Hormone Signal Transduction
Beside environmental stimuli such as light and temperature, phytohormones (auxin, ethylene, cytokinins, salicylic acid, brassinosteroid, and abscisic acid (ABA)) significantly influence anthocyanin biosynthesis [54][55][56][57][58][59]. Most DEGs involved in auxin and ethylene were largely upregulated in the H and B samples relative to the F sample (Figure 6a,b). On the contrary, DEGs related to brassinosteroid, ABA, cytokinins, and salicylic acid were mostly repressed in the H and B samples, as compared to the F sample (Figure 6c-f). All phytohormones related to DEGs were clustered under the three samples (F, H, and B) into two sub-clusters. With the exception of cytokinins, all phytohormones clustered with the H and B samples into one sub-cluster, while the F sample was in another sub-cluster. This supports the trend of increasing purpleness, as observed in Figure 1. These trends showed various modulating roles of different phytohormones in color formation in D. nestor. In addition, these signify the important roles of plant hormone signal transduction in modulating petal color development in D. nestor.
10, x FOR PEER REVIEW 9 of 19 9

Plant Hormone Signal Transduction
Beside environmental stimuli such as light and temperature, phytohormones (auxin, ethylene, cytokinins, salicylic acid, brassinosteroid, and abscisic acid (ABA)) significantly influence anthocyanin biosynthesis [54][55][56][57][58][59]. Most DEGs involved in auxin and ethylene were largely upregulated in the H and B samples relative to the F sample (Figure 6a,b). On the contrary, DEGs related to brassinosteroid, ABA, cytokinins, and salicylic acid were mostly repressed in the H and B samples, as compared to the F sample ( Figure 6 c-f). All phytohormones related to DEGs were clustered under the three samples (F, H, and B) into two sub-clusters. With the exception of cytokinins, all phytohormones clustered with the H and B samples into one sub-cluster, while the F sample was in another sub-cluster. This supports the trend of increasing purpleness, as observed in Figure 1. These trends showed various modulating roles of different phytohormones in color formation in D. nestor. In addition, these signify the important roles of plant hormone signal transduction in modulating petal color development in D. nestor.

Identification of Transcription Factors Regulating Color Formation in the Petals/Flowers of D. nestor
Transcription factors (TFs) are the primary regulators of gene expression [60], of which many were predicted to modulate anthocyanin accumulation and biosynthesis in flowering plants. The sequences of DEGs obtained in each pairwise group were submitted to the plant transcription factor database to identify DEGs encoding for TFs. A total of 601 (F_vs_H), 583 (F_vs_B), and 116 (H_vs_B) DEGs encoding TF were detected, with varied degree of regulation (Additional File 6: Table S4a-c). The AP2/ERF, bHLH, bZIP, C2H2, C3H, HB, MADS-MIKC/MADS-M-type, NAC, and WRKY TFs were largely upregulated in F_vs_H and F_vs_B but were downregulated in H_vs_B (Table 2). However, C2C2, MYB, and Tify TFs were mostly downregulated in the three pairwise groups ( Table 2).
In addition, a K-means clustering analysis based on the FPKM of 12,625 unique genes among the samples (F, H, and B) was conducted by following the procedure outlined by [61] to identify co-expressed modules. Importantly, the TFs co-expressed with the anthocyanin biosynthetic genes could play important roles in modulating purple coloration in the petal/flower of D. nestor. In all, six clusters of genes were detected (Figure 7). Most genes involved in the anthocyanin biosynthesis pathway were grouped in Cluster 1 with several TF family members, including MYB, bHLH, WRKY, and AP2/ERF (Additional file 7: Table S5). For instance, Cluster-30752.54898, Cluster-30752.54899, and Cluster-30752.54900 in Cluster 1 have bHLH-MYC and R2R3-MYB TFs N-terminal domain.
Genes involved in the anthocyanin pathways are differentially modulated in monocots and dicots by MYB, bHLH, and WD40 TFs [62,63]. Hence, we studied MYB and bHLH TFs further to deepen our understanding of their involvement in modulating purple flower color formation at three different stages in D nestor. A total of 53 and 50 genes with MYB and bHLH were profiled for their expressions among the F, H, and B samples, based on standardized FPKM values (Figure 8 a,b). Most of these genes were largely upregulated (

Identification of Transcription Factors Regulating Color Formation in the Petals/Flowers of D. nestor
Transcription factors (TFs) are the primary regulators of gene expression [60], of which many were predicted to modulate anthocyanin accumulation and biosynthesis in flowering plants. The sequences of DEGs obtained in each pairwise group were submitted to the plant transcription factor database to identify DEGs encoding for TFs. A total of 601 (F_vs_H), 583 (F_vs_B), and 116 (H_vs_B) DEGs encoding TF were detected, with varied degree of regulation (Additional File 6: Table S4a-c). The AP2/ERF, bHLH, bZIP, C2H2, C3H, HB, MADS-MIKC/MADS-M-type, NAC, and WRKY TFs were largely upregulated in F_vs_H and F_vs_B but were downregulated in H_vs_B (Table 2). However, C2C2, MYB, and Tify TFs were mostly downregulated in the three pairwise groups ( Table 2).
In addition, a K-means clustering analysis based on the FPKM of 12,625 unique genes among the samples (F, H, and B) was conducted by following the procedure outlined by [61] to identify co-expressed modules. Importantly, the TFs co-expressed with the anthocyanin biosynthetic genes could play important roles in modulating purple coloration in the petal/flower of D. nestor. In all, six clusters of genes were detected (Figure 7). Most genes involved in the anthocyanin biosynthesis pathway were grouped in Cluster 1 with several TF family members, including MYB, bHLH, WRKY, and AP2/ERF (Additional file 7: Table S5). For instance, Cluster-30752.54898, Cluster-30752.54899, and Cluster-30752.54900 in Cluster 1 have bHLH-MYC and R2R3-MYB TFs N-terminal domain.

Discussion
Flowers offer visual appeal, and the commercial value of ornamental plants is markedly determined by petal color. Previous studies on floral coloration revealed species-specific distinctiveness in pigment regulation [25,[64][65][66][67]. Additionally, the variations in floral coloration emanates from different processes, such as competition among pathways, expression patterns of structural genes involved in pigment formation, and mutations in structural or regulatory genes [24,68]. Thus, species-specific studies of flower color formation and the elucidation of their regulatory mechanisms are essential to deepen our understanding of flower color development in orchids. In plant breeding and horticulture, transcriptome sequencing is highly employed for predicting novel genes, gene function, and genome evolution [11]. Here, we performed whole-transcriptome sequencing to study the mechanism of purple color development in D. nestor petals, which is the first report, as compared to other Dendrobium species [13,[69][70][71][72]. H and B samples might have relatively similar anthocyanins accumulation and synthesis, hence the high similarity as evident in Figure 1a,b and Figure 4b. We also reported the first ever SSR markers useful for future molecular breeding efforts in D. nestor (Figure 3, Additional file 2: Table S1).

Discussion
Flowers offer visual appeal, and the commercial value of ornamental plants is markedly determined by petal color. Previous studies on floral coloration revealed species-specific distinctiveness in pigment regulation [25,[64][65][66][67]. Additionally, the variations in floral coloration emanates from different processes, such as competition among pathways, expression patterns of structural genes involved in pigment formation, and mutations in structural or regulatory genes [24,68]. Thus, species-specific studies of flower color formation and the elucidation of their regulatory mechanisms are essential to deepen our understanding of flower color development in orchids. In plant breeding and horticulture, transcriptome sequencing is highly employed for predicting novel genes, gene function, and genome evolution [11]. Here, we performed whole-transcriptome sequencing to study the mechanism of purple color development in D. nestor petals, which is the first report, as compared to other Dendrobium species [13,[69][70][71][72]. H and B samples might have relatively similar anthocyanins accumulation and synthesis, hence the high similarity as evident in Figure 1a,b and Figure 4b. We also reported the first ever SSR markers useful for future molecular breeding efforts in D. nestor (Figure 3, Additional file 2: Table S1).

Pathways Involved in Purple Color Formation in Petals of D. nestor at Different Stages of Development
Flowering plants exhibit a wide variation in their flora, foliage, and fruit colors, as a result of genetic factors and variations in environments. Flower color formation in orchids are controlled largely by anthocyanins, carotenoids, and betalains [25]. In this present study, petals/flowers of D. nestor were sampled at different developmental stages to conduct transcriptome sequencing. From the KEGG pathway enrichment analyses, the most prominent pathways were phenylpropanoid-flavonoid-anthocyanin biosynthesis, biosynthesis of secondary metabolites, plant hormone signal transduction, metabolic pathways, carotenoid biosynthesis, and the circadian rhythm plant biosynthesis pathway (Figure 4a-c; Additional file 5: Table S3). Ma et al. [73] and Luo et al. [74] reported these to be the predominant pathways in color formation in Vitus vinifera and Punica granatum, which was consistent with the results of this study.
Anthocyanins are flavonoids, which are a major offshoot of the highly branched phenylpropanoid pathway, with several prominent genes including, PAL, C4H, 4CL, CHS, CHI, F3H, F3 H, FLS, DFR, ANS, and UFGT [52]. Our results revealed most of the PAL, CHS, ANS, and UFGT genes involved in anthocyanins biosynthesis were up-regulated and increased in abundance in the H and B samples, as compared to the F sample ( Figure 5). These genes might have contributed to the increasing purple coloration from the F sample to the H and B samples, as evidenced in Figure 1. The CHS provides the precursor for 4-coumaroyl-CoA to naringenin chalcone in the flavonoid biosynthesis pathway. The CHS encoded genes (Cluster-30752.8254, Cluster-30752.8253, Cluster-30752.53070, Cluster-30752.77300, and Cluster-30752.51008), which were highly expressed in the B sample, relative to the F and H samples. However, the latter two genes were more highly expressed in the H sample than F sample ( Figure 5). These genes are involved in the key regulatory step during the synthesis of flavonoids, which is a major pigment in many flowers, leaves, and fruits [75,76]. The Naringenin is converted into several anthocyanin-related substances like dihydrokaempferol, by the action of the F3H encoded genes, which were mostly expressed the most in the B samples than either the F or H samples ( Figure 5). Earlier study identified PaCHS as one of the modulators of purple coloration in Phalaenopsis amabilis [77]. SlCHS contributes to red coloration in tomato [78], while CaCHS regulates anthocyanin-pigmented fruits in Capsicum annuum [79]. The ANS converts the leucoanthocyanidins to anthocyanidins ( Figure 5). It was previously reported that CaANS increases from the young stage and reaches maximum at the late unripe stage, prior to ripening in C. annuum [79]. A similar trend was observed for the five ANS genes (Cluster-30752.47462, Cluster-30752.77085, Cluster-30752.74730, Cluster-30752.74682, and Cluster-30752.79274) in the three samples of flowers ( Figure 5). Similarly, expression levels of genes encoding ANS were upregulated and the abundance of their corresponding proteins also increased during fruit color development [74]. Another prominent gene, UFGT, which glycolyzes anthocyanidin into anthocyanin [80], gene13307, and gene43584 were downregulated in the low anthocyanin Yunyan 87 mutant of tobacco [81]. In our study, ten UFGT genes (Cluster-30752.65670, Cluster-30752.65955, Cluster-30752.49374, Cluster-30752.55935, Cluster-30752.75787, Cluster-30752.80534, Cluster-30752.53196, Cluster-30752.32440, Cluster-30752.78839, Cluster-30752.32439, Cluster-30752.53515, and Cluster-30752.54954) were highly expressed in the H and the B sample, as compared to that of the F sample ( Figure 5).
Anthocyanin biosynthesis pathway is modulated by a number of phytohormones [54][55][56][57][58][59]. He et al. (2020) highlighted that endogenous auxin (indole acetic acid) and ABA play diverse and specific roles in D. officinale color development. In our study, 25 of the 43 genes involved in auxin signaling pathway were more highly expressed in either the H or the B samples than in the F samples (Figure 6a). This is not surprising as auxins (naphthalene acetic acid) or 2,4-dichlorophenoxyacetic acid modulates secondary metabolic pathways, including those related to phenylpropanoid, flavonoid, and anthocyanin metabolism [82,83]. Additionally, most genes involved in ethylene signaling were upregulated in the largely anthocyanin-accumulated samples (H or B) (Figure 6b). This contradicted reports by Jeong et al. [55] that ethylene signaling negatively regulates anthocyanin accumulation. Our results provide evidence of phenylpropanoid-flavonoid-anthocyanin biosynthesis, and plant hormone signal transduction pathways modulating flower coloration in D. nestor. Although in this study we provided a good insight into the expression patterns of the anthocyanin-biosynthesis-pathway-related structural genes, we did not perform any analytical analysis to identify the key pigments conferring the purple coloration in D. nestor petals. Hence, we are designing a future study that would clarify this to consolidate the molecular data reported in the present work.

Transcriptional Regulation of Purple Color Formation in D. nestor
Gene expression is a complex process and involves the coordination of multi-dynamic events, which are subject to multi-level regulation [84], including transcriptional, posttranscriptional, translational, and post-translational events. Transcription factors (TF) perform two important roles in flowering plants-they recognize and bind to short, specific sequences of DNA within the regulatory region. Second, they recruit or bind to proteins that participate in transcriptional regulation [85]. The most abundant TFs predicted in this study were AP2/ERF, AUX/IAA, bHLH, bZIP, MYB, NAC, Tify, WRKY, and other families ( Table 2; Additional File 6: Table S4a-c). The class of TFs identified were previously implicated in the modulation of color formation in petals/flowers of orchids, such as roses [29][30][31]. For example, five bHLH-MYC and R2R3-MYB were more highly expressed in the H and B samples than in the T samples (Additional File 6: Table S4b,c). These indicate that some bHLH and MYB TFs were positive regulators of anthocyanins biosynthesis in D. nestor. These were in consonance with earlier reports that anthocyanins biosynthesis is transcriptionally regulated by the MYB-bHLH-WD40 complex [22,86,87]. In apple, MdMYB1 and its alleles, MdMYB10 and MdMYBA, act as positive modulators of anthocyanin biosynthesis, by activating the expressions of MdDFR and MdUF3GT [88][89][90]. On the contrary, downregulation of MdMYB1 inhibits anthocyanin accumulation mediated by ethylene, abscisic acid (ABA), wounding, drought, and different light intensities [91][92][93][94]. Another abundant TF family, WRKY, was highly expressed in the H and B samples, as compared to the F samples, indicating that WRKY play a significant role in modulating anthocyanins biosynthesis. Recent studies on proanthocyanin and anthocyanin biosynthesis pathways in Arabidopsis and Petunia revealed that WRKY regulates the color development in concert with the MYB-bHLH-WD40 complex [95,96].
Similarly, most genes with NAC TFs increased in expression from the F, to the H and B samples. In peach (Prunus persicae), PpNAC1 activates the transcription of PpMYB10.1, leading to anthocyanin pigmentation in tobacco and apple [97,98]. In contrast, PpNAC1 was silenced resulting in a reduction in anthocyanin pigmentation in blood-fleshed peaches [97]. We employed a K-means clustering, as proposed earlier by Handhayani and Hiryanto [61], which permitted the clustering of 12,625 unique genes among the samples (F, H, and B ) into six sub-clusters with some members in Cluster 1 associated with genes from the MYB and bHLH TFs. The members in the six sub-clusters might be exploited for further downstream analyses to unravel the regulatory mechanisms of flower color formation in D. nestor.

SSR Markers Discovered for Practical D. nestor Breeding
Transcriptome data provide fast and cost-effective development of molecular markers for practical plant breeding [99,100]. Specifically, SSR markers are useful tools for genomic studies and plant breeding. We detected a total of 41,286 SSR markers and developed 41,285 primers for these markers (Figure 3, Additional File 2: Table S1). It is expected that these molecular markers would facilitate genotyping and marker-assisted breeding in D. nestor and other related species.

Conclusions
We performed transcriptome analyses to study the molecular mechanisms that regulate flower color development in D. nestor. We showed that flower coloration in D. nestor is largely controlled by anthocyanins biosynthetic genes. The structural genes and coexpressed TFs reported in this study would serve as useful genetic resources for further functional characterization and molecular breeding programs in D. nestor.
Supplementary Materials: The following are available online at https://www.mdpi.com/2075-1 729/11/2/113/s1, Figure S1. Distribution of transcript lengths and unigenes (n = total number). Figure S2. Relative expression of selected differentially expressed genes for quantitative polymerase chain reaction (qRT-PCR), among the three samples (F = flower bud stage; H = half blooming stage; B = full blooming stage). The error bar represents standard error of means. The correlation plot was constructed using qRT-PCR and RNA-seq data. *,**,*** represent significant difference between F samples and H / B samples at p < 0.05, 0.01, 0.001, respectively. Table S1. Primers for specific simple sequence repeats detected in each unigene (designed usingPrimer3 software). Table S2a. Differentially expressed genes detected in F_vs_H and their functional annotations. Table S2b. Differentially expressed genes detected in F_vs_B and their functional annotations. Table S2c. Differentially expressed genes detected in H_vs_B and their functional annotations. Table S3. Predominant pathways detected from Kyoto Encyclopedia of Genes and Genomes database with differentially expressed genes mutually detected among the three pairwise groups. Table S4a. Transcription factors identified among the differentially expressed genes in F_vs_H. Table S4b. Transcription factors identified among the differentially expressed genes in F_vs_B. Table S4c. Transcription factors identified among the differentially expressed genes in H_vs_B. Table S5. K-means clustering analysis based on the FPKM of 12625 unique genes among the samples (F, H and B). Table S6. Primers of selected genes used for the qRT-PCR.  Data Availability Statement: The datasets used or analyzed during the current study are available from the corresponding author upon reasonable request. The transcriptome raw reads were deposited in the Sequence Reads Archive (SRA) of the National Center for Biotechnology Information (NCBI), under the BioProject number PRJNA680853. The transcriptome raw reads have been deposited in the Sequence Reads Archive of the National Center for Biotechnology Information under the BioProject number PRJNA680853.

Conflicts of Interest:
The authors declare no conflict of interest.