Genome-Wide Analysis of Terpene Synthase Gene Family in Mentha longifolia and Catalytic Activity Analysis of a Single Terpene Synthase

Terpenoids are a large and diverse class of natural products, and terpene synthases (TPSs) play a key role in their biosynthesis. Mentha plants are rich in essential oils, whose main components are terpenoids, and the corresponding biosynthetic pathways have largely been elucidated. However, TPSs in Mentha plants have not been systematically identified or studied. In this work, we performed a genome-wide identification and analysis of the TPS gene family in Mentha longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence assembly and could be divided into six subfamilies. The TPS-b subfamily had the largest number of genes, which might be related to the abundant monoterpenoids in Mentha plants. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other sequenced Lamiaceae plant species. The 63 TPS genes could be mapped to nine scaffolds of the M. longifolia genome sequence assembly, and their distribution was uneven. Tandem duplicates and segment duplicates contributed greatly to the increase in the number of TPS genes in M. longifolia. The conserved motifs (RR(X)8W, NSE/DTE, RXR, and DDXXD) were analyzed in M. longifolia TPSs, and significant differentiation was found between subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. Furthermore, we cloned a single terpene synthase gene, MlongTPS29, which belongs to the TPS-b subfamily, and analyzed its catalytic activity. MlongTPS29 was found to encode a limonene synthase that catalyzes the biosynthesis of limonene, an important precursor of essential oils from the genus Mentha. This study provides useful information for the biosynthesis of terpenoids in the genus Mentha.


Introduction
Terpenoids are the largest and most structurally diverse group of natural products in plants [1]. To date, more than 80,000 terpenoid compounds, including monoterpenes, sesquiterpenes, and diterpenes, have been identified [2,3]. Terpenoids play important

Data Retrieval and Identification of TPSs
The proteome data of the sequenced Labiatae plants were downloaded from http://www.ndctcm.org/shujukujieshao/2015-04-23/27.html (Salvia miltiorrhiza) [24], http://caps.ncbs.res.in/Ote/ (Ocimum tenuiflorum) [25], http://ocri-genomics.org/Sinbase/ (Sesamum indicum) [26], and http://gigadb.org/dataset/100463 (Salvia splendens) [27] (accessed on 21 July 2020). For the identification of TPSs, the TPS-specific Pfam N-terminal domain model (PF01397) and C-terminal domain model (PF03936) were downloaded from the Pfam website (http://pfam.xfam.org/) [28]. Then, an HMM search (v3.1b2) [29] was conducted against each proteome using the PF01397 and PF03936 domain models as queries. Candidate genes with both N-terminal and C-terminal domains were considered complete TPSs and used for further analysis. The Arabidopsis TPS sequences were downloaded from TAIR (https://www.arabidopsis.org/) (accessed on 21 July 2020). The genome data of M. longifolia were downloaded from the Mint Genomics Resource (http://langelabtools.wsu.edu/mgr/) (accessed on 5 May 2020). The assembly of the M. longifolia genome contains 12 large scaffolds encompassing 462.6 Mb, which is consistent with the previously reported genome size (400~500 Mb) [22]. The new assembly corresponds to at least 92.5% of the predicted genome size. Because no gene prediction was available for the M. longifolia genome sequence assembly, a BLAT-based method was used to identify TPSs in the assembly [30]. The protein query set representing the TPS family used for BLAT was constructed from the PF01397 and PF03936 seed sequences. The target sequences and their flanking sequences in the M. longifolia genome sequence were extracted and then imported into Genscan for gene prediction [31]. The conserved N-terminal and C-terminal domains of M. longifolia TPSs were confirmed on the SMART website (http://smart.embl-heidelberg.de/).
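The domain filter described above (keep only proteins with hits to both PF01397 and PF03936) can be sketched as follows. This is a minimal illustration rather than the script used in this study: it assumes the search was run with HMMER's hmmsearch and written in the whitespace-delimited `--tblout` layout, and the function name, the sample rows, and the E-value cutoff are hypothetical (no cutoff is stated in the text).

```python
# Sketch: keep proteins that hit both TPS Pfam domain models.
# Assumed row layout (hmmsearch --tblout style, whitespace-delimited):
# target name, target accession, query name, query accession, E-value, ...
# The default cutoff is an illustrative assumption, not a value from the study.

N_DOMAIN = "PF01397"  # TPS N-terminal domain model
C_DOMAIN = "PF03936"  # TPS C-terminal domain model


def complete_tps(tblout_text, max_evalue=1e-5):
    """Return sorted protein IDs that have hits to both Pfam models."""
    hits = {N_DOMAIN: set(), C_DOMAIN: set()}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target = fields[0]
        accession = fields[3].split(".")[0]  # drop the Pfam version suffix
        evalue = float(fields[4])
        if accession in hits and evalue <= max_evalue:
            hits[accession].add(target)
    return sorted(hits[N_DOMAIN] & hits[C_DOMAIN])


# Hypothetical rows for illustration only.
SAMPLE = """\
# target name  accession  query name       accession   E-value
protA          -          Terpene_synth    PF01397.21  1e-50
protA          -          Terpene_synth_C  PF03936.16  3e-80
protB          -          Terpene_synth_C  PF03936.16  2e-40
protC          -          Terpene_synth    PF01397.21  0.5
protC          -          Terpene_synth_C  PF03936.16  1e-30
"""

print(complete_tps(SAMPLE))
```

In this toy input, protB lacks an N-terminal hit and the N-terminal hit of protC fails the assumed cutoff, so only protA would be carried forward as a complete TPS.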

Multiple Sequence Alignment and Phylogenetic Analyses
The multiple sequence alignment of TPSs from M. longifolia and other plants was performed using MUSCLE 3.6 [32]. The alignment results were imported into MEGA X to construct the phylogenetic tree [33]. The phylogenetic tree was constructed using the maximum likelihood method with the Jones-Taylor-Thornton (JTT) model, and branch support was assessed with 1000 bootstrap replicates. The phylogenetic tree was further edited using iTOL (https://itol.embl.de/) [34].

Adaptive Evolution Analysis of M. longifolia TPSs
Based on the phylogenetic tree and the gene duplication analysis of the M. longifolia TPS gene family, 14 paralog pairs were selected to calculate the nonsynonymous-to-synonymous substitution ratio (Ka/Ks). The calculation was conducted using KaKs_Calculator 2.0 [38] with the sliding window method (90 bp window and 30 bp step). Then, the site models of EasyCodeML [39] were used to conduct adaptive evolution analyses on each subfamily of M. longifolia TPSs. Three pairs of models (M0 (one-ratio) vs. M3 (discrete), M1a (neutral) vs. M2a (positive selection), and M7 (β) vs. M8 (β + ω)) were chosen to test for positive selection using the likelihood ratio test (LRT) and the Bayes empirical Bayes (BEB) method [40,41].
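The sliding-window scheme above (90 bp window, 30 bp step) can be illustrated with a short sketch. It only enumerates window coordinates and splits each window into codons; it does not compute Ka/Ks and may differ from how KaKs_Calculator 2.0 handles windows internally. The function names are hypothetical.

```python
# Sketch: enumerate 90 bp windows moved in 30 bp steps over an aligned
# coding-sequence pair. Both values are multiples of 3, so every window
# starts and ends on a codon boundary when the alignment is in frame.


def sliding_windows(length, window=90, step=30):
    """Return (start, end) of every full window, 0-based and end-exclusive."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def window_codons(seq_a, seq_b, window=90, step=30):
    """Yield (start, codons_a, codons_b) for each window of an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    for start, end in sliding_windows(len(seq_a), window, step):
        sub_a, sub_b = seq_a[start:end], seq_b[start:end]
        codons_a = [sub_a[i:i + 3] for i in range(0, window, 3)]
        codons_b = [sub_b[i:i + 3] for i in range(0, window, 3)]
        yield start, codons_a, codons_b


# An 1800 bp coding sequence gives (1800 - 90) / 30 + 1 windows.
windows = sliding_windows(1800)
print(len(windows), windows[0], windows[-1])
```

Each window's codon lists would then be passed to whichever Ka/Ks estimator is in use; trailing positions that do not fill a complete window are ignored in this sketch.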

RNA Isolation and MlongTPS29 Cloning
The M. longifolia plants used for RNA extraction were introduced from the Botanical Garden Berlin-Dahlem in Germany (accession number ES-0-B-0180887) and then cultivated at the Germplasm Nursery of the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing, Jiangsu Province. Total RNA of M. longifolia leaves was extracted using a FastPure Plant Total RNA Isolation Kit (Vazyme, Nanjing, China) according to the manufacturer's instructions. After the quality and concentration were checked, 1 µg of total RNA was used to synthesize first-strand cDNA with a HiScript II 1st Strand cDNA Synthesis Kit (Vazyme, Nanjing, China). To identify the candidate limonene synthase in the M. longifolia genome sequence, the limonene synthases of M. spicata (AAC37366.1) and M. piperita (ABW86881.1) were used as BLAST queries against the M. longifolia TPSs. Polymerase chain reaction (PCR) was performed to amplify MlongTPS29 with a gene-specific forward primer (5′-ATGGCTTTCAAAGTGTTTAGTG-3′) and reverse primer (5′-TCATGCAAAGGGCTCGAAT-3′). The amplified fragments were purified using the TaKaRa MiniBEST Agarose Gel DNA Extraction Kit Ver.4.0 (Takara, Dalian, China) and then cloned into the pClone007 Blunt Simple Vector (Tsingke, Beijing, China). The positive clones were screened and sequenced for confirmation.

Expression of Recombinant MlongTPS29 in Escherichia coli and Enzyme Assays
The coding sequence of MlongTPS29 was cloned into the prokaryotic expression vector pET28a using the homologous recombination method. Briefly, MlongTPS29 was amplified with primers containing homology arms. The forward primer was 5′-CAAATGGGTCGCGGATCCATGGCTTTCAAAGTGTTTAGTG-3′, and the reverse primer was 5′-GGCCGCAAGCTTGTCGACTCATGCAAAGGGCTCGAAT-3′ (the homology arms are the 5′ portions preceding the gene-specific sequences). The pET28a vector was digested with the restriction endonucleases BamHI and SalI. Then, homologous recombination was performed with a Trelief™ SoSoo Cloning Kit Ver.2 (Tsingke, Beijing, China) according to the manufacturer's instructions. The recombinant vector was transformed into E. coli BL21 (DE3), and the expression of recombinant MlongTPS29 was induced by the addition of isopropyl-β-D-thiogalactoside (IPTG) to a final concentration of 1 mM. After culturing at 16 °C for 20 h, the cells were collected by centrifugation and washed twice with reaction buffer (50 mM HEPES, pH 7.5, with 5 mM MgCl2, 2 mM MnCl2, 200 mM KCl, 5 mM dithiothreitol, and 10% (v/v) glycerol). Then, the cells were resuspended in reaction buffer and disrupted by sonication. After centrifugation at 16,000× g at 4 °C for 15 min, the supernatant was collected and used for further enzyme assays.
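The primer design for recombination cloning can be checked with a short sketch: each cloning primer should consist of a homology arm followed by the gene-specific primer used for the initial amplification, and the arms should carry the recognition sites of the enzymes used to open pET28a (GGATCC for BamHI, GTCGAC for SalI). The helper names are hypothetical; the primer sequences are those given in the text.

```python
# Sketch: split each cloning primer into homology arm + gene-specific part
# and report which of the two restriction sites each arm contains.

GENE_FWD = "ATGGCTTTCAAAGTGTTTAGTG"  # gene-specific forward primer
GENE_REV = "TCATGCAAAGGGCTCGAAT"     # gene-specific reverse primer
CLONE_FWD = "CAAATGGGTCGCGGATCCATGGCTTTCAAAGTGTTTAGTG"
CLONE_REV = "GGCCGCAAGCTTGTCGACTCATGCAAAGGGCTCGAAT"

# Recognition sequences of the enzymes used to digest the vector.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def homology_arm(cloning_primer, gene_primer):
    """Return the 5' extension that precedes the gene-specific sequence."""
    if not cloning_primer.endswith(gene_primer):
        raise ValueError("cloning primer does not end with the gene primer")
    return cloning_primer[: -len(gene_primer)]


def sites_in(seq):
    """List the names of the restriction sites found in seq."""
    return [name for name, site in SITES.items() if site in seq]


fwd_arm = homology_arm(CLONE_FWD, GENE_FWD)
rev_arm = homology_arm(CLONE_REV, GENE_REV)
print(fwd_arm, len(fwd_arm), sites_in(fwd_arm))
print(rev_arm, len(rev_arm), sites_in(rev_arm))
```

Both arms come out at 18 nt, with the BamHI site at the 3′ end of the forward arm and the SalI site at the 3′ end of the reverse arm, which is consistent with insertion between the two cut sites of the digested vector.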
The enzyme activity of MlongTPS29 was detected according to an earlier report with minor modifications [42]. Briefly, the supernatant of E. coli expressing recombinant MlongTPS29 was added to a 200 µL reaction mixture, and then 10 µM of GPP was added to initiate the reaction. The reaction mixture was incubated at 30 °C for 1 h. Products of the reaction were extracted with dichloromethane and then detected by an Agilent 8860/5977B GC-MS system equipped with a DB-5MS column (30 m × 0.25 mm i.d.). The oven temperature was isothermal at 45 °C, then increased at a rate of 10 °C/min to 220 °C, and maintained at 220 °C for 2 min.
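As a quick arithmetic check on the oven program, the ramp and final hold can be summed to estimate the run time. The duration of the initial 45 °C hold is not stated, so the sketch below treats it as a parameter with an assumed default of zero; the function names are hypothetical.

```python
# Sketch: total GC oven time = initial hold + linear ramp + final hold.
# The initial hold time is NOT given in the text; 0 min is an assumption.


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Minutes needed for a linear temperature ramp."""
    if rate_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return (end_c - start_c) / rate_c_per_min


def total_run_minutes(start_c=45.0, end_c=220.0, rate_c_per_min=10.0,
                      final_hold_min=2.0, initial_hold_min=0.0):
    """Total oven program time in minutes."""
    ramp = ramp_minutes(start_c, end_c, rate_c_per_min)
    return initial_hold_min + ramp + final_hold_min


print(total_run_minutes())
```

With the stated values, the ramp from 45 °C to 220 °C takes 17.5 min, giving 19.5 min including the 2 min final hold, plus whatever initial hold was actually used.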

Identification of TPS Genes in M. longifolia Genome Sequence
HMM-based and BLAST-based methods are commonly used to identify the TPS gene family in plants. In this study, because no gene prediction was available for the M. longifolia genome, a BLAT-based method was used to identify the TPS family. Using the conserved TPS N-terminal domain (PF01397) and C-terminal domain (PF03936) seed sequences as queries, 89 TPS-N and 99 TPS-C genes were identified after gene model prediction, respectively. By comparing the two results, 78 candidate TPS genes were obtained. After manually confirming the conserved domains, we finally identified 63 TPSs containing both the TPS N-terminal and TPS C-terminal domains in the M. longifolia genome sequence (Table 1, File S1).

Phylogenetic Analyses of TPSs from M. longifolia and Other Lamiaceae Plants
To examine the evolutionary relationships of M. longifolia TPSs, a phylogenetic tree was constructed using the M. longifolia TPSs and TPSs from Arabidopsis thaliana and the other four sequenced Lamiaceae plants, namely, O. tenuiflorum, S. indicum, S. miltiorrhiza, and S. splendens. The phylogenetic tree demonstrated that the TPS proteins clustered into six subfamilies: TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g (Figure 1). No TPS-d or TPS-h gene was identified, because TPS-d is regarded as gymnosperm-specific and TPS-h has only been observed in Selaginella moellendorffii [12]. Some species-specific clades were observed; for example, 22 TPS-a subfamily genes of A. thaliana clustered into one clade, and 11 TPS-b subfamily genes of S. splendens clustered into another. Among the Lamiaceae plants analyzed in this study, the TPS-a subfamily had the largest number of genes in every species except M. longifolia, in which the TPS-b subfamily had more genes than the TPS-a subfamily (Table 2). Comparing the gene numbers of each subfamily, it is worth noting that the TPS-e subfamily in the M. longifolia genome sequence assembly had many more genes than in the other Lamiaceae plants, indicating a significant species-specific expansion of the TPS-e subfamily in M. longifolia (Table 2).

Classification of M. longifolia TPSs Based on the Phylogenetic Tree
The phylogenetic analysis of the 63 M. longifolia TPSs was performed using MEGA X with the maximum likelihood method. Based on the phylogenetic tree, the 63 M. longifolia TPSs could be divided into six subfamilies: 13 TPS-a genes, 22 TPS-b genes, 5 TPS-c genes, 18 TPS-e genes, 1 TPS-f gene, and 4 TPS-g genes. The TPS-e and TPS-f subfamilies are often merged into one subfamily because TPS-f is derived from TPS-e, and here they clustered into one clade (Figure 2). It is worth noting that there are 18 TPS-e subfamily genes in the M. longifolia genome sequence, which is far more than reported for most other plants [13].
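The subfamily counts above can be tallied to confirm that they sum to 63 and to derive each subfamily's share of the family. This is plain arithmetic on the numbers given in the text; the variable names are illustrative.

```python
# Sketch: tally subfamily sizes and express each as a percentage of all TPSs.

counts = {
    "TPS-a": 13,
    "TPS-b": 22,
    "TPS-c": 5,
    "TPS-e": 18,
    "TPS-f": 1,
    "TPS-g": 4,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {counts[name]} genes ({pct}%)")
```

The TPS-b and TPS-a shares computed this way (34.9% and 20.6%) agree with the percentages quoted in the Discussion.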

Exon-Intron Structure of M. longifolia TPS Genes
The numbers of exons and introns in plant TPS genes are relatively low. According to the intron-exon pattern, TPS genes can be divided into three classes, class I, class II, and class III, which contain 12-14 introns, 9 introns, and 6 introns, respectively [16]. In this study, most TPS-a, TPS-b, and TPS-g subfamily genes of M. longifolia contained six to eight exons and five to seven introns (Table 1 and Figure 2), and they all belonged to the class III TPSs. The TPS-c subfamily genes contained 14 to 15 exons and 13 to 14 introns (Table 1 and Figure 2) and belonged to the class I TPSs. The gene structure of the TPS-e subfamily genes showed relatively large variation: the exon numbers varied from 6 to 14, and some genes exhibited a loss of exons at the 5′ end (Table 1 and Figure 2).

Genomic Distribution of M. longifolia TPS Genes
The 63 TPS genes were mapped to nine scaffolds of the M. longifolia genome sequence assembly based on their localization information (Figure 3). The distribution of these genes is uneven; for example, only two TPS genes mapped onto scaffold3 and scaffold6, while 19 TPS genes clustered on scaffold9. A clustered distribution of some subfamily members was also observed, such as nine TPS-b genes clustering on scaffold11 and 16 TPS-e genes clustering on scaffold9. Tandem duplication and segment duplication are common phenomena related to the increase in gene copies in plants, so both were analyzed for the TPS genes in this study. Seven tandem duplicates and three segment duplicates of TPS genes were observed in the M. longifolia genome sequence assembly, involving a total of 30 TPS genes (Figure 3). The duplication events occurred in the TPS-a, TPS-b, and TPS-e subfamilies.

Conserved Motif Analyses of M. longifolia TPSs
TPSs harbor conserved structural features such as the RR(X)8W motif in the N-terminal domain and the DDXXD and NSE/DTE motifs in the C-terminal domain, which play important roles in the catalytic function of TPSs [12,43]. In our study, conserved motifs were analyzed in M. longifolia TPSs, and significant differentiation was found between subfamilies (Figure 4). The RR(X)8W motif is conserved in the TPS-b subfamily and plays a role in the initiation of the isomerization-cyclization reaction [44]. Both the TPS-b and TPS-g subfamilies are angiosperm monoterpene synthases, but the TPS-g proteins do not contain this motif. The TPS-g proteins are required for the biosynthesis of acyclic monoterpenes, which form floral volatile organic compounds (VOCs) [45]. It has been reported that the TPS-a subfamily encodes only sesquiterpene synthases and that the second arginine of the RR(X)8W motif is not conserved in this subfamily [46]. The NSE/DTE motif is conserved in most subfamilies except for the TPS-c subfamily. The RXR motif is conserved in the TPS-a and TPS-b subfamilies. The DDXXD motif is the most conserved motif among these TPSs and is conserved in the TPS-a, TPS-b, TPS-e, TPS-f, and TPS-g subfamilies but not in the TPS-c subfamily (Figure 4). The DDXXD motif is involved in the coordination of divalent ions and water molecules and in the stabilization of the active site [47,48]. The TPS-c proteins are not expected to have this motif, as they do not cleave the prenyl diphosphate unit; however, they contain a DXDD motif that is critical for the protonation-initiated reaction [49].
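The motifs named above translate directly into simple patterns, with X standing for any residue. The sketch below scans a protein sequence for RR(X)8W, DDXXD, RXR, and DXDD; it is an illustration, not the motif analysis pipeline used in this study. The NSE/DTE motif is left out because its consensus is not spelled out in the text, and the example sequence is made up.

```python
import re

# Sketch: motif names mapped to regular expressions ("." = any residue, X).
MOTIFS = {
    "RR(X)8W": r"RR.{8}W",
    "DDXXD": r"DD..D",
    "RXR": r"R.R",
    "DXDD": r"D.DD",
}


def find_motifs(protein):
    """Return {motif name: [0-based start positions]} for one sequence."""
    sequence = protein.upper()
    found = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        found[name] = [match.start() for match in regex.finditer(sequence)]
    return found


# Made-up toy sequence for illustration only (not a real TPS).
toy = "MAARRSGNYQPSLWDFAAADDIYDVYGTRLRAA"
for motif, positions in find_motifs(toy).items():
    print(motif, positions)
```

A short pattern such as RXR will match by chance in many proteins, so in practice hits would be interpreted in the context of the alignment position rather than taken at face value.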

Adaptive Evolution Analysis of M. longifolia TPSs
To explore whether positive selection drove the evolution of the M. longifolia TPS gene family, the nonsynonymous-to-synonymous substitution ratio (Ka/Ks = ω) was calculated. Using a sliding window of 90 bp and a moving step of 30 bp, the Ka/Ks ratios of 14 M. longifolia TPS paralog pairs were calculated (Figure 5). A few sites in eight paralog pairs (three, three, and two for the TPS-a, TPS-b, and TPS-e subfamilies, respectively) had Ka/Ks > 1, and most sites had Ka/Ks < 1, suggesting that most M. longifolia TPS genes were subjected to purifying selection after the species-specific expansions. To further investigate the evolutionary selection pressures acting on M. longifolia TPS genes, the site models of each subfamily were calculated using EasyCodeML. As shown in Table 3, all the subfamilies were subject to purifying selection, with ω ranging from 0.202 to 0.310. Some amino acid residues under positive selection were identified in the TPS-c and TPS-g subfamilies.
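The likelihood ratio test behind the site-model comparisons can be sketched as follows. The statistic is twice the log-likelihood difference between the nested models, compared against a chi-square distribution; the degrees of freedom used here (2 for M1a vs. M2a and M7 vs. M8, 4 for M0 vs. M3 with three site classes) follow the usual codeml convention and are stated as an assumption, and the log-likelihood values are hypothetical, not taken from Table 3.

```python
import math

# Sketch: likelihood ratio test for nested codon site models.
# Closed-form chi-square tail for even degrees of freedom (2, 4, ...):
#   P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a nested model comparison."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf_even_df(statistic, df)


# Hypothetical log-likelihoods for illustration only.
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(stat, p_value)
```

When the alternative model fits significantly better, the BEB posterior probabilities reported for that model are then used to nominate individual positively selected sites.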

Enzyme Activity Assays of MlongTPS29
Limonene is an important precursor of the essential oil components of the genus Mentha, and its synthesis is catalyzed by limonene synthase (LS). To identify the candidate LS in the M. longifolia genome sequence, the LSs of M. spicata and M. piperita were used as BLAST queries against the M. longifolia TPSs. As a result, a candidate LS-coding gene, MlongTPS29, was identified. The coding sequence of MlongTPS29 is 1800 bp, which is the same length as that of the LS homologs in M. spicata and M. piperita. Multiple sequence alignment also showed that MlongTPS29 was highly similar to the LSs of M. spicata and M. piperita (Figure S1). Both the sequence length and the sequence similarity indicate that MlongTPS29 is complete. This gene was cloned and then assayed for catalytic activity. The recombinant MlongTPS29 was heterologously expressed in E. coli and used to set up the reaction in vitro. After GPP was added as a substrate, GC-MS analysis showed that limonene could be detected in the MlongTPS29 group, while no limonene was detected in the empty pET28a group (Figure 6). This result indicates that MlongTPS29 could catalyze the production of limonene from GPP.

Discussion
The genus Mentha has important economic value owing to its abundance of essential oils. The major constituents of mint essential oils are monoterpenes and sesquiterpenes [18,19]. Mentha plants (especially peppermint and spearmint) have been employed as model systems for the study of monoterpene biosynthesis [20,21]. However, their complex polyploidy and the lack of genomic information have limited further study. Horse mint (M. longifolia) is a diploid ancestor species of the genus Mentha, which has been developed as a model species for mint genomics [22]. The completion of M. longifolia genome sequencing provides an opportunity to perform functional genomic studies of Mentha plants [23]. In this study, the TPS gene family, which encodes key enzymes positioned at the branch points of terpenoid biosynthesis, was identified and analyzed genome-wide in the M. longifolia genome sequence assembly. A total of 63 complete TPS genes were identified in the assembly according to the conserved N-terminal and C-terminal domains of TPS. TPS is a medium-sized gene family, with gene numbers varying (approximately 20-150) among different plants [12]. The number of TPS genes in the M. longifolia genome sequence assembly is moderate compared with that of other reported plants.
According to the phylogenetic analysis, the TPSs of M. longifolia fall into six known angiosperm TPS subfamilies (TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g). No genes of the gymnosperm-specific TPS-d subfamily or the S. moellendorffii-specific TPS-h subfamily were identified. However, recent studies indicated that the TPS-d subfamily is not gymnosperm-specific, as it was also found in Ananas comosus and Marchantia polymorpha [13]. TPS-b is the largest subfamily in the M. longifolia genome sequence, and it has more members than the TPS-a subfamily (34.9% TPS-b genes and 20.6% TPS-a genes). This is in contrast to most other plants, such as A. thaliana (18.8% TPS-b genes and 68.8% TPS-a genes) [50], Vitis vinifera (29.0% TPS-b genes and 43.5% TPS-a genes) [46], and Oryza sativa (5.0% TPS-b genes and 62.5% TPS-a genes) [13]. The genomic distribution analysis showed that there were some tandem duplicates and segment duplicates among the TPS-b genes, which might be the cause of the increase in the number of TPS-b subfamily genes in the M. longifolia genome sequence [13]. The TPS-b subfamily is mainly responsible for catalyzing the biosynthesis of monoterpenoids, and monoterpenoids are the main components of the essential oils of Mentha plants [1,18]. Therefore, we speculate that the expansion of the TPS-b subfamily in Mentha may be related to the rich monoterpenoid content. Another interesting phenomenon is that there are 18 TPS-e subfamily genes in the M. longifolia genome sequence, which is much higher than the number in most other plants. It is worth noting that most TPS-e genes (15 of 18) are distributed on scaffold9, and tandem duplicates also exist in this subfamily. Whether the species-specific expansion of TPS-e in M. longifolia causes functional differentiation remains unclear. An integrated chemical, genomic, and phylogenetic approach in Lamiaceae revealed that gene family expansion, rather than increased enzyme promiscuity of terpene synthases, is correlated with mono- and sesquiterpene diversity [51]. GC-MS analysis showed that the diversity of mono- and sesquiterpenes in the genus Mentha was greater than that in other genera of Lamiaceae [51]. The catalytic function of the expanded TPS-e subfamily needs further investigation.
The TPS genes can also be classified into different classes according to their genomic structure, including class I (13-15 exons), class II (10 exons), and class III (7 exons), which appear to have evolved sequentially from class I to class III [16]. Class I TPSs consist primarily of diterpene synthases found in gymnosperms (secondary metabolism) and angiosperms (primary metabolism). Class II TPSs evolved from class I by loss of the conifer diterpene internal sequence domain. Class III TPSs consist of angiosperm monoterpene, sesquiterpene, and diterpene synthases involved in secondary metabolism, which evolved from class II by loss of introns [16]. There are differences in gene structure between subfamilies, while members of the same subfamily show only minor differences. The TPS-a, TPS-b, and TPS-g subfamilies, with 6 to 8 exons, belong to the class III TPSs, while TPS-c, TPS-e, and TPS-f, with 13 to 15 exons, belong to the class I TPSs. In the M. longifolia genome sequence, the gene structure of the TPSs is basically consistent with the subfamily classification, except for TPS-e. Comparison with TPS-e genes from other plants showed that some M. longifolia TPS-e genes have lost exons at the 5′ end. It has been suggested that, during evolution, class I TPS genes successively lose exons and introns to form a new class, so we speculate that these exon-losing TPS genes may be involved in this evolutionary process. Whether this exon loss affects their function remains unclear.
The main components of the essential oils of Mentha plants are monoterpenoids, whose biosynthesis is mainly catalyzed by members of the TPS-b subfamily. In this study, we selected MlongTPS29, a putative limonene synthase-encoding gene belonging to the TPS-b subfamily, for catalytic activity analysis. Limonene is the most important precursor of the essential oil components of the genus Mentha, and its synthesis is catalyzed by limonene synthase. In peppermint and spearmint (two widely cultivated Mentha plants), limonene synthase has been identified and shown to catalyze the synthesis of limonene from GPP [52]. The results of our study indicate that MlongTPS29 could also catalyze the production of limonene from GPP in vitro.

Conclusions
In this study, we performed a genome-wide identification and analysis of the TPS gene family in the genome sequence assembly of M. longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence and could be divided into six subfamilies. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other plants. The 63 TPS genes could be mapped to nine scaffolds of the M. longifolia genome sequence assembly, and tandem duplicates and segment duplicates contributed greatly to the increase in the number of TPS genes. The conserved motifs of M. longifolia TPSs were significantly differentiated between subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. We also cloned a TPS-b gene, MlongTPS29, which was found to encode a limonene synthase that catalyzes the biosynthesis of limonene, an important precursor of essential oils from the genus Mentha. This study provides useful information for the biosynthesis of terpenoids in the genus Mentha.