Complete Chloroplast Genome of Abutilon fruticosum: Genome Structure, Comparative and Phylogenetic Analysis

Abutilon fruticosum, a member of the family Malvaceae, is one of the endemic plants with high medicinal and economic value in Saudi Arabia. However, neither its plastome sequence nor its phylogenetic position had been reported before this study. In this research, the complete chloroplast genome of A. fruticosum was sequenced and assembled, and comparative and phylogenetic analyses within the Malvaceae family were conducted. The chloroplast genome (cp genome) has a circular, quadripartite structure with a total length of 160,357 bp and contains 114 unique genes (80 protein-coding genes, 30 tRNA genes and 4 rRNA genes). The repeat analyses indicate that all four types of repeats (palindromic, complement, forward and reverse) are present in the genome, with palindromic repeats occurring most frequently. A total of 212 microsatellites were identified in the plastome, the majority of which are mononucleotides. Comparative analyses with other species of Malvaceae indicate a high level of resemblance in gene content and structural organization and a significant level of variation in the position of genes at the single copy and inverted repeat borders. The analyses also reveal variable hotspots in the genomes that can serve as barcodes and tools for inferring phylogenetic relationships in the family; these regions include trnH-psbA, trnK-rps16, psbI-trnS, atpH-atpI, trnT-trnL, matK, ycf1 and ndhH. Phylogenetic analysis indicates that A. fruticosum is closely related to Althaea officinalis, which disagrees with the previous systematic position of the species. This study provides insights into the systematic position of A. fruticosum and valuable resources for further phylogenetic and evolutionary studies of the species and the Malvaceae family to resolve ambiguous issues within the taxa.


Introduction
The genus Abutilon Mill. [1,2], whose members are widely distributed in tropical and subtropical regions [3], is considered one of the largest genera of Malvaceae [4,5], with ca. 200 accepted species on all continents except Antarctica [3]. The systematic position of some of the taxa in the genus is still unclear [6]; hence, it is one of the most difficult genera in the Malvaceae and needs critical systematic study. The genus is distinguished from sister taxa by the presence of an endoglossum, dorsal dehiscence and the lack of an epicalyx [7]. Members of the genus have received a large amount of attention because of their medicinal and economic value [8]. In addition, parts of the plant, including the flower, bark, fruit and seeds, are reported to contain phytoconstituents that are responsible for their biological activity [9]. The plants are reported to contain no toxins, which has drawn the attention of many researchers [10][11][12]. Abutilon fruticosum is reported to have medicinal value; all parts of the plant are used in the treatment of various ailments including ulcers, leprosy, inflammation of the bladder, piles, bronchitis, rheumatism and jaundice [10,13,14]. The fiber from the plant is used as a substitute for jute [8]. Despite its importance, the phylogenetic position of the genus is still not clear, and its complete chloroplast genome had not been reported until this study. The phylogenetic position of the species within the genus Abutilon and the family Malvaceae has also not been reported. According to the available literature, there ...

Figure 1. Gene map of the A. fruticosum chloroplast genome. Genes outside the circles are transcribed in the counterclockwise direction and those inside in the clockwise direction. Known functional genes are indicated by colored bars. The GC and AT contents are denoted by the dark gray and light gray colors in the inner circle, respectively. LSC indicates large single copy; SSC indicates small single copy; and IR indicates inverted repeat.

The complete chloroplast genome of A. fruticosum contains a total of 133 genes, of which 114 are unique and are present in the single copy regions; 18 genes are duplicated in the inverted repeat region, comprising 7 protein-coding genes, 4 rRNAs and 7 tRNAs. There are 80 protein-coding genes, 4 rRNAs and 30 tRNAs in the plastome (Table 2 and Figure 1). In the single copy regions, the LSC contains 62 protein-coding genes and 22 tRNA genes, while the remaining 12 protein-coding genes and 1 tRNA gene are located within the SSC region. Almost all the protein-coding genes start with the ATG codon, which codes for methionine, whereas some genes have alternative start codons such as ATC, GTG and ACG; this is common in most chloroplast genomes of flowering plants (angiosperms) [26][27][28].

Table 2. Genes present in the chloroplast genome of A. fruticosum.

Category | Group of Genes | Name of Genes
Ribosomal proteins | Small subunit of ribosome | rps2, rps3, rps4, rps7(a), rps8, rps11, rps12(a), rps14, rps15, rps16(+), rps18, rps19
Ribosomal proteins | Large subunit of ribosome | rpl2(+,a), rpl14, rpl16, rpl20, rpl22, rpl23(a), rpl32, rpl33, rpl36

The A. fruticosum chloroplast genome contains introns in some of its genes, as do other chloroplast genomes of flowering plants [26,27]. Among the 114 genes in A. fruticosum, 17 contain introns (Table 3). Of these 17 genes, 11 are protein-coding genes and six are tRNAs. The LSC region contains 11 of the intron-containing genes and the IR region contains 5, while the SSC region contains only 1. Two genes, ycf3 and clpP, possess two introns, and the other 15 genes have only one intron. trnK-UUU has the longest intron, while accD has the shortest (Table 3).

Codon usage compares the frequencies of the codons that code for a particular amino acid [29]. Codons transmit genetic information because each one specifies an amino acid, the building block of proteins [30]. Codon usage is a factor shaping the evolution of chloroplast genomes because of bias in mutation [28], and it varies across species [31]. The frequency of the codons present in the chloroplast genome was computed using the nucleotide sequences of the protein-coding genes and tRNA genes (84,048 bp). The relative synonymous codon usage (RSCU) of the genes in the genome is presented in Table 4. The results show that genes in the plastome are encoded by 27,967 codons. Codons for leucine appear most frequently in the genome, with 2957 occurrences (10.57%) (Figure 2), whereas codons for cysteine are the least frequent, with 325 occurrences (1.16%). Guanine and cytosine endings are found to be more frequent than their counterparts adenine and thymine; this is not the case in other plastome sequences [32][33][34]. The result of the analysis (Table 4) shows that codon usage bias is low in the chloroplast genome of A. fruticosum.
The RSCU values of 30 codons were greater than 1, and all of these codons have an A/T ending, while the values of 31 codons were less than 1, and all of these have a G/C ending. Only two amino acids, tryptophan and methionine, have an RSCU value of 1; they are therefore the only amino acids with no codon bias.

RNA editing is a set of processes, including the insertion, deletion and modification of nucleotides, that alters the DNA-encoded sequence [35] and is a way to create transcript and protein diversity [36]. Some chloroplast RNA editing sites are conserved in plants [37]. The PREP suite was used to predict the RNA editing sites in the chloroplast genome of A. fruticosum. The first nucleotide of the codon was used in all the analyses. The results show that most of the conversions in the codons are from serine to leucine (Table 5). In total, 50 editing sites were revealed in the genome, distributed within 22 protein-coding genes. The gene ndhB has the highest number of editing sites (12), which is consistent with previous studies [38][39][40]. The gene ndhD has eight editing sites; other genes with a high number of editing sites are ndhF and rpoB, with four each, and matK, with three. The genes accD, atpA, ndhA, ndhG, rpoA, rpoC1, rpoC2 and rps2 have two editing sites each.
The genes atpF, atpI, ccsA, clpP, petB, psbF, rpl20, rps8 and rps14 have the lowest number of editing sites, with one site each. Conversions of proline to serine were observed, which change the amino acid at the RNA editing site from a nonpolar to a polar group. Genes such as atpB, petD, petG, petL, psaB, psaI, psbB, psbE, psbL, rpl2, rpl23, rps16 and ycf3 do not possess predicted RNA editing sites at the first codon position.

The program REPuter was used to identify long repeat sequences in the A. fruticosum chloroplast genome. All four types of long repeats (palindromic, forward, reverse and complement) were present in the plastome of A. fruticosum (Table 6). The analysis showed 22 palindromic repeats, 21 forward repeats, 5 reverse repeats and 1 complement repeat, for a total of 49 long repeats. The majority of the repeats were between 20 and 29 bp in size (87.75%), followed by repeats of 30-39 bp (8.16%) and 50-59 bp (4.08%). Considering the first location of each repeat, intergenic spacers harbored 61.22% of the repeats, tRNA genes contained four repeats (8.16%), and eight repeats (16.32%) were located in protein-coding genes. The length of repeated sequences in the A. fruticosum chloroplast genome ranged from 10 to 69 bp, as in other angiosperms [41][42][43]. I compared the frequency of repeats among four Malvaceae cp genomes and found that all four types of repeats (palindromic, forward, reverse and complement) were present in all genomes (Figure 3). Malva parviflora has the highest number of palindromic repeats (25), while Sida szechuensis has the lowest (17). A. fruticosum and M. parviflora have the same number of forward repeats (21 each). T. populnea has the highest number of reverse repeats (9), while M. parviflora has the lowest (3). Complement repeats were the least numerous type of repeat, occurring once each in A. fruticosum, S. szechuensis and M. parviflora; in the plastome of T. populnea, there were three complement repeats.
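The four repeat types named above differ only in how the second copy relates to the first. The following Python sketch illustrates that distinction under the usual REPuter-style definitions (both copies read 5' to 3' on the same strand); the function name classify_repeat and the example sequences are hypothetical and are not taken from the A. fruticosum plastome.

```python
# Illustrative sketch only: classify the relationship between two equal-length
# repeat copies. Names and example sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Return the repeat type linking two copies, or None if unrelated.

    Checks run in a fixed order, so a pair that satisfies several
    definitions (e.g. a homopolymer) is reported under the first match.
    """
    a, b = first.upper(), second.upper()
    if b == a:
        return "forward"                      # identical copy
    if b == a.translate(COMPLEMENT)[::-1]:
        return "palindromic"                  # reverse complement
    if b == a[::-1]:
        return "reverse"                      # same strand, read backwards
    if b == a.translate(COMPLEMENT):
        return "complement"                   # complemented, not reversed
    return None


# Repeat counts reported in the text for A. fruticosum.
reported = {"palindromic": 22, "forward": 21, "reverse": 5, "complement": 1}
total_repeats = sum(reported.values())

if __name__ == "__main__":
    for second in ("AAGCT", "AGCTT", "TCGAA", "TTCGA"):
        print(second, classify_repeat("AAGCT", second))
    print("total long repeats:", total_repeats)
```

The tally at the end simply re-adds the four counts given in the text.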

Simple Sequence Repeats (SSRs)
Microsatellites, or simple sequence repeats (SSRs), are short repeats of nucleotide series (1-6 bp) that are dispersed throughout the genome. In the plastid genome, these short repeats are inherited from a single parent. As a result, they are used as molecular markers in developmental studies such as those on genetic diversity and also contribute to the recognition of species [44][45][46]. A total of 212 microsatellites were found in the chloroplast genome of A. fruticosum in this study (Table 7). The majority of SSRs in the cp genome are mononucleotides (88.88%), among which poly-A (polyadenine) and poly-T (polythymine) are dominant (Figure 4). Poly-A constituted 45.06%, whereas poly-T constituted 41.97%. This is consistent with previous studies [47]. Among the dinucleotide repeats, only AG/CT and AT/AT were found in the cp genome. Taking into account the complementarity of the series, only one trinucleotide (AAT/ATT), five tetranucleotides (AAAG/CTTT, AAAT/ATTT, AACT/AGTT, AATC/ATTG and AATG/ATTC) and one pentanucleotide (AAAGT/ACTTT) were present in the cp genome (Figure 4). The intergenic/non-coding regions harbored most of the microsatellites (75.92%) (Figure 5).

Comparative Analysis of Plastomes of Malvaceae Species
To examine the degree of divergence among the chloroplast genomes of the six species of Malvaceae, a comparative analysis was conducted using the mVISTA program to align the sequences, with the annotation of A. fruticosum as a reference. The alignment showed that the genomes are highly conserved with some degree of variation. The coding regions are more conserved than the non-coding regions, and the inverted repeat regions are more conserved than the single copy regions (Figure 7). The same pattern has been reported in the chloroplast genomes of some genera in previous studies [47,48]. The most divergent non-coding regions among the six cp genomes are trnH-psbA, trnK-rps16, psbI-trnS, atpH-atpI, trnT-trnL, ndhC-trnV, accD-psaI, petA-psbJ, atpB-rbcL, rps12 and trnL-rpl32. A slightly lower level of variability was observed in the following genes: matK, ycf1, ndhH, ycf2 and accD. These regions can be used as a source of potential barcodes for the identification/authentication of Malvaceae species as well as resources for inferring phylogenetic relationships in the family.

Generally, angiosperms retain the structure and size of the chloroplast genome [46]; however, due to evolutionary events such as expansion and contraction of the genome, slight variations in the size and location of the boundaries between the inverted repeats and single copy regions do occur [49,50]. I compared the IR-LSC and IR-SSC boundaries of six cp genomes of Malvaceae (Abutilon fruticosum, Althaea officinalis, Abelmoschus esculentus, Malva parviflora, Sida szechuensis, Thespesia populnea).

Divergence of Protein-Coding Gene Sequences
The rates of synonymous (dS) and nonsynonymous (dN) substitutions and the dN/dS ratio were calculated using DnaSP among the plastomes of six species of Malvaceae to detect whether the 80 shared protein-coding genes were under selective pressure. The results show that the dN/dS ratio is less than 1 in almost all of the paired genes except petD of A. fruticosum vs. T. populnea, psaI of A. fruticosum vs. S. szechuensis and rps12 of A. fruticosum vs. T. populnea, A. fruticosum vs. S. szechuensis and A. fruticosum vs. T. populnea (Figure 9). This indicates that the majority of the genes were under negative selection and that only three of them (petD, psaI and rps12) underwent positive selection. The synonymous (dS) values range from 0.01 to 0.16 in all the genes (Figure 9). Some of the genes, including infA, petG, petN, psaJ, psbA, psbZ, psbF, psbH, psbI, psbL and rps7, showed no nonsynonymous changes between the plastomes of the paired Malvaceae species.
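The reasoning behind this interpretation can be sketched in a few lines of Python: a dN/dS ratio below 1 is read as negative (purifying) selection, a ratio above 1 as positive selection, and a ratio of 1 as neutral evolution. The function name classify_selection and the input values are hypothetical illustrations, not values from Figure 9; the actual rates in this study were computed with DnaSP.

```python
# Illustrative sketch only: interpret a dN/dS ratio for one gene pair.
# The function name and the example numbers are hypothetical.
def classify_selection(dn, ds, tolerance=1e-9):
    """Return (ratio, label) for nonsynonymous (dn) and synonymous (ds) rates."""
    if ds == 0:
        # The ratio is undefined when no synonymous change is observed.
        return None, "undefined (dS = 0)"
    ratio = dn / ds
    if abs(ratio - 1.0) <= tolerance:
        label = "neutral"
    elif ratio < 1.0:
        label = "negative (purifying) selection"
    else:
        label = "positive selection"
    return ratio, label


if __name__ == "__main__":
    for dn, ds in [(0.01, 0.10), (0.20, 0.10), (0.0, 0.05), (0.01, 0.0)]:
        print(dn, ds, classify_selection(dn, ds))
```

Genes with no nonsynonymous changes (dN = 0) fall into the purifying category under this rule, while pairs with dS = 0 are reported as undefined rather than forced into a class.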

Phylogenetic Analysis
A complete chloroplast genome is a good resource for inferring evolutionary and phylogenetic relationships [51][52][53]. Many researchers have used plastome sequences to resolve phylogenetic relationships at various taxonomic levels [54,55]. To understand the evolutionary relationships within Malvoideae (Malvaceae) and the phylogenetic position of A. fruticosum in the family, the complete plastome sequences of 10 species belonging to Malvoideae were downloaded from the GenBank database. In addition, two species used as an outgroup, Craigia yunnanensis (Tilioideae, Malvaceae) and Bombax ceiba (Bombacoideae, Malvaceae), were also downloaded from GenBank. The downloaded cp genomes and the plastome of A. fruticosum were aligned using MAFFT. The phylogenetic tree was constructed using the Bayesian inference approach. The results reveal (Figure 10) that the species belonging to the subfamily Malvoideae form one clade (monophyletic) with strong support, with a posterior probability (PP) value of 1.00. This is congruent with previous studies using molecular and morphological data [56][57][58]. The tree showed four distinct clades: a first clade containing Abutilon and Althaea, and a second clade including Malveae species that is sister to a large clade containing two clades (Hibisceae and Gossypieae). A similar tree, with slight variation, was obtained in a previous study using ITS [59]. The species A. fruticosum is closely related and sister to A. officinalis. This result is incongruent with the earlier systematic position of A. fruticosum and S. szechuensis; previous studies [60] reported that the two species are sister taxa. In a recent classification, the subfamily Malvoideae [61] was divided into four tribes, namely, Malveae, Hibisceae, Gossypieae and Kydieae. Traditionally, Abutilon was placed in Malveae together with Malva and Sida by various researchers [62,63].
Later, Hutchinson [64] restructured the traditional classification using morphological data, particularly the positions and number of the ovules. He proposed the tribes Abutileae (comprising two subtribes, Abutilinae and Sidinae), Malveae, Malopeae and Hibisceae. Traditionally, Bentham, Hooker and Schumann classified Abutilon (tribe Malveae, subtribe Abutilinae), Malva (tribe Malveae, subtribe Eumalvinae), Sida (tribe Malveae, subtribe Sidinae) and Althaea (tribe Malveae, subtribe Eumalvinae); Hutchinson later revised the placement of Abutilon (tribe Abutileae, subtribe Abutilinae) and of Malva and Althaea (tribe Malveae, subtribe Malvinae). My results disagree with all the previous tribal positions of the genera. The tree showed that Abutilon is closely related to Althaea (with strong support), while Sida, which was reported as sister to Abutilon, is in a different clade. Additionally, Malva and Althaea are in different clades, although they were included in the same subtribe by previous classifications. Based on the results of this study, I propose the exclusion of Althaea from the tribe Malveae and its placement in Abutileae. Comparative analysis in this study (Figures 6 and 7) also showed high similarity between the cp genomes of Abutilon and Althaea. More sequenced chloroplast genomes of representatives of the subfamily Malvoideae, and phylogenetic analyses based on them, would still be useful to establish the final systematic position of the genera within it.

Plant Material and DNA Extraction
Leaf material of Abutilon fruticosum was collected during field research in Jeddah, Saudi Arabia. Total genomic DNA was extracted from the samples using the Qiagen genomic DNA extraction kit according to the manufacturer's protocol.

Library Construction, Sequencing and Assembly of the Chloroplast Genome
A total of 1.0 µg of DNA was used as input material for the DNA sample preparations. Sequencing libraries were generated using the NEBNext DNA Library Prep Kit for Illumina following the manufacturer's recommendations. The genomic DNA was randomly fragmented into 350 bp long sequences. The raw reads were filtered to obtain clean reads (5 Gb) using PRINSEQ-lite v0.20.4 [65] and were subjected to de novo assembly using NOVOPlasty 2.7.2 [66] with a k-mer size of 31-33 to assemble the complete chloroplast genome from the whole genome sequence. One contig containing the complete chloroplast genome sequence was generated. The chloroplast genome sequence of A. fruticosum has been submitted to GenBank (accession number: MT772391).

Sequence Analysis
The relative synonymous codon usage (RSCU) values, base composition and codon usage were computed using MEGA 6.0. Possible RNA editing sites in the protein-coding genes of the cp genomes of the Malvaceae species were determined using the PREP suite [35] with a cutoff value of 0.8.
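For readers unfamiliar with RSCU, the Python sketch below shows the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is not the MEGA implementation; the function name rscu, the use of the standard codon table and the toy input sequence are assumptions made for illustration only.

```python
# Illustrative sketch only (not MEGA's implementation): RSCU from coding
# sequences, using the standard genetic code. The toy input is hypothetical.
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for every codon whose amino acid is observed."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # equal usage of synonymous codons
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    toy = ["ATGTGGTTATTATTGGCTGCTGCC"]  # hypothetical coding sequence
    for codon, value in sorted(rscu(toy).items()):
        if value > 0:
            print(codon, round(value, 3))
```

Because methionine and tryptophan are each encoded by a single codon, their RSCU is always 1 under this definition, which is why they show no codon bias.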

Repeat Analysis
Simple sequence repeats (SSRs) were identified in the Abutilon fruticosum chloroplast genome using the online software MIcroSAtellite (MISA) [70] with the following minimum numbers of repeat units: eight for mononucleotides, five for dinucleotides, four for trinucleotides and three for tetra-, penta- and hexanucleotide SSR motifs. For the analysis of long repeats (palindromic, forward, reverse and complement), the program REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer) [71] was used with default parameters to identify the size and location of the repeats in the genome.
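The SSR thresholds above can be illustrated with a small Python sketch. This is not MISA itself but a simplified scan that applies the same minimum repeat numbers (8, 5, 4, 3, 3 and 3 for motif lengths 1 to 6); the names find_ssrs and is_primitive and the test sequence are hypothetical, and features of MISA such as compound SSR detection are deliberately left out.

```python
# Illustrative sketch only (not MISA): find perfect SSRs using the minimum
# repeat numbers stated in the text. Names and test sequence are hypothetical.
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples."""
    seq = sequence.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count consecutive copies of the motif starting at i.
                n = 1
                while seq[i + n * unit_len:i + (n + 1) * unit_len] == motif:
                    n += 1
                if n >= n_min:
                    hits.append((i + 1, motif, n))
                    i += n * unit_len  # skip past the reported SSR
                    continue
            i += 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 10 + "CG" + "AT" * 5 + "GCC"
    for start, motif, n in find_ssrs(demo):
        print(start, motif, n)
```

Requiring a primitive motif keeps a poly-A run from being reported again as an "AA" dinucleotide or "AAA" trinucleotide repeat.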

Genome Comparison
The complete chloroplast genomes of six species of Malvaceae were compared with the program mVISTA [72] in the Shuffle-LAGAN mode [73], using the annotation of A. fruticosum as a reference. The junctions between the large single copy (LSC) and inverted repeat (IR) regions and between the small single copy (SSC) and inverted repeat (IR) regions were compared using IRscope.

Characterization of Substitution Rate
DnaSP v5.10.01 [74] was used to analyze synonymous (dS) and nonsynonymous (dN) substitution rates and the dN/dS ratio to detect genes that are under selection pressure; the chloroplast genome of A. fruticosum was compared with the cp genomes of M. parviflora, S. szechuensis, T. populnea and A. officinalis.

Phylogenetic Analysis
The complete chloroplast genomes of eleven Malvoideae and two species, Craigia yunnanensis (Tilioideae) and Bombax ceiba (Bombacoideae), were downloaded from GenBank. The downloaded sequences were aligned with the sequenced cp genome of A. fruticosum using MAFFT v.7 [75]. The data were analyzed with the Bayesian inference approach using MrBayes version 3.2.6 [76]. jModelTest version 3.7 [77] was used to select the suitable model.

Data Availability Statement:
The data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov, reference number (A. fruticosum, MT772391).