Molecular evolution of bumble bee vitellogenin and vitellogenin‐like genes

Abstract Vitellogenin (Vg), a storage protein, has been significantly studied for its egg yolk precursor role in oviparous animals. Recent studies found that vitellogenin and its Vg‐like homologs were fundamentally involved in many other biological processes in social insects such as female caste differences and oxidative stress resilience. In this study, we conducted the first large‐scale molecular evolutionary analyses of vitellogenin coding genes (Vg) and Vg‐like genes of bumble bees, a primitively eusocial insect belonging to the genus Bombus. We obtained sequences for each of the four genes (Vg, Vg‐like‐A, Vg‐like‐B, and Vg‐like‐C) from 27 bumble bee genomes (nine were newly sequenced in this study), and sequences from the two closest clades of Bombus, including five Apis species and five Tetragonula species. Our molecular evolutionary analyses show that in bumble bee, the conventional Vg experienced strong positive selection, while the Vg‐like genes showed overall relaxation of purifying selection. In Apis and Tetragonula; however, all four genes were found under purifying selection. Furthermore, the conventional Vg showed signs of strong positive selection in most subgenera in Bombus, apart from the obligate parasitic subgenus Psithyrus which has no caste differentiation. Together, these results indicate that the conventional Vg, a key pleiotropic gene in social insects, is the most rapidly evolving copy, potentially due to its multiple known social functions for both worker and queen castes. This study shows that concerted evolution and purifying selection shaped the evolution of the Vg gene family following their ancient gene duplication and may be the leading forces behind the evolution of new potential protein function enabling functional social pleiotropy.


| INTRODUC TI ON
Vitellogenin (Vg) is a phospholipoglycoprotein, and the precursor protein of vitellin, a protein required for egg yolk formation by most oviparous species (Spieth et al., 1991). In insects, vitellogenin is synthesized in the fat body, released into the hemolymph, and finally taken up by developing oocytes to be consumed throughout embryogenesis (Hagedorn & Kunkel, 1979;Pan et al., 1969;Raikhel & Dhadialla, 1992;Tufail & Takeda, 2008). Despite its main egg yolk function, Vg is not female-specific and can also be found in males of some species although in smaller amount (Piulachs et al., 2003;Trenczek & Engels, 1986;Tufail & Takeda, 2008). Moreover, Vg has been extensively studied for its multifunctional effects in social insect life histories. In honey bee (Apis mellifera), in addition to its central involvement in the division of labor between queens and workers (Tufail & Takeda, 2008;Weil et al., 2009), Vg is known to be involved in regulation of nonreproductive features of colonies, such as aging and queen longevity (Corona et al., 2007;Excels, 1974), temporal worker division of labor (Bloch & Grozinger, 2011;Guidugli et al., 2005;Münch & Amdam, 2010;Nelson et al., 2007), and royal jelly production (Amdam et al., 2003).
In honey bee, only one conventional Vg gene can be found and has been extensively studied for its multiple phenotypic effects on both queen and worker traits (Amdam et al., 2003). Pleiotropic genes are expected to be evolutionarily constrained since mutations that increase fitness for one trait might decrease overall fitness via antagonistic effects on other traits (Otto, 2004). Interestingly however, many previous studies have not found support for a negative relationship between the pleiotropy of a given gene and its selection pressure, measured as the ratio of nonsynonymous to synonymous substitutions (dN/dS; e.g., Razeto-Barry et al., 2011;Twyman et al., 2018;Vedanayagam & Garrigan, 2015). Vg displayed high rates of adaptive evolution, and positive selection signs of this gene were repeatedly detected in eusocial hymenopteran species (Kent et al., 2011;Morandin et al., 2014;Salmela et al., 2016).
Multiple roles for a single protein are also projected to lead to a gene duplication event and to favor the multiple acquired roles.
Besides the conventional Vg, three homologs called Vg-like proteins were recently discovered in ants (Morandin et al., 2014). These Vg homologs have arisen from an ancient gene duplication event.
Two of these homologs, Vg-like-A and Vg-like-B, can be found in all insect species studied, while Vg-like-C was so far only found in Hymenoptera (Kohlmeier et al., 2018;Morandin et al., 2014). These homologs exhibit differences in their conserved protein domains and have undergone rapid evolution after duplications (Morandin et al., 2014). Their role is currently unknown, but their structural variation suggests variable functions. In honey bee, Vg-like-A displays the closest structural and functional similarities to Vg and responded strongly to inflammatory and oxidative conditions, thus is likely associated with the aging process (Salmela et al., 2016). Vg-like-A also showed a strong temporal expression variation and may be involved in wintering worker longevity (Ricigliano et al., 2018). Furthermore, Vg-like-A is linked to the regulation of nursing behaviors in the ant Temnothorax longispinosus (Kohlmeier et al., 2018). During duplication, Vg-like-B lost several Vg structural elements, which may suggest that Vg-like-B may perform only few of the Vg original functions, such as coping with oxidative stress (Morandin et al., 2014).
Four protein domains (N-sheet, a-helical, vWFD, and polyserine linker) were found in Vg, while only the N-sheet was detected in Vg-like-C, potentially implying specialization, and could possibly be involved in neurobiological functions (Salmela et al., 2016).
Bumble bees are a group of insects belonging to the genus Bombus (Hymenoptera: Apidae). Bumble bees, honey bees, and stingless bees (Tetragonula) are phylogenetically close relatives (Peters et al., 2017). There are about 250 known bumble bee species belonging to 15 subgenera, mainly distributed in the northern hemisphere (Cameron et al., 2007;Williams et al., 2008). Bumble bees are often described as primitively eusocial because their social organization is simpler than that of the honeybee. Unlike honey bees or stingless bees, most bumble bee species have an annual cycle, with queens single-handedly founding nests (Goulson, 2010).
Bumble bees pass through several distinct phases during their annual life cycle, including solitary and eusocial phases. At the final stage of their colony cycle, termed the competition phase, the queen and workers will compete intensely over the production of males (Amsalem et al., 2015).
The fascinating life history and high levels of biological and ecological heterogeneity make bumble bees an outstanding model system for the study of molecular evolution. First of all, the biological and ecological characteristics of bumble bees can largely differ among the different subgenera (Williams et al., 2008). For instance, the tongue lengths of bumble bees are very diversified among different subgenera. Some subgenera such as Orientalibombus, Subterraneobombus, and Sibiricobombus which favor deep flowers have very long tongues, while others have relatively short tongues (Williams et al., 2008). Bumble bees are also extremely diversified in their habitats: for example, Mendacibombus and Alpinobombus species prefer alpine/arctic, while Orientalibombus generally use forest habitats. Most strikingly, there are obligate parasitic species in the subgenus Psithyrus which enslaves the species from other subgenera (Amsalem et al., 2015). To our knowledge, such extreme biological and ecological diversifications within a single genus have not been previously reported in honey bees or stingless bees. Also, honey bees and stingless bees are primarily tropical insects, with relatively stable environments, while bumble bees mainly occur in cool climates in general with more variable environments. Furthermore, bumble bees biological and ecological characteristics may also deviate among different species within a single subgenus. For example, in at least seven subgenera, the distribution elevation significantly varies among different species , leading to distinct genomic evolution rates among species (Lin et al., 2019).
Consequently, we hypothesize that Vg and Vg-like genes should be under distinct selection forces in bumble bees compared to honey bees or stingless bees.
To date, for most multi-species Vg analysis, bumble bee Vg sequences were compared with species from other genera (Du et al., 2019;Li et al., 2010;Salmela et al., 2016), probably due to the lack of sequence data from multiple bumble bee species. In this study, based on 27 bumble bee genomes, and combined with genomes of their phylogenetically close relatives (honey bees and stingless bees), we conducted the first large-scale molecular evolutionary analyses of bumble bee Vg and Vg-like genes to understand the selective patterns of Vg gene family in bumble bees.

| Discovery of Vg and Vg-like sequences
In addition to our nine newly sequenced Bombus species, previously published bumble bee genomes were added to our dataset (see Jackson et al., 2020;Kent et al., 2018;Lin et al., 2019;Sadd et al., 2015;Tian et al., 2019). As a result, a total of 27 bumble bee species belonging to 10 subgenera (2 to 5 species for each genus) were used for this study ( Figure 1; Table S1). The nucleotide sequences of Vg-like-B, XP_012245829.1; and Vg-like-C, XP_003489777.1). We F I G U R E 1 Phylogenetic relationships of the 27 bumble bee species involved in this study (red color, species whose genomes were sequenced in this study; blue color, obligate parasitic subgenus) Additionally, we aimed to retrieve Vg and Vg-like sequences from stingless bees (Tetragonula spp.) and honey bees (Apis spp.). These two genera are phylogenetically close relatives of Bombus (Peters et al., 2017) and five species of each genus (Tetragonula carbonaria, T. clypearis, T. davenporti, T. hockingsi, and T. mellipes; Apis cerana, A. dorsata, A. florea, A. laboriosa, and A. mellifera) were available with genomic resources in GenBank (Table S2). To do so, we obtained Vg and Vg-like sequences from GenBank for Apis mellifera (Vg, Vg-like-C, XP_001122505) and used them as queries to extract the other Apis species sequences using Exonerate with default settings.
In the same manner, the translated protein sequences of the four genes from B. impatiens and A. mellifera were used as queries to extract the corresponding sequences from the five Tetragonula species.
The Vg and Vg-like sequences from each species were aligned for each genus separately using ClustalW (Codons) program embedded in the software MEGA v10.1.7 (Kumar et al., 2018) with default settings and verified by visual inspection. The DNA sequence variations were calculated using DnaSP v 6.12.03 (Rozas et al., 2017).

| Molecular evolutionary analyses among different genera
We used the CODEML program in the PAML package v4.9j (Yang, 2007) to study the selection pressures affecting the different genes and to test for patterns of molecular evolution. Our aim was to estimate synonymous and nonsynonymous substitutions using the branch tests and site tests (Bielawski & Yang, 2005).
We firstly tested whether the overall selection of Vg and Vg-like genes of bumble bees deviated from honey bees and stingless bees.
The M0 (one-ratio) was used to estimate the overall selection (dN/dS, ratio of nonsynonymous / synonymous substitution rates) across all sites, and the alignment of each gene for each genus (Bombus, Apis, and Tetragonula) was separately used to calculate the dN/dS ratio of each genus. Next, we calculated the pairwise dN/dS ratios (Yang & Nielsen, 2000) across the species within each genus using the YN00 program from the PAML package. The values of pairwise dN/dS ratios of each gene were compared between Bombus and each of the other two genera using a Mann-Whitney rank test in SPSS v25.0.
The relationships between dN and dS and between dN/dS and dS were tested using a linear regression analysis in SPSS.

| Molecular evolutionary analyses within Bombus
We first tested whether differences of selection pressures exist among the ten bumble bee subgenera. The "several dN/dS ratios" branch model (BM) was used to estimate the dN/dS ratios of each of the ten subgenera independently. The ten external branches corresponding to the ten subgenera were viewed as different foregrounds, whereas all the internal branches were viewed as a common background (Figure 1). The likelihood ratio tests (LRTs) between M0 (null model) and BM models were conducted by comparing twice the difference in log-likelihood values (2ΔlnL) against a chi-square distribution (df = 2). The obtained ten dN/dS ratios of the bumble bee subgenera were compared across the four different genes using one-way ANOVA and paired samples t tests in SPSS. Moreover, we also used the branch-site model called Model A to test for positive selections in each subgenera. The null model for Model A is Model A1, which is a modify on Model A, but with ω2 = 1 fixed Zhang et al., 2005). In each run, one target subgenus was marked as the foreground branch while the remaining nine subgenera were viewed as backgrounds. Again, 2ΔlnL values between Model A and Model A1 were used to conduct LRTs for robustness with χ 2 test (df = 1).
Lastly, we studied the extent of selection for each Vg and Vglike gene set of Bombus individually by dividing the data into four data sets (one for each orthologous gene) and comparing the neutral model (M1a) with a model allowing positive selection (M2a). The 2ΔlnL values between the M1a and M2a models were used to test for robustness using LRTs with χ 2 test (df = 2), and positively selected sites were identified with the Bayes Empirical Bayes (BEB; Yang et al., 2005).
It should be mentioned that, in order to reduce false discovery rate, a Benjamini-Hochberg correction in R program ("p.adjust" command) was used where necessary (see below).
Coding sequences of the four Vg genes were successfully obtained for the 27 bumble bee species (File S1). The lengths of the aligned sequences (stop codons not considered) were 5,337, 4,569, 4,260, and 960 base pairs, respectively. Vg was the most variable in terms of nucleotide sequence identity among the four genes, with 2,389 (42.98%) variable nucleotide sites, including 39 indel codons. In contrast, Vg-like-B was the most conservative, with 11.56% variable nucleotide sites, and no indel was detected. Vg-like-A and Vg-like-C showed similar levels of genetic variation and Vg-like-C sequence included three indel codons (Table 1).
The sequences of the four genes from five stingless bee species and five honey bee species are also provided in File S1. Unlike bumble bees, stingless bees and honey bees Vg-like-C sequence was the most variable in terms of nucleotide sequence identity among the four genes, follows by Vg and Vg-like-A (Table 1). Consistent with the bumble bee sequences, Vg-like-B was the most conserved gene for both stingless and honey bees ( Table 1). The overall sequence variation information can be found in Table 1.  Table 3). The linear regression information is shown in Table 3 and Figure S2.

| Molecular evolution within Bombus
Similar as the M0 model, the branch model showed that Bombus con-

F I G U R E 3
The dN/dS ratios of Vg and Vg-like genes of bumble bees based on branch model  (Table S3).
The branch-site models as well as Benjamini-Hochberg correction (n = 10) showed that for nine of the ten  Table 6.

| D ISCUSS I ON
Vitellogenin is a multifunctional hemolymph protein that is char-  selection pressure on Vg might be slightly lower because Vg-related traits underlying adaptive evolution may differ between the genera.
Thus, our results suggest that duplication, positive evolution, and purifying selection may be major evolutionary forces driving Vg gene evolution across divergent taxa.
Vg is best known for its primary role in the formation of egg yolk in egg-laying animals (Tufail & Takeda, 2008); however, in social insects, Vg has probably acquired additional functions (Guidugli et al., 2005) and fulfills roles related to behavior and survival (Havukainen et al., 2011;Kent et al., 2011;Nelson et al., 2007;Seehuus et al., 2006). Functional pleiotropy plays an important role in molecular evolution (Paaby & Rockman, 2013 lantschouensis (Jedlicka et al., 2016;Zhen et al., 2018). This pattern is consistent with the fact that the positive selection that we detected Functional pleiotropy of a gene is also predicted to lead to a duplication event, such as the duplication of the ancestral gene leading to Vg and Vg-like genes (Morandin et al., 2014). Gene duplication is an important source of new genetic material for selection to act upon (Force et al., 1999;Lynch & Force, 2000;Ohno, 1970;Zhang, 2003). After duplication, the duplicated gene copy can acquire functions different from those of the ancestral gene (Gu et al., 2002;Khaladkar & Hannenhalli, 2012;Morandin et al., 2014;Wagner, 2000). Unlike Vg, the three Vg-like genes showed signs of purifying selection in all three genera. Multiple factors could be affecting the rate of sequence evolution such as the number of pleiotropic interactions, the gene expression levels, or their tissuespecific expression patterns, that could not be detected in this study. Further studies on evolutionary patterns of Vg-like genes across social insect species and on their functions are needed to fully understand their roles in social insects and the selection pressures they experience.
An exception to the otherwise overall positive selection on vitellogenin in our set of bumble bee species is the subgenus Psithyrus.
We found that the subgenus-based branch model analyses showed that almost all subgenera had a dN/dS ratio over 1. Incredibly however, the dN/dS ratio of the subgenus Psithyrus was much lower than the other subgenera (only 0.713). Moreover, the subgenusbased branch-site model analyses indicated that all subgenera but Psithyrus had significant deviations between Model A (alternative hypothesis) and Model A1 (null hypothesis). These results repeatedly indicate the distinctive evolution of Psithyrus Vg. Bumble bees in the subgenus Psithyrus have annual life cycles similar to those of typical bumble bee species, except that instead of founding their own nest and rearing workers, they steal a nest from "true" bumble bees (Goulson, 2010). Initially, these bumble bees were formerly described as a separate genus Psithyrus, but it is now widely accepted that they belong to a subgenus within Bombus, with the subgenus Thoracobombus as the sister group (Cameron et al., 2007;Williams et al., 2008). These bumblebees also exhibit social parasitism, with the absence of a worker caste, or the need to forage for nectar and pollen to provision developing larvae .
Although it is not clear whether the lower positive selection level of Psithyrus was a secondary evolutionary event, it is probable that their fundamentally different life history has influenced the evolution of Vg even within just one genus. In fact, due to the absence of caste differentiation in Psithyrus, it is conceivable that its simplified lifestyle reduced the functional pleiotropy demand on Vg and thus caused a lower dN/dS ratio.

ACK N OWLED G M ENTS
The research was supported by the Science and Technology

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The genome data (BioProject ID: PRJNA667279) are available from the Sequence Read Archive of the National Centre for Biotechnology Information (NCBI). The other data generated or analyzed during this study are included in this published article and its supplementary files.