A Proteomic- and Bioinformatic-Based Identification of Specific Allergens from Edible Insects: Probes for Future Detection as Food Ingredients

The increasing development of edible insect flours as alternative sources of protein, added to food and feed products to improve their nutritional value, necessitates an accurate evaluation of their possible adverse side effects, especially for individuals suffering from food allergies. Using a proteomic- and bioinformatic-based approach, the diversity of proteins occurring in currently consumed edible insects such as silkworm (Bombyx mori), cricket (Acheta domesticus), African migratory locust (Locusta migratoria), yellow mealworm (Tenebrio molitor), red palm weevil (Rhynchophorus ferrugineus), and giant mealworm beetle (Zophobas atratus) was investigated. Most of them consist of phylogenetically related protein allergens widely distributed in the different groups of arthropods (mites, insects, crustaceans) and mollusks. However, a few proteins belonging to discrete protein families, including the chemosensory protein, hexamerin, and the odorant-binding protein, emerged as proteins highly specific for edible insects. To a lesser extent, other proteins such as apolipophorin III, the larval cuticle protein, and the receptor for activated protein kinase also exhibited a rather good specificity for edible insects. These proteins, which are apparently missing or much less represented in other groups of arthropods, mollusks and nematodes, share well conserved amino acid sequences and very similar three-dimensional structures. Owing to their ability to trigger allergic responses in sensitized people, they should be used as probes for the specific detection of insect proteins as ingredients in various food products and thus to assess the safety of these products, especially for people allergic to edible insects.

for each of the modelled hexamerins from Apis mellifera, Galleria mellonella, Locusta migratoria and Tenebrio molitor. Finally, hybrid models of hexamerins were built from the different previous models. Similarly, the three-dimensional structures of odorant binding proteins (OBP) from Apis mellifera (PDB code 3S0D) [55] and Anopheles gambiae (PDB code 3R1P) [63], AtraPBP1 from Amyelois transitella (PDB code 4INW) [64], Antheraea polyphemus PBP1 (PDB code 2JPO) [65], Bombyx mori GOBP2 (PDB code 2WCJ) [56] and chemosensory protein 1 from Bombyx mori (PDB code 2JNT) [53] were used as templates for building other OBP models from Locusta migratoria and Onthophagus taurus. Finally, hybrid models were built from the different previous models. Apolipophorin III from Locusta migratoria (PDB code 1AEP) [51] was used as a template to build the three-dimensional models of apolipophorin III from Acheta domesticus, Bombyx mori, Galleria mellonella, Schistocerca gregaria, and Tenebrio molitor. Similarly, a single protein template, the crystal structure of a p53 epitope-scaffold of a cysteine protease in complex with human MDM2 protein (PDB code 5SWK) [66], was used to build the 3D models for the larval cuticle proteins (LCP) of Tenebrio molitor, Bombyx mori, Locusta migratoria, Musca domestica, and Tribolium castaneum (red flour beetle). PROCHECK [67], ANOLEA (Atomic NOn-Local Environment Assessment) [68], and the calculated QMEAN scores [69,70] were used to assess the geometric and thermodynamic qualities of the homology-built three-dimensional models. Using ANOLEA, only a few residues, essentially located in loops connecting the α-helices or β-sheets in the models, were found to exhibit an energy value over the fixed threshold. Similarly, the calculated QMEAN scores for all the models gave values below 0.5.
The superposition of insect proteins was performed, and molecular cartoons were drawn, with Chimera [71]. Potential cleavage sites for pepsin (at pH 1.3) and for trypsin and chymotrypsin (at pH 8.5) were detected with the PeptideCutter web server (https://web.expasy.org/peptide_cutter/) [72], from the amino acid sequences of chemosensory proteins, odorant binding proteins, and apolipophorins III. Finally, the cleavage sites were mapped onto the molecular surface of the proteins using Chimera.
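The kind of cleavage-site prediction described above can be illustrated with a minimal Python sketch. It does not reproduce the PeptideCutter algorithm: the rules below are deliberately simplified assumptions (trypsin after K or R, chymotrypsin after F, Y or W, pepsin after F or L, and no cleavage before P), and the names `SIMPLE_RULES`, `cleavage_sites` and the example peptide are hypothetical.

```python
# Simplified, assumed cleavage rules: residues after which each protease cuts.
# The real PeptideCutter rules are more detailed and depend on neighbouring residues.
SIMPLE_RULES = {
    "trypsin": "KR",
    "chymotrypsin": "FYW",
    "pepsin": "FL",
}


def cleavage_sites(sequence, enzyme):
    """Return 1-based positions after which `enzyme` is predicted to cleave.

    A site is reported when the residue belongs to the enzyme's rule set
    and the following residue is not proline. The C-terminal residue is
    never reported because there is no peptide bond after it.
    """
    residues = SIMPLE_RULES[enzyme]
    sequence = sequence.upper()
    return [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in residues and sequence[i + 1] != "P"
    ]


if __name__ == "__main__":
    peptide = "AKRPGKFLW"  # hypothetical peptide, for illustration only
    for enzyme in SIMPLE_RULES:
        sites = cleavage_sites(peptide, enzyme)
        print(enzyme, len(sites), sites)
```

Comparing the number of predicted sites per protease for a given sequence gives a rough idea of the pepsin versus trypsin/chymotrypsin contrast; whether a site is actually exposed still has to be checked on the three-dimensional model.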

Results
The nano-LC-MS/MS approach performed on insect protein extracts allowed the identification of a variable number of proteins, depending on the edible insect analyzed: 314 distinct proteins for Bombyx mori, 73 proteins for Locusta migratoria, 62 proteins for Zophobas morio, and only 46 proteins for Acheta domesticus and 42 proteins for Rhynchophorus ferrugineus. In a previous study, a similar approach allowed the detection of 106 distinct proteins in a protein extract from Tenebrio molitor.
As an example, Table 1 shows the complete list of proteins identified in the silkworm (Bombyx mori) pupa protein extract.
Table 1. List of proteins identified in the silkworm (Bombyx mori) pupae protein extract. Proteins are ranked by decreasing scores. Uncharacterized proteins and fragments from identical proteins were discarded from the list.
Most of the potential allergens frequently distributed in edible insects correspond in fact to IgE-binding cross-reactive allergens that occur in other groups of arthropods (acari, crustaceans), mollusks, and nematodes. However, a few potential allergens, including apolipophorin III, the chemosensory protein, the cockroach allergen-like protein, hexamerin, the larval cuticle protein, the odorant binding protein and the receptor for activated protein kinase, appear to be most specifically distributed in insects (Table 2).
Chemosensory proteins, odorant binding proteins and hexamerins emerge as three groups of proteins essentially distributed in insects (≥98%), together with apolipophorins III, larval cuticle proteins and receptors for activated protein kinase, which are preferentially distributed in insects (80-85%) (Table 3). The cockroach allergen-like protein holds a unique place since it apparently occurs only in the yellow mealworm (Tenebrio molitor).
Chemosensory proteins (CSP) are small globular proteins of 110-120 amino acids, built from 6-7 α-helices connected by short loops. They usually contain four cysteine residues forming two adjacent disulfide bridges, which contribute to the tight packing of the protein (Figure 1A). Chemosensory proteins from different insect species exhibit rather conserved amino acid sequences, especially at the N-terminal end of the polypeptide chain, whereas their C-terminal end appears less conserved (Figure 2). However, in spite of these amino acid sequence discrepancies, CSP from different insects display very similar structural organizations that are readily superposable (Figure 3).

Odorant binding proteins, also known as pheromone-binding proteins, are small globular proteins with a structure very similar to that of CSP, which have been recognized as potential IgE-binding proteins in yellow mealworm extracts. They typically consist of small polypeptide chains of about 120-130 amino acid residues (13-14 kDa) built up from 6 α-helices tightly packed by 3 conserved disulfide bridges (Figure 1B).
Despite a very conserved three-dimensional structure, they differ in their amino acid sequences, which show a low degree of both identity and similarity (Figure 4). Nevertheless, their three-dimensional core structures are readily superposable (Figure 5).
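The degrees of identity and similarity quoted for the multiple alignments can be made concrete with a small Python sketch that scores two already-aligned sequences. This is only an illustration under stated assumptions: the residue grouping in `SIMILAR_GROUPS` is one arbitrary conservative-substitution scheme, not necessarily the one used for the figures, and the function names and toy sequences are hypothetical.

```python
# Assumed conservative-substitution groups (an illustrative choice only).
SIMILAR_GROUPS = ("AVLIM", "FYW", "KRH", "DE", "STNQ")


def _aligned_pairs(aligned_a, aligned_b, gap="-"):
    """Return the residue pairs from alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]


def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free columns holding the same residue."""
    pairs = _aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def percent_similarity(aligned_a, aligned_b):
    """Percentage of gap-free columns holding identical or same-group residues."""
    pairs = _aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    similar = sum(
        a == b or any(a in g and b in g for g in SIMILAR_GROUPS)
        for a, b in pairs
    )
    return 100.0 * similar / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not taken from the alignments in the figures.
    print(percent_identity("ACD-F", "ACE-F"))
    print(percent_similarity("ACD-F", "ACE-F"))
```

Columns containing a gap are ignored here; other conventions (for example, dividing by the full alignment length) give lower values, so the convention should be stated whenever such percentages are compared.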
Hexamerin is an insect storage protein synthesized in the fat body, resulting from the non-covalent oligomerization of protomers exhibiting a hemocyanin-like domain (Figure 1E). Hexamerin amino acid sequences exhibit a high degree of identity and similarity (Figure 6), in accordance with their structurally conserved character. In this respect, hexamerins from different edible insects exhibit nicely superposed three-dimensional core structures, with the exception of a few exposed loops, the conformation of which differs from one structure to another (Figure 7).
As previously reported, other less specific potential allergens such as apolipophorin III and larval cuticle protein exhibit rather well conserved amino acid sequences and share a highly conserved three-dimensional structure. As an example, the multiple alignment of amino acid sequences of apolipophorin III from different edible insects shows a moderate degree of identity and similarity (Figure 8), even though their three-dimensional structures are nicely superposed (Figure 9).
Figure 9. Superposition of the ribbon diagrams of apolipophorin III from Acheta domesticus (blue), Bombyx mori (green), Locusta migratoria (red), Galleria mellonella (purple), Schistocerca gregaria (yellow), and Tenebrio molitor (orange).
In addition, it is noteworthy that different isoforms often exist for the specific allergens identified in Bombyx mori, Locusta migratoria and Tenebrio molitor, as shown by the corresponding genome assemblies available for the three edible insects (Table 4). The number of identified isoforms varies considerably from one potential allergen to another. In this respect, proteins involved in the recognition of environmental factors, such as chemosensory proteins and odorant binding proteins, exhibit the highest diversity. This extreme diversity of both chemosensory proteins and odorant binding proteins in insects has been known for a long time.
Moreover, a bioinformatic identification of the potential cleavage sites for pepsin, trypsin and chymotrypsin on the molecular surface of chemosensory proteins, odorant binding proteins, and apolipophorin III suggests the occurrence of a great number of exposed cleavage sites for trypsin and chymotrypsin, compared to the reduced number of cleavage sites accessible to pepsin (Figure 10). Accordingly, these insect protein allergens are suspected to exhibit an enhanced resistance to proteolytic attack by pepsin and other aspartic proteases, whereas they should be further degraded at alkaline pH in the presence of trypsin and trypsin-like proteases.

Discussion
Using an appropriate combination of SDS-PAGE and nano-LC-MS/MS proteomic analyses of protein extracts from various insect species, including the cricket Acheta domesticus and the locust Locusta migratoria (Orthoptera), the silkworm Bombyx mori (Lepidoptera), and the beetles Rhynchophorus ferrugineus, Tenebrio molitor and Zophobas morio (Coleoptera), we have revealed the great variety of proteins occurring in insect extracts. Depending on the insect species, the number of identified proteins may vary considerably, from up to 314 distinct proteins for the Bombyx mori extract to only 42 proteins for the Rhynchophorus ferrugineus extract. These discrepancies observed in (2) the very limited availability of genome sequencing data for edible insects and, (3) possible variations in the protein content and isoform diversity among different insect species. In this respect, the most complete protein data were obtained for the well-known insects Bombyx mori (161 distinct proteins identified), Locusta migratoria (73 distinct proteins identified) and Tenebrio molitor (106 distinct proteins identified), for which genome sequencing data are available.
Insect allergens responsible for either contact allergies or food allergies have been reviewed in detail by de Gier & Verhoeckx [31]. According to this review, most of the potential edible insect allergens are pan-allergens widely distributed in other arthropods (acari, spiders and crustaceans), mollusks and nematodes. As such, they lack the specificity needed to serve as probes for detecting insect flour in food products. Fortunately, a few insect allergens have been characterized as rather specific for (edible) insects, since they primarily occur in insects and are much less abundant or lacking in other organisms phylogenetically related to insects. These specific insect allergens essentially correspond to proteins dedicated to the recognition of environmental chemical signals, such as the chemosensory proteins and the odorant- or pheromone-binding proteins [73], Bla g 3 from the German cockroach (Blattella germanica) [74], and Per a 3 from the American cockroach (Periplaneta americana) [75], hexamerin from the edible cricket Gryllus bimaculatus [76], hexamerin from the maggot fly Calliphora erythrocephala [77], hexamerin from the fruit fly Drosophila melanogaster [78], and hexamerin from the yellow mealworm Tenebrio molitor [79]. Except for CSP and OBP, which are usually poorly glycosylated, the other specific allergens of insects contain N-glycosylation sites and apparently consist of glycosylated proteins. Because the N-glycosylation pathways and the linkages of sugar units in insect oligosaccharides differ from those of human glycans [80,81], specific allergens from insects should act as non-self cross-reactive carbohydrate determinants (CCD), responsible for some non-specific immunological reactivity.
In spite of sharing poorly conserved amino acid sequences, all these allergens exhibit extremely well conserved three-dimensional structures. Accordingly, all these proteins are sufficiently closely related to display an IgE-binding cross-reactivity allowing their use as specific probes for insects. Additionally, most of the identified insect-specific allergens are built from a tightly packed structural fold, strengthened by disulfide bonds, which should enhance their resistance to the heat denaturation likely to occur during the processing of insect flour-containing food products. Together with their predicted resistance to acidic proteases likely to occur in foods and food products, all these features favor their use as particularly stable specific probes for the detection of insect flour in food products. In this respect, both processing and in vitro digestion of three mealworm species, Tenebrio molitor, Zophobas atratus and Alphitobius diaperinus, and of the cricket Gryllus bimaculatus, were reported to readily influence their allergenic cross-reactivity [82][83][84].
With the exception of the receptor for activated protein kinase, these proteins have been previously identified as IgE-binding allergens, and could therefore be used as specific immuno-probes for the detection of insects or insect flours added to food products, as a complement to other detection methods such as the genomic detection of insect proteins using polymerase chain reactions (PCR) [45] or DNA barcoding authentication [85]. Recently, functional bioassays using tropomyosin as an immuno-probe have been proposed to assess the tropomyosin allergenicity of novel animal foods [86]. However, even though its allergenic character has been demonstrated [87], the cockroach allergen-like protein, which is apparently restricted to the yellow mealworm, could not serve as a relevant immuno-probe for the detection of insect proteins.
Moreover, besides the safety aspect associated with the consumption of edible insects and insect products, other important aspects of the use of edible insects as food and feed, and as a protein ingredient added to improve the nutritional balance of various food products, should be considered [88,89]. In particular, the ecological and legislative aspects deserve to be discussed. Facing the increasing demand for animal proteins, edible insects and insect proteins were seen early on as a sustainable source of proteins whose production was likely to have a smaller negative impact on the environment than other sources of animal proteins, e.g., conventional livestock [90,91]. Compared to the conventional sources of animal proteins, the production of insect proteins at an industrial scale requires less energy, releases less greenhouse gas and uses less surface area. However, while the industrial-scale production and processing of microbially and parasitically safe insect proteins is apparently assured, their production at a reasonable price remains a challenge in comparison to meat or plant proteins [92,93]. In addition, outside of countries where entomophagy is traditional, the reluctance to consume edible insects often observed in other countries, especially in European countries and in the USA, could be an obstacle to the development of the industrial farming of edible insects and of insect-containing food products [94][95][96][97].
As part of the safety assessment of novel foods, two scientific opinions have recently been published by EFSA (European Food Safety Authority) on the "risk profile of insects as food and feed" [98], and on the "safety of dried yellow mealworm (Tenebrio molitor larva) as a novel food pursuant to Regulation (EU) 2015/2283" [99]. Both opinions, intended to help European politicians put in place a regulation for the authorization of edible insects on the European market, mention the allergenic risk associated with the consumption of edible insects by allergic people. They point out the risk of reactions to insect-specific allergens, to allergens cross-reacting with other arthropods, or to contaminant allergens from insect feed, such as gluten.

Conclusions
Potential IgE-binding allergens identified in edible insects correspond essentially to pan-allergens displaying some IgE-binding cross-reactivity with homologous proteins present in other arthropods (acari, crustaceans), mollusks and nematodes. Owing to this lack of specificity, they are not suitable as probes for the specific detection of insect flours added as ingredients to different food products. However, a few other proteins occurring in insect protein extracts emerge as specific allergens essentially distributed in insects, whilst they are much less abundant or even lacking in other phylogenetically related organisms such as acari, crustaceans, mollusks and nematodes.
These specific insect allergens include chemosensory proteins (CSP), odorant- or pheromone-binding proteins (OBP), and hexamerin, the main storage protein of insect fat bodies. Three other proteins, apolipophorin III, the larval cuticle protein, and the receptor for activated protein kinase, could be used as specific probes since they are preferentially distributed in insects and, to a much lesser extent, in crustaceans and nematodes. The cockroach allergen-like protein stands apart, because it only occurs in the yellow mealworm (Tenebrio molitor).
With the exception of the receptor for activated protein kinase, these proteins have been previously identified as IgE-binding allergens, and could therefore be used as specific immuno-probes for the detection of insects or insect flours added to food products, as a complement to other detection methods such as the genomic detection of insect proteins using polymerase chain reactions (PCR) or DNA barcoding authentication.