Giardia duodenalis in Wildlife: Exploring Genotype Diversity in Italy and across Europe

Fragmented data are so far available on genotype diversity of G. duodenalis in wildlife in different countries in Europe, in particular, in Italy. In the present study, G. duodenalis sequences obtained from different Italian wild animals [12 porcupines (Hystrix cristata), 4 wild boars (Sus scrofa), 1 wolf (Canis lupus italicus), 6 Alpine chamois (Rupicapra rupicapra rupicapra)] were compared with those available from wild host species in Europe to add new data on the geographic distribution of Giardia assemblages/sub-assemblages and their transmission patterns among natural hosts. Thirty-eight sequences were obtained by MLG analysis (SSU-rRNA, bg, gdh, and tpi genes) and subsequently compared by phylogenetic and network analyses with those from wild species monitored in the last decades in Europe. The results revealed the presence of potentially zoonotic (A-AI, A-AII from wild boar; B from porcupine) and host-adapted (D from wolf; E, A-AIII from chamois) assemblages and sub-assemblages and represent the first report for Italian wild boar. The analysis did not find any evidence of spatial or host segregation for specific genetic variants, mostly shared between different hosts from different European countries. However, conflicting evidence was found in genotypic assignment, advocating for data improvement and new genomic approaches.


Introduction
Parasites in wildlife are a significant component of biodiversity, and their life cycles depend on the ecological networks in which they live [1][2][3], representing fundamental elements of healthy ecosystems [4,5]. However, many parasites can be a risk for threatened species because of their impacts on host populations, occasionally causing severe population declines and thus affecting conservation efforts [6,7]. In the last decades, considerable advances have been achieved in the knowledge of parasites in wildlife and on their complex interactions with the hosts, but comprehensive pictures of taxa distribution and prevalence within specific epidemiological contexts are frequently missing.
Giardia Kunstler, 1882 is one of the most common intestinal protozoa, transmitted through fecal deposition of cysts by an infected host. Transmission of this parasite via contaminated water and/or food to other hosts, such as domestic animals or humans, may

Genotyping of the Italian Samples at the SSU-rRNA Locus
Sequence analysis of the SSU-rRNA gene fragment allowed to assign all the 23 isolates to their respective G. duodenalis assemblages (see Table 1). As expected for this locus, no differences among the sequences here analyzed were observed within each assemblage. Seven sequences [four wild boar (SS1-SS4) and three Alpine chamois (RR7, RR8, RR12)] were identified as assemblage A, showing 100% identity (174/174; 100% query coverage) to Giardia isolates from a variety of hosts and localities [e.g., cat in Denmark (GenBank accession number: MN263894), human in Colombia (GenBank accession number: MG924430), a dog from Malaysia (GenBank accession number: KJ027399)]. Assemblage B was represented by 12 sequences identified in all isolates from porcupines (HC1-HC12), showing 100% identity (175/175; 100% query coverage) with many sequences worldwide (e.g., GenBank accession number: MG924431 from a human in Colombia and GenBank accession number: LC341260 from a cat in Japan).
Comparison of the nucleotide sequence from the wolf (CL1) revealed 100% of identity (175/175; 100% query coverage) to Giardia assemblage D from a dog in Australia (AF199443), whereas the sequences from two Alpine chamois (RR6, RR9) showed to be identical (174/174; 100% query coverage) with assemblage E, from a goat in Australia (GenBank accession number: AF199448), from a dairy cattle in China (GenBank accession number: MN593002), and from a lamb in Ethiopia (GenBank accession number: KT922264). Finally, a mixed A+E infection was detected, based on overlapping peaks at the diagnostic positions, in one Alpine chamois (RR11). Table 1. Materials analyzed in this study, with details on the host species, classification, and country of isolation as reported in GenBank or in references. Sequences retrieved from GenBank (Dataset I-III) were used for phylogenetic and network analyses. The symbol " indicates that information is as above in the same column.

Gdh, Bg and Tpi Phylogenetic Analysis
Fifteen new sequences from 10 samples were successfully obtained for gdh, tpi, and bg and compared for phylogenetic purposes with previously published data from different wild hosts in Europe ( Figure 1).
Comparison of the nucleotide sequence from the wolf (CL1) revealed 100% of identity (175/175; 100% query coverage) to Giardia assemblage D from a dog in Australia (AF199443), whereas the sequences from two Alpine chamois (RR6, RR9) showed to be identical (174/174; 100% query coverage) with assemblage E, from a goat in Australia (GenBank accession number: AF199448), from a dairy cattle in China (GenBank accession number: MN593002), and from a lamb in Ethiopia (GenBank accession number: KT922264). Finally, a mixed A+E infection was detected, based on overlapping peaks at the diagnostic positions, in one Alpine chamois (RR11).

Gdh, Bg and Tpi Phylogenetic Analysis
Fifteen new sequences from 10 samples were successfully obtained for gdh, tpi, and bg and compared for phylogenetic purposes with previously published data from different wild hosts in Europe ( Figure 1). In the phylogenetic trees achieved using the three genetic markers (see Figure 2a-c), most of the isolates here analyzed belonged to the assemblage A and B, as detailed below.
Molecular analysis of partial gdh sequences ( Figure 2a; Table 1) revealed well-defined clades describing the different genetic assemblages/sub-assemblages (see Figure 2a). The Clade I, characterized by two sub-clades (green: sub-assemblage AI; red: sub-assemblage AII) included 18 sequences; in particular, our isolate from wild boar (SS1), together with other isolates from ruminants (Italian Alpine chamois, two roe deer from Poland, and one fallow deer from Sweden), was identified as assemblage A-AI. Clade II (blue clade) was well supported (>80 of bootstrap value) and grouped our Alpine chamois (RR8) with 15 isolates from different hosts and countries including Italy, all belonging to assemblage A-AIII. The remaining clades defined the other assemblages B, D, E, and G (all strongly supported, >90 of bootstrap values).

Haplotype Variability
Thirty-nine different haplotypes were identified at the gdh, bg, and tpi loci; in particular, the bg locus showed the higher haplotype diversity (Hd). Haplotype variability results are displayed in Tables S1-S4 and visualized as networks representation (Figure 3ac).
As for the gdh locus (Dataset I), nine haplotypes (hp1-hp9) were detected. In particular, haplotype 6 (hp6), recognized within assemblage A, had the highest frequency, followed by haplotype 4 (hp4), just one mutation step from hp6. Hp6 was detected in Italy and Croatia from fallow deer and wild boar, respectively, and corresponds to sub-assemblage AIII.
The Median-Joining Network (Figure 3a) confirmed all the haplotypes retrieved from present study were also shared with others European countries such as Poland, Sweden, and the Netherlands. Indeed, the Alpine chamois isolate from the present study (RR8) clustered together with two red deer isolates from Poland, one Apennine chamois isolate from Italy, and one roe deer isolate from the Netherlands in hp4; while our wild boar isolate SS1 clustered together with two roe deer isolates from Poland, one Apennine chamois isolate from Italy, and one fallow deer isolate from Sweden in haplotype 5 (hp5).
The analysis of Dataset II (bg) revealed the presence of 16 different haplotypes (hp1-hp16). Haplotype 10 (hp10) showed the highest frequency, followed by haplotype 6 (hp6). They belong respectively to assemblage A and B. Hp10 was detected in Poland, Sweden, Norway, and Spain in roe deer, fallow deer, fox, reindeer, and moose. Hp6 was detected When analyzing the bg gene, the wild boar isolate (SS3) was assigned to the hostspecific assemblage D (max value of bootstrap), clustering with the corresponding bg gene sequence obtained from the Nyctereutes procyonoides isolate NP1 (GenBank accession number: HQ538708) and from Canis lupus (wolf) isolate 22 (GenBank accession number: KF736103), both detected in Poland (differing by one single-nucleotide polymorphism due to overlapping peaks). The taxonomic assignment of the other isolates was more difficult, due to the low bootstrap values: the two isolates from wild boar (SS1 and SS4) and the isolate from Alpine chamois (RR8) grouped together, being similar but not identical to sequences described as assemblage A-AII (GenBank accession number: AY072723, AY072724), presenting two single-nucleotide polymorphisms (SNPs), in contrast to the results obtained by gdh and tpi phylogenetic analysis (Figure 2b). Finally, we were able to assign all isolates from porcupines (HC1, HC2, HC4, HC8, and HC11) to the assemblage B (black clade, bootstrap value: 79) together with isolates from different hosts and countries; however, due to the low resolution capability of this marker to discriminate between the sub-assemblages BIII and BIV, it was not possible to identified these isolates at the sub-assemblage level.
Concerning tpi analysis, we were able to assign our isolates to well-defined clusters (bootstrap values 70-100, see in detail Figure 2c): in particular, the wild boar SS1 grouped within the assemblage A-AI, as observed in gdh phylogenetic analysis, and one porcupine (HC6) was assigned to the assemblage A-AII together with red deer and fallow deer from Croatia and Sweden. The other two Giardia porcupine isolates (HC1 and HC2) grouped within the assemblage B-BIV with red foxes and a rat from Sweden and the Canary Island, respectively.

Haplotype Variability
Thirty-nine different haplotypes were identified at the gdh, bg, and tpi loci; in particular, the bg locus showed the higher haplotype diversity (Hd). Haplotype variability results are displayed in Tables S1-S4 and visualized as networks representation (Figure 3a-c).
Finally, the analysis of Database III (tpi) showed 14 different haplotypes (hp1-hp14). Haplotype 13 (hp13), belonging to assemblage G, revealed the highest frequency together with haplotype 1 (hp1), from assemblage A. Hp13 was detected in the Canary Islands from rodents (rats and mice); instead, hp1 was recognized in red deer and fallow deer from Poland and Italy, respectively.
The Median-Joining Network (Figure 3c) indicated four well distinct clusters: one including assemblage B, one including assemblages A and E, and two different clusters describing assemblage G, indicating a high internal diversity.
Isolates from the present study clustered individually as different haplotypes, i.e. the wild boar isolate (SS1) in haplotype 3 (hp3) and the isolates from crested porcupines (HC1, HC2, HC6) in haplotype 4 (hp4), haplotype 5 (hp5), and haplotype 12 (hp12), respectively.  Haplotypes are represented by circles proportional to relative haplotypes abundance; different colors indicate different areas of origin (see Figure 1 for more details). Numbers in brackets refer to the mutational steps between haplotypes. Black circles represent hypothetical missing haplotypes predicted by the model. Hosts from the present study are represented as black silhouettes. Correspondence between different haplotypes and isolates is indicated in Tables S2-S4.

Discussion
There is now strong evidence that parasites produce a significant impact on wildlife population dynamics, acting as evolutionary drivers of traits such as host genetic and Haplotypes are represented by circles proportional to relative haplotypes abundance; different colors indicate different areas of origin (see Figure 1 for more details). Numbers in brackets refer to the mutational steps between haplotypes. Black circles represent hypothetical missing haplotypes predicted by the model. Hosts from the present study are represented as black silhouettes. Correspondence between different haplotypes and isolates is indicated in Tables S2-S4. As for the gdh locus (Dataset I), nine haplotypes (hp1-hp9) were detected. In particular, haplotype 6 (hp6), recognized within assemblage A, had the highest frequency, followed by haplotype 4 (hp4), just one mutation step from hp6. Hp6 was detected in Italy and Croatia from fallow deer and wild boar, respectively, and corresponds to sub-assemblage AIII.
The Median-Joining Network (Figure 3a) confirmed all the haplotypes retrieved from present study were also shared with others European countries such as Poland, Sweden, and the Netherlands. Indeed, the Alpine chamois isolate from the present study (RR8) clustered together with two red deer isolates from Poland, one Apennine chamois isolate from Italy, and one roe deer isolate from the Netherlands in hp4; while our wild boar isolate SS1 clustered together with two roe deer isolates from Poland, one Apennine chamois isolate from Italy, and one fallow deer isolate from Sweden in haplotype 5 (hp5).
The analysis of Dataset II (bg) revealed the presence of 16 different haplotypes (hp1-hp16). Haplotype 10 (hp10) showed the highest frequency, followed by haplotype 6 (hp6). They belong respectively to assemblage A and B. Hp10 was detected in Poland, Sweden, Norway, and Spain in roe deer, fallow deer, fox, reindeer, and moose. Hp6 was detected in Luxembourg, Poland, and Germany in different hosts such as wild cat, wild boar, roe deer, red deer, and raccoon.
The Median-Joining Network (Figure 3b) showed haplotype 5 (hp5), hp6, and hp10 were abundantly shared between different European countries, without evidence of spatial segregation; in addition, haplotypes retrieved from assemblage B had the highest variability.
Finally, the analysis of Database III (tpi) showed 14 different haplotypes (hp1-hp14). Haplotype 13 (hp13), belonging to assemblage G, revealed the highest frequency together with haplotype 1 (hp1), from assemblage A. Hp13 was detected in the Canary Islands from rodents (rats and mice); instead, hp1 was recognized in red deer and fallow deer from Poland and Italy, respectively.
The Median-Joining Network (Figure 3c) indicated four well distinct clusters: one including assemblage B, one including assemblages A and E, and two different clusters describing assemblage G, indicating a high internal diversity.

Discussion
There is now strong evidence that parasites produce a significant impact on wildlife population dynamics, acting as evolutionary drivers of traits such as host genetic and phenotypic diversity. However, much of the research is focused on determining the role of wildlife populations as reservoir of parasitic diseases for domestic animals and humans. G. duodenalis is a ubiquitous enteric protozoan able to infect a broad range of vertebrates including humans, with different degrees of host specificity depending on the genetic identity of the isolates [9]. In the present study, G. duodenalis sequences from different wild animals in Italy were identified and compared with those available from other wild host species in Europe (last update, December 2021).
In Italy, few data are so far available on G. duodenalis genotypes diversity in wildlife. The genotyping results obtained in the present study include the first report in Italian wild boar and, along with those previously described by Coppola et al. [21], represent the only data on G. duodenalis genotypes in Italian wild rodents.
Sequencing at the SSU-rRNA locus indicated that assemblage A was the only detected in wild boar. However, conflicting evidence was found in assemblage/sub-assemblage assignment. As observed by the phylogenetic analysis, the wild boar isolate SS1 was assigned to sub-assemblage AI at both gdh and tpi loci but to sub-assemblage AII at the bg locus, while the isolates SS3 was assigned to the canid-specific assemblages D. Ambiguous results were observed also for isolates from other hosts. Among Alpine chamois, the assemblages A and E, commonly associated with wild hoofed mammals, were found at the SSU-rRNA locus, and the sub-assemblage AIII at the gdh locus was identified in one isolate (RR8). However, this result was not confirmed at the bg locus, which identified the same isolate as sub-assemblage AII, mostly found in humans. The sub-assemblage AIII has been almost exclusively observed in wild ruminants, especially in species within the Family Cervidae [9]. It was described for the first time in fallow deer from northern Italy [18,19] and in Apennine chamois from central Italy [20]. For porcupines, even though sequencing at the SSU-rRNA locus assigned all isolates to assemblage B, the detection of assemblage A-AII was obtained for one isolate at the tpi locus, while bg analysis failed to discriminate other samples between the BIII and BIV sub-assemblages. Lack of concordance is frequently observed in Giardia genotyping, mainly due to the presence of genetic differences at a genetic locus (ASH, allelic sequence heterozygosity) within isolates and the frequent occurrence of infections with mixed assemblages [9]. As for carnivores, assemblage D, identified in the present study in wolf from the Apennines of North-Central Italy together with the isolates from wolf from Apennines of South-Central Italy (sequence homology 99% with assemblage C) [33], confirmed the circulation of the canid-specific assemblages C and D in this wild species in Italy.
Several studies reported data on assemblage/sub-assemblage identity of G. duodenalis in wild animals in Europe. Assemblage G is predominant in small rodents, mainly rats, as reported from the Canary Island (Spain) and Sweden [22,25], while in Germany, few isolates from mice and voles were found positive for the potentially zoonotic assemblages A and B [17]. In wild hoofed animals from Norway, Poland, Croatia, Romania, Sweden, and The Netherlands, both assemblages A and B were found to a large extent in cervids (e.g., red deer, roe deer, fallow deer, moose, and reindeer) as well as in wild boar [13][14][15]23,26], while the "classical" bovine specific assemblage E was rarely detected in these hosts [24,25]. Canid-specific assemblages C and D have been identified in wolves and raccoon dogs in Croatia, Romania, and Poland [14,15,24,29]. However, among carnivores, also assemblages A and B have been seen in wolves, red foxes, and jackal in Norway, Sweden, and Croatia [14,16,27] and in a free-living European wildcat from Luxembourg [30].
The phylogenetic analysis performed in the present study allowed better depicting the taxonomic position of the Italian isolates, compared with those identified in European wildlife. The genetic variants reported in Italy were found in several countries and, although some clustering relating to the host was observed, all isolates are shared among wildlife of different geographical origins. Differences in local environmental and ecological conditions (e.g., sharing of pastures with domestic animal, possible contacts with humans, access to urban environments) might have a role in the distribution pattern of G. duodenalis assemblages in wildlife.
These findings are consistent with results obtained by haplotype analysis. Different haplotypes were shared between diverse wild hosts from several European countries. This variability is mostly represented by assemblage B, confirming its complex genetic diversity [34]. The occurrence of the most represented haplotypes for assemblages A and B, without any evidence of host or spatial segregation, should be considered for a hypothetical analysis of zoonotic transmission risk.
Our analysis of genotypes diversity of Giardia from different hosts and countries allowed taking a step forward for a better knowledge of the geographical distribution and spatial patterns of G. duodenalis genotypes in Italian and European wildlife. No novel potentially host-specific genotypes were found in the wildlife surveyed. The results suggest that Giardia populations in wild animals are well established and mixed throughout different countries, as previously indicated [35,36]. As an example, to date, the unique Giardia sequence available from porcupine worldwide (isolate 10707 from New Zealand, GenBank accession number: KY124019) was identified as assemblage B, along with our isolates from Italy.
However, genetic variation still appears underestimated due to data biases (e.g., low sampling, neglected sequence submission to genetic databases, different amplification targets, low phylogenetic consensus of the current genotyping genes) [9,37]. New genomic approaches are advocated to accurately genotype G. duodenalis at the assemblage/subassemblage level and to assess its transmission routes reliably.

Sampling
In the period 2014-2019, 23 fecal samples tested positive to the presence of Giardia spp. cysts by a direct immunofluorescence assay (Merifluor ® Meridian Diagnostic, Cincinnati, OH, USA); they were obtained from 12 crested porcupines (Hystrix cristata) collected in the province of Pisa, 4 wild boars (Sus scrofa) and 1 wolf (Canis lupus italicus) originating from the province of Pistoia (Tuscany, Central, Italy), and 6 Alpine chamois (Rupicapra rupicapra rupicapra) harvested in the Italian Central Alps. All samples were subjected to molecular and phylogenetic analyses for assemblage/sub-assemblage identification.

Assemblage and Sub-Assemblage Identification
Genomic DNA was isolated using the QIAmp DNA stool mini kit (QIAGEN, Valencia, CA, USA) following the manufacturer's instructions. All samples were amplified by twostep nested PCR protocols using four sets of primers targeting different loci, according to previous descriptions: (i) RH4 and RH11 for primary PCR and GiarR and GiarF for secondary PCR [38] designed to amplify a 130 bp fragment of the small subunit 18S (SSU rRNA) gene; (ii) external forward primer GDHeF, internal forward primer GDHiF, and reverse primer GDHiR designed to yield a fragment of 432 bp of the glutamate dehydrogenase (gdh) gene [39]; (iii) forward primer G7, forward inner primer G376, and reverse primer G759 to amplify a 384 bp fragment of the β-giardin (bg) locus [40]; (iv) primers AL3543 and AL3546 for primary PCR and AL3544 and AL3545 for secondary PCR which amplified a fragment of 530 bp of the triose phosphate isomerase (tpi) gene [41].
Amplicons were purified using a mi-PCR Purification Kit (Metabion International AG) and sent to an external laboratory for sequencing (Bio-Fab Research, Rome, Italy). The resulting chromatograms were manually checked using Finch TV 1.4 software (Geospiza, Inc., Seattle, WA, USA), in order to identify possible double peaks for mixed infections or single-nucleotide polymorphisms (SNPs). After adjoining of the forward and reverse reads, the sequences included approximately 60% of positions readable on both strands. The sequences were pairwise aligned using Clustal Omega software [42] to recheck variable positions. Consensus sequences for each amplified region were compared to those previously published in GenBank database: particularly, 18S SSU rRNA identities at assemblage level were verified using the Basic Local Alignment Search Tool (BLAST). Subsequently, sequences were trimmed to the shortest length with high quality in all our samples and complete representation in the downloaded ones.
GiardiaDB (https://giardiadb.org/), a central resource for public access to computational platforms, analysis tools, and data mining of genome-scale research data, was used to build three datasets (I, II, and III), one for each genetic marker (gdh, bg, and tpi) analyzed for phylogenetic purposes. In detail, we used a multistep strategy (section: GiardiaDB's My Strategies) matching the keywords "Giardia sp./Giardia" and "Giardia duodenalis" together with "Europe", excluding all the sequences isolated from humans (Homo sapiens). This produced a unique database with more than 100 sequences that was trimmed down by removing all sequences not isolated from wild animals and/or characterized in different genetic loci (ITS, SSU rRNA). In addition, a final GenBank "complete record" check was performed (last update December 2021) for each sequence, paying attention to "Features-isolation and host".  [43]. The assemblage/sub-assemblage A reference sequences were selected from Cai et al. [44].
After alignment, each dataset was trimmed at particular nucleotide positions of the reference sequences, to maximize the overlap between ours and Giardia sequences from European wild hosts (Table 1) The best-fit model of evolution for each alignment was determined using the Akaike Information Criteria (AIC) selected from Modeltest v3.7 [45], included in the "phangorn package" in Rstudio v1.4.1106. Phylogenetic analyses of Datasets I, II, and III were conducted using the Maximum Likelihood (ML) statistical method and 1000 bootstrap pseudo-replications using Rstudio v1.4.110 ("ape" and "phangorn" packages) under the TrN+G+I substitution model (Dataset I), TrN+I (Dataset II), and TIM1+I (Dataset III). For the details of the datasets, see Table 1. Phylogenetic Trees were visualized trough Itol: Interactive Tree of life v6 (https://itol.embl.de/). The sequences generated in this study were deposited in GenBank under accession numbers OL840340-OL840345 for the SSU rDNA gene, OL828750-OL828751 for the gdh locus, OL944442-OL944447 for the bg gene, and OL944434-OL944437 for the tpi locus.

Haplotype Analysis and Networks
To gain insights into the geographical distribution of assemblages/sub-assemblages in the European context of wildlife, a haplotype analysis was conducted on polymorphic sites using DnaSP v.6 software [46] and Tajima's D test [47]. PoPART (Population Analysis with Reticulate Trees) genetic software [48] was used to perform the Median-Joining network calculation [49]. The analysis was performed on the three databases (see Section 4.2 of Materials and Methods and Table 1) with sequences trimmed to the shortest length with high-quality fragments and sites, without considering alignment gaps; in detail, gdh (Dataset I), consisting of 26 sequences of 154 bp, bg (Dataset II), of 69 sequences of 155 bp, and tpi (Dataset III), of 38 sequences of 433 bp, including also ambiguous sequences (double peaks presence). In case of ambiguous sequences (unphased data), haplotype reconstruction was performed in DnaSP v.6 by the algorithms provided by PHASE [50,51] which uses a coalescent-based Bayesian method to infer haplotypes.