First complete genome sequence and molecular characterization of Canine morbillivirus isolated in Central Brazil

Brazil remains highly endemic for Canine morbillivirus [canine distemper virus (CDV)]. However, little is known about the genetic variability of the strains circulating in several Brazilian regions. Here, we report the first full-length genome and molecular characterization of CDV isolated from domestic dogs in the Brazilian Center-West region. Sequence alignment and phylogenetic analyses based on deduced amino acid and nucleotide sequences showed that the isolated strain belongs to the South America-I/Europe genotype but segregates into a CDV subgenotype branch. Interestingly, both the H and F proteins have gained potential N-glycosylation sites compared with the Onderstepoort vaccine strain. This study therefore provides a reference for further understanding the epidemic and molecular characteristics of CDV in Brazil.

www.nature.com/scientificreports/
determining CDV genetic lineages 13,14. In addition, these classifications are related to the geographic origin where the lineages have been detected. Although Brazil is considered endemic for CD, with high disease incidence rates 15, few studies in the country have addressed virus isolation and molecular characterization of the circulating wild-type strains. To date, no studies have examined the full-length genome to characterize Brazilian CDV field strains. Therefore, to elucidate the genetic basis of the protein diversity of CDV, we conducted amino acid and nucleotide sequence analysis of a recent field isolate, with a focus on the H and F genes, which are the most suitable targets for investigating CDV variability and evolution 5,13,14.

Methods
Ethics statement. All
Reverse transcription-polymerase chain reaction (RT-PCR). Ocular/nasal specimens from dogs were collected with flocked swabs and placed into 1 mL of universal transport medium (UTM; Copan, Brescia, Italy). The samples in UTM were split into aliquots, which were used for viral RNA detection or cryopreserved for virus isolation. RNA was extracted using the QIAamp Viral RNA kit (QIAGEN, Hilden, Germany) according to the manufacturer's specifications. Briefly, complementary DNA (cDNA) was synthesized from a denaturation mix consisting of 1.0 µL (10 pmol/µL) random hexamers (Promega, Inc), 0.5 µL nuclease-free water (Thermo Scientific, Inc), and 8.5 µL total RNA; this mix was denatured at 70 °C for 5 min and immediately placed on ice.
In addition, the antibiotic zeocin (R25001, Gibco) was added (1% final concentration) for stable maintenance of canine signaling lymphocytic activation molecule (SLAM) tag expression; SLAM is one of the cellular receptors for CDV. An aliquot of the cryopreserved, CDV PCR-positive swab samples was thawed, filtered through a 0.22 µm filter, and used as the inoculum for virus isolation in VDS cells. Confluent VDS cells in 6-well plates were washed twice with phosphate-buffered saline (1X) and inoculated with 400 µL of sample and 200 µL of DMEM. After 1-2 h of incubation with gentle shaking, additional DMEM supplemented with 2% FBS and 1% antibiotic was added to each well. Inoculated cells were incubated at 37 °C with 5% CO2. The plates were examined daily under an inverted microscope for the development of a cytopathic effect (CPE). If a CPE developed, the supernatant was collected and tested by RT-PCR to confirm the presence of CDV. All positive cell cultures were subjected to molecular typing by whole genome sequencing.
If no CPE was observed by 7 days post-inoculation, the supernatant was inoculated onto fresh VDS cells for a second passage. If both CPE and RT-PCR remained negative after three passages, the virus isolation result was considered negative.
Primer design, RT-PCR and whole genome sequencing. CDV-specific primers (Supplementary Table S1) were designed based on the reference sequences for the target species, which were selected from an analysis of complete sequences accessed through the Virus Pathogen Database and Analysis Resource (ViPR) 18 and nucleotide sequences available in GenBank 19 .
The primers were designed using Geneious Primer (2020.2) 20 to cover the complete genome sequence of all CDV strains. They were designed to have a melting temperature (Tm) between 52 and 68 °C and to avoid hairpin loops and primer dimers. In addition, the NCBI BLAST tool was used to confirm primer specificity for CDV. Supplementary Table S1 lists the primers designed and used for full-length cDNA amplification and sequencing. Viral RNA, extracted from cell lysate plus supernatant, was quantified with a Qubit 4 Fluorometer. Briefly, the RT reaction was carried out in a volume of 20 µl: 8.5 µl of viral RNA plus 1.5 µl of random primer (0.75 µg) were heated for 5 min at 70 °C and chilled on ice (5 min). Ten microliters of the previous reaction was mixed with 4 µl of 5 × buffer, 1.
PCR was conducted in a reaction volume of 50 µl containing 5 µl 10 × PCR buffer (Invitrogen), 2 µl MgSO4 (50 mM), 4 µl dNTP mix (2.5 mM), 1 µl forward primer (10 µM), 1 µl reverse primer (10 µM), 0.4 µl Taq DNA polymerase High Fidelity (2 U) (Invitrogen), 28.6 µl nuclease-free water, and 8 µl of template DNA. Initial denaturation at 94 °C for 2 min was followed by 50 cycles of 94 °C for 15 s, 58 °C for 30 s, and 68 °C for 75 s. The genome was amplified with Taq DNA polymerase High Fidelity in 25 overlapping fragments (Fig. 1). The PCR products were then purified with the QIAquick PCR purification kit according to the manufacturer's protocol. Purified amplicons were sequenced bidirectionally using an ABI 3500 genetic analyzer (Applied Biosystems).
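The PCR recipe above can be checked with simple arithmetic. The following is a minimal sketch, not part of the authors' protocol: it confirms that the listed volumes add up to the stated 50 µl, derives final concentrations with C1V1 = C2V2, and scales the shared components for the 25 fragments. The function names and the choice to add primers and template per tube are assumptions made for illustration.

```python
# Per-reaction volumes (µl) exactly as listed in the PCR description above.
PCR_MIX_UL = {
    "10x PCR buffer": 5.0,
    "MgSO4 (50 mM)": 2.0,
    "dNTP mix (2.5 mM)": 4.0,
    "forward primer (10 µM)": 1.0,
    "reverse primer (10 µM)": 1.0,
    "Taq DNA polymerase High Fidelity": 0.4,
    "nuclease-free water": 28.6,
    "template DNA": 8.0,
}

# Assumption: primers and template differ per fragment, so they are
# pipetted into each tube rather than into a shared master mix.
PER_TUBE = ("forward primer (10 µM)", "reverse primer (10 µM)", "template DNA")


def total_volume(mix):
    """Sum of all component volumes in µl, rounded to avoid float noise."""
    return round(sum(mix.values()), 2)


def final_concentration(stock_conc, added_volume, reaction_volume):
    """C2 = C1 * V1 / V2, in the same unit as the stock concentration."""
    return stock_conc * added_volume / reaction_volume


def master_mix(mix, n_reactions, per_tube=PER_TUBE):
    """Scale the shared components for n_reactions (no pipetting overage)."""
    return {
        name: round(volume * n_reactions, 2)
        for name, volume in mix.items()
        if name not in per_tube
    }


if __name__ == "__main__":
    print("total per reaction:", total_volume(PCR_MIX_UL), "µl")
    print("final MgSO4 (mM):", final_concentration(50.0, 2.0, 50.0))
    for name, volume in master_mix(PCR_MIX_UL, 25).items():
        print(f"{name}: {volume} µl")
```

In practice a small overage is usually added to a master mix; it is left out here so that the numbers map directly onto the volumes given in the text.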

Phylogenetic analysis and molecular characterization of CDV. Phylogenetic analysis was performed using the nucleotide and amino acid sequences of the isolate together with the sequences of 24 reference strains for which full genome sequences were available in GenBank and ViPR. Sequences were edited and aligned using the Multiple Sequence Comparison by Log-Expectation (MUSCLE) program in the Geneious software package. A phylogenetic tree based on the open reading frame (ORF) sequences of CDV was constructed using the neighbor-joining method in the Geneious software package. Bootstrap analysis was carried out on 10,000 replicate data sets.
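As an illustration of the two inputs that a neighbor-joining analysis with bootstrap support relies on, the sketch below computes a pairwise p-distance matrix from an alignment and draws one bootstrap replicate by resampling alignment columns with replacement. It is not the Geneious implementation, the distance model used by the authors is not stated in this section, and the toy sequences and function names are hypothetical.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for every ordered pair of sequence names."""
    names = list(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m in names
        for n in names
    }


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap data set)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment; real input would be the MUSCLE alignment.
toy = {"isolate": "ATGCTAGC", "ref_A": "ATGCTAGT", "ref_B": "ATGTTCGT"}

if __name__ == "__main__":
    print(distance_matrix(toy)[("isolate", "ref_B")])
    replicate = bootstrap_replicate(toy, random.Random(1))
    print(distance_matrix(replicate)[("isolate", "ref_B")])
```

A tree would be rebuilt from each replicate's distance matrix, and the bootstrap value of a node is the percentage of replicates in which that grouping reappears.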
Selection pressure on the F and H proteins was evaluated using four methods on the Datamonkey web server 22: Single Likelihood Ancestor Counting (SLAC), Fixed Effects Likelihood (FEL), Mixed Effects Model of Evolution (MEME), and Fast Unconstrained Bayesian AppRoximation (FUBAR). A p value below 0.05 for MEME and FEL and a posterior probability above 0.9 for FUBAR were considered suggestive of positive selection. Genetic Algorithm for Recombination Detection (GARD) analysis in Datamonkey was performed to detect recombination breakpoints in the H gene alignment of the wild-type CDV isolated in this study.
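The decision rule stated above can be written down explicitly. The sketch below applies the cutoffs from the text (p < 0.05 for FEL and MEME, posterior probability > 0.9 for FUBAR) to per-codon results and lists which methods support each site. SLAC is left out because no cutoff is given for it in this section. The input layout and all numeric values are hypothetical and do not come from the Datamonkey output of this study.

```python
P_VALUE_CUTOFF = 0.05    # FEL and MEME, as stated in the text
POSTERIOR_CUTOFF = 0.9   # FUBAR, as stated in the text


def is_suggestive(method, value):
    """Return True if a site's statistic passes the cutoff for its method."""
    if method in ("FEL", "MEME"):
        return value < P_VALUE_CUTOFF
    if method == "FUBAR":
        return value > POSTERIOR_CUTOFF
    raise ValueError(f"no cutoff stated for method {method!r}")


def supporting_methods(site_results):
    """Map each codon to the sorted list of methods that support it."""
    return {
        codon: sorted(m for m, v in per_method.items() if is_suggestive(m, v))
        for codon, per_method in site_results.items()
    }


# Hypothetical values for illustration only.
example = {
    10: {"FEL": 0.03, "MEME": 0.04, "FUBAR": 0.95},
    20: {"FEL": 0.20, "MEME": 0.01, "FUBAR": 0.50},
    30: {"FEL": 0.60, "MEME": 0.70, "FUBAR": 0.10},
}

calls = supporting_methods(example)
# Sites supported by at least two methods.
consensus = sorted(c for c, methods in calls.items() if len(methods) >= 2)

if __name__ == "__main__":
    print(calls)
    print(consensus)
```

Requiring agreement between at least two methods is one common way to shortlist sites; the threshold of two used for `consensus` is an assumption of this sketch.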

Results
Detection of the N gene in clinical specimens. Biological samples were collected in 2019 from 30 dogs with clinical suspicion of CD. CDV was preliminarily identified by nested RT-PCR targeting a conserved region of the N gene. Eighteen samples were positive, showing a specific band at 287 bp in an agarose gel. VDS cells were then inoculated with the samples, and cytopathic changes such as cell detachment and syncytium formation were observed. Nested RT-PCR was repeated on the harvested VDS cells to confirm isolation of the virus, designated JA88/2020. In the third passage of JA88/2020, a confluent CPE (~80%) was observed at 48 h post-infection. The concentration of viral RNA obtained from cell lysates plus supernatant was 74.8 ng/µl. Figure 2 summarizes these results.

CDV subgenotype and amino acid analysis of the H and F proteins.
Applying the criteria of at least 95% amino acid identity to define a genotype and 98% to define a CDV subgenotype to the H protein, we identified a subgenotype within the SA1/EU lineage (subgenotype 1A; 3.95% amino acid variation). For the F protein, we arbitrarily extrapolated this classification to the Fsp gene fragment sequences and likewise found a subgenotype within the SA1/EU lineage (subgenotype 1A; 4.4% amino acid variation).
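The identity criteria above can be illustrated with a short calculation. This is a minimal sketch under one reading of the rule, which is an assumption: sequences sharing at least 95% amino acid identity fall in the same genotype, and within a genotype, identity below 98% indicates a distinct subgenotype. The function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical residues between two aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def classify(identity, genotype_cutoff=95.0, subgenotype_cutoff=98.0):
    """Apply the 95% / 98% amino acid identity cutoffs quoted in the text."""
    if identity < genotype_cutoff:
        return "different genotype"
    if identity < subgenotype_cutoff:
        return "same genotype, distinct subgenotype"
    return "same genotype and subgenotype"


if __name__ == "__main__":
    # Variation values reported in the text: 3.95% (H) and 4.4% (Fsp).
    for label, variation in (("H", 3.95), ("Fsp", 4.4)):
        print(label, classify(100.0 - variation))
```

Under this reading, both reported variation values correspond to identities between 95% and 98%, which is consistent with the assignment to a subgenotype within SA1/EU.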
The deduced full-length H and F amino acid sequences were aligned with the Onderstepoort vaccine strain and with wild-type strains from other parts of the world (Figs. 5 and 6).

H and F protein N-glycosylation sites analysis.
Seven N-glycosylation sites are predicted in the H protein, at amino acid residues 19, 149, 391, 422, 456, 587, and 603. Our analysis also showed that these sites are conserved in the H protein across the SA1/EU, America 1/2, Africa, Asia-1/2, and Arctic lineages. Surprisingly, a potential N-glycosylation site at position 309 is lost in JA88/2020 compared with the other SA1/EU lineage strain. In the F protein, there are six N-glycosylation sites, at amino acid residues 62, 108, 141, 173, 179, and 517, which are common to other lineages. In addition, the N-glycosylation site at amino acid residue 108 is lost in the SA1/EU reference strain (Uy251).
Table S2). Codons 71, 208, 612, and 644 were detected by FEL as potential positively selected sites. Codons 21, 87, and 101 were detected by FUBAR and MEME. Codons 61 and 302 were detected by FEL as potential negatively selected sites. One hundred codon sites predicted to be under negative selection were detected by FUBAR (Supplementary Table S2). Finally, recombination events for codons 549-662 were found for CDV by GARD analysis in Datamonkey.
Selection pressure analysis of the H protein revealed 14 positively selected codon sites. Of these 14 sites, three were detected by several methods (Supplementary Table S2). Codons 172, 218, 227, 291, 309, 401, and 530 were detected by FEL as potential positively selected sites. Codon 530 was detected by FEL, FUBAR, and MEME. Seventy-five codon sites predicted to be under negative selection were detected by FUBAR (Supplementary Table S3).
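The N-glycosylation comparison reported in this section rests on locating potential sites in each protein sequence. The prediction tool used is not named here, so the sketch below simply scans for the canonical N-X-S/T sequon (X being any residue except proline) and reports sites gained or lost relative to a reference. The toy peptides and function names are hypothetical, and position-by-position comparison assumes the two sequences are aligned without indels.

```python
import re

# Canonical N-linked glycosylation sequon: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein):
    """1-based positions of the asparagine in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


def compare_sites(query, reference):
    """Potential sites gained, lost, and shared by query versus reference."""
    q = set(sequon_positions(query))
    r = set(sequon_positions(reference))
    return {
        "gained": sorted(q - r),
        "lost": sorted(r - q),
        "shared": sorted(q & r),
    }


# Hypothetical toy peptides, not CDV sequences.
reference = "MANPSGNGTLKAQS"   # N3-P4-S5 is not a sequon because X is proline
query = "MANASGNGTLKAQS"       # the P4A change creates a sequon at N3

if __name__ == "__main__":
    print(sequon_positions(reference))
    print(compare_sites(query, reference))
```

A sequon marks only a potential site; whether it is glycosylated in the mature protein depends on context that this pattern match does not capture.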

Discussion
In this work, we amplified the complete coding and intergenic regions of the JA88/2020 SA1/EU strain obtained from a dog's nasal swab. The sample came from a 5-year-old male that showed myoclonus, paralysis, and vocalization and ultimately died. Importantly, degenerate primer sets were generated to sequence this isolate. Furthermore, we contribute to the identification of sequence variability; this information is also valuable for selecting appropriate primers and avoiding false-negative PCR results. Putative natural recombination events in the CDV F gene have been reported by our group, and the results of this study agree with those previous findings 23. Recombination events have also been reported in the H gene 10,24-26. Mutation and recombination generate significant genetic variability in RNA viruses and may lead to the emergence of new viral lineages. It is therefore necessary to monitor these events to understand the genetic evolution of CDV.
Figure 3. A phylogenetic tree based on amino acid sequences of the detected CDV and reference strains. Bootstrap values (> 50%) from 10,000 replicates are shown at each node of the tree. The scale bar below the tree represents a genetic distance of 0.04 amino acid substitutions per site. The CDV isolate identified in this study is indicated in red and with an asterisk. Sequences are labelled using GenBank accession numbers. SA1/EU: South America 1/Europe; AM1: America 1; AM2: America 2; AS1: Asia 1; AS2: Asia 2; PDV: Phocid distemper virus (included as an outgroup).
One of the main benefits of monitoring mutations in infectious agents is the ability to determine whether nonsynonymous mutations are contributing to the prevalence of a more contagious and/or pathogenic strain. Molecular epidemiological surveillance of the two glycoproteins on the CDV surface, H and F, is particularly relevant. To evaluate whether key residues are involved in CDV virulence, Zipperle et al. 27 identified residues in the H protein (Y525, D526, and R529) that control SLAM-binding activity. SLAM and nectin-4 are CDV host cell receptors, which are expressed on activated T and B lymphocytes; epithelial, glial, and dendritic cells; and macrophages 5. In the isolate described in this study, these key H protein residues are conserved. However, it is important to search for new key residues in other proteins, such as F, to understand the factors involved in CDV virulence.
Better molecular characterization of the CDV epidemic in Brazil is needed because CDV infection in dogs is frequent and often fatal. Few studies have analyzed the molecular epidemiology of Brazilian CDV lineages or carried out molecular analysis of their full-length genes 28,29. Some researchers have performed complete sequence analysis of the full-length F and H genes 14,30. These reports have demonstrated the predominance of one genotype in Brazil: SA1/EU. However, two co-circulating lineages have already been detected, including the South America-II lineage. Moreover, results similar to ours have been described regarding CDV subgenotypes found in biological samples from dogs in Brazil and elsewhere 30,31. CDV genotypes possibly differ by geographic distribution rather than by host species. In this context, for a country as large as Brazil, concomitant circulation of different CDV genotypes is possible. Given this possibility, extensive molecular epidemiological surveys are required to determine the circulating (sub)genotypes 32.
The unique molecular signatures of the F and H genes were identified by visual inspection of amino acid positions (F: S71G, R105Q, K208R, L386I, V612I, D644E; H: K161N, L172S, N218T, V227L, T291S, E332K, V363M, R401E). A similar polymorphism analysis was carried out by Fischer 28, who observed unique amino acid signatures of the distemper epidemic in local dog populations from Rio Grande do Sul state, Brazil. Similarly, another study showed unique amino acid patterns for viral isolates from Argentina 11. Here, we note that JA88 can be classified into a subgenotype (1A) based on the genetic diversity of the F and H genes, and the unique molecular signatures contributed to this classification. Further research will be important to ascertain whether these findings are stable in local dog populations and/or whether they are associated with a change in some characteristic of the virus, such as increased or reduced virulence in these viral strains.
The N-glycosylation sites in the F and H proteins are essential for their correct folding, transport, and cell surface expression. Monitoring N-glycosylation sites in viral proteins is therefore important: previous studies have hypothesized that reduced N-glycosylation contributes to attenuating CDV pathogenesis and that an increase in N-glycosylation may eventually result in vaccine failure 33,34. Here, we observed an extra putative N-glycosylation site in the F protein compared with the Uruguay sequence (Uy251). Strikingly, four additional N-glycosylation sites in the F and H proteins were found compared with the vaccine strain, which points to their potential importance for modulating virulence.
In conclusion, this is the first study to isolate CDV from Brazil and identify its full genome. The isolated strain is characterized as the SA1/EU genotype but segregates into a CDV subgenotype branch. Sequence analysis of more CDV field isolates from different Brazilian geographic regions is needed to investigate differences between (sub)genotypes. In addition, immunological investigation might be required to determine and monitor the biological relevance of circulating CDV (sub)genotypes and their importance for future drug and vaccine development.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.