Genetic Characterization and Variation of African Swine Fever Virus China/GD/2019 Strain in Domestic Pigs

African swine fever (ASF) was first introduced into Northern China in 2018 and has since spread throughout China. Here, we extracted viral DNA from blood samples collected on a farm experiencing an ASF outbreak in Guangdong province, China, and sequenced the whole genome. We assembled the full-length genomic sequence of this strain, named China/GD/2019. The whole genome was 188,642 bp long (terminal inverted repeats and loops were not sequenced) and encoded 175 open reading frames (ORFs). The China/GD/2019 strain belonged to p72 genotype II and p54 genotype IIa. Phylogenetic analysis based on single nucleotide polymorphisms (SNPs) also placed it in genotype II. Compared with the China/ASFV/SY-18 strain, several ORFs, mainly belonging to multigene families (MGFs), were absent from the China/GD/2019 strain. Compared with the China/2018/AnhuiXCGQ strain, the China/GD/2019 genome carried a deletion of approximately 1 kb located at the EP153R and EP402R genes. Compared with three known Chinese strains, China/GD/2019 carried a synonymous mutation in gene F317L and a non-synonymous mutation in gene MGF_360-6L. Pairwise comparison revealed 165 SNP sites in MGF_360-1L between Estonia 2014 and the China/GD/2019 strain. Relative to China/GD/2019, China/Pig/HLJ/2018 and China/DB/LN/2018 carried a single-base deletion in gene D1133L, which causes a frameshift that alters the encoded protein. Our findings indicate that China/GD/2019 is a new variant with certain deletions and mutations. This study deepens our understanding of the genomic diversity and genetic variation of ASFV.


Introduction
African swine fever (ASF) is a highly pathogenic infectious disease caused by African swine fever virus (ASFV) [1,2]. Because the ASFV genome is complex and encodes many genes with different functions [3], developing vaccines and drugs against ASFV infection is difficult [4][5][6]. Since the disease was first introduced into China in 2018, it has spread rapidly and threatens to sweep across the whole country [7][8][9][10].
With its large, linear double-stranded DNA genome, ASFV is the only member of the genus Asfivirus within the family Asfarviridae [3,11]. The viral genome ranges in size from 170 to 193 kb and contains 150-167 open reading frames (ORFs), about one third of which have unknown functions. It consists of a conserved central region flanked by variable regions at both ends, which contain five multigene families (MGFs) [3,12]. Most of the variation among ASFV genomes arises from different numbers of MGF genes in the left or right variable regions (LVR and RVR) [13][14][15]. MGFs are characteristic of the virus; five families have been recognized (MGF 100, 110, 300, 360 and 505/530) [16,17], and the functions of many of their members are still unknown. Having spread for many years, the virus has already accumulated many variations and divergences in its genome [18]. Fortunately, in recent years, complete genome sequences of strains of different origins have become more readily available, along with comparative analyses [19,20].
Pathogens 2022, 11, 97 2 of 11
To our knowledge, no study has characterized the complete genome of the strains responsible for ASF outbreaks in Guangdong province in Southern China. Using next-generation sequencing, we assembled the complete genome sequence of the ASFV China/GD/2019 strain. Phylogenetic analysis based on the full-length p72 and p54 genes placed the China/GD/2019 strain in genotype II. SNP-based phylogenetic analysis of strains of different origins and genotypes also grouped China/GD/2019 with genotype II strains and showed that its encoding genes are highly similar to those of Estonia 2014. We conducted a detailed genomic comparison of the China/GD/2019 strain with related p72 genotype II isolates, covering both encoding genes and SNPs. Comparative genomic analysis of China/GD/2019 against China/2018/AnhuiXCGQ revealed a deletion of approximately 1 kb in the China/GD/2019 genome, located at the EP153R and EP402R genes. SNP/InDel analysis found a large number of mutations between Estonia 2014 and China/GD/2019. A synonymous mutation in gene F317L and a non-synonymous mutation in MGF_360-6L were detected in the China/GD/2019 strain, and a single-base deletion in gene D1133L was found in the China/Pig/HLJ/2018 and China/DB/LN/2018 strains. Comparison of core and pan genes showed that, relative to the China/ASFV/SY-18 isolate, 14 MGF members were absent from the China/GD/2019 strain, especially in the MGF 360 and 110 families. Other genes of unknown function were also missing compared with the China/2018/SY-18 isolate. These results indicate that China/GD/2019 is a new ASFV variant with certain deletions and mutations. Characterizing ASFV genomes in this way supports source tracing as well as the prevention and control of ASFV.

Complete Genome Sequence of ASFV China/GD/2019 Strain
The complete genome sequence of the ASFV China/GD/2019 strain was 188,642 bp long, excluding the terminal inverted repeats and cross-links. The final assembly of the China/GD/2019 genome was obtained by reference-based alignment of 1.925 Gb of mapped reads, with an average depth of 100×. The genome of this strain is smaller than those of other Chinese ASFV isolates. Gene functions were predicted using three databases: Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Swiss-Prot. We identified 175 ORFs, whose encoded products are involved in virus assembly, enzymatic activity, extracellular region parts, and viral reproduction.

Phylogenetic Analysis of Full Length p72 (B646L) and p54 (E183L) Genes
To determine the genetic relationship between the China/GD/2019 strain and previously identified ASFV isolates of p72 genotypes I-XX listed in Table 1 (those for which p72 sequences were available), we constructed a p72 gene phylogenetic tree (Figure 1).
A phylogenetic tree based on the full-length p72 (B646L) sequence alignment indicated that the new ASFV strain belongs to genotype II and is similar to the Chinese isolates reported in recent years, consistent with the view that the ASFVs circulating in China belong to genotype II. Nucleotide sequence comparisons with the Basic Local Alignment Search Tool (BLAST) revealed that the p72 sequence of China/GD/2019 was 100% identical to those of other Chinese isolates. Furthermore, a p54 neighbor-joining (NJ) tree was constructed from the full-length p54 sequences (Figure 2) of the strains in Table 1 for which p54 sequences were available. The p54 tree likewise placed the p72 genotype II strains in p54 genotype II, consistent with the p72 evolutionary relationships. The China/2018/AnhuiXCGQ, China/Pig/HLJ/2018 and China/DB/LN/2018 strains belong to p54 genotype IIc, whereas China/GD/2019 did not cluster with the four previously reported Chinese genotype II variants and was closely related to Georgia 2007/1 (genotype IIa). These data suggest that ASFV strains in Southern China may be diverse and that the strains circulating in China in 2019 are related to those from China in 2018 and from Eastern Europe.

Core Genes and Specific Genes Analysis
To compare the coding genes of different strains directly, we combined previously reported ASFV isolates with the China/GD/2019 strain and performed a core and pan gene analysis. The number of ASFV pan genes may grow with each added genome, which helps to reveal evolutionary relationships. Conversely, the number of ASFV core genes may decline as genomes are added; most core genes are critical to ASFV survival.
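The expected behavior of the two curves can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of gene-family labels (for example, after protein clustering); the strain names and gene assignments below are hypothetical and are not data from this study. Adding a genome can only shrink the core set (an intersection) and only grow the pan set (a union).

```python
from typing import Dict, List, Set, Tuple


def accumulation_curve(order: List[str], genes: Dict[str, Set[str]]) -> List[Tuple[int, int, int]]:
    """Return (number of genomes, core size, pan size) after adding each genome in `order`."""
    curve: List[Tuple[int, int, int]] = []
    core: Set[str] = set()
    pan: Set[str] = set()
    for i, strain in enumerate(order, start=1):
        g = genes[strain]
        core = set(g) if i == 1 else core & g  # core can only shrink
        pan |= g                               # pan can only grow
        curve.append((i, len(core), len(pan)))
    return curve


# Hypothetical gene-family content for three made-up strains (illustration only).
genes = {
    "strainA": {"B646L", "E183L", "MGF_360-1L", "MGF_110-1L"},
    "strainB": {"B646L", "E183L", "MGF_360-1L", "EP402R"},
    "strainC": {"B646L", "E183L", "EP153R"},
}

for n, core_size, pan_size in accumulation_curve(["strainA", "strainB", "strainC"], genes):
    print(n, core_size, pan_size)
```

Plotting these pairs against the number of genomes gives the familiar rising pan-gene and falling core-gene curves; the order in which genomes are added changes the shape of the curve but not its end points.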
Across the analyzed strains, we obtained 116 core genes and 249 pan genes. The identities of the 116 core genes between the 28 ASFV strains and China/GD/2019 are listed in Supplementary Table S1.
We compared the ORF identities of the 28 ASFV strains with China/GD/2019, which revealed the coding proteins missing from China/GD/2019. The deleted genes in the China/GD/2019 strain were located mainly in MGF 360 and MGF 110, with a few in MGF 300, MGF 100 and MGF 505.
Because China/GD/2019 belongs to genotype II and is similar to the strains from China and Eastern Europe reported in recent years, we compared it with the Russia/Kashino_04/13 strain. We found 133 identical ORFs and 23 ORFs sharing 90.8-99.8% sequence identity (Supplementary Table S1). The changed ORFs included IAP- … (Supplementary Table S1). We also compared the China/GD/2019 genome with the China/2018/AnhuiXCGQ strain by collinearity analysis and found a deletion of approximately 1 kb in China/GD/2019, located at the EP153R and EP402R genes (Figure S1).
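Percent-identity values such as the 90.8-99.8% above depend on how gaps are counted. The following is a minimal sketch, assuming the two ORF sequences have already been aligned (for example, by a pairwise aligner) and counting gap columns as differences; this is one common convention and not necessarily the one behind Supplementary Table S1.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' marks a gap).

    Identity = identical non-gap columns / all alignment columns, so gap
    columns count as differences (an assumed convention for illustration).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aln_a.upper(), aln_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(aln_a)


# Toy alignment: one mismatch (G/A) and one gap column out of eight columns.
print(percent_identity("ATGGCT-A", "ATGACTTA"))
```

Other conventions (for example, excluding gap columns from the denominator) give higher values for the same alignment, so identities from different tools are only comparable when the convention is known.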

Phylogenetic Analysis of the SNP
SNPs are the most common form of genomic variation. Because ASFV has a large genome that is prone to mutation, SNPs may occur either within genes or in noncoding sequences outside genes [3]. Using genome-wide SNPs, we explored the relationships between China/GD/2019 and the other previously identified ASFV strains listed in Table 1. As shown in Figure 3, most of these strains belong to genotype II, and the 29 ASFV strains were grouped into five main branches. At the top of the tree, China/GD/2019 was most closely related to three other Chinese strains (MK128995, MK333180, and MK333181) but relatively distant from the first Chinese isolate (MK766894). Georgia 2007/1, Belgium_2018/1 and Estonia 2014 also clustered with China/GD/2019 with high support. Strains from Poland (Pol16_20186_o7, Pol16_29413_o23, and Pol17_05838_C220) and Russia/Odintsovo_02/14 showed SNP distributions similar to that of China/GD/2019. These data suggest that strains belonging to the same genotype have similar SNPs and that strains with highly similar encoding genes also have similar SNP distributions.

SNP/InDel Analysis of China/GD/2019 Strain
Building on the phylogenetic analyses above, we focused on the variation and similarity between this strain and other p72 genotype II isolates to learn more about the variation and transmission characteristics of ASFV in China. Based on these phylogenetic relationships, we selected eight strains that were relatively similar to China/GD/2019. The SNP statistics revealed a large number of SNPs between China/GD/2019 and Estonia 2014 (Supplementary Table S2). About 165 SNPs were located at MGF_360-1La and MGF_360-1Lb, including 1 start codon non-synonymous mutation, 2 premature stop codons, 63 synonymous, 102 non-synonymous, and 4 intergenic mutations; however, the effects of these variations cannot yet be determined. Comparing China/GD/2019 with the other seven strains, only a few mutations were found (Table 2). The SNP distribution of these seven ASFV strains relative to China/GD/2019 is listed in Supplementary Table S3. Compared with the three Chinese strains, China/GD/2019 carried a synonymous mutation in gene F317L and a non-synonymous mutation in gene MGF_360-6L (Table 3). Further non-synonymous mutation sites, located in genes K145R, E199L, MGF 505, MGF 360 and E184L, were found in comparison with the other four strains from Poland and Belgium. A base substitution changes at most one codon, altering a single amino acid or creating or removing a stop codon, whereas an insertion or deletion whose length is not a multiple of three shifts the reading frame and can alter the entire downstream protein. Therefore, we used LASTZ software to detect small InDels (<50 bp) by comparing China/GD/2019 with the seven other related ASFV genotype II strains. The InDel analysis showed a single-base deletion in gene D1133L in China/Pig/HLJ/2018 and China/DB/LN/2018, causing a frameshift mutation that changes the encoded amino acid sequence and protein structure.
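The SNP categories used above (start codon, premature stop, synonymous, non-synonymous) and the frameshift effect of an InDel follow directly from the genetic code. The Python sketch below classifies a single-nucleotide variant within a coding sequence and labels an InDel as in-frame or frameshift. It assumes the CDS is given on its coding strand, uses the standard genetic code (NCBI translation table 1), and uses a made-up toy sequence; it is an illustration, not the annotation tool used in this study.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Effect of a substitution at 0-based position `pos` of a coding-strand CDS."""
    cds = cds.upper()
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise ValueError("position outside complete codons")
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if start == 0 and ref_aa == "M" and alt_aa != "M":
        return "start_lost"          # change in the initiation codon
    if alt_aa == "*" and ref_aa != "*":
        return "premature_stop"
    if ref_aa == "*" and alt_aa != "*":
        return "stop_lost"
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


def classify_indel(ref_allele: str, alt_allele: str) -> str:
    """In-frame if the length change is a multiple of 3, otherwise a frameshift."""
    delta = abs(len(ref_allele) - len(alt_allele))
    if delta == 0:
        raise ValueError("not an insertion or deletion")
    return "in_frame" if delta % 3 == 0 else "frameshift"


cds = "ATGGCTTGGAAATAA"  # toy CDS: ATG GCT TGG AAA TAA -> M A W K *
for pos, alt in [(5, "C"), (3, "A"), (8, "A"), (0, "C")]:
    print(pos, alt, classify_snv(cds, pos, alt))
print(classify_indel("AT", "A"))  # single-base deletion
```

A single-base deletion such as the one reported in D1133L falls in the frameshift class because its length change (1) is not divisible by three.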

Discussion
Since the first ASF outbreak in China in 2018, the pig breeding industry, especially its basic production capacity, has been severely affected. The present study investigated the molecular characteristics of an ASFV strain that emerged in 2019 in Guangdong province, Southern China. Genetic analysis showed that the ASF outbreaks in Southern China were caused by genotype II ASFV, which was highly similar to the other Chinese strains and to related Eastern European (Russian and Polish) genotype II strains. These results support the possibility that the China/GD/2019 isolate derives from an introduction of ASFV strains circulating in Eastern Europe [21,22].
For the available complete ASFV genomes, the phylogenetic trees based on full-length p72 (B646L) and p54 (E183L) were similar to the tree based on SNPs. However, further phylogenetic analysis is necessary to confirm this relationship [23].
A key finding of our study is that p72-based genotyping results are consistent with those of other phylogenetic methods. The p72 genotype II viruses are separated into genotypes IIa and IIc in the p54 phylogenetic tree, suggesting that they are closely related; the p54 tree also shows that they do not form a monophyletic lineage. However, although the genome sequence of the ASFV strain from Guangdong province showed high similarity to those of recently isolated ASFV strains from China and Eastern Europe, the specific source of this strain remains unclear, probably because of the limited sequence information obtained in this study.
Comparing the genome sequence of the China/GD/2019 strain with those of Chinese and related Eastern European virulent p72 genotype II strains revealed 9-165 mutation sites along the genome. Only small numbers of SNPs were found among the Chinese strains. ASFV major structural proteins and some reported virulence factors, such as MGF 360-4L, 11L, 12L, and MGF 505-1R, did not contain any mutations [24][25][26][27]. Several genes, however, were affected by point mutations, including K145R, E199L, MGF 505, MGF 360 and E184L. A total of 165 variable sites were found in MGF_360-1L between China/GD/2019 and Estonia 2014; a difference of this magnitude may have taken a long time to accumulate.
Previous work suggests that the variation of ASFV in China is not limited to the replacement of a few or even dozens of bases, but is accompanied by insertions or deletions of small or large fragments [28]. Compared with the China/GD/2019 strain, both China/Pig/HLJ/2018 and China/DB/LN/2018 carried a deletion in gene D1133L. According to previous analyses, insertions and deletions may be largely attributable to homologous recombination [29][30][31]. We suggest that these deletions may be variations that arose as ASFV spread in China, but their effects on the infectivity and virulence of the virus are unknown.
Most ASFV genome variation results from the gain or loss of genes in the MGFs [32]. Several ORFs are absent or truncated in the China/GD/2019 genome, and additional genes adjacent to these regions (MGF 360 and MGF 110) are also deleted or truncated. Previous observations showed that repeated passage of the ASFV BA71V strain in tissue culture led to the loss of MGF 110 family members [33]. The ORFs that remain may play crucial roles in replication in macrophages and in virulence, but many of them are still largely uncharacterized. At present, ASFVs with deletions of MGF 505 and MGF 360 genes are regarded as the most promising vaccine candidates [34]. China/GD/2019 lacks some MGF 360 family genes, which may affect the virulence but not the infectivity of the virus. However, China/GD/2019 may not be attenuated in this way, because other genes can also affect virulence, and whether the missing genes in the China/GD/2019 strain result from frameshifts or single SNPs or from full deletions needs further study. Compared with China/2018/AnhuiXCGQ, China/GD/2019 carried a deletion of approximately 1 kb located at the EP153R and EP402R genes, which may alter virus virulence and infectivity. Regarding sequencing artefacts, it is difficult to distinguish precisely between true low-frequency variants and artefacts, and low template copy numbers may increase the probability of artefacts. Several strategies can minimize sequencing artefacts, such as increasing sequencing depth and template copy number and performing duplicate reactions for the same sample.
This study demonstrates that genotype II ASFV circulating in Southern China (China/GD/2019) is genetically diverse. Further research is required to compare whole genotype II ASFV genomes from pigs in China with the complete genome sequences of isolates from recent outbreaks, to provide more insight into the genetic characteristics and variation of ASFV.

Ethics Approval and Consent to Participate
We obtained written informed consent to collect clinical samples from the pig farm. Collection of all clinical samples was approved by the Institutional Animal Care and Use Committee of Sun Yat-sen University, China.

Field Samples
Clinical blood samples were collected from a pig farm in Guangdong province, Southern China, in 2019 and were confirmed as ASFV positive by real-time PCR targeting the B646L (p72) gene, with Ct values ranging from 15 to 23. A blood sample (Ct = 15) from one pig showing severe clinical symptoms was used for genome sequencing.

DNA Extraction
DNA for next-generation sequencing was extracted from the ASFV-positive field blood samples. Total DNA was extracted in duplicate using the QIAamp MinElute Virus Spin Kit (Qiagen) according to the manufacturer's protocol. The kit recovers both RNA and DNA, and a conventional diagnostic real-time PCR confirmed that the extracts were ASFV positive. The final elution volume was 30 µL of sterile nuclease-free water.

Genome Sequencing and Assembly Analysis
For full genome sequencing, the extracted DNA was fragmented to a length of about 350 bp with a Covaris ultrasonic processor, and the DNA fragments were processed using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, Ipswich, MA, USA) according to the manufacturer's instructions. After quantification on a Qubit 2.0, the libraries were sequenced on an Illumina NovaSeq PE150 sequencer (Illumina, San Diego, CA, USA). Raw reads were cleaned by filtering out low-quality reads with Readfq v10. Reads matching the swine genome (Sus scrofa 11.1, GenBank accession number GCF_000003025.6) were removed to eliminate host DNA contamination. The viral genome was assembled with CLC Genomics Workbench v9, using the China/2018/SY-18 genome (GenBank accession number MH766894) as the reference. The genome sequence generated in this study is available in GenBank (accession number MW361944).
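Readfq's exact filtering criteria are not given here, so the following is only a hypothetical sketch of a typical read-level quality filter. It decodes Phred+33 quality strings and discards reads with too many ambiguous bases (N) or too many low-quality bases. The thresholds (more than 10% N, or more than 50% of bases at Q ≤ 5) are assumptions chosen for illustration, not the parameters used in this study.

```python
from typing import Iterable, Iterator, List, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 (Illumina 1.8+) quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def keep_read(seq: str, qual: str, max_n_frac: float = 0.10,
              low_q: int = 5, max_low_frac: float = 0.50) -> bool:
    """Hypothetical filter: reject reads with too many Ns or too many low-quality bases."""
    if not seq or len(seq) != len(qual):
        return False  # malformed record
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in phred33(qual)) / len(qual)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


# Toy records: read2 is 40% N, read3 is entirely Q2 ('#').
fastq = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTNNNNAC
+
IIIIIIIIII
@read3
ACGTACGTAC
+
##########
""".splitlines()

kept = [rid for rid, seq, qual in read_fastq(fastq) if keep_read(seq, qual)]
print(kept)
```

Host-read removal is a separate step that depends on aligning reads to the Sus scrofa reference and is not sketched here.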

SNP/InDel Analysis
Global alignment between each sample and the reference sequence was performed with MUMmer (version 3.23). For each candidate SNP, 100 bp of reference sequence on each side of the site were extracted, and BLAT was used to align these extracted sequences to the assembly to verify the SNP. SNPs whose flanking sequence aligned over less than 101 bp were considered unreliable and removed, as were SNPs whose flanking sequence aligned to multiple locations, indicating a repetitive region. Finally, BLAST, TRF and RepeatMasker were used to predict repetitive regions of the reference sequence, and SNPs located in these regions were filtered out, leaving a set of reliable SNPs.
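The flank-based verification can be sketched in a few lines. This is our reading of the filters described above, not the study's pipeline: BLAT output is assumed to have been parsed elsewhere into a list of aligned lengths per probe, and repeat annotations are assumed to be half-open [start, end) intervals; both input formats are hypothetical.

```python
from typing import Iterable, List, Tuple

FLANK = 100        # bp taken on each side of the SNP (value from the Methods text)
MIN_ALIGNED = 101  # minimum aligned length for a verified SNP (value from the Methods text)


def flank_probe(ref: str, pos: int, flank: int = FLANK) -> str:
    """Reference segment spanning pos +/- flank (0-based pos, clipped at genome ends)."""
    return ref[max(0, pos - flank): pos + flank + 1]


def snp_is_reliable(pos: int, hit_lengths: Iterable[int],
                    repeats: List[Tuple[int, int]],
                    min_aligned: int = MIN_ALIGNED) -> bool:
    """Keep a SNP only if its probe has exactly one long hit and it lies outside repeats.

    hit_lengths: aligned length of each hit of the probe on the assembly
        (hypothetical input, assumed parsed from BLAT output elsewhere).
    repeats: half-open [start, end) repeat intervals on the reference
        (assumed to come from TRF/RepeatMasker-style annotation).
    """
    long_hits = [h for h in hit_lengths if h >= min_aligned]
    if len(long_hits) != 1:
        # 0 long hits: not verified; >1 long hits: probe maps to several places (repetitive).
        # Short secondary hits are ignored in this sketch.
        return False
    return not any(start <= pos < end for start, end in repeats)


ref = "ACGT" * 250  # 1,000 bp toy reference
print(len(flank_probe(ref, 500)), len(flank_probe(ref, 10)))
print(snp_is_reliable(500, [201], [(1000, 1200)]))
```

Probes near the genome ends are shorter than 201 bp, so with a fixed 101 bp threshold SNPs very close to the termini are harder to verify.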
Insertions and deletions (InDels) are short sequence segments inserted into or deleted from the genome. LASTZ (version 1.03.54) was used to align the sample with the reference sequence, and the alignments were processed with axt_correction, axtSort and axtBest to select the best alignment and obtain preliminary InDel calls. The 150 bp (3×SD) upstream and downstream of each InDel site in the reference sequence were then compared with the sequenced reads of the sample using BWA and SAMtools for verification, and filtering yielded the final set of reliable InDels.
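The read-level verification step can be illustrated with a minimal sketch that reads InDels from CIGAR strings, the alignment encoding defined by the SAM format that BWA writes and SAMtools reads. Reads are given as hypothetical (0-based reference start, CIGAR) tuples rather than parsed from a BAM file, and counting supporting reads is an illustrative stand-in; the exact verification criteria used in the study are not specified.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_from_cigar(ref_start: int, cigar: str) -> List[Tuple[int, str, int]]:
    """Return (ref_pos, 'I' or 'D', length) for each InDel in one alignment.

    ref_start is the 0-based reference position of the first aligned base.
    For an insertion, ref_pos is the reference base before which it occurs.
    """
    out: List[Tuple[int, str, int]] = []
    ref_pos = ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X":       # aligned bases consume the reference
            ref_pos += length
        elif op == "D":       # deletion consumes the reference
            out.append((ref_pos, "D", length))
            ref_pos += length
        elif op == "N":       # skipped region consumes the reference
            ref_pos += length
        elif op == "I":       # insertion does not consume the reference
            out.append((ref_pos, "I", length))
        # S, H and P do not consume the reference
    return out


def indel_support(reads: Iterable[Tuple[int, str]], site: Tuple[int, str, int]) -> int:
    """Count reads whose alignment contains exactly this InDel."""
    counts: Counter = Counter()
    for ref_start, cigar in reads:
        for indel in indels_from_cigar(ref_start, cigar):
            counts[indel] += 1
    return counts[site]


reads = [(100, "50M1D50M"), (120, "30M1D70M"), (90, "101M")]
print(indel_support(reads, (150, "D", 1)))
```

In practice the same information is available from a BAM file through a SAM/BAM reader; plain tuples keep the sketch self-contained.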

Core Genes and Pan Genes Analysis
Genes shared by all strains are called core genes; the remaining, non-shared genes are called dispensable genes, and specific genes are those present in only one strain [35]. Together, the core and dispensable genes make up the pan genes. Core and pan gene analyses were performed by clustering the protein sequences of the strains with cd-hit (version 4.6.1), and the results were plotted with R (version 3.2.4). By comparing the gene/protein sequences of the different strains, we constructed core and pan gene trees of all strains.
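The gene categories defined above reduce to set operations once each strain is represented by its set of gene clusters. A minimal sketch, assuming the clustering (for example, with cd-hit) has already assigned a shared label to each cluster; the strain names and gene assignments below are hypothetical.

```python
from typing import Dict, Set, Tuple


def partition_genes(genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Dict[str, Set[str]], Set[str]]:
    """Split gene clusters into core, dispensable, strain-specific and pan sets."""
    if not genes:
        raise ValueError("need at least one strain")
    sets = list(genes.values())
    core = set.intersection(*sets)   # present in every strain
    pan = set.union(*sets)           # present in at least one strain
    dispensable = pan - core         # present in some but not all strains
    specific = {                     # present in this strain only
        strain: g - set().union(*(other for name, other in genes.items() if name != strain))
        for strain, g in genes.items()
    }
    return core, dispensable, specific, pan


# Hypothetical cluster content for three made-up strains (illustration only).
genes = {
    "strain1": {"B646L", "E183L", "MGF_360-6L", "EP402R"},
    "strain2": {"B646L", "E183L", "MGF_360-6L"},
    "strain3": {"B646L", "E183L", "MGF_110-4L"},
}
core, dispensable, specific, pan = partition_genes(genes)
print(sorted(core), len(pan))
print({s: sorted(g) for s, g in specific.items()})
```

Specific genes are a subset of the dispensable genes, so the pan set is fully covered by core plus dispensable, matching the definitions above.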

Phylogenetic Analysis
Phylogenetic trees of the ASFV p72 (B646L) and p54 (E183L) genes were constructed from strains for which these two genes were available in GenBank. The p72 and p54 nucleotide sequences were aligned with ClustalW.
The SNP phylogenetic tree was constructed from the SNP matrix of the strains and the reference strain. For each strain, all SNPs were concatenated in the same order to obtain sequences of equal length in FASTA format (one of which is the reference sequence). The phylogenetic tree was constructed by the maximum-likelihood (ML) or neighbor-joining (NJ) method using TreeBeST software. The GenBank accession numbers, years, genotypes, references and origins of the ASFV genome sequences are listed in Table 1.
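The concatenation step can be sketched as follows. The sketch builds one equal-length sequence per strain from the alleles at every variable position, writes the result as FASTA, and counts pairwise SNP differences, the kind of matrix a distance-based (for example, NJ) method starts from. It assumes SNP calls are stored as hypothetical {position: allele} dictionaries and that a position with no call carries the reference allele; it does not reproduce TreeBeST's input requirements.

```python
from typing import Dict, Tuple


def snp_alignment(ref: str, snps: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Concatenate alleles at every variable position, in genome order."""
    positions = sorted({p for calls in snps.values() for p in calls})
    aln = {"reference": "".join(ref[p] for p in positions)}
    for strain, calls in snps.items():
        # Assumption: a position without a call in this strain carries the reference allele.
        aln[strain] = "".join(calls.get(p, ref[p]) for p in positions)
    return aln


def to_fasta(aln: Dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in aln.items())


def snp_distances(aln: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Number of differing SNP columns for every pair of sequences."""
    names = list(aln)
    return {
        (a, b): sum(x != y for x, y in zip(aln[a], aln[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


ref = "ACGTACGTAC"  # toy reference
snps = {"s1": {2: "T", 7: "A"}, "s2": {2: "T"}, "s3": {5: "G"}}  # hypothetical calls
aln = snp_alignment(ref, snps)
print(to_fasta(aln), end="")
print(snp_distances(aln))
```

Treating missing calls as reference alleles is the simplest choice; real data with uncovered positions would usually be coded as missing ('N') instead.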

Conclusions
This study investigated the genomic characteristics of the ASFV responsible for a 2019 outbreak in Guangdong province, China. Genetic analysis indicates that the China/GD/2019 strain, which carries new variations, is closely related to genotype II ASFV isolates. Although they belong to genotype II, the ASFVs associated with the outbreaks in the Northern provinces of China are genetically diverse, and these outbreaks are correlated. The phylogenetic and comparative genomic analyses in this study can help improve our understanding of the degree of genetic evolution and the variation among isolates. This study provides useful information for exploring key factors in ASFV vaccine development.