Molecular characterization, phylogenetic and variation analyses of SARS-CoV-2 strains in Turkey

Aims: We present the sequence and single-nucleotide polymorphism (SNP) analysis of 47 complete genomes of SARS-CoV-2 isolates from Turkish patients. Methods: The Illumina MiSeq platform was used to sequence the libraries. The SNPs were detected with Genome Analysis Toolkit – HaplotypeCaller v.3.8.0 and inspected in GenomeBrowse v2.1.2. Results: All viral genome sequences of our isolates fell within lineage B, in different clusters: B.1 (n = 3), B.1.1 (n = 28) and B.1.9 (n = 16). According to the Global Initiative on Sharing All Influenza Data (GISAID) nomenclature, all of our complete genomes were placed in the G, GR and GH clades. In our study, 549 total and 53 unique SNPs were detected. Conclusion: The results indicate that the SARS-CoV-2 sequences of our isolates are highly similar to all Turkish and European sequences.


Results
A dataset of 68 complete SARS-CoV-2 genomes from different countries, randomly selected from the GISAID database, was compared with our isolates. The reference SARS-CoV-2 strain from Wuhan (NCBI GenBank, NC_045512) was also included for comparison. A total of 115 SARS-CoV-2 genomes were placed in the phylogenetic tree. The maximum likelihood phylogenetic tree in Figure 1 shows a main lineage that includes several sublineages. All viral genome sequences of our isolates fell within lineage B, in different clusters: B.1 (n = 3), B.1.1 (n = 28) and B.1.9 (n = 16). According to the GISAID nomenclature, all of our complete genomes were placed in the G, GR and GH clades (Figure 1) [...] Table 2 [...] SNPs were detected only once. Two hundred ninety-one (53%) SNPs were detected in ORF1ab, which is the longest ORF and makes up approximately 70% of the whole genome. ORF1ab is cleaved into 16 nonstructural proteins (nsp). Among these, nsp3 and nsp2 carried the most SNPs in our study (n = 100 and n = 27, respectively). All noncoding mutations were detected in the 5′ UTR. All genomes had the c.1-25C>T nucleotide variation, but only two genomes had additional noncoding mutations: c.1-56G>T and c.1-77G>T. The most commonly detected base change was C>T (96%). Detailed data on the coding and noncoding mutations in our isolates are shown in Supplementary Table 2.
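The 5′ UTR variants above are given relative to the first coding base (c.1). As a minimal sketch, the snippet below converts that notation to genomic coordinates and rechecks the reported ORF1ab share of SNPs. It assumes that the ORF1ab start codon begins at position 266 of the reference NC_045512; this coordinate comes from the public reference annotation and is not stated in the paper. The function names are hypothetical.

```python
# Assumption (not stated in the paper): the ORF1ab start codon begins at
# genomic position 266 of the reference NC_045512, so c.1 maps to 266.
ORF1AB_START = 266


def utr_to_genomic(upstream_offset: int, cds_start: int = ORF1AB_START) -> int:
    """Map the N in 'c.1-N' to a 1-based genomic coordinate."""
    if not 1 <= upstream_offset < cds_start:
        raise ValueError("offset must lie inside the 5' UTR")
    return cds_start - upstream_offset


def percent_share(part: int, total: int) -> int:
    """Return part/total as a percentage rounded to the nearest integer."""
    return round(100 * part / total)


if __name__ == "__main__":
    # The three noncoding variants reported in the Results.
    for offset in (25, 56, 77):
        print(f"c.1-{offset} -> genomic position {utr_to_genomic(offset)}")
    # 291 of the 549 detected SNPs fall in ORF1ab.
    print(f"ORF1ab share: {percent_share(291, 549)}%")
```

Under this assumption, c.1-25C>T corresponds to genomic position 241, and 291 of 549 SNPs rounds to the reported 53%.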

Discussion
Complete genome sequencing of new viruses is an important tool for the epidemiology of infectious diseases, for updating diagnostics and for assessing viral evolution [7,8]. Complete genome data reveal epidemiological parameters, including the doubling time of an outbreak, the reconstruction of transmission routes, and the identification of possible sources and animal reservoirs. They can also support drug and vaccine design. In our study, we described 47 SARS-CoV-2 genomes and compared them with a dataset of 68 available SARS-CoV-2 genomes from different countries obtained from GISAID [9]. The phylogenetic analysis showed that all of our SARS-CoV-2 genomes are similar to the European strains. According to the GISAID lineage assignment, all 47 of our SARS-CoV-2 genomes are in lineage B, which mostly consists of European strains. According to the marker mutations, they were placed in the G, GR and GH clades. Moreover, most European isolates in GISAID were also placed in the G, GR and GH clades [9].
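Clade placement by marker mutations can be illustrated with a short sketch. The marker sets below follow the commonly cited GISAID clade definitions as an assumption; they are not listed in this paper, and the function is a simplified, hypothetical stand-in for the actual GISAID assignment procedure.

```python
from typing import Iterable

# Assumed marker mutations (commonly cited GISAID definitions, not from this paper).
G_CORE = frozenset({"C241T", "C3037T", "A23403G"})
GH_EXTRA = "G25563T"
GR_EXTRA = "G28882A"
S_MARKERS = frozenset({"C8782T", "T28144C"})
V_MARKERS = frozenset({"G11083T", "G26144T"})
ALL_MARKERS = G_CORE | S_MARKERS | V_MARKERS | {GH_EXTRA, GR_EXTRA}


def assign_clade(mutations: Iterable[str]) -> str:
    """Return a clade label from a genome's nucleotide substitutions."""
    found = set(mutations)
    if G_CORE <= found:
        if GH_EXTRA in found:
            return "GH"
        if GR_EXTRA in found:
            return "GR"
        return "G"
    if S_MARKERS <= found:
        return "S"
    if V_MARKERS <= found:
        return "V"
    # No marker position changed: treated here as the reference-like L clade.
    if found.isdisjoint(ALL_MARKERS):
        return "L"
    return "O"  # partial or mixed marker patterns


if __name__ == "__main__":
    print(assign_clade(["C241T", "C3037T", "A23403G", "G28882A"]))
```

In this sketch a genome carrying the three core G markers plus the assumed GR marker is labeled GR, while a genome with no marker substitutions is labeled L.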
Several studies in the current literature investigate the first isolates from their country to trace their origin [3,13–17]. One study analyzed two complete genomes, one from a Chinese patient visiting Rome and one from an Italian patient, and reported that the Italian patient's sequence clustered with European sequences, whereas the Chinese patient's sequence clustered with the Wuhan sequence (NC_045512) [18]. The same researchers compared the two genomes with regard to concurrent evolution and accumulation of mutations and reported that four mutations were detected in the Italian sequence. Another study reported that, because of their variations, the first isolates from India were not highly identical to the Wuhan sequence (NC_045512) [17]. Moreover, Bal et al. reported that the first three French sequences were located in the European clade instead of the reference Wuhan clade because of a three-nucleotide deletion in Orf1a at positions 1607-1609 [13]. However, some European isolates, generally the first isolates of the country concerned, belong to non-European clades. Zehender et al. [19] evaluated a German sequence and reported that it belonged to lineage A because of travel to Shanghai between 20 and 24 January.
So far, apart from our 47 isolates, the GISAID database contains 138 complete SARS-CoV-2 genome sequences from Turkey [9]. All of those genomes belong to lineage B, including the first case in Turkey (lineage B.4). All of our sequences (n = 47) also belong to lineage B, like all other Turkish sequences. However, the first cases in Turkey belong to the L clade according to the GISAID classification, whereas ours belong to the G, GR and GH clades.
According to the current literature, ORF genes have a crucial role in COVID-19 [20]. In our study, 549 total and 53 unique SNPs were detected in 47 SARS-CoV-2 isolates (Supplementary Table 2). A study performed by Khailany et al. [21] reported 156 total and 116 unique SNPs in 95 SARS-CoV-2 isolates. The same study also pointed out that the most frequently observed base change was C>T, as in our study.
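The distinction between total and unique SNP counts can be made concrete with a small sketch. It assumes that "total" counts every SNP occurrence across genomes and "unique" counts distinct SNPs; the paper does not define the terms explicitly. The genome names and variants below are toy data, not the study's results.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_snps(per_genome: Dict[str, List[str]]) -> Tuple[int, int, int]:
    """Return (total occurrences, distinct SNPs, SNPs seen in one genome only)."""
    # Count each SNP at most once per genome.
    counts = Counter(snp for snps in per_genome.values() for snp in set(snps))
    total = sum(counts.values())
    unique = len(counts)
    singletons = sum(1 for n in counts.values() if n == 1)
    return total, unique, singletons


def most_common_base_change(per_genome: Dict[str, List[str]]) -> str:
    """Return the most frequent substitution type, e.g. 'C>T'.

    Assumes SNPs are written as <ref><position><alt>, e.g. 'C241T'.
    """
    changes = Counter(
        f"{snp[0]}>{snp[-1]}" for snps in per_genome.values() for snp in set(snps)
    )
    return changes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (hypothetical isolates), for illustration only.
    toy = {
        "isolate_1": ["C241T", "C3037T"],
        "isolate_2": ["C241T", "G11083T"],
        "isolate_3": ["C241T"],
    }
    print(summarize_snps(toy))
    print(most_common_base_change(toy))
```

With this toy input there are five occurrences of three distinct SNPs, two of which appear in a single genome.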
The most frequently reported variations are at positions 3036, 11083 and 13402 in the Orf1ab gene; 28854 in the N gene; 21707 and 21575 in the S gene; and 28077 and 28144 in the Orf8 gene [4,17,22,23]. Our variations at the related positions are the same as those reported. We mostly detected variations at positions 3037 and 11083 in the Orf1ab gene; 28166 in the Orf8 gene; 28854 in the N gene; and 24262 in the S gene. According to the global frequency of variations, most of our SNPs are novel; however, studies on this issue are ongoing, and more detailed information will be presented in the near future. Orf1ab is the longest and most important gene among coronaviruses [20]. Most studies have detected variations in the Orf1ab gene. Thus, mutations in this region may be significant with respect to clinical features.
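The assignment of genomic positions to genes can be sketched as a simple interval lookup. The coordinates below are taken from the public annotation of the reference NC_045512 and are an assumption here, since the paper does not list them; the function name is hypothetical.

```python
from typing import List, Tuple

# Assumed 1-based, inclusive feature coordinates on NC_045512
# (public reference annotation; not given in the paper).
FEATURES: List[Tuple[str, int, int]] = [
    ("5'UTR", 1, 265),
    ("ORF1ab", 266, 21555),
    ("S", 21563, 25384),
    ("ORF3a", 25393, 26220),
    ("E", 26245, 26472),
    ("M", 26523, 27191),
    ("ORF6", 27202, 27387),
    ("ORF7a", 27394, 27759),
    ("ORF7b", 27756, 27887),
    ("ORF8", 27894, 28259),
    ("N", 28274, 29533),
    ("ORF10", 29558, 29674),
    ("3'UTR", 29675, 29903),
]


def features_at(position: int) -> List[str]:
    """Return every annotated feature covering a genomic position.

    An empty list means the position is intergenic; two names are
    returned where features overlap (e.g. ORF7a and ORF7b).
    """
    return [name for name, start, end in FEATURES if start <= position <= end]


if __name__ == "__main__":
    # Positions most frequently varied in the isolates of this study.
    for pos in (3037, 11083, 28166, 28854, 24262):
        print(pos, features_at(pos))
```

Under these assumed coordinates, the positions named in the text fall in the genes the authors report: 3037 and 11083 in Orf1ab, 28166 in Orf8, 28854 in N and 24262 in S.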

Conclusion
The results of the present study indicate that the SARS-CoV-2 sequences of our isolates are highly similar to all Turkish and European sequences. Further studies should be performed for better comparison of strains once more complete genome sequences are released; however, these data may be useful for understanding the dynamics of virus spread and may support further vaccine and treatment studies. The worldwide increase in SARS-CoV-2 cases is yielding more genomes, which may provide some insight into population structure. This study showed the common and new variations in SARS-CoV-2 isolates. The fight against COVID-19 will last a long time, until an effective vaccine or drug is developed. However, we believe that collecting and sharing any data about the SARS-CoV-2 virus and COVID-19 will be effective and may help related studies. To that end, we should continue detecting new variations.

Study limitations
Our study has some limitations: the number of national genomes available at the time of analysis and the number of detected variations. Additionally, the lack of a more robust methodology, such as maximum likelihood, is another limitation of the study. These limitations should be addressed in further studies.

Supplementary data
To view the supplementary data that accompany this paper, please visit the journal website at: www.futuremedicine.com/doi/suppl/10.2217/fmb-2021-0118

Financial & competing interests disclosure
The current study was supported by Kafkas University, Scientific Research Project Council, with the following project number: 2021-TS-26. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.

Availability of data & material
The data that support the findings of this study are openly available in NCBI GenBank and GISAID at https://submit.ncbi.nlm.nih.gov/subs/ and https://www.gisaid.org/.

Summary points
• Understanding the transmission patterns and evolution of SARS-CoV-2 is crucial for creating better drug and vaccine designs for disease control and prevention. The analysis of genetic sequence data from a pathogen is an important tool in infectious disease epidemiology.
• Forty-seven positive samples were used for complete genome sequencing and variation analyses. A dataset of 68 complete SARS-CoV-2 genomes from different countries, randomly selected from the Global Initiative on Sharing All Influenza Data database, was compared with our isolates.
• The results of the present study indicate that the SARS-CoV-2 sequences of our isolates are highly similar to all Turkish and European sequences.
• The fight against COVID-19 will last a long time, until an effective vaccine or drug is developed. However, we believe that collecting and sharing any data about the SARS-CoV-2 virus and COVID-19 will be effective and may help related studies. To that end, we should continue detecting new variations.