Temporal variations of country-specific mutational profile of SARS-CoV-2: effect on vaccine efficacy

Aim: To curb the transmission of SARS-CoV-2, nationwide travel restrictions of varying stringency were implemented in different countries. Country-specific mutational profiles may therefore exist and affect vaccine efficacy. Materials & methods: We identified nonsynonymous mutations in approximately 215,000 SARS-CoV-2 sequences from the first year of the pandemic in 35 countries. Mutational profiles were traced over time on a bimonthly basis. We also examined the mutations that overlapped with spike protein vaccine epitopes. Results: Several new mutations emerged over time and became dominant in specific countries. Many nonsynonymous mutations lay within multiple spike protein epitopes and might affect vaccine efficacy. Conclusion: Our study supports the need for active monitoring of country-specific mutations and of vaccine efficacy in the respective countries.

Since its emergence in 2019, the novel coronavirus SARS-CoV-2 has been evolving continuously toward higher stability, increased transmissibility and higher virulence. To minimize the transmission of the virus across the globe, international travel restrictions were imposed toward the end of March 2020. China was the first to implement travel restrictions across country and state borders, as early as January 2020 [1,2], while cross-country travel restrictions were steadily enforced in other countries from late March to early April (www.bbc.com/news/world-52103747). As increasing numbers of SARS-CoV-2 isolates were sequenced and made publicly available, it became possible to trace the evolution of the virus by analyzing its mutational landscape. Several nonsynonymous mutations were characterized in the early months of the pandemic, chief among them the substitutions D614G in the spike glycoprotein (S protein) and R203K and G204R in the viral nucleocapsid (N protein) [3]. The newly evolved S protein variant has been shown to exhibit higher rates of infection, replication and stability [4,5]. These variants prevailed across most countries, reflecting the inter-country migration that took place before travel measures were implemented. Mutations specific to a particular geographical location, if any, would therefore be expected only in the months following the imposition of travel restrictions.
Since there is a dearth of medication and therapeutic agents available to combat SARS-CoV-2, engineering vaccines and drugs against this virus has gained top priority [6]. Several vaccine development strategies have been pursued: nucleic acid vaccines such as DNA and mRNA vaccines; vaccines employing dead, inactivated or attenuated forms of the virus; vaccines constructed within carrier vectors; and others based on protein subunits and virus-like particles [7,8]. Forthcoming inactivated-virus vaccines include PicoVacc, developed by Sinovac [9,10], and BBIBP-CorV [11,12], both of which have been tested successfully in various animal models. Studies have demonstrated that SARS-CoV-2 structural and nonstructural proteins can serve as potential candidates for vaccine development by producing adequate immune responses [8,13,14]. The spike protein is the most popular target for epitope constructs, specifically because of its ability to elicit strong immune responses in the host [9,10,15-22]. The engineered variant termed S-2P has been used as a candidate vaccine antigen in various gene-based approaches, including the mRNA vaccines by Moderna and BioNTech/Pfizer [13,23,24] and the Ad26 vaccine developed by Janssen Pharmaceuticals [17,25]. Aside from these, other modifications that further enhance the stability of the S protein have been devised for the protein vaccine designed by Novavax [26,27]. These provided reassuring results when tested in animal models and are currently in Phase III clinical trials.

Experimental study
In this study, we analyzed approximately 220,000 SARS-CoV-2 sequences for nonsynonymous substitutions across all viral proteins. We segregated the sequences into two groups according to the timing of international travel restrictions and compared the mutational profiles of 35 selected countries between the two periods. We identified mutations specific to certain countries, those that diminished in abundance and those that became dominant over time. We assessed the temporal variation of highly mutated residues over the course of the year to gain insight into the robustness of these variants. To shed light on the plausible effectiveness of vaccines, country-specific frequently mutated residues were subsequently mapped to predicted spike protein epitopes proposed for vaccine development.

SARS-CoV-2 sequences
A total of approximately 220,000 SARS-CoV-2 sequences, available as of 28 November 2020, were downloaded from GISAID (www.epicov.org/epi3/). For each of the open reading frames (ORFs) and proteins, partial sequences, which accounted for less than 1% of the total isolates, were filtered out. The protein sequences were individually aligned using MUSCLE. The SARS-CoV-2 strain originating in Wuhan, China (GenBank accession ID: NC_045512) was used as the reference strain. Since countries started implementing international travel measures between the third week of March and early April, the isolates were segregated into two groups, based on whether they were sequenced before May 2020 (group B, for 'before') or from May 2020 onward (group A, for 'after'). This left approximately 215,000 sequences, of which 135,000 are in group A and 80,000 in group B.
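The grouping and substitution-calling steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names `assign_group` and `call_substitutions` are hypothetical, the date used for grouping is assumed to be the one recorded for each isolate, and the inputs are assumed to be one reference and one query protein string taken from the same alignment (gaps written as `-`, ambiguous residues as `X`).

```python
from datetime import date

# Assumed cut-off: isolates dated before 1 May 2020 fall in group B.
CUTOFF = date(2020, 5, 1)

def assign_group(isolate_date: date) -> str:
    """Return 'B' (before) for dates before May 2020, otherwise 'A' (after)."""
    return "B" if isolate_date < CUTOFF else "A"

def call_substitutions(ref_aln: str, query_aln: str) -> list:
    """List amino acid substitutions (e.g. 'D614G') between two aligned strings.

    Positions are numbered on the ungapped reference. Gaps and ambiguous
    residues in the query are skipped rather than reported as substitutions.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    ref_pos = 0
    for ref_aa, query_aa in zip(ref_aln, query_aln):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if query_aa in ("-", "X") or query_aa == ref_aa:
            continue
        substitutions.append(f"{ref_aa}{ref_pos}{query_aa}")
    return substitutions
```

For example, `call_substitutions("MD-G", "MGAG")` reports a single substitution at reference position 2, because the gapped reference column is not counted.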

Selection of countries for carrying out mutational profiling
We selected countries with a minimum of 100 complete sequences from May to 28 November 2020, provided that they also had a minimum of 30 sequences from January to April 2020. By this measure, we obtained a total of 35 countries in which the mutational characterization was carried out, arranged according to their geographical locations (Figure 1).
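The selection rule can be written down compactly. The sketch below is illustrative only; `select_countries` is a hypothetical helper, and for simplicity every isolate dated before 1 May 2020 is counted toward the early period.

```python
from collections import Counter
from datetime import date

def select_countries(records, min_after=100, min_before=30):
    """Return countries meeting both sequence-count thresholds.

    records: iterable of (country, isolate_date) pairs.
    A country needs at least `min_after` sequences from 1 May to
    28 November 2020 and at least `min_before` sequences before May 2020.
    """
    before, after = Counter(), Counter()
    for country, d in records:
        if d < date(2020, 5, 1):
            before[country] += 1
        elif d <= date(2020, 11, 28):
            after[country] += 1
    return sorted(
        c for c in after
        if after[c] >= min_after and before[c] >= min_before
    )
```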
Identification of frequently mutated residues for each of the viral proteins
Nonsynonymous mutations were identified independently for each of the selected countries in groups A and B. A complete summary of all unique variants, each characterized by mutations at one or more amino acid positions, was compiled along with their frequencies of occurrence. We defined a substitution occurring in at least 2% of all variants as a frequently mutated residue. Overall summary tables giving the frequencies of each identified frequently mutated residue in all 35 selected countries are provided separately for the two groups (Supplementary Table 1A & B).
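As an illustration of the 2% criterion, the following sketch computes per-substitution frequencies for the isolates of one country in one group. The function name and the `protein:substitution` labels are hypothetical conventions chosen for the example, and the denominator is assumed to be the total number of isolates for that country and group.

```python
from collections import Counter

def frequent_substitutions(isolates, threshold=0.02):
    """Return {substitution: frequency} for substitutions at or above threshold.

    isolates: list of sets of substitution labels, one set per isolate,
    all from the same country and the same group (A or B).
    """
    n = len(isolates)
    if n == 0:
        return {}
    # Count each substitution at most once per isolate.
    counts = Counter(s for subs in isolates for s in set(subs))
    return {s: c / n for s, c in counts.items() if c / n >= threshold}
```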

Comparison of nonsynonymous mutation in pre-international & postinternational travel restrictions
In order to obtain a bird's-eye view of the impact of international travel measures in shaping the mutations in SARS-CoV-2, we calculated a weighted average of the number of frequently mutated residues for each structural and nonstructural protein across the 35 selected countries (Table 1). The prevailing global trends and the most mutable proteins in the two groups were compared. For the individual countries, the number of frequent mutations per 100 amino acids was calculated both before and after the travel regulations were imposed (Supplementary Table 2). Deviations of the mutation patterns of one or more countries from the global trend were noted.

Classifying the different mutations on account of their temporal variations
The mutated residues were grouped into three classes. Mutations in the first class occurred with a frequency of 10% or higher in at least one country in both group A and group B; they were considered to have originated before the implementation of intercountry travel measures and to have continued to prevail, either in the original location or in other regions around the globe. Mutations that occurred in at least 10% of the strains in the early months of the pandemic but dwindled after May were placed in the second class, whereas mutations that were virtually nonexistent before May but emerged with higher frequencies during the ensuing months were placed in the third class. A heatmap depicting the frequencies of these mutations in their respective groups was generated using the R package 'gplots', with hierarchical clustering of the substitutions carried out using the 'ward.D2' method (Figure 1).
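The three-class scheme can be expressed as a simple rule on the highest frequency a mutation reaches in any country within each group. The sketch below is one possible reading, not the authors' code: the class labels are hypothetical, and the `low` threshold standing in for "virtually nonexistent" is an assumption, since the text gives no numerical cut-off for it.

```python
def classify_mutation(max_freq_before, max_freq_after, high=0.10, low=0.01):
    """Assign a mutation to one of the three temporal classes.

    max_freq_before / max_freq_after: highest frequency observed in any
    country in group B and group A, respectively (fractions, not percent).
    `low` is an assumed cut-off for 'virtually nonexistent'.
    """
    if max_freq_before >= high and max_freq_after >= high:
        return "persistent"   # class 1: prevalent before and after May
    if max_freq_before >= high:
        return "declined"     # class 2: prevalent early, dwindled later
    if max_freq_before <= low and max_freq_after >= high:
        return "emerged"      # class 3: absent early, prevalent later
    return "unclassified"
```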
Tracking the evolution of the mutated residues over time on a bimonthly basis
For each of the mutations in the three classes, we calculated the frequency of occurrence in every 2-month period from January to October 2020. Since the amount of sequencing in each country is largely inconsistent, we pooled the sequences of every 2 months for the frequency calculation, rather than conducting the study on a monthly basis, in order to capture the mutational frequencies with a fair degree of confidence. The frequencies were calculated only for those countries in which the mutation occurred with a minimum of 10% abundance in at least one of the time points considered.
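A minimal sketch of the bimonthly binning, under the assumption that each isolate carries a date and a set of substitution labels; `bimonthly_frequency` and the period labels are hypothetical names introduced for the example.

```python
from collections import defaultdict
from datetime import date

PERIODS = ["Jan-Feb", "Mar-Apr", "May-Jun", "Jul-Aug", "Sep-Oct"]

def bimonthly_frequency(isolates, substitution):
    """Return {period: frequency} of one substitution in one country.

    isolates: iterable of (isolate_date, set_of_substitutions) pairs.
    Only isolates dated January to October 2020 are counted; periods
    without any sequences are omitted from the result.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for d, subs in isolates:
        if d.year != 2020 or d.month > 10:
            continue
        period = PERIODS[(d.month - 1) // 2]
        totals[period] += 1
        hits[period] += int(substitution in subs)
    return {p: hits[p] / totals[p] for p in PERIODS if totals[p]}
```

Pooling two months per bin, as in the text, keeps the denominators larger for countries with sparse sequencing.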
Country-specific co-occurring amino acid substitutions
We analyzed co-occurring amino acid substitutions across all viral proteins, considering those that occurred with frequencies of 10% or higher in each country, and looked for residues that co-occurred with high abundance. This allowed us to identify the viral variants that were more dominant in a given region, along with the co-occurring mutations in the respective variants. We also identified substitutions that had arisen simultaneously in the countries harboring them, and corroborated this using the bimonthly frequencies of such variants.

Overlapping the frequent mutations in the S-protein with predicted epitopes from external studies
A number of studies have been carried out to speed up the engineering of vaccines against SARS-CoV-2. We selected eight studies that experimentally determined potential immune-responsive epitopes in several viral proteins, of which we were especially interested in the epitopes located in the spike protein [12,28-34] (Supplementary Table 5A). Several studies have used in-silico approaches to predict T-cell and B-cell epitopes that could potentially be employed to trigger immune responses in the host. We selected three such independent studies, which used different methodologies to predict putative epitopes that could be used in designing safe and effective vaccines [15,35,36] (Supplementary Table 5B). These potential epitopes were intersected with all frequently mutated S-protein substitutions obtained in our study. The mutations located within one or more of the epitopic sites were recorded along with the countries in which they were dominant. To further ascertain the progression of these mutants over time, we studied the bimonthly frequencies of these variants.
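Intersecting substitutions with epitopes amounts to checking whether the mutated position falls inside an epitope's residue range. The sketch below assumes epitopes are given as inclusive (start, end) ranges on the spike protein; the helper names are hypothetical, and the epitope coordinates in the usage example are invented placeholders, not values from the cited studies.

```python
import re

def substitution_position(substitution: str) -> int:
    """Extract the residue position from a label such as 'A626S'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", substitution)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {substitution}")
    return int(match.group(1))

def map_to_epitopes(substitutions, epitopes):
    """Return {substitution: [epitope names]} for substitutions inside an epitope.

    epitopes: dict mapping an epitope name to an inclusive (start, end) range.
    Substitutions that fall in no epitope are left out of the result.
    """
    overlaps = {}
    for sub in substitutions:
        pos = substitution_position(sub)
        hits = [name for name, (start, end) in epitopes.items()
                if start <= pos <= end]
        if hits:
            overlaps[sub] = hits
    return overlaps
```

With placeholder epitopes `{"ep1": (600, 630), "ep2": (470, 490)}`, the substitutions D614G and A626S map to `ep1`, S477N maps to `ep2`, and V1176F maps to nothing.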

Results
Frequently mutated residues in SARS-CoV-2 during the first year of pandemic
To assess the effect of international travel restrictions in shaping the mutational profile in 35 countries, all protein sequences of SARS-CoV-2, comprising approximately 215,000 isolates, were analyzed. We identified 334 amino acid substitutions that were observed in at least 2% of the isolates of at least one country in the period from January to April 2020, while 656 such frequently mutated residues were obtained in the interval from May to November 2020 (Supplementary Table 1A & B). Among the structural proteins, the envelope (E) and membrane (M) proteins were the most resistant to nonsynonymous changes. The S and N proteins, on the other hand, displayed greater plasticity, accounting for 41 and 37 substitutions, respectively, during the initial 4 months (group B, for 'before') and for 80 and 69 substitutions after the travel restrictions (group A, for 'after'). ORF3a, ORF8 and ORF7a were the most mutable in their respective phases. As expected, ORF1ab, which constitutes around 70% of the viral genome and encodes 16 nonstructural proteins (NSPs), had the largest number of substitutions, totaling 214 and 408 in groups B and A, respectively. Among the NSPs, NSP2, NSP3 and the replication-transcription complex proteins NSP12, NSP13 and NSP14 sustained the most substitutions. The summary tables containing the frequencies of each of the frequently mutated residues in the 35 countries for both groups are presented in Supplementary Table 1A & B. Overall, the number of frequently mutated residues in group A was roughly double that in group B.
Nonsynonymous mutations per 100 amino acids
A weighted average of the number of frequently mutated residues (≥2% of all variants in a country) per 100 amino acids across all countries was calculated separately for groups B and A (Table 1). For the months preceding the travel restrictions, the global trend suggested that the mutation rate per 100 aa was highest for ORF3a (0.84%) and ORF8 (0.94%) among the nonstructural proteins, while the N protein (0.93%) exhibited the highest rate among the structural proteins. For the following months, N (1.56%), S (0.4%), ORF3a (1.22%) and ORF7b (0.61%) had the highest mutation rates. N, ORF3a and ORF7b showed the largest increase in mutation rate between the two phases, while ORF8 showed the opposite behavior, with an overall decrease of 0.75 residues per 100 aa (Table 1). The structural proteins E and M continued to display remarkable robustness and remained largely conserved over the entire period. Examining the mutation rates for each country (Supplementary Table 2) revealed some differences. In the months leading up to the travel curtailments, ORF8 had the highest mutation rate in 13 countries, followed by ORF3a and the N protein in eight countries each. In the period from May to November, however, the N protein displayed the highest mutation rate in 18 of the 35 countries, while ORF3a had the highest number of mutations in seven countries (Supplementary Table 2). Country-wise differences in mutation rates were starkest for ORF8, which remained free of changes in 21 countries while showing fairly high substitution rates in the remaining countries. Apart from these, ORF7a in South Korea, ORF7b in the UK and Latvia, and ORF8 in Kenya and South Africa showed the highest rates of mutation.
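The per-100-residue rate and its weighted average can be illustrated as follows. The text does not state which weights were used; this sketch assumes each country is weighted by its number of sequences, and both function names are hypothetical.

```python
def rate_per_100aa(n_frequent: int, protein_length: int) -> float:
    """Number of frequently mutated residues per 100 amino acids."""
    return 100.0 * n_frequent / protein_length

def weighted_average_rate(per_country, protein_length: int) -> float:
    """Weighted mean of per-country rates for one protein.

    per_country: list of (n_frequent_residues, n_sequences) pairs.
    Weighting by sequence count is an assumption made for this sketch.
    """
    total_weight = sum(weight for _, weight in per_country)
    if total_weight == 0:
        raise ValueError("no sequences to weight by")
    weighted_sum = sum(
        rate_per_100aa(n, protein_length) * weight
        for n, weight in per_country
    )
    return weighted_sum / total_weight
```

For a hypothetical 400-residue protein with 4 frequent residues in a country with 100 sequences and 8 in a country with 300 sequences, the per-country rates are 1.0 and 2.0 and the weighted average is 1.75 per 100 aa.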

Mutations that have declined &/or eliminated after international travel restrictions
Turning to the mutations that were observed in the period before intercountry travel limitations, we noticed a distinct cluster comprising mutations in the N and M proteins, ORF3a and NSPs 2, 3 and 4 (Figure 1B). In the subsequent months, these variations decreased substantially, with most of them being nearly wiped out. It was also interesting to observe that the substitutions P344S in N protein (UAE), L781F in NSP3 (South Korea), G172C in ORF3a (Bangladesh), E120K in NSP3 (Bangladesh), S183Y in N protein (New Zealand), L275F in ORF3a (Ireland) and Y28H in S protein (UAE) were prevalent in group B, before May, but significantly reduced in group A.

Mutations that have newly emerged or increased sharply post-travel restrictions
We next surveyed the mutations that were virtually nonexistent before May but gained prominence in specific locations in the subsequent months (Figure 1C). Interestingly, the European countries formed a separate cluster comprising four mutations in the S protein, three each in the N protein and ORF3a, and others in NSP4, NSP6, NSP7, NSP9, NSP12, NSP13 and NSP16. These were clearly not pervasive before May and emerged in the wake of the travel limitations. Other noteworthy mutations that originated in specific countries after the travel restrictions were found in N and NSPs 3, 5 and 12 in Japan, and in the S protein, NSP3 and NSP15 (three mutations) in China. Additionally, substitutions in NSP7 and NSP16 exclusive to South Korea, mutations in S and NSP3 specific to Czech Republic, two mutations in ORF7a and NSP3 prevalent only in Kenya, and two variations in NSP3 and the S protein in Canada were among those that were absent in the initial months of the pandemic.
Temporal survey of the mutated variants on a bimonthly basis
Having identified mutations specific to certain countries, we wanted to see whether these variations had arisen or declined sharply at a distinct time point. Estimating the bimonthly frequencies for the corresponding countries (Supplementary Table 3) allowed us to discern three distinct trends in how the mutations developed over time. The substitution S D614G reached close to 100% prevalence in most countries by October, with North American and European countries acquiring the mutation faster than Middle Eastern and Asian countries (Figure 2A-C). In contrast, the co-occurring substitutions R203K and G204R in the N protein showed more diverse patterns (Figure 2D-F). In most countries these substitutions became dominant over the wild type, and in Australia, Mexico, Russia, Japan, Bangladesh and Latvia the corresponding frequencies rose to approximately 90% or higher. However, in several European countries, namely Belgium, Denmark, France, Germany, Ireland, Netherlands, Norway, Spain, Switzerland and the UK, and also in the USA and Brazil, this variant did not gain a firm footing over its wild-type counterpart, either remaining more or less constant over the specified period or plummeting after a brief rise. The prevalence of the ORF3a substitution Q57H remained uneven across countries: it attained a semblance of stability in the USA, Canada, Mexico and Sweden (Figure 2G), while low abundance or falling rates were observed in Czech Republic, Germany and Australia, among others (Figure 2H). By contrast, Belgium, France, Switzerland and China were among the nations where this variant was rising again after a brief hiatus around July-August (Figure 2I).
Some mutated variants showed a brief initial upsurge but diminished in the following months. The D936Y and I119V substitutions in the S protein, specific to Sweden and Peru, declined over time (Figure 3A). A similar trend was observed for the S197L substitution in the nucleocapsid in Spain and Australia, which coincided with the F308Y substitution in NSP4 in the same countries (Figure 3B). Similarly, the N P13L and NSP3 T1198K variants, which were appreciably frequent in India and Singapore before May, declined sharply and concurrently in the later periods (Figure 3C). Moreover, S183Y, Q240K and P344S in the nucleocapsid, characteristic of New Zealand, Ireland and UAE, took a much sharper nosedive (Figure 3D). NSP2 T85I, with the exception of the USA, peaked in March-April before following a downward trajectory (Figure 3E). Other noteworthy mutations that declined after a brief period of prevalence are NSP12 L323P and ORF3a G251V, as seen in many nations (Figure 3F & G).
Several new mutations emerged after May and were on the rise in specific countries. Of particular interest were two substitutions in the S protein, A222V (Figure 4A) and S477N (Figure 4B), which emerged after July and spread rapidly thereafter, the only exception being Norway (Figure 4A). These mutations were found at high percentages almost exclusively among the European nations, except that the S477N variant was also widespread and on the rise in Australia (Figure 4B). Other S protein mutations on the rise included S98F, exclusive to Belgium and Netherlands, L18F in the UK and V1122L in Sweden (Figure 4C). The N-protein variants R40C, specific to Latvia, and P151L in Japan had reached 50% abundance by October. Among the NSPs, significant mutations on the rise included the NSP2 substitution I120F, exclusive to Australia and Bangladesh; the coincident NSP3 substitutions Q203R and S284G in Latvia, I441V in Belgium and S543P in Japan; NSP5 L89F and NSP5 P108S in the USA and Japan, respectively; the P59T variant of NSP10 in Latvia; and the A423V and V720I mutations of NSP12 in Japan and Czech Republic, respectively (Figure 4D).
Mutations that have co-occurred with identical or nearly identical frequencies
Having identified prominent mutations in diverse SARS-CoV-2 proteins, we wanted to find the substitutions that co-occurred in different geographical locations. We considered variants with a prevalence of at least 10% in a particular country to be dominant variants. These prevailing variants, which combine substitutions in various proteins, are presented with their corresponding frequencies in Supplementary Table 4. With the exception of China, all the substitutions occurred in the background of the S protein D614G variant, which in all likelihood preceded the emergence of the other mutations. Apart from the known co-occurring mutations at residues 203 and 204 of the N protein, we noted several other mutations that appeared concurrently, as verified by their similar frequencies in one or more countries. The spike variant A222V went hand in hand with the A220V variant in the nucleocapsid, seen exclusively in the European nations. The F308Y substitution in NSP4 co-occurred with the S197L variation in the N protein, as observed in Spain and Australia. However, these variants did not prosper and disappeared over time. The substitutions G15S and T428I in NSP5 and NSP3, respectively, were observed in Peru, Canada, South Africa and New Zealand with appreciable frequencies. These mutations were gained after May but were decreasing as of October. The substitutions M234I and A376T in N, M324I in NSP4, A185S and V776L in NSP12, and K218R and E261D in NSP13 arose with identical frequencies after June and reached appreciable levels in Belgium, Denmark, France and Switzerland. NSP13 also showed simultaneous substitutions at residues 504 and 541, as seen in the USA. ORF3a similarly showed corresponding mutations at residues 172 and 202, observed in Belgium and Netherlands from July onward (Supplementary Table 3).
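One way to flag pairs of substitutions that occur with nearly identical frequencies in a country is to compare how often two substitutions appear together with how often either appears at all. The sketch below uses the Jaccard index for this purpose; the index, the 0.9 cut-off and the function name are assumptions made for illustration and are not described in the study.

```python
from collections import Counter
from itertools import combinations

def cooccurring_pairs(isolates, min_freq=0.10, min_jaccard=0.9):
    """Return {(sub_a, sub_b): jaccard} for strongly co-occurring substitutions.

    isolates: list of sets of substitution labels from one country.
    Only substitutions present in at least `min_freq` of isolates are paired.
    """
    n = len(isolates)
    if n == 0:
        return {}
    single = Counter(s for subs in isolates for s in set(subs))
    frequent = {s for s, c in single.items() if c / n >= min_freq}
    joint = Counter()
    for subs in isolates:
        present = sorted(set(subs) & frequent)
        joint.update(combinations(present, 2))
    pairs = {}
    for (a, b), together in joint.items():
        # Jaccard index: isolates with both / isolates with either.
        jaccard = together / (single[a] + single[b] - together)
        if jaccard >= min_jaccard:
            pairs[(a, b)] = jaccard
    return pairs
```

In a toy set where R203K and G204R always appear together but D614G is present in twice as many isolates, only the R203K/G204R pair is returned.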
Mapping the frequently mutated residues with spike protein epitopes
We mapped the frequently mutated S protein residues that were prevalent in one or more countries onto spike protein epitopes obtained from 11 studies (Figure 5, Supplementary Table 5A & B). Among the studies reporting experimentally determined epitopes, one study [31] showed no overlap with any frequently mutated residue. Each of the remaining reports contained at least one site that overlapped with the observed mutations in the S protein. The experimentally determined epitopes, the mutations located within them and the corresponding countries are listed in Supplementary Table 6. Bimonthly frequencies were calculated for each of these mutations (Supplementary Table 7) to trace their progression over time and to identify mutants that might seem innocuous in the early stages but could pose a threat to vaccine potency if left unchecked. Most of the studies predicted epitopes within amino acid positions 500-700 of the S protein (Figure 5). This region contained notable mutational changes, chief among them substitutions at positions 614 and 626. The A626S variant exhibited contrasting trends in Czech Republic as opposed to Norway, Italy and Denmark. In all of these countries the mutant emerged from July onward, but it diminished sharply after peaking in Czech Republic while continuing to rise in the remaining countries. Other noteworthy mutations that may compromise vaccine effectiveness include T632I (New Zealand), G639S (Italy), E654A (Ireland), Q675H (Germany), Q677H (Sweden, UAE) and P681H (New Zealand), all of which increased in the months after travel measures were imposed, albeit at varying rates. Moreover, the V1176F variant in Brazil warrants immediate attention, as it accumulated substantially over the span of 10 months.
Mutations D1163Y and G1167V, both identified in Spain, should not be discounted, as they were also on the rise as of October and lie within a putative epitope outlined by at least two studies. Another variant, P1162S, located within epitopes predicted by these studies, was observed with a frequency greater than 10% in Portugal around the time of the travel measures, but the lack of sufficient sequencing data in the later months prevents us from following its evolution. Another substitution, S459Y, which rose above 10% frequency in Czech Republic, also overlapped with epitopes predicted by two studies. However, this variant had subsided in abundance by October. A single epitope contained the P272L substitution, which was rapidly gaining ground in many European nations. Similarly, the S477N variant, present in one epitopic site, was already abundant in Australia and numerous European nations. Other epitope-resident mutations with an upward trend included V483A in UAE, seen only after September, F486L in Netherlands, and V90F in Norway and Switzerland, seen from July onward.
The in-silico epitope predictions likewise overlapped with several substitutions, the details of which are presented in Supplementary Table 8. These included L18F, which was increasing in the UK, A222V, which was spreading rapidly across Europe, and M1229I, specific to Czech Republic. Among other substitutions, the S98F variant, as seen previously, showed an upward trajectory that was most noticeable in Belgium and Netherlands; the M153T and S459Y variants were found in Japan and Czech Republic, respectively; and S477N was a major mutant that had spread across the European countries.

Figure 5. Illustration of the predicted spike protein epitopes that overlapped with one or more frequently mutated S protein residues. The countries in which they were found to be prevalent after the imposition of travel measures are indicated alongside. The residues with frequencies greater than 10% are highlighted along with the corresponding countries.

Discussion
We carried out a thorough country-wise characterization of the mutational landscape of SARS-CoV-2. Comparing the mutational profiles of individual countries before and after the cross-country travel regulations allowed us to follow the evolutionary trajectories of pre-existing as well as newly emerged variants. The S protein D614G variant, which has been reported to have higher transmission rates and stability than the D variant [3-5], was found to have reached close to 100% prevalence in most countries. Other previously reported variations in N, ORF3a and ORF8 also prevailed in multiple nations to varying degrees, which could be attributed to the diverse ethnic backgrounds or environmental conditions of the different regions. Several mutations that had appeared before April in certain countries were wiped out from the endemic populations after a brief period of dominance. One explanation for this phenomenon could be that the resulting variations greatly diminished the stability or transmissibility of the new SARS-CoV-2 variants, which led to their gradual elimination. Another hypothesis is that the new mutations were highly lethal, so that individuals contracting these variants were placed under stringent quarantine, thereby preventing the propagation of such variants in larger numbers. Other factors, such as the frequency of international travel to and from these countries during the period concerned, as well as the extent of travel measures in these countries, could also have influenced the fate of these variants. One could try to correlate the period of prevalence of these mutations with the mortality rates of the associated countries in order to test this speculation. We were essentially interested in studying whether the imposition of travel measures could lead to the appearance of country-specific mutations.
We are aware that inter-country travel restrictions are neither rigid nor infallible. A thorough model of inter-country travel restrictions would be multifactorial, including pharmaceutical interventions, the implementation of and level of adherence to COVID-19 safety protocols, and national and international travel restrictions at different time points during the pandemic, among other factors. We wanted to examine the temporal variation of the mutational profile of the virus and whether country-specific mutations, specifically in the spike protein, could affect vaccine efficacy. Our study provides a rough estimate of whether travel restrictions could play a role in shaping country-specific mutational profiles. We observed several mutations exclusive to specific countries that emerged in the later months of the pandemic. Several new strains originated after May and increased with the passage of months. Particular attention should be drawn to the European countries, which have been the epicentre of a few major variants, especially the spike protein variants A222V and S477N and the N protein substitutions A220V, M234I and A376T. From the bimonthly frequency data, we could clearly track the upward progress of these variants, and with the recent relaxation of travel measures these strains could pervade the globe in a short time.
Extensive efforts to design safe and efficacious vaccines have been in full swing since the early days of the pandemic. Structural proteins, especially the spike glycoprotein, are the preferred vaccine targets, as they are directly involved in fusion with and entry into host cells. So far, most mRNA vaccines have been designed to target the receptor-binding domain of the spike glycoprotein, which has been shown to elicit an immune response and antibody production in murine cell models and in the sera of humans who had contracted the disease. However, other proteins such as nucleocapsid, ORF6, ORF8 and NSP3 can also be viewed as potential candidates when designing multi-epitope vaccines against the virus [8,37-39]. As of December 2020, as many as 55 vaccines had entered clinical trials, with many more still in preliminary stages. The frontrunners that have gained approval include the mRNA-based vaccines BNT162b2 [40], developed by BioNTech/Pfizer, and mRNA-1273 [41], manufactured by ModernaTX Inc.; CoronaVac [9], which uses an inactivated form of the virus; and Sputnik V [42,43]. However, since the spike protein comprises only a small fraction of the viral proteome, epitopes focusing on this single protein may not be as effective in eliciting a strong immune response as a multi-epitope vaccine designed to target multiple viral proteins. Several SARS-CoV-2 proteins, including S, N, NSP5 (Mpro) and NSP12 (RdRp), have been successfully crystallized [18,44-47], and these structures can be used in engineering therapeutic agents. Several bioinformatic approaches and in-silico tools have been developed that predict peptides that could be used as epitopes to induce an immune response against the virus. However, novel variants that were uncovered only after these predictions were made would not have been taken into account.
We have intersected the predicted epitopes from a total of 11 independent studies, eight of them corroborated by experimental results, with the mutated sites. A handful of the epitopes coincided with highly mutable sites, and several overlapped with mutations that originated in the months following travel limitations and were on the rise thereafter. Among these, the residues at positions 626 and 477 need close supervision when selecting epitopes as vaccine candidates.
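A minimal sketch of such an intersection is given below, assuming each predicted epitope is represented as a 1-based, inclusive range of spike residue positions and each mutation as a simple substitution string. The epitope names and ranges are hypothetical placeholders, not epitopes from the 11 studies, and the helper names are illustrative only.

```python
import re

def mutation_position(mutation: str) -> int:
    """Extract the residue position from a substitution such as 'S477N'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    return int(match.group(1))

def overlapping_epitopes(mutations, epitopes):
    """Map each mutation to the names of the epitopes whose range covers it."""
    hits = {}
    for mutation in mutations:
        pos = mutation_position(mutation)
        hits[mutation] = [
            name for name, (start, end) in epitopes.items() if start <= pos <= end
        ]
    return hits

# Hypothetical epitope ranges (1-based, inclusive spike residue coordinates)
epitopes = {
    "epitope_1": (470, 485),
    "epitope_2": (480, 495),
    "epitope_3": (600, 620),
}

hits = overlapping_epitopes(["S477N", "D614G", "A222V"], epitopes)
for mutation, names in hits.items():
    print(mutation, "->", names if names else "no overlapping epitope")
```

A mutation that falls inside several overlapping epitopes is reported once per epitope, which matches the idea that a single substitution can affect more than one predicted site.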
The vaccines which recruit the whole virus in inactivated form, such as PicoVacc [43] and BBIBP-CorV [12,48], can code for the structural as well as accessory viral proteins in the host cells, which is desirable as they present multiple epitopic sites for antibody production. Although the M and E proteins are largely conserved, they have been found to display poor immunogenicity [49]. The N protein, on the other hand, though found to be highly immunogenic, failed to provide adequate protection against SARS-CoV-2 when tested in mouse models [50]. We have found several mutations within the N protein in diverse countries, chief among them being the modifications at positions 13, 220, 234 and 376, which should not be disregarded. A point that needs to be highlighted here is that the antibodies induced by the vaccines are polyclonal in nature. Therefore, even if one of the epitopic sites is rendered ineffective owing to mutational changes, the other epitopic sites should still come into play in combating the virus, rather than the antibodies becoming entirely ineffective. Only the degree of efficacy of the vaccines may vary among nations. Moreover, new viral strains are continually cropping up in different countries, with a novel, highly contagious strain recently reported in the UK [51]. We looked for the N501Y mutation, a key substitution within the UK strain, in the months of September and October, and found it in only three countries, with the highest frequency in the UK (0.5%), followed by South Africa (0.22%) and the USA (0.03%). It is therefore plausible that the meteoric rise of this variant began around December.
Looking for any other emerging S protein mutations with frequencies above 2% in September and October, the final months covered by our study, we detected several new substitutions in various countries, with the variants L179F and Q913H in Japan and F486L in the Netherlands having frequencies above 10% (Supplementary Table 9). These mutants had not been observed at noteworthy frequencies in the preceding months and came into prominence only toward the later period. As we cannot state with any degree of certainty the evolutionary course these variants will follow, countries should actively monitor these sites in the coming months. If any of these mutated sites are inadvertently incorporated among the epitopes of future vaccines, they may render the vaccines partly ineffective in countries where those variants are thriving. Now that there have been considerable relaxations in inter-country migration, other nations should also follow the status of these mutations to confirm whether the associated amino acid residues remain unchanged. We conclude with the hope that the worst days of the pandemic are behind us, and await the timely distribution of safe and effective vaccines while continuing to follow adequate social distancing norms around the world.

Conclusion
The COVID-19 pandemic caused by the novel coronavirus SARS-CoV-2 continues to rage around the world in varying degrees. The virus is constantly evolving, and reports of new viral strains continue to emerge from various geographical regions. We have found several novel variants of the virus in different countries, some of which emerged toward the latter half of 2020. With inter-country travel restrictions being lifted, these novel variants could spread across the globe within a short span of time. As the administration of vaccines commences throughout the world, care should be taken that these novel mutations do not curb the effectiveness of the vaccines. We have shown that several of the frequent spike mutations lie within one or more of the predicted epitopic sites. Our study underscores the need for active monitoring of spike protein mutations in the respective countries with regard to vaccine efficacy.

Summary points
• In order to curb the spread of the COVID-19 pandemic, countries all over the world had implemented international travel restrictions.
• To examine the role of the travel measures in shaping the mutational profile of SARS-CoV-2, we tabulated the viral protein mutations encompassing 215,000 sequences across 35 countries before and after the travel restrictions.
• Three distinct classes of mutations were identified in the two time periods, namely those that had gained stability, those that had dwindled in abundance and others that had emerged post travel restrictions and were on the rise.
• Notable mutations included A222V and S477N in spike protein and A220V in the nucleocapsid, which were observed exclusive to European countries post travel curtailments.
• Bimonthly frequencies of the highly mutated residues revealed the evolutionary trends of these mutations over time.
• Spike protein mutations overlapped with several predicted epitopes, advocating close monitoring of the mutations for effective vaccine development.

Supplementary data
To view the supplementary data that accompany this paper please visit the journal website at: www.futuremedicine.com/doi/suppl/10.2217/fvl-2021-0062

Author contributions
R Chatterjee conceptualized the study. S Laha performed the data analysis, S Laha and R Chatterjee wrote the manuscript.

Financial & competing interests disclosure
This work was supported by funding from the Indian Statistical Institute, Kolkata, and the CSIR provided a fellowship to S Laha. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.