Identification of potential vaccine targets for COVID‐19 by combining single‐cell and bulk TCR sequencing

Dear Editor,

COVID-19 is a highly infectious novel pneumonia that has become the largest health crisis in modern history. T cells participate in recognizing and clearing viral infections and also help B cells to produce antibodies. This pivotal role of T cells in immunity makes them ideal targets for studying the immune response in COVID-19. Here, we use scRNA-seq, scTCR-seq, deep TCR-seq, and HLA genotyping to decode the matching T cell phenotypes and antigenic epitopes of 16 early-recovery COVID-19 patients in the context of HLA haplotypes.

We collected fresh blood samples from 16 early-recovery patients with COVID-19 (Table S1). PBMCs were isolated for subsequent data generation: (1) single-cell transcriptome sequencing, (2) single-cell TCR sequencing, (3) deep TCR repertoire sequencing, and (4) HLA genotyping (Figure 1A and Table S1). Overall, we obtained single-cell gene expression data from 26 223 T cells, single-cell paired αβ TCRs from 27 467 T cells, and hypervariable regions of immune receptors from 4.9 million TCR clones (Table S2). In addition, high-resolution HLA typing was performed by sequencing five HLA genes: HLA-A, B, C, DRB1, and DQB1 (Table S3). To reveal the T cell response changes caused by COVID-19, 8 healthy controls (healthy cohort 1) with scRNA-seq profiled by 10x Genomics and 31 healthy controls (healthy cohort 2) with deep TCR-seq were included in this study (Figure S1).

Using unsupervised clustering, we identified 13 distinct T cell types (Figure 1B), including naïve CD4+ T cells (Naïve CD4), Th1 cells (Th1), Th17 cells (Th17), Tfh cells (Tfh), cytotoxic CD4+ T cells (CD4 CTL), regulatory T cells (Tregs), naïve CD8+ T cells (Naïve CD8), memory CD8+ T cells (Memory CD8), transitional CD8+ T cells (Transitional CD8), terminal effector CD8+ T cells (Effector CD8), gamma-delta T cells (gdT), MAIT cells (MAIT), and platelet-like cells (Platelets). Cells were annotated by SingleR [1]. Classical marker genes were used to distinguish the different cell types (Figure 1D, Figure S2).

Of all the T cell types identified in this study, the proportions of Transitional CD8, Effector CD8, and CD4 CTL cells in COVID-19 patients were significantly higher than those in healthy controls (P = 1.91 × 10⁻⁴, 2.52 × 10⁻², and 2.98 × 10⁻³, respectively), whereas the proportion of Naïve CD4 cells was significantly lower (P = 1.24 × 10⁻²) (Figure 1C). Single-cell repertoire analysis demonstrated that larger clonotypes exhibited a non-uniform distribution of cell types, with an enrichment for cytotoxic T cells such as Transitional CD8, Effector CD8, and CD4 CTL (Figure 2). Interestingly, Effector CD8 and CD4 CTL cells in our study also co-expressed the resident memory marker ZNF683 [2] and the tissue exit marker S1PR5 [3] (Figure 1D). According to Ref. [4], these cells might be cytotoxic T cells that recently egressed from tissues (such as lung tissue) and re-entered circulation, an observation that awaits further experimental validation.
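The clonotype-level observation (larger clonotypes skewing toward cytotoxic cell types) can be illustrated with a minimal sketch. It assumes each cell carries a clonotype identifier and a cell-type label; the function name `clonotype_composition`, the `min_size` threshold, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

def clonotype_composition(cells, min_size=2):
    """Return the cell-type fractions among cells in expanded clonotypes.

    cells: iterable of (clonotype_id, cell_type) pairs, one pair per cell.
    min_size: hypothetical threshold; a clonotype with at least this many
              cells is treated as expanded.
    """
    cells = list(cells)
    # Clonotype size = number of cells sharing the same clonotype identifier.
    sizes = Counter(clone for clone, _ in cells)
    # Count cell types only among cells that belong to expanded clonotypes.
    expanded = Counter(ct for clone, ct in cells if sizes[clone] >= min_size)
    total = sum(expanded.values())
    return {ct: n / total for ct, n in expanded.items()} if total else {}

# Toy data (illustrative only): two expanded clonotypes and two singletons.
toy_cells = [
    ("c1", "Effector CD8"), ("c1", "Effector CD8"), ("c1", "Transitional CD8"),
    ("c2", "CD4 CTL"), ("c2", "CD4 CTL"),
    ("c3", "Naive CD4"), ("c4", "Naive CD4"),
]
composition = clonotype_composition(toy_cells, min_size=2)
```

In this toy input, the singleton naïve cells drop out once the size threshold is applied, which is the kind of skew summarized in Figure 2.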
To investigate whether the TCR repertoires of recovered COVID-19 patients differ from those of healthy individuals, we compared the deep TCR-seq data between patients with COVID-19 and controls. At the repertoire level, T cell diversity in COVID-19 patients was significantly lower than that in controls (P < 0.0001, Figure 3A), consistent with clonal expansion upon antigen exposure. Clustering TCRs with similar CDR3s is an effective approach to identifying antigen-specific T cells [5][6][7], as TCRs that share similar motifs across distinct individuals may also share antigen specificity. Through TCR clustering, we detected 29 409 TCR groups (Table S4). Interestingly, COVID-19 patients shared more TCR groups than healthy controls (Figure 3B). To obtain patient-specific TCRs, we searched for CDR3 groups significantly enriched in the COVID-19 cases and identified a total of 916 groups (FDR < 0.05, Table S5), which are referred to as 'COVID-19 TCR groups' in the downstream analysis. These groups of T cells are enriched for activated T cells, specifically the Transitional CD8, Effector CD8, and CD4 CTL subtypes.

The identification of COVID-19 TCR groups also allowed us to uncover candidate antigenic epitopes from the virus genome. In total, 866 9-mer peptides from 11 SARS-CoV-2 proteins were computationally predicted to bind the patient HLA alleles profiled in our study (Table S6). We examined the peptides and CDR3 groups found in multiple individuals and identified 1602 co-occurring TCR-antigen pairs that were significantly shared by the same patients, covering 31 CDR3 groups and 114 peptides (FDR < 0.05, Table S7). Of these, we identified two pairs, each with a single TCR group and a single antigen (FDR < 0.001, Figure …). […] to multiple epitopes (Figure 4A), which demonstrated similar motifs in the TCR contact regions [8]. We next located all 114 peptides in the virus genome and found that more than 91% were distributed in the proteins ORF1ab, S, N, and ORF3a (Figure 4B), with ORF3a showing significant epitope enrichment (P = 0.0056, binomial test). In summary, our analysis revealed a number of candidate peptides as promising targets for COVID-19 vaccine development.

[Figure caption fragment: Peptides presented by MHC-I with high affinity that significantly co-occur with a COVID-19 TCR group are colored red.]
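The binomial test for epitope enrichment in a single protein can be sketched as follows. This is a minimal illustration, assuming a one-sided exact test in which each candidate peptide falls in the protein of interest with a background probability `p` (for example, the protein's share of the proteome length). The letter does not state the sidedness or the background model, and the counts below are hypothetical, not the study's values.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p).

    k: number of candidate peptides observed in the protein of interest
    n: total number of candidate peptides
    p: background probability that a peptide falls in that protein
       (an assumed null model, not specified in the letter)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts for illustration: 5 of 20 peptides fall in a protein
# that would be expected to receive 10% of them under the null model.
p_value = binomial_upper_tail(k=5, n=20, p=0.1)
```

A small upper-tail probability indicates that the protein carries more candidate epitopes than its background share alone would predict.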
This study provides an effective solution for identifying potential antigenic peptides based on large-scale TCR repertoire sequencing and HLA typing. The combined use of single-cell and deep TCR sequencing provided us with single-cell resolution and also enabled us to obtain millions of immune receptors [9]. However, highly abundant T cell clones may not be disease-specific [10]. With this strategy, we grouped similar TCRs to search for evidence of convergent selection in patients. Mapping these receptors to single-cell data identified novel T cell phenotypes specific to recovered COVID-19 patients. In addition, with HLA genotyping, we were able to provide individualized TCR epitopes, which allowed us to investigate their associations with recurrent TCR groups across different individuals. This method led to statistically confident antigen targets and provides guidance for efficient mRNA vaccine design. We hope that our findings and immune receptor datasets will inform the development of next-generation vaccines that can better activate natural T cell immunity against COVID-19.
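The idea of grouping similar TCRs can be illustrated with a deliberately simple sketch. It assumes that two CDR3 amino-acid sequences of equal length belong to the same group when they differ at no more than one position, and it merges such pairs transitively. This rule, the function name `group_cdr3s`, and the toy sequences are assumptions for illustration; the clustering method actually used in the study is the one described in the cited references and may differ.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def group_cdr3s(cdr3s, max_mismatches=1):
    """Single-linkage grouping of CDR3 sequences (hypothetical rule).

    Sequences are linked when they have the same length and differ at no
    more than max_mismatches positions; linked sequences share a group.
    """
    seqs = sorted(set(cdr3s))
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if len(a) == len(b) and hamming(a, b) <= max_mismatches:
            parent[find(a)] = find(b)

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

# Toy CDR3 sequences (illustrative only).
toy = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGGAYEQYF", "CASSPDRGNTEAFF"]
groups = group_cdr3s(toy)
```

The pairwise comparison is quadratic in the number of sequences, so a repertoire of millions of clones would need an indexed or motif-based approach instead; the sketch only conveys the grouping idea.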

ACKNOWLEDGMENT
This work was supported by funding from the National Natural Science Foundation of China (Nos. 61822108 and 62032007 to Q.J.).

COMPETING INTERESTS
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
Single-cell transcriptome sequencing data were deposited in Zenodo (https://doi.org/10.5281/zenodo.3747336). Custom scripts used in this study are available from the corresponding author upon request.

Pingping Wang, Zhaochun Xu, and Wenyang Zhou contributed equally as first authors.