Virus report

Virus record identifiers, sample information, genomic locations, and products

The downloaded virus package contains a virus data report in JSON Lines format in the file:

ncbi_dataset/data/data_report.jsonl
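In JSON Lines, each line holds one complete JSON object, so the file can be read one record at a time. The sketch below uses only the Python standard library and the path given above. It assumes the package has already been extracted into the current directory.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator

# Path inside the extracted virus data package, as given in the text above.
REPORT_PATH = Path("ncbi_dataset/data/data_report.jsonl")


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line (JSON Lines)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def read_virus_report(path: Path) -> Iterator[dict]:
    """Stream virus records from a data_report.jsonl file."""
    with path.open(encoding="utf-8") as handle:
        yield from iter_records(handle)


if __name__ == "__main__" and REPORT_PATH.exists():
    for record in read_virus_report(REPORT_PATH):
        print(record.get("accession"))
```

Because the file is read line by line, memory use stays proportional to a single record rather than to the whole report.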

Each line of the virus data report file is a hierarchical JSON object that represents a single virus record. The schema of the virus record is defined in the tables below, where each row describes either a single field in the report or a sub-structure (a collection of fields). The outermost structure of the report is VirusAssembly.
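To illustrate the hierarchy, the sample report below nests mature peptides under annotation, then genes, then cds, then maturePeptide. The sketch below walks that path for one parsed record. The field names are taken from the sample report, and missing sub-structures are treated as empty. It also assumes that range begin/end values are 1-based and inclusive. This is an inference from the sample, not a documented guarantee: the RNA-dependent RNA polymerase has two overlapping ranges (13442-13468 and 13468-16236), which sum to 2,796 nt, or 932 codons, matching its polyprotein_range of 4393-5324.

```python
from typing import Iterator, List, NamedTuple, Tuple


class MaturePeptide(NamedTuple):
    cds_name: str
    accession: str
    name: str
    ranges: List[Tuple[int, int]]  # (begin, end) pairs on the genome sequence


def iter_mature_peptides(record: dict) -> Iterator[MaturePeptide]:
    """Walk annotation -> genes -> cds -> maturePeptide in one virus record."""
    for gene in record.get("annotation", {}).get("genes", []):
        for cds in gene.get("cds", []):
            for peptide in cds.get("maturePeptide", []):
                nucleotide = peptide.get("nucleotide", {})
                # begin/end are serialized as JSON strings in the sample report.
                ranges = [
                    (int(r["begin"]), int(r["end"]))
                    for r in nucleotide.get("range", [])
                ]
                yield MaturePeptide(
                    cds.get("name", ""),
                    peptide.get("accession", ""),
                    peptide.get("name", ""),
                    ranges,
                )


def nucleotide_length(ranges: List[Tuple[int, int]]) -> int:
    """Total length of all ranges, ASSUMING 1-based inclusive coordinates."""
    return sum(end - begin + 1 for begin, end in ranges)
```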

Fields that have a Table Field Mnemonic can be passed to the --fields option of the dataformat command-line tool. See the dataformat CLI tool reference for how to use this tool to convert virus data reports from JSON Lines to tabular formats.
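This page points to dataformat for producing tables. If you only need a quick custom summary in Python, a minimal hand-rolled sketch can flatten one row per CDS instead. It uses fields visible in the sample report: name, protein.accessionVersion, uniProtKb.id, and maturePeptide. The column names are hypothetical and are not dataformat field mnemonics, and the output is not dataformat's format. Use dataformat when you need its documented columns.

```python
import csv
import io
from typing import Iterable

# Hypothetical column names for this sketch only; NOT dataformat mnemonics.
COLUMNS = [
    "record_accession",
    "cds_name",
    "protein_accession",
    "uniprotkb_id",
    "mature_peptide_count",
]


def cds_summary_tsv(records: Iterable[dict]) -> str:
    """Return a TSV string with one row per CDS across all virus records."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for record in records:
        for gene in record.get("annotation", {}).get("genes", []):
            for cds in gene.get("cds", []):
                writer.writerow([
                    record.get("accession", ""),
                    cds.get("name", ""),
                    cds.get("protein", {}).get("accessionVersion", ""),
                    cds.get("uniProtKb", {}).get("id", ""),  # absent -> empty cell
                    len(cds.get("maturePeptide", [])),
                ])
    return buffer.getvalue()
```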

Sample report

{
  "accession": "NC_045512.2",
  "annotation": {
    "genes": [
      {
        "cds": [
          {
            "maturePeptide": [
              {
                "accession": "YP_009725297.1",
                "cdd": [
                  {
                    "accession": "CDD:288369",
                    "name": "Non structural protein Nsp1",
                    "range": {
                      "begin": "13",
                      "end": "127"
                    }
                  }
                ],
                "name": "leader protein",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "266",
                      "end": "805"
                    }
                  ],
                  "seqId": "NC_045512.2:266-805",
                  "sequenceHash": "BFEE0830",
                  "title": "leader protein [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "leader protein",
                  "non-structural protein 1",
                  "nonstructural protein 1",
                  "nsp1"
                ],
                "protein": {
                  "accessionVersion": "YP_009725297.1",
                  "seqId": "YP_009725297.1",
                  "sequenceHash": "DFF407F9",
                  "title": "leader protein [polyprotein_range=YP_009724389.1:1-180] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725298.1",
                "name": "nsp2",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "806",
                      "end": "2719"
                    }
                  ],
                  "seqId": "NC_045512.2:806-2719",
                  "sequenceHash": "E741D86",
                  "title": "nsp2 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 2",
                  "nonstructural protein 2",
                  "nsp2"
                ],
                "protein": {
                  "accessionVersion": "YP_009725298.1",
                  "seqId": "YP_009725298.1",
                  "sequenceHash": "58F71ADE",
                  "title": "nsp2 [polyprotein_range=YP_009724389.1:181-818] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725299.1",
                "cdd": [
                  {
                    "accession": "CDD:289172",
                    "name": "Protein of unknown function (DUF3655)",
                    "range": {
                      "begin": "102",
                      "end": "169"
                    }
                  },
                  {
                    "accession": "CDD:366746",
                    "name": "Macro domain",
                    "range": {
                      "begin": "240",
                      "end": "340"
                    }
                  },
                  {
                    "accession": "CDD:314498",
                    "name": "Single-stranded poly(A) binding domain",
                    "range": {
                      "begin": "533",
                      "end": "675"
                    }
                  },
                  {
                    "accession": "CDD:288939",
                    "name": "Coronavirus polyprotein cleavage domain",
                    "range": {
                      "begin": "680",
                      "end": "743"
                    }
                  },
                  {
                    "accession": "CDD:370080",
                    "name": "Papain like viral protease",
                    "range": {
                      "begin": "746",
                      "end": "1064"
                    }
                  },
                  {
                    "accession": "CDD:292868",
                    "name": "Nucleic acid-binding domain (NAR)",
                    "range": {
                      "begin": "1089",
                      "end": "1201"
                    }
                  },
                  {
                    "accession": "CDD:391938",
                    "name": "even-transmembrane G protein-coupled receptor",
                    "range": {
                      "begin": "1494",
                      "end": "1563"
                    }
                  },
                  {
                    "accession": "CDD:341315",
                    "name": "TM helix 5 [structural motif]",
                    "range": {
                      "begin": "1497",
                      "end": "1519"
                    }
                  },
                  {
                    "accession": "CDD:341315",
                    "name": "TM helix 6 [structural motif]",
                    "range": {
                      "begin": "1527",
                      "end": "1551"
                    }
                  }
                ],
                "name": "nsp3",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "2720",
                      "end": "8554"
                    }
                  ],
                  "seqId": "NC_045512.2:2720-8554",
                  "sequenceHash": "6A235ABB",
                  "title": "nsp3 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 3",
                  "nonstructural protein 3",
                  "nsp3"
                ],
                "pdbIds": [
                  "6WEY",
                  "6W9C"
                ],
                "protein": {
                  "accessionVersion": "YP_009725299.1",
                  "seqId": "YP_009725299.1",
                  "sequenceHash": "21B55819",
                  "title": "nsp3 [polyprotein_range=YP_009724389.1:819-2763] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725300.1",
                "cdd": [
                  {
                    "accession": "CDD:374495",
                    "name": "Coronavirus nonstructural protein 4 C-terminus",
                    "range": {
                      "begin": "406",
                      "end": "498"
                    }
                  }
                ],
                "name": "nsp4",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "8555",
                      "end": "10054"
                    }
                  ],
                  "seqId": "NC_045512.2:8555-10054",
                  "sequenceHash": "4BA01958",
                  "title": "nsp4 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 4",
                  "nonstructural protein 4",
                  "nsp4"
                ],
                "protein": {
                  "accessionVersion": "YP_009725300.1",
                  "seqId": "YP_009725300.1",
                  "sequenceHash": "2C781714",
                  "title": "nsp4 [polyprotein_range=YP_009724389.1:2764-3263] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725301.1",
                "cdd": [
                  {
                    "accession": "CDD:368429",
                    "name": "Coronavirus endopeptidase C30",
                    "range": {
                      "begin": "29",
                      "end": "306"
                    }
                  }
                ],
                "name": "3C-like proteinase",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "10055",
                      "end": "10972"
                    }
                  ],
                  "seqId": "NC_045512.2:10055-10972",
                  "sequenceHash": "C28D0EE5",
                  "title": "3C-like proteinase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "3C-like proteinase",
                  "3CLpro",
                  "Mpro",
                  "main proteinase",
                  "non-structural protein 5",
                  "nonstructural protein 5",
                  "nsp5A_3CLpro",
                  "nsp5B_3CLpro"
                ],
                "protein": {
                  "accessionVersion": "YP_009725301.1",
                  "seqId": "YP_009725301.1",
                  "sequenceHash": "5CE30DBB",
                  "title": "3C-like proteinase [polyprotein_range=YP_009724389.1:3264-3569] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725302.1",
                "name": "nsp6",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "10973",
                      "end": "11842"
                    }
                  ],
                  "seqId": "NC_045512.2:10973-11842",
                  "sequenceHash": "99170F63",
                  "title": "nsp6 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 6",
                  "nonstructural protein 6",
                  "nsp6"
                ],
                "protein": {
                  "accessionVersion": "YP_009725302.1",
                  "seqId": "YP_009725302.1",
                  "sequenceHash": "A72B0D80",
                  "title": "nsp6 [polyprotein_range=YP_009724389.1:3570-3859] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725303.1",
                "cdd": [
                  {
                    "accession": "CDD:285878",
                    "name": "nsp7 replicase",
                    "range": {
                      "begin": "1",
                      "end": "83"
                    }
                  }
                ],
                "name": "nsp7",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "11843",
                      "end": "12091"
                    }
                  ],
                  "seqId": "NC_045512.2:11843-12091",
                  "sequenceHash": "DDAA03C2",
                  "title": "nsp7 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 7",
                  "nonstructural protein 7",
                  "nsp7"
                ],
                "protein": {
                  "accessionVersion": "YP_009725303.1",
                  "seqId": "YP_009725303.1",
                  "sequenceHash": "A87703C6",
                  "title": "nsp7 [polyprotein_range=YP_009724389.1:3860-3942] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725304.1",
                "cdd": [
                  {
                    "accession": "CDD:285879",
                    "name": "nsp8 replicase",
                    "range": {
                      "begin": "1",
                      "end": "198"
                    }
                  }
                ],
                "name": "nsp8",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "12092",
                      "end": "12685"
                    }
                  ],
                  "seqId": "NC_045512.2:12092-12685",
                  "sequenceHash": "759508E4",
                  "title": "nsp8 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 8",
                  "nonstructural protein 8",
                  "nsp8"
                ],
                "protein": {
                  "accessionVersion": "YP_009725304.1",
                  "seqId": "YP_009725304.1",
                  "sequenceHash": "27D30877",
                  "title": "nsp8 [polyprotein_range=YP_009724389.1:3943-4140] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725305.1",
                "cdd": [
                  {
                    "accession": "CDD:285872",
                    "name": "nsp9 replicase",
                    "range": {
                      "begin": "1",
                      "end": "113"
                    }
                  }
                ],
                "name": "nsp9",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "12686",
                      "end": "13024"
                    }
                  ],
                  "seqId": "NC_045512.2:12686-13024",
                  "sequenceHash": "66A7051A",
                  "title": "nsp9 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 9",
                  "nonstructural protein 9",
                  "nsp9",
                  "ssRNA-binding protein"
                ],
                "protein": {
                  "accessionVersion": "YP_009725305.1",
                  "seqId": "YP_009725305.1",
                  "sequenceHash": "1A720513",
                  "title": "nsp9 [polyprotein_range=YP_009724389.1:4141-4253] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725306.1",
                "cdd": [
                  {
                    "accession": "CDD:286486",
                    "name": "RNA synthesis protein NSP10",
                    "range": {
                      "begin": "12",
                      "end": "131"
                    }
                  }
                ],
                "name": "nsp10",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "13025",
                      "end": "13441"
                    }
                  ],
                  "seqId": "NC_045512.2:13025-13441",
                  "sequenceHash": "4B5A0671",
                  "title": "nsp10 [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "GFL",
                  "growth-factor-like protein",
                  "non-structural protein 10",
                  "nonstructural protein 10",
                  "nsp10"
                ],
                "protein": {
                  "accessionVersion": "YP_009725306.1",
                  "seqId": "YP_009725306.1",
                  "sequenceHash": "839705B4",
                  "title": "nsp10 [polyprotein_range=YP_009724389.1:4254-4392] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725307.1",
                "cdd": [
                  {
                    "accession": "CDD:284009",
                    "name": "Coronavirus RPol N-terminus",
                    "range": {
                      "begin": "14",
                      "end": "366"
                    }
                  }
                ],
                "name": "RNA-dependent RNA polymerase",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "13442",
                      "end": "13468"
                    },
                    {
                      "begin": "13468",
                      "end": "16236"
                    }
                  ],
                  "seqId": "NC_045512.2:13442-13468,13468-16236",
                  "sequenceHash": "74392C07",
                  "title": "RNA-dependent RNA polymerase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "NiRAN",
                  "RNA-dependent RNA polymerase",
                  "RdRp",
                  "non-structural protein 12",
                  "nonstructural protein 12",
                  "nsp12"
                ],
                "pdbIds": [
                  "7BV2"
                ],
                "protein": {
                  "accessionVersion": "YP_009725307.1",
                  "seqId": "YP_009725307.1",
                  "sequenceHash": "6D522979",
                  "title": "RNA-dependent RNA polymerase [polyprotein_range=YP_009724389.1:4393-5324] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725308.1",
                "cdd": [
                  {
                    "accession": "CDD:350692",
                    "name": "DEXXQ-box helicase domain of Upf1-like helicase",
                    "range": {
                      "begin": "272",
                      "end": "443"
                    }
                  },
                  {
                    "accession": "CDD:224037",
                    "name": "Superfamily I DNA and/or RNA helicase [Replication, recombination and repair]",
                    "range": {
                      "begin": "323",
                      "end": "592"
                    }
                  }
                ],
                "name": "helicase",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "16237",
                      "end": "18039"
                    }
                  ],
                  "seqId": "NC_045512.2:16237-18039",
                  "sequenceHash": "6CA71C02",
                  "title": "helicase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "helicase",
                  "non-structural protein 13",
                  "nonstructural protein 13"
                ],
                "protein": {
                  "accessionVersion": "YP_009725308.1",
                  "seqId": "YP_009725308.1",
                  "sequenceHash": "17B91B6E",
                  "title": "helicase [polyprotein_range=YP_009724389.1:5325-5925] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725309.1",
                "cdd": [
                  {
                    "accession": "CDD:284002",
                    "name": "pfam06471",
                    "range": {
                      "begin": "3",
                      "end": "527"
                    }
                  }
                ],
                "name": "3'-to-5' exonuclease",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "18040",
                      "end": "19620"
                    }
                  ],
                  "seqId": "NC_045512.2:18040-19620",
                  "sequenceHash": "444B1903",
                  "title": "3'-to-5' exonuclease [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "3'-to-5' exonuclease",
                  "non-structural protein 14",
                  "nonstructural protein 14",
                  "nsp14"
                ],
                "protein": {
                  "accessionVersion": "YP_009725309.1",
                  "seqId": "YP_009725309.1",
                  "sequenceHash": "8ED173E",
                  "title": "3'-to-5' exonuclease [polyprotein_range=YP_009724389.1:5926-6452] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725310.1",
                "cdd": [
                  {
                    "accession": "CDD:284002",
                    "name": "pfam06471",
                    "range": {
                      "begin": "1",
                      "end": "68"
                    }
                  }
                ],
                "name": "endoRNAse",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "19621",
                      "end": "20658"
                    }
                  ],
                  "seqId": "NC_045512.2:19621-20658",
                  "sequenceHash": "C4DC1059",
                  "title": "endoRNAse [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "endoRNAse",
                  "non-structural protein 15",
                  "nonstructural protein 15",
                  "nsp15"
                ],
                "protein": {
                  "accessionVersion": "YP_009725310.1",
                  "seqId": "YP_009725310.1",
                  "sequenceHash": "76160F5B",
                  "title": "endoRNAse [polyprotein_range=YP_009724389.1:6453-6798] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725311.1",
                "cdd": [
                  {
                    "accession": "CDD:368920",
                    "name": "Coronavirus NSP13",
                    "range": {
                      "begin": "2",
                      "end": "297"
                    }
                  }
                ],
                "name": "2'-O-ribose methyltransferase",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "20659",
                      "end": "21552"
                    }
                  ],
                  "seqId": "NC_045512.2:20659-21552",
                  "sequenceHash": "58050E29",
                  "title": "2'-O-ribose methyltransferase [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "2'-o-MT",
                  "2'-o-ribose methyltransferase",
                  "non-structural protein 16",
                  "nonstructural protein 16",
                  "nsp16",
                  "nsp16_OMT"
                ],
                "protein": {
                  "accessionVersion": "YP_009725311.1",
                  "seqId": "YP_009725311.1",
                  "sequenceHash": "D5F30D79",
                  "title": "2'-O-ribose methyltransferase [polyprotein_range=YP_009724389.1:6799-7096] [polyprotein=ORF1ab polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              }
            ],
            "name": "ORF1ab polyprotein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "266",
                  "end": "13468"
                },
                {
                  "begin": "13468",
                  "end": "21555"
                }
              ],
              "seqId": "NC_045512.2:266-13468,13468-21555",
              "sequenceHash": "9F004FDB",
              "title": "ORF1ab polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF1ab",
              "ORF1ab polyprotein",
              "open reading frame 1",
              "open reading frame 1ab",
              "orf1",
              "orf1b",
              "polyprotein 1ab",
              "pp1ab"
            ],
            "protein": {
              "accessionVersion": "YP_009724389.1",
              "range": [
                {
                  "begin": "1",
                  "end": "7096"
                }
              ],
              "seqId": "YP_009724389.1:1-7096",
              "sequenceHash": "89013D3D",
              "title": "ORF1ab polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTD1",
              "name": "Replicase polyprotein 1ab"
            }
          },
          {
            "maturePeptide": [
              {
                "accession": "YP_009742608.1",
                "cdd": [
                  {
                    "accession": "CDD:288369",
                    "name": "Non structural protein Nsp1",
                    "range": {
                      "begin": "13",
                      "end": "127"
                    }
                  }
                ],
                "name": "leader protein",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "266",
                      "end": "805"
                    }
                  ],
                  "sequenceHash": "BFEE0830"
                },
                "otherNames": [
                  "leader protein",
                  "non-structural protein 1",
                  "nonstructural protein 1",
                  "nsp1"
                ],
                "protein": {
                  "accessionVersion": "YP_009742608.1",
                  "seqId": "YP_009742608.1",
                  "sequenceHash": "DFF407F9",
                  "title": "leader protein [polyprotein_range=YP_009725295.1:1-180] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742609.1",
                "name": "nsp2",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "806",
                      "end": "2719"
                    }
                  ],
                  "sequenceHash": "E741D86"
                },
                "otherNames": [
                  "non-structural protein 2",
                  "nonstructural protein 2",
                  "nsp2"
                ],
                "protein": {
                  "accessionVersion": "YP_009742609.1",
                  "seqId": "YP_009742609.1",
                  "sequenceHash": "58F71ADE",
                  "title": "nsp2 [polyprotein_range=YP_009725295.1:181-818] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742610.1",
                "cdd": [
                  {
                    "accession": "CDD:289172",
                    "name": "Protein of unknown function (DUF3655)",
                    "range": {
                      "begin": "102",
                      "end": "169"
                    }
                  },
                  {
                    "accession": "CDD:366746",
                    "name": "Macro domain",
                    "range": {
                      "begin": "240",
                      "end": "340"
                    }
                  },
                  {
                    "accession": "CDD:314498",
                    "name": "Single-stranded poly(A) binding domain",
                    "range": {
                      "begin": "533",
                      "end": "675"
                    }
                  },
                  {
                    "accession": "CDD:288939",
                    "name": "Coronavirus polyprotein cleavage domain",
                    "range": {
                      "begin": "680",
                      "end": "743"
                    }
                  },
                  {
                    "accession": "CDD:370080",
                    "name": "Papain like viral protease",
                    "range": {
                      "begin": "746",
                      "end": "1064"
                    }
                  },
                  {
                    "accession": "CDD:292868",
                    "name": "Nucleic acid-binding domain (NAR)",
                    "range": {
                      "begin": "1089",
                      "end": "1201"
                    }
                  },
                  {
                    "accession": "CDD:391938",
                    "name": "even-transmembrane G protein-coupled receptor",
                    "range": {
                      "begin": "1494",
                      "end": "1563"
                    }
                  },
                  {
                    "accession": "CDD:341315",
                    "name": "TM helix 5 [structural motif]",
                    "range": {
                      "begin": "1497",
                      "end": "1519"
                    }
                  },
                  {
                    "accession": "CDD:341315",
                    "name": "TM helix 6 [structural motif]",
                    "range": {
                      "begin": "1527",
                      "end": "1551"
                    }
                  }
                ],
                "name": "nsp3",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "2720",
                      "end": "8554"
                    }
                  ],
                  "sequenceHash": "6A235ABB"
                },
                "otherNames": [
                  "non-structural protein 3",
                  "nonstructural protein 3",
                  "nsp3"
                ],
                "protein": {
                  "accessionVersion": "YP_009742610.1",
                  "seqId": "YP_009742610.1",
                  "sequenceHash": "21B55819",
                  "title": "nsp3 [polyprotein_range=YP_009725295.1:819-2763] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742611.1",
                "cdd": [
                  {
                    "accession": "CDD:374495",
                    "name": "Coronavirus nonstructural protein 4 C-terminus",
                    "range": {
                      "begin": "406",
                      "end": "498"
                    }
                  }
                ],
                "name": "nsp4",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "8555",
                      "end": "10054"
                    }
                  ],
                  "sequenceHash": "4BA01958"
                },
                "otherNames": [
                  "non-structural protein 4",
                  "nonstructural protein 4",
                  "nsp4"
                ],
                "protein": {
                  "accessionVersion": "YP_009742611.1",
                  "seqId": "YP_009742611.1",
                  "sequenceHash": "2C781714",
                  "title": "nsp4 [polyprotein_range=YP_009725295.1:2764-3263] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742612.1",
                "cdd": [
                  {
                    "accession": "CDD:368429",
                    "name": "Coronavirus endopeptidase C30",
                    "range": {
                      "begin": "29",
                      "end": "306"
                    }
                  }
                ],
                "name": "3C-like proteinase",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "10055",
                      "end": "10972"
                    }
                  ],
                  "sequenceHash": "C28D0EE5"
                },
                "otherNames": [
                  "3C-like proteinase",
                  "3CLpro",
                  "Mpro",
                  "main proteinase",
                  "non-structural protein 5",
                  "nonstructural protein 5",
                  "nsp5A_3CLpro",
                  "nsp5B_3CLpro"
                ],
                "protein": {
                  "accessionVersion": "YP_009742612.1",
                  "seqId": "YP_009742612.1",
                  "sequenceHash": "5CE30DBB",
                  "title": "3C-like proteinase [polyprotein_range=YP_009725295.1:3264-3569] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742613.1",
                "name": "nsp6",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "10973",
                      "end": "11842"
                    }
                  ],
                  "sequenceHash": "99170F63"
                },
                "otherNames": [
                  "non-structural protein 6",
                  "nonstructural protein 6",
                  "nsp6"
                ],
                "protein": {
                  "accessionVersion": "YP_009742613.1",
                  "seqId": "YP_009742613.1",
                  "sequenceHash": "A72B0D80",
                  "title": "nsp6 [polyprotein_range=YP_009725295.1:3570-3859] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742614.1",
                "cdd": [
                  {
                    "accession": "CDD:285878",
                    "name": "nsp7 replicase",
                    "range": {
                      "begin": "1",
                      "end": "83"
                    }
                  }
                ],
                "name": "nsp7",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "11843",
                      "end": "12091"
                    }
                  ],
                  "sequenceHash": "DDAA03C2"
                },
                "otherNames": [
                  "non-structural protein 7",
                  "nonstructural protein 7",
                  "nsp7"
                ],
                "protein": {
                  "accessionVersion": "YP_009742614.1",
                  "seqId": "YP_009742614.1",
                  "sequenceHash": "A87703C6",
                  "title": "nsp7 [polyprotein_range=YP_009725295.1:3860-3942] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742615.1",
                "cdd": [
                  {
                    "accession": "CDD:285879",
                    "name": "nsp8 replicase",
                    "range": {
                      "begin": "1",
                      "end": "198"
                    }
                  }
                ],
                "name": "nsp8",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "12092",
                      "end": "12685"
                    }
                  ],
                  "sequenceHash": "759508E4"
                },
                "otherNames": [
                  "non-structural protein 8",
                  "nonstructural protein 8",
                  "nsp8"
                ],
                "protein": {
                  "accessionVersion": "YP_009742615.1",
                  "seqId": "YP_009742615.1",
                  "sequenceHash": "27D30877",
                  "title": "nsp8 [polyprotein_range=YP_009725295.1:3943-4140] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742616.1",
                "cdd": [
                  {
                    "accession": "CDD:285872",
                    "name": "nsp9 replicase",
                    "range": {
                      "begin": "1",
                      "end": "113"
                    }
                  }
                ],
                "name": "nsp9",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "12686",
                      "end": "13024"
                    }
                  ],
                  "sequenceHash": "66A7051A"
                },
                "otherNames": [
                  "non-structural protein 9",
                  "nonstructural protein 9",
                  "nsp9",
                  "ssRNA-binding protein"
                ],
                "protein": {
                  "accessionVersion": "YP_009742616.1",
                  "seqId": "YP_009742616.1",
                  "sequenceHash": "1A720513",
                  "title": "nsp9 [polyprotein_range=YP_009725295.1:4141-4253] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009742617.1",
                "cdd": [
                  {
                    "accession": "CDD:286486",
                    "name": "RNA synthesis protein NSP10",
                    "range": {
                      "begin": "12",
                      "end": "131"
                    }
                  }
                ],
                "name": "nsp10",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "13025",
                      "end": "13441"
                    }
                  ],
                  "sequenceHash": "4B5A0671"
                },
                "otherNames": [
                  "GFL",
                  "growth-factor-like protein",
                  "non-structural protein 10",
                  "nonstructural protein 10",
                  "nsp10"
                ],
                "protein": {
                  "accessionVersion": "YP_009742617.1",
                  "seqId": "YP_009742617.1",
                  "sequenceHash": "839705B4",
                  "title": "nsp10 [polyprotein_range=YP_009725295.1:4254-4392] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              },
              {
                "accession": "YP_009725312.1",
                "name": "nsp11",
                "nucleotide": {
                  "accessionVersion": "NC_045512.2",
                  "range": [
                    {
                      "begin": "13442",
                      "end": "13480"
                    }
                  ],
                  "seqId": "NC_045512.2:13442-13480",
                  "sequenceHash": "CA400AB",
                  "title": "nsp11 [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "otherNames": [
                  "non-structural protein 11",
                  "nonstructural protein 11",
                  "nsp11"
                ],
                "protein": {
                  "accessionVersion": "YP_009725312.1",
                  "seqId": "YP_009725312.1",
                  "sequenceHash": "32B0077",
                  "title": "nsp11 [polyprotein_range=YP_009725295.1:4393-4405] [polyprotein=ORF1a polyprotein] [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
                },
                "proteinCompleteness": "COMPLETE"
              }
            ],
            "name": "ORF1a polyprotein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "266",
                  "end": "13483"
                }
              ],
              "seqId": "NC_045512.2:266-13483",
              "sequenceHash": "96BED0ED",
              "title": "ORF1a polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF1a",
              "ORF1a polyprotein",
              "open reading frame 1a",
              "orf1a",
              "polyprotein 1ab",
              "pp1a"
            ],
            "protein": {
              "accessionVersion": "YP_009725295.1",
              "range": [
                {
                  "begin": "1",
                  "end": "4405"
                }
              ],
              "seqId": "YP_009725295.1:1-4405",
              "sequenceHash": "A6EBC4B0",
              "title": "ORF1a polyprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC1",
              "name": "Replicase polyprotein 1a"
            }
          }
        ],
        "geneId": 43740578,
        "name": "ORF1ab"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:370471",
                "name": "Spike receptor binding domain",
                "range": {
                  "begin": "330",
                  "end": "583"
                }
              },
              {
                "accession": "CDD:279881",
                "name": "Coronavirus S2 glycoprotein",
                "range": {
                  "begin": "1",
                  "end": "1273"
                }
              }
            ],
            "name": "surface glycoprotein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "21563",
                  "end": "25384"
                }
              ],
              "seqId": "NC_045512.2:21563-25384",
              "sequenceHash": "B32D3CC0",
              "title": "surface glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF 2",
              "open reading frame 2",
              "spike",
              "spike protein",
              "surface glycoprotein"
            ],
            "pdbIds": [
              "6VYB"
            ],
            "protein": {
              "accessionVersion": "YP_009724390.1",
              "range": [
                {
                  "begin": "1",
                  "end": "1273"
                }
              ],
              "seqId": "YP_009724390.1:1-1273",
              "sequenceHash": "DF1539B0",
              "title": "surface glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC2",
              "name": "spike glycoprotein"
            }
          }
        ],
        "geneId": 43740568,
        "name": "S"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:288183",
                "name": "Coronavirus accessory protein 3a",
                "range": {
                  "begin": "1",
                  "end": "274"
                }
              }
            ],
            "name": "ORF3a protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "25393",
                  "end": "26220"
                }
              ],
              "seqId": "NC_045512.2:25393-26220",
              "sequenceHash": "EC770D42",
              "title": "ORF3a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "3a",
              "ORF 3a",
              "ORF3a",
              "open reading frame 3a"
            ],
            "protein": {
              "accessionVersion": "YP_009724391.1",
              "range": [
                {
                  "begin": "1",
                  "end": "275"
                }
              ],
              "seqId": "YP_009724391.1:1-275",
              "sequenceHash": "CA130D0F",
              "title": "ORF3a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC3",
              "name": "Protein 3a"
            }
          }
        ],
        "geneId": 43740569,
        "name": "ORF3a"
      },
      {
        "cds": [
          {
            "name": "envelope protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "26245",
                  "end": "26472"
                }
              ],
              "seqId": "NC_045512.2:26245-26472",
              "sequenceHash": "CFFD0414",
              "title": "envelope protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "E",
              "ORF 4",
              "ORF4",
              "envelope",
              "envelope protein",
              "open reading frame 4"
            ],
            "protein": {
              "accessionVersion": "YP_009724392.1",
              "range": [
                {
                  "begin": "1",
                  "end": "75"
                }
              ],
              "seqId": "YP_009724392.1:1-75",
              "sequenceHash": "8A7D03C2",
              "title": "envelope protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC4",
              "name": "Envelope small membrane protein"
            }
          }
        ],
        "geneId": 43740570,
        "name": "E"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:279907",
                "name": "Coronavirus M matrix/glycoprotein",
                "range": {
                  "begin": "4",
                  "end": "221"
                }
              }
            ],
            "name": "membrane glycoprotein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "26523",
                  "end": "27191"
                }
              ],
              "seqId": "NC_045512.2:26523-27191",
              "sequenceHash": "4A3D0AA4",
              "title": "membrane glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "M",
              "ORF 5",
              "ORF5",
              "matrix glycoprotein",
              "matrix protein",
              "membrane",
              "membrane glycoprotein",
              "open reading frame 5"
            ],
            "protein": {
              "accessionVersion": "YP_009724393.1",
              "range": [
                {
                  "begin": "1",
                  "end": "222"
                }
              ],
              "seqId": "YP_009724393.1:1-222",
              "sequenceHash": "404E09E2",
              "title": "membrane glycoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC5",
              "name": "Membrane protein"
            }
          }
        ],
        "geneId": 43740571,
        "name": "M"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:288948",
                "name": "Open reading frame 6 from SARS coronavirus",
                "range": {
                  "begin": "1",
                  "end": "61"
                }
              }
            ],
            "name": "ORF6 protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "27202",
                  "end": "27387"
                }
              ],
              "seqId": "NC_045512.2:27202-27387",
              "sequenceHash": "252702F1",
              "title": "ORF6 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF 6",
              "ORF6",
              "open reading frame 6"
            ],
            "protein": {
              "accessionVersion": "YP_009724394.1",
              "range": [
                {
                  "begin": "1",
                  "end": "61"
                }
              ],
              "seqId": "YP_009724394.1:1-61",
              "sequenceHash": "543902B5",
              "title": "ORF6 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC6",
              "name": "Non-structural protein 6"
            }
          }
        ],
        "geneId": 43740572,
        "name": "ORF6"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:370117",
                "name": "SARS coronavirus X4 like",
                "range": {
                  "begin": "16",
                  "end": "98"
                }
              }
            ],
            "name": "ORF7a protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "27394",
                  "end": "27759"
                }
              ],
              "seqId": "NC_045512.2:27394-27759",
              "sequenceHash": "25E705AF",
              "title": "ORF7a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "7a",
              "ORF 7a",
              "ORF7a",
              "open reading frame 7a"
            ],
            "pdbIds": [
              "6W37"
            ],
            "protein": {
              "accessionVersion": "YP_009724395.1",
              "range": [
                {
                  "begin": "1",
                  "end": "121"
                }
              ],
              "seqId": "YP_009724395.1:1-121",
              "sequenceHash": "3B2A0535",
              "title": "ORF7a protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC7",
              "name": "Protein 7a"
            }
          }
        ],
        "geneId": 43740573,
        "name": "ORF7a"
      },
      {
        "cds": [
          {
            "name": "ORF7b",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "27756",
                  "end": "27887"
                }
              ],
              "seqId": "NC_045512.2:27756-27887",
              "sequenceHash": "ADD30274",
              "title": "ORF7b [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "7b",
              "ORF 7b",
              "ORF7b",
              "open reading frame 7b"
            ],
            "protein": {
              "accessionVersion": "YP_009725318.1",
              "range": [
                {
                  "begin": "1",
                  "end": "43"
                }
              ],
              "seqId": "YP_009725318.1:1-43",
              "sequenceHash": "2469019F",
              "title": "ORF7b [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTD8",
              "name": "Protein non-structural 7b"
            }
          }
        ],
        "geneId": 43740574,
        "name": "ORF7b"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:152528",
                "name": "Coronavirus NS8 protein",
                "range": {
                  "begin": "1",
                  "end": "118"
                }
              }
            ],
            "name": "ORF8 protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "27894",
                  "end": "28259"
                }
              ],
              "seqId": "NC_045512.2:27894-28259",
              "sequenceHash": "43A50622",
              "title": "ORF8 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF 8",
              "ORF8",
              "open reading frame 8"
            ],
            "protein": {
              "accessionVersion": "YP_009724396.1",
              "range": [
                {
                  "begin": "1",
                  "end": "121"
                }
              ],
              "seqId": "YP_009724396.1:1-121",
              "sequenceHash": "47D40569",
              "title": "ORF8 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC8",
              "name": "Non-structural protein 8"
            }
          }
        ],
        "geneId": 43740577,
        "name": "ORF8"
      },
      {
        "cds": [
          {
            "cdd": [
              {
                "accession": "CDD:279305",
                "name": "Coronavirus nucleocapsid protein",
                "range": {
                  "begin": "14",
                  "end": "368"
                }
              }
            ],
            "name": "nucleocapsid phosphoprotein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "28274",
                  "end": "29533"
                }
              ],
              "seqId": "NC_045512.2:28274-29533",
              "sequenceHash": "4D3C10AF",
              "title": "nucleocapsid phosphoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF 9",
              "ORF9",
              "nucleocapsid",
              "nucleocapsid phosphoprotein",
              "open reading frame 9"
            ],
            "pdbIds": [
              "6VYO",
              "6WJI"
            ],
            "protein": {
              "accessionVersion": "YP_009724397.2",
              "range": [
                {
                  "begin": "1",
                  "end": "419"
                }
              ],
              "seqId": "YP_009724397.2:1-419",
              "sequenceHash": "9B7912B4",
              "title": "nucleocapsid phosphoprotein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "P0DTC9",
              "name": "Nucleoprotein"
            }
          }
        ],
        "geneId": 43740575,
        "name": "N"
      },
      {
        "cds": [
          {
            "name": "ORF10 protein",
            "nucleotide": {
              "accessionVersion": "NC_045512.2",
              "range": [
                {
                  "begin": "29558",
                  "end": "29674"
                }
              ],
              "seqId": "NC_045512.2:29558-29674",
              "sequenceHash": "761401EA",
              "title": "ORF10 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "otherNames": [
              "ORF 10",
              "ORF10",
              "open reading frame 10"
            ],
            "protein": {
              "accessionVersion": "YP_009725255.1",
              "range": [
                {
                  "begin": "1",
                  "end": "38"
                }
              ],
              "seqId": "YP_009725255.1:1-38",
              "sequenceHash": "231201DA",
              "title": "ORF10 protein [organism=Severe acute respiratory syndrome coronavirus 2] [isolate=Wuhan-Hu-1]"
            },
            "uniProtKb": {
              "id": "A0A663DJA2",
              "name": "ORF10 protein"
            }
          }
        ],
        "geneId": 43740576,
        "name": "ORF10"
      }
    ]
  },
  "bioprojects": [
    "PRJNA485481"
  ],
  "completeness": "COMPLETE",
  "geneCount": 11,
  "host": {
    "lineage": [
      {
        "name": "cellular organisms",
        "taxId": 131567
      },
      {
        "name": "Eukaryota (eucaryotes)",
        "taxId": 2759
      },
      {
        "name": "Opisthokonta",
        "taxId": 33154
      },
      {
        "name": "Metazoa (metazoans)",
        "taxId": 33208
      },
      {
        "name": "Eumetazoa",
        "taxId": 6072
      },
      {
        "name": "Bilateria",
        "taxId": 33213
      },
      {
        "name": "Deuterostomia (deuterostomes)",
        "taxId": 33511
      },
      {
        "name": "Chordata (chordates)",
        "taxId": 7711
      },
      {
        "name": "Craniata",
        "taxId": 89593
      },
      {
        "name": "Vertebrata (vertebrates)",
        "taxId": 7742
      },
      {
        "name": "Gnathostomata (jawed vertebrates)",
        "taxId": 7776
      },
      {
        "name": "Teleostomi",
        "taxId": 117570
      },
      {
        "name": "Euteleostomi (bony vertebrates)",
        "taxId": 117571
      },
      {
        "name": "Sarcopterygii",
        "taxId": 8287
      },
      {
        "name": "Dipnotetrapodomorpha",
        "taxId": 1338369
      },
      {
        "name": "Tetrapoda (tetrapods)",
        "taxId": 32523
      },
      {
        "name": "Amniota (amniotes)",
        "taxId": 32524
      },
      {
        "name": "Mammalia (mammals)",
        "taxId": 40674
      },
      {
        "name": "Theria",
        "taxId": 32525
      },
      {
        "name": "Eutheria (placentals)",
        "taxId": 9347
      },
      {
        "name": "Boreoeutheria",
        "taxId": 1437010
      },
      {
        "name": "Euarchontoglires",
        "taxId": 314146
      },
      {
        "name": "Primates",
        "taxId": 9443
      },
      {
        "name": "Haplorrhini",
        "taxId": 376913
      },
      {
        "name": "Simiiformes",
        "taxId": 314293
      },
      {
        "name": "Catarrhini",
        "taxId": 9526
      },
      {
        "name": "Hominoidea (apes)",
        "taxId": 314295
      },
      {
        "name": "Hominidae (great apes)",
        "taxId": 9604
      },
      {
        "name": "Homininae",
        "taxId": 207598
      },
      {
        "name": "Homo (humans)",
        "taxId": 9605
      },
      {
        "name": "Homo sapiens (human)",
        "taxId": 9606
      }
    ],
    "sciName": "Homo sapiens",
    "taxId": 9606
  },
  "isAnnotated": true,
  "isolate": {
    "collectionDate": "2019-12",
    "name": "Wuhan-Hu-1"
  },
  "length": 29903,
  "location": {
    "geographicLocation": "China",
    "geographicRegion": "Asia"
  },
  "maturePeptideCount": 26,
  "molType": "ssRNA(+)",
  "nucleotide": {
    "accessionVersion": "NC_045512.2",
    "seqId": "NC_045512.2",
    "sequenceHash": "A926D55E",
    "title": "Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome"
  },
  "nucleotideCompleteness": "complete",
  "proteinCount": 12,
  "releaseDate": "2020-01-13",
  "sourceDatabase": "RefSeq",
  "updateDate": "2020-07-18",
  "virus": {
    "lineage": [
      {
        "name": "Viruses",
        "taxId": 10239
      },
      {
        "name": "Riboviria (RNA viruses)",
        "taxId": 2559587
      },
      {
        "name": "Orthornavirae",
        "taxId": 2732396
      },
      {
        "name": "Pisuviricota",
        "taxId": 2732408
      },
      {
        "name": "Pisoniviricetes",
        "taxId": 2732506
      },
      {
        "name": "Nidovirales",
        "taxId": 76804
      },
      {
        "name": "Cornidovirineae",
        "taxId": 2499399
      },
      {
        "name": "Coronaviridae",
        "taxId": 11118
      },
      {
        "name": "Orthocoronavirinae",
        "taxId": 2501931
      },
      {
        "name": "Betacoronavirus",
        "taxId": 694002
      },
      {
        "name": "Sarbecovirus",
        "taxId": 2509511
      },
      {
        "name": "Severe acute respiratory syndrome-related coronavirus",
        "taxId": 694009
      },
      {
        "name": "Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)",
        "taxId": 2697049
      }
    ],
    "pangolinClassification": "B",
    "sciName": "Severe acute respiratory syndrome coronavirus 2",
    "taxId": 2697049
  }
}
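Each line of ncbi_dataset/data/data_report.jsonl holds one complete record like the one above, so the report can be streamed one line at a time. The following Python sketch uses only the standard json module. summarize_records is a hypothetical helper name. The fields it picks (accession, length, completeness, host.sciName) are taken from the sample record.

```python
import json
from typing import Iterable, Iterator


def summarize_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a small summary for each virus record in a JSON Lines stream.

    Hypothetical helper; field names follow the sample report above.
    """
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines such as a trailing newline
            continue
        record = json.loads(line)  # one line == one complete VirusAssembly object
        yield {
            "accession": record.get("accession"),
            "length": record.get("length"),
            "completeness": record.get("completeness"),
            # host is an Organism structure; the sample stores the name in sciName
            "host": record.get("host", {}).get("sciName"),
        }
```

Usage, assuming the package was unzipped into the current directory: open the file with open("ncbi_dataset/data/data_report.jsonl", encoding="utf-8") and pass the file object to summarize_records. Reading line by line avoids loading a large report into memory at once.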

VirusAssembly Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
accession | accession | Accession | string | The accession.version of the viral nucleotide sequence. Includes both GenBank and RefSeq accessions | NC_045512.2
isAnnotated | is-annotated | Is Annotated | bool | The viral genome has been annotated by either the submitter (GenBank) or by NCBI (RefSeq)
isolate | isolate- | Isolate | VirusAssembly.Isolate
sourceDatabase | sourcedb | Source database | string | Indicates whether the viral nucleotide record comes from a GenBank submitter or from NCBI-derived curation (RefSeq) | RefSeq, GenBank
proteinCount | protein-count | Protein count | uint32 | The total count of annotated proteins, including both proteins and polyproteins but not processed mature peptides
host | host- | Host | Organism | Taxon from which the virus sample was isolated
virus | virus- | Virus | Organism | Viral taxon
bioprojects (repeated) | bioprojects | BioProjects | string | Associated BioProject accessions, when available | PRJNA485481
location | geo- | Geographic | VirusAssembly.CollectionLocation
updateDate | update-date | Update date | string | Date the viral nucleotide accession was last updated in NCBI Virus
releaseDate | release-date | Release date | string | Date the viral nucleotide accession was first released in NCBI Virus
completeness | completeness | Completeness | VirusAssembly.Completeness
length | length | Length | uint32 | Length of the viral nucleotide sequence
geneCount | gene-count | Gene count | uint32 | Total count of genes annotated on the viral nucleotide sequence
maturePeptideCount | matpeptide-count | Mature peptide count | uint32 | Total count of processed mature peptides annotated on the viral nucleotide sequence
biosample | biosample-acc | BioSample accession | string | Associated BioSample accession | SAMN15394129
molType | mol-type | Molecule type | string | ICTV (International Committee on Taxonomy of Viruses) viral classification based on nucleic acid composition, strandedness, and method of replication
annotation | | | VirusAnnotation
nucleotide | | | SeqRangeSetFasta | The whole genomic nucleotide record
purposeOfSampling | purpose-of-sampling | Purpose of Sampling | PurposeOfSampling
sraAccessions (repeated) | sra-accs | SRA Accessions | string | SRA accessions linked to the GenBank genome
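The mnemonics in this table are the names that the dataformat tool accepts for --fields, and dataformat is the tool to use for real conversions. As a rough illustration only, the sketch below maps a few top-level scalar fields from this table to their mnemonics and writes them as TSV. records_to_tsv and SCALAR_MNEMONICS are hypothetical names, and this is not how dataformat is implemented. Fields whose mnemonic ends in a hyphen (such as isolate- or host-) are left out, because they appear to act as prefixes for the fields of a nested structure.

```python
import csv
import io
from typing import Iterable, Sequence

# Hypothetical mapping: JSON field name -> Table Field Mnemonic, copied from the
# VirusAssembly table. Only top-level scalar fields are listed.
SCALAR_MNEMONICS = {
    "accession": "accession",
    "isAnnotated": "is-annotated",
    "sourceDatabase": "sourcedb",
    "proteinCount": "protein-count",
    "updateDate": "update-date",
    "releaseDate": "release-date",
    "length": "length",
    "geneCount": "gene-count",
    "maturePeptideCount": "matpeptide-count",
    "biosample": "biosample-acc",
    "molType": "mol-type",
}


def records_to_tsv(records: Iterable[dict], fields: Sequence[str]) -> str:
    """Render records as TSV with mnemonic headers (hypothetical helper, not dataformat)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([SCALAR_MNEMONICS[f] for f in fields])
    for rec in records:
        # Absent fields (for example, a record with no biosample) become empty cells
        writer.writerow(["" if rec.get(f) is None else rec[f] for f in fields])
    return buf.getvalue()
```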

ConservedDomain Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
accession | accession | Accession | string | CDD accession
name | name | Name | string
range | range- | Range | Range | Range on the protein
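Conserved domains can be attached to a CDS or to any of its mature peptides, so collecting them requires walking both levels. The sketch below, built around a hypothetical helper named iter_cdd_hits, follows the nesting shown in the sample report (annotation.genes[].cds[].maturePeptide[]). It converts the range bounds to integers because the sample report stores them as JSON strings. Per the range description above, the begin and end values are protein coordinates.

```python
from typing import Iterator, Tuple

Hit = Tuple[str, str, int, int]  # (protein accession, CDD accession, begin, end)


def iter_cdd_hits(record: dict) -> Iterator[Hit]:
    """Yield one tuple per conserved domain on any CDS or mature peptide (hypothetical helper)."""

    def walk(peptide: dict) -> Iterator[Hit]:
        for hit in peptide.get("cdd", []):
            rng = hit["range"]
            # Range bounds are JSON strings in the sample report, so convert them
            yield peptide["accession"], hit["accession"], int(rng["begin"]), int(rng["end"])
        # maturePeptide entries are themselves VirusPeptide objects, so recurse
        for mature in peptide.get("maturePeptide", []):
            yield from walk(mature)

    for gene in record.get("annotation", {}).get("genes", []):
        for cds in gene.get("cds", []):
            yield from walk(cds)
```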

SeqRangeSetFasta Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
seqId | seq-id | Sequence ID | string | Sequence identifier; may include location information in addition to the sequence accession | NC_045512.2:266-805
accessionVersion | accession | Accession | string | Accession and version of the sequence (nucleotide or protein)
title | title | Title | string
sequenceHash | hash | Hash | string | Unique identifier for identical sequences
range (repeated) | range- | Range | Range | Series of intervals on the accessionVersion above
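The sample report shows seqId values in two shapes. One is a bare accession.version for the whole record (NC_045512.2). The other is an accession.version followed by an interval (NC_045512.2:266-805). The minimal parser below assumes only those two shapes. parse_seq_id is a hypothetical name, and any other layout raises ValueError instead of being guessed at.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "ACCESSION.VERSION" optionally followed by ":BEGIN-END",
# as seen in the sample report. Other seqId layouts are rejected.
_SEQ_ID = re.compile(r"^(?P<acc>[^:]+)(?::(?P<begin>\d+)-(?P<end>\d+))?$")


def parse_seq_id(seq_id: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split a seqId into (accession.version, begin, end); bounds are None if absent."""
    m = _SEQ_ID.match(seq_id)
    if m is None:
        raise ValueError(f"unrecognized seqId: {seq_id!r}")
    begin, end = m.group("begin"), m.group("end")
    return m.group("acc"), (int(begin) if begin else None), (int(end) if end else None)
```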

VirusAnnotation Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
genes (repeated) | gene- | Gene | VirusGene

VirusAssembly.CollectionLocation Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
geographicLocation | location | location | string | Country of virus specimen collection | USA, France
geographicRegion | region | region | string | Region of virus specimen collection | Asia, North America
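geographicRegion is a coarser grouping than geographicLocation, which makes it convenient for quick tallies. The small sketch below assumes the records have already been parsed from JSON. count_by_region is a hypothetical helper. The "unknown" label for records without a region is an arbitrary choice made here and does not come from the report.

```python
from collections import Counter
from typing import Iterable


def count_by_region(records: Iterable[dict]) -> Counter:
    """Tally records by location.geographicRegion (hypothetical helper).

    Records with no location or no region are counted under "unknown".
    """
    return Counter(
        rec.get("location", {}).get("geographicRegion") or "unknown"
        for rec in records
    )
```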

VirusAssembly.Isolate Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
name | lineage | Lineage | string | BioSample harmonized attribute names: https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/
source | lineage-source | Lineage source | string | Source material from which the viral specimen was isolated | blood, feces, lung
collectionDate | collection-date | Collection date | string | The collection date for the sample from which the viral nucleotide sequence was derived
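collectionDate is a string, and the sample record gives only a year and month (2019-12). Parsing it into a full calendar date would invent a day that was never recorded, so the sketch below keeps whatever precision is present. It assumes dates look like YYYY, YYYY-MM or YYYY-MM-DD. PartialDate and parse_collection_date are hypothetical names.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date that may lack month and/or day (hypothetical type)."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def parse_collection_date(value: str) -> PartialDate:
    """Parse an assumed 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' string without filling gaps."""
    parts = [int(p) for p in value.strip().split("-")]  # int() rejects empty pieces
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected collection date: {value!r}")
    if len(parts) >= 2 and not 1 <= parts[1] <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if len(parts) == 3 and not 1 <= parts[2] <= 31:
        raise ValueError(f"day out of range: {value!r}")
    return PartialDate(*parts)
```

Keeping the missing parts as None, rather than defaulting them to 1, lets later filtering decide how to treat imprecise dates.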

VirusGene Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
name | name | Name | string
geneId | gene-id | NCBI GeneID | uint32
nucleotide | genomic- | Genomic | SeqRangeSetFasta | The interval on the genomic nucleotide record of the CDS feature
cds (repeated) | cds- | CDS | VirusPeptide | Polyprotein or protein CDS
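cds is a repeated field, so a single gene can carry more than one CDS, and per-gene CDS counts are a quick way to inspect the annotation. The sketch below uses two hypothetical helpers. gene_summary lists each gene with its GeneID and CDS count, and cds_total sums the CDS across genes. The VirusAssembly table defines proteinCount as counting proteins and polyproteins but not mature peptides, so that sum might be expected to track proteinCount. This is an assumption, not a documented guarantee.

```python
from typing import Iterator, Optional, Tuple


def gene_summary(record: dict) -> Iterator[Tuple[str, Optional[int], int]]:
    """Yield (gene name, NCBI GeneID or None, number of CDS) per gene (hypothetical helper)."""
    for gene in record.get("annotation", {}).get("genes", []):
        yield gene.get("name", ""), gene.get("geneId"), len(gene.get("cds", []))


def cds_total(record: dict) -> int:
    """Total number of CDS entries across all genes (hypothetical helper)."""
    return sum(n_cds for _, _, n_cds in gene_summary(record))
```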

VirusPeptide Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
accession | accession | Accession | string | Protein accession and version
name | name | Name | string | Protein name
otherNames (repeated) | other-names | Other Names | string | Alternate names for this protein
nucleotide | nuc-fasta- | Nucleotide FASTA | SeqRangeSetFasta | The interval on the genomic nucleotide record of this mature-peptide feature
protein | protein-fasta- | Protein FASTA | SeqRangeSetFasta | The full polyprotein record or interval on the polyprotein for mature-peptide features
pdbIds (repeated) | pdb-ids | PDB IDs | string | PDB identifiers for this protein
cdd (repeated) | cdd- | CDD | ConservedDomain | Conserved domains associated with this protein
uniProtKb | uniprot- | UniProt | VirusPeptide.UniProtId | UniProt identifier
maturePeptide (repeated) | mat-peptide | Mature Peptide | VirusPeptide | Enzymatically processed products of a polyprotein
proteinCompleteness | prot-completeness | Protein Completeness | VirusAssembly.Completeness | Protein completeness
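maturePeptide is itself a VirusPeptide, so mature peptides nest inside the CDS they are cleaved from. The sketch below flattens that tree into (accession, name, parent accession) rows, which is convenient for joining with other tables. flatten_peptides is a hypothetical helper. It uses recursion so it handles any nesting depth instead of assuming exactly two levels.

```python
from typing import Iterator, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str]]  # (accession, name, parent)


def flatten_peptides(record: dict) -> Iterator[Row]:
    """Yield one row per CDS and mature peptide (hypothetical helper).

    CDS products get parent None; each mature peptide points at the
    accession of the VirusPeptide it is nested under.
    """

    def walk(peptide: dict, parent: Optional[str]) -> Iterator[Row]:
        yield peptide.get("accession"), peptide.get("name"), parent
        for mature in peptide.get("maturePeptide", []):
            yield from walk(mature, peptide.get("accession"))

    for gene in record.get("annotation", {}).get("genes", []):
        for cds in gene.get("cds", []):
            yield from walk(cds, None)
```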

VirusPeptide.UniProtId Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
id | id | ID | string | UniProt ID
name | name | Name | string | UniProt name

PurposeOfSampling Enumeration

Name | Number | Description
PURPOSE_OF_SAMPLING_UNKNOWN | 0
PURPOSE_OF_SAMPLING_BASELINE_SURVEILLANCE | 1

VirusAssembly.Completeness Enumeration

Name | Number | Description
UNKNOWN | 0
COMPLETE | 1
PARTIAL | 2
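The sample report writes completeness as the enumeration name ("COMPLETE") rather than as its number. This matches protobuf's JSON mapping, which emits enum names and whose parsers also accept the integer values. The sketch below mirrors both enumeration tables with Python's IntEnum. It also includes a hypothetical parse_completeness helper that accepts either form.

```python
from enum import IntEnum
from typing import Union


class Completeness(IntEnum):
    """VirusAssembly.Completeness values from the enumeration table."""
    UNKNOWN = 0
    COMPLETE = 1
    PARTIAL = 2


class PurposeOfSampling(IntEnum):
    """PurposeOfSampling values from the enumeration table."""
    PURPOSE_OF_SAMPLING_UNKNOWN = 0
    PURPOSE_OF_SAMPLING_BASELINE_SURVEILLANCE = 1


def parse_completeness(value: Union[str, int]) -> Completeness:
    """Accept the enum name (as in the sample report) or its number (hypothetical helper)."""
    if isinstance(value, str):
        return Completeness[value]  # lookup by name, e.g. "COMPLETE"
    return Completeness(value)  # lookup by number, e.g. 1
```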

Organism Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
taxId | tax-id | Taxonomic ID | uint32 | NCBI Taxonomy identifier | 9606, 2697049
organismName | organism-name | Organism Name | string | Scientific name | Homo sapiens, Severe acute respiratory syndrome coronavirus 2
commonName | common-name | Common Name | string | Common name | human, pangolin, MERS, SARS2
lineage (repeated) | | | LineageOrganism | Lineage ordered from the superkingdom level to increasingly specific taxonomic entries
strain | strain | Strain | string | | SE11
pangolinClassification | pangolin | Pangolin Classification | string | | B.1.1.7
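Pango lineage names such as B and B.1.1.7 are hierarchical, and each dot-separated component names a sublineage. A component-wise prefix check can therefore answer "is this lineage within B.1?" without mistaking B.11 for a child of B.1. The sketch below implements that check as a hypothetical helper, is_pango_descendant. It does not expand Pango alias names, so a lineage recorded under an alias will not match its unaliased ancestor.

```python
def is_pango_descendant(lineage: str, ancestor: str) -> bool:
    """True if `lineage` equals `ancestor` or sits below it (hypothetical helper).

    Compares whole dot-separated components, so "B.11" is not a child of "B.1".
    Alias names are NOT expanded in this sketch.
    """
    child = lineage.split(".")
    parent = ancestor.split(".")
    return child[: len(parent)] == parent
```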

BioProject Structure

A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project. The record can be retrieved from NCBI BioProject.

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
accession | accession | Accession | string | BioProject accession | PRJEB35387
title | title | Title | string | Title of the BioProject provided by the submitter | Sciurus carolinensis (grey squirrel) genome assembly, mSciCar1
parentAccessions (repeated) | parent-accessions | Parent Accessions | string | Accession of a parent BioProject that groups multiple child BioProjects | ["PRJNA489243","PRJEB33226","PRJEB40665"]

BioProjectLineage Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
bioprojects (repeated) | lineage- | Lineage | BioProject | A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium

LineageOrganism Structure

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
taxId | coming soon | coming soon | uint32 | NCBI Taxonomy identifier | 11118
name | coming soon | coming soon | string | Scientific name | Coronaviridae
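Every Organism carries its full lineage of LineageOrganism entries, so membership in a higher taxon can be tested by scanning taxId values instead of matching names. The sketch below uses a hypothetical helper named has_ancestor, tested against taxIds from the sample report's virus lineage, where 11118 is Coronaviridae. It also checks the organism's own taxId in case a lineage list does not include the leaf taxon.

```python
def has_ancestor(organism: dict, tax_id: int) -> bool:
    """True if tax_id is the organism itself or any entry in its lineage (hypothetical helper)."""
    if organism.get("taxId") == tax_id:
        return True
    # lineage entries are LineageOrganism objects: {"name": ..., "taxId": ...}
    return any(node.get("taxId") == tax_id for node in organism.get("lineage", []))
```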

Range Structure

A 1-based range on a sequence record.

Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples
begin | start | Start | uint64
end | stop | Stop | uint64
orientation | orientation | Orientation | Orientation
order | coming soon | coming soon | uint32 | May not be needed in gene reports, but remains in the spec for as long as gene reports still include it
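In the sample report the Range bounds appear as JSON strings ("begin": "266") even though the type is uint64. Protobuf's JSON mapping encodes 64-bit integers as strings, so the values need converting before any arithmetic. Ranges are 1-based, and this sketch assumes both ends are inclusive, giving length = end - begin + 1. Under that assumption the leader protein interval 266-805 is 540 nt, a whole number of codons. range_length and total_length are hypothetical helpers. The sketch also assumes begin <= end, with strand carried separately by orientation.

```python
from typing import Iterable


def range_length(rng: dict) -> int:
    """Length of one 1-based, inclusive Range; begin/end may be JSON strings (hypothetical helper)."""
    begin, end = int(rng["begin"]), int(rng["end"])
    if end < begin:  # assumption: strand is expressed by orientation, not by swapping bounds
        raise ValueError(f"end before begin: {rng!r}")
    return end - begin + 1


def total_length(ranges: Iterable[dict]) -> int:
    """Sum the lengths of a repeated range field such as SeqRangeSetFasta.range."""
    return sum(range_length(r) for r in ranges)
```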

Orientation Enumeration

Name | Number | Description
none | 0
plus | 1
minus | 2
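When a range is used to cut a subsequence out of a nucleotide record, a minus orientation presumably follows the usual convention: the interval is read on the reverse strand, which means reverse-complemented. The sketch below is a hypothetical helper named extract. It assumes orientation arrives as the enumeration name and treats none the same as plus. It complements only A, C, G, T and N; other IUPAC ambiguity codes are passed through unchanged.

```python
# Complement table for A/C/G/T/N in both cases; other characters are left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract(seq: str, begin: int, end: int, orientation: str = "plus") -> str:
    """Return the 1-based inclusive interval, reverse-complemented for "minus" (hypothetical helper)."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("range outside sequence")
    sub = seq[begin - 1 : end]  # convert 1-based inclusive bounds to a Python slice
    if orientation == "minus":
        return sub.translate(_COMPLEMENT)[::-1]  # complement, then reverse
    return sub  # "plus" and "none" are returned unchanged
```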

Scalar Value Types

Protocol buffers type | Notes | C++ | Python | Java | Go
double | | double | float | double | float64
float | | float | float | float | float32
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64
uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32
uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64
sfixed32 | Always four bytes. | int32 | int | int | int32
sfixed64 | Always eight bytes. | int64 | int/long | long | int64
bool | | bool | boolean | boolean | bool
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string
bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte
Generated November 19, 2021