Help

Welcome to NCBI Virus!

This page will help you to get started. It will guide you through the resource pages and explain available functionalities.

Please contact us if you have further questions.


How To

 


   

What is NCBI Virus?

Main functionalities

  1. Compare your sequence to those in the NCBI Virus database using the NCBI BLAST algorithm. Learn more.

  2. Search, view and download nucleotide and protein sequences using virus name or taxonomy group. Learn more.

  3. Quickly access common data sets for all viruses, all human viruses, bacteriophages, or sequences released in the past month. Learn more.

  4. Explore the massive, normalized datasets and identify data trends. Learn more.

Ways to access NCBI Virus data

Select one of the three options to access NCBI Virus data.

Option 1:

In the navigation menu, open the Find data tab and select one of the dropdown links:

  • Search by sequence to use the virus-specific NCBI BLAST tool. Learn more.

  • Search by virus to search for virus sequences by virus name or taxonomy. Learn more.

  • All viruses, Human viruses, Bacteriophages and New sequences (past one month) to view preselected data sets. Learn more.

Find data in menu

Option 2:

The same functionality can be accessed through the Search by sequence and Search by virus buttons on the NCBI Virus home page.

The results can be viewed in tabular form and further refined by sequence attributes (metadata) using the Refine results panel on the right and by adding or removing table columns. You can also download results, perform multiple sequence alignment, and build phylogenetic trees based on selected results.

Find data buttons

Find more about BLAST search results table here, and virus name/taxonomy based search results here.

Option 3:

Through NCBI Visual Data Dashboard via statistics buttons located in the top row of the dashboard. Learn more.


NCBI Virus BLAST™ tool

The NCBI Virus BLAST™ tool provides rapid insight into query sequences by presenting BLASTn and BLASTp results alongside normalized metadata, when available. These attributes include: isolation source, host, country, collection and release date, as well as taxonomy and genetic attributes such as completeness, and segment or protein names when applicable. The normalized metadata is generated via an internal, curator-guided data-processing pipeline that maps sequence-record attributes to standardized vocabularies to provide a user-friendly view of the data.

Compare your sequence to those in the NCBI Virus database using the BLAST algorithm

  • Click the Search by sequence button (or select this option from the Find data navigation tab at the top of the page).

  • Select the Nucleotide or Protein tab. The Nucleotide tab performs a BLASTn search (against all NCBI Virus nucleotide sequences). The Protein tab performs a BLASTp search (against all NCBI Virus protein sequences). Read more about BLAST™ searches in the NCBI BLAST Guide.

  • In the NCBI Virus Search by sequence input form, enter an NCBI sequence accession, or a sequence in plain text or FASTA format, and click Start search.

  • The BLAST search results will open in a separate window in a tabular format.

Blast

Compare BLAST results in tabular display

The Nucleotide tab performs a BLASTn search (using Megablast, which is optimized for highly similar sequences) against all NCBI Virus nucleotide sequences.

The Protein tab performs a BLASTp search against all NCBI Virus protein sequences. Read more about BLAST algorithms in the NCBI BLAST help documentation.

In the BLAST search results table, you can compare search results using the following sortable default columns:

  • Accession - the NCBI accession number of the NCBI Virus database sequence.

  • Coverage - query coverage.

  • Identity - the highest percent identity of all query-subject alignments.

  • Species – virus species name.

  • Country – country/region of virus specimen collection.

  • Host – virus isolation host (read more about isolation host vocabulary mapping). If isolation host is unknown (/host field of the GenBank record), but laboratory host is present (as indicated in /lab_host field of the GenBank record), the laboratory host will be present in the host column of the results table. If both isolation host and laboratory host can be mapped, only isolation host will be presented in the host column of the table.

  • Collection Date – virus specimen collection date.

Blast default columns

The BLAST results table can be customized by adding or removing columns in the Select columns dropdown menu.

Additional columns include:

  • Score - the sum of the alignment scores (Total score) from all alignment segments.

  • Release date - the date when sequence was released (publicly appeared) in GenBank or other INSDC databases.

  • Genus.

  • Family.

  • Sequence type – complete/partial/proviral/refseq (read more about sequence type here).

  • Genotype.

  • Genome region.

  • Segment – segment name in case of segmented viruses.

  • Protein – protein name, shown when Protein is selected.

  • Isolation source – sequence isolation source (read more about isolation source here).

  • Length - sequence length.

  • BioSample – NCBI BioSample accession number.

  • GenBank title.

The default number of rows displayed in the results table is 200. You can change the number of table rows by selecting the number of results per page (200, 100, 50 or 25) in the Select Columns menu.
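The column definitions above (Identity as the highest percent identity over all query-subject alignments, Score as the total over all alignment segments) can be illustrated with a minimal Python sketch. The `Hsp` record layout and the `summarize_hit` helper are hypothetical names introduced here, and computing Coverage as the share of query positions covered by at least one segment is an assumption, since the text only says "query coverage".

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """One query-subject alignment segment (hypothetical record layout)."""
    query_start: int          # 1-based, inclusive
    query_end: int            # 1-based, inclusive
    percent_identity: float
    score: float


def summarize_hit(hsps, query_length):
    """Collapse all segments for one subject sequence into one table row."""
    # Coverage (assumption): percent of query positions covered by
    # at least one segment, counting overlapping regions only once.
    covered = 0
    last_end = 0
    for start, end in sorted((h.query_start, h.query_end) for h in hsps):
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return {
        "coverage": round(100.0 * covered / query_length, 1),
        # Identity: the highest percent identity of all alignments.
        "identity": max(h.percent_identity for h in hsps),
        # Score: the total over all alignment segments.
        "score": sum(h.score for h in hsps),
    }


# Two overlapping segments on a 200-residue query (made-up numbers).
hsps = [Hsp(1, 100, 98.0, 180.0), Hsp(51, 150, 95.0, 170.0)]
row = summarize_hit(hsps, query_length=200)
print(row)
```

The sketch only shows how one table row could relate to several alignment segments; the actual values in the results table come from BLAST itself.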

Blast additional columns

View BLAST Alignment of selected sequences results

To compare search results in pair-wise alignment:

  • Select sequences to display.

  • Click the View BLAST Alignment of selected sequences results link displayed in the center of the Info panel located above the results table.

The new page shows a graphical view of the pair-wise alignments of the selected BLAST results, with the query included as a reference and a feature map of the query (if available) at the top.

Blast graphic

For more on how to use the alignment viewer, please refer to the NCBI Multiple Sequence Alignment Viewer documentation.

Build multiple sequence alignment of selected BLAST results

To build a multiple sequence alignment based on selected BLAST results:

  • Select sequences that you want to align.

  • Click the Align button on the right above the results table.

The multiple sequence alignment will open on a new page. Multiple sequence alignments are calculated using MUSCLE.

Blast align

For more on how to use the alignment viewer, please refer to the NCBI Multiple Sequence Alignment Viewer documentation.

Build phylogenetic tree of selected BLAST results

To build a phylogenetic tree to see the relationships of selected sequences:

  • Select sequences to display.

  • Press the button labeled Build Phylogenetic Tree on the right above the results table.

The tree will be calculated and displayed in Tree Viewer on a separate page.

Blast tree

For more about Tree Viewer and how to use it, please refer to NCBI Tree Viewer help documentation located here.

Refine tabular BLAST results via filters:

1. Virus name or taxonomy

To restrict search results to a particular virus group:

On the BLAST results page, in the Refine Results panel (upper left corner), click Virus.

In the text box, paste or start typing a single virus taxonomy name or taxid (only the top 5 taxa will be shown).

Select your taxid (NCBI taxonomy database ID) from the flyout menu.

The filtered results are presented in the results table with the following 7 default sortable columns: accession, coverage, identity, species, country, host, collection date. Additional columns displaying related metadata can be added via the Customize Table menu. The query sequence is highlighted in the first row of the table.

Blast

2. Sequence type

All viral sequences (Nucleotide or Protein) available in the NCBI Virus resource can be filtered by the following sequence types: complete, partial and refseq. Multiple filters can be selected concurrently.

Complete nucleotide sequences - filter for all NCBI viral nucleotide sequences, where GenBank ASN.1 format contains the following descriptors: descr/molinfo/completeness=complete or there is a word 'complete' present in the record’s definition line (defline). It also includes complete reference records (RefSeqs).

Partial nucleotide sequence – filter for sequences that are not complete according to the definition above.

Proviral sequences - filter for sequences that have the "/proviral" source qualifier in the GenBank record.

Refseq filtered nucleotide sequences include all reference sequences for the selected virus. Note that a few RefSeqs are partial genomes, based on International Committee on Taxonomy of Viruses (ICTV) proposals.

If the Protein tab is selected and the complete nucleotide sequence type filter is applied, results will include all proteins from complete genomes, or from individual complete segments in the case of segmented viruses.

If the Protein tab is selected and the refseq nucleotide sequence type filter is applied, results will include all proteins derived from nucleotide reference sequences.
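The sequence type rules above can be summarized in a short Python sketch. The record layout (a dict with `completeness`, `defline`, `qualifiers` and `is_refseq` keys) and the function name are hypothetical; this is only an illustration of the stated rules, not the resource's actual pipeline.

```python
import re


def sequence_types(record):
    """Return the sequence-type labels for a nucleotide record.

    `record` is a hypothetical dict; the keys stand in for the
    GenBank descriptors and qualifiers named in the text.
    """
    labels = set()
    # Complete: completeness descriptor is "complete", or the word
    # "complete" appears in the definition line. The word boundary
    # keeps "incomplete" from matching.
    is_complete = (
        record.get("completeness") == "complete"
        or re.search(r"\bcomplete\b", record.get("defline") or "", re.IGNORECASE)
        is not None
    )
    # Partial: everything that is not complete by the rule above.
    labels.add("complete" if is_complete else "partial")
    # Proviral: the record carries the /proviral source qualifier.
    if "proviral" in (record.get("qualifiers") or ()):
        labels.add("proviral")
    # RefSeq: the record is a reference sequence.
    if record.get("is_refseq"):
        labels.add("refseq")
    return labels


# Made-up records for illustration.
a = {"defline": "Example virus isolate X, complete genome", "qualifiers": ["proviral"]}
b = {"defline": "Example virus isolate Y, incomplete genome", "is_refseq": True}
print(sorted(sequence_types(a)))  # ['complete', 'proviral']
print(sorted(sequence_types(b)))  # ['partial', 'refseq']
```

The second record shows why a RefSeq can still be labelled partial, which matches the note that a few RefSeqs are partial genomes.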

3. Geographic region

The Geographic region filter allows you to type your country of interest in the text box or select the continent(s) of interest. Selecting a continent also selects all the countries within that continent automatically.

Clicking on the arrow next to a continent's name opens a secondary selection menu to (un)select country(s) belonging to the continent of interest. The selected countries are listed below the continent name.

If an entire continent is selected, the continent's name is shown in a pillbox below, indicating that all countries in the continent are selected. If individual countries are selected instead, the continent pillbox is no longer displayed and a pillbox for each selected country is shown below the associated continent. Each continent’s behavior is independent of the other continents.

Selections can be removed by clicking on the pillboxes, and multiple concurrent selections are supported.
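The pillbox behavior described above can be modelled with a small Python sketch. The function name and the toy continent-to-country table are hypothetical, and the real page may handle edge cases differently.

```python
def pillboxes(countries_by_continent, selected):
    """Return the pillbox labels for the current country selection."""
    labels = []
    for continent, countries in countries_by_continent.items():
        chosen = [c for c in countries if c in selected]
        if chosen and len(chosen) == len(countries):
            # Every country selected: one pillbox for the whole continent.
            labels.append(continent)
        else:
            # Otherwise: one pillbox per selected country.
            labels.extend(chosen)
    return labels


# Toy data, not the resource's real country list.
regions = {
    "Oceania": ["Australia", "New Zealand"],
    "Europe": ["France", "Germany", "Spain"],
}
print(pillboxes(regions, {"Australia", "New Zealand", "France"}))
# ['Oceania', 'France']
```

Each continent is evaluated on its own, which mirrors the statement that continents behave independently of one another.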

Blast filter geographic region

4. Isolation host or taxonomy

Enter a host name or taxid in the text box and several host terms will be suggested (only the top 20 taxids will be shown). Select the desired host term and hit Enter. The results will be restricted to sequences in the database with the indicated host term. Multiple hosts can be filtered on simultaneously by adding further host terms to the filter.

The terms for isolation host are parsed from the source/host field in a sequence's GenBank record. Parsed terms are mapped to a standardized vocabulary, which was derived by curators by aggregating the variety of terms in GenBank files. Common mis-spellings are also included in this mapping strategy. For example, "Accipter cooperii" is mapped to "Accipiter cooperii".

The terms for isolation hosts are displayed in the host column of the results table. If the isolation host is unknown, but a laboratory host is present (as indicated in the /lab_host field of the GenBank record), the laboratory host will be shown in the host column of the results table. If both isolation host and laboratory host can be mapped, only the isolation host will be shown in the host column.
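As a rough illustration of the mapping and precedence rules described above, the Python sketch below normalizes raw host terms and then picks the value for the host column. The tiny vocabulary table and the function names are hypothetical; the real mapping is curated and far larger.

```python
# Hypothetical excerpt of a raw-term -> standardized-term mapping.
# The misspelling is the example given in the text.
HOST_VOCABULARY = {
    "accipiter cooperii": "Accipiter cooperii",
    "accipter cooperii": "Accipiter cooperii",
}


def map_host(raw_term):
    """Map a raw GenBank host term to the standardized vocabulary, or None."""
    if not raw_term:
        return None
    return HOST_VOCABULARY.get(raw_term.strip().lower())


def host_column(host_field, lab_host_field):
    """Isolation host wins; the lab host is shown only as a fallback."""
    return map_host(host_field) or map_host(lab_host_field)


print(host_column("Accipter cooperii", None))    # Accipiter cooperii
print(host_column(None, "Accipiter cooperii"))   # Accipiter cooperii
```

When neither field can be mapped, the sketch returns None, standing in for an empty host cell.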

Blast filter host

5. Isolation source

The terms for isolation source are parsed from the isolation source field in a sequence's GenBank record. Examples of parsed terms are serum and plasma, which are both mapped to the standardized vocabulary term blood.

Common misspellings as well as regional spelling differences are included in the mapping strategy. Multiple terms can be selected.

Blast filter isolation source

6. Sample collection date

Collection date (From, To) - is the collection date for the sample from which the sequence was derived.

By default, the To: date is set to the current date.

Use mm/dd/yyyy or yyyy formats or click on the calendar icon and select dates.
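The two accepted date formats can be handled as in the Python sketch below. How the page expands a year-only value is not stated in the text; treating it as January 1 for a From date and December 31 for a To date is an assumption made here, and the function name is hypothetical.

```python
from datetime import date, datetime


def parse_filter_date(text, is_end=False):
    """Parse a date given as mm/dd/yyyy or yyyy.

    Assumption: a bare year means Jan 1 (From) or Dec 31 (To).
    """
    text = text.strip()
    if len(text) == 4 and text.isdigit():
        year = int(text)
        return date(year, 12, 31) if is_end else date(year, 1, 1)
    # Raises ValueError for anything that is not mm/dd/yyyy.
    return datetime.strptime(text, "%m/%d/%Y").date()


print(parse_filter_date("03/15/2020"))          # 2020-03-15
print(parse_filter_date("2019"))                # 2019-01-01
print(parse_filter_date("2019", is_end=True))   # 2019-12-31
```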

Blast filter collection date

7. Sequence release date

Release date (From, To) – the date when sequence was released (publicly appeared) in GenBank or another INSDC database.

By default, the To: date is set to the current date.

Use mm/dd/yyyy or yyyy formats or click on the calendar icon and select dates.

8. Environmental samples

The Environmental source filter allows you to select virus sequences isolated from environmental sources. Generally, environmental isolates are identified by searching for key terms, such as sewage or ocean water, in the /isolation_source and /note fields of GenBank records when the /host field is empty.

Select Include - to include all sequences isolated from environmental sources in the results table.

Select Exclude - to exclude all sequences isolated from environmental sources from the results table.

Select Only - to view only sequences isolated from environmental sources.
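The Include, Exclude and Only options work the same way for the environmental, laboratory and vaccine strain filters. A minimal Python sketch of that three-way logic follows, with the environmental rule as the example predicate. The record layout, the function names, and the two-term keyword list (taken from the examples in the text) are illustrative assumptions, not the resource's actual term list.

```python
# Only the two example terms from the text; the real list is longer.
ENVIRONMENTAL_TERMS = ("sewage", "ocean water")


def is_environmental(record):
    """Key-term match in isolation_source/note when the host field is empty."""
    if record.get("host"):
        return False
    text = " ".join(
        [record.get("isolation_source") or "", record.get("note") or ""]
    ).lower()
    return any(term in text for term in ENVIRONMENTAL_TERMS)


def apply_mode(records, mode, predicate):
    """Apply an Include / Exclude / Only style filter."""
    if mode == "include":
        return list(records)                              # keep everything
    if mode == "exclude":
        return [r for r in records if not predicate(r)]   # drop matches
    if mode == "only":
        return [r for r in records if predicate(r)]       # keep matches only
    raise ValueError(f"unknown mode: {mode}")


# Made-up records for illustration.
records = [
    {"id": "rec1", "isolation_source": "raw sewage"},
    {"id": "rec2", "host": "Homo sapiens", "isolation_source": "serum"},
    {"id": "rec3", "note": "collected from ocean water"},
]
only_env = [r["id"] for r in apply_mode(records, "only", is_environmental)]
no_env = [r["id"] for r in apply_mode(records, "exclude", is_environmental)]
print(only_env)  # ['rec1', 'rec3']
print(no_env)    # ['rec2']
```

Passing a different predicate (for example, one that checks for a lab host) reuses the same `apply_mode` function unchanged.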

Blast filter enviromental samples

9. Laboratory samples

The Lab host filter allows you to view laboratory-isolated virus sequences. The lab host is identified by searching for a lab host name in the /lab_host field of the GenBank record. Additionally, for bacteriophages only, if the /host and /lab_host fields are empty, the lab host is identified by parsing the lab host name from the bacteriophage organism name in the GenBank record.

Select Include - to include all laboratory-isolated virus sequences in the results table.

Select Exclude - to exclude all laboratory-isolated virus sequences from the results table.

Select Only - to view only laboratory-isolated virus sequences.

Note: the lab host name is shown in the results table (in the host column) only when the isolation host cannot be identified (the /host field of the GenBank record is empty).

Blast filter laboratory samples

10. Vaccine strains

The Vaccine strains filter allows you to find virus vaccine strain sequences. Vaccine strains are identified by searching for vaccine strain terms in the /isolation_source, /note and /host fields and the definition line of the GenBank record.

Select Include - to include all virus vaccine strain sequences in the results table.

Select Exclude - to exclude all virus vaccine strain sequences from the results table.

Select Only - to view only virus vaccine strain sequences.

Blast filter vaccine strain samples


Search for sequences by virus name or taxonomy group

Find your virus sequence(s)

Option 1:

Select the Search by virus dropdown option from the Find Data tab of the navigation menu on any NCBI Virus page. This will open the selection menu.

Start typing in the text box, then select your taxid (NCBI taxonomy database ID). To select all viral sequences, enter and then select the term viruses.

The results will be shown in the table.

Note: You can view a list of all viral taxonomy terms on the NCBI taxonomy pages.

Search by virus through menu

Option 2:

Click the Search by virus button located in the central part of the NCBI Virus home page.

Start typing in the text box, then select your taxid (NCBI taxonomy database ID).

This will open the tabular interface with sequences from the selected taxonomy group.

Search by virus through buttons

Compare results in tabular display

Click on the Nucleotide tab to access genomic sequences, or the Protein tab to access amino acid sequences for individual proteins.

In the virus search results table, you can compare search results using the following sortable default columns:

  • Accession - the NCBI accession number of the NCBI Virus database sequence.

  • Species – virus species name.

  • Sequence type – complete/partial/refseq (read more about sequence type here).

  • Country – country/region of virus specimen collection.

  • Host – virus isolation host (Read more about isolation source vocabulary mapping here). If isolation host is unknown (/host field of the GenBank record), but laboratory host is present (as indicated in /lab_host field of the GenBank record), the laboratory host will be present in the host column of the results table. If both isolation host and laboratory host can be mapped, only isolation host will be presented in the host column of the table.

  • Isolation source – sequence isolation source (read more about isolation source here).

  • Collection Date – virus specimen collection date.

The search results table can be customized by adding or removing columns in the Select Columns dropdown menu.

Search by virus results table

Additional columns include:

  • Release date - the date when sequence was released (publicly appeared) in GenBank or other INSDC databases.

  • Genus.

  • Family.

  • Genotype.

  • Genome region.

  • Segment – segment name in case of segmented viruses.

  • Length - sequence length.

  • Protein – protein name, shown when Protein is selected.

  • BioSample – NCBI BioSample accession number.

  • GenBank title.

The default number of rows displayed in the results table is 200. You can change the number of table rows by selecting the number of results per page (200, 100, 50 or 25) in the Select Columns menu.

Search by virus select columns

Build multiple sequence alignment of selected results

Please refer to Build multiple sequence alignment of selected BLAST results, since the functionality is the same.

Build phylogenetic tree of selected results

Please refer to Build phylogenetic tree of selected BLAST results, since the functionality is the same.

Refine tabular results via filters

Please refer to Refine tabular BLAST results via filters, since the functionality is the same.


View and download specific virus sequence sets

Find specific data sets

Option 1:

From the Find data tab of the navigation menu, select the desired group of viruses: Human viruses, Bacteriophages, sequences added to NCBI Virus in the past month, or all virus sequences available in NCBI Virus.

Search for data set through menu

Option 2:

Click the Search by sequence button located in the central part of the NCBI Virus home page.

Select the desired popular virus searches group button located beneath the text box.

Search for data set through button

Both options will open the tabular display with information about viruses from the selected group.

Learn more about how to compare results in tabular display, build a multiple sequence alignment of selected results, build a phylogenetic tree of selected results, or refine tabular results via filters.

Option 3:

Use NCBI Visual Data Dashboard to explore, view and download the massive, normalized datasets. Learn more.

Download sequences

To download sequences in a variety of formats (FASTA, accession list, CSV or XML), choose Nucleotide or Protein tab and select sequences to download.

Click the Download button in the upper right corner.

This will open the download menu, which consists of 3 steps.

Step 1: Select Data Type.

  • Nucleotide, protein or coding region sequence (CDS) in FASTA format.
  • Accession list.
  • Results set - the current table view with data from all selected columns, in CSV (tabular) format or in XML format.

Download menu step 1

Step 2: Select Records.

Choose whether to download only the selected records or all available records.

Download menu step 2

Step 3

If you selected Sequence Data (FASTA format) in Step 1, in Step 3 you can select the FASTA definition line for the sequences that you are going to download.

If nucleotide or protein sequence data were selected in Step 1, the default FASTA definition line has the format (accession) | (GenBank title) and includes the GenBank sequence accession number and GenBank title:

>AAO17794 |VP4 spike protein[Human rotavirus A].

If the coding region option was selected, the default definition line format is (nucleotide accession)_cds_(protein accession) | (GenBank title). It includes the related GenBank nucleotide sequence accession number, an indication that this is a coding region (cds), the related GenBank protein accession number and the related protein GenBank title:

>FJ839692_cds_ACP30660 |DNA helicase[Escherichia virus RB14].

You can change this default defline to fit your own needs by selecting Build custom sequence title option. Country, host, species and sequence type can be added to the defline.
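The two default definition-line formats described above can be reproduced with a short Python sketch. The function and parameter names are hypothetical; the accessions and titles are the ones used in the examples above.

```python
def default_defline(accession, title, nucleotide_accession=None):
    """Build the default FASTA definition line.

    With `nucleotide_accession` set, the coding-region (cds) form is used:
    >(nucleotide accession)_cds_(protein accession) |(GenBank title)
    Otherwise the plain form is used:
    >(accession) |(GenBank title)
    """
    if nucleotide_accession is not None:
        return f">{nucleotide_accession}_cds_{accession} |{title}"
    return f">{accession} |{title}"


protein_line = default_defline("AAO17794", "VP4 spike protein[Human rotavirus A]")
cds_line = default_defline(
    "ACP30660", "DNA helicase[Escherichia virus RB14]", nucleotide_accession="FJ839692"
)
print(protein_line)  # >AAO17794 |VP4 spike protein[Human rotavirus A]
print(cds_line)      # >FJ839692_cds_ACP30660 |DNA helicase[Escherichia virus RB14]
```

Splitting a downloaded definition line on the first " |" recovers the accession part, which can be handy when matching FASTA records back to rows of the results table.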

Download menu step 3

If in Step 1 you selected the Results Set (the result of the current table view) in CSV format, the downloaded results will show all selected columns data. You can modify the selected columns and choose the columns you need in Step 3: Select columns to include in results set.
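Once a results set has been downloaded as CSV, it can be processed with standard tools. The Python sketch below counts records per host using only the standard library. The column headers (`Accession`, `Species`, `Host`, `Country`) and the sample rows are assumptions for illustration; check the header row of your own download, since it depends on the columns you selected.

```python
import csv
import io
from collections import Counter

# Stand-in for a downloaded file; replace with open("your_file.csv", newline="").
# Accessions and values are made up for the example.
sample = io.StringIO(
    "Accession,Species,Host,Country\n"
    "ACC0001,Example virus A,Homo sapiens,Japan\n"
    "ACC0002,Example virus A,Homo sapiens,Brazil\n"
    "ACC0003,Example virus B,Sus scrofa,Spain\n"
)

rows = list(csv.DictReader(sample))

# Count how many records were isolated from each host.
hosts = Counter(row["Host"] for row in rows)
for host, count in hosts.most_common():
    print(host, count)
```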

NCBI Visual Data Dashboard

NCBI Virus’s Visual Data Dashboard supports exploration and discovery across the massive, normalized datasets we offer. It may be used to identify data trends and select specific subsets of the data based on these trends.

Access sequence data via the statistics buttons located in the top row of the dashboard.

The following statistics are available:

  • RefSeq Genomes - all viral nucleotide reference sequences available at NCBI (find more about reference sequences here).
  • GenBank Genomes – all NCBI viral nucleotide sequences, where GenBank ASN.1 format contains the following descriptors: descr/molinfo/completeness=complete or there is a word 'complete' present in the record’s definition line (defline). It also includes complete reference records (RefSeqs).
  • GenBank Nucleotides – all viral nucleotide records available at NCBI. It also includes RefSeqs.
  • RefSeq Proteins – all viral reference protein records available at NCBI.
  • GenBank Proteins - all NCBI viral protein sequences. It also includes RefSeq proteins.

Clicking a statistics button takes you to the corresponding sequences displayed in tabular form. The results can be further refined by various sequence attributes (metadata) via the filters available on the results page (learn more here).

Statistics buttons

Explore virus taxonomy hierarchy using sunburst chart.

Virus taxonomy can be explored via an interactive sunburst chart. The default view represents the classification for all available NCBI viral taxa. The inner layer (ring) represents four non-taxonomic groups of viruses: RNA viruses, DNA viruses, DNA/RNA viruses (which includes reverse-transcribing viruses), and Unclassified viruses. Only 4 levels of the whole hierarchy are visible on the plot at a given time.

To explore virus taxonomy, click on any slice (section) of any layer on the sunburst chart. This will trigger the plot to zoom into the selected taxa and display any additional taxa below the selection. Each viral taxa name is displayed on a corresponding slice or can be viewed in the hover-over tooltip by placing your cursor over the slice. Dynamic breadcrumbs with viral taxa names are located above the sunburst plot. Breadcrumbs are also a secondary navigation system that show the location of the taxa in the hierarchy and clicking on one will refocus the plot on the selected taxa. You can also see breadcrumbs by hovering over any slice in the sunburst. Clicking on the center of the sunburst chart will return you to the parent taxa.

Taxonomy widget

Select specific virus taxonomy group and view statistics for specific sequence sets with quick links to download them.

After selecting a specific taxonomy group on sunburst chart, you can view and explore the updated statistics in the top row of the dashboard.

Taxonomy widget with statistics

Select a host term from the Host Distribution bar chart and see the distribution of that host among the various viral taxa.

The interactive Host Distribution chart shows the distribution of virus host species. Each host bar is proportional to the number of virus sequences isolated from this host. The total number of virus sequences for each bar can be viewed by hovering over the bar.

To select a host species, click on a bar or on a corresponding host name. This will highlight selected host, as well as all virus taxonomy groups containing sequences isolated from the selected host. Only one host can be selected at a time. Clicking on the selected host a second time will de-select it or you can use the Reset option available in the top right corner of the host chart. The statistics in the top row of the dashboard will be updated based on the selected host.

You can search for a host species by scrolling the Host Distribution chart, or by using the keyboard combination "CTRL+F".

Host widget with statistics

You can reset the Host Distribution chart to the original view by clicking the "Reset" button in the upper right corner of the chart.

Reset Host widget

Explore viral taxonomy hierarchy within a given taxon highlighted by the host selection.

By clicking on a highlighted taxonomy group, you can further explore viral taxonomy hierarchy on sunburst chart. The lower layers that include taxa with sequences from the selected host will be highlighted. While zooming in, not all taxa will be highlighted if not all taxa include sequences from the selected host.

Host widget with taxonomy widget

How to find, view and download HIV-1 sequences and related metadata?

Public HIV-1 nucleotide and protein sequence data are displayed in the HIV-1 data hub.

The HIV-1 data hub can be accessed by typing and selecting HIV-1 in the Search by virus name or taxonomy input form.

Alternatively, it can be accessed from the NCBI home page by typing HIV-1 in the search window. This opens a page with HIV-1 genome assembly information. Click the NCBI Virus button to access the HIV-1 data hub.

These are early days for HIV-1 data support in NCBI Virus. Please stay tuned for updates and further details relevant to HIV-1.
