Virus Variation database query builder

  1. Select the sequence type [Protein or Nucleotide]
  2. Select/enter values from the virus diagram, lists and/or text boxes to build your query. The Ctrl and Shift keys can be used to make multiple selections.
  3. Select "Full-length genomes only" to restrict the query to full-length or nearly full-length genomes. This selection applies to all queries in the Query Builder table.
  4. Press the "Add to Query Builder" button, and the query and the number of matching sequences will be displayed in the "Query results table." Please note that all fields are combined with "AND", for example "Type 1 AND Disease DF AND From UTR5 to M". A line with the details of your query, as well as the number of sequences matching it, will be added to the Query Builder. Multiple queries can be built in this way.
  5. Queries can be (de)selected with the checkmark at the beginning of each Query Builder line. The [X] button deletes selected queries. The "Show results" button will open a new window with a detailed view of the sequence results and options to create sequence alignments and phylogenetic trees from the retrieved sequences.
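
To illustrate how fields combined by "AND" narrow a query, here is a minimal Python sketch. It is not part of the Virus Variation tools; the record names and field names ("type", "disease") are hypothetical and chosen only to mirror the example above.

```python
def matches(record, query):
    """Return True only if the record satisfies every field in the query (logical AND)."""
    return all(record.get(field) == value for field, value in query.items())

# Hypothetical records; names and fields are illustrative, not real database entries.
records = [
    {"name": "REC1", "type": "1", "disease": "DF"},
    {"name": "REC2", "type": "1", "disease": "DHF"},
    {"name": "REC3", "type": "2", "disease": "DF"},
]

# Equivalent in spirit to "Type 1 AND Disease DF".
query = {"type": "1", "disease": "DF"}

hits = [r["name"] for r in records if matches(r, query)]
print(len(hits), hits)
```

Each additional field can only keep or reduce the number of matching sequences, never increase it.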

Downloading selected sequences

  1. Sequences or GenBank accessions returned from queries can be downloaded from both the query builder and results pages.
  2. Please note: When you download sequences from query builder and results pages, the complete sequence contained within the selected GenBank records is returned. Though individual GenBank records are created for each protein in a genome, individual records are not created for mature peptides. So even though a specific mature peptide(s) may have been selected, the entire protein or nucleotide sequence contained within the corresponding GenBank record will be returned in your download.
  3. To download only selected genome regions, select "Do Multiple Alignment" from the query results page. Once the alignment loads, you can download it in FASTA format. The downloaded file will contain only the sequences of the selected genome region, with dashes inserted where there are gaps in the alignment.
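
If unaligned sequences of the selected region are needed, the dashes can be removed after download. The following is a minimal Python sketch, assuming a standard FASTA file in which each record starts with a ">" header line; the sequence names and residues are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def ungap(sequence):
    """Remove alignment gap characters (dashes) from a sequence."""
    return sequence.replace("-", "")

# Made-up aligned FASTA content standing in for a downloaded file.
example = """>seq1
ATG--CGT
AA
>seq2
ATGTTCGT
-A
"""

aligned = read_fasta(example)
unaligned = {name: ungap(seq) for name, seq in aligned.items()}
print(unaligned)
```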

Query results interface

  1. Sequences in the results view can be sorted in descending or ascending order.
  2. Sequences can be (de)selected with the checkmark in the first column of the table.
  3. Selected sequences can be aligned with MUSCLE or used to build phylogenetic trees. Alignments are pre-calculated and are constrained to the genome region selected in the Query Builder.

Multiple sequence alignment viewer

  1. Please note that only the region selected in the query is shown for each sequence. The consensus and variability of the alignment are shown in the top frame of the multiple sequence alignment viewer.
  2. In the default view, residues identical to the consensus are highlighted in grey and gaps are denoted by dashes.
  3. Left click on a sequence to display a menu with the following options:
    1. Link out to view the GenBank record for that sequence.
    2. Select the sequence as the new anchor sequence, replacing the consensus.
    3. Change the display of all sequences so that invariant residues are displayed as dots.
  4. Alignments can be downloaded in FASTA, Clustal, PHYLIP, NEXUS, and ASN.1 formats.
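
The consensus and the dot display described above can be reproduced on a downloaded alignment. The sketch below is an assumption about how such a view could be computed, not the viewer's actual implementation: the consensus takes the most common residue in each column (ignoring gaps), and residues identical to the anchor are shown as dots. The example sequences are made up.

```python
from collections import Counter

def consensus(alignment):
    """Most common residue in each column, ignoring gaps."""
    result = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        # A column made only of gaps stays a gap in the consensus.
        result.append(Counter(residues).most_common(1)[0][0] if residues else "-")
    return "".join(result)

def dot_view(sequence, anchor):
    """Show residues identical to the anchor as dots; keep gaps and differences."""
    return "".join("." if s == a and s != "-" else s
                   for s, a in zip(sequence, anchor))

# Made-up aligned protein fragments of equal length.
alignment = ["MKTAY", "MKTAY", "MKSAY", "MK-AF"]

anchor = consensus(alignment)
views = [dot_view(seq, anchor) for seq in alignment]
print(anchor)
for row in views:
    print(row)
```

Passing any sequence of the alignment as `anchor` instead of the consensus mimics selecting a new anchor sequence. Columns with tied residue counts are resolved by first occurrence in this sketch.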

Sequence clustering and phylogenetic analysis

Scope
DatasetExplorer is an interactive tool in the NCBI Virus Variation Resource that provides an easy way to perform preliminary analysis of nucleotide and protein sequences from the NCBI Virus Variation Sequence Database. Datasets are represented visually as phylogenetic/clustering trees. Users can select the algorithm used to build a tree as well as the similarity criterion.
Overview of the Methodology
Construction of clustering/phylogenetic trees can be started either from the Results view or from the Alignment view. Either way, the tree is based on the pre-calculated alignment (see above) of the selected region of the query results.
Sequence Region Selection
After selecting "Build Tree", a graphic view of the multiple alignment of the sequences selected in the previous step is displayed. The black and red colors in the graphic represent the presence and absence of amino acid residues at the corresponding positions. For each sequence, the positions of its first and last amino acids, relative to the longest sequence of the selected set, are shown. A histogram showing the total number of amino acid residues at each position is displayed at the top of the page. The program automatically selects the sequence region to be analyzed so that the majority of the sequences in the set are included. Users can also define the sequence region themselves by first selecting all sequences in the set and then entering the start and end positions in the boxes provided. When the "Select sequences" button is clicked, the region is selected from those sequences that have complete coverage between the two positions, and sequences excluded from the selection are highlighted with a background color in the graphic view.
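
The coverage rule and the histogram can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: positions are taken as 1-based alignment columns, and a sequence is treated as covering the region when its first residue lies at or before the start position and its last residue at or after the end position. The sequence names and residues are hypothetical.

```python
def residue_span(aligned_seq):
    """Return 1-based columns of the first and last residue, or None if all gaps."""
    positions = [i for i, c in enumerate(aligned_seq, start=1) if c != "-"]
    if not positions:
        return None
    return positions[0], positions[-1]

def select_region(alignment, start, end):
    """Split sequences into those covering columns start..end and those that do not."""
    selected, excluded = {}, []
    for name, seq in alignment.items():
        span = residue_span(seq)
        if span is not None and span[0] <= start and span[1] >= end:
            selected[name] = seq[start - 1:end]  # keep only the chosen region
        else:
            excluded.append(name)
    return selected, excluded

def residue_histogram(alignment):
    """Number of residues (non-gap characters) in each alignment column."""
    return [sum(c != "-" for c in column) for column in zip(*alignment.values())]

# Hypothetical alignment; dashes mark positions where a sequence has no residue.
alignment = {"a": "MKTAYIAK", "b": "--TAYIAK", "c": "MKTAY---"}

histogram = residue_histogram(alignment)
selected, excluded = select_region(alignment, 3, 6)
print(histogram)
print(selected, excluded)
```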
Phylogenetic/Clustering Tree
A clustering or phylogenetic tree can be built by selecting one of the clustering algorithms and a distance calculation method from the list, and then clicking the "Next step" button.
Protein and Nucleotide Distances
We offer several distance measures for calculating pairwise nucleotide and protein sequence distances. For nucleotide sequences, these include the Felsenstein F84 distance and the Hamming distance. For protein sequences, they include the Dayhoff PAM matrix, the JTT matrix model, the PMB model, and Kimura's approximation, as implemented in the PHYLIP package, as well as the mPAM weight matrix.
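
As an illustration of the simplest of these measures, the Hamming distance between two aligned sequences is the number of positions at which they differ. The Python sketch below computes a pairwise distance matrix for made-up sequences. How the tool itself treats gap characters is not documented here, so counting a gap against a residue as a mismatch is an assumption of this sketch.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(sequences):
    """Symmetric matrix of pairwise Hamming distances."""
    return [[hamming_distance(s, t) for t in sequences] for s in sequences]

# Made-up aligned nucleotide fragments.
sequences = ["ATGCA", "ATGTA", "TTGTA"]

matrix = distance_matrix(sequences)
for row in matrix:
    print(row)
```

A matrix of this kind is the input that distance-based clustering algorithms use to build a tree.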
Tree Modification
An adaptive approach is used to visualize the tree in an aggregated form adapted to the user's screen, allowing users to interactively refine or aggregate the visualization of different parts of the tree (see a paper for details). A branch of the tree can be selected by clicking its root node, and the resolution of the selected branch can be changed by moving along the scale bar. Sequences on the tree can be searched by the fields in the database, and the resulting sequences or groups will be highlighted in green.
Tree Export
The complete tree can be exported in the Newick format by clicking the "Download full tree" button. The downloaded tree can be displayed by many tree-viewing programs.
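
For a quick check of an exported file without a tree viewer, the leaf names can be listed with a few lines of Python. This is a minimal sketch that assumes unquoted labels and no bracketed comments in the Newick text; the tree string below is made up, and a full Newick parser from a phylogenetics library would be more robust.

```python
import re

def leaf_names(newick):
    """List leaf labels of a Newick tree (unquoted labels only).

    Leaf labels directly follow "(" or ","; internal node labels follow ")"
    and are therefore not matched.
    """
    return [m.strip() for m in re.findall(r"[(,]\s*([^(),:;]+)", newick)]

# Made-up tree with branch lengths.
tree = "((A:0.1,B:0.2):0.05,C:0.3);"

leaves = leaf_names(tree)
print(leaves)
```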