Virus Variation database query builder
- Select the sequence type [Protein or Nucleotide]
- Select/enter values from the virus diagram, lists and/or text boxes to build your query. The Ctrl and Shift keys can be used to make multiple selections.
- Select Full-length genomes only to restrict the query to full-length or nearly full-length genomes. This selection applies to all queries in the Query Builder table.
- Press the "Add to Query Builder" button, and the query and the number of matching sequences will be displayed in the "Query results table." Please note that all fields are combined with "AND."
For example, "Type 1 AND Disease DF AND From UTR5 to M". A line with the details of your query
as well as the number of sequences matching it will be added to the Query Builder. Multiple queries can be built in this way.
- Queries can be (de)selected with the checkmark at the beginning of each Query Builder line.
The [X] button deletes selected queries. The "Show results" button will open a new window
with a detailed view of the sequence results and options to create sequence alignments and phylogenetic trees from the retrieved sequences.
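The "AND" combination of fields described above can be pictured with a short Python sketch. It is only an illustration of the logic, not the database's implementation; the record fields (`serotype`, `disease`, `region`) and their values are hypothetical.

```python
# Hypothetical sequence records; field names and values are illustrative only.
records = [
    {"serotype": "1", "disease": "DF", "region": "UTR5-M"},
    {"serotype": "1", "disease": "DHF", "region": "UTR5-M"},
    {"serotype": "2", "disease": "DF", "region": "UTR5-M"},
    {"serotype": "1", "disease": "DF", "region": "E"},
]


def matches(record, criteria):
    """Return True only if the record satisfies every criterion (logical AND)."""
    return all(record.get(field) == value for field, value in criteria.items())


def count_matches(records, criteria):
    """Count the records that satisfy all criteria, like the count shown per query line."""
    return sum(1 for record in records if matches(record, criteria))


query = {"serotype": "1", "disease": "DF", "region": "UTR5-M"}
print(count_matches(records, query))
```

Each added field can only narrow the result set, which is why the count for a query line never grows as more fields are filled in.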
Downloading selected sequences
- Sequences or GenBank accessions returned from queries can be downloaded from both the query builder and results pages.
- Please note: When you download sequences from query builder and results pages, the complete sequence contained within the selected
GenBank records is returned. Though individual GenBank records are created for each protein in a genome, individual records are
not created for mature peptides. So even though a specific mature peptide(s) may have been selected, the entire protein or nucleotide
sequence contained within the corresponding GenBank record will be returned in your download.
- To download only selected genome regions, select "Do Multiple Alignment" from the query results page.
Once the alignment loads, you can download it in FASTA format. The downloaded file will contain only the sequences
of the selected genome region with dashes inserted where there are gaps in the alignment.
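If the unaligned region sequences are needed, the dashes can be removed from the downloaded FASTA alignment. The following is a minimal Python sketch, assuming a standard FASTA layout with ">" header lines; the function names and the sample records are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_gaps(records):
    """Remove alignment gap characters (dashes) from every sequence."""
    return [(header, seq.replace("-", "")) for header, seq in records]


# Hypothetical two-sequence alignment; headers are placeholders, not real accessions.
example = ">seq1\nATG--C\nGT\n>seq2\nATGAAC\n--\n"
for header, seq in strip_gaps(read_fasta(example)):
    print(header, seq)
```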
Query results interface
- Sequences in the results view can be sorted in descending or ascending order.
- Sequences can be (de)selected with the checkmark in the first column of the table.
- Selected sequences can be aligned with MUSCLE
or used to build phylogenetic trees. Alignments are pre-calculated and are constrained to
the genome region selected in the Query Builder.
Multiple sequence alignment viewer
- Please note that only the region selected in the query is shown for each sequence.
The consensus and variability of the alignment are shown in the top frame of the multiple sequence alignment viewer.
- In the default view, residues identical to the consensus are highlighted in grey and gaps are denoted by dashes.
- Left click on a sequence to display a menu with the following options:
- Link out to view the GenBank record for that sequence.
- Select the sequence as the new anchor sequence, replacing the consensus.
- Change the display of all sequences so that invariant residues are displayed as dots.
- Alignments can be downloaded in FASTA, Clustal, PHYLIP, NEXUS, or ASN.1 format.
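The consensus and dot display described above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the viewer's own algorithm: the consensus is taken as the most common character in each column (gaps are counted like any other character, and ties go to the character seen first), and "invariant" is read as "identical to the anchor sequence". The function names and the sample alignment are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common character of each alignment column as one string."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def dot_view(aligned, anchor):
    """Show residues identical to the anchor as dots; gaps stay as dashes."""
    rows = []
    for seq in aligned:
        rows.append(
            "".join("." if c == a and c != "-" else c for c, a in zip(seq, anchor))
        )
    return rows


# Hypothetical protein alignment of equal-length rows.
alignment = ["MKTAY", "MKSAY", "MK-AY"]
anchor = consensus(alignment)
print(anchor)
for row in dot_view(alignment, anchor):
    print(row)
```

Passing any row of the alignment as `anchor` instead of the consensus mimics choosing a new anchor sequence from the menu.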
Sequence clustering and phylogenetic analysis
- DatasetExplorer, an interactive tool that is part of the NCBI Virus Variation
Resource, provides an easy way to perform preliminary
analysis on nucleotide and protein sequences from the NCBI Virus Variation
Sequence Database. Datasets are visually
represented using phylogenetic/clustering trees. Users can select an
algorithm to be used for building a tree as well as similarity
- Overview of the Methodology
- Construction of clustering/phylogenetic trees can be started either
from the Results view or from the Alignment view. Either way, the tree
is based on the pre-calculated alignment (see above) of the selected
region of the query results.
- Sequence Region Selection
- After selecting "Build Tree", a graphic view of the multiple alignment of the sequences selected
in the previous step is displayed. The black and red colors in the
graphics represent the presence and absence of amino acid residues at
the corresponding positions. The positions in the longest sequence of
the selected set for the first and last amino acid of each sequence are
shown. A histogram showing the total number of amino acid residues at
each position is displayed at the top of the page. The program
automatically selects the sequence region to be analyzed so that the
majority of the sequences in the set will be included. The sequence
region can also be defined by users by first selecting all sequences in
the set, and then entering the start and end positions in the boxes
provided. When the "Select sequences" button is clicked, the region from
sequences that have complete coverage between the two positions will be
selected, and sequences excluded from the selection will be highlighted
with a background color in the graphic view.
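The user-defined region selection described above can be sketched in Python. The sketch assumes each sequence is summarized only by the positions of its first and last residue in the coordinates of the longest sequence, so internal gaps are ignored; the names `coverage_histogram` and `select_by_region` and the sample spans are hypothetical, and the tool's automatic region choice is not reproduced here.

```python
def coverage_histogram(spans, length):
    """Count how many sequences cover each 1-based position (internal gaps ignored)."""
    counts = [0] * length
    for first, last in spans.values():
        for pos in range(first, last + 1):
            counts[pos - 1] += 1
    return counts


def select_by_region(spans, start, end):
    """Split sequence names into those fully covering [start, end] and the rest."""
    included = [
        name for name, (first, last) in spans.items()
        if first <= start and last >= end
    ]
    excluded = [name for name in spans if name not in included]
    return included, excluded


# Hypothetical (first, last) positions for three sequences.
spans = {"A": (1, 10), "B": (3, 10), "C": (1, 6)}
print(coverage_histogram(spans, 10))
print(select_by_region(spans, 3, 8))
```

Sequences in the second list correspond to the ones that would be highlighted as excluded in the graphic view.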
- Phylogenetic/Clustering Tree
- A clustering or phylogenetic tree can be built by selecting one of
the clustering algorithms and a distance calculating method from the
list, and clicking the "Next step" button.
- Protein and Nucleotide Distances
- We offer different distance measures for calculating nucleotide
and protein pairwise sequence distances, such as those based on
Felsenstein F84 distance and Hamming distance for nucleotide
sequences; the Dayhoff PAM matrix, the JTT matrix model, the PMB model,
and Kimura's approximation for protein sequences implemented in the
PHYLIP package, as well as the mPAM weight matrix for protein
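As an illustration of the simplest nucleotide measure listed above, the Hamming distance counts the aligned positions at which two sequences differ. The Python sketch below is not the tool's implementation; it assumes equal-length aligned sequences and treats a gap opposite a residue as a difference, which is an assumption about gap handling. The function names and sample sequences are hypothetical.

```python
def hamming_distance(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(seqs):
    """Return pairwise Hamming distances keyed by (name, name)."""
    names = list(seqs)
    return {(p, q): hamming_distance(seqs[p], seqs[q]) for p in names for q in names}


# Hypothetical aligned nucleotide sequences.
seqs = {"s1": "ACGTAC", "s2": "ACGAAC", "s3": "TCGA-C"}
for pair, dist in distance_matrix(seqs).items():
    print(pair, dist)
```

A pairwise matrix of this kind is the usual input to distance-based clustering methods.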
- Tree Modification
- An adaptive approach is used to visualize the tree in an
aggregated form adapted to the user's screen, allowing users to
interactively refine or aggregate visualization of different parts of
the tree (see a paper for details). A branch on the tree can be
selected by clicking the root node, and the resolution of the selected
branch can be changed by moving along the scale bar. Sequences on the
tree can be searched by the fields in the database, and the resulting
sequences or groups will be highlighted in green.
- Tree Export
- The complete tree can be exported in the Newick format by clicking
the "Download full tree" button. The downloaded tree can be displayed
by many tree-viewing programs.
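A downloaded Newick tree is plain text, so simple checks can be scripted before opening it in a viewer. The Python sketch below lists the leaf labels of a Newick string. It is a simplified reader that assumes unquoted labels without spaces or special characters; the function name and the sample tree are hypothetical.

```python
import re


def newick_leaves(newick):
    """Return leaf labels from a Newick string (unquoted labels only)."""
    # A leaf label directly follows "(" or ","; internal node labels follow ")"
    # and branch lengths follow ":", so neither is captured by this pattern.
    return [m.group(1) for m in re.finditer(r"[(,]\s*([^\s():,;]+)", newick)]


# Hypothetical tree with branch lengths.
tree = "((A:0.1,B:0.2):0.05,C:0.3);"
print(newick_leaves(tree))
```

For trees with quoted labels or comments, a dedicated Newick parser from a phylogenetics library is a safer choice.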