NCBI News




In this issue


The Reference Human Genome

SARS Coronavirus Resource

Gene Expression Omnibus (GEO)

Major Histocompatibility Complex database (dbMHC)

RefSeq Release 1 Ready for Download

GenBank Release 137

New Microbial Genomes in GenBank

Sequence Revision History Page Offers New Comparison Function

BLAST Lab

Masthead





SARS Coronavirus Resource

The first complete sequence of the SARS Coronavirus, determined by the BC Cancer Agency Genome Sciences Centre in Canada, was submitted to GenBank prior to publication as an unannotated nucleotide sequence and assigned GenBank accession number AY274119. The sequence was subsequently processed through the NCBI viral genome annotation pipeline and made available in Entrez Genomes under RefSeq accession NC_004718 as the SARS-CoV reference sequence within about 24 hours of its submission. The results of this computational analysis can be accessed from the Analysis section of the new SARS Coronavirus Resource, a Web page providing a point of access to sequence data and a wealth of other information about the SARS Coronavirus. The types of analyses available on the SARS Coronavirus Resource page are described below.

Pairwise global alignments of NC_004718 with other viral genomic sequences, pre-computed using the “band” version of the Needleman-Wunsch algorithm, are displayed graphically, highlighting mutations, deletions, and insertions among the sequences (Figure 1). Global alignments are updated automatically as new virus sequences enter GenBank.

Figure 1.  Graphical format of global alignment of NC_004718 with other viral genome sequences.  The alignment is updated automatically as new virus sequences enter GenBank.
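
For readers unfamiliar with banded alignment, the idea can be sketched in a few lines of Python. This is a minimal illustration, not the NCBI pipeline's implementation: the function name, the band width, and the match, mismatch, and gap scores are all assumptions chosen for the example. The sketch fills only those cells of the Needleman-Wunsch matrix that lie within `band` positions of the main diagonal, which is what makes the method practical for genome-length sequences that are already known to be similar.

```python
def banded_global_score(a, b, band=3, match=1, mismatch=-1, gap=-2):
    """Best global alignment score for paths within `band` of the diagonal.

    Scoring values and band width are illustrative assumptions.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        # The final cell (n, m) would lie outside the band.
        raise ValueError("band too narrow for these sequence lengths")

    neg_inf = float("-inf")

    # Row 0: aligning an empty prefix of `a` with b[:j] costs j gaps.
    prev = [neg_inf] * (m + 1)
    for j in range(min(m, band) + 1):
        prev[j] = j * gap

    for i in range(1, n + 1):
        cur = [neg_inf] * (m + 1)
        # Only columns within `band` of the diagonal are evaluated.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap
                continue
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] opposite a gap
                cur[j - 1] + gap,    # b[j-1] opposite a gap
            )
        prev = cur

    return prev[m]
```

Cells outside the band keep a score of negative infinity, so they never contribute to the maximum. A narrower band is faster but can miss alignments that require long insertions or deletions.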


Predicted SARS proteins are listed in a separate table, complete with information on the corresponding gene, accession number, length, and pre-computed comparison to other proteins. Pre-computed alignments of SARS-CoV protein sequences from the RefSeq collection of complete genomes in Entrez are accessible from the column “mA” in the table. The alignments, such as that shown in Figure 2, were constructed using the ClustalX program and, in some cases, manually edited. A column is highlighted in color if at least 80% of its residues are identical or fall into at least one of the following amino acid groups: aromatic (FHWY), aliphatic (ILVA), hydrophobic (ACFILMVWY), alcohol (STC), charged (DEHKR), polar (CDEHKNQRST), tiny (AGS), small (ACDGNPSTV), or bulky (EFIKLMQRWY).

Figure 2: Precomputed multiple alignment of the coronavirus nsP2 proteins.
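
The 80% highlighting rule can be expressed as a short Python sketch. The group memberships below are taken from the text; everything else is an assumption made for illustration, including the function names, the treatment of gap characters (counted in the column total but never as a match), and the `*`/`.` output mask. The actual display code is not described in the article.

```python
from collections import Counter

# Amino acid groups as listed in the article.
GROUPS = {
    "aromatic": "FHWY",
    "aliphatic": "ILVA",
    "hydrophobic": "ACFILMVWY",
    "alcohol": "STC",
    "charged": "DEHKR",
    "polar": "CDEHKNQRST",
    "tiny": "AGS",
    "small": "ACDGNPSTV",
    "bulky": "EFIKLMQRWY",
}


def column_is_highlighted(column, threshold=0.8):
    """True if enough residues are identical or share one group.

    Assumption: gaps ("-") count toward the column size but match nothing.
    """
    residues = [r.upper() for r in column]
    if not residues:
        return False
    needed = threshold * len(residues)
    counts = Counter(r for r in residues if r != "-")

    # Identity check: the most frequent residue reaches the threshold.
    if counts and max(counts.values()) >= needed:
        return True

    # Group check: members of any single group reach the threshold.
    return any(
        sum(counts[r] for r in members) >= needed
        for members in GROUPS.values()
    )


def highlight_mask(aligned_sequences, threshold=0.8):
    """Return one character per column: '*' if highlighted, '.' otherwise."""
    return "".join(
        "*" if column_is_highlighted(col, threshold) else "."
        for col in zip(*aligned_sequences)
    )
```

For example, `highlight_mask(["ILK", "VLD", "ALE", "ILG", "LLP"])` marks the first column (all aliphatic) and the second (all identical) but not the third, where no single group covers four of the five residues.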


SARS protein sequences, compared by BLAST/PSI-BLAST to sequences with known 3D structures, are listed in the Related Structures section. The links present sequence alignments to 3D structures and mapping displays using the Cn3D molecular graphics viewer. Additional related structures were selected from the VAST 3D-structure neighbors of the proteins identified by BLAST/PSI-BLAST. Structure links are also updated automatically as new data enters the databases.


Figures 3a and 3b: Portion of the alignment of the sequence of putative nsp2 protein NP_828863 to that of the coronavirus main proteinase from Transmissible gastroenteritis virus, Protein Data Bank code 1LVO, chain F. In the sequence alignment, identical residues are darker. In the Cn3D rendering of the structure 1LVO, residues that are identical in the alignment are shown in a spacefilling representation while intervening residues are shown as a backbone trace, illustrating a strong degree of sequence conservation in core regions of the protein.


In addition to the sequence analyses, automatic searches for SARS-related information in the Entrez databases (PubMed, Genomes, Nucleotide, Protein, Structures) are provided. Links to resources such as the Centers for Disease Control and Prevention and the World Health Organization are listed to provide comprehensive disease information. Access the SARS resource from Entrez Genome or directly at:

—VP






NCBI News | Fall/Winter 2002 NCBI News: Spring 2003