NCBI News
National Center for Biotechnology Information
Summer 2000


In this issue

Conserved Domain Database Debuts with RPS-BLAST Search Interface
Enhanced Access to Taxonomy Database
New Human-Mouse Homology Map
Catch the Gene Expression Omnibus
A Pair of Pathogens Added to GenBank
Protein Molecular Weight Field Now in Entrez
OMIM In Entrez: New Searching Power
Web Server Software Available for BLAST
Recent Publications
News Briefs
BLAST Lab
PSI-BLAST 2.1 Offers Composition-Based Statistics
Slight Address Change for NCBI FTP Server
Masthead



Conserved Domain Database Debuts with RPS-BLAST Search Interface

Proteins often contain several domains, each with a distinct function. Such domains have evolved as modules that are combined in various arrangements to produce proteins with unique functions. Conserved domains are structural modules that have been reused frequently during evolution. NCBI's new Conserved Domain Search (CD-Search) service can be used to identify conserved domains in a protein sequence. The service provides a Web interface for searching NCBI's new Conserved Domain Database (CDD) with the Reverse Position-Specific BLAST program (RPS-BLAST) and for retrieving domain alignments, including 3-D structures.

The CDD contains domains derived principally from two public protein domain collections, Pfam [1] and the Simple Modular Architecture Research Tool (SMART) [2], both of which provide multiple sequence alignments for the conserved domains they contain.

To produce the CDD, the SMART and Pfam alignments are processed so that each sequence in an alignment is linked to the protein division of Entrez. Sequences that cannot be found in Entrez databases are either omitted or replaced with a closely related sequence. Whenever possible, sequences in the alignment that lack a known 3-D structure are replaced with closely related sequences that have direct links to 3-D structures.

From the sequences in a domain's alignment, a representative sequence is chosen, preferably one with a structure link. A Position-Specific Score Matrix (PSSM) of the type used by PSI-BLAST (Position-Specific Iterated BLAST) [3] is then calculated from the multiple sequence alignment over the aligned range of the representative sequence. The CD-Search service uses the RPS-BLAST algorithm to search the resulting databases of PSSMs and identify conserved domains in a protein sequence.
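The PSSM step can be illustrated with a minimal Python sketch. It assumes a toy alignment of equal-length gapped strings, a uniform background frequency of 1/20 per amino acid, and a simple additive pseudocount; PSI-BLAST itself uses observed background frequencies, sequence weighting, and more elaborate pseudocounts, so this is not the CDD's actual procedure. The function name `build_pssm` and its parameters are hypothetical. Columns where the representative row has a gap are skipped, so the matrix covers only the representative's aligned range.

```python
import math

# ASSUMPTION: uniform background frequencies (1/20 each) for simplicity.
# PSI-BLAST uses real amino acid frequencies, sequence weighting, and
# data-dependent pseudocounts; this is only an illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(alignment, rep_index, pseudocount=1.0):
    """Hypothetical helper: build a log-odds PSSM from a gapped alignment.

    Returns one dict (residue -> score) per column in which the
    representative sequence alignment[rep_index] has a residue.
    """
    representative = alignment[rep_index]
    pssm = []
    for col, rep_char in enumerate(representative):
        if rep_char == "-":
            continue  # outside the representative's aligned range
        # Count residues observed in this column (gaps are ignored).
        counts = {aa: 0.0 for aa in AMINO_ACIDS}
        for row in alignment:
            residue = row[col]
            if residue in counts:
                counts[residue] += 1.0
        # Add a pseudocount to every residue so no frequency is zero.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Log2 odds of the column frequency versus the background.
            column_scores[aa] = math.log2(freq / BACKGROUND[aa])
        pssm.append(column_scores)
    return pssm
```

For example, `build_pssm(["MKV-LA", "MKIALA", "MRV-LG"], rep_index=0)` yields five columns, because the representative `MKV-LA` has a gap in its fourth position; in the first column, where every row has M, M scores positively and every other residue negatively.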

RPS-BLAST is a variant of the PSI-BLAST program. Whereas PSI-BLAST first builds a PSSM that is used as a query for subsequent database searches, RPS-BLAST uses a protein sequence query to search a database of precalculated PSSMs in a single pass. The role of the PSSM has changed from “query” to “subject”, hence the term “reverse” in RPS-BLAST.
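The reversal of roles can be sketched as follows: the query is a plain sequence, and the database is a dictionary of precalculated PSSMs scanned in a single pass. This is a deliberate simplification, not RPS-BLAST itself: it scores only ungapped local alignments by scanning every diagonal, with no word seeding, gapped extension, or E-value statistics. The names `reverse_search`, `best_ungapped_score`, and `toy_pssm`, the -4.0 score for residues absent from a PSSM column, and the toy matrices are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_pssm(motif, match=3.0, mismatch=-1.0):
    """Hypothetical helper: a PSSM that simply favors the residues of `motif`."""
    return [{aa: (match if aa == m else mismatch) for aa in AMINO_ACIDS}
            for m in motif]


def best_ungapped_score(query, pssm):
    """Best ungapped local alignment score of `query` against one PSSM.

    Each diagonal (offset between query and PSSM positions) is scanned with
    a maximum-subarray pass that resets the running score at zero.
    """
    best = 0.0
    n, m = len(query), len(pssm)
    for offset in range(-(m - 1), n):
        running = 0.0
        for j in range(m):
            i = j + offset
            if i < 0 or i >= n:
                continue  # this PSSM column has no query residue on this diagonal
            # ASSUMPTION: residues missing from the column (e.g. X) score -4.0.
            running = max(0.0, running + pssm[j].get(query[i], -4.0))
            best = max(best, running)
    return best


def reverse_search(query, pssm_db, min_score=0.0):
    """Score one query sequence against every PSSM in the database."""
    hits = []
    for domain_name, pssm in pssm_db.items():
        score = best_ungapped_score(query, pssm)
        if score > min_score:
            hits.append((domain_name, score))
    # Report the best-scoring domains first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

With `db = {"domainA": toy_pssm("HKLM"), "domainB": toy_pssm("WWWW")}`, calling `reverse_search("ACHKLMDE", db)` reports only `domainA`, scoring 12.0 from its four matched positions; `domainB` never scores above zero. The PSSMs play the "subject" role here, which is the point of the reversal.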

A link to the CD-Search tool is found on the main BLAST page. The search form accepts an amino acid query sequence as input, and RPS-BLAST compares the query with the PSSMs in the CDD. Search results may be displayed as pairwise alignments of the query sequence with a representative domain sequence; Figure 1 shows such an alignment with the representative sequence for the aminotransferase class-I domain. The extent of the alignment is illustrated graphically at the top of the output, and links to Pfam are given in the RPS-BLAST summary header.

From this output page, a multiple sequence alignment can then be generated between the query and other representatives of the aminotransferase class-I domain, as shown in Figure 2. If a 3-D structure exists for the domain sequence, it can be viewed using Cn3D. In this example a structure does exist: Cn3D would load the 1B8G_B structure into its structure window and the multiple sequence alignment into its linked sequence window. In this way, the structural context of the query sequence implied by the alignment can be examined in detail.
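One simple way to place the query into the domain's multiple alignment, as in Figure 2, is to route it through the representative sequence: the pairwise alignment says which query residue sits on each representative residue, and the domain alignment says where each representative residue sits among the other members. This sketch is an assumption, not a description of CD-Search's internals. The function names and the `rep_start` parameter (the representative position where the pairwise alignment begins) are hypothetical, and query residues inserted relative to the representative are simply dropped.

```python
def query_residues_by_rep_position(query_aln, rep_aln, rep_start=0):
    """Map each representative residue index to the aligned query residue.

    `query_aln` and `rep_aln` are the two gapped rows of a pairwise
    alignment; a '-' in the query row is kept as a gap.
    """
    mapping = {}
    rep_pos = rep_start
    for q_char, r_char in zip(query_aln, rep_aln):
        if r_char == "-":
            continue  # query insertion relative to the representative: dropped
        mapping[rep_pos] = q_char
        rep_pos += 1
    return mapping


def add_query_row(domain_alignment, rep_index, query_aln, rep_aln, rep_start=0):
    """Return the domain alignment with a query row added on top."""
    mapping = query_residues_by_rep_position(query_aln, rep_aln, rep_start)
    rep_row = domain_alignment[rep_index]
    query_row = []
    rep_pos = 0
    for r_char in rep_row:
        if r_char == "-":
            query_row.append("-")  # the representative has a gap here
        else:
            # Representative positions outside the pairwise alignment get a gap.
            query_row.append(mapping.get(rep_pos, "-"))
            rep_pos += 1
    return ["".join(query_row)] + list(domain_alignment)
```

For instance, with the domain alignment `["MKV-LA", "MKIALA"]`, a pairwise alignment `query_aln="KVWLA"` and `rep_aln="KV-LA"`, and `rep_start=1`, the added query row is `-KV-LA`; the inserted W has no column to occupy in this simple scheme.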



Figure 1: CD-Search display showing the alignment between a query sequence and the CDD representative sequence for the aminotransferase class-I domain defined in the Pfam database.



Figure 2: Multiple sequence alignment between a CD-Search query sequence and four representatives of the aminotransferase class-I domain in the CDD.


The source databases used in the CDD are updated several times a year, at roughly two-month intervals. NCBI follows these updates and will revise the CDD within two months of each source release. Try a CD-Search at www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. DW


Notes
1. Bateman, A, et al. Nucleic Acids Res 28:263–6, 2000.
2. Schultz, J, et al. Nucleic Acids Res 28:231–4, 2000.
3. Altschul, SF, et al. Nucleic Acids Res 25:3389–402, 1997.


