ProtEST: A Window on Protein Matches to ESTs

Trace Archive Expands

The NCBI Trace Archive contains traces for EST, WGS, shotgun, and finishing projects from a variety of organisms, including traces from human, mouse, rat, zebrafish, worm, mosquito, frog, soybean, rice, and others. The most current listing can be found by following the “Trace Archive” link from the NCBI home page to the Trace Archive page. The trace data may be queried using a text-based search interface on the Trace Archive page, and an FTP link from this page allows for bulk downloads. The trace data may also be searched using MegaBLAST via the link provided.

ProtEST is a new NCBI tool, analogous to BLASTLink, that presents a graphical view of pre-computed alignments between protein sequences and the translations of UniGene nucleotide sequences.

To generate the alignments, the 6-frame translations of mRNA and EST sequences in UniGene are compared to protein sequences using BLAST. The proteins compared for ProtEST are limited to those from eight model organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana, and Escherichia coli. To exclude protein sequences that are derived from conceptual translations or models, the protein sequences from these model organisms are further limited to those drawn from the SWISS-PROT, PIR, PDB, or PRF databases. ProtEST reports are updated in tandem with UniGene protein similarities.
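
The 6-frame translation step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code NCBI uses: the function names are invented here, the standard genetic code is assumed, and the BLAST comparison itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated peptides."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Calling `six_frame_translations` on an EST sequence gives six candidate peptides, one per reading frame. Each of these could then be compared against a protein sequence.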

ProtEST is accessible from any UniGene cluster page via links from the model organism protein similarities. For each nucleotide sequence match, the report shows the UniGene cluster ID, the GenBank accession number of the sequence, and the percent identity between the protein and nucleotide translation in the aligned region. When trace data is available, a link is also provided to the sequence trace in the NCBI Trace Archive.
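
As a rough illustration of the percent identity figure, the sketch below counts identical residues over the columns of a gapped pairwise alignment. The function name and the choice to count gap columns in the denominator are assumptions made for this example. ProtEST takes its values from BLAST output.

```python
def percent_identity(aligned_protein, aligned_translation):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both arguments are rows of one pairwise alignment, so they must have
    equal length; '-' marks a gap and never counts as a match.
    """
    if len(aligned_protein) != len(aligned_translation):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_protein:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_protein, aligned_translation)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_protein)
```

For example, the rows `MKT-LV` and `MKSALV` share four identical columns out of six, or about 67 percent identity.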

A schematic of the aligned region is linked to a BLAST2Sequences alignment display. Entries in the report can be sorted by percent identity, alignment length, alignment origin, alignment end-point, UniGene cluster ID, or GenBank accession number. The entries shown can then be filtered by percent identity or by the length of the alignment.
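
The sorting and filtering options can be mimicked on a local list of hits, for instance after copying values from a report. The sketch below is hypothetical throughout: the `Hit` record, its field names, the sort keys, and the example entries are invented for illustration and do not reflect a ProtEST data format or real accessions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    cluster_id: str          # UniGene cluster ID (placeholder values below)
    accession: str           # GenBank accession (placeholder values below)
    percent_identity: float
    start: int               # alignment origin on the protein, 1-based
    end: int                 # alignment end-point on the protein, inclusive

    @property
    def length(self):
        return self.end - self.start + 1


# Identity and length sort from highest to lowest; the rest sort ascending.
SORT_KEYS = {
    "identity": lambda h: -h.percent_identity,
    "length": lambda h: -h.length,
    "origin": lambda h: h.start,
    "end": lambda h: h.end,
    "cluster": lambda h: h.cluster_id,
    "accession": lambda h: h.accession,
}


def select_hits(hits, sort_by="identity", min_identity=0.0, min_length=0):
    """Filter hits by identity and alignment length, then sort the survivors."""
    kept = [
        h for h in hits
        if h.percent_identity >= min_identity and h.length >= min_length
    ]
    return sorted(kept, key=SORT_KEYS[sort_by])


# Made-up entries, for demonstration only.
EXAMPLE_HITS = [
    Hit("Xx.1", "ACC001", 62.5, 10, 150),
    Hit("Xx.2", "ACC002", 88.0, 40, 120),
    Hit("Xx.3", "ACC003", 45.0, 1, 60),
]
```

With these entries, `select_hits(EXAMPLE_HITS, sort_by="length", min_identity=50)` drops the 45 percent hit and lists the longest remaining alignment first.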

Figure 1: ProtEST report for BLASTX alignments to human acid phosphatase I.

A typical ProtEST report is shown in Figure 1 above for human acid phosphatase I. This protein hits ESTs from Bos taurus, Arabidopsis thaliana, Rattus norvegicus, and Xenopus laevis; however, the display has been limited to ESTs from Xenopus and sorted by alignment length. Two ESTs from Xenopus are associated with sequencing traces available in the Trace Archive, as indicated by the “T” link next to the accession number.


NCBI News | Spring 2002