A Window on Protein Matches to ESTs
The NCBI Trace Archive contains
traces for EST, WGS, shotgun, and finishing projects
from a variety of organisms, including traces from
human, mouse, rat, zebrafish, worm, mosquito, frog,
soybean, rice, and others. The most current listing
of organisms can be found on the Trace Archive page,
reached via the Trace Archive link on the NCBI home
page. The trace data may be queried using a text-based
search interface on the Trace Archive page, and an
FTP link from this page allows for bulk downloads.
The trace data may also be searched using MegaBLAST
via the link provided.
ProtEST is a new NCBI tool, analogous to BLASTLink,
that presents a graphical view of pre-computed alignments between protein
sequences and the translations of UniGene nucleotide sequences.
To generate the alignments, the 6-frame translations of mRNA and EST sequences
in UniGene are compared to protein sequences using BLAST. The proteins
compared for ProtEST are limited to those from eight model organisms:
Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster,
Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana,
and Escherichia coli. In order to exclude protein sequences that
are derived from conceptual translations or models, the protein sequences
from these model organisms are further limited to those derived from the
structural databases, Swiss-Prot, PIR, PDB, or PRF. ProtEST reports are
updated in tandem with UniGene protein similarities.
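The six-frame translation step described above can be illustrated with a minimal Python sketch. It is not the code used to build ProtEST; it assumes the standard genetic code, plain A/C/G/T input, and marks any codon containing an ambiguous base as X. The function names are chosen here for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence from its first base, ignoring a trailing partial codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame number (+1, +2, +3, -1, -2, -3) to its conceptual translation."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])    # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])  # reverse strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```

Each of the six translations would then be compared to the protein set; a BLASTX-style search performs the translation and the comparison in a single step.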
ProtEST is accessible from any UniGene cluster
page via links from the model organism protein similarities. For each
nucleotide sequence match, the report shows the UniGene cluster ID, the
GenBank accession number of the sequence, and the percent identity between
the protein and nucleotide translation in the aligned region. When trace
data is available, a link is also provided to the sequence trace in the
NCBI Trace Archive.
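As an illustration of the percent identity figure shown in the report, the sketch below computes identity over an aligned region from two gapped, equal-length alignment strings. It assumes the common BLAST-style convention of identical positions divided by alignment length, gap columns included; the exact convention used by ProtEST is not stated here, and the function name is hypothetical.

```python
def percent_identity(aligned_protein, aligned_translation):
    """Percent of alignment columns holding the same residue in both rows.

    Both arguments are rows of one pairwise alignment, with '-' for gaps.
    Assumed convention: identities / alignment length * 100.
    """
    if len(aligned_protein) != len(aligned_translation):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_protein:
        return 0.0
    identities = sum(
        1
        for a, b in zip(aligned_protein, aligned_translation)
        if a == b and a != "-"
    )
    return 100.0 * identities / len(aligned_protein)
```

For example, an alignment of six columns with four identical residues gives roughly 66.7 percent under this convention.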
A schematic of the aligned region is linked to a BLAST2Sequences alignment
display. Entries in the report can be sorted by percent identity,
alignment length, alignment origin, alignment end-point, UniGene cluster
ID, or GenBank accession number. The entries shown can then be filtered
by percent identity or by the length of the alignment.
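The sorting and filtering behavior described above can be mimicked on a local list of report entries. The sketch below is a hypothetical model, not a ProtEST interface: the `Hit` record, its field names, and the sample cluster IDs and accessions are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One report entry (hypothetical field names)."""
    cluster_id: str
    accession: str
    percent_identity: float
    origin: int  # first aligned position on the protein
    end: int     # last aligned position on the protein

    @property
    def length(self):
        return self.end - self.origin + 1


# The six sort criteria named in the text.
SORT_KEYS = {
    "identity": lambda h: h.percent_identity,
    "length": lambda h: h.length,
    "origin": lambda h: h.origin,
    "end": lambda h: h.end,
    "cluster": lambda h: h.cluster_id,
    "accession": lambda h: h.accession,
}


def filter_hits(hits, min_identity=0.0, min_length=0):
    """Keep hits meeting both the identity and the length thresholds."""
    return [
        h for h in hits
        if h.percent_identity >= min_identity and h.length >= min_length
    ]


def sort_hits(hits, key, descending=False):
    """Return a new list sorted by one of the SORT_KEYS criteria."""
    return sorted(hits, key=SORT_KEYS[key], reverse=descending)


# Made-up placeholder entries; not real UniGene clusters or accessions.
example_hits = [
    Hit("CLUSTER_A", "ACC0001", 62.5, 10, 150),
    Hit("CLUSTER_B", "ACC0002", 48.0, 1, 90),
    Hit("CLUSTER_A", "ACC0003", 71.0, 30, 158),
]

if __name__ == "__main__":
    kept = filter_hits(example_hits, min_identity=50.0)
    for hit in sort_hits(kept, "length", descending=True):
        print(hit.accession, hit.percent_identity, hit.length)
```

Keeping the sort criteria in a dictionary means a new ordering can be added without changing `sort_hits`.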
Figure 1. ProtEST report for BLASTX alignments to
human acid phosphatase I.
A typical ProtEST report is shown in Figure 1 above
for human acid phosphatase I. This protein hits ESTs from Bos
taurus, Arabidopsis thaliana, Rattus norvegicus, and Xenopus laevis;
however, the display has been limited to ESTs from Xenopus and
sorted by alignment length. Two ESTs from Xenopus are associated
with sequencing traces available in the Trace Archive, as indicated by
the T link next to the accession number.