NCBI News







What’s the Longest Sequence in GenBank?
How About the Largest Protein?


The Entrez search system makes it relatively easy to determine the answers to both of these questions. A bit of trial and error yields:

  25000000:50000000[Sequence Length] NOT "srcdb refseq"[Properties]

This query, which ensures that the sequence we find is in the primary database, GenBank, and is not a derivative record from the NCBI RefSeq database, picks up a single record:

 

  Locus       AE014297   27890790 bp   DNA   linear   CON   18-SEP-2002
  Definition  Drosophila melanogaster chromosome 3R, complete sequence.

This sequence is part of the recently deposited build 3 of the Drosophila melanogaster genome visible in the Map Viewer.

The longest protein, found using...

  30000:50000[Sequence Length]

...turns out to be human titin, NP_596869, which is an astounding 34,350 amino acids in length. Titin is a muscle protein whose N-terminal and C-terminal ends anchor at the Z-line and M-line of the sarcomere, respectively, so that a single titin molecule spans half the length of a sarcomere. As part of its processing of this RefSeq, NCBI has identified 274 “Immunoglobulin” and 264 “Fibronectin” domains within this isoform of titin.
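
For readers who would rather script these searches than type them into Entrez, the two length-range queries above can be assembled programmatically. The following is a minimal Python sketch, not an NCBI-provided tool: the helper names `length_query` and `esearch_url` are hypothetical, and the database names `nucleotide` and `protein` are assumptions, since the article does not say which Entrez databases were searched. The sketch only builds the query strings and the corresponding E-utilities ESearch URLs; it does not send any request.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def length_query(low, high, exclude_refseq=False):
    """Build an Entrez term for sequences whose length is in the range low:high.

    Hypothetical helper; the field and filter names are copied from the article.
    """
    term = f"{low}:{high}[Sequence Length]"
    if exclude_refseq:
        # Same filter as in the article: drop records derived from RefSeq.
        term += ' NOT "srcdb refseq"[Properties]'
    return term


def esearch_url(db, term, retmax=20):
    """Return an ESearch URL for `term` in database `db` (no request is sent)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# The two searches from the article.
longest_dna = length_query(25_000_000, 50_000_000, exclude_refseq=True)
longest_protein = length_query(30_000, 50_000)

# Database names are assumptions; adjust them to the databases you search.
print(esearch_url("nucleotide", longest_dna))
print(esearch_url("protein", longest_protein))
```

Narrowing or widening the `low` and `high` bounds reproduces the trial and error described above. Because GenBank has grown considerably since 2002, running these searches today will return different records from the ones reported here.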





NCBI News | Fall/Winter 2002 NCBI News