NCBI News




Submitting a Segmented Set


A segmented set consists of a number of non-contiguous sequence blocks with a known order and orientation that are grouped together as a set on the basis of their physical proximity. Examples include a set of exon or intron sequences for a gene, or a set of internal transcribed spacers for a ribosomal RNA gene cluster. The submission of a segmented set to GenBank is a fairly straightforward procedure using NCBI’s Sequin program.

In a typical scenario, one may have sequenced the six exons of a gene but not the introns. Submitting the sequences as a segmented set shows that the exon sequences belong to one gene. The set can also show how the exons join to form the mRNA and coding region, using the exon locations from the individual “parts” entries.

Since GenBank represents each contiguous piece of DNA as one entry, six accession numbers will be issued for the “parts” of the set. When the set is released into GenBank, an additional accession number, beginning with the letters AH, is assigned to the set as a whole. Searches by gene name in Entrez retrieve both the segmented entry and the six entries for the individual exon “parts”.


Figure 1: Graphical view of a segmented set in Sequin. The first bar from the top indicates that the entire group of sequences covers 3,617 bases. The next two bars show the positions of the six exon segments of the set. The lower three bars depict the gene, APOLIV, one of its transcripts, and the coding sequence derived from the transcript, respectively. The transcript variant shown lacks exon 3.

Figure 1 shows the Sequin graphical view of a segmented set of exons for an apolipoprotein L-IV variant that lacks exon 3. The segmented sequence is made up of six parts, depicted in bars 2 and 3 from the top. The gene, shown next, spans all six exons. The mRNA variant, below it, consists of exons 1, 2, 4, 5 and 6. The coding region, shown at the bottom, spans the coding portions of the exons covered by the transcript.
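The relationship between the parts and the transcript can be sketched in a few lines of Python. The sequences below are short placeholders, not the real APOLIV exons, and the names `parts` and `join_exons` are illustrative only. The sketch assumes each part holds exactly one exon, so the variant's mRNA is the ordered concatenation of exons 1, 2, 4, 5 and 6.

```python
# Placeholder exon sequences keyed by exon number (NOT real APOLIV data).
parts = {
    1: "ATGGAG",
    2: "GGAGCT",
    3: "TTTAAA",
    4: "GCTCTG",
    5: "CTGAAG",
    6: "ATCTGA",
}


def join_exons(parts, exon_numbers):
    """Concatenate the chosen exons in ascending order to form the transcript."""
    return "".join(parts[n] for n in sorted(exon_numbers))


# The transcript variant in Figure 1 skips exon 3.
variant_exons = [1, 2, 4, 5, 6]
mrna = join_exons(parts, variant_exons)

print(mrna)
print(len(mrna))
```

The parts themselves stay separate entries; only the feature (here the mRNA) refers to a chosen subset of them.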

Instructions on how to download and use Sequin are provided at:
www.ncbi.nlm.nih.gov/Sequin/index.html.




Three Easy Steps to Submission


You can use Sequin to submit a segmented set composed of exons in three easy steps.

STEP 1
Save the nucleotide sequences of the “parts” to a single file as a set of concatenated FASTA-formatted sequences. For example:

>seq_first [organism=Homo sapiens] first exon
GAGGTGCTGGGGAGCA....

>seq_last [organism=Homo sapiens] last exon
CCCCTCTTTTCCTGCCCAAG....
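If the exon sequences are already held in a script, a file in this layout could be generated along the following lines. This is a minimal sketch: the function names and the two short sequences are placeholders invented for illustration, and only the header layout (`>id [organism=...] title`) is taken from the example above.

```python
def format_fasta_record(seq_id, sequence, organism, title, width=60):
    """Return one FASTA record whose header carries an [organism=...] modifier."""
    header = f">{seq_id} [organism={organism}] {title}"
    # Wrap the sequence at a fixed width for readability.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines)


def build_segmented_fasta(parts, organism):
    """Join the records for all parts, in order, separated by blank lines."""
    records = [
        format_fasta_record(seq_id, sequence, organism, title)
        for seq_id, sequence, title in parts
    ]
    return "\n\n".join(records) + "\n"


# Placeholder sequences; substitute the real exon sequences, in order.
parts = [
    ("seq_first", "ACGTACGTACGT", "first exon"),
    ("seq_last", "TTGGCCAATTGG", "last exon"),
]

fasta_text = build_segmented_fasta(parts, "Homo sapiens")
print(fasta_text)
```

Writing `fasta_text` to a file then gives the single input that Step 3 imports with “Import Nucleotide FASTA”.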


STEP 2
Save the amino acid sequence of each conceptual translation, one for every relevant combination of exons, in FASTA format in its own file. For example:

>[protein=apolipoprotein L-IV splice variant a]
MEGAALLKIFVVCIWVQQNHPGWTVAGQFQEKKRFTEEVIEYFQ...
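Because each translation goes into its own file, a small helper can keep the files consistent. The sketch below is only an illustration: the file labels, the “splice variant b” name, and the eight-residue sequences are hypothetical stand-ins, and the files are written to a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path


def protein_fasta(protein_name, residues):
    """Return a protein FASTA record with a [protein=...] header."""
    return f">[protein={protein_name}]\n{residues}\n"


def write_protein_files(translations, out_dir):
    """Write one FASTA file per translation and return the paths written."""
    out_dir = Path(out_dir)
    written = []
    for label, (protein_name, residues) in translations.items():
        path = out_dir / f"{label}.fsa"
        path.write_text(protein_fasta(protein_name, residues))
        written.append(path)
    return written


# Hypothetical labels and placeholder residues, one entry per exon combination.
translations = {
    "variant_a": ("apolipoprotein L-IV splice variant a", "MEGAALLK"),
    "variant_b": ("apolipoprotein L-IV splice variant b", "MEGAALLQ"),
}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_protein_files(translations, tmp)
    file_names = [p.name for p in paths]
    first_text = paths[0].read_text()

print(file_names)
print(first_text)
```

In practice the output directory would be a permanent one, so that each file can be imported into Sequin one at a time in Step 3.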


STEP 3
In Sequin, choose “Start New Submission”. Fill in the submission and contact information as usual, then choose “Segmented sequence” in the “Sequence Format” panel and import the set of nucleotide sequences prepared in Step 1 by clicking on “Import Nucleotide FASTA”.

To add the coding region information, click on “Annotate–Coding Region and Transcript-CDS”. Then use “File-Import Protein FASTA” to import your protein translations one at a time. As each amino acid sequence is imported, Sequin calculates the correct coding region nucleotide locations with respect to the “parts”. Add the protein name under “Protein-Name”, then click on “Accept”.

Add the mRNA feature using “Annotate–Coding Region and Transcript-mRNA”. Add the name of the mRNA under “mRNA-Name”. Under the “Location” tab, choose the appropriate “SeqId” and “Strand”, and fill in the “From” and “To” information for each of the exons. You may also wish to create the gene feature by using “properties–Gene-New” and adding the “Gene Symbol”, APOLIV in this case. Finally, click on “Accept” and then “Validate” to check for errors. To submit your segmented set, save the record in Sequin and e-mail it to: gb-sub@ncbi.nlm.nih.gov
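The “From” and “To” values for the mRNA can be worked out ahead of time. The sketch below assumes, purely for illustration, that each part contains exactly one exon spanning the whole part, so that every interval runs from base 1 to the part's length; if a part includes flanking sequence, the real exon boundaries must be used instead. The part identifiers, lengths, and the “Plus” strand label are hypothetical.

```python
# Hypothetical lengths (in bases) of the parts used by the transcript, in order.
part_lengths = {"seq_1": 120, "seq_2": 85, "seq_4": 140, "seq_5": 96, "seq_6": 210}


def location_rows(part_lengths, strand="Plus"):
    """Build one interval per part, with 1-based inclusive coordinates.

    Also records where each exon falls in the joined mRNA.
    """
    rows = []
    mrna_start = 1
    for seq_id, length in part_lengths.items():
        rows.append({
            "SeqId": seq_id,
            "Strand": strand,
            "From": 1,
            "To": length,
            "mRNA_from": mrna_start,
            "mRNA_to": mrna_start + length - 1,
        })
        mrna_start += length
    return rows


rows = location_rows(part_lengths)
for row in rows:
    print(row)
```

Only the “SeqId”, “Strand”, “From” and “To” columns correspond to fields named in the text above; the mRNA offsets are an extra check that the intervals add up to the expected transcript length.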






NCBI News | Fall/Winter 2002 NCBI News: Spring 2003