Submitting a Segmented Set
A segmented set consists of a number
of non-contiguous sequence blocks with a known order and orientation
that are grouped together as a set on the basis of their physical proximity.
Examples include a set of exon or intron sequences for a gene, or a set
of internal transcribed spacers for a ribosomal RNA gene cluster. The
submission of a segmented set to GenBank is a fairly straightforward
procedure using NCBI’s Sequin program.
In a typical scenario, one may have sequenced the six exons of a gene
but not the introns. The sequences can be submitted as a segmented set
in order to show that the exon sequences belong to one gene. The set can
also show how the exons are joined to form an mRNA and a coding region,
using the locations of the exons from the individual “parts” entries.
Because GenBank represents each contiguous piece of DNA as one entry, six
accession numbers are issued, one for each of the “parts” of the set.
When the set is released into GenBank, an additional accession number,
beginning with the letters AH, is assigned to the set as a whole.
A search by gene name in Entrez retrieves both the segmented entry and
the six entries consisting of the individual exon “parts”.
Figure 1: Graphical
view of a segmented set in Sequin. The first bar from the top indicates
that the entire group of sequences covers 3,617 bases. The next two bars
show the positions of the six exon segments of the set. The lower three
bars depict the gene, APOLIV, one of its transcripts, and the coding
sequence derived from the transcript, respectively. The transcript variant
shown lacks exon 3.
Figure 1 shows the Sequin graphical view of a segmented set of exons for
an apolipoprotein L-IV variant that lacks exon 3. The segmented
sequence is made up of six parts, depicted in bars 2 and 3 from the top.
The gene, shown next, spans all six exons. The mRNA variant, below it,
consists of exons 1, 2, 4, 5 and 6. The coding region, shown at the
bottom, spans the coding portions of the exons covered by the transcript.
Instructions on how to download and use Sequin are provided at:
www.ncbi.nlm.nih.gov/Sequin/index.html.
Three Easy Steps to Submission
You can use Sequin to submit a segmented set composed of exons in three
easy steps.
STEP 1
Save the nucleotide sequences of the “parts” to a file as
a set of concatenated FASTA-formatted sequences. For example:
>seq_first [organism=Homo sapiens] first exon
GAGGTGCTGGGGAGCA....
>seq_last [organism=Homo sapiens] last exon
CCCCTCTTTTCCTGCCCAAG....
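If the exon sequences are already held in a script, a file in the layout above can be generated rather than typed by hand. The following is a minimal Python sketch, not part of Sequin. The helper names `format_fasta_parts` and `write_fasta_parts` are hypothetical, and the sequences are the truncated fragments from the example above, used only as placeholders.

```python
def format_fasta_parts(parts, organism, width=70):
    """Return concatenated FASTA text for (seq_id, title, sequence) tuples.

    The definition line mirrors the example above:
    >seq_id [organism=...] free-text title
    """
    records = []
    for seq_id, title, sequence in parts:
        # Remove whitespace and normalize case before wrapping the sequence.
        seq = "".join(sequence.split()).upper()
        defline = f">{seq_id} [organism={organism}] {title}"
        lines = [seq[i:i + width] for i in range(0, len(seq), width)]
        records.append("\n".join([defline] + lines))
    return "\n".join(records) + "\n"


def write_fasta_parts(path, parts, organism):
    """Save all parts, in order, to a single file."""
    with open(path, "w") as handle:
        handle.write(format_fasta_parts(parts, organism))


# Placeholder fragments only; real submissions need the full exon sequences.
parts = [
    ("seq_first", "first exon", "GAGGTGCTGGGGAGCA"),
    ("seq_last", "last exon", "CCCCTCTTTTCCTGCCCAAG"),
]
fasta_text = format_fasta_parts(parts, "Homo sapiens")
print(fasta_text)
```

The parts are written in the order given, so the list should follow the known order of the segments in the set.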
STEP 2
Save the amino acid sequence of each conceptual translation, one for every
relevant combination of exons, in FASTA format to a separate file. For
example:
>[protein=apolipoprotein L-IV splice variant a]
MEGAALLKIFVVCIWVQQNHPGWTVAGQFQEKKRFTEEVIEYFQ...
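A conceptual translation for each exon combination can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure Sequin uses: it assumes the standard genetic code, that the joined segments contain only coding sequence starting in frame at the first base, and that translation stops at the first stop codon. The exon sequences and protein names are toy values invented for the example.

```python
from itertools import product
from pathlib import Path

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_fasta(protein_name, exons, exon_order):
    """Join the chosen coding segments and return one protein FASTA record."""
    cds = "".join(exons[name] for name in exon_order)
    return f">[protein={protein_name}]\n{translate(cds)}\n"


def save_records(records, directory):
    """Write each record to its own file: variant_1.fasta, variant_2.fasta, ..."""
    for index, text in enumerate(records.values(), start=1):
        (Path(directory) / f"variant_{index}.fasta").write_text(text)


# Toy coding segments (hypothetical, not real APOLIV sequence).
exons = {"exon1": "ATGGAAGGT", "exon2": "GCTGCA", "exon4": "CTGTAA"}
variants = {
    "hypothetical protein variant a": ["exon1", "exon2", "exon4"],
    "hypothetical protein variant b": ["exon1", "exon4"],
}
records = {
    name: protein_fasta(name, exons, order) for name, order in variants.items()
}
```

Real exons usually include untranslated regions and may split codons across exon boundaries, so the coding portions must be trimmed before a sketch like this gives a meaningful result.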
STEP 3
In Sequin, choose “Start New Submission”. Fill in the submission
and contact information as usual, choose “Segmented sequence” in
the “Sequence Format” panel and import the set of nucleotide sequences
prepared as above by clicking on “Import Nucleotide FASTA”.
To add the coding region information, click on “Annotate–Coding Region
and Transcript-CDS”. Then use “File-Import Protein FASTA” to
import your protein translations one at a time. As each amino acid sequence is
imported, Sequin calculates the correct coding region nucleotide locations with
respect to the “parts”. Add the protein name under “Protein-Name”,
then click on “Accept”.
Add the mRNA feature using “Annotate–Coding Region and Transcript-mRNA”.
Add the name of the mRNA under “mRNA-Name”. Under the “Location” tab,
choose the appropriate “SeqId” and “Strand”, and fill
in the “From” and “To” information for each of the exons.
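When filling in the exon locations by hand, it can help to list the length of every part first. The Python sketch below reads concatenated nucleotide FASTA text and reports one row per part. It assumes that each part is a complete exon, so the interval runs from base 1 to the last base of that part, and that positions are counted from 1. The function name `part_intervals` and the toy sequences are hypothetical.

```python
def part_intervals(fasta_text):
    """Return (seq_id, start, end) for every record, counting bases from 1."""
    intervals = []
    seq_id, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if seq_id is not None:
                intervals.append((seq_id, 1, length))
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            length = 0
        else:
            length += len(line)
    if seq_id is not None:
        intervals.append((seq_id, 1, length))
    return intervals


# Toy input (hypothetical sequences) with three parts.
example_fasta = (
    ">exon1 [organism=Homo sapiens] first exon\nATGGAAGGT\n"
    ">exon2 [organism=Homo sapiens] second exon\nGCTGCA\n"
    ">exon3 [organism=Homo sapiens] third exon\nCTGTAA\n"
)

# A transcript variant that skips one exon leaves that part out of the list.
mrna_parts = [row for row in part_intervals(example_fasta) if row[0] != "exon2"]
for seq_id, start, end in mrna_parts:
    print(f"{seq_id}\t{start}\t{end}")
```

If a part also contains flanking intron sequence, the exon boundaries within it have to be supplied separately; sequence length alone is not enough.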
You may wish to create the gene feature by using “properties–Gene-New”.
Enter the “Gene Symbol”, which is APOLIV in this example. Finally,
click on “Accept” and “Validate” to check for errors.
To submit your segmented set, save the record in Sequin and e-mail
it to: gb-sub@ncbi.nlm.nih.gov