NCBI News






Submitting a Population or Phylogenetic Sequence Set

The Entrez PopSet database accommodates four varieties of sequence set, each representing versions of a gene or sequence region derived from different sources. The sources may be isolates of a single organism, comprising a “population set”; an ensemble of organisms, comprising a “phylogenetic set”; individuals from a population of unclassified or unknown organisms, comprising an “environmental set”; or various mutational forms, comprising a “mutation set”. Figure 1 shows a PopSet consisting of GenBank records AF474412-AF474791 for Adelie penguin mitochondrial DNA. PopSets such as this one can be submitted to GenBank in four easy steps using NCBI’s sequence-submission tool, Sequin.

Step 1—Generate an alignment or FASTA set

Sequin can import alignments in any of the FASTA+GAP, PHYLIP, or NEXUS formats illustrated at:

The source information, such as isolates and specimen vouchers, can be included in the definition lines of the alignment to be imported by Sequin. The definition line begins with a “>” sign and is included at the bottom of the alignment for the PHYLIP or NEXUS format or just above the sequence for the FASTA+GAP format. An example of a definition line for a population set in NEXUS format from Escherichia coli strain ECOR10 is:

>[organism=Escherichia coli] [strain=ECOR10] [clone=1]

The modifier information must be included in square brackets with no spaces on either side of the “=” sign. There is no limit to the number of modifiers you can add from the list found at:


If the sequences are submitted as an aligned or unaligned multiple FASTA sequence set, place the sequence identifier, in this case “seqid1”, immediately after the “>” sign, followed by the modifiers, as in:

>seqid1 [organism=Escherichia coli] [strain=ECOR10] [clone=1]
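The definition-line rules above can be sketched in a few lines of Python. This is only an illustration of the bracketed-modifier format described in the text, not part of Sequin: the helper names `make_defline` and `make_fasta_set`, the second identifier `seqid2`, and the short sequences are hypothetical placeholders.

```python
def make_defline(seq_id=None, **modifiers):
    """Build a definition line with bracketed source modifiers.

    Hypothetical helper: writes each modifier as [name=value] with no
    spaces on either side of "=", as the text requires.
    """
    parts = []
    for name, value in modifiers.items():
        # Strip stray whitespace so "=" is never surrounded by spaces.
        parts.append(f"[{name.strip()}={str(value).strip()}]")
    # A multiple FASTA set puts the sequence identifier right after ">".
    prefix = f">{seq_id} " if seq_id else ">"
    return prefix + " ".join(parts)


def make_fasta_set(records):
    """Join (seq_id, sequence, modifiers) records into one FASTA text."""
    lines = []
    for seq_id, sequence, modifiers in records:
        lines.append(make_defline(seq_id, **modifiers))
        lines.append(sequence)
    return "\n".join(lines) + "\n"


# Placeholder sequences and identifiers, for illustration only.
records = [
    ("seqid1", "ATGCATGCAT",
     {"organism": "Escherichia coli", "strain": "ECOR10", "clone": 1}),
    ("seqid2", "ATGCATGGAT",
     {"organism": "Escherichia coli", "strain": "ECOR10", "clone": 2}),
]
fasta_text = make_fasta_set(records)
print(fasta_text)
```

Generating the definition lines from a table of source data, rather than typing them by hand, helps keep the modifier names and spacing uniform across every member of the set.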

Step 2—Import the sequences

After entering the submission and contact information in Sequin, choose “Population study”, “Phylogenetic study”, “Mutation study” or “Environmental samples” in the “Sequence Format” panel. Next, import the set of nucleotide sequences in the desired alignment format by clicking on “Import Nucleotide”.

Step 3—Add and propagate features

If you imported the sequences as an alignment, as opposed to an unaligned multiple FASTA set, you have an easy way of propagating features from one member to all members of the set. Select the first entry under “Target Sequence” and add the appropriate features to this entry using the Annotate panel. Then, you may propagate all or the selected features to the remaining entries using the Edit—Feature propagate option.

Step 4—Add distinguishing information

NCBI strongly encourages submitters to include distinguishing information for the individual sequences in PopSets. This information can include strains for cultured bacteria, algae, fungi and laboratory animals; clones for sequences obtained by direct PCR-amplification and cloning of an environmental bulk DNA sample; and specimen vouchers for sets of multicellular organisms.

Strain identifiers help distinguish specific cultures from other isolates of the same taxon. A strain may be designated in a variety of ways, such as by the name of an individual, by a culture collection number or locality, by an arbitrary identifier, or by a label used within the submitting laboratory.

A specimen voucher is the remainder of the specimen from which a sequence has been obtained, or, where coidentity is assured, a representative of the sequence source specimen. Vouchers should be deposited in repositories accessible to the public, such as herbaria or museum collections. Specimen vouchers allow verification of the identity of a taxon and serve as a source for additional molecular analyses. A common format for vouchers includes the collector's name and a unique number, plus the repository or its abbreviation. For example:

C.S. Shen 2459 (HMAS)
A.J. Smith 12.iii.2002 (AMNH)
H. Perrier s.n. (P)
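As a rough sketch, the voucher format shown above (collector, number, repository abbreviation in parentheses) could be assembled programmatically. The function name `format_voucher` is hypothetical, and falling back to "s.n." (sine numero, "without number") when no collection number exists is an assumption based on the third example.

```python
def format_voucher(collector, repository, number=None):
    """Format a specimen voucher as 'collector number (repository)'.

    Hypothetical helper: when no collection number is given, "s.n."
    is used in its place, mirroring the "H. Perrier s.n. (P)" example.
    """
    label = str(number) if number is not None else "s.n."
    return f"{collector} {label} ({repository})"


print(format_voucher("C.S. Shen", "HMAS", 2459))
print(format_voucher("A.J. Smith", "AMNH", "12.iii.2002"))
print(format_voucher("H. Perrier", "P"))
```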

Figure 1. Views from Sequin and Entrez, respectively, of a population set of Adelie penguin mitochondrial DNA sequences taken from ancient bone and fresh blood for a study of rates of evolution.


In the absence of specimen vouchers, the following source modifiers are helpful:

- cultivar, strain, isolate, breed, ecotype, or genotype name
- germplasm, seed, or stock center accession number
- collection locality, date, and/or collection number.

Along with the specimen voucher information, you may provide online images of the specimens, which will be made available in Entrez through LinkOut. For an example, retrieve the entry AY090229 in Entrez and click on LinkOut under "Links" on the right-hand side of the page to view an image of an insect specimen.

When your submission is complete, save the entries in the native Sequin format and e-mail to:

You will receive confirmation of your submission along with your accession numbers within approximately 2 business days.

—MB

 




NCBI News | Fall/Winter 2002 NCBI News: Spring 2003