Genome Submission Wizard

The Genome Submission Wizard opens a stand-alone, tabbed dialog in which a genome submission can be created. As in the other NCBI submission tools, BankIt and Sequin, the dialog directs the input of sequence data and other metadata needed to create a complete submission for GenBank. The submission file can then be further edited and validated in Genome Workbench before it is submitted to GenBank.

Open the dialog by choosing Genome Submission Wizard, the first item in the Submission menu, or by opening an existing FASTA or ASN.1 file with the File Open dialog. When starting the Submission Wizard, a submitter will be prompted to open a FASTA or ASN.1 file if no file is already open. If a file is already open, the Submission Wizard will import any information contained within the open file.

Information entered into the Submission Wizard can be exported and imported as template files. Template files can also be generated using the GenBank Submission Template.

Submitter

Submitter Name

The Submitter/Name page in the Submission Wizard collects name and email contact information for the person primarily responsible for the submission. The contact name, the names listed as the authors of the sequence, and the authors of journal articles or other publications referencing this submission can all be different, but by default the contact name is copied to the author lists. The contact name will not appear in the GenBank record unless it is also listed as a sequence author or as an author on the related publication (see the Reference tab for more information).

Submitter Affiliation

The Submitter/Affiliation page collects additional information about the submitter. The affiliation information will also be used for the sequence authors and associated publications. State/province names are optional and should be entered only for those countries that use them. Items with an asterisk are required.

General

Submission General

The General tab page collects the BioProject and BioSample accessions associated with this submission. A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. The BioSample database contains descriptions of biological source materials used in experimental assays. See the BioProject and BioSample pages for information on how to obtain BioProject and BioSample accession numbers.

The BioProject and BioSample accessions will be listed in the flatfile display in the DBLink section.

DB Link

The General tab also collects the release date for the submission. Select a release date within the next three years, or select “Immediately after processing” to have the genome released as soon as its processing has completed. The user can request an extension to the release date for an unreleased genome at any time. However, the genome will be released on its release date or when the accession or data is published or otherwise becomes publicly available, whichever comes first.
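The three-year window can be checked before a date is entered. The following is a minimal sketch, assuming the window is counted from the current day and that the last allowed day is the same calendar day three years later. The function names are hypothetical, and the exact cutoff applied by the wizard is not stated here.

```python
from datetime import date


def latest_release_date(today: date, years: int = 3) -> date:
    """Return the day `years` years after `today` (assumed end of the window)."""
    try:
        return today.replace(year=today.year + years)
    except ValueError:
        # February 29 has no counterpart in a non-leap target year; use February 28.
        return today.replace(year=today.year + years, day=28)


def is_valid_release_date(requested: date, today: date) -> bool:
    """True if `requested` is not in the past and falls inside the window."""
    return today <= requested <= latest_release_date(today)
```

For example, `is_valid_release_date(date(2021, 1, 1), date(2019, 7, 3))` returns `True`, while a requested date of `date(2023, 1, 1)` with the same starting day returns `False`.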

Genome Info

Genome Info - Assembly

The Genome info/Assembly page collects information about how the genome was assembled.

Genome Info Assembly

Genome Info - Sequencing Information

The Genome info/Sequencing information page collects information about how the genome was sequenced.

Genome Info Other

Genome submissions require assembly information to be included within the Genome Assembly-Data structured comment. This structured comment includes the following metadata:

  • Assembly Name: a short name suitable for display (for example, LoxAfr_3.0 for a Loxodonta africana assembly, version 3.0)
  • Assembly Method: includes version or date the program was run (for example, Newbler v. 2.3 or Celera Assembly v. May 2010)
  • Genome Coverage (for example, 12x)
  • Sequencing Technology (for example, ABI 3730; 454 GS-FLX Titanium; Illumina GAIIx)

The Assembly Name is optional. Assembly Method requires 'v. ' between the algorithm name and its version (or the month and year it was run). If more than one sequencing technology was used, separate them with a semicolon (for example, Sanger; Illumina GAIIx).
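The formatting rules above can be checked before the values are typed into the wizard. The sketch below is an illustration only: the function name is hypothetical, and the Genome Coverage pattern (a number followed by "x") is an assumption based on the single example given, not a documented rule.

```python
import re

# "<program name> v. <version or month and year>", for example "Newbler v. 2.3".
ASSEMBLY_METHOD = re.compile(r"^\S.*? v\. \S.*$")
# Assumed pattern: a number followed by "x", for example "12x" or "16.5x".
GENOME_COVERAGE = re.compile(r"^\d+(\.\d+)?x$")


def check_assembly_data(fields: dict) -> list:
    """Return a list of problems found in the assembly metadata (empty if none)."""
    problems = []

    if not ASSEMBLY_METHOD.match(fields.get("Assembly Method", "")):
        problems.append("Assembly Method must look like 'Name v. version'")

    if not GENOME_COVERAGE.match(fields.get("Genome Coverage", "")):
        problems.append("Genome Coverage should look like '12x'")

    # Multiple technologies are separated by semicolons; no entry may be empty.
    technologies = [t.strip() for t in fields.get("Sequencing Technology", "").split(";")]
    if any(not t for t in technologies):
        problems.append("Sequencing Technology has an empty entry")

    # Assembly Name is optional, so it is not checked.
    return problems
```

Calling `check_assembly_data` with "Newbler v. 2.3", "12x", and "Sanger; Illumina GAIIx" returns an empty list, while an Assembly Method of "Newbler 2.3" is reported because the 'v. ' separator is missing.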

The information entered in these tabs will be displayed in the sequence record as a Genome-Assembly-Data structured comment, which appears in the COMMENT section and is framed by the default tags ##Genome-Assembly-Data-START## and ##Genome-Assembly-Data-END##.
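To give a rough idea of the framed comment, the sketch below renders a set of field values between the two tags. The tags come from the text above. The " :: " separator and the column alignment are assumptions about the flatfile layout, and the function name is hypothetical.

```python
START_TAG = "##Genome-Assembly-Data-START##"
END_TAG = "##Genome-Assembly-Data-END##"


def render_assembly_comment(fields: dict) -> str:
    """Render field names and values between the start and end tags."""
    # Pad field names to a common width so the separators line up.
    width = max(len(name) for name in fields)
    lines = [START_TAG]
    for name, value in fields.items():
        lines.append(f"{name.ljust(width)} :: {value}")
    lines.append(END_TAG)
    return "\n".join(lines)


print(render_assembly_comment({
    "Assembly Method": "Newbler v. 2.3",
    "Genome Coverage": "12x",
    "Sequencing Technology": "454 GS-FLX Titanium",
}))
```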

Genome Assembly Data Flat File

This information can also be applied to the sequence data using the Submission->Comments->GenomeDataAssembly Comment menu item, and can be edited by clicking on the pen icon in the left margin of the flatfile record view.

Organism Info

The Organism info tab’s General and Additional qualifiers pages collect information about the biological source from which the sequence was isolated.

Organism Info - General

Organism Info General

Organism can be an established, known name, or it can be a new name, which the NCBI Taxonomy group will verify during processing. A strain name can be an established, known name, or it can be a new strain of an established or new organism. Isolate, cultivar, or breed may be reported instead of strain, but at least one of the four values must be supplied.

Organism Info - Additional Qualifiers

Organism Info Advanced

Organism information values can be entered into the fields individually, or a tab-delimited table of values can be imported using the ‘Import tab-delimited table’ button. (This table reader can also be accessed from the Submission->Import->Use Table Reader menu item.)
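A tab-delimited table for import can be prepared with any spreadsheet or script. The sketch below writes one with Python's csv module, assuming a header row followed by one row per sequence. The column names and values are hypothetical placeholders, since the headers recognized by the table reader are not listed here.

```python
import csv
import io


def build_source_table(rows: list, columns: list) -> str:
    """Return a tab-delimited table with a header row and one line per entry."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing values are written as empty cells.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()


# Hypothetical column names and values, for illustration only.
table = build_source_table(
    [
        {"SeqId": "contig1", "isolation-source": "soil", "country": "USA"},
        {"SeqId": "contig2", "isolation-source": "soil"},
    ],
    ["SeqId", "isolation-source", "country"],
)
print(table)
```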

The information from these pages will be displayed in the sequence record as a source feature with associated qualifiers in the Features section.

Source Flat File

The submitter can also edit biological source information using the Submission->Tools->Bulk Source Edit menu item, where the submitter can add different information for different sequences. The user can also edit the source information for a single sequence by clicking on the pen icon in the left margin next to the source feature in the flatfile record view.

The Genome Submission Wizard is intended to be used for sequences where source information is the same for all sequences, with the exception of identifying sequences as chromosomes, plasmids, or organelles.

Molecule Info

The Molecule info tab has three pages with options that allow the submitter to label sequences as chromosomes, plasmids, or organelles, as appropriate.

Molecule Info - Chromosome

Chromosome

If an organism has only one chromosome, no chromosome name is needed. If the organism has multiple chromosomes, each sequence that is localized to a chromosome should be labeled with the appropriate chromosome name.

The checkbox for “Is the chromosome” should be checked only if the chromosome is represented by a single sequence in the submission and if that sequence represents the entire chromosome (with or without gaps). If an individual chromosome is in more than one sequence, do not select the “Is the chromosome” option.

If the “Is the chromosome” option is selected and the biological topology of the chromosome is circular, check the circular box. If your circular chromosome is in one piece, but you were unable to circularize the molecule because there is a gap between the ends, add 100 N's to the end of the sequence to indicate the gap. If the ends of the circle overlap, the sequence should be trimmed so that the ends abut with no overlap. Fragments of circular chromosomes should not be marked as circular. If a chromosome is in more than one sequence, do not select the "circular" option.
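The chromosome rules above can be summarized as a small decision function. This is a sketch of the guidance as written, not of the wizard's internal logic. All names are hypothetical.

```python
from dataclasses import dataclass

GAP_LENGTH = 100  # number of N's that mark the gap between unjoined ends


@dataclass
class ChromosomeFlags:
    is_the_chromosome: bool
    circular: bool


def chromosome_flags(sequence_count: int,
                     covers_entire_chromosome: bool,
                     biologically_circular: bool) -> ChromosomeFlags:
    """Decide which boxes to check for a sequence assigned to a chromosome."""
    # "Is the chromosome" applies only when a single sequence represents
    # the entire chromosome (with or without gaps).
    is_whole = sequence_count == 1 and covers_entire_chromosome
    # Fragments and multi-sequence chromosomes are never marked circular.
    return ChromosomeFlags(is_whole, is_whole and biologically_circular)


def pad_unclosed_circle(sequence: str) -> str:
    """Append 100 N's to a one-piece circular chromosome whose ends could not be joined."""
    return sequence + "N" * GAP_LENGTH
```

A chromosome split across three sequences gets neither box checked, even if the molecule is circular in the organism: `chromosome_flags(3, False, True)` yields `ChromosomeFlags(is_the_chromosome=False, circular=False)`.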

Molecule Info - Plasmid

Plasmid

The Plasmid page allows a submitter to label sequences that represent plasmids or pieces of plasmids. If a plasmid sequence has no gaps and represents the complete molecule, it should be marked as complete and circular (if appropriate). If a circular plasmid sequence has gaps but represents the complete molecule, it should still be marked as circular but not complete. If the sequence is just a fragment of the plasmid, it should still be labeled as a plasmid, but not marked as circular or complete.

Molecule Info - Organelle

Organelle

The Organelle page allows the submitter to label sequences that represent organelles or fragments of organelles. If an organelle sequence has no gaps and represents the complete molecule, it should be marked as complete and circular (if appropriate). If a circular organelle sequence has gaps but represents the complete molecule, it should still be marked as circular but not complete. If the sequence is just a fragment of the organelle, it should still be labeled as an organelle, but not marked as circular or complete.
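Plasmids and organelles follow the same rules for the complete and circular options, which can be sketched as one function. The function and parameter names are hypothetical. Treating a gapped linear molecule as "not complete" is an extension of the stated rules, which only describe the circular case.

```python
def molecule_flags(represents_whole_molecule: bool,
                   has_gaps: bool,
                   biologically_circular: bool) -> dict:
    """Return the 'complete' and 'circular' settings for a plasmid or organelle."""
    if not represents_whole_molecule:
        # A fragment keeps its plasmid or organelle label but gets neither flag.
        return {"complete": False, "circular": False}
    return {
        # Gaps rule out "complete" even when the whole molecule is represented.
        "complete": not has_gaps,
        # "Circular" follows the biology of the molecule.
        "circular": biologically_circular,
    }
```

For a circular plasmid that is fully represented but contains gaps, `molecule_flags(True, True, True)` returns `{"complete": False, "circular": True}`.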

Annotation

The Annotation tab allows a submitter to import Feature Table files that describe the features (CDSs, rRNAs, tRNAs, misc_features, etc.) on the sequences.

A submitter may also use the Submission->Import->Import 5 Column Feature Table and Submission->Import->Import GFF3 File menu items to import annotation files to be added to the sequences, or may use the items in the Submission->Features menu to add individual features to a sequence. More information about importing feature tables can be found in the Import Manual.
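For orientation, a five-column feature table starts each sequence with a ">Feature" line, gives each feature as tab-separated start, stop, and feature key, and lists qualifiers on following lines after three leading tabs. The sketch below writes such a table. The function name, sequence identifier, coordinates, and qualifier values are hypothetical, and the Import Manual remains the reference for what the importer accepts.

```python
def feature_table(seq_id: str, features: list) -> str:
    """Build a five-column feature table for one sequence.

    Each feature is a tuple (start, stop, key, qualifiers), where qualifiers
    is a list of (name, value) pairs.
    """
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value, after three empty columns.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical identifiers and values, for illustration only.
print(feature_table("contig1", [
    (1, 1500, "gene", [("locus_tag", "ABC_0001")]),
    (1, 1500, "CDS", [("product", "hypothetical protein")]),
]))
```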

The Feature Table shown on this page can also be displayed using the Submission->Reports->Show Feature Table menu item. Existing features can be edited by clicking on the pen icon in the left margin next to the feature in the flatfile display of the sequence record in the Text View.

Reference

The Reference tab has two subpages, Sequence authors and Publication.

Reference - Sequence Authors

Sequence Authors

The Sequence authors page collects the names of the researchers associated with the sequencing and related analysis of the data.

Reference - Publication

Publication

The Publication page collects information about the paper, book chapter, thesis, etc. associated with this submission, which can be Unpublished, In-press, or Published.

Publications can also be added to the sequence data using the Submission->Add Publication menu item.

Validation

The Validation tab has two pages: Validate and Submitter Report. These reports are not run automatically. To view new results after finishing or making changes to a record, a submitter must click “Validate record” on the Validate page or “Refresh Submitter Report” on the Submitter Report page.

Validation - Validate

Sub Validator

The Validator can also be launched with the Submission->Reports->Validate menu item. More information about the Validator can be found in the Validator Manual.

Validation - Submitter Report

Sub Submitter Report

The Submitter Report can also be launched with the Submission->Reports->Show Submitter Report menu item. More information about the Submitter Report can be found in the Submitter Report Manual.

To complete a submission, click the “Finish” button on the Submitter Report page. If any information is missing from the submission, a Missing Fields report pop-up will list the missing items.

If no items are missing, a submitter is prompted to save the finished submission as an .asn file, which can then be further edited in Genome Workbench or submitted to GenBank.

For more information, please see the full documentation for the NCBI Genome Workbench Editing Package.

Last updated: 2019-07-03T16:38:43Z