Using Sequence Editing Package to Submit Genome Data to GenBank

Genome Workbench offers a Sequence Editing Package that allows users to create, edit, validate, and submit genome sequence data to GenBank. The package includes a pop-up, tabbed wizard that directs a submitter through the data input steps needed to create a submission, and a menu of editing and reporting tools that can be used on an existing submission.

Getting Started

The Sequence Editing Package must be enabled before using Genome Workbench to edit sequence data or prepare a submission for GenBank. First, select Packages from the Tools menu.

Enable Editing Package 1

Next, select Sequence Editing from the list on the left. If the dialog shows Sequence Editing as “Loaded”, the package has already been enabled; click Cancel to exit the dialog. If the dialog shows Sequence Editing as “Not Loaded”, check the Enable box and click OK. Genome Workbench will then prompt you to restart so that the change can take effect.

Enable Editing Package 2

After restarting Genome Workbench, a new menu named “Submission” will be displayed.

Enable Editing Package 3

Begin the process of preparing a submission using the Submission Wizard (Submission -> Genome Submission Wizard). This tabbed dialog is the only mechanism for adding the contact information that is required for a submission. It also guides the user through the process of adding other metadata (for example, organism information, sequencing and assembly methods, reference information, and feature annotation). Open the dialog by choosing Genome Submission Wizard, the first item in the Submission menu, or by opening an existing FASTA or ASN.1 file with the File Open dialog. When starting the Submission Wizard, a submitter will be prompted to open a FASTA or ASN.1 file if no file is already open. If a file is already open, the Submission Wizard will import any information contained within the open file.

Some of the information that can be entered and edited via the Submission Wizard can be saved to and loaded from a “template file” and reused for a different data set, for example, when the same set of authors should be used for multiple submissions. The GenBank Submission Template tool can also be used to generate template files.

More detailed information can be found in the comprehensive manual for the Submission Wizard.

Visual Inspection

There are three main mechanisms for reviewing a sequence data record visually: the Text View, the Graphical Sequence View, and the FlatFile Summary.

Text View

The Text View provides a preview of how the data will appear as a GenBank record in the NCBI Nucleotide database, and allows the user to edit the data displayed. The Text View is automatically launched by the Submission Wizard if one has not already been created. To create a Text View without using the Submission Wizard, choose Open New View from the View menu and choose Text View under Generic.

Open Text View

Use the pen icon in the left margin of the Text View to edit the record's data, or use the X icon to remove data. For example, if an author’s name was misspelled in a reference, click the pen icon to open an editing dialog and correct it.

More information about the Text View can be found in the Text View tutorial.

Graphical Sequence View

The Graphical Sequence View provides a visual display of the sequence data and the locations of features annotated on the sequence. It can be used to find mistakes in annotation by finding overlapping features that should not share nucleotide locations, or by finding features that should have common intervals that do not (for example, the exons of an mRNA and the corresponding coding region should match).
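The overlap check that the view supports by eye can also be expressed in a few lines of code. The following is a minimal sketch, not part of Genome Workbench: the `Feature` record, the `find_overlaps` function, and the sample coordinates are all hypothetical, and features are assumed to be simple single-interval ranges with 1-based inclusive coordinates.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """Hypothetical minimal feature: a label plus 1-based inclusive coordinates."""
    label: str
    start: int
    stop: int


def find_overlaps(features: List[Feature]) -> List[Tuple[str, str]]:
    """Return label pairs of features that share at least one nucleotide position."""
    ordered = sorted(features, key=lambda f: (f.start, f.stop))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start > first.stop:
                break  # sorted by start, so no later feature can overlap `first`
            overlaps.append((first.label, second.label))
    return overlaps


# Hypothetical sample data: geneA and geneB share positions 450..500.
features = [
    Feature("geneA", 100, 500),
    Feature("geneB", 450, 900),
    Feature("geneC", 1000, 1200),
]
print(find_overlaps(features))
```

Whether a reported overlap is actually a mistake still depends on the feature types involved, which is why visual review in the Graphical Sequence View remains useful.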

Launching the Graphical Sequence Viewer is similar to launching the Text View: choose Open New View from the View menu and select Graphical Sequence View under Sequence.

Graphical Sequence View 1

Selections in the Graphical Sequence View are mirrored in the Text View. Use the Text View to launch editors for objects seen in the Graphical Sequence View that need to be edited.

More information about the Graphical Sequence View can be found in the Graphical Sequence View tutorial.

FlatFile Summary

The FlatFile Summary is a tool for looking at which lines of text are either repeated or unique within the FlatFile display of all the sequences in a file. To launch the FlatFile Summary, choose the Submission->Reports->FlatFile Summary menu item.

Flat File Summary 1

Lines of text are sorted by FlatFile display sections and displayed in the upper panel with the number of times that line appears in the FlatFile representations. When a line of text is selected, the objects that produced the line of text are listed in the lower panel. The FlatFile Summary can be used to find text that should be the same, but is different (for example, the organism names for all of the sequences in a genome submission should be the same), or to find text that should be different but is the same (for example, the chromosome names should all be different if all of the sequences are complete). It can also be used to navigate quickly between items in the sequence record that have the same type of information (for example, the user might want to look at all features for which Enzyme Commission Numbers (EC numbers) have been provided).
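The idea behind this tally can be illustrated with a short sketch. It is not the tool's actual implementation: the `summarize_lines` function and the two-sequence sample data are hypothetical, and each sequence's FlatFile display is assumed to be available as a list of text lines.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_lines(records: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each distinct line of text to the sequence IDs whose display contains it."""
    sources = defaultdict(list)
    for seq_id, lines in records.items():
        for line in lines:
            text = line.strip()
            if text:  # ignore blank lines
                sources[text].append(seq_id)
    return dict(sources)


# Hypothetical sample data: the organism name is misspelled on contig2,
# and both contigs carry the same chromosome name.
records = {
    "contig1": ["ORGANISM  Escherichia coli", '/chromosome="1"'],
    "contig2": ["ORGANISM  Escherichia coil", '/chromosome="1"'],
}
summary = summarize_lines(records)

# List the most frequently repeated lines first, with the sequences that produced them.
for text, ids in sorted(summary.items(), key=lambda item: -len(item[1])):
    print(len(ids), text, ids)
```

In this sample, the organism line appears in two variants with a count of one each (text that should be the same but is different), while the chromosome line has a count of two (text that should be different but is the same).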

The FlatFile Summary is not automatically updated after changing the data. Use the Refresh button at the bottom of the dialog to incorporate the most recent changes.

More information about the FlatFile Summary can be found in the Flat File Summary manual.

Detecting Errors and Inconsistencies

The Validation Report and the Submitter Report are two tools that are designed to find errors and inconsistencies in sequence data. The Validator reports individual items that have problems, while the Submitter Report looks for signs of systemic problems in the record.

Validation Report

To launch the Validator dialog, select Validation Report from the Submission->Reports menu.

Validator 1

Validation Report errors can be filtered by Severity or Error title, and can be sorted by Severity, Sequence, Error title, and Message & Object Description. To fix a reported problem, click the pen icon in the left column in the Text View of the sequence record to launch an editing dialog or tool. Clicking on underlined text will navigate to the portion of the FlatFile display that describes the item that has the problem.
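For readers who think in terms of data, the filtering and sorting described above can be sketched as follows. This is an illustration only and does not reflect how the dialog is implemented: the `Finding` record, the `filter_and_sort` function, the placeholder titles, and in particular the severity names and their ordering in `SEVERITY_ORDER` are assumptions.

```python
from typing import List, NamedTuple, Optional

# Assumed severity names, ordered lowest to highest; not taken from the dialog.
SEVERITY_ORDER = ["Info", "Warning", "Error", "Reject"]


class Finding(NamedTuple):
    """Hypothetical validation finding with the columns the dialog can sort by."""
    severity: str
    sequence: str
    title: str
    message: str


def filter_and_sort(findings: List[Finding],
                    min_severity: str = "Info",
                    title: Optional[str] = None) -> List[Finding]:
    """Keep findings at or above min_severity (optionally one title), most severe first."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    kept = [
        f for f in findings
        if rank[f.severity] >= rank[min_severity]
        and (title is None or f.title == title)
    ]
    return sorted(kept, key=lambda f: (-rank[f.severity], f.sequence, f.title))


# Hypothetical findings with placeholder titles and messages.
findings = [
    Finding("Warning", "contig2", "TitleA", "placeholder message"),
    Finding("Error", "contig1", "TitleB", "placeholder message"),
    Finding("Info", "contig1", "TitleA", "placeholder message"),
    Finding("Error", "contig1", "TitleA", "placeholder message"),
]
result = filter_and_sort(findings, min_severity="Warning")
for f in result:
    print(f.severity, f.sequence, f.title)
```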

Because validation may take a few seconds, the results shown will not automatically be updated after the data has been updated. Click the Refresh button to see which problems remain.

More information about the Validator can be found in the Validator manual.

Submitter Report

Select Submitter Report from the Submission->Reports menu to launch the Submitter Report.

The Submitter Report performs a series of tests to look for patterns and potential problems in sequence data, and produces a list of the items that were identified by each test. Some of the tests provide results that must be examined by the user to determine if there is a problem or not. For example, the Submitter Report will list the number of tRNA features that are present, which should be compared to the user’s expectations. The user may have a priori knowledge of how many of this type of feature should be present, given the completeness of the sequence and the identity of the organism from which the sequence was derived.
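The tRNA comparison described above can be approximated outside the tool by counting feature keys in a GenBank-style flat file. The sketch below is not the Submitter Report's method: the `count_feature_keys` function, the sample record, and the expected count are hypothetical, and the parser assumes the usual flat file layout in which feature keys are indented by five spaces and qualifier lines are indented further.

```python
from collections import Counter


def count_feature_keys(flatfile_text: str) -> Counter:
    """Count feature keys (gene, CDS, tRNA, ...) in the FEATURES section of a flat file."""
    counts = Counter()
    in_features = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line and not line.startswith(" "):
            in_features = False  # reached the next top-level section, e.g. ORIGIN
        # Feature key lines have exactly five leading spaces; qualifier lines have more.
        if in_features and line.startswith("     ") and len(line) > 5 and line[5] != " ":
            counts[line[5:].split()[0]] += 1
    return counts


# Hypothetical sample record fragment.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..5000
                     /organism="Example organism"
     gene            100..175
     tRNA            100..175
                     /product="tRNA-Ala"
     gene            300..372
     tRNA            300..372
                     /product="tRNA-Gly"
     gene            1000..2000
     CDS             1000..2000
ORIGIN
"""

counts = count_feature_keys(SAMPLE)
expected_trna = 2  # hypothetical expectation based on prior knowledge of the organism
print(counts["tRNA"], "tRNA features found;", expected_trna, "expected")
```

A mismatch between the observed and expected counts does not by itself prove an error; as with the Submitter Report, the result has to be interpreted in light of how complete the sequence is.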

Submitter Report

The left panel in the Submitter Report dialog lists tests for which there are results. The right panel lists the items that were flagged by the selected test.

For guidance on interpreting Submitter Report tests and resolving the problems they uncover, please see GenBank Common Discrepancy Reports.

More information about the Submitter Report can be found in the Submitter Report manual.

Saving the File and Submitting

After resolving problems with the sequence data and doing a final proofread using the visual inspection tools, select Save Submission File from the Submission menu to save the data to a file, then upload the file at the NCBI Submission Portal.

The submitter will be asked to choose “Single” or “Batch/multiple” genomes. Both options can be used to upload a single file, but they present the user with different options for providing metadata. After reviewing the summary page, click the Submit button. A temporary identifier will be provided for use when checking the status of the submission in the Submission Portal or when corresponding with GenBank staff, if necessary, until GenBank accession numbers have been assigned to the sequences.

For more information, please watch the Introducing the Genome Submission Wizard Video Tutorial and see the full documentation for the NCBI Genome Workbench Editing Package.

Last updated: 2019-07-08T17:21:39Z