Reports

Feature Table

Feature Table opens a dialog that shows the features (source, CDS, gene, mRNA, rRNA, etc.) applied to the current sequence in a five-column, tab-delimited table format. The table can be saved (exported), edited, and re-imported to correct or otherwise change the features on the current sequence.

What is a Feature Table?

A five-column, tab-delimited feature table contains features, their nucleotide locations, and their qualifiers in a specific format that can be read by Genome Workbench to add features to a sequence.

Valid features and qualifiers are restricted to those approved by the International Nucleotide Sequence Database Collaboration (INSDC). For information about making and importing a feature table, see the Import 5 Column Feature Table manual.

The feature table format specifies the type, location, and additional descriptive information of each feature, allowing Genome Workbench to process and add the features based on the sequence to which they apply. It also allows Genome Workbench to translate CDS features into proteins, which are shown on the sequence record.

The first line of a feature table has this format: >Feature SequenceID (‘greater than’ symbol – the word ‘Feature’ – space – SequenceID used in the corresponding FASTA file to identify the sequence).

The SequenceID is identical to that used to identify the sequence in the FASTA file. Subsequent lines of the table list nucleotide locations, features, and qualifiers. Each feature is on a separate line. Qualifiers describing that feature are on the line below. Columns are separated by tabs.

Line 1:
    Column 1: Start nucleotide location of feature
    Column 2: Stop nucleotide location of feature
    Column 3: Feature name
Line 2:
    Column 4: Qualifier name
    Column 5: Qualifier value

Example of Feature Table for FASTA file beginning:

>SEQ1
CTGGGTTGTGTTATCAAAAACGCCAAGCGCAAGAAGCACCTAGTCGAACATGAAGAAGGG
ATCCCCTGGACAACTACATGGTTGCCGAAGATCCTTTTTTAGGACCCTTTTTAAC
…

>Feature SEQ1
1   750 gene
            gene    abc1
1   750 CDS
            product ABC1
925 795 rRNA
            product 16S ribosomal RNA
930 955 tRNA
            product tRNA-Phe
1005    >1480   gene
            gene    def2
1005    1250    CDS
1370    >1480
            product DEF2
            note        similar to yeast defensin

This example illustrates several characteristics of the table’s format.

  • Features that are on the complementary strand, such as the 16S rRNA, are indicated by reversing the interval locations.
  • Locations of partial (incomplete) features are indicated with a ">" or "<" next to the number. In this example, the def2/DEF2 gene and CDS extend beyond the end of the 1480-nucleotide sequence. The ">" symbol indicates that they are 3' partial features.
  • If a feature contains multiple intervals, like the DEF2 CDS, each interval is listed on a separate line by its start and stop position before subsequent qualifier lines.
  • Gene features are always a single interval, and their location should cover the intervals of all the relevant features. For example, the gene def2 is a single interval even though its corresponding CDS has two intervals.
  • If the gene feature spans the intervals of the CDS or mRNA features for that gene, there is no need to include gene qualifiers on those features in the table, because they will be read from the overlapping gene feature.
  • All CDS features must have at least one product.
  • A /note qualifier can be added to any feature using the qualifier note in the table. A note has been added to the DEF2 CDS.
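The format rules above can be sketched in a few lines of Python. This is only an illustrative reader, not the parser Genome Workbench uses: the `Feature` class and the functions `parse_feature_table`, `is_complement`, and `is_partial` are hypothetical names introduced here, and the sketch assumes real tab characters between columns and well-formed input.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical container for one feature read from the table."""
    kind: str
    intervals: list = field(default_factory=list)   # (start, stop) strings, "<" / ">" kept
    qualifiers: list = field(default_factory=list)  # (name, value) pairs


def parse_feature_table(text):
    """Read a five-column, tab-delimited table; return (sequence_id, features)."""
    seq_id, features = None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">Feature"):
            seq_id = line.split()[1]
            continue
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # pad short rows to five columns
        start, stop, kind, qual, value = cols[:5]
        if start and stop and kind:
            # Columns 1-3 filled: a new feature begins.
            features.append(Feature(kind, [(start, stop)]))
        elif start and stop:
            # Only columns 1-2 filled: another interval of the current feature.
            features[-1].intervals.append((start, stop))
        elif qual:
            # Columns 4-5 filled: a qualifier of the current feature.
            features[-1].qualifiers.append((qual, value))
    return seq_id, features


def is_complement(feature):
    """True when the first interval is reversed (start greater than stop)."""
    start, stop = feature.intervals[0]
    return int(start.lstrip("<>")) > int(stop.lstrip("<>"))


def is_partial(feature):
    """True when any location carries a "<" or ">" marker."""
    return any(pos[0] in "<>" for interval in feature.intervals for pos in interval)


# A shortened version of the SEQ1 example, written with explicit tabs.
TABLE = (
    ">Feature SEQ1\n"
    "1\t750\tgene\n"
    "\t\t\tgene\tabc1\n"
    "925\t795\trRNA\n"
    "\t\t\tproduct\t16S ribosomal RNA\n"
    "1005\t1250\tCDS\n"
    "1370\t>1480\n"
    "\t\t\tproduct\tDEF2\n"
    "\t\t\tnote\tsimilar to yeast defensin\n"
)

seq_id, feats = parse_feature_table(TABLE)
for feat in feats:
    print(feat.kind, feat.intervals, feat.qualifiers,
          "complement" if is_complement(feat) else "forward",
          "partial" if is_partial(feat) else "complete")
```

The sketch keeps locations as strings so that the partial markers survive, and only converts them to integers when comparing start and stop.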

In the Feature Table dialog, the Show option allows the Protein/Transcript Id and the Source Feature to be included in the display or not. After choosing to show or not show them, click Refresh to re-display the Feature Table with changes.

Figure: Feature Table displaying Protein Ids and Source Feature.

Figure: Feature Table without Protein Ids and Source Feature.

The Feature Table can be searched by entering text in the box next to the binocular icon and clicking that icon. Case matching can be chosen or not chosen, depending on the stringency required for the search.

Validation Report

Genome Workbench provides two tools to help users look for problems with a file that is being prepared for submission to GenBank, the Validation Report and the Submitter Report. The Validation Report is focused on individual items that have problems, while the Submitter Report provides information about patterns in the submission. Not all items reported by the Submitter Report are necessarily a sign of a problem. For example, the Submitter Report will list the number of coding regions that are present, which can be compared to the user’s expectations.

The Validation Report can be launched in the Submission Dialog or from the menu item Submission->Reports->Validate.

Each Validation Report message has a severity, an associated sequence, a title, and a specific message. The user can click on any of these items to navigate to the object the message describes, or click on the pen icon in the leftmost column to launch an editor for the offending object or a tool for fixing the error.

For example, if a sequence does not have any biological source information, the validator will report the error code “NoSourceDescriptor”. Clicking on the pen icon for this error will launch a dialog that allows the user to add the biological source information for that sequence. If a sequence has ambiguity (N) characters at either end, the validator will report the error code “TerminalNs”. Clicking on the pen icon for this error will launch a dialog to trim the ambiguous characters from the ends of all sequences in the file. Note that this action could resolve multiple error messages, as it affects all sequences with this problem. Also note that when these ambiguity characters are trimmed, the features that have already been annotated on the sequence are adjusted so that they still cover the same relative nucleotide positions.

The validator reports semantic and syntactic errors, which are generally indicative of problems with the data in the submission. Examples include a coding region that contains stop codons in the open reading frame and therefore cannot produce a valid protein, or a source qualifier value that does not conform to INSDC syntax.
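The coordinate adjustment that accompanies trimming of terminal Ns can be pictured with a short Python sketch. It is an illustration only, not the Genome Workbench implementation: the function name `trim_terminal_ns` is hypothetical, features are assumed to be forward-strand, 1-based, inclusive `(start, stop)` pairs, and clipping or dropping features that reach into the trimmed region is an assumption made here, not documented behavior.

```python
def trim_terminal_ns(sequence, features):
    """Trim runs of N from both ends and shift feature coordinates to match.

    `features` holds 1-based, inclusive (start, stop) pairs on the forward strand.
    """
    upper = sequence.upper()
    lead = len(upper) - len(upper.lstrip("N"))  # number of leading Ns removed
    core_len = len(upper.strip("N"))            # length that remains after trimming
    trimmed = sequence[lead:lead + core_len]

    shifted = []
    for start, stop in features:
        # Clip to the retained region, then shift left by the leading trim.
        new_start = max(start, lead + 1) - lead
        new_stop = min(stop, lead + core_len) - lead
        if new_start <= new_stop:  # assumption: drop features that lie wholly in trimmed Ns
            shifted.append((new_start, new_stop))
    return trimmed, shifted


trimmed, shifted = trim_terminal_ns("NNNACGTACGTNN", [(4, 11), (6, 9), (1, 2)])
print(trimmed)  # ACGTACGT
print(shifted)  # [(1, 8), (3, 6)]
```

In this example the feature at 6..9 covers the bases GTAC both before and after trimming, which is what covering "the same relative nucleotide positions" means in practice.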

Validator messages are flagged with a severity level (REJECT, ERROR, WARNING, INFO). Prior to submission, all REJECT and ERROR level validator messages should be resolved. WARNING and INFO level messages report issues that may, in some instances, be valid, so it is not necessary to resolve them before submission.

The validator does not update automatically after the user has edited the data. The user can click Refresh to see the remaining problems.

The user can filter the messages to show all messages with a severity equal to or higher than a given level, or filter by error title.
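The two filters can be modeled with a small Python sketch. It assumes that the severity levels rank INFO < WARNING < ERROR < REJECT and represents each message as a plain dictionary; the names `SEVERITY_RANK`, `filter_messages`, and `must_resolve` are hypothetical, and the sample messages (including their severities and the "ExampleInfo" title) are invented for illustration rather than taken from real validator output.

```python
# Assumed ranking, lowest to highest severity.
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "ERROR": 2, "REJECT": 3}


def filter_messages(messages, min_severity="INFO", title=None):
    """Keep messages at or above `min_severity`, optionally with a given title."""
    threshold = SEVERITY_RANK[min_severity]
    return [
        msg for msg in messages
        if SEVERITY_RANK[msg["severity"]] >= threshold
        and (title is None or msg["title"] == title)
    ]


def must_resolve(messages):
    """REJECT and ERROR messages should be resolved before submission."""
    return filter_messages(messages, min_severity="ERROR")


# Invented sample data; the severities shown are not real validator assignments.
messages = [
    {"severity": "ERROR", "sequence": "SEQ1", "title": "NoSourceDescriptor",
     "message": "illustrative text"},
    {"severity": "WARNING", "sequence": "SEQ2", "title": "TerminalNs",
     "message": "illustrative text"},
    {"severity": "INFO", "sequence": "SEQ2", "title": "ExampleInfo",
     "message": "illustrative text"},
]

print([m["title"] for m in filter_messages(messages, min_severity="WARNING")])
print([m["title"] for m in filter_messages(messages, title="TerminalNs")])
print([m["title"] for m in must_resolve(messages)])
```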

FlatFile Summary

The FlatFile Summary dialog provides a summary of the nucleotide sequences in the file. The content is produced by sorting the lines of the FlatFile representations for each nucleotide sequence in the file by section and counting the number of times each line appears identically. For UNIX users, this is similar to applying sort | uniq -c to the FlatFile sections.

The sorted text appears in the top panel of the dialog, grouped into the appropriate sections. These sections can be expanded to show the actual lines of text and the number of times that particular line appears. Clicking on a line of text will cause a list of the items that contain this text to appear in the bottom panel. Clicking on the item in the list will navigate to that item in the Text View, and double-clicking on the item will launch an editing dialog for the item. The FlatFile Summary tool is designed to help users quickly look for consistency and find unexpected duplicates. For example, a user might want to confirm that all sequences in a genome submission have the same organism name, but each sequence has a different chromosome value.
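The counting and lookup described above can be approximated in Python. This is a minimal sketch of the idea, not the tool's implementation: it assumes the FlatFile text has already been split into `(sequence_id, section, line)` tuples, the function name `summarize_flatfile_lines` is hypothetical, and the records are invented sample data.

```python
from collections import Counter, defaultdict


def summarize_flatfile_lines(records):
    """Count identical lines per section and remember which sequences contain them."""
    counts = defaultdict(Counter)  # section -> Counter of lines (like sort | uniq -c)
    owners = defaultdict(set)      # (section, line) -> ids of sequences with that line
    for seq_id, section, line in records:
        counts[section][line] += 1
        owners[(section, line)].add(seq_id)
    return counts, owners


# Invented sample data: one organism name is deliberately misspelled.
records = [
    ("seq1", "SOURCE", "ORGANISM  Example organism"),
    ("seq2", "SOURCE", "ORGANISM  Example organism"),
    ("seq3", "SOURCE", "ORGANISM  Example organsim"),
    ("seq1", "FEATURES", '/chromosome="1"'),
    ("seq2", "FEATURES", '/chromosome="2"'),
    ("seq3", "FEATURES", '/chromosome="3"'),
]

counts, owners = summarize_flatfile_lines(records)
for section in sorted(counts):
    print(section)
    for line, n in sorted(counts[section].items()):
        print(f"{n:7d} {line}")

# Equivalent of clicking a line: list the sequences that contain it.
print(sorted(owners[("SOURCE", "ORGANISM  Example organsim")]))
```

A count lower than the number of sequences, as with the misspelled organism line here, is the kind of inconsistency the summary makes easy to spot, while a count of one for every chromosome line confirms that each sequence has a distinct value.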

The tool can also be a convenient mechanism for navigating a large record. For example, if the user wants to examine all of the features for which an Enzyme Commission Number (EC number) has been assigned, the user could expand the FEATURE section and look at the /EC_number section to see the list of features, and click on them one at a time.

Note that the FlatFile Summary is not automatically updated after changing the data, so the user should be sure to use the Refresh button to incorporate the most recent changes.

Submitter Report

The Submitter Report is a tool to help users look for patterns and potential problems with a file. It performs a set of tests and lists the items flagged by each test. For example, the "SUSPECT_PRODUCT_NAMES" test lists product names that may be uninformative, misspelled, contain text that would be more appropriate in a different field, or have other problems. Not all items reported by the Submitter Report are necessarily a sign of a problem. For example, the "FEATURE_COUNT" test lists the number of features of each type that are present, which can be compared to the user’s expectations. Information about how to interpret individual tests and how to fix problems (when appropriate) can be found here.

The Submitter Report dialog consists of two panels at the top, a text search box, and some additional buttons at the bottom. The panel on the left lists the tests for which there are results. Clicking on an item in the left panel will cause the results for that test to be displayed in the panel on the right. For tests that refer to coding regions, RNA features, genes, or biological source information, double-clicking on the test in the left panel will launch a Bulk Editor to help the user edit the affected items. The user may also click on individual items in the right panel to navigate to the item, or double-click on an item to launch an editor for it. The Search control below the two panels can be used to find text in the left panel.

Note that the Submitter Report does not automatically refresh after editing, so the user must click on Refresh to see the updated results.

For more information, please see the full documentation for the NCBI Genome Workbench Editing Package.

Last updated: 2019-07-03T16:39:25Z