Editing Tools

Select Specific Sequence by Sequence ID

This menu item will allow you to type in a sequence ID then navigate to the record for that sequence in the Flat File View, instead of scrolling to it using the sequence list.

Lookup Taxonomy and Cleanup Record

This tool compares the organism name on each record to the NCBI taxonomy database to determine the taxon ID. If a matching organism is found, the taxon ID is added to the record and the translation table for all coding regions is set to the correct genetic code for the source organism. The coding regions are not retranslated; use Retranslate Coding Regions to fix the translations. Some additional cleanup functions are also run on the record. Lookup taxonomy will not work if the organism name is not yet in the NCBI taxonomy database. If the organism does not have a taxon ID but you know which genetic code should be used to translate proteins, that code can be applied using Batch Apply Genetic Code under the Editing Tools menu.

Batch Apply Genetic Code

This tool can be used to add a genetic code to all coding regions of a genome when lookup taxonomy does not work (that is, the organism does not have a taxon ID) and the genetic code to be used is known. Select the blank box and a list will appear with the available genetic codes to choose from. More information about the different genetic codes can be found in Taxonomy Browser Genetic Codes.

Bulk Source Edit

Bulk Edit Source will display a table with all source qualifiers on all sequences. This list is initially displayed with duplicate values compressed and sorted based on the organism name. Click on the + value in the first column to expand the compressed duplicates.

Tools Expand Compressed Duplicates

Clicking on the top left corner will select all rows and order the features according to the feature ID number. Use the scroll bar on the right to scroll down to see all rows.

Tools Select All Rows

Clicking on the name of the qualifier in the column header will sort the table according to the selected qualifier. Those with identical values will be compressed and you will see the + sign again in the first column. In the below example sorting is based on plasmid-name.

Tools Plasmid Name Sorting

All columns except location can be edited. Individual rows can be selected by the buttons on the left side of each row. Multiple rows can be selected by holding down the ctrl button when selecting on the left side of the row. Individual qualifiers can be edited by clicking on that value and typing in the correction.

In the Select Rows menu, rows can be selected or unselected by applying a text constraint to a qualifier; the selected rows can then be edited with the Action menu. Select All will highlight all rows in the table, while Unselect All will remove the highlighting from any selected rows. In the example below, source fields are selected based on having the term plasmid in the genome qualifier.

Tools Select Rows Menu
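
The text-constraint selection can be pictured as a simple filter over the table rows. The following Python sketch is only an illustration of that idea, not how Genome Workbench is implemented; the row layout, the function name select_rows, and the case-insensitive matching are all assumptions made for the example.

```python
# Hypothetical rows of a bulk source table: one dict of qualifiers per sequence.
rows = [
    {"organism": "Escherichia coli", "genome": "chromosome"},
    {"organism": "Escherichia coli", "genome": "plasmid"},
    {"organism": "Escherichia coli", "genome": "plasmid"},
]

def select_rows(rows, qualifier, text, contains=True):
    """Return the indexes of rows whose qualifier does (or does not) contain text."""
    selected = []
    for index, row in enumerate(rows):
        value = row.get(qualifier, "")
        # Case-insensitive matching is an assumption of this sketch.
        if (text.lower() in value.lower()) == contains:
            selected.append(index)
    return selected

plasmid_rows = select_rows(rows, "genome", "plasmid")
```

Passing contains=False to this hypothetical helper gives the converse selection (rows that do not contain the text).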

The Action menu is where selected features can be globally modified. Select the action to perform in the first box and the qualifier to perform the action on in the second box. There is an Undo button in case the changes made are not the ones desired. There is also the option to perform the action on all of the rows (Apply To All) or only the rows that have been selected (Apply To Selected).

Apply will simply apply whatever text is entered to the selected qualifier.

Tools Action Menu

If text already exists you will be given options on whether to replace, append or ignore the new text. This box is also displayed when using the convert or parse actions if the qualifiers already contain text.

Tools Add New Text
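
The three choices offered when text already exists can be summarized in a few lines of Python. This is a sketch under stated assumptions: the mode names mirror the options described above, while the function name apply_text and the "; " separator used when appending are hypothetical and not taken from the application.

```python
def apply_text(existing, new_text, mode, separator="; "):
    """Combine new_text with an existing qualifier value.

    mode is "replace", "append", or "ignore"; the separator is an assumption.
    """
    if not existing:
        # No existing text, so there is no conflict to resolve.
        return new_text
    if mode == "replace":
        return new_text
    if mode == "append":
        return existing + separator + new_text
    if mode == "ignore":
        return existing
    raise ValueError(f"unknown mode: {mode}")
```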

Edit will make changes based on the values entered in the find and replace fields.

Tools Edit Find Replace

Remove will remove the qualifier selected.

Tools Remove Selected

Convert qualifiers moves the value in one qualifier to another qualifier. Convert qualifiers with the ‘leave on original’ button toggled is the same as ‘copy qualifiers’.

Tools Convert Qualifiers
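
The difference between converting and copying can be sketched as follows. The function name convert_qualifier and the dict-based row are hypothetical and serve only to illustrate the behavior described above.

```python
def convert_qualifier(row, source, target, leave_on_original=False):
    """Move (or copy) the value of one qualifier into another qualifier."""
    updated = dict(row)
    if source not in updated:
        return updated  # nothing to convert
    updated[target] = updated[source]
    if not leave_on_original:
        # A plain convert removes the value from the original qualifier.
        del updated[source]
    return updated

row = {"strain": "ATCC 12345"}
moved = convert_qualifier(row, "strain", "culture-collection")
copied = convert_qualifier(row, "strain", "culture-collection", leave_on_original=True)
```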

Parse is a powerful action which allows the movement of parts of qualifiers from one field to the other. This can be used to move incorrectly fielded source information to another field.

Tools Parse Qualifiers
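
As a rough illustration of what parsing does, the Python sketch below moves a strain name that was incorrectly placed in the organism field into a strain field. It uses a regular expression as a stand-in for the dialog's own text-matching options, so the pattern, the function name parse_qualifier, and the example values are assumptions, not the tool's actual behavior.

```python
import re

def parse_qualifier(row, source, target, pattern, remove_from_source=True):
    """Move the first regex group matched in row[source] into row[target]."""
    match = re.search(pattern, row.get(source, ""))
    if match is None:
        return row  # nothing to parse; leave the row unchanged
    updated = dict(row)
    updated[target] = match.group(1)
    if remove_from_source:
        # Drop the whole matched text (including the "strain" label) from the source.
        start, end = match.span()
        updated[source] = (row[source][:start] + row[source][end:]).strip()
    return updated

row = {"organism": "Escherichia coli strain K-12"}
fixed = parse_qualifier(row, "organism", "strain", r"\s*strain\s+(\S+)")
# fixed == {"organism": "Escherichia coli", "strain": "K-12"}
```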

Change case can be used to change the way a qualifier is capitalized. This is useful for example when the genus name is not properly capitalized in the host field.

Tools Change Case

Once all changes have been completed (manual edits and/or using the action menu and applying changes using Apply to All and/or Apply to Selected), use the Accept button to commit the changes to the working file and close the bulk editor. The Cancel button will close the bulk editor and cancel all edits made. No changes will be made in the submission file.

Bulk CDS Edit

Bulk Edit CDS will display a table with all coding region qualifiers. This list is initially displayed with duplicate values compressed and sorted based on the product name. Click on the + value in the first column to expand the compressed duplicates. Use the scroll bar on the right to scroll down to see all rows.

Tools Expand Compressed Duplicates CDS

Clicking on the top left corner will select all rows and order the features according to the feature ID number.

Tools Select All Rows CDS

Clicking on the name of the qualifier in the column header will sort the table according to the selected qualifier. In the below example sorting is based on ec number.

Tools Sort By Selected Qualifier CDS

All columns except location can be edited. Individual rows can be selected by the buttons on the left side of each row. Multiple rows can be selected by holding down the ctrl button when selecting on the left side of the row. Individual qualifiers can be edited by clicking on that value in the table and typing in the correction.

In the Select Rows menu, rows can be selected or unselected by applying a text constraint to a qualifier; the selected rows can then be edited with the Action menu. Select All will highlight all rows in the table, while Unselect All will remove the highlighting from any selected rows. In the example below, rows are selected when the protein name contains 'hypothetical'.

Tools Protein Name Contains Hypothetical CDS

The Action menu is where selected features can be globally modified. Select the action to perform in the first box and the qualifier to perform the action on in the second box. There is an Undo button in case the changes made are not the ones desired. There is also the option to perform the action on all of the rows (Apply To All) or only the rows that have been selected (Apply To Selected).

Apply will simply apply whatever text is entered to the selected qualifier.

Tools Action Menu CDS

If text already exists you will be given options on whether to replace, append or ignore the new text. This box is also displayed when using the convert or parse actions if the qualifiers already contain text.

Tools Add Text CDS

Edit will make changes based on the values entered in the find and replace fields.

Tools Edit Text CDS

Remove will remove the qualifier selected.

Tools Remove Qualifier CDS

Convert qualifiers moves the value in one qualifier to another qualifier.

Convert qualifiers with the ‘leave on original’ button toggled is the same as ‘copy qualifiers’.

Tools Convert Qualifier CDS

Parse is a powerful action which allows the movement of parts of qualifiers from one field to another.

Tools Parse Qualifier CDS

Change case can be used to change the way a qualifier is capitalized. This can be used to adjust the case of locus names to conform to nomenclature guidelines.

Tools Change Case CDS

Once all changes have been completed (manual edits and/or using the action menu and applying changes using Apply to All and/or Apply to Selected), use the Accept button to commit the changes to the working file and close the bulk editor. The Cancel button will close the bulk editor and cancel all edits made. No changes will be made in the submission file.

Bulk Gene Edit

Bulk Edit Gene will display a table with all gene qualifiers. This list is initially displayed with duplicate values compressed and sorted based on the locus name. Click on the + value in the first column to expand the compressed duplicates. Use the scroll bar on the right to scroll down to see all rows.

Tools Expand Compressed Duplicates Gene

Clicking on the top left corner will select all rows and order the features according to the feature ID number.

Tools Select All Rows Gene

Clicking on the name of the qualifier in the column header will sort the table according to the selected qualifier. In the below example sorting is based on synonym.

Tools Sort By Selected Qualifier Gene

All columns except location can be edited. Individual rows can be selected by the buttons on the left side of each row. Multiple rows can be selected by holding down the ctrl button when selecting on the left side of the row. Individual qualifiers can be edited by clicking on that value in the table and typing in the correction.

In the Select Rows menu, rows can be selected or unselected by applying a text constraint to a qualifier; the selected rows can then be edited with the Action menu. Select All will highlight all rows in the table, while Unselect All will remove the highlighting from any selected rows. In the example below, rows are selected when the locus does not contain ubi.

Tools Locus Contains Ubi Gene

The Action menu is where selected features can be globally modified. Select the action to perform in the first box and the qualifier to perform the action on in the second box. There is an Undo button in case the changes made are not the ones desired. There is also the option to perform the action on all of the rows (Apply To All) or only the rows that have been selected (Apply To Selected).

Apply will simply apply whatever text is entered to the selected qualifier.

Tools Action Menu Gene

If text already exists you will be given options on whether to replace, append or ignore the new text. This box is also displayed when using the convert or parse actions if the qualifiers already contain text.

Tools Add Text Gene

Edit will make changes based on the values entered in the find and replace fields.

Tools Edit Text Gene

Remove will remove the qualifier selected.

Tools Remove Qualifier Gene

Convert qualifiers moves the value in one qualifier to another qualifier.

Convert qualifiers with the ‘leave on original’ button toggled is the same as ‘copy qualifiers’.

Tools Convert Qualifier Gene

Parse is a powerful action which allows the movement of parts of qualifiers from one field to another.

Tools Parse Qualifier Gene

Change case can be used to change the way a qualifier is capitalized. This can be used to adjust the case of locus names to conform to nomenclature guidelines.

Tools Change Case Gene

Once all changes have been completed (manual edits and/or using the action menu and applying changes using Apply to All and/or Apply to Selected), use the Accept button to commit the changes to the working file and close the bulk editor. The Cancel button will close the bulk editor and cancel all edits made. No changes will be made in the submission file.

Bulk RNA Edit

Bulk Edit RNA will display a table with all RNA qualifiers from all RNA features (rRNA, tRNA, ncRNA, etc.). This list is initially displayed with duplicate values compressed and sorted based on the product name. Click on the + value in the first column to expand the compressed duplicates.

Tools Expand Compressed Duplicates RNA

Clicking on the top left corner will select all rows and order the features according to the feature ID number. Use the scroll bar on the right to scroll down to see all rows.

Tools Select All Rows RNA

Clicking on the name of the qualifier in the column header will sort the table according to the selected qualifier. Those with identical values will be compressed and you will see the + sign again in the first column. In the below example sorting is based on comment.

Tools Sort By Selected Qualifier RNA

All columns except location can be edited. Individual rows can be selected by the buttons on the left side of each row. Multiple rows can be selected by holding down the ctrl button when selecting on the left side of the row. Individual qualifiers can be edited by clicking on that value in the table and typing in the correction.

In the Select Rows menu, rows can be selected or unselected by applying a text constraint to a qualifier; the selected rows can then be edited with the Action menu. Select All will highlight all rows in the table, while Unselect All will remove the highlighting from any selected rows. In the example below, product names are selected that start with tRNA.

Tools Trna Product Names RNA

The Action menu is where selected features can be globally modified. Select the action to perform in the first box and the qualifier to perform the action on in the second box. There is an Undo button in case the changes made are not the ones desired. There is also the option to perform the action on all of the rows (Apply To All) or only the rows that have been selected (Apply To Selected).

Apply will simply apply whatever text is entered to the selected qualifier.

Tools Action Menu RNA

If text already exists you will be given options on whether to replace, append or ignore the new text. This box is also displayed when using the convert or parse actions if the qualifiers already contain text.

Tools Add Text RNA

Edit will make changes based on the values entered in the find and replace fields.

Tools Edit Text RNA

Remove will remove the qualifier selected.

Tools Remove Qualifier RNA

Convert qualifiers moves the value in one qualifier to another qualifier.

Convert qualifiers with the ‘leave on original’ button toggled is the same as ‘copy qualifiers’.

Tools Convert Qualifier RNA

Parse is a powerful action which allows the movement of parts of qualifiers from one field to another. This can be used to trim long product names by moving extra text to another qualifier.

Tools Parse Qualifier RNA

Change case can be used to change the way a qualifier is capitalized.

Tools Change Case RNA

Once all changes have been completed (manual edits and/or using the action menu and applying changes using Apply to All and/or Apply to Selected), use the Accept button to commit the changes to the working file and close the bulk editor. The Cancel button will close the bulk editor and cancel all edits made. No changes will be made in the submission file.

Remove Features

This menu item launches the dialog displayed below.

Tools Remove Features Dialog

This dialog allows the user to remove specific feature types. More than one feature type can be selected by holding the control button when selecting. Constraints can be added to limit the features removed. The constraints can include only removing features when specific text is present or conversely only when specific text is not present. There can be a constraint to only remove features on certain sequences by limiting by sequence identifier or conversely to remove all features on only the unlisted sequences. There are also location constraints that will be used rarely such as only removing features on the plus strand or the minus strand or removing only protein sequences or only nucleotide sequences.

Remove All Features

This menu item will remove all features from all nucleotide sequences as well as all protein sequences that are products of any CDS features.

Select Features

The Select Features tool can be used to select certain features within the submission file. In particular this can be used to select features to be converted with the convert feature function in the edit tools menu.

Tools Select Features Dialog

In the top window, select the feature type you wish to select. If you want to select all features of that type, simply hit Accept; every selected feature will then be highlighted with a green bar. See the following example for the selection of tRNAs.

Tools Trnas Selection

Only the tRNAs have the green bar on the side.

If you want to select a subset of features, you can use the Feature type and qualifier boxes to select based on the text within a qualifier. For example, if you want to select only coding regions that have the word hypothetical in the product name, you can select CDS in the top box, select CDS in the feature type box, select product name in the qualifier box and then type the word hypothetical in the text box. Hitting accept will then put a bar next to all coding regions with the word hypothetical in the product name.

Convert Features

This menu item will allow the conversion of one class of feature to another. For example, misc_RNAs which should have been ncRNAs can be converted by selecting misc_RNA in the From box and ncRNA in the To box. If the conversion is allowed, a description of what will occur will appear in the Conversion function box. Not all conversions are supported. If the conversion is not allowed, the note Conversion not supported will appear in the Conversion function box. There is a box to do the conversion and leave the original feature if you wish (this copies a feature to another feature). If no features are selected in the flatfile, all features of the type selected in the From box will be converted.

Tools Convert Features Dialog

If you only want to convert a subset of the features, you can select them first in the flatfile view and only those features will be converted. Features can be selected by holding down the ctrl button and selecting in the flatfile or by using the Select Features function in the editing tools menu. Alternatively, you can use the constraints box to limit the features converted.

Note, if the new feature type does not allow some of the qualifiers from the old features, they will be removed. You may also end up with new features that are missing required qualifiers. If this happens, you will need to add these qualifiers back in one of the bulk editing menus or by clicking on the feature itself and editing that feature directly.

Retranslate Coding Regions

This menu item calculates the protein sequence expected for all coding regions that are not marked as a pseudogene using the provided coding region nucleotide spans and genetic code. Any existing protein sequence will be updated to match the expected translation. If no protein sequence exists, this function will create a new protein sequence for the associated coding region.
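
To make the idea of an expected translation concrete, here is a minimal Python sketch of translating a single-interval coding region. It is not the Genome Workbench implementation: the function names are hypothetical, only the standard amino acid assignments are included (genetic code 11 shares them), alternative start codons are ignored, codons containing N are written as X, and stopping at the first stop codon is an assumption of the sketch.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard assignments).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate_cds(genome, start, end, complement=False, codon_start=1):
    """Translate a 1-based inclusive span; ambiguous codons become 'X'."""
    span = genome[start - 1:end].upper()
    if complement:
        span = reverse_complement(span)
    span = span[codon_start - 1:]  # skip bases before the first full codon
    protein = []
    for i in range(0, len(span) - 2, 3):
        amino_acid = CODON_TABLE.get(span[i:i + 3], "X")
        if amino_acid == "*":
            break  # stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)

print(translate_cds("ATGGCCTAA", 1, 9))                   # MA
print(translate_cds("TTAGGCCAT", 1, 9, complement=True))  # MA
```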

Adjust Features for Gaps

Coding regions are not allowed to cross gaps of unknown length. Features should also not begin or end in gaps. Coding region translations should not be more than 50% ambiguous because of N’s in the underlying sequence. The Adjust Features For Gaps tool can be used to trim features, remove features, or split features that are partially located within a gap or completely cross a gap.

Tools Select Feature Type

First, select the specific feature type you wish to adjust. If you want to see all features which may be adjusted, select All in the Feature box. In most cases, select CDS, since coding regions are the features that most often need adjusting.

Second select the type of gap you want to adjust for. If you have already added the assembly gaps to your sequence using Sequences -> Add Assembly Gaps to Sequences, choose whether you want to adjust for Unknown length gaps or for Known length gaps or both. If you have not added gaps to the sequences, choose Ns and you can adjust features for any run of N’s in the sequence.

Finally choose whether to Trim the ends of features that are in gaps, Remove features entirely in gaps, or Split for internal gaps. Features can also be trimmed, removed or split even if the gap is in an intron using the last check box [Even when gaps are in introns]. The default setting is for all truncated features to be made partial. This can be changed using the Make truncated ends partial box to make all features partial unless pseudo or to never make truncated features partial.

Once all boxes are set accordingly, you will see a list of features which may be adjusted in the upper window.

Tools See Feature List

In the above example, all features that cross gaps of unknown length and can be split are listed. There is one gene and one CDS which can be adjusted in the sequence. Column 1 lists the Feature type, Column 2 lists the name or identifier associated with that feature, Column 3 lists the action available, and Column 4 lists the location.

If only one or a subset of the listed features should be adjusted, either highlight the features to change in the top box or change the settings in the Feature box or the settings in the dialog. In this case, only the CDS feature should be updated; the gene feature can cross the gap.

Here are the features in the flatfile prior to splitting:

     gene            complement(214..1023)
                     /locus_tag="C3E89_00005"
     CDS             complement(214..1023)
                     /locus_tag="C3E89_00005"
                     /inference="COORDINATES: protein motif:HMM:PF02591.13"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="test:C3E89_00005"
                     /translation="MGEIEGLWELQKHANILKDIGKSLKKIGSGDRIKSLSVKIEGTE
                     KRLLDLERKIEEKENRLNKANLVLKEYDSKLQEIEESLYTGSISDLKQLTFLNQEREN
                     IKSKIEDKEIEILYLLEEMEELKKEFILIREDFQAMRKEYKKVVKECKSIIEELRDKA
                     LCEKKRIEEISTVLDEKSLEKYNEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                     TKNRGIAVVEVIDNRCSGCNMVLPAIIIDKLKNNNSISYCENCDRILYLKK"
     assembly_gap    371..469
                     /estimated_length=unknown
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"

Here are the features after splitting:

     gene            complement(214..1023)
                     /locus_tag="C3E89_00005"
     CDS             complement(214..>370)
                     /locus_tag="C3E89_00005"
                     /inference="COORDINATES: protein motif:HMM:PF02591.13"
                     /note="coding region disrupted by sequencing gap"
                     /codon_start=2
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="test:C3E89_00005"
                     /translation="TKNRGIAVVEVIDNRCSGCNMVLPAIIIDKLKNNNSISYCENCD
                     RILYLKK"
     assembly_gap    371..469
                     /estimated_length=unknown
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
     CDS             complement(<470..1023)
                     /locus_tag="C3E89_00005"
                     /inference="COORDINATES: protein motif:HMM:PF02591.13"
                     /note="coding region disrupted by sequencing gap"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="test:C3E89_00005_1"
                     /translation="MGEIEGLWELQKHANILKDIGKSLKKIGSGDRIKSLSVKIEGTE
                     KRLLDLERKIEEKENRLNKANLVLKEYDSKLQEIEESLYTGSISDLKQLTFLNQEREN
                     IKSKIEDKEIEILYLLEEMEELKKEFILIREDFQAMRKEYKKVVKECKSIIEELRDKA
                     LCEKKRIEEISTVLDEKSLEKYNE"

The gene feature remains the same. The coding region is split into two pieces on either side of the gap and both new features are partial at the boundaries. A note was added to each of the split coding regions: coding region disrupted by sequencing gap. In addition, each protein now has a unique protein_id.
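
The coordinates and codon_start values in this example can be reproduced with a little arithmetic. The Python sketch below is an illustration only, with hypothetical function names; it is not the tool's actual code. For a feature on the complement strand, translation starts at the highest coordinate, so the piece at 214..370 lies downstream of the gap and must account for all bases skipped before it.

```python
def split_around_gap(start, end, gap_start, gap_end):
    """Split a 1-based inclusive span into the pieces on either side of a gap."""
    if not (start < gap_start and gap_end < end):
        raise ValueError("gap is not strictly inside the feature")
    return (start, gap_start - 1), (gap_end + 1, end)

def codon_start_after(skipped_bases, codon_start=1):
    """Frame offset (1, 2 or 3) after skipping bases from the 5' end of a CDS."""
    consumed = skipped_bases - (codon_start - 1)  # bases already used by codons
    return (-consumed) % 3 + 1

low_piece, high_piece = split_around_gap(214, 1023, 371, 469)
print(low_piece, high_piece)       # (214, 370) (470, 1023)

# Complement strand: the 5' end is at 1023, so the downstream piece (214..370)
# skips the upstream piece plus the gap: 1023 - 370 = 653 bases.
skipped = 1023 - low_piece[1]
print(codon_start_after(skipped))  # 2
```

The upstream piece (470..1023) keeps codon_start=1 because nothing was removed from its 5' end.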

Here is an example of features which begin or end in a gap:

Tools Feature Gap Example

In this case, the features begin in a gap, which is not allowed.

Here are the initial features in the flatfile:

     gene            complement(214..381)
                     /locus_tag="C3E89_00005"
     CDS             complement(214..381)
                     /locus_tag="C3E89_00005"
                     /inference="COORDINATES: protein motif:HMM:PF02591.13"
                     /note="coding region disrupted by sequencing gap"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="test:C3E89_00005"
                     /translation="XXXXTKNRGIAVVEVIDNRCSGCNMVLPAIIIDKLKNNNSISYC
                     ENCDRILYLKK"
     assembly_gap    371..469
                     /estimated_length=unknown
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"

After trimming, both the coding region and gene features are adjusted to be 5’ partial at the gap boundary:

     gene            complement(214..>370)
                     /locus_tag="C3E89_00005"
     CDS             complement(214..>370)
                     /locus_tag="C3E89_00005"
                     /inference="COORDINATES: protein motif:HMM:PF02591.13"
                     /note="coding region disrupted by sequencing gap"
                     /codon_start=2
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="test:C3E89_00005"
                     /translation="TKNRGIAVVEVIDNRCSGCNMVLPAIIIDKLKNNNSISYCENCD
                     RILYLKK"
     assembly_gap    371..469
                     /estimated_length=unknown
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
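
The trimming step can be sketched in the same spirit. This is a hypothetical Python helper, not the tool's implementation; it pulls any feature end that lies inside the gap back to the gap boundary and then derives the new reading frame from the number of bases removed at the 5' end (the high coordinate, since the feature is on the complement strand).

```python
def trim_out_of_gap(start, end, gap_start, gap_end):
    """Trim a 1-based inclusive span so neither end lies inside the gap."""
    if gap_start <= start <= gap_end:
        start = gap_end + 1
    if gap_start <= end <= gap_end:
        end = gap_start - 1
    if start > end:
        return None  # the feature lay entirely inside the gap
    return start, end

trimmed = trim_out_of_gap(214, 381, 371, 469)
print(trimmed)                    # (214, 370)

# 381 - 370 = 11 bases were removed from the 5' end of the complement-strand CDS,
# so the first complete codon now begins at the second base of the trimmed span.
removed = 381 - trimmed[1]
print((-removed) % 3 + 1)         # 2
```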

For more information please see the full documentation for NCBI Genome Workbench Editing Package.

Last updated: 2019-07-03T16:38:04Z