Searching in Genome Workbench
Introduction
The search functionality is available in the following views: Graphical Sequence View, Text View, Tree View, Generic Table View, Alignment Summary View, and VCF Table View. Each of these views has a search panel with a search box and a Find Mode dropdown list offering the search options available for that view.
This manual addresses four types of searches:
Graphical Sequence View: exact match search by feature, SNP rsID, position or range
Open Genome Workbench and click File => Open. In the Open dialog, select Data from GenBank, type NC_000001.11 in the “Accession to load” box, and click Next and then Finish. A new project is created and shown in the Project View. Right-click NC_000001.11 to open the context menu, select Open New View, and open the molecule in the Graphical Sequence View (GSV). You should see an image similar to the one below. We used the Gear icon/Configure Tracks dialog to show only the Sequence track, the Gene track(s), and the Clinical SNP track for a clearer picture; your track configuration may differ.
The upper panel of the GSV contains the search controls: the search box, the binocular icon that starts the search, and the Find mode dropdown list, which lets you choose between case-sensitive and case-insensitive matching. A query must not contain spaces.
The search option in GSV allows you to search corresponding tracks for:
- feature, for example by gene name (AMY1B, tRNA-Asn), by part of the name (AMY, tRNA), or, if there is no gene name, by the gene LOC number (LOC100996442). Note: searches by protein or RNA accessions (such as NM_, NP_, XP_, etc.) are not supported.
- SNP rsID (rs201288184)
- Range (10M-20M, 130909-150040) and position (123456)
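To make the three kinds of GSV queries concrete, here is a minimal sketch of how a query string could be classified as a range, a position, an SNP rsID, or a feature name. The function name `classify_query` and the K/M/G multipliers (inferred from the “10M-20M” example) are assumptions for illustration only; this is not Genome Workbench's actual parser.

```python
import re

# Hypothetical sketch of how a GSV search string could be classified.
# The K/M/G multipliers are inferred from the "10M-20M" example and are
# an assumption, not Genome Workbench's documented grammar.
_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}
_NUMBER = r"(\d+)([KMG]?)"


def _to_int(digits: str, suffix: str) -> int:
    """Convert '10' + 'M' into 10_000_000."""
    return int(digits) * _MULTIPLIER[suffix.upper()]


def classify_query(query: str) -> tuple:
    """Return ('range', start, stop), ('position', pos), ('snp', rsid) or ('feature', text)."""
    q = query.strip()
    if any(ch.isspace() for ch in q):
        raise ValueError("GSV search queries must not contain spaces")
    m = re.fullmatch(_NUMBER + "-" + _NUMBER, q, flags=re.IGNORECASE)
    if m:
        return ("range", _to_int(m[1], m[2]), _to_int(m[3], m[4]))
    m = re.fullmatch(_NUMBER, q, flags=re.IGNORECASE)
    if m:
        return ("position", _to_int(m[1], m[2]))
    if re.fullmatch(r"rs\d+", q, flags=re.IGNORECASE):
        return ("snp", q.lower())
    # Anything else is treated as a (partial) feature name, e.g. AMY1B or LOC100996442.
    return ("feature", q)


for example in ["10M-20M", "130909-150040", "123456", "rs201288184", "AMY1B"]:
    print(example, "->", classify_query(example))
```

Checking the numeric forms first means a plain number is never mistaken for a feature name, while anything with letters (other than an rsID) falls through to the feature search.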
Let’s type “AMY1” in the search box and hit the binocular icon. This search will return three genes – AMY1A, AMY1B and AMY1C. Click the binocular icon multiple times to cycle your view to each search result.
Note: The search will return the gene of interest for every Gene track that is configured in the view where the gene is present (even if the Gene track view is collapsed). When you reach the last matched gene, you will see the pop-up message:
To perform a case-insensitive search (for example, amy1 or Amy1), select “Do not match case” in the Find Mode dropdown list.
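The difference between the two Find Mode settings can be sketched as a substring search over feature names with a case toggle. The gene list below is illustrative only, and treating the query as a plain substring is an assumption based on the “part of the name” behavior described above.

```python
def find_features(names, query, match_case=True):
    """Return the names that contain `query`, keeping the input order."""
    if match_case:
        return [name for name in names if query in name]
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]


# Illustrative gene names only; not the contents of an actual Gene track.
genes = ["AMY2B", "AMY2A", "AMY1A", "AMY1B", "AMY1C", "RNPC3"]
print(find_features(genes, "AMY1"))                    # ['AMY1A', 'AMY1B', 'AMY1C']
print(find_features(genes, "amy1"))                    # []
print(find_features(genes, "amy1", match_case=False))  # ['AMY1A', 'AMY1B', 'AMY1C']
```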
Now let’s search the SNP track by rsID (rs201288184). The view zooms to the sequence level with this SNP selected.
You can try searching other feature tracks on your own. For example, the search also works in Biological regions tracks when you search by a feature name (or part of a name) such as nucleotide_motif or enhancer.
Exact match search in the Text View
Exact match search is available in the Text View as well. Let’s select the region 103554675-103760572 on NC_000001.11 (this region includes the AMY gene cluster). To do this, hover over the upper ruler, press the left mouse button, and drag right or left. Then open the context menu (right-click), choose Open New View => Text View, choose the molecule with the range you just selected, and click Finish:
The Text View opens the selected region. Its search options are similar to those in the Graphical Sequence View, with the additional ability to search by sequence. Search queries may contain several space-separated words without quotes (for example, alpha-amylase 1A precursor).
Select Sequence from the Find mode dropdown list, search for tttgttgaaaaatctg, and observe the sequence found.
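A sequence search of this kind is essentially an exact substring search over the bases of the selected region. The sketch below reports 1-based coordinates for every occurrence, offset by the start of the region. Ignoring case and searching only the given strand are assumptions; the fragment is made up and is not the real NC_000001.11 sequence.

```python
def find_sequence(seq: str, motif: str, region_start: int = 1) -> list:
    """Return 1-based coordinates of every exact (possibly overlapping) occurrence
    of `motif` in `seq`, ignoring case; only the given strand is searched."""
    haystack, needle = seq.upper(), motif.upper()
    hits = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(region_start + i)
        i = haystack.find(needle, i + 1)  # step by 1 so overlapping hits are kept
    return hits


# Made-up fragment that contains the query once, three bases into the region.
fragment = "ACG" + "TTTGTTGAAAAATCTG" + "ACGT"
print(find_sequence(fragment, "tttgttgaaaaatctg", region_start=103554675))  # [103554678]
```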
Advanced Search in Tree View and Table Views
The search functionality in the Tree View and in the table views (Generic Table View, Alignment Summary View, and VCF Table View) is more advanced.
We will perform advanced searches in the Generic Table View and the Tree View, using a tree in ASN format as an example. The file is located at https://ftp.ncbi.nlm.nih.gov/toolbox/gbench/samples/tree_midpoint_root.asn.
For convenience, please save the file to your local drive.
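If you prefer to fetch the sample from a script rather than a browser, a minimal sketch using only the Python standard library could look like this. The helper names `local_path_for` and `download_sample` are hypothetical; the URL is the one given above, and downloading requires network access.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

SAMPLE_URL = "https://ftp.ncbi.nlm.nih.gov/toolbox/gbench/samples/tree_midpoint_root.asn"


def local_path_for(url: str, dest_dir: str = ".") -> Path:
    """Local file path that keeps the remote file name (tree_midpoint_root.asn)."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def download_sample(url: str = SAMPLE_URL, dest_dir: str = ".") -> Path:
    """Download the sample tree into dest_dir and return the saved path."""
    target = local_path_for(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(target))  # needs network access to ftp.ncbi.nlm.nih.gov
    return target
```

Calling `download_sample()` saves the file in the current directory; that path can then be selected in the File Import step below.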
Open the sample file
To load the test file to the Genome Workbench as a new project, click File => Open, in the Open dialog select File Import, and select the path to the sample file, click Next and Finish. A new project should be created and seen in the Project View. Open this project in the Tree View and in the Generic Table View (for Tree View we use the Layout: Rectangle Cladogram and Zoom Behavior: Vertical).
Search controls and search properties
In both views you can see search controls in the upper bar. The controls include the Search text box and the search values history dropdown button, the String Matching options dropdown list, the Start search button, Search in progress indicator, Stop search button, Search filter button, All checkbox, and Previous/Next selection arrows.
String matching options include:
- Exact match
- Wildcards
- Regular expression
- Phonetic
There is also a Case sensitive checkbox.
Our sample tree has many properties whose values can be searched. The easiest way to see all properties and values available for searching is to look at the tree in the Generic Table View. The table is sortable by column, and columns can be hidden and unhidden. Right-clicking the header of any column opens the list of all properties (column headers) that the tree has. They are all checked by default; to hide a column, remove its checkmark.
The properties and their values are also shown in the tooltips for every node of the tree. To open the tooltip you need to hover over the node of interest in the Tree View.
Exact match search
Exact match search is a string search that looks for a particular string in all property fields. The query can be a single word or several words separated by spaces (blanks).
For example, queries:
Typhimurium or “serovar Typhimurium”: both return the same result, 39 entities whose “scientific_name” property contains these words.
“Salmonella bongori”, “GCF_000430145.1”, “Gapless Chromosome” – return correspondingly 2, 1, and 43 entities.
The above queries work similarly to a Google search: the search looks through all fields of all records in the data source and returns every entry in which the given character string appears. Note that, unlike Google, only a single character string may be provided, so all of its words must appear together within one field.
The following queries, which combine data from different fields, are not valid for this type of search: “Salmonella scaffold”, “GCA_000430145.1 Typhimurium”.
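The rule that the whole query string must appear inside one field can be sketched as follows. The records are hypothetical and simplified (they are not taken from the sample tree), and case-insensitive matching by default is an assumption standing in for the unchecked Case sensitive box.

```python
def exact_match(records, query, case_sensitive=False):
    """Return the records in which one single field contains `query` as a substring."""
    def norm(text):
        return text if case_sensitive else text.casefold()

    needle = norm(query)
    return [rec for rec in records
            if any(needle in norm(str(value)) for value in rec.values())]


# Hypothetical, simplified node properties (not taken from the sample tree).
records = [
    {"label": "node_1", "asm_level_txt": "Scaffold",
     "scientific_name": "Salmonella enterica subsp. enterica serovar Typhimurium str. X"},
    {"label": "node_2", "asm_level_txt": "Gapless Chromosome",
     "scientific_name": "Salmonella bongori str. Y"},
    {"label": "node_3", "asm_level_txt": "Chromosome",
     "scientific_name": "Salmonella enterica subsp. enterica serovar Dublin str. Z"},
]

for q in ["Typhimurium", "serovar Typhimurium", "Gapless Chromosome", "Salmonella scaffold"]:
    print(q, "->", [rec["label"] for rec in exact_match(records, q)])
```

“Salmonella scaffold” finds nothing because “Salmonella” and “Scaffold” live in different fields, and no single field contains the combined string.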
Let us perform a simple exact match search for “serovar Typhimurium”. All rows with the query string become selected (gray background). Scroll down to see selected rows.
You can apply filtering to see only the rows with the feature of interest. Click on the Filter button (it becomes activated) and observe that now only 39 rows are shown.
While all 39 rows stay selected, let’s check broadcasting between the Generic Table View and the Tree View of the same project. Go to the Tree View you opened previously and observe that one subbranch is selected. This subbranch includes all the Salmonella enterica serovar Typhimurium entries found in the Generic Table View. Note: the same search can be performed in the Tree View window itself.
Now, adjust the Tree View window next to the Generic Table View window to see them both simultaneously. Remove the checkmark from the All checkbox in the Generic Table View and use Previous selection/Next selection buttons to jump between search result rows, selecting them one by one. Observe that every time, the corresponding subbranch is also selected in the Tree View (you might need to zoom in to see it clearly).
Repeat the search for “serovar Typhimurium” in the Tree View. Apply filtering with the All checkbox selected and note that the tree now looks grayed out, with only the compact branch selected. This branch represents serovar Typhimurium.
Now remove the checkmark from the All checkbox and use the Next/Previous arrow buttons to step through the results one by one. You will see an image similar to the one below. Open the tooltip for the selected terminal node and confirm that the node represents serovar Typhimurium.
Wildcard search
Wildcards are used in search queries to stand in for other characters. This search is useful when looking for data that match a specified pattern. The two most commonly used wildcards are the asterisk (*) and the question mark (?):
- (*) Matches zero or more non-space characters
- (?) Matches exactly one non-space character.
Let’s search for query GCF_*.2. This search should return all accessions (used as the labels in our example tree) that have version 2. In the two images below, you can see the filtered result. In the Table View the search returned 13 rows with all labels having version 2 for accessions (GCF_*.2). In the Tree View all selected branches are branches with the same (GCF_*.2) accessions.
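One way to picture the wildcard rules is to translate a pattern into a regular expression: `*` becomes “zero or more non-space characters” and `?` becomes “exactly one non-space character”. The sketch below does that; requiring the pattern to cover the whole value is an assumption, and the accession labels are made up.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    """'*' -> zero or more non-space characters, '?' -> exactly one non-space
    character; every other character is matched literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\S*")
        elif ch == "?":
            parts.append(r"\S")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_match(value: str, pattern: str) -> bool:
    # Assumption: the pattern has to cover the whole value.
    return wildcard_to_regex(pattern).fullmatch(value) is not None


labels = ["GCF_000000101.1", "GCF_000000202.2", "GCA_000000303.2"]  # made-up accessions
print([lab for lab in labels if wildcard_match(lab, "GCF_*.2")])  # ['GCF_000000202.2']
```

Escaping every literal character matters here: without it, the `.` in `GCF_*.2` would match any character instead of only the version separator.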
Regular expression search
Search based on multiple criteria and SQL-like syntax is also supported. Queries are made up of one or more connected true/false (Boolean) expressions. These expressions are built from comparison operators that compare fields in the data source to other fields or to values provided in the query. The comparison operators below work with character, numeric, or Boolean (true/false) values:
- = (equals)
- < (less than)
- > (greater than)
- <= (less than or equal to)
- >= (greater than or equal to)
- Like (equals where ‘?’ matches any single character and ‘*’ matches zero or more characters)
- Between (check if a value is between two other values)
- In (check if a value is equal to any in a list of values)
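The semantics of Like, Between, and In can be sketched as small Python predicates. The function names are hypothetical; treating Like as a whole-value, case-insensitive match and Between as inclusive on both ends (as in SQL) are assumptions, and the exact rules in Genome Workbench may differ.

```python
import re


def like(value, pattern, case_sensitive=False):
    """LIKE: '?' = any single character, '*' = zero or more characters.
    Assumed to match the whole value and to ignore case by default."""
    regex = "".join(".*" if c == "*" else "." if c == "?" else re.escape(c)
                    for c in pattern)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(regex, str(value), flags | re.DOTALL) is not None


def between(value, low, high):
    """BETWEEN: inclusive on both ends (assumed, as in SQL)."""
    return low <= value <= high


def in_list(value, values):
    """IN: true if value equals any element of values."""
    return value in values


print(like("Gapless Chromosome", "*Chromosome"))    # True
print(like("Gapless Chromosome", "Chromosome"))     # False
print(between(0.15, 0.1, 0.2))                      # True
print(in_list("Scaffold", ["Contig", "Scaffold"]))  # True
```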
Any comparison in the query returns a Boolean value. A query can combine comparisons into more complex expressions using the following Boolean operators, which only work with Boolean values:
- AND (True only if both expressions are true)
- OR (True if either or both expressions are true)
- XOR (True if one expression is true and the other one is false)
- NOT (True if the associated expression is false)
Note that the keywords Like, Between, In, AND, OR, XOR, and NOT are not case sensitive: they can be lowercase, uppercase, or a mixture of both. Expressions can also be grouped with parentheses to create different logical expressions, e.g.:
A and (B or C) is True if A is true and B or C is true; (A and B) or C is True if both A and B are true or C is true
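The effect of the parentheses, and the meaning of XOR, can be checked exhaustively with a short truth-table sketch. Python's `and`/`or` stand in for the query operators here, and `xor` is a hypothetical helper written as inequality of two Booleans.

```python
from itertools import product


def xor(a: bool, b: bool) -> bool:
    """XOR: true when exactly one operand is true."""
    return a != b


def grouping_differences():
    """Rows (A, B, C) where A and (B or C) differs from (A and B) or C."""
    return [(a, b, c)
            for a, b, c in product([False, True], repeat=3)
            if (a and (b or c)) != ((a and b) or c)]


print(grouping_differences())  # [(False, False, True), (False, True, True)]
```

The two groupings disagree only when A is false and C is true, which is exactly the case where parentheses change which records a query returns.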
Let us try a regular expression search. First, select Regular expression as the String matching option, then paste the query dist >0.1 AND scientific_name LIKE ATCC into the search box and click the search button. The search returns 20 entries that have ATCC numbers in their names and distances to their parents greater than 0.1.
Here are a few more queries to try:
asm_level_txt = "Gapless Chromosome" OR asm_level_txt = Scaffold - returns 56 entries with Gapless Chromosome and Scaffold levels of assemblies.
scientific_name LIKE *Dublin* AND (asm_level_txt LIKE *Chromosome OR asm_level_txt = Scaffold) - returns 4 entries with serovar Dublin and Chromosome, Gapless Chromosome, and Scaffold levels of assemblies.
scientific_name LIKE *Dublin* AND asm_level_txt LIKE Chromosome – returns 1 entry with serovar Dublin and Chromosome level of assembly.
NOT assembly_method LIKE Ray* AND scientific_name LIKE *Bareilly* - returns 5 entries with serovar Bareilly and assembly methods other than Ray.
More examples can be found in the Query Syntax manual.
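To see how such queries filter records, the sample queries can be written as Python predicates over node properties. This sketch uses the standard library's `fnmatch` as a stand-in for LIKE (note that `fnmatch` also treats `[...]` as a character set); case-insensitive matching, and NOT applying only to the comparison right after it, are assumptions. The nodes are hypothetical.

```python
from fnmatch import fnmatchcase


def like(value: str, pattern: str) -> bool:
    # fnmatch-based stand-in for LIKE: '*' = any run of characters, '?' = one
    # character, whole-value match. Case is ignored (assumption).
    return fnmatchcase(value.casefold(), pattern.casefold())


def dublin_chromosome_or_scaffold(n):
    # scientific_name LIKE *Dublin* AND (asm_level_txt LIKE *Chromosome OR asm_level_txt = Scaffold)
    return like(n["scientific_name"], "*Dublin*") and (
        like(n["asm_level_txt"], "*Chromosome") or n["asm_level_txt"] == "Scaffold")


def dublin_chromosome_only(n):
    # scientific_name LIKE *Dublin* AND asm_level_txt LIKE Chromosome
    return like(n["scientific_name"], "*Dublin*") and like(n["asm_level_txt"], "Chromosome")


def bareilly_not_ray(n):
    # NOT assembly_method LIKE Ray* AND scientific_name LIKE *Bareilly*
    return (not like(n["assembly_method"], "Ray*")) and like(n["scientific_name"], "*Bareilly*")


# Hypothetical nodes: (label, scientific_name, asm_level_txt, assembly_method)
rows = [
    ("n1", "serovar Dublin str. A", "Complete Genome", "SPAdes v. 3.9"),
    ("n2", "serovar Dublin str. B", "Gapless Chromosome", "Ray v. 2.3"),
    ("n3", "serovar Dublin str. C", "Scaffold", "SPAdes"),
    ("n4", "serovar Dublin str. D", "Chromosome", "HGAP"),
    ("n5", "serovar Bareilly str. E", "Scaffold", "Ray v. 2.3"),
    ("n6", "serovar Bareilly str. F", "Contig", "SKESA"),
]
nodes = [dict(zip(["label", "scientific_name", "asm_level_txt", "assembly_method"], r))
         for r in rows]

for query in (dublin_chromosome_or_scaffold, dublin_chromosome_only, bareilly_not_ray):
    print(query.__name__, [n["label"] for n in nodes if query(n)])
```

Because LIKE is treated as a whole-value match here, `asm_level_txt LIKE Chromosome` excludes “Gapless Chromosome”, while `LIKE *Chromosome` includes it.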
Phonetic search
Phonetic search uses "Metaphone", a phonetic algorithm for indexing words by their pronunciation. It creates an approximate phonetic representation of each word and is used to match words and names that sound similar. The Metaphone algorithm is useful when the text being searched contains misspelled words.
Take the scientific name “Salmonella enterica subsp. enterica serovar Tennessee str. TXSC_TXSC08-19” as an example and search for "Tennessee".
Phonetic search will match the following misspellings:
- Tenese
- Tennese
- Tenesse
- Tenesee
All phonetic searches with the above misspelled queries return the same three entities.
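The idea behind phonetic matching is to compare words by a sound-based key instead of by spelling. The sketch below uses a heavily simplified key (collapse doubled letters, keep the first letter, drop later vowels). It is explicitly NOT the Metaphone algorithm, which has many more rules, but it is enough to show why all four misspellings land on the same key as “Tennessee”. The function names are hypothetical.

```python
import re


def simple_phonetic_key(word: str) -> str:
    """Heavily simplified phonetic key, NOT the full Metaphone algorithm:
    uppercase letters only, collapse doubled letters, keep the first letter,
    then drop the vowels A, E, I, O, U."""
    letters = re.sub(r"[^A-Z]", "", word.upper())
    letters = re.sub(r"(.)\1+", r"\1", letters)  # TENNESSEE -> TENESE
    if not letters:
        return ""
    return letters[0] + "".join(ch for ch in letters[1:] if ch not in "AEIOU")


def phonetic_matches(text: str, query: str) -> list:
    """Words of `text` whose key equals the key of `query`."""
    target = simple_phonetic_key(query)
    return [w for w in re.findall(r"[A-Za-z]+", text) if simple_phonetic_key(w) == target]


name = "Salmonella enterica subsp. enterica serovar Tennessee str. TXSC_TXSC08-19"
for q in ["Tenese", "Tennese", "Tenesse", "Tenesee"]:
    print(q, simple_phonetic_key(q), phonetic_matches(name, q))  # each: TNS ['Tennessee']
```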
Current Version is 3.8.2 (released December 12, 2022)