Information Hubs

Find concise summary of sequence records for gene of interest


Sample User Question

 
I'd like to retrieve a concise set of nucleotide sequences for human presenilin 1.
 

Comments / Analysis

Comprehensive, archival databases, such as those included in Entrez Nucleotides, often contain many sequence records for a particular gene. Therefore, search results can be somewhat redundant.

A simple database search can often retrieve hundreds or even thousands of records. For example, a basic search of Entrez CoreNucleotide for human presenilin 1 retrieves more than 100 records.

The records available for a single gene can: (1) come from different labs; (2) represent different molecule types (e.g., mRNA, genomic DNA); (3) contain sequence data of varying quality (e.g., expressed sequence tag, patent sequence, characterized gene, high-throughput genomic sequence contig); and (4) carry varying levels of biological annotation (e.g., some labs include more annotation with their sequence submissions than others).

Users interested in a concise list of the highest quality, most informative sequence records can use a variety of advanced Entrez search features to narrow their results to a small, manageable set of records. This exercise shows how to retrieve a non-redundant set of curated sequence records for the human presenilin 1 gene.

There are also gene-centered resources in which curators have collected representative sequence records from different molecule types for individual genes. They can serve as an alternative starting point to advanced searches of the sequence databases. An example is given under Additional Tips, below. Subsequent exercises in this module focus on gene-centered resources.

Step By Step Guide

Open Entrez CoreNucleotide

  1. Search for presenilin 1 in the Title field of the nucleotide database


    • enter presenilin 1 in the search box
      Tip:
      • If you surround a phrase with quotes, that will force Entrez to search it as a phrase.
      • If you do not surround a phrase with quotes, Entrez first looks for the terms as a phrase in the index of the selected field. If no such phrase exists, it combines the individual terms with a default Boolean AND. Press the Details button after a search is done to see how Entrez parsed the query.
      • In this case it doesn't matter if quotes are used because presenilin 1 does exist in the index as a phrase.
    • select the Limits option beneath the search box
    • on the Limits page, select Title from the pop-up menu of searchable fields
    • press Go

  2. Do a new search for human in the Organism field


    • press the Clear button on the nucleotide search page to begin a new search
    • uncheck the Limits checkbox that appears in the grey area under the search box to deactivate the limit chosen in the earlier search
    • enter human in the search box
    • select the Limits option beneath the search box and, on the Limits page, select Organism from the pop-up menu of searchable fields
    • press Go

  3. Combine the two searches above using History


    • select the History option beneath the search box
    • uncheck the Limits checkbox that appears in the grey area under the search box to clear that setting
    • in the text box, enter the numbers of the searches that you want to combine using the following syntax:  #1 AND #2  (your actual search numbers might be different, depending on how many other searches you might have recently done in this particular Entrez database)
    • press Go

  4. If desired, use the Limits page to restrict your search to a specific molecule type, such as mRNA


    • select the Limits option
    • select mRNA from the Molecule pop-up menu
    • press Go

  5. If desired, use the Limits page to further restrict your search to records from a specific source database, such as the curated RefSeq database


    • select the Limits option
    • select RefSeq from the Only From pop-up menu
    • press Go
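
The narrowing effect of steps 1 through 5 can be pictured as a series of set intersections. The sketch below is only an illustration under stated assumptions: the five records, their ids (rec1 to rec5), and their field values are hypothetical toy data, not real Entrez entries, and the simple substring and equality tests stand in for Entrez's actual field indexing.

```python
# Hypothetical toy records; these are NOT real Entrez entries.
RECORDS = [
    {"id": "rec1", "title": "toy presenilin 1 mRNA", "organism": "human",
     "molecule": "mRNA", "source": "RefSeq"},
    {"id": "rec2", "title": "toy presenilin 1 mRNA, partial", "organism": "human",
     "molecule": "mRNA", "source": "GenBank"},
    {"id": "rec3", "title": "toy presenilin 1 gene, exon", "organism": "human",
     "molecule": "genomic DNA", "source": "GenBank"},
    {"id": "rec4", "title": "toy presenilin 1 mRNA", "organism": "mouse",
     "molecule": "mRNA", "source": "RefSeq"},
    {"id": "rec5", "title": "toy presenilin 2 mRNA", "organism": "human",
     "molecule": "mRNA", "source": "RefSeq"},
]


def ids(predicate):
    """Return the set of record ids for which predicate(record) is true."""
    return {record["id"] for record in RECORDS if predicate(record)}


# Step 1: phrase in the Title field.
search_1 = ids(lambda r: "presenilin 1" in r["title"].lower())
# Step 2: human in the Organism field.
search_2 = ids(lambda r: r["organism"] == "human")
# Step 3: "#1 AND #2" in History is the intersection of the two result sets.
search_3 = search_1 & search_2
# Step 4: restrict to the mRNA molecule type.
search_4 = search_3 & ids(lambda r: r["molecule"] == "mRNA")
# Step 5: restrict to records from RefSeq.
search_5 = search_4 & ids(lambda r: r["source"] == "RefSeq")

for number, result in enumerate(
        [search_1, search_2, search_3, search_4, search_5], start=1):
    print(f"#{number}: {len(result)} record(s): {sorted(result)}")
```

Each step can only keep or shrink the previous result, which is why the order of the restrictions does not change the final set.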

Additional Tips

Complex Boolean query

The sample search above was broken up into separate steps for clarity and so you could see how each step further narrows your search results. Once you are familiar with the search techniques, many of the steps can be combined. For example, steps 4 and 5 can be done together.

Also, the search can be done in a single step by entering the search as a complex Boolean query. For example:

presenilin 1[titl] AND human[orgn] AND biomol_mrna[prop] AND srcdb_refseq[prop]
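
This exercise uses the Entrez web interface, but the same query string could in principle be sent programmatically. The following is a minimal sketch, assuming NCBI's E-utilities ESearch endpoint and its `db`, `term`, and `retmax` parameters. The function name `build_esearch_url` and the choice of `nuccore` as the database name are assumptions made for illustration. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# E-utilities ESearch endpoint (not mentioned in this exercise; an assumption here).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The single-step Boolean query from the text above.
QUERY = ("presenilin 1[titl] AND human[orgn] "
         "AND biomol_mrna[prop] AND srcdb_refseq[prop]")


def build_esearch_url(term, db="nuccore", retmax=20):
    """Build (but do not send) an ESearch URL for the given query term.

    urlencode percent-encodes the brackets and spaces in the term.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


url = build_esearch_url(QUERY)
print(url)

# Decoding the URL again shows that the query survives encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["term"][0])
```

Building the URL with `urlencode` rather than by string concatenation avoids errors with the square brackets and spaces that field-tagged Entrez queries contain.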

Limits Page vs. Properties Field

If you use the Limits page rather than a complex Boolean query, note that the options shown on the Limits page vary by database. For many Entrez databases, the Limits page shows the most commonly used search restrictions, not an exhaustive list. The Entrez Nucleotide Limits page, for example, shows only a subset of the molecule type and source database choices that are available. To see more choices, browse the index of the Properties field by selecting that field from the lower portion of the Preview/Index page and pressing the Index button. If you don't enter a search term before pressing the Index button, Entrez will take you to the top of the index. If you want to jump to a particular section of the index, you can enter a word stem such as biomol or srcdb before pressing the Index button.

Alternative Search in Gene-Centered Resource: Entrez Gene

Another way to access a concise set of sequence records for a given gene is to use a gene-centered resource, such as Entrez Gene, in which curators have collected a set of representative sequences for individual genes, as well as a wide range of other information for each gene. Retrieve the human presenilin 1 (PSEN1) record to see an example of the variety of information it provides for a single gene. Try the search in several different ways and compare results: first enter presenilin 1 as the search term without any limits; next search for PSEN1, also without any limits; then search for PSEN1 in the Gene Name field and human in the Organism field. Follow the link for Gene ID 5663 and view the NCBI Reference Sequences (RefSeq) and Related Sequences sections of the record. Note that only a representative set of sequence records was chosen for inclusion in the record, and each of them can be reached with a single click. If you then want to see a broader set of sequence records, use the "Related Sequences" option in the "Links" menu for the record(s) of interest.

Comprehensive Search: Use of synonyms in query

If you want a comprehensive set of nucleotide sequence records rather than a concise set, search all of the Entrez Nucleotide database instead of restricting retrieval to RefSeq. Also consider including synonyms, since most of the source databases in Entrez Nucleotide do not use a controlled vocabulary and submitters describe their sequences in different ways. For example, in addition to searching for "presenilin 1", include the official gene symbol and alternate gene symbols in the query. (Those can be obtained from the Entrez Gene record for human presenilin 1.) The resulting query looks like this:

(presenilin 1[titl] OR PSEN1[gene] OR AD3[gene] OR FAD[gene] OR PS1[gene] OR S182[gene]) AND human[Organism] AND biomol_mrna[prop]
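
A query with many synonyms is easy to mistype, so it can help to assemble it from a list. The sketch below is one possible way to do that. The helper names `tag`, `or_group`, and `comprehensive_query` are hypothetical, and the gene symbols are the ones listed in the query above.

```python
def tag(term, field):
    """Attach an Entrez field tag, e.g. tag("PSEN1", "gene") -> "PSEN1[gene]"."""
    return f"{term}[{field}]"


def or_group(terms):
    """Join terms with OR and wrap them in parentheses so they act as one clause."""
    return "(" + " OR ".join(terms) + ")"


def comprehensive_query(title_phrase, gene_symbols, organism, extra_filters=()):
    """Combine a title phrase and gene-symbol synonyms with OR, then AND the
    group with the organism and any additional pre-tagged filters."""
    name_terms = [tag(title_phrase, "titl")]
    name_terms += [tag(symbol, "gene") for symbol in gene_symbols]
    clauses = [or_group(name_terms), tag(organism, "Organism"), *extra_filters]
    return " AND ".join(clauses)


query = comprehensive_query(
    "presenilin 1",
    ["PSEN1", "AD3", "FAD", "PS1", "S182"],
    "human",
    extra_filters=["biomol_mrna[prop]"],
)
print(query)
```

The parentheses around the OR group matter: without them the organism and molecule-type restrictions would not clearly apply to every synonym.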


Revised 08/03/2007