Find concise summary of sequence records for gene of interest
|
| Sample User Question |
 |
I'd like to retrieve a concise set of nucleotide sequences for human
presenilin 1.
|
| Comments / Analysis |
 |
Comprehensive archival databases, such as those included in Entrez Nucleotide,
often contain many sequence records for a particular gene, so
search results can be somewhat redundant.
A simple database search can often retrieve hundreds or even thousands of records.
For example, a basic search of Entrez CoreNucleotide for human presenilin 1 retrieves more
than 100 records.
The records available for a single gene can: (1) come from different labs; (2)
represent different molecule types (e.g., mRNA, genomic DNA); (3) contain sequence
data of varying quality (e.g., expressed sequence tag, patent sequence,
characterized gene, high-throughput genomic sequence contig); and (4) carry varying
levels of biological annotation, because some labs annotate their sequence
submissions more thoroughly than others.
Users interested in a concise list of the highest-quality, most informative
sequence records can use a variety of advanced Entrez search features to
narrow their results to a small, manageable set of records. This exercise shows how
to retrieve a non-redundant set of curated sequence records for the human
presenilin 1 gene.
Also, there are now gene-centered resources in which curators have
collected representative sequence records from different molecule types for
individual genes. They can serve as an alternative starting point to advanced
searches of the sequence databases. An example is given under Additional Tips, below. Subsequent exercises in this module focus on
gene-centered resources.
|
| Step By Step Guide |
 |
Open Entrez CoreNucleotide
1. Search for presenilin 1 in the Title field of the nucleotide database
   - enter presenilin 1 in the search box
     Tip:
     - If you surround a phrase with quotes, that will force Entrez to
       search it as a phrase.
     - If you do not surround a phrase with quotes, Entrez will
       separate the terms with a default Boolean AND if the terms do not
       exist as a phrase in the index of the desired field. Press the
       Details button after a search is done to see how Entrez parsed
       the query.
     - In this case it doesn't matter if quotes are used because
       presenilin 1 does exist in the index as a phrase.
   - select the Limits option beneath the search box
   - on the Limits page, select Title from the pop-up menu of searchable
     fields
   - press Go
2. Do a new search for human in the Organism field
   - press the Clear button on the nucleotide search page to begin a new
     search
   - uncheck the Limits checkbox that appears in the grey area
     under the search box to deactivate the limit chosen in the earlier search
   - enter human in the search box
   - on the Limits page, select Organism from the pop-up menu of
     searchable fields
   - press Go
3. Combine the two searches above using History
   - select the History option beneath the search box
   - uncheck the Limits checkbox that appears in the grey area
     under the search box to clear that setting
   - in the text box, enter the numbers of the searches that you want to
     combine using the following syntax: #1 AND #2 (your actual
     search numbers might be different, depending on how many other searches
     you might have recently done in this particular Entrez database)
   - press Go
4. If desired, use the Limits page to restrict your search to a
   specific molecule type, such as mRNA
   - select the Limits option
   - select mRNA from the Molecule pop-up menu
   - press Go
5. If desired, use the Limits page to further restrict your search to
   records from a specific source database, such as the curated RefSeq
   database
   - select the Limits option
   - select RefSeq from the Only From pop-up menu
   - press Go
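The way History combines earlier searches can be pictured as set operations on the records each search retrieved. The sketch below is only a model of that idea, not how Entrez is implemented: the record identifiers are invented placeholders rather than real search results, and `combine` is a hypothetical helper that understands nothing beyond `#n` references joined by AND.

```python
import re

def combine(history, expression):
    """Evaluate an expression such as '#1 AND #2' against saved searches.

    history maps a search number to the set of record IDs it retrieved.
    Only '#n' references joined by AND are supported in this sketch.
    """
    parts = [p.strip() for p in re.split(r"\s+AND\s+", expression.strip())]
    result = None
    for part in parts:
        match = re.fullmatch(r"#(\d+)", part)
        if match is None:
            raise ValueError(f"unsupported term: {part!r}")
        ids = history[int(match.group(1))]
        # AND keeps only the records present in every referenced search.
        result = set(ids) if result is None else result & ids
    return result

# Placeholder identifiers, invented for illustration only.
history = {
    1: {"rec_a", "rec_b", "rec_c"},  # search 1: presenilin 1 in Title
    2: {"rec_b", "rec_c", "rec_d"},  # search 2: human in Organism
}
history[3] = combine(history, "#1 AND #2")
print(sorted(history[3]))
```

Each later limit (molecule type, source database) behaves like a further intersection, which is why every step can only shrink the result set or leave it unchanged.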
|
| Additional Tips |
 |
Complex Boolean query
The sample search above was broken up into separate steps for clarity and so
you could see how each step further narrows your search results. Once you are
familiar with the search techniques, many of the steps can be combined. For
example, steps 4 and 5 (the mRNA and RefSeq limits) can be set together on the Limits page.
Also, the search can be done in a single step by entering the search as a complex Boolean query. For example:
presenilin 1[titl] AND human[orgn] AND biomol_mrna[prop] AND srcdb_refseq[prop]
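For readers who script their searches, the same single-step query can be assembled programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint, which this exercise itself does not cover; `build_term` is a hypothetical helper, and the database name `nucleotide` is an assumption. The code only builds the query string and URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not part of the exercise above).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(clauses):
    """Join (text, field) pairs into one field-tagged Boolean AND query."""
    return " AND ".join(f"{text}[{field}]" for text, field in clauses)

term = build_term([
    ("presenilin 1", "titl"),   # Title field
    ("human", "orgn"),          # Organism field
    ("biomol_mrna", "prop"),    # molecule type, from the Properties field
    ("srcdb_refseq", "prop"),   # source database, from the Properties field
])

# db="nucleotide" is an assumption about which database name to pass.
url = ESEARCH + "?" + urlencode({"db": "nucleotide", "term": term})
print(term)
print(url)
```

Keeping the clauses as a list makes it easy to drop or add a restriction, mirroring how each Limits step narrows the interactive search.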
|
Limits Page vs. Properties Field
If you use the Limits page rather than a complex Boolean query, note that the
options shown on the Limits page vary by database.
For many Entrez databases, the Limits page shows the most commonly used
search restrictions, not an exhaustive list. The Entrez Nucleotide Limits
page, for example, shows only a subset of the molecule type and source
database choices that are available. To see more choices, browse the index
of the Properties field by selecting that field from the lower portion of
the Preview/Index page and pressing the Index button. If you don't
enter a search term before pressing the Index button, Entrez will take you to the
top of the index. If you want to jump to a particular section of the index, you
can enter a word stem such as biomol or srcdb before pressing the
Index button.
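Jumping to a section of an alphabetical index by word stem can be sketched as a binary search followed by a prefix scan. This is an illustration of the idea only: `jump_to` is a hypothetical helper, and the sample terms are a tiny assumed excerpt, not the actual contents of the Properties index (only `biomol_mrna` and `srcdb_refseq` appear in this exercise).

```python
from bisect import bisect_left

def jump_to(index_terms, stem):
    """Return the index terms that begin with `stem`, in alphabetical order.

    An empty stem starts at the top of the index and returns every term.
    """
    terms = sorted(index_terms)
    start = bisect_left(terms, stem)  # first term at or after the stem
    matches = []
    for term in terms[start:]:
        if not term.startswith(stem):
            break
        matches.append(term)
    return matches

# Illustrative sample only; the real index holds many more entries.
sample_index = ["srcdb_refseq", "biomol_mrna", "srcdb_genbank", "biomol_genomic"]
print(jump_to(sample_index, "srcdb"))
print(jump_to(sample_index, "biomol"))
```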
|
Alternative Search in Gene-Centered Resource: Entrez Gene
Another way to access a concise set of sequence records for a given gene
is to use a gene-centered resource, such as Entrez Gene, in which curators have collected a set of
representative sequences for individual genes, as well as a wide range of other
information for each gene. Retrieve the human presenilin 1 (PSEN1) record to see
an example of the variety of information it provides for a single gene. Try the
search in several different ways and compare results. For example, enter
presenilin 1 as the search term without any limits. Now try a search for
PSEN1, also without any limits. Then try a search for PSEN1 in the
Gene Name field and human in the Organism field. Follow the
link for Gene ID 5663 and view the NCBI Reference Sequences (RefSeq)
and Related Sequences sections of the record. Note that only a
representative set of sequence records was chosen for inclusion in the record, and
you can now access any one of them with a single click. If you then want to
see a broader set of sequence records, use the "Related Sequences" option
in the "Links" menu for the record(s) of interest.
|
Comprehensive Search: Use of synonyms in query
If you want a comprehensive set of nucleotide sequence records
rather than a concise set, search all of the Entrez Nucleotide database
instead of restricting retrieval to RefSeq. Also consider including
synonyms, since most of the source databases in Entrez Nucleotide do not
use a controlled vocabulary and submitters describe their sequences in
different ways. In addition to searching for "presenilin 1",
include the official gene symbol and alternate gene symbols in the query. (Those
can be obtained from the Entrez Gene record for human presenilin 1.) For example:
(presenilin 1[titl] OR PSEN1[gene] OR AD3[gene] OR FAD[gene] OR PS1[gene] OR S182[gene]) AND human[Organism] AND biomol_mrna[prop]
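When the list of synonyms is long, the OR group is easy to mistype by hand. A minimal sketch of generating the query above from a list of gene symbols follows; `comprehensive_query` is a hypothetical helper, and the field tags are simply the ones used in this exercise.

```python
def comprehensive_query(name, symbols, organism="human", molecule="biomol_mrna"):
    """Build a synonym-expanded query: (name OR symbols) AND organism AND molecule."""
    alternatives = [f"{name}[titl]"] + [f"{symbol}[gene]" for symbol in symbols]
    # Parentheses keep the OR alternatives together before the ANDs apply.
    or_group = "(" + " OR ".join(alternatives) + ")"
    return f"{or_group} AND {organism}[Organism] AND {molecule}[prop]"

query = comprehensive_query(
    "presenilin 1",
    ["PSEN1", "AD3", "FAD", "PS1", "S182"],  # symbols listed in the query above
)
print(query)
```

The parentheses matter: they group the synonyms so that the organism and molecule restrictions apply to every alternative rather than only to one of them.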
|