This page summarizes news and announcements related to Entrez Gene. The history of what has been sent to subscribers to the Gene announcements mailing list is also available.

Links Between RefSeqs and Ensembl. (December 3, 2009)

Entrez Gene is now calculating matches between NCBI and Ensembl annotation based on comparison of RNA and protein features. For organisms that are represented in the Consensus Coding Sequence (CCDS) project (i.e., human and mouse), the set of matches includes all protein sequences in CCDS and their corresponding mRNAs. For all other organisms, matches are collected as follows. For a protein to be identified as a match between RefSeq and Ensembl, there must be at least 80% overlap between the two. Furthermore, the splice sites must meet one of two conditions: either 60% or more of the splice sites match, or there is at most one splice site mismatch. For RNA features, the matching criteria are the same as for proteins.

These data can be accessed in several ways. First, in the Full Report view in Entrez Gene, matching Ensembl transcripts and proteins are listed with the RefSeqs in the NCBI Reference Sequences section next to the label Related Ensembl, with links to the Ensembl web site for the associated transcripts and proteins. Links to Ensembl genes will continue to be reported in the Summary section at the top of the Full Report.

Second, genes with matching Ensembl annotation can be found using a new property named "matches Ensembl". For example, to find all genes with Ensembl matches, use:
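As one possible form of such a query, the sketch below builds an E-utilities ESearch URL for the new property. The term string `"matches ensembl"[properties]` is an assumption modeled on the `[properties]` pattern used elsewhere on this page (for example, `genetype protein coding [properties]`), and the helper name `build_gene_search_url` is hypothetical. The code only constructs the URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed query form: the property name in double quotes, followed by the
# [properties] field tag. The exact spelling is not confirmed by this page.
MATCHES_ENSEMBL = '"matches ensembl"[properties]'


def build_gene_search_url(term: str) -> str:
    """Return an ESearch URL that runs `term` against the Gene database."""
    return ESEARCH_URL + "?" + urlencode({"db": "gene", "term": term})


if __name__ == "__main__":
    print(build_gene_search_url(MATCHES_ENSEMBL))
```

The same term can be typed directly into Gene's query box; the URL form is only a convenience for scripted searches.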
Third, Ensembl matches are provided on our FTP site in a new file named gene2ensembl.gz. This file is described in ftp://ftp.ncbi.nih.gov/gene/README.

New Properties for rnatype. (December 3, 2009)

For some time now, the type of gene has been indexed with a genetype property such as "genetype protein coding [properties]". Entrez Gene is now indexing RNA types as well, so you can find genes by the types of RNAs that are represented on them, such as "rnatype mRNA [properties]". The current list of rnatype properties is:
Gene Groups (relationships) (April 2, 2009)

Gene is now reporting different types of gene-to-gene relationships. The first type is a relationship between a pseudogene and its related functional gene. In the General gene information section of the Full Report in Entrez Gene, a Related functional gene heading will appear for relevant genes, with a link to the related functional gene. Because these relationships are bi-directional, the functional gene will have a link to its related pseudogenes in its General gene information section, with the heading Related pseudogene(s). The data are not complete, so the absence of a report should not be interpreted to mean that a gene has no pseudogenes. Additional types of relationships will be added in the future.

Sort by Chromosome (December 17, 2008)

An option to sort by chromosome has been added. Choosing this option causes the records to be sorted in this order:
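A minimal sketch of this ordering follows, assuming (from the example given in this item) that records sort by organism, then by chromosome, then by start position, with unplaced genes last. The `GeneHit` class and both functions are hypothetical illustrations, not Gene's implementation. Three further assumptions are made: organisms sort alphabetically by name, numbered chromosomes sort numerically ahead of named ones such as X, and unplaced genes go to the end of the whole result set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GeneHit:
    """Hypothetical search result; chromosome/start mirror ChrSort/ChrStart."""
    symbol: str
    organism: str
    chromosome: Optional[str]  # None when the gene is not placed
    start: Optional[int]       # start position on that chromosome


def chromosome_rank(name: str) -> Tuple[int, int, str]:
    """Numbered chromosomes first (numerically), then named ones (assumed)."""
    if name.isdigit():
        return (0, int(name), "")
    return (1, 0, name)


def sort_key(hit: GeneHit) -> Tuple[int, str, Tuple[int, int, str], int]:
    """Key: placed before unplaced, then organism, chromosome, start."""
    if hit.chromosome is None:
        return (1, hit.organism, (2, 0, ""), 0)
    return (0, hit.organism, chromosome_rank(hit.chromosome), hit.start or 0)


def sort_by_chromosome(hits: List[GeneHit]) -> List[GeneHit]:
    """Return a new list ordered as described above."""
    return sorted(hits, key=sort_key)
```

For a gene placed on several chromosomes, the caller would pass only the first placement, which matches how the ChrSort and ChrStart fields are described.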
For example, suppose that the search results include genes for Homo sapiens (human) and Mus musculus (mouse). The human genes will all appear before those for mouse. Within the set of human genes in the results, those that are placed on chromosome 1 will appear first, followed by those placed on chromosome 2, and so on. Finally, within a chromosome, genes will be sorted according to their start positions on the chromosome. Genes that are not placed on a chromosome will appear at the end of the results. Genes that are placed on multiple chromosomes will be sorted according to the first such chromosome.

In conjunction with the new sort option, two new fields have been added to the DocSum (Document Summary) for Gene. The ChrSort field contains a sortable version of the first chromosome, if any, on which this gene is placed. The ChrStart field contains the start position for the first such chromosome.

Search by Preferred Symbol (December 17, 2008)

The new [Preferred symbol] index field contains the preferred symbol for the gene, in contrast to the [sym] field, which is indexed for preferred symbols, aliases and locus tags. There is often a lot of overlap among preferred symbols and aliases, and this new field allows you to restrict the search to those genes with the specified preferred symbol, while excluding those that would match only on an alias name or locus tag. For example, a query of set1[sym] returns 13 results, while a query of set1[preferred symbol] returns only the 11 results with set1 as the preferred symbol. The alias for this field is [PREF].

Property "Officially Named" (December 17, 2008)

This new property is set for all genes with official nomenclature. Example of usage: officially named [prop].

Search by GeneID Range. (October 27, 2008)

You can now search for a range of GeneID values. For example:
will find all GeneIDs between 1 and 1000. This may be helpful for users of E-utilities.

New Search Options. (October 27, 2008)

Two new fields have been added to facilitate searching:
Exon Count

This field contains the number of distinct, non-overlapping RefSeq exons annotated for all RNA products of a gene interval, based on annotation in this priority: reference assembly first, alternate assembly second. This field can be queried by either a single integer value or a range. For example, to retrieve all human records with one exon, use:
To retrieve all records with a range of exons:
The aliases for this field are [XC] and [NUMEXONS].

Gene Length

This field contains the gene length based on annotation in this priority: reference assembly first, alternate assembly second. If a gene has multiple placements and all of them are on non-reference assemblies, the longest value among those placements is used. This field can be queried by either a single integer value or a range. For example, to retrieve all records with a gene span less than or equal to 5kb, use:
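One way to assemble such single-value and range terms is sketched below. The range syntax `low[FIELD]:high[FIELD]` is modeled on the chromosome-position query shown elsewhere on this page (`40000000[CHRPOS]:50000000[CHRPOS]`). Applying it to the [GL] and [XC] aliases, and using 1 as the lower bound for "up to 5 kb", are assumptions. The helper names are hypothetical.

```python
def value_term(value: int, field: str) -> str:
    """Single-value term, e.g. 1[XC] for genes with exactly one exon."""
    return f"{value}[{field}]"


def range_term(low: int, high: int, field: str) -> str:
    """Inclusive range term in the form low[FIELD]:high[FIELD]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"{value_term(low, field)}:{value_term(high, field)}"


# Gene span of at most 5 kb (lower bound of 1 is an assumption).
GENE_LENGTH_UP_TO_5KB = range_term(1, 5000, "GL")

# Exactly one exon, using the [XC] alias given for the Exon Count field.
ONE_EXON = value_term(1, "XC")

if __name__ == "__main__":
    print(GENE_LENGTH_UP_TO_5KB)
    print(ONE_EXON)
```

Either string can be combined with other query elements using AND, as in the chromosomal region example.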
The aliases for this field are [GL] and [GENELEN].

New hiv1interactions Property. (October 27, 2008)

This property is set for all genes with curated HIV-1:human protein interaction data. Example of usage:
See also http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/index.html

New Sort Options. (March 17, 2008)

An option has been added to the gray query bar to allow you to re-sort the results of your query. The options currently are:
These options are defined as follows.

Sort by Relevance (the current default)

Relevance is calculated from Gene's assessment of which fields matter most for finding search results. For example, Gene assigns more value to a match on a term in the 'Gene Name' (symbol) field than to a match in free text such as the RefSeq or GeneRIF summary. Thus if your query is the single term 'cat', records with the symbol 'cat' will be sorted ahead of records with the term 'cat' only elsewhere in the record.

Sort by Gene Weight

Gene Weight is calculated from multiple lines of evidence geared toward evaluating how well a gene has been characterized. These lines include:
Sort by Gene Symbol

This sort orders records by the preferred symbol assigned to the gene.

Limit by Chromosomal Region. (March 11, 2008)

The Limits page, accessed by clicking on Limits in the Query bar, now has a function to facilitate retrieving Gene records by chromosomal location. The section supporting this function is titled Limit by Chromosomal Region.

You must first select the organism against which you want to do a search. A scrollable menu is provided, but you can jump to the region of interest by beginning to type a trivial name or the binomial (e.g. human, Homo sapiens, rat, Rattus norvegicus). For example, if you want to find genes on mouse chromosome 5, you do not have to scroll all the way down to Mus musculus, but can type Mus and then scroll to find and select Mus musculus.

When you have selected a species, a new menu is offered with the chromosomes appropriate to the species. For most genomes represented in Gene, the only choice will be the mitochondrion or a plastid. For Drosophila melanogaster, the choices are the arms for chromosomes 2 and 3 rather than the complete chromosome.

After selecting the chromosome (or arm), you can enter the integers representing the lower and upper boundaries between which you want to find genes. These values will be used as additional query elements. For example, if you wanted to identify all zinc finger genes on human chromosome 19 between 40,000,000 and 50,000,000 bp, you could follow these steps:
Your result should look like this and is equivalent to entering this in Gene's query box:
zinc finger[All Fields] AND (NC_000019[nucl_accn] AND 40000000[CHRPOS]:50000000[CHRPOS])

Please note that "Limits: Homo sapiens Chr.19 From 40000000 to 50000000" is displayed in yellow on the results page, and will remain set until you remove the check in the Limits tab or return to the Limits page to refine your query. Please see the news item Chromosome base position available for query and in document summary for additional information.

Reporting of annotation information. (September 20, 2007)

For genomes that NCBI annotates, Entrez Gene will represent information about the annotation of each current GeneID. Text phrases will be attached to the gene data if the gene is not annotated well, or if annotation has changed in a complex way. Text phrases will also be attached if there is no defining cDNA or genomic sequence for the gene, or if the GeneID was created after the most recent genome annotation. The goal is to facilitate retrieval of Gene records where the annotation on the RefSeq genomic records, if it exists, should be interpreted with caution. Thus, records that are not known to have annotation issues can be retrieved by appending this clause to your query:
Specific sub-categories of annotation information are:
Please note that the double quotes are included in the text phrases shown here because they are mandatory when performing a text phrase search.

Chromosome base position available for query and in document summary. (May 2, 2007)

The location of a gene's annotation on a reference chromosome is now reported in the document summary. Thus if a gene is annotated on an unplaced scaffold/contig, or on a genome without chromosomes, placements will not be reported. The report includes the accession and version of the chromosome RefSeq and the position of the gene. The report is provided only for genomes where chromosome coordinates are defined, and only for the reference assembly.

You may query by a range of chromosome base positions, subject to the limitations indicated above. Your query should include the Chromosome and either the Organism or the Taxonomy ID, and in general, you should specify a range of at least 100 kb. The results of the query will include all genes that lie either partly or completely within the range specified. For example, the query:
will find genes C12orf33, PZP and A2M.

Query by accession with version number. (March 12, 2007)

A query for an accession string can now include the version number. For example:
An accession can be queried without the version number as well.

Addition of "has ccds" property. (October 30, 2006)

A "has ccds" property has been added to Entrez Gene. This property identifies genes that encode a protein sequence that is a member of a Consensus CDS (CCDS). For example, to find all human gene records that are in CCDS, use:
For information about CCDS, see http://www.ncbi.nlm.nih.gov/projects/CCDS/ .

Implementation of augmented RefSeq section. (October 9, 2006)

The Reference Sequences section of Entrez Gene's full report option now has additional subsections to support the display of the position of a gene on multiple assemblies, links to the genomic sequence within that range, and the accessions of the RNA and protein sequences specific to those assemblies. RefSeqs for which annotation or sequence may be updated without requiring a complete re-annotation of the genome are now labeled as such. The technical description of this change was announced here.

Modification to Full Report display (September 26, 2006)

Entrez Gene's display was restructured to facilitate browsing. Among the changes you may notice are the scrolling windows for display of GeneRIFs, Interactions, and Markers.

Automatic spelling suggestions for interactive queries (April 6, 2006)

Gene was added to the set of databases using NCBI's spelling suggestion tool (also available via e-utilities).
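For scripted use, a request to the E-utilities spelling service (ESpell) might be prepared as in the sketch below. The helper names are hypothetical, and the length check simply counts characters as a stand-in for the five-letter minimum that applies to spelling suggestions. The code builds the URL only and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESpell service.
ESPELL_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi"

# Suggestions are offered only for terms at least this long.
MIN_TERM_LENGTH = 5


def may_get_suggestion(term: str) -> bool:
    """True if the term is long enough to be eligible for a suggestion."""
    return len(term.strip()) >= MIN_TERM_LENGTH


def build_espell_url(term: str) -> str:
    """Return an ESpell URL that checks `term` against the Gene database."""
    return ESPELL_URL + "?" + urlencode({"db": "gene", "term": term})


if __name__ == "__main__":
    query = "hemoglobbin"  # hypothetical misspelled term, for illustration
    if may_get_suggestion(query):
        print(build_espell_url(query))
```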
Try these examples:

Alternate words or spellings are suggested only if your original query term is at least five letters long.

Revised December 9, 2009