FAQs for RefSeqGene and LRG

  1. How do I request a new RefSeqGene?
  2. How does RefSeqGene number exons?
  3. How can I get a list of RefSeqGenes, LRG, and related transcripts?

How do I request a new RefSeqGene?

You may request a new RefSeqGene by sending your request to info@ncbi.nlm.nih.gov. At a minimum, we need to know:

  • The symbol for the gene (or its HGNC ID or GeneID).
  • The accession and version of each cDNA you would like to use as a reference standard.
  • Your contact information, so we can reach you if we have any questions.

Optionally, we would also like to know:

  • How much flanking sequence to use in generating the RefSeqGene. Our default is to start 5 kb upstream and extend 2 kb downstream.
  • Whether you have identified any rare alleles in the current reference assembly that should be altered in the RefSeqGene.
  • Any cDNAs that have commonly been used for the gene, and that should be aligned to the RefSeqGene.
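To illustrate the default flanking described above, here is a minimal sketch of how the extent of a RefSeqGene could be derived from a gene's coordinates. The function name, the 1-based inclusive coordinate convention, and the strand handling (upstream lies at higher coordinates for a minus-strand gene) are assumptions made for the example, not a description of NCBI's pipeline.

```python
def refseqgene_span(gene_start, gene_end, strand, upstream=5000, downstream=2000):
    """Return the (start, end) chromosome span of a gene plus flanking sequence.

    Hypothetical helper: coordinates are assumed to be 1-based and inclusive,
    with gene_start <= gene_end. The defaults mirror the 5 kb upstream and
    2 kb downstream mentioned in the text.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    if strand == "+":
        start = gene_start - upstream
        end = gene_end + downstream
    elif strand == "-":
        # On the minus strand, upstream of the gene means higher coordinates.
        start = gene_start - downstream
        end = gene_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run off the beginning of the chromosome.
    return max(1, start), end


plus_span = refseqgene_span(10000, 20000, "+")
minus_span = refseqgene_span(10000, 20000, "-")
```

A request that needs more or less flanking sequence would correspond to different upstream and downstream values.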

How does RefSeqGene number exons?

The RefSeqGene/LRG collaboration has received multiple requests for a stable numbering system for exons represented on its sequences. Until May 2013, NCBI's RefSeq group numbered exons and provided those numbers on the exon features of cDNAs as well as on RefSeqGenes. These values were based on all the exons known for a gene, from 5' to 3'. The numbers were not stable: as new exons were discovered for a gene, the numbers were recalculated, which caused some confusion.

RefSeqGene/LRG decided to provide a more stable system, based not on all exons identified for a gene but on a representative set of exons drawn initially from the reference standard cDNAs for a RefSeqGene. Because the reference standard cDNAs of RefSeqGenes do not change often, and because they do not change after an LRG is published for the same sequence, these exon numbers will be much more stable. They will still be assigned 5'->3', with overlapping exons from multiple transcript variants differentiated by a letter suffix, e.g. 2a, 2b. These labels will be reported on the RefSeqGene (e.g. http://www.ncbi.nlm.nih.gov/nuccore/257196128?report=graph) but not on the associated transcripts.
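The labeling scheme can be illustrated with a short sketch. It is one possible reading of the description above, not the procedure RefSeqGene/LRG actually uses: the function name is hypothetical, exons are assumed to be (start, end) pairs in RefSeqGene coordinates, and the rule for ordering the letter suffixes within a group of overlapping exons (by start, then end) is an assumption.

```python
from string import ascii_lowercase


def label_exons(transcripts):
    """Assign labels such as '1', '2a', '2b' to the exons of several transcripts.

    transcripts: list of transcripts, each a list of (start, end) tuples in
    RefSeqGene coordinates (assumed 1-based, inclusive, 5'->3').
    Returns a dict mapping each distinct exon to its label.
    """
    # Collect the distinct exons across all reference transcripts, 5'->3'.
    unique = sorted({exon for tx in transcripts for exon in tx})

    # Group exons that overlap one another; each group shares one number.
    groups = []
    for start, end in unique:
        if groups and start <= groups[-1]["end"]:
            groups[-1]["exons"].append((start, end))
            groups[-1]["end"] = max(groups[-1]["end"], end)
        else:
            groups.append({"end": end, "exons": [(start, end)]})

    labels = {}
    for number, group in enumerate(groups, start=1):
        exons = group["exons"]
        if len(exons) == 1:
            labels[exons[0]] = str(number)
        else:
            # Overlapping exons get a letter suffix (sketch: at most 26 per group).
            for letter, exon in zip(ascii_lowercase, exons):
                labels[exon] = f"{number}{letter}"
    return labels


# Two made-up transcript variants that differ in their second exon.
variant_1 = [(100, 200), (500, 600), (900, 1000)]
variant_2 = [(100, 200), (450, 600), (900, 1000)]
exon_labels = label_exons([variant_1, variant_2])
```

In this made-up example the shared first and last exons receive plain numbers, while the two overlapping versions of the second exon are distinguished by a suffix.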

Once the LRG is created, and the exon locations are thus finalized, the exon numbers will not change. If there are other commonly used numbering systems for exons, they will be added to the annotation as long as there is attribution (an organization or citation) for the alternative system. In other words, the RefSeqGene/LRG mechanism will provide stable exon numbers, defined on stable coordinates, for human genes. The goal is to develop the standard for referring to exons by number.

How can I get a list of RefSeqGenes, LRG, and related transcripts?

RefSeqGene maintains a report, available by FTP (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/LRG_RefSeqGene), that enumerates the RefSeqGene accession.version, the corresponding LRG when available, and the accessions of transcripts and proteins annotated on the RefSeqGene. The last column indicates whether each transcript and protein is defined as a reference standard or is available by alignment only. RefSeqGene also provides alignments of transcripts previously used as reporting standards, even if they are not RefSeqs, to facilitate mapping from one coordinate system to another. Contact us to request additional alignments.
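As a rough sketch of how such a report might be read once downloaded, the example below parses tab-delimited text and keeps the rows whose last column marks a reference standard. The column names, the category strings, and every accession in the sample are placeholders invented for illustration; check them against the header and contents of the actual file before relying on them.

```python
import csv
import io

# Made-up sample in an ASSUMED layout: tab-delimited, one header line that
# starts with '#', and the category in the last column (as the text states).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tRSG\tLRG\tRNA\tt\tProtein\tp\tCategory\n"
    "9606\t1\tGENE1\tNG_000001.1\tLRG_1\tNM_000001.1\tt1\tNP_000001.1\tp1\treference standard\n"
    "9606\t1\tGENE1\tNG_000001.1\tLRG_1\tNM_000002.1\t\tNP_000002.1\t\taligned\n"
)


def read_report(handle):
    """Return (header, rows), where each row is a dict keyed by column name."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    rows = [dict(zip(header, row)) for row in reader if row]
    return header, rows


def reference_standards(header, rows, marker="reference standard"):
    """Keep rows whose last column equals the (assumed) reference marker."""
    category_column = header[-1]
    return [row for row in rows if row[category_column].strip().lower() == marker]


header, rows = read_report(io.StringIO(SAMPLE))
standards = reference_standards(header, rows)
for row in standards:
    print(row["RSG"], row["LRG"], row["RNA"], row["Protein"])
```

To use this with the real report, replace `io.StringIO(SAMPLE)` with an open handle to the downloaded file, e.g. `open("LRG_RefSeqGene", newline="")`.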

