RefSeqGene Guide

A RefSeqGene sequence includes representation of a subset of mRNAs and coding regions that have been selected to serve as reference standards. The RefSeqGene sequence is also annotated with variation reported to dbSNP and dbVar and can be analyzed by a variety of tools at NCBI. This guide points to the major methods that can be used to:

  1. Determine how a set of sequences you have generated aligns to a RefSeqGene and compares to variation annotated on that sequence.
  2. Interconvert the location of sequence variation in genomic assembly coordinates and RefSeqGene coordinates.
  3. Calculate the HGVS expressions for large numbers of variation calls, including those based on the RefSeqGene.
  4. Evaluate the alignment of RefSeqGene sequences to a version of the reference assembly released by the Genome Reference Consortium (GRC).

There are also presentations and handouts that relate to RefSeqGene.

  1. Our education group posts many fact sheets about NCBI resources.  RefSeqGene is included in the set available from our Education pages.
  2. Presentations:
    1. Introducing RefSeqGene, November, 2011

Align sequences to a RefSeqGene, and compare to annotated variation

There is now an interface at NCBI to expedite comparing your nucleotide sequence to that of a RefSeqGene.

The interface is based on the standard BLAST submission page, and can be accessed from the Specialized BLAST section of BLAST's home page and from RefSeqGene's home page. This interface provides the following functionality for users of RefSeqGene/LRG.

Step 1. What sequence(s) do you want to align?

You can submit one or more sequences to be aligned to the RefSeqGene/LRG. To align several of your own sequences at once, separate them with description lines that each start with a ">" (FASTA format).

Generate a file with all the sequences you want to align.
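
As a minimal sketch of what such a file might contain, the Python snippet below builds multi-sequence FASTA text in which each sequence is preceded by a description line starting with ">". The function name `to_fasta`, the sample names, and the sequences are all hypothetical and serve only as an illustration.

```python
def to_fasta(records, width=70):
    """Format (name, sequence) pairs as multi-sequence FASTA text."""
    lines = []
    for name, seq in records:
        # Each record begins with a description line starting with ">".
        lines.append(f">{name}")
        # Remove whitespace and wrap the sequence at a fixed width.
        seq = "".join(seq.split()).upper()
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical reads; replace these with your own sequences.
records = [
    ("sample1_exon2", "ACGTACGTACGTTTGACCA"),
    ("sample2_exon2", "ACGTACGAACGTTTGACCA"),
]
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

The resulting text can be saved to a file for upload or pasted directly into the query box.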




Step 2. Upload the sequence to the RefSeqGene BLAST page

You can copy/paste your sequences into the query box or use the upload function.

You may upload multiple sequences at the same time; you don't have to process them one by one. If you enter multiple sequences, the BLAST result page will show the details of one at a time (select which to view from the menu labeled A in Figure 1), but if you click on the Graphics link in the overview section (labeled B in Figure 1), you will see all your query sequences aligned.

Step 3. Review the results

The alignments can be displayed using standard functions of NCBI's graphic sequence display. If you are not familiar with this tool, please take a look at the video tutorials we provide on YouTube.

Step 4. Compare any differences to known variation

If there is a mismatch, insertion, or deletion when aligning any of your test sequences to a RefSeqGene, you can compare the location of that variation to any annotation on the RefSeqGene using the following steps:

  1. Open the viewer controls labeled Configure.
  2. Open the Variation section options by clicking on Variation in the column at the left.
  3. Click on all selections, and press Configure.
  4. Mouse over any annotation to learn more about what has been submitted to NCBI's databases about variation at that location.

Interconvert the location of sequence variation in genomic assembly coordinates and RefSeqGene coordinates

NCBI provides a Genome Remapping Service, with a special section dedicated to processing RefSeqGene sequences. There is extensive help documentation associated with the site, which will not be repeated here. In brief, the process is:

  1. Define the coordinate system with which you are beginning, e.g. GRCh37 (hg19).
  2. Define the sequence set to which you want the coordinates mapped, e.g. RefSeqGene.
  3. Define what you want included in the report.
  4. Upload or paste in your data.
  5. Download a report.
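
For the data preparation step, the sketch below shows one way to write variant locations as BED lines before uploading them. It assumes that BED is among the input formats the service accepts and that chromosome names such as "chr17" are recognized; check the service's help documentation for both. The function name `bed_line` and the positions and labels are hypothetical.

```python
def bed_line(chrom, position, length=1, name="."):
    """Format one BED line for a feature given its 1-based start position."""
    if position < 1 or length < 1:
        raise ValueError("position and length must be positive")
    start = position - 1   # BED start coordinates are 0-based
    end = start + length   # BED end coordinates are exclusive
    return f"{chrom}\t{start}\t{end}\t{name}"


# Hypothetical variant calls: (chromosome, 1-based position, length, label).
calls = [
    ("chr17", 41245466, 1, "call_1"),
    ("chr13", 32914438, 3, "call_2"),
]
bed_text = "\n".join(bed_line(*call) for call in calls) + "\n"
print(bed_text, end="")
```

The subtraction of one from the start is the detail most often missed: BED intervals are 0-based and half-open, whereas positions in most variant reports are 1-based.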

Calculate HGVS expressions and get a report of functional consequence for large numbers of variation calls, including those based on the RefSeqGene

NCBI provides Variation Reporter, which processes reports of locations of variation and returns what is known about variation at those locations according to NCBI's latest annotation. The full report (available by download) reports the location of the variation in multiple coordinate systems, including RefSeqGene. It also accepts input by location on a RefSeqGene.
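
To illustrate what an HGVS expression on a RefSeqGene looks like, the sketch below composes a genomic ("g.") single-nucleotide substitution from an accession, a position, and two alleles. It is not how Variation Reporter computes its output; the function name and the accession `NG_000000.1` are hypothetical placeholders, and only simple substitutions are covered.

```python
VALID_BASES = set("ACGT")


def hgvs_genomic_substitution(accession, position, ref, alt):
    """Compose an HGVS genomic substitution such as 'NG_000000.1:g.5000A>G'."""
    ref, alt = ref.upper(), alt.upper()
    if position < 1:
        raise ValueError("HGVS genomic positions are 1-based")
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("ref and alt must each be a single base: A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    return f"{accession}:g.{position}{ref}>{alt}"


# NG_000000.1 is a placeholder, not a real RefSeqGene accession.
expression = hgvs_genomic_substitution("NG_000000.1", 5000, "a", "g")
print(expression)
```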

Evaluate the alignment of RefSeqGene sequences to a version of the reference assembly released by the Genome Reference Consortium (GRC)

RefSeqGenes are aligned weekly to the current patch release of the human assembly provided by the GRC. These alignments are provided to each top-level sequence (one that is not a component of another sequence) and are reported according to the GFF3 standard. The file is written to RefSeqGene's FTP site. Details about the file are provided in the README_SUBMIT.txt file in that directory.
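
Because the alignments follow the GFF3 standard, each record is a line of nine tab-separated columns ending in a list of key=value attributes. The sketch below parses one such line. The example record, its accessions, and its attribute values are hypothetical and are not taken from the actual file, whose exact contents are described in the README.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    record = dict(zip(GFF3_COLUMNS, cols))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds key=value pairs separated by ";", percent-encoded.
    attributes = {}
    for field in record["attributes"].split(";"):
        if field:
            key, _, value = field.partition("=")
            attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical alignment record, for illustration only.
example = ("NC_000017.10\tRefSeq\tmatch\t100\t200\t.\t+\t.\t"
           "ID=aln0;Target=NG_000000.1 1 101 +")
parsed = parse_gff3_line(example)
print(parsed["seqid"], parsed["start"], parsed["end"],
      parsed["attributes"]["Target"])
```

Comment lines (those starting with "#") should be skipped before calling the parser.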

Last updated: 2013-10-01T08:28:37-04:00