
Rattus norvegicus - Norway rat genome data and search tips
Revised January 8, 2013

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Rattus norvegicus (rat) and the search tips specific to that organism. You can also return to the Rattus norvegicus genome view search page. The Map Viewer home page allows you to search the genome data of any organism represented in Map Viewer.

  1. Scope of Data
  2. Available Maps
  3. Constructing queries
  4. Constructing URLs

Scope of Data

The NCBI Map Viewer integrates rat sequence and map data from several sources. The types of maps include sequence, cytogenetic, genetic linkage, and radiation hybrid maps, described below. Maps are integrated with each other as noted in the Show Connections section of the general Map Viewer help document.

Rat Genomic Sequence Data

The sequence maps show data for two genome assemblies available for rat:

  1. Reference assembly (Rnor_5.0) - This assembly, provided by the Baylor College of Medicine Human Genome Sequencing Center and the Rat Genome Sequencing Consortium (RGSC), is derived from the BN/SsNHsdMCW strain.
  2. Celera assembly - Celera Genomics' assembly of whole-genome shotgun (WGS) sequence, released in January 2003. It contains 29 million reads of Brown Norway rat sequence and nearly eight million reads of Sprague-Dawley sequence.

Rat BLAST Databases

The complete set of rat sequence databases available for BLAST searching is shown in the pop-up menu on the rat BLAST page, which includes a link to the database descriptions.

Additional Rat Genome Resources

In addition to the rat data available in the Map Viewer and through BLAST, links to NCBI resources and external sites are available from NCBI's Genome resource.

Available Maps

The maps available for rat include:

Cytogenetic Maps

Ideogram of the G-banding patterns of rat chromosomes, as described by Levan (1974).

Sequence Maps

Assembly

Allows users to view all sequence data available for a given region of the genome and separates the data by assembly.

There are currently two assemblies available for rat:

  • Reference assembly - Rnor_5.0 derived from the BN/SsNHsdMCW strain
  • Celera assembly - derived from a mixture of Brown Norway and Sprague-Dawley strains

By default, all sequence maps available for an organism display the features that have been annotated on the reference assembly.

The assemblies displayed (and any other maps) are controlled using the Maps&Options dialog box. Instructions are provided in the Select One or More Assemblies to Display section of the general Map Viewer help document.

Clone

Shows the alignment of BAC end sequences to the assembled genomic sequence. To be placed, at least 50% of a BAC end had to align to the genome with >96% identity. All hits with the best bit score were kept. For example, if a BAC end sequence aligned to two places on the genome with the same high bit score, both hits are shown.

Colors in the graphic display indicate (1) the quality of the alignment of a BAC end to the assembled genomic sequence (e.g., does the BAC end hit the genome uniquely, and does it contain repetitive sequence?), and (2) the relationship between the two ends of a BAC (e.g., are they in the expected orientation and at the expected distance from each other, are they on different chromosomes, or is a virtual relationship estimated between a sequenced BAC end and its unsequenced mate pair?). The examples below provide more detail.

When the Clone map is displayed as the master map (described in the main Map Viewer help document), the accession numbers of the BAC end sequence records, which were deposited in dbGSS, are displayed.
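
The two placement rules above can be sketched as a filter. The first rule requires that at least 50% of a BAC end align at >96% identity. The second keeps every hit tied for the best bit score. This is a minimal illustration, not NCBI's alignment pipeline. The BacEndHit record, its field names, and the sample hits are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BacEndHit:
    """One alignment of a BAC end to the genome (hypothetical record layout)."""
    bac_end: str             # identifier of the BAC end sequence
    location: str            # placement on the genome, e.g. "chr1:1000"
    aligned_fraction: float  # fraction of the BAC end covered by the alignment (0-1)
    identity: float          # percent identity of the aligned region
    bit_score: float         # alignment bit score


def filter_bac_end_hits(hits: List[BacEndHit]) -> List[BacEndHit]:
    """Apply the two placement rules to a list of candidate hits."""
    # Rule 1: at least 50% of the BAC end aligns with more than 96% identity.
    passing = [h for h in hits if h.aligned_fraction >= 0.5 and h.identity > 96.0]

    # Rule 2: for each BAC end, keep every hit tied for the best bit score.
    best: Dict[str, float] = {}
    for h in passing:
        best[h.bac_end] = max(best.get(h.bac_end, float("-inf")), h.bit_score)
    return [h for h in passing if h.bit_score == best[h.bac_end]]


hits = [
    BacEndHit("endA", "chr1:1000", 0.90, 99.0, 500.0),
    BacEndHit("endA", "chr5:2000", 0.90, 99.0, 500.0),  # ties the best score: kept
    BacEndHit("endA", "chr7:3000", 0.80, 98.0, 420.0),  # lower score: dropped
    BacEndHit("endB", "chr2:500", 0.40, 99.5, 600.0),   # < 50% aligned: dropped
]
print([h.location for h in filter_bac_end_hits(hits)])
# ['chr1:1000', 'chr5:2000']
```

Keeping ties, rather than picking one hit arbitrarily, is what lets a BAC end with two equally good placements appear at both locations.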

Component

Provides the tiling path of GenBank AABRxxxxxxxx and finished BAC accessions used to build the NW_xxxxxx contigs of the reference assembly, or the AAHXxxxxxxxx accessions used to build the NW_xxxxxx contigs of the Celera assembly. The Contig map shows the genomic contigs assembled from these components.

Contig

Shows the chromosomal placement of NW_xxxxxx contigs on the assembled genome sequence. The individual GenBank records used to assemble the contigs are shown on the Component map, described above.

CpG Island

Shows regions of high G + C content on the assembled genome sequence. Two sets of criteria were used to find CpG islands, "strict" and "relaxed", as described below. The algorithm and cutoffs were taken from Takai and Jones (2002).
  • relaxed (shown with light blue shading on the map)
    • 200 bp min length
    • 50% or higher G + C content
    • 0.60 or higher observed CpG / expected CpG
    • post-processing: merge islands that are <= 100 bp apart
  • strict (shown with dark blue shading on the map)
    • 500 bp min length
    • 50% or higher GC content
    • 0.60 or higher observed CpG / expected CpG
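
The per-region tests above can be sketched in a few lines. The sketch assumes the common definition of observed/expected CpG as (number of CpG × length) / (number of C × number of G). It only checks a candidate region and merges nearby islands. It does not reproduce the window-scanning step that locates candidate regions in the first place. The names RELAXED, STRICT, cpg_stats, is_cpg_island, and merge_islands are illustrative. The merge step assumes islands are given as (start, end) pairs and measures the gap as the next start minus the previous end.

```python
from typing import Dict, List, Tuple

# Cutoffs as listed above; the dictionary layout itself is illustrative.
RELAXED = {"min_length": 200, "min_gc": 0.50, "min_obs_exp": 0.60}
STRICT = {"min_length": 500, "min_gc": 0.50, "min_obs_exp": 0.60}


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G + C fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, criteria: Dict[str, float]) -> bool:
    """Test one candidate region against a set of cutoffs."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (len(seq) >= criteria["min_length"]
            and gc_fraction >= criteria["min_gc"]
            and obs_exp >= criteria["min_obs_exp"])


def merge_islands(islands: List[Tuple[int, int]],
                  max_gap: int = 100) -> List[Tuple[int, int]]:
    """Relaxed post-processing: merge (start, end) islands <= max_gap bp apart."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


region = "CG" * 100                   # 200 bp, all G + C
print(is_cpg_island(region, RELAXED))  # True
print(is_cpg_island(region, STRICT))   # False: shorter than 500 bp
print(merge_islands([(0, 300), (350, 600), (800, 1000)]))  # [(0, 600), (800, 1000)]
```
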

Ensembl Genes

Map of annotated genes provided by Ensembl, based on the latest Ensembl release available at the time of the build.

Ensembl Transcripts

Map of annotated transcripts provided by Ensembl, based on the latest Ensembl release available at the time of the build.

GenBank DNA

Shows the placement of rat genomic DNA sequences from GenBank that were not used in the assembly of contigs. Placement is based on the alignment of the sequences to the components of the contigs. It includes rat genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs. If a sequence extends beyond a contig, that portion of sequence is not shown. The 'hits' link leads to a tabular display that shows the matching regions (base spans) of the assembly component and the GenBank genomic DNA record that has been aligned to it.
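
A minimal sketch of the placement filter described above follows. It assumes that "at least 98 base pairs" refers to the length of the aligned region. The function names are hypothetical, and coordinates are treated as inclusive (start, end) pairs.

```python
from typing import Optional, Tuple


def passes_genbank_dna_filter(record_length: int, identity: float,
                              aligned_length: int) -> bool:
    """Record longer than 500 bp, aligned at >= 97% identity over >= 98 bp."""
    return record_length > 500 and identity >= 97.0 and aligned_length >= 98


def clip_to_contig(aln_start: int, aln_end: int,
                   contig_start: int, contig_end: int) -> Optional[Tuple[int, int]]:
    """Drop the part of an alignment that extends beyond the contig."""
    start, end = max(aln_start, contig_start), min(aln_end, contig_end)
    return (start, end) if start <= end else None


print(passes_genbank_dna_filter(1200, 98.5, 400))  # True
print(clip_to_contig(900, 1200, 1, 1000))          # (900, 1000)
```
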

Each line extends from the uppermost to the lowermost point on the genome assembly to which sequence fragments from a single GenBank record were aligned.

When the GenBank_DNA map is displayed as the master map in the default verbose mode, the descriptive text includes several columns:
  • Total Bases - the total number of bases in the GenBank record
  • Aligned Bases - the total number of bases from that record that were aligned to the genome
  • % identity - the percent identity of the alignment
  • % coverage - the percentage of the GenBank record that aligned to the genome
  • Alignment-length ratio - the ratio of the alignment length in the genome to the alignment length in the GenBank record
  • Strain - the strain from which the GenBank record was derived, when available
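
The % coverage and Alignment-length ratio columns can be derived from the other values as sketched below. This sketch uses hypothetical names and made-up numbers and is not the Map Viewer's own code.

```python
def genbank_dna_columns(total_bases: int, aligned_bases: int,
                        genome_aln_length: int, record_aln_length: int) -> dict:
    """Derive the % coverage and alignment-length ratio columns."""
    return {
        # Share of the GenBank record that aligned to the genome, as a percentage.
        "% coverage": 100.0 * aligned_bases / total_bases,
        # Alignment length on the genome divided by alignment length on the record.
        "alignment-length ratio": genome_aln_length / record_aln_length,
    }


cols = genbank_dna_columns(2000, 1500, 1530, 1500)
print(cols)  # {'% coverage': 75.0, 'alignment-length ratio': 1.02}
```

A ratio above 1 means the aligned span on the genome is longer than the aligned span on the record. A ratio below 1 means the reverse.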

Genes_Sequence

Shows genes that have been annotated on the genomic contigs. This includes known and putative genes placed as a result of alignments of mRNAs to the contigs.

If multiple models exist for a single gene, corresponding to splice variants, the Genes map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, and 4, and another uses exons 2, 3, and 4, the Genes map shows exons 1, 2, 3, and 4. (In comparison, the GenBank RNA maps show which combinations of exons are valid, based on mRNA sequences from RefSeq and GenBank.)
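
The flattened view amounts to taking the union of exons across all splice variants. The sketch below assumes each exon is a (start, end) pair. It also assumes that exons with different boundaries stay separate entries. Both the representation and the function name are hypothetical.

```python
from typing import Iterable, List, Tuple

Exon = Tuple[int, int]  # (start, end) on the genomic sequence (assumed representation)


def flatten_exons(variants: Iterable[Iterable[Exon]]) -> List[Exon]:
    """Return every distinct exon used by any splice variant, in genomic order."""
    exons = set()
    for variant in variants:
        exons.update(variant)
    return sorted(exons)


# Variant 1 uses exons 1, 3, 4; variant 2 uses exons 2, 3, 4 (coordinates are made up).
e1, e2, e3, e4 = (100, 200), (300, 400), (500, 600), (700, 800)
print(flatten_exons([[e1, e3, e4], [e2, e3, e4]]))
# [(100, 200), (300, 400), (500, 600), (700, 800)]
```
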

Genes shown to the left of the grey line are transcribed in the minus (-) orientation (from bottom to top), and those to the right in the plus (+) orientation (from top to bottom).

When "Gene_Sequence" is selected as the Master map, the verbose display (detailed labeling, shown by default) includes arrows to the right of each gene name that indicate its direction of transcription as well as links to:

Gene models are shown in five colors, depending on the type of evidence used to construct them. The one- or two-letter code in the evidence column (displayed when Gene_Sequence is the master map) also indicates the type of evidence.

Gene color (evidence code): type of evidence used to construct the gene model
  • Blue (C): Confirmed gene model - model based on alignment of mRNA, or mRNAs plus ESTs, to the genomic sequence (see additional notes, below)
  • Light green (E): EST only - model based on EST evidence only
  • Dark brown (PE): Predicted+EST - model predicted by Gnomon and supported by EST evidence
  • Light brown (P): Predicted only - model predicted by Gnomon
  • Orange (?): Conflict - there is some discrepancy between the mRNA sequence and the gene model (see additional notes, below)
  • No color listed (I): Interim LocusID - model based on alignment of mRNAs, or mRNAs plus ESTs, to the genome, in which the aligning transcripts could not be unambiguously assigned to a preexisting LocusID (see additional notes, below)

  Additional Notes:

In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model.

A gene model is shown in orange if there is some discrepancy between the mRNA sequence and the gene model, either in the alignment of the two or in their protein products, or both. Examples of alignment discrepancies include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of protein discrepancies include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both kinds of discrepancy can be caused by base-pair mismatches between the mRNA and genomic sequence.

Models with Interim LocusIDs (LOC###### or LOC#########, evidence code I) may be paralogs, genes not yet curated, duplications caused by assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what the IDs should be. Interim LocusIDs are always associated with a RefSeq XM_* accession (model mRNA), although supporting alignments may or may not include RefSeq NM_* accessions (known mRNAs). More about RefSeq and RefSeq accessions can be found at the RefSeq homepage.
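
The color rule in the notes above can be summarized as a lookup plus one decision for mRNA-supported models. This is a simplified sketch. It reduces "clean alignment" and "protein products match" to two hypothetical boolean inputs. It omits the I code because the table lists no color for it.

```python
# Color for each evidence code in the table above.
EVIDENCE_COLORS = {
    "C": "blue",          # confirmed by mRNA (plus EST) alignment
    "E": "light green",   # EST evidence only
    "PE": "dark brown",   # Gnomon prediction plus EST evidence
    "P": "light brown",   # Gnomon prediction only
    "?": "orange",        # conflict between mRNA and gene model
}


def mrna_model_code(clean_alignment: bool, protein_products_match: bool) -> str:
    """Code for an mRNA-supported model: 'C' only when the alignment is clean
    AND the annotated and conceptual proteins match exactly, otherwise '?'."""
    return "C" if clean_alignment and protein_products_match else "?"


print(EVIDENCE_COLORS[mrna_model_code(True, True)])   # blue
print(EVIDENCE_COLORS[mrna_model_code(True, False)])  # orange
```
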

Model Transcripts

Shows models generated by Gnomon. Gnomon uses protein alignments in addition to transcript alignments. To capture as much coding information in the genome as possible in this assembly, Gnomon models may represent partial as well as complete coding sequences. Models with a completely supported CDS are blue, models with a partially supported CDS are green, and pure ab initio predictions are brown. Ab initio predictions with e-values <0.0001 are shown in dark brown on the map, and all other ab initio predictions are shown in light brown. Pure ab initio status indicates that the model was built without the support of mRNA or protein alignments. Either the sequence failed to align to the genome, or Gnomon ignored an alignment whose score fell below a predetermined threshold.
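
The coloring rules for Gnomon models can be written as a small function. The labels 'complete', 'partial', and 'ab_initio' are hypothetical names for the three support categories described above. The e-value threshold of 0.0001 comes from the text.

```python
from typing import Optional


def gnomon_model_color(cds_support: str, evalue: Optional[float] = None) -> str:
    """Map a Gnomon model's CDS support to its display color.

    cds_support is one of the hypothetical labels 'complete', 'partial', 'ab_initio'.
    """
    if cds_support == "complete":
        return "blue"
    if cds_support == "partial":
        return "green"
    if cds_support == "ab_initio":
        # Only ab initio predictions with e-value < 0.0001 get the dark shade.
        if evalue is not None and evalue < 1e-4:
            return "dark brown"
        return "light brown"
    raise ValueError(f"unknown CDS support label: {cds_support!r}")


print(gnomon_model_color("ab_initio", 1e-6))  # dark brown
```
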

Phenotype

Map of quantitative trait loci (QTL). QTL data are obtained from the Rat Genome Database (RGD) and are represented by one peak marker and/or two flanking markers. Each QTL is placed on the genome based on the positions of these markers.

RefSeq Transcripts

Shows diagrams of the RNAs that are predicted on the genomic contigs. The RefSeq Transcript map and Gene_Sequence map are built in the same way, using the same types of evidence, described above. However, the Gene map shows a view of all the exons in a gene, while the Transcript map shows the combinations of exons (i.e., splice variants) that are valid, based on mRNA sequences.

Repeats

Shows the positions of repetitive elements. RepeatMasker version open-3.2.6 was run using RepBase Update 20080801 with these flags:
  • -no_is
  • -cutoff 255
  • -frag 20000
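
The flags above can be turned into a command line for masking a sequence with the same options, as sketched below. The input file name is hypothetical, and the command is only built and printed, not run. The sketch assumes that the RepBase release comes from the libraries installed with RepeatMasker rather than from a command-line flag. With RepeatMasker installed, the list could be passed to subprocess.run(cmd, check=True).

```python
import shlex
from typing import List


def repeatmasker_command(fasta_path: str) -> List[str]:
    """Argument list for RepeatMasker with the three flags listed above."""
    return [
        "RepeatMasker",
        "-no_is",            # skip the bacterial insertion element check
        "-cutoff", "255",    # cutoff score for masking repeats
        "-frag", "20000",    # maximum sequence length masked without fragmenting
        fasta_path,
    ]


cmd = repeatmasker_command("rat_chr1.fa")  # hypothetical input file
print(shlex.join(cmd))
# RepeatMasker -no_is -cutoff 255 -frag 20000 rat_chr1.fa
```
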

GenBank RNA

Shows the alignment of RNAs from a given organism to the assembled rat genomic sequence. Only transcripts supplied with orientation are used, and each alignment is the single best placement for that sequence in the current annotation run of the rat genome. The corresponding alignments of mRNA and EST clusters are shown in the UniGene maps, described below.

The RNA maps include:
  • Hs RNA - individual human mRNAs aligned to the assembled rat genome
  • Mm RNA - individual mouse mRNAs and ESTs aligned to the assembled rat genome
  • Rn RNA - individual rat mRNAs and ESTs aligned to the assembled rat genome

STS

Shows the placement of STSs from a variety of sources onto the assembled genomic sequence (the NW_xxxxxx contigs, described above) using Electronic-PCR (e-PCR).

Unigene

Shows the alignment of rat EST clusters to the assembled rat genomic sequence. ESTs are clustered based on shared exon-intron boundaries and alignment to a common position on the genome. The ESTs in one cluster can come from one or more UniGene clusters, whose IDs are noted next to the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the UniGene map and clusters in the UniGene resource.)

Genetic Linkage Maps

FHH x ACI

This rat linkage map, obtained from RGD, was originally described by Steen et al. (1999). The map was constructed using an F2 intercross of FHH x ACI, contains 2083 simple sequence length polymorphism (SSLP) markers, and totals 1527 cM in length.

SHRSP x BN

This rat linkage map, obtained from RGD, was originally described by Steen et al. (1999). The map was constructed using an F2 intercross of SHRSP x BN, contains 3824 SSLP markers, and totals 1477 cM in length.

Radiation Hybrid Maps

MCW RH

This radiation hybrid map, obtained from RGD, was originally described by Kwitek et al. (2004). Using the T55 mapping panel, the map consists of 1265 framework SSLP markers common to the genetic linkage maps and an additional 23,172 EST, gene, and SSLP markers placed relative to the framework markers. Map resolution is estimated at 9 cR3000 per Mb.
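
If the resolution figure is treated as a genome-wide average, one cR3000 corresponds to roughly 111 kb. This is a rough conversion only:

```latex
\frac{9\ \mathrm{cR}_{3000}}{1\ \mathrm{Mb}}
\quad\Longrightarrow\quad
1\ \mathrm{cR}_{3000} \approx \frac{1000\ \mathrm{kb}}{9} \approx 111\ \mathrm{kb}
```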

Constructing queries

Searchable Terms

Map Viewer supports searching on any term that describes an element on any map, including:

  • GenBank or RefSeq accession number
  • gene symbol
  • marker name and alias
  • text words
    e.g., a search for actin will retrieve map objects containing that word in their descriptions. If multiple terms are entered, they will automatically be combined with the 'AND' Boolean operator.

Map Positions

As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:
  1. enter a range of interest in the Region text boxes in the side bar
  2. click on the region of interest in the chromosome thumbnail graphic in the sidebar
  3. click on a region of interest in the enlarged Map View of the chromosome
The following types of map positions can be entered in option 1:
  • symbols - marker names or aliases can be entered to display a region of the chromosome between those mapped elements.
  • numerical positions - it is not necessary to specify units. The Map Viewer will interpret the range in the units of the master map (i.e. bases for sequence maps). Note that for a sequence map, base pair positions may be entered in any of the following formats: 1000000 or 1,000,000 or 1M or 1000K.
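
The four equivalent base-pair formats listed above can be normalized as sketched below. The function name is hypothetical. The sketch handles only whole numbers with an optional K (thousand) or M (million) suffix, and accepting lowercase suffixes is an assumption.

```python
def parse_position(text: str) -> int:
    """Convert '1000000', '1,000,000', '1M' or '1000K' to a base-pair count.

    Only whole-number values are handled; 'K' = 1,000 and 'M' = 1,000,000,
    as implied by the equivalent forms listed above.
    """
    s = text.strip().replace(",", "").upper()
    multipliers = {"K": 1_000, "M": 1_000_000}
    if s and s[-1] in multipliers:
        return int(s[:-1]) * multipliers[s[-1]]
    return int(s)


for form in ["1000000", "1,000,000", "1M", "1000K"]:
    print(form, parse_position(form))  # each form gives 1000000
```
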

It is not necessary to enter a value in both Region text boxes. If you enter a value only in the upper box, the Map Viewer will display the region of the chromosome starting from that point and ending at the lower end of the chromosome. If you enter a value only in the lower box, the Map Viewer will display the region of the chromosome starting at the upper end of the chromosome and ending at the value entered.
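
The defaults for a single filled-in Region box can be sketched as follows. The sketch assumes 1-based positions, so the upper end of the chromosome is position 1. The function name and the chromosome length are hypothetical.

```python
from typing import Optional, Tuple


def resolve_region(upper: Optional[int], lower: Optional[int],
                   chromosome_length: int) -> Tuple[int, int]:
    """Fill in an empty Region box with the matching end of the chromosome."""
    start = upper if upper is not None else 1                # upper box empty: start at the top
    end = lower if lower is not None else chromosome_length  # lower box empty: run to the bottom
    return start, end


CHROM_LENGTH = 250_000_000  # hypothetical chromosome length in bp
print(resolve_region(5_000_000, None, CHROM_LENGTH))  # (5000000, 250000000)
print(resolve_region(None, 2_000_000, CHROM_LENGTH))  # (1, 2000000)
```
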

General Tips

As mentioned in the Searchable terms section, any term entered in the query box will be treated as an independent entity to be joined by the 'AND' Boolean operator. It is also possible to construct more complex queries by using explicit Boolean operators (AND, OR, NOT), field restriction, or limiting retrieval to records that have certain properties.
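
A small helper that joins terms with an explicit Boolean operator is sketched below. The function name is hypothetical. Field restriction is not shown because its syntax is not described here. As stated above, build_query(["actin", "kinase"]) produces a query equivalent to typing actin kinase, because separate terms are combined with AND automatically.

```python
from typing import Iterable


def build_query(terms: Iterable[str], operator: str = "AND") -> str:
    """Join search terms with an explicit Boolean operator."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator!r}")
    cleaned = [t.strip() for t in terms if t.strip()]
    return f" {op} ".join(cleaned)


print(build_query(["actin", "kinase"]))        # actin AND kinase
print(build_query(["actin", "myosin"], "or"))  # actin OR myosin
```
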

The Advanced Search page allows you to use a number of query options by simply checking boxes or radio buttons that represent various search fields, properties, and object types. It also allows you to limit your query to one or more chromosomes. The Advanced Search page is accessible from the header region of the genome view page.

Constructing URLs that link to Map Viewer

If you would like to create WWW links to the Map Viewer, the instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region. For example:

Questions or Comments?
Write to the NCBI Service Desk