
Ornithorhynchus anatinus - duck-billed platypus genome data and search tips June 26, 2007

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Ornithorhynchus anatinus (duck-billed platypus), and the search tips specific to that organism. You can also return to the genome view search page for Ornithorhynchus anatinus. The Map Viewer home page allows you to search the genome data of any organism represented in the Map Viewer.

  1. Scope of Data
  2. Available Maps
  3. Constructing queries
  4. Constructing URLs

Scope of Data

The Map Viewer provides a view of platypus data from sequence maps, described below. Separate documents introduce the information infrastructure NCBI developed to integrate the various types of data generated for the current build, and provide statistics for that build.

Platypus Genomic Sequence Data

NCBI's current duck-billed platypus genome build (Oan1.1) is based on the Ornithorhynchus_anatinus-5.0.1 assembly provided by the Washington University Genome Sequencing Center in March 2007. The genome was derived from a female platypus collected in New South Wales, Australia. The sequencing strategy produced a 6x whole-genome shotgun (WGS) assembly.

The mitochondrial sequence, NC_000891, is based on the work of Janke et al. (1996).

Platypus BLAST Databases

The complete set of platypus sequence databases available for BLAST searching is listed on the Platypus BLAST page, which includes a link to the database descriptions.

Additional Platypus Genome Resources

In addition to the Ornithorhynchus anatinus data available in the Map Viewer and through BLAST, information is available from the Platypus Genome Resources Guide, which links to NCBI and external resources for genomic sequence, maps, and annotation. An explanation of Gnomon gene prediction processing is also available. In addition, the NCBI Handbook includes a series of exercises illustrating further questions that can be answered with the Map Viewer.

Available Maps

Sequence-Based Maps

Ab initio Shows gene prediction models generated by Gnomon. Gnomon uses protein alignments as well as transcript alignments, and, to capture as much of the coding information in this assembly as possible, Gnomon models may represent partial as well as complete coding sequences. Models with a completely supported CDS are blue, models with a partially supported CDS are green, and pure ab initio predictions are brown: those with E-values below 0.0001 are dark brown, and all other ab initio predictions are light brown. Pure ab initio status means that the model was built without the support of mRNA or protein alignments, either because the sequence failed to align to the genome or because Gnomon ignored an alignment whose score fell below a predetermined threshold.
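The color rules above amount to a small decision procedure. The sketch below restates them in code; the `GnomonModel` record and its field names are hypothetical and do not correspond to any NCBI file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GnomonModel:
    # Hypothetical record for illustration; not an NCBI file format.
    cds_support: str                # "complete", "partial", or "none" (pure ab initio)
    evalue: Optional[float] = None  # assumed to be set only for pure ab initio models


def ab_initio_map_color(model: GnomonModel) -> str:
    """Return the display color described for the Ab initio map."""
    if model.cds_support == "complete":
        return "blue"
    if model.cds_support == "partial":
        return "green"
    if model.cds_support != "none":
        raise ValueError(f"unknown support level: {model.cds_support!r}")
    # Pure ab initio: no mRNA or protein alignment supported the model.
    if model.evalue is not None and model.evalue < 1e-4:
        return "dark brown"
    return "light brown"
```

For example, `ab_initio_map_color(GnomonModel("none", 1e-6))` yields `"dark brown"`, while a pure ab initio model with no recorded E-value falls through to `"light brown"`.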

Component The component map provides the tiling path of GenBank accessions used to build each "NT_xxxxxxxxx" contig, and the tiling path of GenBank "AAPN01xxxxxx" accessions from the Ornithorhynchus anatinus whole genome shotgun project (AAPN00000000.1) used to build the NW_xxxxxxxxx WGS contigs, which are described below.
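When working with exported component or contig lists, it can help to sort accessions by the kinds named above. The sketch below uses loose regular expressions built only from the placeholders in this document (NT_xxxxxxxxx, NW_xxxxxxxxx, AAPN01xxxxxx, and the AAPN00000000.1 project record); the exact digit counts for NT_ and NW_ accessions are deliberately not assumed, and the labels are illustrative.

```python
import re
from typing import Optional

# Loose patterns derived from the placeholders in this document.
# An optional ".N" version suffix is accepted on every pattern.
ACCESSION_PATTERNS = [
    (re.compile(r"^NT_\d+(\.\d+)?$"), "NT contig"),
    (re.compile(r"^NW_\d+(\.\d+)?$"), "NW WGS contig"),
    (re.compile(r"^AAPN0{8}(\.\d+)?$"), "WGS project master record"),
    (re.compile(r"^AAPN01\d{6}(\.\d+)?$"), "WGS component"),
]


def classify_accession(accession: str) -> Optional[str]:
    """Return a coarse label for an accession, or None if no pattern matches."""
    for pattern, label in ACCESSION_PATTERNS:
        if pattern.match(accession.strip()):
            return label
    return None
```

Anything that matches none of the patterns, such as an mRNA accession, returns `None` rather than being forced into a category.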

Contig Shows the chromosomal placement of NW_xxxxxxxxx contigs on the assembled genomic sequence. The individual GenBank records used to assemble the contigs are shown on the Component map, described above.

CpG Island Shows regions of high G + C content on the assembled genome sequence. Two sets of criteria were used to find CpG islands, "strict" and "relaxed", described below. The algorithm and cutoffs were taken from Takai and Jones (2002).
  • relaxed (shown with light blue shading on the map)
    • 200 bp min length
    • 50% or higher G + C content
    • 0.60 or higher observed CpG / expected CpG
    • post-processing:  merge islands that are <= 100 bp apart
  • strict (shown with dark blue shading on the map)
    • 500 bp min length
    • 50% or higher G + C content
    • 0.60 or higher observed CpG / expected CpG
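The per-window tests behind both criteria sets can be sketched as follows. The observed/expected ratio uses the common formula (CpG count × length) / (C count × G count); the window scanning that a full island finder performs is not shown, and intervals are assumed to be half-open (start, end) pairs.

```python
from typing import List, Tuple


def gc_and_obs_exp(seq: str) -> Tuple[float, float]:
    """G + C fraction and observed/expected CpG ratio of a sequence window."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc, obs_exp


def passes(seq: str, min_len: int, min_gc: float = 0.50, min_obs_exp: float = 0.60) -> bool:
    """Apply the length, G + C, and observed/expected CpG cutoffs to one window."""
    gc, obs_exp = gc_and_obs_exp(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp


def merge_islands(islands: List[Tuple[int, int]], max_gap: int = 100) -> List[Tuple[int, int]]:
    """Relaxed-set post-processing: merge islands that are at most max_gap bp apart."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Under these assumptions, `passes(window, 200)` applies the relaxed cutoffs and `passes(window, 500)` the strict ones; only the relaxed set is then passed through `merge_islands`.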
GenBank_DNA

Shows the placement of platypus genomic DNA sequences from GenBank that were not used as components in the assembly. The placement is based on the alignment of the sequences to the assembled genomic scaffolds or chromosomes. It includes genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs. If a sequence extends beyond a contig, that portion of sequence is not shown. The 'hits' link leads to a tabular display that shows the matching regions (base spans) of the assembly component and the GenBank genomic DNA record that has been aligned to it.
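The inclusion rule and the clipping at contig ends can be expressed as two small functions. This is a sketch under stated assumptions: the thresholds are quoted from the paragraph above, while the half-open (start, end) contig coordinates and the function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on contig coordinates; an assumption


def passes_placement_filter(record_length: int, identity_pct: float, matched_bp: int) -> bool:
    # Thresholds quoted from the text: records longer than 500 bp with
    # at least 97% identity to the components over at least 98 bp.
    return record_length > 500 and identity_pct >= 97.0 and matched_bp >= 98


def clip_to_contig(fragments: List[Interval], contig_length: int) -> List[Interval]:
    """Drop the parts of aligned fragments that fall outside the contig."""
    clipped = []
    for start, end in fragments:
        s, e = max(start, 0), min(end, contig_length)
        if s < e:  # keep only fragments that still overlap the contig
            clipped.append((s, e))
    return clipped
```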

Orange lines represent unfinished (phase 1 and 2) HTG sequences that have been aligned to the assembled genome. Blue lines represent other genomic DNA records that have been aligned to the assembled genome.

Each line extends from the uppermost to the lowermost point on the genome assembly to which sequence fragments from a single GenBank record were aligned.

Thick parts of a line represent fragments of sequence from a GenBank record that have been aligned to the assembled genomic sequence, and thin parts of a line connect the fragments that come from the same GenBank record.

When the GenBank DNA map is displayed as the master map in the default verbose mode, the descriptive text includes these columns:
  • Total Bases - the total number of bases in the GenBank record
  • Aligned Bases - the total number of bases from that record that were aligned to the genome
  • % identity - the percent identity of the alignment
  • % coverage - the percentage of the GenBank record that aligned to the genome
  • Alignment-length ratio - the ratio of the alignment length in the genome to the alignment length of the GenBank record
  • Strain - the strain from which the GenBank record was derived, when available
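The derived columns follow directly from a few counts. The sketch below assumes percent identity is computed as identical bases over aligned bases, which is one common definition; the page does not state the exact formula, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    # Hypothetical inputs; names are illustrative, not Map Viewer fields.
    total_bases: int       # length of the GenBank record
    aligned_bases: int     # record bases aligned to the genome
    identical_bases: int   # aligned bases identical to the genome (assumed definition)
    genome_span: int       # alignment length on the genome
    record_span: int       # alignment length on the GenBank record

    def percent_identity(self) -> float:
        return 100.0 * self.identical_bases / self.aligned_bases

    def percent_coverage(self) -> float:
        return 100.0 * self.aligned_bases / self.total_bases

    def length_ratio(self) -> float:
        return self.genome_span / self.record_span
```

For instance, `AlignmentSummary(2000, 1500, 1470, 1800, 1500)` gives 98% identity, 75% coverage, and a length ratio of 1.2. A ratio well above 1 means the alignment occupies more genome than record sequence, as happens when gaps interrupt the alignment on the genomic side.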

Genes_Sequence Genes that have been annotated on the genomic contigs, including known and putative genes placed by aligning mRNAs to the contigs. If multiple models exist for a single gene, corresponding to splice variants, the Genes_Sequence map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, and 4, and another uses exons 2, 3, and 4, the Genes_Sequence map shows exons 1, 2, 3, and 4. Genes shown to the left of the grey line are transcribed in the - orientation (from bottom up), and those to the right in the + orientation (from top down).
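The flattened view is the union of the exons used by any variant. A minimal sketch follows; exons may be given as numbers, as in the example above, or as hypothetical (start, end) coordinate pairs, and the strand-to-side mapping simply restates the display convention.

```python
from typing import Iterable, List


def flatten_exons(variants: Iterable[Iterable]) -> List:
    """Union of the exons used by any splice variant, in sorted order."""
    return sorted({exon for variant in variants for exon in variant})


def display_side(strand: str) -> str:
    # '-' strand genes are drawn left of the grey line (bottom up), '+' to the right (top down).
    return {"-": "left", "+": "right"}[strand]
```

With the variants from the text, `flatten_exons([[1, 3, 4], [2, 3, 4]])` returns `[1, 2, 3, 4]`.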

When Genes_Sequence is selected as the master map, the verbose display (detailed labeling, shown by default) includes an arrow to the right of each gene name indicating its direction of transcription, as well as links to:

  • sv - sequence viewer
  • pr - protein
  • dl - view/download sequence data
  • ev - evidence viewer
  • mm - Model Maker
  • hm - HomoloGene

Gene models are shown in five colors, depending on the type of evidence that was used to construct the models.

Color | Description | Sequence identity and CDS effect
Blue | Gene model CDS has very good agreement with the genome | Identical or few mismatches
Orange | Gene model aligns to the genome, but the CDS predicted where it aligns has poor identity with the CDS annotated on the defining mRNA | Similar
Brown | The CDS of the model has poor or no alignment to a CDS on the genome | Poor or none
Grey | Gene model has no CDS | Not applicable
Black | Gene model was provided by an outside source | External

  Additional Notes:

In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model.

A gene model is shown in orange if there is some discrepancy between the mRNA sequence and the gene model, either in the alignment of the two or in their protein products, or both. Examples of the former include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of the latter include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both kinds of discrepancy can be caused by base pair mismatches between the mRNA and the genomic sequence.
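The color table and the notes above can be read as an ordered set of checks. The sketch below encodes one plausible ordering; the precedence (external source first, then missing CDS, then CDS alignment quality) and the boolean inputs are assumptions for illustration, not a documented pipeline rule.

```python
def gene_model_color(external: bool, has_cds: bool, cds_aligns: bool,
                     clean_mrna_alignment: bool, protein_matches: bool) -> str:
    """Assign a display color to a gene model (assumed precedence order)."""
    if external:
        return "black"   # provided by an outside source
    if not has_cds:
        return "grey"    # no CDS
    if not cds_aligns:
        return "brown"   # CDS has poor or no alignment on the genome
    if clean_mrna_alignment and protein_matches:
        return "blue"    # clean mRNA alignment and exact protein match
    return "orange"      # aligns, but with alignment or protein discrepancies
```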

Models with interim LocusIDs (evidence code I) may be paralogs, genes not yet curated, duplications caused by assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what the IDs should be. Interim LocusIDs are always associated with a RefSeq XM_* accession (model mRNA), although the supporting alignments may or may not include RefSeq NM_* accessions (known mRNAs).

RNA Maps

The RNA maps show mRNA and EST sequences from a given organism aligned to the assembled platypus genomic sequence that has been repeat-masked and dusted. Only ESTs supplied with orientation are used. Each alignment is the single best placement for that sequence in the current build of the platypus genome. The map can be queried by sequence accession.
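The selection rule (drop unoriented ESTs, keep one best placement per sequence) can be sketched as below. The `Placement` record and its `score` field are hypothetical; the page does not say how placements are scored, only that the single best one is kept.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Placement:
    # Hypothetical alignment record; field names are illustrative.
    accession: str
    is_est: bool
    orientation: Optional[str]  # "+" or "-" if supplied, else None
    contig: str
    start: int
    score: float                # assumed alignment score, higher is better


def best_placements(hits: Iterable[Placement]) -> Dict[str, Placement]:
    """Keep one best-scoring placement per accession, skipping unoriented ESTs."""
    best: Dict[str, Placement] = {}
    for hit in hits:
        if hit.is_est and hit.orientation is None:
            continue  # only ESTs supplied with orientation are used
        current = best.get(hit.accession)
        if current is None or hit.score > current.score:
            best[hit.accession] = hit  # ties keep the first placement seen
    return best
```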

The RNA maps include:
  • Oan_RNA - alignment of individual Ornithorhynchus anatinus (platypus) mRNAs and ESTs to the assembled genomic sequence.

The RNA map display differs from the Oan_UniGene map: it shows the alignments (thicker lines) and putative introns (thinner lines) of the ESTs and longer mRNAs best placed at each position. Green lines indicate ESTs; blue lines indicate cDNAs. In contrast, the UniGene map is a summary of probable splicing events, with links to the UniGene clusters that contain those sequences.

RefSeq RNA Diagrams of the RNAs that are predicted on the genomic contigs. The RefSeq RNA map and the Genes_Sequence map are built in the same way; however, the Genes_Sequence map shows all the exons of a gene in one flattened view, while the RefSeq RNA map can show the valid combinations of exons (that is, splice variants) when mRNA sequences indicate alternative splicing.

Repeats Position of repetitive elements.

RepeatMasker was used to identify regions of the genome that contain interspersed repeats and low-complexity DNA sequences.


STS Placement of STSs from a variety of sources onto the assembled genomic sequence (the NW_xxxxxx contigs, described above) using Electronic-PCR (e-PCR).
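The core idea of electronic PCR is to find a forward primer site followed, within an expected product size, by the reverse complement of the reverse primer. The sketch below is a simplified illustration of that idea, not the e-PCR program itself: it searches one strand only and requires exact primer matches, whereas a production tool would typically also scan the opposite strand and may tolerate mismatches.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_products(template: str, fwd: str, rev: str,
                  min_size: int, max_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) of exact-match products on the given strand only."""
    t = template.upper()
    f = fwd.upper()
    r_rc = reverse_complement(rev.upper())
    products = []
    start = t.find(f)
    while start != -1:
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(t):
                break
            # The reverse primer site must lie downstream of the forward primer.
            if end - len(r_rc) >= start + len(f) and t[end - len(r_rc):end] == r_rc:
                products.append((start, end))
        start = t.find(f, start + 1)
    return products
```

Products are reported as half-open intervals, so the product length is simply `end - start`.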

UniGene Maps

The UniGene maps show mRNA and EST sequences from a given organism aligned to the assembled genomic sequence that has been repeat-masked and dusted. Only ESTs supplied with orientation are used. Each alignment is the single best placement for that sequence in the current build of the genome. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the UniGene map and clusters in the UniGene resource.)

The display of the UniGene map varies according to the span of sequence being displayed.

For large spans of sequence (greater than 10 million bases), the Map Viewer displays histograms that show the density of ESTs and mRNAs aligned to a region, the UniGene clusters to which they belong, and the number of sequences from each UniGene cluster.

For smaller spans of sequence (that is, higher resolutions showing fewer than 10 million bases), the Map Viewer displays the above information plus blue lines that indicate exon/intron structure:

  • thick blue lines indicate aligned regions (putative exons)
  • thin blue lines indicate connections between aligned regions (putative introns). Regions are connected if they come from a single transcript, or from a set of 'chained' transcripts that share at least one common intron/exon splice junction. (For example, if transcript B shares one intron/exon splice junction with transcript A, and a different splice junction with transcript C, then A, B, and C will be chained together into one transcript.)
  • a light grey bar shades the region that encompasses all the alignments consistent with a given set of evidence (putative mRNA), and therefore indicates the span of a model

Alignments are grouped by common structure. If two or more transcripts share at least one intron/exon splice junction, the alignments of those transcripts are merged into a single model. If two or more transcripts do not share any intron/exon splice junction, they are shown as separate models.
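Chaining transcripts that are linked, directly or through intermediates, by shared splice junctions is a connected-components problem, which a union-find structure handles directly. The sketch below assumes each transcript is described by a set of hashable junction keys, such as hypothetical (strand, position) tuples; a transcript with no shared junction ends up in its own group.

```python
from typing import Dict, Hashable, List, Set


def chain_transcripts(junctions: Dict[str, Set[Hashable]]) -> List[Set[str]]:
    """Group transcripts linked, directly or through others, by shared junctions."""
    parent = {t: t for t in junctions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    owner: Dict[Hashable, str] = {}  # first transcript seen with each junction
    for transcript, js in junctions.items():
        for j in js:
            if j in owner:
                parent[find(transcript)] = find(owner[j])
            else:
                owner[j] = transcript

    groups: Dict[str, Set[str]] = {}
    for transcript in junctions:
        groups.setdefault(find(transcript), set()).add(transcript)
    return sorted(groups.values(), key=sorted)
```

Mirroring the example in the list above, if B shares one junction with A and a different one with C, then A, B, and C form a single group even though A and C share nothing directly.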

The UniGene maps include:
  • Oan_UniGene - Alignment of platypus EST clusters to the assembled platypus genomic sequence.


Constructing queries

Searchable Terms

The Map Viewer supports searching on any term that describes an element on any map, including:
  • GenBank accession number
  • marker name
  • marker alias
    Sometimes two or more marker names refer to the same primer pair, and are therefore considered synonyms or "aliases." In such cases, any one of the terms will retrieve the marker.
  • text words
    For example, a search for actin will retrieve all map objects containing that word in their description. If multiple terms are entered, they will automatically be combined with the 'AND' Boolean operator. Adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as cell AND adhesion and will retrieve records with descriptions that contain cell matrix adhesion as well as cell adhesion.
Search terms can also be truncated at the right end only, using an asterisk (*) as a wild card to represent zero to many characters. See the truncation section of the general Map Viewer Help document for more details.
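The matching behavior described above (implicit AND, no adjacency, right-hand truncation with an asterisk) can be modeled locally, for example to pre-filter a downloaded list of map descriptions. In this sketch, case-insensitive matching and whole-word comparison for untruncated terms are assumptions, not documented Map Viewer behavior.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term: str, words: List[str]) -> bool:
    term = term.lower()
    if term.endswith("*"):  # right truncation: zero or more trailing characters
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words    # whole-word match (assumption)


def query_matches(query: str, description: str) -> bool:
    """All terms must match (implicit AND); word order and adjacency are ignored."""
    words = _words(description)
    return all(term_matches(t, words) for t in query.split())
```

As in the text, `query_matches("cell adhesion", "cell matrix adhesion protein")` is true because adjacency is not required.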

Map Positions

As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:
  1. enter a range of interest in the Region text boxes in the sidebar
  2. click on the region of interest in the chromosome thumbnail graphic in the sidebar
  3. click on a region of interest in the enlarged Map View of the chromosome
The following types of map positions can be entered in option 1:
  • symbols - you can enter marker names or alternate marker names (aliases) to display the region of the chromosome between those mapped elements. Note that both mapped elements must be present on maps that share the same coordinate system for the range search to work properly.

  • numerical positions - can be used if the master map is a genetic map, radiation hybrid map, YAC map, or sequence map. It is not necessary to specify units; the Map Viewer interprets the range in the units of the master map (centiMorgans, centiRays, ordinal units, or bases, respectively). For a sequence map, base pair positions may be entered in several formats: for example, 1000000, 1,000,000, and 1M all denote one million bases, and 100K denotes 100,000 bases.
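The accepted base-pair formats can be normalized with a few lines of code, which is handy when generating positions programmatically. This sketch assumes decimal multipliers (K = 1,000 and M = 1,000,000), accepts either letter case, and handles integers only; decimal inputs such as 1.5M are not covered.

```python
def parse_position(text: str) -> int:
    """Parse a base-pair position such as '1000000', '1,000,000', '1M', or '100K'."""
    s = text.strip().replace(",", "").upper()
    multipliers = {"K": 1_000, "M": 1_000_000}
    if s and s[-1] in multipliers:
        return int(s[:-1]) * multipliers[s[-1]]
    return int(s)  # raises ValueError for anything that is not an integer
```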

It is not necessary to enter a value in both Region text boxes. If you enter a value only in the upper box, the Map Viewer will display the region of the chromosome starting from that point and ending at the lower end of the chromosome. If you enter a value only in the lower box, the Map Viewer will display the region of the chromosome starting at the upper end of the chromosome and ending at the value entered.
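The defaulting rule for the two Region boxes can be sketched as follows. Positions are assumed to be 1-based and inclusive, and rejecting a start that lies beyond the end is a choice made for this sketch; the page does not say how the Map Viewer treats reversed ranges.

```python
from typing import Optional, Tuple


def resolve_region(upper: Optional[int], lower: Optional[int],
                   chrom_length: int) -> Tuple[int, int]:
    """Fill in an empty Region box with the corresponding chromosome end."""
    start = 1 if upper is None else upper               # empty upper box: top of chromosome
    end = chrom_length if lower is None else lower      # empty lower box: bottom of chromosome
    if not 1 <= start <= end <= chrom_length:
        raise ValueError(f"invalid region {start}-{end}")
    return start, end
```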

General Tips

As mentioned in the Searchable Terms section of the Map Viewer Help document, each term entered in the query box is treated as an independent entity, and multiple terms are joined by the 'AND' Boolean operator. More complex queries can be constructed with explicit Boolean operators (AND, OR, NOT), field restrictions, or limits that restrict retrieval to records with certain properties.

The Advanced Search page allows you to use a number of query options by simply checking boxes or radio buttons that represent various search fields, properties, and object types. It also allows you to limit your query to one or more chromosomes. The Advanced Search page is accessible from the header region of the genome view page.

Constructing URLs that link to Map Viewer

To create web links to the Map Viewer, follow the instructions for constructing URLs in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region.