
Arabidopsis thaliana genome data and search tips Revised July 11, 2007

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Arabidopsis thaliana and the search tips specific to that organism. You can also return to the Arabidopsis thaliana genome overview page. Additional online sources of information on Arabidopsis thaliana genomic bioinformatics are MIPS, TAIR, and Rhee, Plant Physiol. (2000) 124:1460-4. The Map Viewer home page allows you to search the genome data of any organism represented in Map Viewer.

  1. Scope of Data
  2. Available Maps
  3. Constructing Queries
  4. Constructing URLs

Scope of Data

Frequency of Updates to Map Viewer Data

The Map Viewer data files are updated as the sequencing consortium provides the data. The data are currently available from the TAIR FTP site and the NCBI FTP site.

Available Maps

Model transcripts

Shows gene prediction models generated by Gnomon.

Gnomon uses protein alignments in addition to transcript alignments. To capture as much coding information in the genome as possible in this assembly, Gnomon models may represent partial as well as complete coding sequences. Models with a completely supported CDS are blue; models with a partially supported CDS are green. Models that have frameshifts and/or premature stops are shown in dark brown, while models based entirely on ab initio prediction are light brown. Pure ab initio status indicates that the model was built without the support of mRNA or protein alignments, either because the sequence failed to align to the genome or because Gnomon ignored an alignment whose score fell below a predetermined threshold.

Clone

Shows the alignment of BAC end sequences to the assembled genomic sequence. During the alignment process, BAC ends are aligned to the genome and the best placement is selected, with the requirement that at least 50% of the BAC end aligns to the genome with >90% identity. If a BAC end sequence has two or more best placements on the genome, each location is used for clone placement. Clones shown in blue have an unambiguous best placement, whereas clones shown in black have multiple possible placements. Clones shown in green have discordant end alignments, and clones shown in orange have multiple placements with discordant end alignments.
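
The placement thresholds and color rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the code NCBI uses; the function names and the way a clone is summarized (a placement count plus a concordance flag) are assumptions made for the example.

```python
MIN_ALIGNED_FRACTION = 0.50  # at least 50% of the BAC end must align
MIN_IDENTITY = 0.90          # identity must be greater than 90%


def end_hit_passes(aligned_fraction: float, identity: float) -> bool:
    """Return True if a BAC end alignment meets the stated thresholds."""
    return aligned_fraction >= MIN_ALIGNED_FRACTION and identity > MIN_IDENTITY


def clone_color(num_placements: int, ends_concordant: bool) -> str:
    """Map a clone's placement count and end concordance to its display color.

    The inputs are a hypothetical summary of a clone, chosen for this sketch.
    """
    if num_placements < 1:
        raise ValueError("a placed clone needs at least one placement")
    multiple = num_placements > 1
    if multiple and not ends_concordant:
        return "orange"  # multiple placements with discordant ends
    if not ends_concordant:
        return "green"   # discordant end alignments
    if multiple:
        return "black"   # multiple possible placements
    return "blue"        # unambiguous best placement
```

For example, `clone_color(1, True)` gives the color used for an unambiguously placed clone.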

When the Clone map is displayed as the Master map, the verbose display provides the clone name (linked to the Clone Registry database) and the BAC end sequence accessions (linked to dbGSS).

Contig

The contig map represents the assembled chromosome. Data provided by TAIR.

CpG Map
This map shows regions of high G+C content in the pseudomolecule sequence. Two sets of criteria are used, "strict" and "relaxed", as described below:
  • relaxed (shown with light blue shading)
    • minimum length of 200 bp
    • G+C content at 50% or greater
    • observed CpG/expected CpG at 0.60 or greater
    • merge islands separated by less than 100 bp
  • strict (shown with dark blue shading)
    • minimum length of 600 bp
    • G+C content at 55% or greater
    • observed CpG/expected CpG at 0.60 or greater
The algorithm and cutoff values were developed from Takai and Jones, 2002. Map units are base pairs.
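
As a rough illustration of the two criteria sets, the sketch below tests a single candidate region against the cutoffs listed above. It assumes the commonly used observed/expected formula, (number of CpG x length) / (number of C x number of G), which this page does not spell out, and it does not implement the scanning or the merging of relaxed islands separated by less than 100 bp. The function names are hypothetical.

```python
from typing import Optional, Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    # Assumed formula: (CpG count * length) / (C count * G count).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def classify_region(seq: str) -> Optional[str]:
    """Return 'strict', 'relaxed', or None using the cutoffs listed above."""
    gc_fraction, obs_exp = cpg_stats(seq)
    n = len(seq)
    if n >= 600 and gc_fraction >= 0.55 and obs_exp >= 0.60:
        return "strict"
    if n >= 200 and gc_fraction >= 0.50 and obs_exp >= 0.60:
        return "relaxed"
    return None
```

A region that passes the strict test also satisfies the relaxed cutoffs, so the sketch checks the strict criteria first.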

GenBank DNA Map
Shows the placement of genomic DNA sequences from GenBank that were not used as components in the assembly. Placement is based on the alignment of the sequences to the assembled genomic scaffolds or chromosomes. It includes Arabidopsis genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs.

Marker

The marker map shows the placement of genetic markers on the assembled Arabidopsis genome. Data provided by TAIR.

RNA Maps

Each "RNA" map is produced by aligning individual mRNAs from a given plant species to the complete Arabidopsis thaliana chromosome sequence. Green lines indicate ESTs and blue lines indicate cDNAs. The corresponding alignment of EST clusters is shown in the UniGene (UniG) maps, described below.

The RNA maps include:

UniGene Maps

The UniGene maps show mRNA and EST sequences from a given organism aligned to the assembled Arabidopsis genomic sequence. Each UniGene cluster contains sequences that represent a unique gene.

The display of a UniGene map varies according to the span of sequence being displayed. For large spans of sequence (greater than 10 million bases), the Map Viewer displays histograms that show the density of ESTs and mRNAs aligned to a region, the UniGene clusters to which they belong, and the number of sequences from each UniGene cluster.

For smaller spans of sequence (i.e., higher resolutions, showing less than 10 million bases), the Map Viewer displays the above information plus blue lines that indicate exon/intron structure:

  • thick blue lines indicate aligned regions (putative exons)
  • thin blue lines indicate connections between aligned regions (putative introns). Regions are connected if they come from a single transcript, or from a set of 'chained' transcripts that share at least one common intron/exon splice junction. (For example, if transcript B shares one intron/exon splice junction with transcript A, and a different splice junction with transcript C, then A, B, and C will be chained together into one transcript.)
  • a light grey bar shades the region that encompasses all the alignments consistent with a given set of evidence (putative mRNA), and therefore indicates the span of a model
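
The chaining rule described above (transcripts that share a splice junction are grouped, and the grouping is transitive) can be sketched with a small union-find structure. This is an illustration under assumptions, not the Map Viewer implementation: each transcript is represented by a hypothetical set of integer junction coordinates, and the function name is made up for the example.

```python
from typing import Dict, List, Set


def chain_transcripts(junctions_by_transcript: Dict[str, Set[int]]) -> List[Set[str]]:
    """Group transcripts that are linked by shared splice junctions."""
    parent = {name: name for name in junctions_by_transcript}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # The first transcript seen with a junction anchors every later one.
    first_seen: Dict[int, str] = {}
    for name, junctions in junctions_by_transcript.items():
        for junction in junctions:
            if junction in first_seen:
                union(first_seen[junction], name)
            else:
                first_seen[junction] = name

    groups: Dict[str, Set[str]] = {}
    for name in junctions_by_transcript:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# B shares one junction with A and a different one with C, so A, B, and C
# end up in one chain; D shares nothing and stays alone.
example = {"A": {200}, "B": {200, 400}, "C": {400}, "D": {950}}
chains = chain_transcripts(example)
```
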
The UniGene maps include:
  • AT UniG - Arabidopsis mRNAs and EST clusters aligned to the assembled Arabidopsis genome. It can be queried by Arabidopsis mRNA accessions and UniGene cluster names.


Gene

Shows the genes that have been annotated on the genome assembly by the sequencing consortium and provided by TAIR.

Genes positioned to the left of the chromosome are transcribed in the - orientation (from the bottom to the top). Those positioned to the right of the chromosome are transcribed in the + orientation (from the top to the bottom). This is also shown by the directional guide placed between the gene name and the links when the verbose display is active.

When this map is selected as the Master map, the verbose display includes arrows indicating direction of transcription as well as links to:

RefSeq mRNA
The data for this map are the nucleic acid component of the RefSeq nuc_prot set. The available links are:
  • ug - UniGene
  • sv - sequence viewer
  • pr - the GenBank accession for the cognate protein of the RefSeq nuc_prot set
  • ev - evidence viewer
  • BLink - BLAST Link
Map units are base pairs.

Repeat Map
Shows the position of repetitive elements in the pseudomolecule, as determined by RepeatMasker run with the following flags:
  • -w flag
  • -no_is
  • -cutoff 255
  • -frag 20000
Map units are base pairs.

STS Map

Placement of STSs from a variety of sources onto the assembled genomic sequence using Electronic-PCR (e-PCR).


Tiling Path Map

The Tiling Path map displays the BAC clones used in the Arabidopsis genome assembly. Data provided to NCBI by TAIR.


Constructing queries

Searchable Terms

Text terms

The current version implements searching of flat files. The viewer supports searching on any text term that may describe an element on any map. These include:

  • symbols
  • alternate symbols
    A search for either the BAC name F25I16.9 or the GenBank accession AC026238 will retrieve the locus named MYB51. All three terms refer to the same genetic entity. The terms are therefore considered synonyms and any term will retrieve the same information for viewing.
  • text words
    e.g., a search for actin will retrieve map objects containing that word in their descriptions.
    If multiple terms are entered, they will automatically be combined with a Boolean AND. Adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as cell AND adhesion and will retrieve records with descriptions that contain cell matrix adhesion as well as cell adhesion. The section on Boolean Operators provides information about additional options.
The specific terms available for each map are:
  • Clone Map
    • clone name, clone GenBank accession number
  • Marker Map
    • marker name, corresponding clone name
  • Gene Map
    • gene name, gene alias, gene product name, gene product description, gene GenBank accession number, GenBank accession for the corresponding clone

Truncation and Wildcards

Search terms can also be truncated at the right end only, using an asterisk (*) as a wild card to represent zero to many characters. See the truncation section of the general Map Viewer Help document for more details.
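
A minimal sketch of right-end truncation, assuming case-insensitive matching (the page does not state how case is handled) and a hypothetical helper name:

```python
def matches_truncated(term: str, candidate: str) -> bool:
    """Match a search term against a candidate, honoring a trailing asterisk.

    The asterisk stands for zero or more characters and is accepted only as
    the last character of the term. Case-insensitive matching is an
    assumption of this sketch.
    """
    t = term.lower()
    c = candidate.lower()
    if "*" in t[:-1]:
        raise ValueError("truncation is supported at the right end only")
    if t.endswith("*"):
        return c.startswith(t[:-1])
    return c == t
```

With this helper, `matches_truncated("MYB*", "MYB51")` and `matches_truncated("MYB*", "MYB")` both succeed, while a term such as `*MYB` is rejected.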

Map Positions or Regions

As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:
  1. enter a range of interest in the Region text boxes in the sidebar
  2. click on the region of interest in the chromosome thumbnail graphic in the sidebar
  3. click on a region of interest in the enlarged Map View of the chromosome

For Arabidopsis thaliana, the following types of map positions can be entered in the Region text boxes noted in option 1:
  • symbols - you can enter gene symbols, marker names, or their alternate names to display the region of the chromosome between those mapped elements. Note that both mapped elements must be present on the master map in order for the range search to work properly.

Query options

Boolean Operators

If multiple terms are entered, they will automatically be combined with a Boolean AND, as mentioned in the Text Terms section above. Adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as "cell AND adhesion" and will retrieve records with descriptions that contain "cell matrix adhesion" as well as "cell adhesion".
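
The effect of the implicit AND can be illustrated with a short sketch. It assumes whole-word, case-insensitive matching against a description string, which is a simplification chosen for the example; the function name is hypothetical.

```python
def matches_all_terms(query: str, description: str) -> bool:
    """Return True if every query term occurs as a word in the description.

    Terms are combined with an implicit AND and their order or adjacency is
    ignored, mirroring the behavior described above.
    """
    terms = query.lower().split()
    words = set(description.lower().split())
    return all(term in words for term in terms)
```

Here `matches_all_terms("cell adhesion", "cell matrix adhesion protein")` succeeds even though the two terms are not adjacent in the description.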

You can use any of the Boolean operators (AND, OR, NOT) in your query. Boolean operators must be written in upper case.

The general syntax for a Boolean Query is:
term[field] BOOLEAN term[field] BOOLEAN term[field]
The available search fields and their corresponding abbreviations (qualifiers) are listed below.

By default, Boolean operators are processed from left to right. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first as a unit and then incorporated into the overall strategy. Additional details about Boolean Operators are provided in the Entrez Help document.
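
To see why parentheses matter, the sketch below evaluates a query left to right over sets of record identifiers, treating a nested list as a parenthesized unit. The index contents, record numbers, and function name are invented for the illustration, and NOT is modeled as "records in the left set that are not in the right set".

```python
OPERATORS = {
    "AND": lambda left, right: left & right,
    "OR": lambda left, right: left | right,
    "NOT": lambda left, right: left - right,
}


def evaluate(expr, index):
    """Evaluate [term, OP, term, OP, ...] strictly left to right.

    A nested list plays the role of parentheses and is evaluated first as a
    unit. Operators must be upper case; anything else raises KeyError.
    """
    if len(expr) % 2 == 0:
        raise ValueError("expected term, then (operator, term) pairs")

    def value(item):
        if isinstance(item, list):
            return evaluate(item, index)
        return index.get(item, set())

    result = value(expr[0])
    for i in range(1, len(expr), 2):
        operator, operand = expr[i], expr[i + 1]
        result = OPERATORS[operator](result, value(operand))
    return result


# Hypothetical index: search term -> set of matching record numbers.
index = {"actin": {1, 2}, "kinase": {2, 3}, "myb": {4}}

flat = evaluate(["actin", "OR", "kinase", "AND", "myb"], index)
grouped = evaluate(["actin", "OR", ["kinase", "AND", "myb"]], index)
print(sorted(flat))     # [] because (actin OR kinase) is ANDed with myb
print(sorted(grouped))  # [1, 2] because kinase AND myb is resolved first
```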

Search fields

If desired, you can restrict the search for a term to a particular field by placing the field qualifier in square brackets [] after the term. It is not necessary to include a space between the search term and the field specifier.

If no field qualifier is used, the system will search all fields.

Terms can be combined with Boolean operators, as described above.

Search field | Description | Qualifier
accession | the nucleotide accession of a GenBank component, or the nucleotide or protein accession of a RefSeq | [accession], [acc], [accn]
chromosome | the chromosome number | [chr]
id | the integer identifier for a particular type of object; useful in combination with type | [id]
map name | the name of the map (The general Map Viewer Help document provides a list of map names. Use the character string in the "URL value" column.) | [map_name], [map]
symbol | the gene symbol or other short name; includes clone names, marker names, and alternate symbols (also referred to as aliases or synonyms; see Text Terms section above for example) | [sym]
title | gene name, symbol, or description | [title], [ti], [titl]
type | type of mapped object; most useful in combination with id. Options are: clone, contig, gene, marker, At_est_cl, Hv_est_cl, Os_est_cl, Ta_est_cl, Zm_est_cl | [obj_type]
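
Putting the field qualifiers and the Boolean syntax together, a query string of the form term[field] BOOLEAN term[field] could be assembled as sketched below. The helper names are hypothetical; only the qualifiers and operators come from this page, and the example values (a symbol and a chromosome number) are illustrative.

```python
KNOWN_QUALIFIERS = {
    "accession", "acc", "accn", "chr", "id", "map_name", "map",
    "sym", "title", "ti", "titl", "obj_type",
}
BOOLEANS = {"AND", "OR", "NOT"}


def fielded(term: str, qualifier: str = "") -> str:
    """Return term[qualifier], or the bare term when no qualifier is given."""
    if not qualifier:
        return term  # no qualifier: all fields are searched
    if qualifier not in KNOWN_QUALIFIERS:
        raise ValueError(f"unknown field qualifier: {qualifier}")
    return f"{term}[{qualifier}]"


def build_query(first: str, *rest) -> str:
    """Join clauses as: clause OPERATOR clause OPERATOR clause ...

    Each item in rest is an (operator, clause) pair; operators must be
    written in upper case.
    """
    parts = [first]
    for operator, clause in rest:
        if operator not in BOOLEANS:
            raise ValueError(f"operator must be one of {sorted(BOOLEANS)}")
        parts.extend([operator, clause])
    return " ".join(parts)


query = build_query(fielded("MYB51", "sym"), ("AND", fielded("1", "chr")))
print(query)  # MYB51[sym] AND 1[chr]
```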


Constructing URLs that link to Map Viewer

If you would like to create WWW links to the Map Viewer, the instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region.

Questions or comments:
Write to NCBI Service Desk