
Gallus gallus - chicken data and search tips

Revised June 15, 2007

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Gallus gallus (chicken) and the search tips specific to that organism. You can also return to the Gallus gallus genome view search page. The Map Viewer home page allows you to search the genome data of any organism represented in Map Viewer.

  1. Scope of Data
  2. Available Maps
  3. Types of mapped objects and maps on which they can be found
  4. Legend
  5. Constructing queries
  6. Constructing URLs


Scope of Data

The Map Viewer provides a view of chicken data from a variety of sources described below.

Chicken Genomic Sequence Data:
whole genome shotgun (WGS) data

The current chicken genome build is an improved assembly produced by the Genome Sequencing Center at the Washington University School of Medicine in St. Louis. The genome was derived from a single chicken, a Red Jungle Fowl (RJF) hen from the inbred line UCD001. The sequencing strategy for the previous assembly produced ~6.6-fold whole-genome shotgun (WGS) coverage using plasmid and fosmid subclones. The improved assembly includes an additional 198,000 reads and marker data, and results from a new PCAP assembly. Finally, some of the underlying WGS contigs have been replaced using 100 RJF clones. The sequence scaffolds were aligned to BAC end sequences using a comprehensive fingerprint-based map developed by Ren et al. (2003) at Texas A&M University and Michigan State University.

The mitochondrial genome presented in this build is not derived from the Red Jungle Fowl hen used for the WGS data but was obtained from an adult White Leghorn.

Chicken BLAST Databases

The complete set of chicken sequence databases available for BLAST searching is shown on the chicken BLAST page, which includes a link to the database descriptions.

Additional Chicken Genome Resources

The Chicken Genome Resources page includes a link to Map Viewer. It also brings together information on a wide variety of chicken-related resources, which highlights the role of the chicken as a model organism and as an agriculturally important animal.

Available Maps

The available maps for Gallus gallus include:

Sequence Maps

Ab initio

Models generated by Gnomon.
mRNA alignments were used to segment the genomic sequence by putative gene boundaries, and Gnomon was executed on these segments to predict genes. Gnomon uses protein alignments in addition to transcript alignments and, in order to capture as much coding information in the genome as possible in this assembly, Gnomon models may represent partial as well as complete CDS.

Those models with e values <0.0001 are indicated as dark brown on the map. Other models are shown in light brown. Please note that this process predicts exons and not all possible mRNAs, so there is only one model per putative gene. The labels on the map are linked to the protein record of the highest scoring match to the model's predicted protein. Note that Gnomon models are also included in the Gene_Sequence map, in regions where confirmed models have not yet been identified.

Assembly

The Assembly map allows users to visualize all of the sequence data available for a given region of the genome.

The assembly map also acts as a filter through which all of the other sequence maps are viewed, allowing you to see the annotations that have been placed on the sequence data.

When viewing the Assembly map, a blue vertical line indicates the assembly that is being viewed. Currently, the reference assembly is shown as the blue line by default.

Any other sequence maps in the window will show annotations that have been placed on sequence data from the assembly shown in blue.

Component

The component map provides the tiling path of GenBank "AADN02xxxxxx" accessions used to build the "NW_xxxxxx" WGS contigs.

Contig

Shows the chromosomal placement of NW_xxxxxx contigs on the assembly of whole genome shotgun (WGS) data.

The individual GenBank records used to assemble the contigs are shown on the Component map, described above.

CpG Island

Shows regions of high G + C content on the assembled genome sequence. Two sets of criteria were used for finding CpG islands: "strict" and "relaxed," described below. The algorithm (and cutoffs) were taken from Takai and Jones, 2002.
  • relaxed (shown with light blue shading on the map)
    • 200 bp min length
    • 50% or higher G + C content
    • 0.60 or higher observed CpG / expected CpG
    • post-processing: merge islands that are <= 100 bp apart
  • strict (shown with dark blue shading on the map)
    • 500 bp min length
    • 50% or higher G + C content
    • 0.60 or higher observed CpG / expected CpG
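
The two sets of thresholds above can be expressed as a small check. The following Python sketch is illustrative only: it is not the Map Viewer or the Takai and Jones implementation, and the names `CpGCriteria`, `meets`, and `merge_close` are hypothetical. It assumes the commonly used definition of the ratio, observed CpG / expected CpG = (number of CG dinucleotides × length) / (number of C × number of G), and it tests a candidate region that has already been chosen rather than reproducing the window search used to find islands. `merge_close` assumes half-open (start, end) coordinates and measures the gap as the next start minus the previous end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGCriteria:
    min_length: int          # minimum island length in bp
    min_gc_fraction: float   # minimum (G + C) / length
    min_obs_exp: float       # minimum observed CpG / expected CpG


# Thresholds copied from the two lists above.
RELAXED = CpGCriteria(min_length=200, min_gc_fraction=0.50, min_obs_exp=0.60)
STRICT = CpGCriteria(min_length=500, min_gc_fraction=0.50, min_obs_exp=0.60)


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def obs_exp_cpg(seq: str) -> float:
    """Observed/expected CpG, assuming expected = (#C * #G) / length."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def meets(seq: str, criteria: CpGCriteria) -> bool:
    """True if the candidate region passes all three thresholds."""
    return (len(seq) >= criteria.min_length
            and gc_fraction(seq) >= criteria.min_gc_fraction
            and obs_exp_cpg(seq) >= criteria.min_obs_exp)


def merge_close(islands, max_gap=100):
    """Relaxed post-processing: merge islands that are <= max_gap bp apart."""
    merged = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, a 200 bp region would be tested with `meets(region, RELAXED)`; it can never pass `STRICT`, because it is shorter than 500 bp.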

Ensembl Genes

Alignment of genes annotated on the genomic contigs by Ensembl.

Ensembl Transcripts

Alignment of individual transcripts to the assembled genomic sequence by Ensembl.

GenBank_DNA

Shows the placement of chicken genomic DNA sequences from GenBank on the assembly of whole genome shotgun (WGS) data. Placement is based on the alignment of the sequences to the components (AADN02xxxxxx) of the contigs. It includes chicken genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs. If a sequence extends beyond a contig, that portion of the sequence is not shown. The 'hits' link leads to a tabular display that shows the matching regions (base spans) of the assembly component and the GenBank genomic DNA record that has been aligned to it.

The length of a line represents the upper and lower-most points on the genome assembly to which sequence fragments from a single GenBank record were aligned.
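
As a rough illustration of the inclusion rule and of how the drawn line is derived, the sketch below encodes one reading of the criteria: the record is longer than 500 bp, and at least one aligned piece has 97% or higher identity over 98 or more bases. The `Hit` structure and both function names are hypothetical and are not part of Map Viewer; coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    start: int        # first aligned base on the assembly
    end: int          # last aligned base on the assembly
    identity: float   # percent identity of this aligned piece


def qualifies(record_length: int, hits: list) -> bool:
    """One reading of the rule: record > 500 bp with a piece >= 97% over >= 98 bp."""
    if record_length <= 500:
        return False
    return any(h.identity >= 97.0 and (h.end - h.start + 1) >= 98 for h in hits)


def drawn_span(hits: list, contig_start: int, contig_end: int) -> Optional[tuple]:
    """Lowest and highest aligned positions, clipped to the contig boundaries."""
    if not hits:
        return None
    low = max(min(h.start for h in hits), contig_start)
    high = min(max(h.end for h in hits), contig_end)
    return (low, high) if low <= high else None
```

The clipping step mirrors the statement that any portion of a sequence extending beyond a contig is not shown.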

When the GenBank_DNA map is displayed as the master map, in the default verbose mode, the descriptive text includes a bases column, which shows the total number of bases in the GenBank record that was aligned to the genome, and a status column, which shows the total number of bases from that record that were aligned to the genome, how many separate pieces of sequence from that record were aligned, and whether those pieces were shuffled to make the alignment.

Genes_Sequence

Genes that have been annotated on the genomic contigs. This includes known and putative genes placed as a result of alignments of mRNAs to the contigs.

If multiple models exist for a single gene, corresponding to splicing variants, the Gene_Sequence map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, 4, and another splice variant uses exons 2, 3, 4, the Gene_Sequence map shows exons 1, 2, 3, 4. (In comparison, the RefSeq Transcript map shows what combinations of exons are valid based on mRNA sequences from RefSeq and GenBank.)
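
The flattened view in the example above amounts to taking the union of the exons used by every splice variant. A minimal Python sketch follows; the function name `flatten_exons` and the coordinates are hypothetical, exons are assumed to be (start, end) pairs with inclusive coordinates, and merging of overlapping exons is an assumption about how the flattening might be done rather than a description of the NCBI pipeline.

```python
def flatten_exons(variants):
    """Union of all exon intervals across splice variants, merged where they overlap."""
    intervals = sorted({exon for variant in variants for exon in variant})
    flat = []
    for start, end in intervals:
        if flat and start <= flat[-1][1]:
            # Overlaps the previous exon: extend it instead of adding a new one.
            flat[-1] = (flat[-1][0], max(flat[-1][1], end))
        else:
            flat.append((start, end))
    return flat


# Hypothetical coordinates for exons 1-4 of the example in the text.
exon1, exon2, exon3, exon4 = (100, 200), (300, 400), (500, 600), (700, 800)
variant_a = [exon1, exon3, exon4]
variant_b = [exon2, exon3, exon4]
flattened = flatten_exons([variant_a, variant_b])
```

Here `flattened` contains all four exons in genomic order, which corresponds to the "exons 1, 2, 3, 4" view described for the Gene_Sequence map.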

Genes shown on the left of the grey line are transcribed in the - orientation (from bottom up), and those on the right in the + orientation (from top down).

When Gene_Sequence is selected as the Master map, the verbose display (detailed labeling, shown by default) includes arrows to the right of each gene name that indicate its direction of transcription, as well as links to related resources. Additional information about these links is provided below, under Links to Related Resources.

Gene models are shown in five colors, depending on the type of evidence that was used to construct the models. The one- or two-letter code shown in the evidence column (displayed when Gene_Sequence is the master map) also indicates the type of evidence.

 
Gene Color | Evidence Code | Type of evidence used to construct gene model
Blue | C | Confirmed gene model - model based on alignment of mRNA, or mRNAs plus ESTs, to the genomic sequence (see additional notes, below)
Light Green | E | EST only - model based on EST evidence only
Dark Brown | PE | Predicted+EST - model predicted by GenomeScan and EST evidence
Light Brown | P | Predicted only - model predicted by GenomeScan
Orange | ? | Conflict - there is some discrepancy between the mRNA sequence and the gene model (see additional notes, below)
  | I | Interim LocusID - model based on alignment of mRNAs, or mRNAs plus ESTs, to the genome, in which the aligning transcripts could not be unambiguously assigned to a preexisting LocusID (see additional notes, below)

  Additional Notes:

In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model.

A gene model is shown in orange if there is some discrepancy between the mRNA sequence and the gene model, either in the alignment of the two and/or in their protein products. Examples of the former can include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of the latter can include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both of those can be caused by base pair mismatches between the mRNA and genomic sequence.

Models with Interim LocusIDs (evidence code I) may be paralogs, genes not yet curated, duplications because of assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what they should be. Interim LocusIDs are always associated with RefSeq XM_* accessions (model mRNAs), although supporting alignments may (or may not) include RefSeq NM_* accessions (known mRNAs). More about RefSeq and RefSeq accessions can be found at the RefSeq homepage.

Hs_RNA

Alignment of individual human RNAs to the assembled genomic sequence. The corresponding alignment of EST clusters is shown in the Hs_UniGene map, described below.

Gga_RNA

Alignment of individual chicken RNAs to the assembled genomic sequence. The corresponding alignment of EST clusters is shown in the Gga_UniGene map, described below.

Tgu_RNA

Alignment of individual zebra finch RefSeqs and mRNAs to the assembled genomic sequence. The corresponding alignment of EST clusters is shown in the Tgu_UniGene map, described below.

Ave_RNA

Alignment of individual bird mRNAs, excluding chicken and zebra finch mRNAs, to the assembled genomic sequence.

Gga_UniGene

Alignment of chicken EST clusters to the assembled genomic sequence. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the Gga_UniGene map and clusters in the UniGene resource.)

Hs_UniGene

Alignment of human EST clusters to the assembled genomic sequence. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the Hs_UniGene map and clusters in the UniGene resource.)

Tgu_UniGene

Alignment of zebra finch EST clusters to the assembled genomic sequence. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the Tgu_UniGene map and clusters in the UniGene resource.)

STS

Placement of STSs from a variety of sources onto the assembled genomic sequence (the NW_xxxxxx contigs, described above) using Electronic-PCR (e-PCR). Marker data for chicken is the result of a collaborative effort directed by Martien Groenen at the Wageningen University and Research Centre in the Netherlands. Also represented on this map are markers for turkey, Meleagris gallopavo, developed by Kent Reed and colleagues at the University of Minnesota.

RefSeq Transcripts

Diagrams of the RNAs that are predicted on the genomic contigs. The Transcript map and Gene_Sequence map are built in the same way, using the same types of evidence, described above. The Gene_Sequence map, however, shows a view of all the exons in a gene, while the Transcript map shows the combinations of exons (i.e., splice variants) that are valid, based on mRNA sequences.

Genetic Linkage Maps

Wageningen University and Research Centre

The chicken linkage map data displayed in this viewer was obtained from Martien Groenen at the Wageningen University and Research Centre in the Netherlands, who has directed a collaborative effort to collect this data for chicken.

University of Minnesota

The turkey linkage map was obtained from Kent Reed at the University of Minnesota. The UMN map represents a collection of linkage data published by several groups in addition to the University of Minnesota, including the Roslin Institute (see Burt et al., 2003), Tuskegee University (see Huang et al., 1999), and researchers from Purdue University and Virginia Tech (see Latch et al., 2002).

Types of objects and maps on which they can be found

Clones

Components of Sequence Assembly

GenBank Accessions

Genes

Polymorphisms

  • Sequence maps

STSs

Legend

Verbose Mode

By default, the master map at the right side of the display is shown in verbose mode, which provides descriptive information (as available) for each object on the master map.

Orientation

Object Location | Symbol | Meaning
Plus strand |  | Genes shown to the right of the grey line are transcribed in the + orientation (from top down); contigs with a + orientation are read from top down
Minus strand |  | Genes shown to the left of the grey line are transcribed in the - orientation (from bottom up); contigs with a - orientation are read from bottom up
Unknown | ? | The orientation of the map element is unknown.

Links to Related Resources

Each map element displayed in your search results will be associated with a number of links (when available) that lead to additional information. The links include:

Linked Text | Link Action | Description

Map element | Map View | The results of a search list the map elements that contain your search term. Those elements can be present in one or more maps. Following the link for a particular map element leads to a graphical view of the chromosomal region that contains the element.

sv | Sequence Viewer | Graphically shows the position of the map element within the sequence region. The display includes a graphic depiction of the coding region (CDS), RNA, and gene features that have been annotated on that sequence region. A 2 kb section of sequence is shown below that, with corresponding graphic annotations of the features. The left and right arrows at either end of the sequence data allow you to move upstream and downstream.

pr | Protein | Links to the corresponding protein sequence record in the Entrez Protein database.

dl | Download Sequence | Opens a form that allows you to download a region of a chromosome. The form has two parts: (1) the top part allows you to enter chromosome coordinates in text boxes, and (2) the bottom part displays the NT_* contigs (or portions of them) that are found in that chromosome region.

Note that part 1 shows the position (base span) of the region on the chromosome, and part 2 shows the position of the region on the contig. The "strand" column for each contig shows whether that contig is on the plus or minus strand of the chromosome. Therefore, if a contig is on the minus strand, increasing the value of the 3' chromosome coordinate will decrease the value of the 5' contig coordinate.

The options to "Display, Save to Disk, and View Evidence" allow you to view the individual contigs in the region (or portions of them, depending on the chromosome region specified).

By default, the dl link beside each gene displays the chromosome and contig coordinates for the span of that gene. To view or save additional sequence data upstream and downstream of the gene, adjust the chromosome coordinates and press the "Change Region" button. Note that the contig coordinates will also change.

ev | Evidence Viewer | Graphical display of the biological evidence supporting a particular gene model. It displays all RefSeq models, GenBank mRNAs, annotated known or potential transcripts, and ESTs that align to the genomic sequence region of interest.

mm | Model Maker | Allows you to view the evidence that was used to build a gene model on assembled genomic sequence, and to create your own version of the model by selecting exons of interest. Model Maker is accessible from sequence maps that were analyzed at NCBI and displayed in Map Viewer.

hm | HomoloGene | A resource of curated and calculated orthologs for genes as represented by UniGene or by annotation of genomic sequences.
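
The relationship between chromosome and contig coordinates described for the dl (Download Sequence) link can be illustrated with a short Python sketch. It is not the code behind the download form: the `Placement` structure, the function name, and the example numbers are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    chrom_start: int   # chromosome position of the contig's lowest placed base
    chrom_end: int     # chromosome position of the contig's highest placed base
    strand: str        # "+" or "-": orientation of the contig on the chromosome


def chrom_to_contig(region_start: int, region_end: int, placement: Placement):
    """Convert a chromosome base span to the matching span on the contig."""
    if not (placement.chrom_start <= region_start <= region_end <= placement.chrom_end):
        raise ValueError("region must lie within the placed contig")
    if placement.strand == "+":
        offset = placement.chrom_start - 1
        return (region_start - offset, region_end - offset)
    if placement.strand == "-":
        # The contig runs in the opposite direction, so the span is mirrored.
        return (placement.chrom_end - region_end + 1,
                placement.chrom_end - region_start + 1)
    raise ValueError("strand must be '+' or '-'")


minus_contig = Placement(chrom_start=1001, chrom_end=2000, strand="-")
narrow = chrom_to_contig(1100, 1200, minus_contig)   # (801, 901)
wider = chrom_to_contig(1100, 1300, minus_contig)    # (701, 901)
```

Extending the upper chromosome coordinate from 1200 to 1300 lowers the first contig coordinate from 801 to 701, which is the minus-strand behavior noted in the dl description.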

STS Maps Legend

Colored dots indicate uniqueness of STS positions

The rightmost edge of the verbose display includes columns of colored dots that indicate which maps have data for each marker. The color of the dots indicates whether an STS has been mapped to a unique position on that map:

green dot: marker has been mapped to only one location on the chromosome being displayed
green dot with black slash: marker has been mapped to multiple locations on the chromosome being displayed
green and yellow dot: marker has been mapped to the chromosome being displayed, and also to another chromosome
yellow dot: marker has been mapped to one location, but on a different chromosome from the one being displayed
yellow dot with black slash: marker has been mapped to multiple locations on a different chromosome from the one being displayed

For example, if you are viewing chromosome 2, a yellow dot indicates that the map named in the column header has placed that marker in a single location on another chromosome.
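
The legend above can be read as a decision rule over the placements that a given map reports for a marker. The Python sketch below is one hedged interpretation, not Map Viewer code: the function name `dot_for` is hypothetical, each placement is represented only by its chromosome name, and the sketch assumes that any mix of placements on and off the displayed chromosome yields the green and yellow dot.

```python
def dot_for(placements, displayed):
    """Return the legend entry for a marker on one map.

    placements: one chromosome name per mapped location on that map.
    displayed:  the chromosome currently shown in the viewer.
    """
    here = sum(1 for chrom in placements if chrom == displayed)
    elsewhere = len(placements) - here
    if here and elsewhere:
        return "green and yellow dot"
    if here == 1:
        return "green dot"
    if here > 1:
        return "green dot with black slash"
    if elsewhere == 1:
        return "yellow dot"
    if elsewhere > 1:
        return "yellow dot with black slash"
    return None  # the map has no data for this marker
```

With chromosome 2 displayed, `dot_for(["5"], "2")` corresponds to the yellow dot case in the example above.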

Polymorphism Column

The polymorphism column indicates whether the marker has been used to detect a polymorphism, with Y for yes and N for no.

Detailed Marker Information

To see detailed mapping information about a marker, follow the link for that marker to its UniSTS record.

Constructing queries

Searchable Terms

Text terms

The Gallus gallus data are searchable with the following types of terms:

  • gene symbol
  • gene name
  • marker name
  • marker alias
    Sometimes two or more marker names refer to the same primer pair, and are therefore considered synonyms or "aliases." In such cases, any one of the terms will retrieve the marker.
  • text words
    e.g., a search for actin will retrieve map objects containing that word in their descriptions.
    If multiple terms are entered, they will automatically be combined with a Boolean AND. Adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as cell AND adhesion and will retrieve records with descriptions that contain cell matrix adhesion as well as cell adhesion. The section on Boolean Operators provides information about additional options.

The system will retrieve mapped objects containing the search terms in their descriptions.

Truncation

Search terms can also be truncated at the right end only, using an asterisk (*) as a wild card to represent zero to many characters. See the truncation section of the general Map Viewer Help document for more details.
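
The right-end truncation rule can be pictured with a small matcher. This Python sketch only illustrates the rule as stated here and is not the Map Viewer search engine: the function name `matches` is hypothetical, case-insensitive comparison is an assumption, and the sketch rejects an asterisk anywhere other than the right end.

```python
def matches(term: str, word: str) -> bool:
    """True if `word` satisfies `term`, where a trailing * stands for zero or more characters."""
    if "*" in term[:-1]:
        raise ValueError("truncation is only supported at the right end of a term")
    term, word = term.lower(), word.lower()
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return term == word
```

Under these assumptions `matches("act*", "actin")` and `matches("act*", "act")` are both true, while `matches("act", "actin")` is false.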

Map Positions or Regions

As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:
  1. enter a range of interest in the Region text boxes in the sidebar
  2. click on the region of interest in the chromosome thumbnail graphic in the sidebar
  3. click on a region of interest in the enlarged Map View of the chromosome

For chicken, the following types of map positions can be entered in the Region text boxes noted in option 1:
  • gene symbols
  • base pairs

Query options

As mentioned in the Searchable terms section, any term entered in the query box will be treated as an independent entity to be joined by the 'AND' Boolean operator. It is also possible to construct more complex queries by using explicit Boolean operators (AND, OR, NOT), field restriction, or limiting retrieval to records that have certain properties. Finally, an Advanced Search page allows you to use a number of query options simply by checking boxes and/or radio buttons.

Boolean Operators

If multiple terms are entered, they will automatically be combined with a Boolean AND, as mentioned in the Text Terms section above. Adjacency searches are not supported at present. For example, a query entered as "cell adhesion" will be processed as cell AND adhesion and will retrieve records with descriptions that contain cell matrix adhesion as well as cell adhesion.

You can choose to use any Boolean operators (AND, OR, NOT) in your query. Boolean operators must be written in upper case.

The general syntax for a Boolean Query is:

term[field] BOOLEAN term[field] BOOLEAN term[field]

The available search fields and their corresponding abbreviations (qualifiers) are listed below.

By default, Boolean operators are processed from left to right. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first as a unit and then incorporated into the overall strategy.
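
To make the processing order concrete, the following Python sketch evaluates a query against the set of words in one description. It is a toy model under stated assumptions, not the Entrez query parser: the function names are hypothetical, NOT is treated as a binary operator (left NOT right), field qualifiers and truncation are not handled, malformed queries are not checked, and a lower-case "and" is simply treated as a search term.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query):
    # Parentheses become separate tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)


def evaluate(query, words):
    """Return True if the set of lower-case `words` satisfies `query`."""
    tokens = tokenize(query)
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()   # parenthesized terms are processed as a unit
            pos += 1               # skip the closing ")"
            return value
        return tok.lower() in words

    def expression():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] in OPERATORS:
                op = tokens[pos]
                pos += 1
            else:
                op = "AND"         # adjacent terms are joined by an implicit AND
            right = operand()
            if op == "AND":
                value = value and right
            elif op == "OR":
                value = value or right
            else:                  # NOT: keep the left side, exclude the right
                value = value and not right
        return value

    return expression()
```

For a description containing only the word actin, `evaluate("actin OR cell AND kinase", {"actin"})` is false because the query is read left to right as (actin OR cell) AND kinase, whereas `evaluate("actin OR (cell AND kinase)", {"actin"})` is true.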

Search fields

If desired, you can restrict the search for a term to a particular field by placing the field qualifier in square brackets [] after the term. It is not necessary to include a space between the search term and the field specifier.

If no field qualifier is used, the system will search all fields.

Terms can be combined with Boolean operators, as described above.

The Advanced Search page also provides the ability to restrict your search to specific fields, and to limit retrieval to mapped objects that have desired properties.

Search field | Description | Qualifier
accession | the nucleotide accession of a GenBank component or the nucleotide or protein accessions for RefSeqs | [accession], [acc], [accn]
chromosome | the chromosome number | [chr]
id | the integer identifier for a particular type of object; useful in combination with type | [id]
map name | the name of the map (The general Map Viewer Help document provides a list of map names. Use the character string in the "URL value" column.) | [map_name], [map]
symbol | the gene symbol or other short name; includes clone names, marker names, and alternate symbols (also referred to as aliases or synonyms; see Text Terms section above for example) | [sym]
title | gene name, symbol, or description | [title], [ti], [titl]
type | type of mapped object; most useful in combination with id. Options are: component, contig, locus, sts | [obj_type]
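
A field-restricted query is just a string in the term[field] form shown earlier, so it can be assembled programmatically. The helper names in this Python sketch are hypothetical, the qualifier chosen for each field is one of the abbreviations from the table above, and the example terms are invented for illustration rather than taken from real chicken data.

```python
# One qualifier per search field, taken from the table above.
QUALIFIERS = {
    "accession": "acc",
    "chromosome": "chr",
    "id": "id",
    "map name": "map",
    "symbol": "sym",
    "title": "title",
    "type": "obj_type",
}


def qualify(term, field=None):
    """Append the field qualifier in square brackets; no qualifier searches all fields."""
    if field is None:
        return term
    return f"{term}[{QUALIFIERS[field]}]"


def build_query(*clauses, operator="AND"):
    """Join clauses with an upper-case Boolean operator."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT in upper case")
    return f" {operator} ".join(clauses)


query = build_query(qualify("sts", "type"), qualify("2", "chromosome"))
```

Here `query` is the string `sts[obj_type] AND 2[chr]`, which would restrict a search to STS objects on chromosome 2.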


Advanced Search Page

The Advanced Search page allows you to use a number of query options by simply checking boxes or radio buttons that represent various search fields, properties, and object types. It also allows you to limit your query to one or more chromosomes.

The Advanced Search page is accessible from the header region of the genome view page.

Constructing URLs that link to Map Viewer

If you would like to create WWW links to the Map Viewer, the instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region.

Questions or Comments?
Write to the NCBI Service Desk