Bos taurus genome data and search tips

Revised January 10, 2012

The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Bos taurus and the search tips specific to that organism. Once you are familiar with them, you can return to the Bos taurus genome overview page or visit the Map Viewer home page, where you can search the genome data of any organism represented in the Map Viewer.

Scope of Data

Available Maps
- Sequence Maps
- Genetic Map
- Radiation Hybrid Map

Constructing Queries

Constructing URLs

Scope of Data

The Map Viewer provides a view of cow data from a variety of sources described below.

Cow Genomic Sequence Data

The current cow genome build, 6.1, includes two full genome assemblies. In this build, the default view was changed to UMD_3.1.

The reference assembly, UMD_3.1, was produced by the University of Maryland using the genomic sequence generated by Baylor College of Medicine.

Build 6.1 also includes an alternate assembly, Btau_4.6.1, produced by the Human Genome Sequencing Center at Baylor College of Medicine. The genome was derived from a female and a male of the Hereford breed. The sequencing strategy produced a 7-fold mixed assembly that combines whole-genome shotgun (WGS) sequence and BAC sequence.

The mitochondrial genome presented in build 6.1, NC_006853, is not derived from the Herefords used for the WGS/BAC data but was obtained from a Korean native cow.

BLAST of Cow Genomic Sequence

The complete set of cow sequence databases available for BLAST searching is shown on the cow BLAST page, which includes a link to the database descriptions.

Additional Cow Genome Resources

In addition to the Bos taurus data available in the Map Viewer and through BLAST, links to NCBI resources and external sites are available from Genome.

Available Maps

The available maps for Bos taurus include:

Sequence Maps

Assembly
The Assembly map allows users to visualize all of the sequence data available for a given region of the genome, and separates the data by assembly.

Data are currently available for the reference assembly, UMD_3.1, and for the alternate assembly Btau_4.6.1.

When viewing the Assembly map, a blue vertical line indicates the assembly that is being viewed. The reference assembly is shown as the blue line by default.

The orange vertical lines show regions of the genome where sequence data from other assemblies are available.

The Maps&Options dialog box allows you to change the assembly being displayed. To do that, select the desired assembly in the Assembly menu, select Assembly from the list of maps on the right, press the Change Assembly button, and then click OK. When the display is refreshed, the line color for your selected assembly will change from orange to blue, and it will move to the right side of the Assembly map. (Conversely, any other assemblies available for the chromosome region will now be shown as vertical orange lines on the left side of the map.)

Clone

Shows the alignment of BAC end sequences to the assembled genomic sequence. The placement method is documented here.

Component

The component map provides the tiling path of GenBank "DAAA02xxxxxx" or "AAFC03xxxxxx" accessions used to build the "NW_xxxxxx" WGS contigs.

Contig

Shows the chromosomal placement of NW_xxxxxx contigs on the assembly of whole genome shotgun (WGS) data.

CpG Island

Shows regions of high G + C content on the assembled genome sequence. Two sets of criteria were used for finding CpG islands: "strict" and "relaxed," described below. The algorithm (and cutoffs) were taken from Takai and Jones, 2002.

relaxed (shown with light blue shading on the map)

200 bp min length
50% or higher G + C content
0.60 or higher observed CpG / expected CpG
post-processing: merge islands that are <= 100 bp apart

strict (shown with dark blue shading on the map)

500 bp min length
50% or higher G + C content
0.60 or higher observed CpG / expected CpG
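
The two sets of criteria above can be expressed as a simple check on a candidate window. This is a minimal sketch, not the pipeline used for the map: the observed/expected CpG ratio is assumed to follow the commonly used definition (CpG count × window length / (C count × G count)), which this page does not spell out, and all function and constant names are hypothetical. The post-processing merge step for relaxed islands is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    min_length: int         # minimum island length in bp
    min_gc_fraction: float  # minimum G + C content, as a fraction
    min_obs_exp: float      # minimum observed CpG / expected CpG


# Cutoffs copied from the lists above.
RELAXED = Criteria(min_length=200, min_gc_fraction=0.50, min_obs_exp=0.60)
STRICT = Criteria(min_length=500, min_gc_fraction=0.50, min_obs_exp=0.60)


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def obs_exp_cpg(seq: str) -> float:
    """Observed/expected CpG ratio (assumed definition, see lead-in)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def meets(seq: str, criteria: Criteria) -> bool:
    """True if seq satisfies the length, G + C, and CpG ratio cutoffs."""
    return (
        len(seq) >= criteria.min_length
        and gc_fraction(seq) >= criteria.min_gc_fraction
        and obs_exp_cpg(seq) >= criteria.min_obs_exp
    )
```

With these definitions, a 300 bp window that passes the composition cutoffs would qualify as relaxed but not strict, because only the minimum length differs between the two lists as given here.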

GenBank_DNA

Shows the placement of cow genomic DNA sequences from GenBank on the assembly of whole genome shotgun (WGS) data. Placement is based on the alignment of the sequences to the components (DAAAxxxxxxxx or AAFCxxxxxxxx) of the contigs. It includes cow genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs. If a sequence extends beyond a contig, that portion of sequence is not shown. The 'hits' link leads to a tabular display that shows the matching regions (base spans) of the assembly component and the GenBank genomic DNA record that has been aligned to it.

Each line spans the uppermost and lowermost points on the genome assembly to which sequence fragments from a single GenBank record were aligned.

When the GenBank_DNA map is displayed as the master map, in the default verbose mode, the descriptive text includes several columns: Total Bases, which shows the total number of bases in the GenBank record; Aligned Bases, which shows the total number of bases from that record that were aligned to the genome; % identity for the alignment; % coverage, which shows how much of the GenBank record aligned to the genome as a percentage; Alignment-length ratio, which is the ratio of the alignment length in the genome to the alignment length of the GenBank record; and Breed, the breed from which the GenBank record was derived, when available.
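
As a rough illustration of the inclusion thresholds and the % coverage column described above, the sketch below applies them to plain numbers. The function and parameter names are hypothetical, and reading "at least 97% identity ... for at least 98 base pairs" as two independent cutoffs on the alignment is an assumption, not a statement of how the map was actually built.

```python
def is_placed(total_bases: int, aligned_bases: int, percent_identity: float) -> bool:
    """Apply the stated cutoffs: record longer than 500 bp,
    at least 97% identity, over at least 98 aligned base pairs."""
    return (
        total_bases > 500
        and percent_identity >= 97.0
        and aligned_bases >= 98
    )


def percent_coverage(total_bases: int, aligned_bases: int) -> float:
    """Share of the GenBank record that aligned to the genome, as a percentage."""
    return 100.0 * aligned_bases / total_bases
```

For example, a 1,200-base record with 600 aligned bases would have 50% coverage under this reading.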

Gene

Shows genes that have been annotated on the genomic contigs. This includes known and putative genes placed as a result of alignments of mRNAs to the contigs.

If multiple models exist for a single gene, corresponding to splicing variants, the Gene map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, 4, and another splice variant uses exons 2, 3, 4, the Gene map shows exons 1, 2, 3, 4. (In comparison, the RefSeq RNA map shows what combinations of exons are valid based on mRNA sequences from RefSeq.)
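
The flattening described above amounts to taking the union of the exons used by each splice variant. The sketch below works on exon labels, as in the example, rather than on genomic coordinates; the function name is hypothetical.

```python
def flatten_exons(variants):
    """Return the sorted union of exon labels across all splice variants."""
    return sorted(set().union(*variants))
```

Calling `flatten_exons([[1, 3, 4], [2, 3, 4]])` yields `[1, 2, 3, 4]`, matching the flattened Gene map view.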

Genes shown on the left of the grey line are transcribed in the - orientation (from bottom up), and those on the right in the + orientation (from top down).

When Gene is selected as the Master map, the verbose display (detailed labeling, shown by default) includes arrows to the right of each gene name that indicate its direction of transcription, as well as links to:

sv - sequence viewer
pr - protein
dl - view/download sequence data from a chromosome region

Additional information about these links is also provided in the Entrez Map Viewer Help Document, under Links to Related Resources.

Gene models are shown in five colors, depending on the type of evidence that was used to construct the models. The one- or two-letter code shown in the evidence column (displayed when Gene is the master map) also indicates the type of evidence.

| Gene Color | Evidence Code | Type of evidence used to construct gene model |
|---|---|---|
| Blue | C | Confirmed gene model - model based on alignment of mRNA, or mRNAs plus ESTs, to the genomic sequence (see additional notes, below) |
| Light Green | E | EST only - model based on EST evidence only |
| Dark Brown | PE | Predicted+EST - model predicted by Gnomon and EST evidence |
| Light Brown | P | Predicted only - model predicted by Gnomon |
| Orange | ? | Conflict - there is some discrepancy between the mRNA sequence and the gene model (see additional notes, below) |
| | I | Interim GeneID - model based on alignment of mRNAs, or mRNAs plus ESTs, to the genome, in which the aligning transcripts could not be unambiguously assigned to a preexisting GeneID (see additional notes, below) |

Additional Notes:

In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model.

A gene model is shown in orange if there is some discrepancy between the mRNA sequence and the gene model, either in the alignment of the two and/or in their protein products. Examples of the former can include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of the latter can include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both of those can be caused by base pair mismatches between the mRNA and genomic sequence.

Models with Interim GeneIDs (evidence code I) may be paralogs, genes not yet curated, duplications because of assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what they should be. Interim GeneIDs are always associated with RefSeq XM_* accessions (model mRNAs), although supporting alignments may (or may not) include RefSeq NM_* accessions (known mRNAs). More about RefSeq and RefSeq accessions can be found at the RefSeq homepage.

Model Transcripts
Models generated by Gnomon. mRNA alignments were used to segment the genomic sequence by putative gene boundaries, and Gnomon was executed on these segments to predict genes. Gnomon uses protein alignments in addition to transcript alignments and, in order to capture as much coding information in the genome as possible in this assembly, Gnomon models may represent partial as well as complete coding sequences. Models built using alignments are blue, the models with frameshifts or premature stops are green, and the pure ab initio predictions are brown.

RefSeq Transcripts

Diagrams of the RNAs that are predicted on the genomic contigs. The RefSeq Transcripts map and Gene map are built in the same way, using the same types of evidence, described above. The Gene map, however, shows a view of all the exons in a gene, while the RefSeq Transcripts map shows the combinations of exons (i.e., splice variants) that are valid, based on mRNA sequences.

Bt_RNA

Alignment of individual cow mRNAs and ESTs to the assembled bovine genomic sequence.

Hs_RNA

Alignment of individual human mRNAs to the assembled bovine genomic sequence.

Oar_RNA

Alignment of individual sheep mRNAs and ESTs to the assembled bovine genomic sequence.

Ssc_RNA

Alignment of individual pig mRNAs and ESTs to the assembled bovine genomic sequence.

Repeats

Position of repetitive elements

The following version of RepeatMasker was executed:
RepeatMasker version open-3.1.3, sensitive mode, run with blastp version 2.0MP-WashU, RepBase Update 20060120, RM database version 20060120
using these flags:

-wublast
-s
-cutoff 255
-species "Bos taurus"
-frag 20000
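
To show how the flags listed above fit together on one command line, the sketch below assembles them into an argument list. The input file name is hypothetical, and the exact invocation used for this build is not given on this page, so treat this only as an illustration of the listed flags.

```python
import shlex


def repeatmasker_command(fasta_path: str) -> list:
    """Build an argument list from the flags listed above.

    fasta_path is a hypothetical input file; the page does not name one.
    """
    return [
        "RepeatMasker",
        "-wublast",
        "-s",
        "-cutoff", "255",
        "-species", "Bos taurus",
        "-frag", "20000",
        fasta_path,
    ]


def as_shell_string(args: list) -> str:
    """Quote the argument list for display or for pasting into a shell."""
    return shlex.join(args)
```

Passing the species name as its own list element avoids the quoting problems that arise when the command is written as a single string.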

STS

Placement of STSs from a variety of sources onto the assembled genomic sequence (the NW_xxxxxx contigs, described above) using Electronic-PCR (e-PCR).

Bt_UniG

Alignment of cow EST clusters to the assembled bovine genomic sequence. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the Bt_UniG map and clusters in the UniGene resource.)

Genetic Map

MARC map

The MARC genetic marker map was generated by placing each marker at its position on the chromosome that contains it, using data from the USDA Meat Animal Research Center (MARC) database.

Radiation Hybrid Map

ILTX RH map

This is the third-generation Illinois-Texas 5,000-rad RH map. It contains 3,484 ordered markers, of which 3,204 are anchored in the human genome. The map was created using a 5,000-rad cattle-hamster RH panel by Womack et al. The details of the third-generation whole-genome and comparative maps are described by Everts-van der Wind et al. The map is the result of a collaboration between the University of Illinois at Urbana-Champaign (H. A. Lewin, PI) and Texas A&M University (J. E. Womack, PI), funded by the USDA National Research Initiative.

Constructing Queries

Searchable Terms

The Map Viewer supports searching on any term that describes an element on any map, including:

symbols
A search for symbol PNLIP will retrieve the locus named pancreatic lipase. Sometimes two or more symbols refer to the same locus and are considered synonyms or aliases. In this case, either term will retrieve the same information for viewing.
GenBank accessions
For example, a search for accession AB113380 will retrieve the map representing the chromosome to which this sequence aligns.
markers
For example, a search for USP11 will retrieve the chromosome X map containing this STS marker. If a marker alias exists (e.g., L03387), either term will retrieve the marker.
text terms
For example, a search for actin will retrieve all map objects containing that word in their description. If multiple terms are entered, they will automatically be combined with the 'AND' Boolean operator.

Map Positions

As noted in the Search By Position section of the Entrez Map Viewer Help Document, there are three main ways to search by map position from the Map View of a chromosome:

1. enter a range of interest in the Region text boxes on the left sidebar
2. click on the region of interest in the chromosome thumbnail graphic in the sidebar
3. click on a region of interest in the enlarged Map View of the chromosome

Allowable Values

For Bos taurus, the following types of map positions can be entered in the left sidebar text boxes noted in option 1:

symbols - you can enter gene symbols, marker names, or alternate symbols or marker names to display a region of the chromosome between those mapped elements. Note that both mapped elements must be present on the maps that share the same coordinate system in order for the range search to work properly.
numerical positions - can be used if the master map is a genetic map, radiation hybrid map, YAC map, or sequence map. It is not necessary to specify units. The Map Viewer will interpret the range in the units of the master map (centiMorgans, centiRays, ordinal units, or bases, respectively).

It is not necessary to enter a value in both Region text boxes. If you enter a value only in the upper box, the Map Viewer will display the region of the chromosome starting from that point and ending at the lower end of the chromosome. If you enter a value only in the lower box, the Map Viewer will display the region of the chromosome starting at the upper end of the chromosome and ending at the value entered.
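
The behavior of the two Region text boxes with numerical positions can be sketched as follows. This is only an illustration of the rules stated above: the function name, the use of None for an empty box, and the default start coordinate of 0 are assumptions, and symbol-based ranges are not handled.

```python
# Units the Map Viewer applies to a numerical range, by master map type.
UNITS_BY_MAP_TYPE = {
    "genetic": "centiMorgans",
    "radiation hybrid": "centiRays",
    "YAC": "ordinal units",
    "sequence": "bases",
}


def resolve_region(upper, lower, chromosome_end, chromosome_start=0):
    """Return the (start, end) range displayed for the two Region boxes.

    upper and lower are the values typed in the upper and lower boxes,
    or None when a box is left empty.
    """
    start = chromosome_start if upper is None else upper
    end = chromosome_end if lower is None else lower
    return start, end
```

For instance, `resolve_region(1000, None, 5000)` gives `(1000, 5000)`: only the upper box is filled, so the view runs from that point to the lower end of the chromosome.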

General Tips

As mentioned in the Searchable Terms section of the Entrez Map Viewer Help Document, any term entered in the query box will be treated as an independent entity to be joined by the 'AND' Boolean operator. It is also possible to construct more complex queries by using explicit Boolean operators (AND, OR, NOT), field restriction, or limiting retrieval to records that have certain properties.
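
The implicit 'AND' behavior described above can be made visible with a small helper that rewrites a query into its explicit form. This is a sketch of the stated rule only, not of the Map Viewer's query parser: the function name is hypothetical, and treating only uppercase AND, OR, and NOT as operators is an assumption.

```python
OPERATORS = {"AND", "OR", "NOT"}


def make_explicit(query: str) -> str:
    """Insert AND between adjacent terms that have no operator between them."""
    out = []
    for token in query.split():
        # Two search terms side by side are joined by an implicit AND.
        if out and token not in OPERATORS and out[-1] not in OPERATORS:
            out.append("AND")
        out.append(token)
    return " ".join(out)
```

Under this rule, `actin myosin` becomes `actin AND myosin`, while a query that already uses an explicit operator, such as `BM7233 OR BM1905`, is left unchanged.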

The Advanced Search page allows you to use a number of query options by simply checking boxes or radio buttons that represent various search fields, properties, and object types. It also allows you to limit your query to one or more chromosomes. The Advanced Search page is accessible from the header region of the genome view page.

Constructing URLs that link to Map Viewer

If you would like to create WWW links to the Map Viewer, the instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region. For example:

Find loci that are myosin-related.
https://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9913&query=myosin
Find two STS markers, BM7233 and BM1905.
https://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9913&query=BM7233+OR+BM1905&qchr=&advsrch=off&neighb=off
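
The example URLs above can also be assembled programmatically. The sketch below uses only the base URL and parameter names that appear in those examples; the function name is hypothetical, and no parameters beyond the ones shown are assumed to exist.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/mapview/map_search.cgi"
COW_TAXID = 9913  # taxid value used in the example URLs above


def search_url(query: str, **extra) -> str:
    """Build a Map Viewer search URL for Bos taurus.

    extra holds additional parameters copied from the examples,
    such as qchr, advsrch, and neighb.
    """
    params = {"taxid": COW_TAXID, "query": query, **extra}
    return f"{BASE_URL}?{urlencode(params)}"
```

`search_url("myosin")` reproduces the first example, and `search_url("BM7233 OR BM1905", qchr="", advsrch="off", neighb="off")` reproduces the second; `urlencode` takes care of encoding the spaces around OR as `+`.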
