| Sus scrofa - pig genome data and search tips | Revised July 14, 2008 |
|
The Map Viewer help document describes how to use the Map Viewer software. This page describes the data available for Sus scrofa (pig) and the search tips specific to that organism. The Map Viewer home page allows you to search the genome data of any organism represented in the Map Viewer. |
|
|
| Pig Genomic Sequence Data |
|
|
The current pig genome build (1.1) is based on the assembly produced by the Wellcome Trust Sanger Institute. The genome was derived from a female of the Duroc breed. The sequencing strategy is BAC-based and is proceeding chromosome by chromosome; build 1.1 includes assemblies for chromosomes 1, 4, 5, 7, 11, 13, 14, 15, 17 and X. The mitochondrial genome presented in build 1.1, NC_000845, is not derived from the Duroc used for the nuclear genome assembly but was obtained from a Landrace pig. |
| Pig BLAST Databases |
|
|
The complete set of pig sequence databases available for BLAST searching is shown on the pig BLAST page, which includes a link to the database descriptions. |
| Additional Pig Genome Resources |
|
| In addition to the Sus scrofa data available in the Map Viewer and through BLAST, links to NCBI resources and external sites are available from the Pig Genome Resource Guide. |
|
|
|
The available maps for Sus scrofa include: |
| Sequence Maps |
|
| Ab initio |
Models generated by Gnomon. mRNA alignments were used to segment the genomic sequence by putative gene boundaries, and Gnomon was executed on these segments to predict genes. Gnomon uses protein alignments in addition to transcript alignments and, in order to capture as much coding information in the genome as possible in this assembly, Gnomon models may represent partial as well as complete coding sequences. Models built using alignments are blue, the models with frameshifts or premature stops are green, and the pure ab initio predictions are brown. |
| Component | The component map provides the tiling path of BAC accessions from GenBank used to build the "NW_xxxxxx" contigs. |
| Contig | Shows the chromosomal placement of NW_xxxxxx contigs on the assembly. |
| CpG Island | Shows regions of high G + C content on the assembled genome sequence. Two sets of criteria were used for finding CpG islands: "strict" and "relaxed," described below. The algorithm (and cutoffs) were taken from Takai and Jones, 2002.
|
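The two quantities usually evaluated when screening a window for a CpG island can be sketched as follows. This is a minimal illustration, not the code used to build the map: the function names are hypothetical, the observed/expected formula is the commonly used one (CpG count times window length, divided by the product of the C and G counts), and the cutoffs are left as parameters because the "strict" and "relaxed" values are not reproduced here.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Assumed formula: observed CpG count over the count expected by chance
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def passes_criteria(seq, min_length, min_gc, min_obs_exp):
    """Check one window against caller-supplied cutoffs (placeholder values)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return (len(seq) >= min_length
            and gc_fraction >= min_gc
            and obs_exp >= min_obs_exp)
```

A real scan would slide such a window along the contig and merge adjacent passing windows; that step is omitted here.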
| GenBank_DNA | Shows the placement of pig genomic DNA sequences from GenBank on the assembly. Placement is based on the alignment of the sequences to the BAC components of the contigs. It includes pig genomic sequences longer than 500 bp that have at least 97% identity to the components for at least 98 base pairs. If a sequence extends beyond a contig, that portion of the sequence is not shown. The 'hits' link leads to a tabular display that shows the matching regions (base spans) of the assembly component and the GenBank genomic DNA record that has been aligned to it. The length of a line represents the span between the uppermost and lowermost points on the genome assembly to which sequence fragments from a single GenBank record were aligned. When the GenBank_DNA map is displayed as the master map, in the default verbose mode, the descriptive text includes several columns: Total Bases, which shows the total number of bases in the GenBank record; Aligned Bases, which shows the total number of bases from that record that were aligned to the genome; % identity for the alignment; % coverage, which shows how much of the GenBank record aligned to the genome as a percentage; and Alignment-length ratio, which is the ratio of the alignment length in the genome to the alignment length of the GenBank record. |
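The inclusion thresholds and the derived columns described for the GenBank_DNA map can be illustrated with a short sketch. The function and parameter names are hypothetical, and the sketch assumes that the "alignment length of the GenBank record" equals the Aligned Bases value; the actual pipeline may compute it differently.

```python
def qualifies(record_length, percent_identity, aligned_length):
    """Apply the stated thresholds: longer than 500 bp, and at least
    97% identity over at least 98 base pairs."""
    return (record_length > 500
            and percent_identity >= 97.0
            and aligned_length >= 98)


def alignment_summary(total_bases, aligned_bases, genome_alignment_length):
    """Return (% coverage, alignment-length ratio) for one GenBank record.

    Assumption: the record-side alignment length is taken to be aligned_bases.
    """
    percent_coverage = 100.0 * aligned_bases / total_bases
    length_ratio = genome_alignment_length / aligned_bases
    return percent_coverage, length_ratio
```

For example, a 2000-base record with 1000 aligned bases spanning 1500 bases of the genome would give 50% coverage and a ratio of 1.5 under these assumptions.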
| Genes_Sequence | Genes that have been annotated on the genomic contigs. This includes known and putative genes placed as a result of alignments of mRNAs to the contigs. If multiple models exist for a single gene, corresponding to splicing variants, the Genes_Sequence map presents a flattened view of all the exons that can be spliced together in various ways. For example, if one splice variant uses exons 1, 3, 4, and another splice variant uses exons 2, 3, 4, the Genes_Sequence map shows exons 1, 2, 3, 4. (In comparison, the RefSeq RNA map shows what combinations of exons are valid based on mRNA sequences.) Genes shown on the left of the grey line are transcribed in the - orientation (from bottom up), and those on the right in the + orientation (from top down). When Genes_Sequence is selected as the master map, the verbose display (detailed labeling, shown by default) includes an arrow to the right of each gene name that indicates its direction of transcription, as well as links to:
Additional information about these links is also provided in the Entrez Map Viewer Help Document, under Links to Related Resources. Gene models are shown in five colors, depending on the type of evidence that was used to construct the models. The one- or two-letter code shown in the evidence column (which is displayed when Genes_Sequence is the master map) also indicates the type of evidence. |
|
|
Additional Notes: In general, a gene model is shown in blue if there is a clean alignment between a RefSeq or GenBank mRNA sequence and the genomic sequence, and if there is an exact match between the protein product that was annotated in the mRNA sequence record and the conceptual translation of the genomic sequence gene model. A gene model is shown in orange if there is some discrepancy between the mRNA sequence and the gene model, in the alignment of the two, in their protein products, or both. Examples of the former include gaps, or the alignment of an mRNA to two or more genomic regions. Examples of the latter include differences between the amino acid sequence given in an mRNA sequence record and the conceptual translation of the corresponding gene model, or premature termination of a coding region in the genomic sequence. Both of those can be caused by base pair mismatches between the mRNA and genomic sequence. Models with Interim LocusIDs (evidence code I) may be paralogs, genes not yet curated, duplications caused by assembly errors, or pseudogenes. The genome assembly and annotation pipeline assigns interim IDs when there is no unambiguous solution to what they should be. Interim LocusIDs are always associated with RefSeq XM_* accessions (model mRNAs), although supporting alignments may (or may not) include RefSeq NM_* accessions (known mRNAs). More about RefSeq and RefSeq accessions can be found at the RefSeq homepage. |
| Phenotype | Shows the placement of quantitative trait loci (QTLs) on the assembled pig genome sequence. Each phenotype is placed by the position of the marker or markers by which it was mapped. |
| RefSeq RNA | Diagrams of the RNAs that are predicted on the genomic contigs. The RefSeq RNA map and Genes_Sequence map are built in the same way, using the same types of evidence, described above. The Genes_Sequence map, however, shows a view of all the exons in a gene, while the RefSeq RNA map shows the combinations of exons (i.e., splice variants) that are valid, based on mRNA sequences. |
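The difference between the flattened Genes_Sequence view and the per-variant RefSeq RNA view can be sketched as a union of exon intervals. This is an illustration only: the function name and the coordinates are made up, and merging overlapping exons into one interval is an assumption about how a flattened view might be drawn.

```python
def flatten_exons(variants):
    """Merge the exons of all splice variants into one sorted, flattened list.

    variants: list of splice variants, each a list of (start, end) exon tuples.
    Overlapping exons are merged into a single interval (an assumption).
    """
    intervals = sorted({exon for variant in variants for exon in variant})
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates for exons 1-4 of a single gene
exon1, exon2, exon3, exon4 = (100, 200), (300, 400), (500, 600), (800, 900)
variant_a = [exon1, exon3, exon4]   # uses exons 1, 3, 4
variant_b = [exon2, exon3, exon4]   # uses exons 2, 3, 4
flattened = flatten_exons([variant_a, variant_b])
```

Here `flattened` contains all four exons, matching the Genes_Sequence view, while `variant_a` and `variant_b` on their own correspond to what the RefSeq RNA map would show.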
| Repeats | Position of repetitive elements
The following version of RepeatMasker was executed:
|
| Bt_RNA | Alignment of individual cow mRNAs and ESTs to the assembled pig genomic sequence. |
| Eca_RNA | Alignment of individual horse mRNAs and ESTs to the assembled pig genomic sequence. |
| Hs_RNA | Alignment of individual human mRNAs to the assembled pig genomic sequence. |
| Oar_RNA | Alignment of individual sheep mRNAs and ESTs to the assembled pig genomic sequence. |
| Ssc_RNA | Alignment of individual pig mRNAs and ESTs to the assembled pig genomic sequence. |
| STS | Placement of STSs from a variety of sources onto the assembled genomic sequence (the NW_xxxxxx contigs, described above) using Electronic-PCR (e-PCR). |
| Ssc_UniG | Alignment of pig EST clusters to the assembled pig genomic sequence. ESTs are clustered based on shared introns and alignment to a common position on the genome. Those ESTs can come from one or more UniGene clusters, whose IDs are noted by the EST cluster. (UniGene clusters are made with a different build procedure, so there is not necessarily a one-to-one correspondence between EST clusters on the Ssc_UniGene map and clusters in the UniGene resource.) |
| Genetic Linkage Maps |
|
| MARC | The swine mapping population consisted of eight litters produced by mating two white composite crossbred boars to eight Chinese (Meishan, Minzhu or Fengjing) or Duroc x white composite crossbred sows. Two sires, eight dams and 94 progeny were genotyped. No grandparents were genotyped. The map includes 1266 markers; the total map distance is 2480 cM. The map data were obtained from MARC, the Meat Animal Research Center (Rohrer et al., 1996). |
|
|
| Searchable Terms |
|
The Sus scrofa data are searchable with the following types of terms:
The system will retrieve mapped objects containing the search terms in their descriptions. |
| Map Positions |
|
As noted in the Search By Position section of the Entrez Map Viewer general help document, there are three main ways to search by map position from the Map View of a chromosome:
It is not necessary to enter a value in both Region text boxes. If you enter a value only in the upper box, the Map Viewer will display the region of the chromosome starting from that point and ending at the lower end of the chromosome. If you enter a value only in the lower box, the Map Viewer will display the region of the chromosome starting at the upper end of the chromosome and ending at the value entered. |
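The behavior of the two Region text boxes when one is left empty can be summarized in a few lines. This is a sketch of the rule as described, not the Map Viewer's own code: the function name is hypothetical, and it assumes the upper end of the chromosome is position 0 in whatever units the map uses.

```python
def resolve_region(upper, lower, chromosome_length):
    """Return the (start, end) region displayed for the given box values.

    upper, lower: values typed into the upper and lower Region boxes,
    or None when a box is left empty.
    """
    start = upper if upper is not None else 0                # assumed origin
    end = lower if lower is not None else chromosome_length  # lower end
    return start, end
```

With only the upper box filled in, the region runs to the lower end of the chromosome; with only the lower box filled in, it starts at the upper end.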
| General Tips |
|
|
As mentioned in the Searchable Terms section of the Entrez Map Viewer Help Document, any term entered in the query box will be treated as an independent entity to be joined by the 'AND' Boolean operator. It is also possible to construct more complex queries by using explicit Boolean operators (AND, OR, NOT), field restriction, or limiting retrieval to records that have certain properties. The Advanced Search page allows you to use a number of query options by simply checking boxes or radio buttons that represent various search fields, properties, and object types. It also allows you to limit your query to one or more chromosomes. The Advanced Search page is accessible from the header region of the genome view page. |
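The implicit 'AND' behavior can be pictured with a small sketch that makes the operator explicit. It is an illustration of the rule described above under simple assumptions (whitespace-separated terms, no quoted phrases or field restrictions); the function name is hypothetical and this is not the Map Viewer's actual query parser.

```python
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def make_and_explicit(query):
    """Insert 'AND' between adjacent terms that have no explicit operator."""
    result = []
    for token in query.split():
        previous_is_term = bool(result) and result[-1].upper() not in BOOLEAN_OPERATORS
        current_is_term = token.upper() not in BOOLEAN_OPERATORS
        if previous_is_term and current_is_term:
            result.append("AND")
        result.append(token)
    return " ".join(result)
```

Under these assumptions, a two-term query is read as the first term AND the second, while a query that already contains OR or NOT between its terms is left unchanged.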
|
|
If you would like to create WWW links to the Map Viewer, the instructions for constructing URLs are given in the general Map Viewer Help document. You can construct URLs that either perform a search or display a specific mapped object or chromosomal region. For example:
|
|
Questions or Comments? Write to the NCBI Service Desk |