Release notes are available for builds:  24 25 26 27 28 29 30 31 32 33 34 35 36 37
Related Documentation:  Assembly & Annotation Information

   Release notes

Annotation Release 104 (November 2012)

Assembly

This annotation release includes three full assemblies plus an alternate assembly for chromosome 7:
  • Annotation processing improvements:

    Map Viewer

    Build 37.3 (October 2011)

    Assembly

    This build includes the reference assembly, with nine partial-chromosome alternate haplotypes and intermediate Fix and Novel patches for 100 defined regions of the assembly; one whole-genome alternate assembly (HuRef); and a single-chromosome alternate assembly (chromosome 7). See the Genome Reference Consortium web site for additional information about alternate haplotypes and patches (see Assembly Statistics for GRCh37.p5).
  • Annotation processing improvements:

    Map Viewer


    Build 37.2 (November 2010)

    Assembly

    This build includes the reference assembly, with nine partial-chromosome alternate haplotypes and 70 partial-chromosome patches; two whole-genome alternate assemblies (HuRef and Celera); and a single-chromosome alternate assembly (chromosome 7). See the Genome Reference Consortium web site for additional information about alternate haplotypes and patches (see Assembly Statistics for GRCh37.p2).

    Annotation

    Map Viewer


    Build 37.1 (August 2009)

    Assembly

    This build includes the reference assembly with nine partial-chromosome alternate haplotypes, two whole-genome alternate assemblies, and a single-chromosome alternate assembly.

    Annotation

    Map Viewer

    Miscellaneous


    Build 36.3 (March 2008)

    Assembly

    This build adds the HuRef assembly. The reference genome assembly, the alternate Celera assembly, and the partial-chromosome alternate haplotypes are included unchanged.

    Annotation

    This build represents an annotation update. New and updated RefSeq and GenBank transcripts and proteins were aligned to the genome assemblies and new annotation models were calculated. The annotation methods were modified as follows:

    Map Viewer

    Reports


    Build 36.2 (September 2006)

    Assembly

    The genome assembly did not change with this update; it is identical to that provided for build 36.1.

    Annotation

    This build represents an annotation update. New and updated RefSeq and GenBank transcripts and proteins were aligned to the genome, and new annotation models were calculated. The annotation methods were modified for this build, resulting in a significant reduction in the number of predicted splice variants (XM_ accessions).

    Map Viewer


    Build 36.1 (March 2006)

    Assembly

    This build includes the reference assembly, a whole-genome alternate assembly, a single-chromosome alternate assembly, and partial-chromosome alternate haplotypes.

    Annotation


    Map Viewer



    Notes for Build 35 (August 2004)

    Version 1

    Assembly

    Annotation

    Map Viewer

    The method of converting sequence positions to cytogenetic bands was changed in Build 35.1, so the content of the ideogram file on the FTP site has changed slightly. The conversions were provided by Terry Furey: Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003 May 1;12(9):1037-44.

    Maps added as part of this release include:

    Data for the Celera assembly is not available for all the maps produced for the reference assembly.


    Notes for Build 34 (March 2004)

    Version 2

    With the release of this version, the link for Gene-specific information was changed from LocusLink to Gene.

    Annotation differs from that in Build 34.1 in that:

    Version 1

    Assembly

    In addition to the reference assembly (termed ref in the Maps&Options box), Map Viewer displays a reference sequence for the DR51 haplotype in the Major Histocompatibility Complex (NG_002432) and the assembly of chromosome 7 from The Center for Applied Genomics (TCAG), termed HSC_TCAG.

    This Build is the first to include the pseudoautosomal region of the Y chromosome in the assembly, the resultant contigs, and the feature annotation.


    Annotation

    Versioning of annotation

    The version of a set of annotations is now displayed on the Map Viewer page. The version is incremented when the data in one or more maps are updated, for example when a new dbSNP build is released and the Variation map is changed accordingly. The statistics page now supports reporting by version. In contrast, the Build is incremented only when the reference sequence (assembly) itself changes, and the initial version for that new build is set to one (1).
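    The build/version rule described above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the "build.version" label simply follows the way these notes write "Build 34.1" or "Build 35.1".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotationRelease:
    """Hypothetical model of a build number plus an annotation version."""
    build: int
    version: int = 1  # the initial version of a new build is 1

    def map_data_updated(self) -> "AnnotationRelease":
        # Map data changed (e.g. a new dbSNP build): only the version moves.
        return AnnotationRelease(self.build, self.version + 1)

    def assembly_changed(self) -> "AnnotationRelease":
        # The reference sequence changed: new build, version resets to 1.
        return AnnotationRelease(self.build + 1, 1)

    def label(self) -> str:
        return f"{self.build}.{self.version}"


release = AnnotationRelease(build=34)
updated = release.map_data_updated()
next_build = updated.assembly_changed()
print(release.label(), updated.label(), next_build.label())
```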

    Gene Annotation

    The algorithm for placing mRNAs on the genome was improved to:

    These changes should be apparent in the UniGene and EST maps as well as the exon annotation on the Reference sequences.

    The number of genes annotated on the reference genome has decreased, and the number of models identified as pseudogenes has increased. This is primarily due to a change in the algorithm used to model genes, mRNAs, and proteins, which gives more weight to coding propensity and matches to existing proteins, and checks more rigorously for changes in frame. This method, developed by Alexandre Souvorov and named Gnomon, has replaced GenomeScan as our standard method of predicting gene models. It is discussed in more detail here. In this method, any gene model that results in a frameshift or premature termination relative to a set of conserved proteins is flagged as a probable pseudogene. That pseudogene is retained as the annotation unless: (1) the gene model corresponds to the best placement of a RefSeq mRNA from a protein-coding gene, or (2) the gene is identified as protein-coding by best placement of known mRNAs. If an mRNA aligns well to the model, a model RNA product is generated (RefSeq accession of the format XR_xxxxxx); otherwise the gene is annotated as /pseudo with no product. Because of the above, there are now three sources of pseudogene annotation:

    Map Viewer


    Release of this build involves addition of several maps and changes in software.

    Notes for Build 33 (April 2003)

    The reference DNA sequence of Homo sapiens was first made available for downloading here. On April 28-29, 2003, the sequence records were updated with current annotation and Map Viewer was updated to reflect that annotation. No changes were made in data processing between Build 32 and Build 33.


    Notes for Build 32

    Assembly

    The number of 'finished' chromosomes increased. Now chromosomes 6, 7, 9, 10, 13, 14, 18, 19, 20, 21, 22 and Y are considered complete.

    Gene annotation

    Color assigned to gene models

    There is a modification in the use of color to convey information about the level of evidence supporting a gene and any conflicts in that annotation.
    In prior builds, model genes would be orange if they exhibited any difference compared to the genome, including any translation or transcription discrepancy of the mRNA/CDS with respect to the referenced cDNA/protein product (such as a single gap). Before Build 31, even mismatches, such as SNPs, resulted in this color difference. In Build 32, the use of orange is more limited still; a model gene is orange only if:

    1. there is less than 85% coverage of the mRNA transcription with respect to the cDNA product,
    2. the percent identity of the transcription unit with respect to the cDNA product is less than 98%, or
    3. there is a gap in the coding region, that is, the translation of the coding region with respect to its referenced protein product is not limited to amino acid substitutions.
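    The three conditions above amount to a simple predicate. The sketch below is illustrative only; the function and parameter names are hypothetical and not taken from any NCBI tool.

```python
def is_orange(coverage_pct: float, identity_pct: float, cds_has_gap: bool) -> bool:
    """Return True if a model gene would be shown in orange under the
    Build 32 conditions listed above (hypothetical helper).

    coverage_pct -- percent of the cDNA product covered by the mRNA alignment
    identity_pct -- percent identity of the transcription unit vs. the cDNA
    cds_has_gap  -- True if translation differs from the referenced protein
                    by more than amino acid substitutions
    """
    return coverage_pct < 85.0 or identity_pct < 98.0 or cds_has_gap


print(is_orange(coverage_pct=99.0, identity_pct=99.5, cds_has_gap=False))
print(is_orange(coverage_pct=80.0, identity_pct=99.5, cds_has_gap=False))
```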
    The file on the FTP site that corresponds to the gene annotation, seq_locus.md, has been modified to represent explicitly the coding and non-coding regions of each exon. Thus the LOCUS lines have been replaced with CDS and UTR lines, respectively.

    Map Viewer

    Phenotype map

    A new map was added in this build, namely a representation of phenotypes from OMIM in sequence coordinates. This map is called Phenotype in the Maps&Options selection and is labeled Pheno on the query results page. Thus it is now easier, when querying by a disease name, to know whether it has been placed on a sequence map at all.


    If the phenotype is associated with a known gene, its sequence coordinates correspond to those of the gene. If the phenotype is placed by linkage or association to mapped markers, the phenotype is placed by the position of that marker or markers. At present, there is no step to extend the range defined by the markers to reflect the level of confidence in any boundary marker.

    Representation of coding regions

    Coding regions are now represented differently from non-coding on both the Gene and Transcript (RNA) maps. On the Gene map, this representation is the summary of all the coding regions (CDS) and untranslated regions (UTR) for each transcript, if several are annotated. Thus it is possible to have a Gene look as if it has UTR interspersed with CDS, as in the case where there is a shorter variant with a 3' UTR that does not include all exons of any longer variant. In those cases, it might help to add the RNA map to the display.

    Notes for Build 31

    Assembly

    The number of 'finished' chromosomes increased. Now chromosomes 6, 7, 13, 20, 21, 22 and Y are considered complete.

    Gene annotation

    In this build, there is a significant reduction in the number of genes (from 34,539 to 26,846) annotated on the NT_000000 accessions and displayed on the Gene/Sequence map. This reduction resulted from:

    Map Viewer

    Several maps were added in this build:

    Notes for Build 30

    Assembly

    The number of 'finished' chromosomes increased. Now chromosomes 6, 7, 20, 21, 22 and Y are considered complete.

    The number of 'contigs' decreased from 2042 to 1395, based not only on more finished sequence, but also on the decision to retain 'contigs' composed of single BACs in the reference genome only if they contained a gene not found elsewhere in the assembly.

    Gene annotation

    In this build, there is a significant reduction in the number of genes annotated on the NT_000000 accessions and displayed on the Gene/Sequence map. This reduction resulted from:

    Although we recognize that this may result in the removal of some valid models, we hope that the models from the ab initio predictions (GenomeScan) will fill any gaps.

    Another change was to retain more RefSeq mRNA accessions in the annotation, based on the best placement in the reference genome or any haplotype.

    Map Viewer

    The deCODE genetic map was added between the release of Builds 29 and 30.

    Notes for Build 29

    Assembly

    Several significant changes were made in the assembly process in this build:

    Gene annotation

    The following modifications were made to the gene annotation process.


    Map Viewer

    Representation of expression data has been enhanced by the addition of the SAGE tag map. Please note that this may not always be available at the same time as the bulk of the data release, and will be added later.

    Notes for Build 28

    Assembly

    There were no modifications to the assembly process relative to the previous build.

    Gene annotation

    This build identifies more genes than the previous because of the following modifications:


    GenomeScan models are not instantiated as XM_000000 accessions if they overlap an alignment-based model by more than 100 bp, on either strand, provided that the alignment-based model satisfied preliminary criteria, including ORF length and repeat masking.
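    A minimal sketch of this overlap filter is shown below. It is an illustration under stated assumptions, not the actual pipeline: the names are hypothetical, each model is reduced to a single 1-based inclusive interval (real models are sets of exons), and strand is ignored because the rule applies regardless of strand.

```python
from typing import Iterable, NamedTuple


class Model(NamedTuple):
    start: int  # 1-based, inclusive (assumed coordinate convention)
    end: int
    passes_preliminary_criteria: bool = True  # e.g. ORF length, repeat masking


def overlap_bp(a: Model, b: Model) -> int:
    """Number of bases shared by two intervals (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def instantiate_genomescan_model(gs: Model, alignment_models: Iterable[Model],
                                 max_overlap: int = 100) -> bool:
    """True if the GenomeScan model would still receive an XM_ accession."""
    return not any(
        m.passes_preliminary_criteria and overlap_bp(gs, m) > max_overlap
        for m in alignment_models
    )


gs = Model(1000, 2000)
print(instantiate_genomescan_model(gs, [Model(1900, 3000)]))  # 101 bp shared
print(instantiate_genomescan_model(gs, [Model(1901, 3000)]))  # 100 bp shared
```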

    Map Viewer

    Although there were no changes in the maps provided and the methods of computing them, the ModelMaker tool was added and changes were made to the look and feel of access to zoom functions and tools supporting configuring the display.

    ModelMaker

    ModelMaker, accessed by the mm link in the legend of the Gene_seq map, allows the user to view aligned mRNAs, ESTs, and GenomeScan models in a strand-specific way. Information about each exon, its translation in all reading frames, and its putative splice junctions is provided so that the user can evaluate evidence for determining all valid combinations of exons and reading frames, test the open reading frames determined by the selected combination, and produce a final 'mRNA' to copy and use in subsequent research.

    Configuration

    Maps&Options has replaced the previous Display Settings link. It has been made more obvious by a new contrasting background color, and by being accessible not only within the blue bar at the top of the screen but also from the blue column at the left.

    Other changes in the basic display include:

    Notes for Build 27

    Assembly

    There were no modifications to the assembly process relative to the previous build.

    Gene annotation

    There were major modifications in annotating genes and mRNAs. In previous builds, genes were annotated based only on mRNA alignments, and alignments were not extended based on EST evidence. The Map Viewer was used to indicate other potential gene locations based on GenomeScan predictions and/or EST alignments. In this build, however, a more comprehensive set of genes was annotated on the contig sequences (and thus viewable on the Genes_sequence map) based on the combination of mRNA alignments, EST alignments, and GenomeScan predictions. In particular:

    Another major change was to reduce the amount of redundancy in the number of mRNA models selected to represent each gene. In the past, there was no restriction on the number, as long as each mRNA model differed in intron/exon content. In Build 27, multiple models were retained only if they were supported by RefSeq mRNAs and those RefSeq mRNAs matched the assembly at that region quite well. Multiple models per gene are, in fact, provided as RefSeq NM_000000 accessions. That is because another major change in this build was to discontinue providing model mRNAs as XM_000000 accessions when they matched quite well to existing NM_000000 accessions. Thus, when a RefSeq mRNA sequence (NM_000000) was determined to align to the genome with fewer than 3 gaps and fewer than 4 mismatches, that sequence was retained to represent the mRNA model, and no XM_000000 accession was retained or generated.

    Thus gene models in Build 27 have been categorized as:

    These categories of genes are represented by various colors and evidence abbreviations on the Gene_Sequence map.
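    The Build 27 threshold for keeping a RefSeq mRNA (NM_000000) in place of a model mRNA (XM_000000), namely fewer than 3 gaps and fewer than 4 mismatches, can be written as a one-line check. The function name is hypothetical, and how gaps and mismatches were counted is not specified in these notes.

```python
def retain_refseq_as_model(gaps: int, mismatches: int) -> bool:
    """True if the NM_ sequence itself would represent the mRNA model,
    so that no XM_ accession is retained or generated (hypothetical helper)."""
    return gaps < 3 and mismatches < 4


print(retain_refseq_as_model(gaps=2, mismatches=3))
print(retain_refseq_as_model(gaps=3, mismatches=0))
```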

    Map Viewer

    There are also modifications to existing maps:


    New features

    seq link
    The link to the download sequence form to retrieve a region of genomic sequence has been made more evident for genes annotated on the NCBI contigs. When Genes_sequence or Genes_cytogenetic is the master, the full (verbose) label display includes a seq link to a form that displays the sequence in two coordinate systems: chromosome and the scaffold/contig represented by the NT_000000 accession (and preset to the coordinates of the gene). This makes it easier to download the genomic sequence including a gene of interest.

    Evidence Viewer
    The display has been modified to provide an indicator of the density of ESTs along an alignment.

    Notes for Build 26

    Gene annotation

    This build is the first to annotate and provide accessions (format XR_000000) for genes that are transcribed, but do not appear to encode a protein.


    Map Viewer

    The Gene_Sequence map (Genes_seq) now uses color coding to indicate when there are conflicting data about a gene, or when the alignment of a defining mRNA is not perfect. Genes that have such conflicts are represented in orange; those with consistent information are blue. Cases that cause a gene to be represented in orange include:

    A partial set of connections between STS markers and OMIM phenotypes is now included. More details about how to search for and display these connections are provided in the help documentation for human Map Viewer.

    Notes for Build 25

    Assembly

    Before assembly, source sequences were divided into sub-chromosomal bins based on genetic and RH mapping data. This was done to prevent false joins that might occur when distant regions within a chromosome have very similar sequence.

    Gene annotation

    Genes continued to be identified based on alignment of mRNAs (RefSeq and GenBank) but not ESTs. Gene boundaries are based on those aligned models that

    A modification with this build relaxed the definition of 'shared exon' to:
    ends within 10 bp of each other AND at least 5 bp, AND at least 50% of the exon's length in common.
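    The wording of this definition leaves some room for interpretation. The sketch below encodes one possible reading, stated here as assumptions: both the start and the end of the two exons lie within 10 bp of each other, the exons have at least 5 bp in common, and the common region covers at least 50% of the length of each exon. The names and the 1-based inclusive coordinates are hypothetical.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    start: int  # 1-based, inclusive (assumed)
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_shared_exon(a: Exon, b: Exon) -> bool:
    """One reading of the relaxed Build 25 'shared exon' definition."""
    common = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    ends_close = abs(a.start - b.start) <= 10 and abs(a.end - b.end) <= 10
    enough_bp = common >= 5
    # Assumption: the 50% test is applied to both exons, not just one.
    enough_fraction = common >= 0.5 * a.length and common >= 0.5 * b.length
    return ends_close and enough_bp and enough_fraction


print(is_shared_exon(Exon(100, 200), Exon(105, 195)))
print(is_shared_exon(Exon(100, 200), Exon(120, 200)))
```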

    Identification of the 5' and 3' extents of an alignment became more rigorous. Limits were set on the size of the first and last introns.

    The number of model mRNAs (accessions of the form XM_######) and proteins (XP_######) decreased slightly because of the slightly relaxed definition of intron/exon identity defined above. Model mRNAs were not discarded, however, if they were based on the alignment of a RefSeq.

    Protein matches to GenomeScan model proteins are now made to all of the nr database.

    Map Viewer


    Notes for Build 24

    Assembly

    Genomic accessions used in assembly were assigned to chromosomes according to information from chromosome coordinators. If the assignment for a sequence differed from that suggested by mapped STS, the sequence was assembled with the chromosome indicated by overlaps with other sequences.

    For chromosome 20, the assembly provided by the Sanger Centre was used.

    Gene annotation

    Models in this release were based on mRNAs (GenBank and RefSeq, but not ESTs). Alignment criteria were:

    Only the best alignment in the genome was retained. This is in contrast to the previous build in which only RefSeq mRNAs were used.

    For the first time, RefSeq reference gene accessions (format NG_######) were used to apply more detailed annotation by incorporating reference gene accessions in our assembly and annotation processes. See, for example, NG_000004 and its placement in NT_007812.

    Map Viewer


    New features

    Evidence viewer:
    Evidence Viewer provides these major functions:

    This display is currently accessed by the ev link provided in the Map Viewer labels for genes, and in the Genome Annotation portion of a LocusLink report. It can also be accessed directly if you know the contig accession and the gene symbol.

    Example: http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/evv.cgi?contig=NT_011519&gene=HIRA
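    A direct-access URL of the form shown in the example can be assembled from a contig accession and a gene symbol. The helper name below is hypothetical; the base address and the contig and gene parameters are copied from the example in these notes, and this historical address may no longer resolve.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in these notes.
EVV_BASE = "http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/evv.cgi"


def evidence_viewer_url(contig: str, gene: str) -> str:
    """Build an Evidence Viewer URL for a contig accession and gene symbol."""
    query = urlencode({"contig": contig, "gene": gene})
    return f"{EVV_BASE}?{query}"


print(evidence_viewer_url("NT_011519", "HIRA"))
```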


