NCBI News






First Version of Human Genome Reference Sequence Debuts on DNA's 50th


April 14, 2003, marked the 50th anniversary of the description of the structure of DNA and also saw the release of the first version of the 3-billion-base-pair reference sequence of the human genome. Annotations to the raw sequence made public on April 14 were released on April 29, when the reference genome, NCBI build 33, appeared in the NCBI Map Viewer. The human genome reference sequence is a critical contribution to RefSeq, NCBI’s database of reference sequences for genomes, mRNAs, and proteins.

The reference sequence covers about 99 percent of the human genome’s gene-containing, euchromatic regions at an accuracy of 99.99 percent. Only the sequence near centromeres, telomeres, and other heterochromatic blocks, along with a small number of unclonable gaps, has yet to be determined. Small updates to the assembly will continue to be made as complex regions are refined and the remaining gaps, about 400 of less than 100 kilobases each, are closed.

The assembly is the end product of several years of collaborative work by the Human Genome Project sequencing centers, NCBI, the University of California at Santa Cruz, and Ensembl, a joint project between EMBL-EBI and the Sanger Institute.

There are about 32,000 genes annotated by NCBI on build 33; of those, almost 18,000 are supported by mRNA alignments and may be considered confirmed. The typical confirmed human gene has 12 exons with an average length of 236 base pairs each, separated by introns with an average length of 5,478 base pairs. As a consequence, a single average intron is about twice as long as the average spliced transcript, which comes to roughly 2,800 base pairs (12 exons of 236 base pairs each). Some statistics on NCBI’s build 33 are given in Table 1.
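The intron-to-transcript comparison above can be checked with a few lines of arithmetic. The sketch below uses only the averages quoted in this article. The variable names are illustrative, and the genomic span figure assumes that a gene with 12 exons has 11 introns and that per-gene averages can simply be multiplied together, so it should be read as a rough estimate rather than a statistic reported for build 33.

```python
# Averages quoted in the article for confirmed genes on build 33.
EXONS_PER_GENE = 12
MEAN_EXON_BP = 236
MEAN_INTRON_BP = 5478

# Spliced transcript length: all exons joined, introns removed.
transcript_bp = EXONS_PER_GENE * MEAN_EXON_BP

# Assumption (not stated in the article): n exons are separated by n - 1 introns.
introns_per_gene = EXONS_PER_GENE - 1

# Rough genomic span of a "typical" gene built from the averages alone.
genomic_span_bp = transcript_bp + introns_per_gene * MEAN_INTRON_BP

# How long one average intron is relative to the whole spliced transcript.
intron_to_transcript = MEAN_INTRON_BP / transcript_bp

# Share of the estimated genomic span that is exonic.
exonic_fraction = transcript_bp / genomic_span_bp

print(f"Spliced transcript length: {transcript_bp} bp")
print(f"One intron / transcript:   {intron_to_transcript:.2f}")
print(f"Estimated genomic span:    {genomic_span_bp} bp")
print(f"Exonic fraction of span:   {exonic_fraction:.1%}")
```

Under these assumptions, one average intron is a little under twice the length of the spliced transcript, and exons make up only a small fraction of the estimated span of a typical confirmed gene.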

To view the genome or download the sequence, start at the “Human Genome Resources” link under “Hot Spots” on the NCBI Home Page. See the upcoming summer edition of the NCBI News for more on the human genome and how to explore it at NCBI.


Table 1: Statistics: First version of the human genome reference sequence, NCBI human genome build 33.





NCBI News: Spring 2003