UniGene is a collection of GenBank sequences that have been organized into clusters; the aim
is for each cluster to represent one gene.
Expressed Sequence Tags (ESTs) make up most of the clusters.
These are short nucleotide sequences (100-400 nt) and so represent only
fragments of genes.
ESTs are selected and sequenced at random from cDNA libraries
derived from a cell line or tissue of interest, e.g., a cell line from a family with hereditary thrombophilia.
Because ESTs can be produced so rapidly, they greatly speed up gene discovery.
Coagulation factor V has been entered in the text box; click on "go".
UniGene is an experimental system for automatically
partitioning GenBank sequences into a non-redundant set of
gene-oriented clusters. Each UniGene cluster contains
sequences that represent a unique gene, as well as related
information such as the tissue types in which the gene has
been expressed and map location.
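The partitioning described above can be pictured as single-linkage clustering: if sequence A is linked to B, and B is linked to C, all three end up in one cluster. The sketch below illustrates only that idea. It is not the UniGene build procedure. As a toy stand-in for a real overlap test, it assumes that two sequences are linked whenever they share an exact word of length `k`. It ignores reverse complements and merges linked sequences with a union-find structure. The function name `cluster_by_shared_kmers` and its inputs are hypothetical.

```python
from collections import defaultdict

def cluster_by_shared_kmers(seqs, k):
    """Group sequences that share an exact k-mer, directly or through others.

    seqs: accession -> nucleotide string (hypothetical input).
    k: word length used as a toy stand-in for a real overlap test.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    parent = {acc: acc for acc in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # k-mer -> first accession found to contain it
    for acc, s in seqs.items():
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if word in first_seen:
                union(first_seen[word], acc)
            else:
                first_seen[word] = acc

    clusters = defaultdict(list)
    for acc in seqs:
        clusters[find(acc)].append(acc)
    return sorted(sorted(c) for c in clusters.values())


seqs = {
    "A": "ACGTACGTACGTTTGA",
    "B": "TTGACCCGGGAAATTT",  # starts with A's last 4 bases
    "C": "GGGAAATTTCCCAAAG",  # starts with B's last 9 bases
    "D": "CATCATCATCATCATC",  # unrelated
}
print(cluster_by_shared_kmers(seqs, k=4))
# [['A', 'B', 'C'], ['D']]
```

Here A and C share no 4-letter word, yet they are joined through B. A realistic test would require alignments over far longer stretches than `k=4` and would also check both strands.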
In addition to sequences of well-characterized genes,
hundreds of thousands of novel expressed sequence tag (EST)
sequences have been included. Consequently, the collection
may be of use to the community as a resource for gene
discovery. UniGene has also been used by experimentalists to
select reagents for gene mapping projects and large-scale expression analysis.
However, it should be noted that the procedures for
automated sequence clustering are still under development
and the results may change from time to time as
improvements are made. Feedback from users has been
especially useful in identifying problems, and we encourage
you to report any problems you encounter.
It should also be noted that no attempt has been made to
produce contigs or consensus sequences. There are several
reasons why the sequences of a set may not actually form a
single contig. For example, all of the splicing variants for a
gene are put into the same set. Moreover, EST-containing
sets often contain 5' and 3' reads from the same cDNA clone,
but these sequences do not always overlap.
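To see how non-overlapping 5' and 3' reads can still share a set, the sketch below adds a second kind of link. Any two ESTs carrying the same cDNA clone identifier are merged, whether or not their sequences overlap. This is an illustrative assumption, not a statement of UniGene's actual linking rules. The overlap pairs, the clone IDs, and the function `cluster_with_clone_links` are all hypothetical.

```python
from collections import defaultdict

def cluster_with_clone_links(overlap_pairs, clone_of):
    """Merge sequences joined by an overlap OR by a shared cDNA clone ID.

    overlap_pairs: (acc1, acc2) pairs assumed to overlap (hypothetical input;
        a real pipeline would derive these from sequence alignments).
    clone_of: accession -> cDNA clone ID for EST reads (hypothetical input).
    """
    pairs = list(overlap_pairs)
    parent = {acc: acc for acc in clone_of}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link 1: sequences that overlap.
    for a, b in pairs:
        union(a, b)

    # Link 2: 5' and 3' reads from the same clone, even if they do not overlap.
    first_read = {}
    for acc, clone in clone_of.items():
        if clone in first_read:
            union(first_read[clone], acc)
        else:
            first_read[clone] = acc

    clusters = defaultdict(list)
    for acc in parent:
        clusters[find(acc)].append(acc)
    return sorted(sorted(c) for c in clusters.values())


overlaps = [("mRNA1", "est5_c1")]  # a full-length mRNA overlaps the 5' read
clones = {"est5_c1": "c1", "est3_c1": "c1", "est5_c2": "c2"}
print(cluster_with_clone_links(overlaps, clones))
# [['est3_c1', 'est5_c1', 'mRNA1'], ['est5_c2']]
```

In this example the 3' read `est3_c1` overlaps nothing, but it joins the mRNA's set through its clone partner `est5_c1`. A set grouped this way can contain a gap between the 5' and 3' reads, so it need not assemble into a single contig.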
At present, only sequences from human, rat, and mouse have
been processed. These species were chosen because they
have the greatest amounts of EST data available. Additional
organisms may be added in the future.
A representation of the UniGene datasets is available by FTP.
A description of the UniGene build procedure is available.
An article about the UniGene Collection in the August 1997 NCBI News contains an overview of the project. Although the number of UniGene clusters has changed since that article was written due to improvements in the clustering algorithm, the article provides background information as well as a description of how the collection was used in the Transcript Map project (see Schuler et al., 1996, below).
Additional references include:
Schuler (1997). Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75(10), 694-698.
Schuler et al. (1996). A gene map of the human genome. Science 274, 540-546.
Boguski & Schuler (1995). ESTablishing a human transcript map. Nature Genetics 10, 369-371.