NCBI News


HomoloGene: Clusters of Clusters

HomoloGene is a new NCBI database of both curated and calculated orthologs and homologs for the human, mouse, rat, and zebrafish genes represented in UniGene and LocusLink.

Curated orthologs include gene pairs from the Mouse Genome Database (MGD) at the Jackson Laboratory, the Zebrafish Information Network (ZFIN) database at the University of Oregon, and published reports. Computed orthologs and homologs are identified from nucleotide sequence comparisons between all UniGene clusters for each pair of organisms. Calculated orthologs and homologs should be considered putative because they are based only on sequence comparison.

Computed similarities are detected by using BLAST to compare nucleotide sequences for each pair of organisms and to identify the sequence pairs that share the greatest nucleotide sequence similarity. The best match for a sequence in one organism to a sequence in a second organism is determined by the percent sequence identity, called the %ID in the HomoloGene report, over an alignment of at least 100 base pairs. When sequences from two UniGene clusters are reciprocal best matches, the corresponding UniGene clusters are considered to represent a putative ortholog pair.
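The reciprocal-best-match rule can be illustrated with a minimal Python sketch. This is not NCBI's implementation: the Hit record, the function names, and the cluster IDs in the usage note are hypothetical. The sketch assumes that each BLAST hit has already been mapped to the UniGene clusters of its two sequences, and that ties in %ID go to the first hit seen.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple

# Minimum alignment length in base pairs, as stated in the text.
MIN_ALIGNMENT_LENGTH = 100


class Hit(NamedTuple):
    """One BLAST hit between two organisms (hypothetical record layout)."""
    query: str      # UniGene cluster of the query sequence
    subject: str    # UniGene cluster of the matched sequence
    pct_id: float   # percent identity (%ID) of the alignment
    aln_len: int    # alignment length in base pairs


def best_matches(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each query cluster, the qualifying hit with the highest %ID."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.aln_len < MIN_ALIGNMENT_LENGTH:
            continue  # alignment too short to count
        current = best.get(hit.query)
        # Assumption: on a tie, the first hit seen is kept.
        if current is None or hit.pct_id > current.pct_id:
            best[hit.query] = hit
    return best


def reciprocal_best_pairs(a_vs_b: Iterable[Hit],
                          b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (cluster_a, cluster_b) pairs that are each other's best match."""
    best_ab = best_matches(a_vs_b)
    best_ba = best_matches(b_vs_a)
    pairs: Set[Tuple[str, str]] = set()
    for a, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is not None and back.subject == a:
            pairs.add((a, hit.subject))
    return pairs
```

For example, if Hs.1 and Mm.1 are each other's best match, and a higher-scoring hit from Hs.1 to Mm.2 spans only 60 base pairs, the short hit is ignored and only (Hs.1, Mm.1) is reported as a putative ortholog pair.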

HomoloGene also contains a set of triplet ortholog clusters in which orthologous clusters in two organisms are also orthologous to the same cluster in a third organism. For the organisms human, mouse, and rat, there are currently over 7,000 of these triplets. For the organisms zebrafish, human, and rodent (mouse or rat), there are currently just over 200 triplets.
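One way to picture a triplet is as three ortholog pairs that close into a triangle across three organisms. The Python sketch below assumes the pairwise ortholog sets are already available as sets of cluster ID pairs. The function name and the cluster IDs in the usage note are hypothetical, and this is not necessarily how HomoloGene builds its triplets.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def ortholog_triplets(ab: Iterable[Pair],
                      bc: Iterable[Pair],
                      ac: Iterable[Pair]) -> Set[Tuple[str, str, str]]:
    """Return (a, b, c) where a-b, b-c, and a-c are all ortholog pairs.

    ab, bc, and ac hold ortholog pairs between organisms A and B,
    B and C, and A and C respectively (assumed input layout).
    """
    # Index the B-C pairs by their B cluster for quick lookup.
    c_by_b: Dict[str, Set[str]] = defaultdict(set)
    for b, c in bc:
        c_by_b[b].add(c)

    ac_pairs = set(ac)
    triplets: Set[Tuple[str, str, str]] = set()
    for a, b in ab:
        for c in c_by_b.get(b, ()):
            # The triangle closes only if a and c are orthologous too.
            if (a, c) in ac_pairs:
                triplets.add((a, b, c))
    return triplets
```

With human-mouse pairs (Hs.1, Mm.1) and (Hs.2, Mm.2), mouse-rat pairs (Mm.1, Rn.1) and (Mm.2, Rn.2), and the single human-rat pair (Hs.1, Rn.1), only (Hs.1, Mm.1, Rn.1) forms a triplet. The second chain lacks a human-rat pair to complete it.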


HomoloGene Search

To obtain a HomoloGene report for a gene, use the Query box at the top of the HomoloGene page to search by UniGene ClusterID, LocusLink LocusID, gene symbol, gene name, nucleotide accession number, or any free text appearing in UniGene cluster titles. The HomoloGene report consists of a header section followed by up to three sections entitled Curated Orthologs, Calculated Orthologs, and Mutually Orthologous Pairs.

The header section gives the title of the HomoloGene cluster, followed by a listing of all the possible orthologs contained within it. For each entry, the UniGene cluster ID and the LocusLink ID are given.

The Curated Orthologs section gives pairs of orthologs and, for each pair, a link to the source in which the orthologous relationship is claimed. If the source is a research paper, the link is to a PubMed abstract. In other cases, the source may be MGD or ZFIN, and links are provided to these resources.

The Calculated Orthologs section lists putative ortholog pairs identified on the basis of sequence similarity. For each pair listed, a %ID score is given as a measure of reliability. If the gene in question is a member of a triplet ortholog cluster, the Mutually Orthologous Pairs section shows the triplet.


FTP Access to Data

The current datasets for the calculated orthologs and homologs and the mutually orthologous pairs are available via FTP at ftp://ftp.ncbi.nlm.nih.gov/pub/HomoloGene/. Link to the HomoloGene Web page from the UniGene page at www.ncbi.nlm.nih.gov/UniGene/.
LW, DW





NCBI News | Spring 2000