NCBI



HomoloGene Build Procedure

The input for HomoloGene processing consists of the proteins from the input organisms. These sequences are compared to one another using blastp, then matched and placed into groups. A tree built from sequence similarity guides the process: more closely related organisms are matched first, and more distant organisms are added as the tree is traversed toward the root. The protein alignments are mapped back to their corresponding DNA sequences, from which distance metrics (e.g. molecular distance, Ka/Ks ratio) can be calculated. Sequences are matched using synteny where applicable. The remaining sequences are matched with a bipartite matching algorithm that maximizes the score globally rather than locally. Cutoffs on bits per position and Ks values are set to prevent unlikely "orthologs" from being grouped together; these cutoffs are calculated from the respective score distributions for the given groups of organisms. Paralogs are identified as sequences that are closer to each other within a species than to sequences in other species.
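
The difference between maximizing the score locally and globally in a bipartite matching can be illustrated with a small sketch. This is not the HomoloGene implementation, which the text does not specify. The score matrix, the function names (`greedy_matching`, `global_matching`, `total_score`), and the use of exhaustive search are all assumptions made for illustration. Rows stand for proteins from one species, columns for proteins from another, and each entry is a hypothetical similarity score.

```python
from itertools import permutations

# Hypothetical similarity scores: rows are proteins a0, a1 from one species,
# columns are proteins b0, b1 from another. Values are invented.
scores = [
    [10, 9],
    [9, 1],
]

def greedy_matching(scores):
    """Local strategy: repeatedly take the best-scoring pair still available."""
    pairs = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, matching = set(), set(), []
    for s, i, j in pairs:
        if i not in used_rows and j not in used_cols:
            matching.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return sorted(matching)

def global_matching(scores):
    """Global strategy: pick the one-to-one assignment with the highest total.

    Exhaustive search over permutations, so only suitable for tiny square
    matrices; it stands in for a real assignment algorithm.
    """
    n = len(scores)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(scores[i][p[i]] for i in range(n)),
    )
    return [(i, best[i]) for i in range(n)]

def total_score(scores, matching):
    """Sum of the scores of all matched pairs."""
    return sum(scores[i][j] for i, j in matching)

greedy = greedy_matching(scores)
optimal = global_matching(scores)
print(greedy, total_score(scores, greedy))    # [(0, 0), (1, 1)] 11
print(optimal, total_score(scores, optimal))  # [(0, 1), (1, 0)] 18
```

The greedy strategy commits to the single best pair (score 10) and is then forced to accept a poor pair (score 1), while the global strategy gives up the best individual pair to obtain two good ones. For realistically sized inputs, exhaustive search is infeasible; a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True` would be one option.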