Genome Biol Evol. 2016 Jan 18;8(2):446-57. doi: 10.1093/gbe/evw005.

Ortholog-Finder: A Tool for Constructing an Ortholog Data Set.

Author information

1. Department of Biological and Environmental Science, Shizuoka University, Japan. Email: horiike.tokumasa@shizuoka.ac.jp
2. The Genome Institute, Japanese Foundation of Cancer Research, Tokyo, Japan.
3. Department of Economics, Chiba University of Commerce, Ichikawa, Japan.
4. Research Center for Aquatic Genomics, National Research Institute of Fisheries Science, Fisheries Research Agency, Kanagawa, Japan.
5. School of New Sciences, Daegu Gyoungbook Institute of Science and Technology, Daegu, Republic of Korea.

Abstract

Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, "Ortholog-Finder," to obtain ortholog data sets for performing phylogenetic analysis by using all open-reading frame data of species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction:

1) HGT filtering: Genes derived from HGT could be detected and deleted from the initial sequence data set by examining their base compositions.
2) Out-paralog filtering: Out-paralogs are detected and deleted from the data set based on sequence similarity.
3) Classification of phylogenetic trees: Phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic trees.
4) Tree splitting: Polyphyletic trees are bisected to obtain monophyletic trees and remove HGT genes and out-paralogs.
5) Threshold changing: Out-paralogs are further excluded from the data set based on the difference in the similarity scores of genuine orthologs and out-paralogs.

Using simulation data, we examined how out-paralogs and HGTs affected phylogenetic trees constructed for species based on ortholog data sets obtained by Ortholog-Finder, and we determined the effects of confounding factors. We then used Ortholog-Finder in phylogeny construction for 12 Gram-positive bacteria from two phyla and validated each node of the constructed tree by comparison with individually constructed ortholog trees.
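The first process, HGT filtering by base composition, can be illustrated with a minimal sketch. The abstract does not say which composition statistic or cutoff Ortholog-Finder uses, so everything below is an assumption made for illustration: GC content as the statistic, a z-score against the genome-wide mean as the test, the cutoff of 2.0, and the hypothetical function names `gc_content` and `flag_atypical_genes`. It is not the program's actual implementation.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # empty or fully ambiguous sequence
    return (seq.count("G") + seq.count("C")) / acgt


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Flag genes whose GC content deviates strongly from the genome-wide mean.

    Assumption: a gene is treated as an HGT candidate when its GC content lies
    more than `z_cutoff` population standard deviations from the mean of all
    genes in the same genome. The cutoff is illustrative, not from the paper.
    """
    values = {name: gc_content(seq) for name, seq in genes.items()}
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return []  # all genes share the same composition; nothing stands out
    return [name for name, gc in values.items() if abs(gc - mu) / sigma > z_cutoff]


# Toy genome: nine genes with balanced composition and one GC-only gene.
toy_genome = {f"gene{i}": "ATGCATGCATGC" for i in range(9)}
toy_genome["hgt_candidate"] = "GGCCGGCCGGCC"

flagged = flag_atypical_genes(toy_genome)
print(flagged)
```

Genes flagged this way would be removed from the initial sequence data set before ortholog candidates are grouped. A single composition statistic is a coarse signal, which fits the abstract's description of the later tree-based processes as further removing HGT genes.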

KEYWORDS: eukaryote; horizontal gene transfer; ortholog; out-paralog; phylogenetic analysis; prokaryote

PMID: 26782935
PMCID: PMC4779612
DOI: 10.1093/gbe/evw005
[Indexed for MEDLINE]
Free PMC Article
