PLoS One. 2014 Aug 19;9(8):e105015. doi: 10.1371/journal.pone.0105015. eCollection 2014.

Orthology detection combining clustering and synteny for very large datasets.

Author information

1 Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany.
2 Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Brasília, Brasil.
3 Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany.
4 Faculty of Mathematics and Computer Science, University of Leipzig, Leipzig, Germany.
5 Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany.
6 Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; The Santa Fe Institute, Santa Fe, New Mexico, United States of America; RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology, Leipzig, Germany.

Abstract

The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities because more exact approaches are computationally too expensive. Here we present PoFF, an extension of the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF substantially reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of Proteinortho, on which it is based, making the software applicable to very large datasets.
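
As a rough illustration of the adjacency idea mentioned in the abstract, the following Python sketch counts gene adjacencies conserved between two genomes with multiple linear chromosomes. It is not the FFAdj-MCS heuristic or any part of PoFF: the function names, the toy gene labels, and the simplifications (unsigned genes, each gene occurring once per genome, genes already grouped into shared families) are assumptions made only for this example.

```python
def adjacencies(chromosomes):
    """Collect unordered pairs of neighboring genes on linear chromosomes.

    `chromosomes` is a list of gene-label lists, one list per chromosome.
    Gene orientation is ignored here (a simplifying assumption).
    """
    pairs = set()
    for chrom in chromosomes:
        # Neighbors are consecutive genes; chromosome ends contribute nothing.
        for left, right in zip(chrom, chrom[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def conserved_adjacencies(genome_a, genome_b):
    """Count adjacencies present in both genomes (higher means more similar order)."""
    return len(adjacencies(genome_a) & adjacencies(genome_b))


# Hypothetical toy genomes: the same five genes, split differently
# across two linear chromosomes.
genome_a = [["g1", "g2", "g3"], ["g4", "g5"]]
genome_b = [["g1", "g2"], ["g3", "g4", "g5"]]

shared = conserved_adjacencies(genome_a, genome_b)
print(shared)
```

In this toy case the adjacencies g1-g2 and g4-g5 are shared, while g2-g3 and g3-g4 each occur in only one genome and act as breakpoints.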

PMID: 25137074
PMCID: PMC4138177
DOI: 10.1371/journal.pone.0105015
[Indexed for MEDLINE]
Free PMC Article