BMC Genomics. 2017 May 18;18(1):390. doi: 10.1186/s12864-017-3771-x.

OrthoFiller: utilising data from multiple species to improve the completeness of genome annotations.

Author information

1. Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
2. Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK. steven.kelly@plants.ox.ac.uk.

Abstract

BACKGROUND:

Complete and accurate annotation of sequenced genomes is of paramount importance to their utility and analysis. Differences in gene prediction pipelines mean that genome annotations for a species can differ considerably in the quality and quantity of their predicted genes. Furthermore, genes that are present in genome sequences sometimes fail to be detected by computational gene prediction methods. Erroneously unannotated genes can lead to oversights and inaccurate assertions in biological investigations, especially for smaller-scale genome projects, which rely heavily on computational prediction.

RESULTS:

Here we present OrthoFiller, a tool designed to address the problem of finding and adding such missing genes to genome annotations. OrthoFiller leverages information from multiple related species to identify those genes whose existence can be verified through comparison with known gene families, but which have not been predicted. By simulating missing gene annotations in real sequence datasets from both plants and fungi we demonstrate the accuracy and utility of OrthoFiller for finding missing genes and improving genome annotations. Furthermore, we show that applying OrthoFiller to existing "complete" genome annotations can identify and correct substantial numbers of erroneously missing genes in these two sets of species.
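The simulation-based evaluation described above can be illustrated with a minimal sketch. This is not OrthoFiller's code: the function names `simulate_missing` and `score_recovery` are hypothetical, and matching genes by identifier is a simplifying assumption (a real evaluation would more plausibly compare predicted and withheld gene models by genomic coordinates). The idea is to withhold a random fraction of known genes from an annotation, run a gene-finding tool on the reduced annotation, and measure how many withheld genes are recovered.

```python
import random


def simulate_missing(annotated_genes, fraction, seed=0):
    """Withhold a random fraction of gene IDs (hypothetical helper).

    Returns (kept, hidden): the reduced annotation and the withheld genes.
    """
    rng = random.Random(seed)
    genes = sorted(annotated_genes)  # sort so the sample is reproducible
    n_hidden = round(len(genes) * fraction)
    hidden = set(rng.sample(genes, n_hidden))
    kept = set(genes) - hidden
    return kept, hidden


def score_recovery(hidden, predicted):
    """Recall and precision of predicted gene IDs against withheld ones.

    Assumption: a prediction counts as correct only if its ID matches a
    withheld gene exactly.
    """
    true_positives = len(hidden & predicted)
    recall = true_positives / len(hidden) if hidden else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    genes = {f"gene{i}" for i in range(10)}
    kept, hidden = simulate_missing(genes, fraction=0.2)
    # Stand-in for a tool's output: one withheld gene plus one spurious call.
    predicted = {sorted(hidden)[0], "spurious1"}
    print(score_recovery(hidden, predicted))
```

Fixing the random seed keeps the withheld set reproducible, so different tools or parameter settings can be compared on the same simulated gaps.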

CONCLUSIONS:

We show that significant improvements in the completeness of genome annotations can be made by leveraging information from multiple species.

KEYWORDS:

Gene prediction; Genome annotation; Orthogroup; Orthology

PMID: 28521726
PMCID: PMC5437544
DOI: 10.1186/s12864-017-3771-x
[Indexed for MEDLINE]
Free PMC Article
