BMC Bioinformatics. 2018 Jan 30;19(1):26. doi: 10.1186/s12859-018-2026-4.

Inferring synteny between genome assemblies: a systematic evaluation.

Liu D1,2, Hunt M3,4, Tsai IJ5,6.

Author information

1 Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan.
2 Biodiversity Research Center, Academia Sinica, Taipei, Taiwan.
3 Nuffield Department of Clinical Medicine, Experimental Medicine Division, John Radcliffe Hospital, University of Oxford, Oxford, OX1 1NF, UK.
4 European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
5 Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan. ijtsai@gate.sinica.edu.tw.
6 Biodiversity Research Center, Academia Sinica, Taipei, Taiwan. ijtsai@gate.sinica.edu.tw.

Abstract

BACKGROUND:

Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis.

RESULTS:

We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency, together with a further demonstration of erroneous gene ontology enrichment tests, raises questions about the robustness of previous synteny analyses while gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference-guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if the true evolutionary relationship is overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous.
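To make the idea of fragmenting an assembly "to various extents" concrete, the following is a minimal sketch of one way it could be simulated. It is not the authors' procedure, which the abstract does not describe: the uniform random break points, the function names `fragment_sequence` and `fragment_assembly`, and the `_fragN` naming scheme are all assumptions made for illustration.

```python
import random


def fragment_sequence(sequence, n_breaks, rng):
    """Split one sequence at n_breaks random internal positions.

    Assumption: break points are drawn uniformly without replacement;
    a real study might instead target a specific fragment size or N50.
    """
    if n_breaks <= 0 or len(sequence) < 2:
        return [sequence]
    # A sequence of length L has only L - 1 internal cut positions.
    n_breaks = min(n_breaks, len(sequence) - 1)
    cuts = sorted(rng.sample(range(1, len(sequence)), n_breaks))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def fragment_assembly(assembly, n_breaks_per_sequence, seed=0):
    """Fragment every sequence in a {name: sequence} dict.

    Returns a new dict whose keys carry a hypothetical '_fragN' suffix.
    A fixed seed keeps the simulated fragmentation reproducible.
    """
    rng = random.Random(seed)
    fragmented = {}
    for name, seq in assembly.items():
        pieces = fragment_sequence(seq, n_breaks_per_sequence, rng)
        for i, piece in enumerate(pieces, start=1):
            fragmented[f"{name}_frag{i}"] = piece
    return fragmented
```

Raising `n_breaks_per_sequence` yields progressively more fragmented versions of the same assembly, so synteny can be inferred on each version and compared with the result for the intact sequences.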

CONCLUSIONS:

Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis. This emphasizes the importance of gold standard genomes to the scientific community, and it is a threshold that should be achieved given the current progress in sequencing technology.
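For readers unfamiliar with the metric, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below computes it from a list of scaffold lengths and checks it against the 1 Mb figure stated above. The names `n50`, `meets_threshold` and `MIN_N50` are illustrative and do not come from the paper.

```python
MIN_N50 = 1_000_000  # the 1 Mb threshold from the conclusions, in bases


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    combined length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def meets_threshold(lengths, minimum=MIN_N50):
    """True if the assembly's N50 is at least the given minimum."""
    return n50(lengths) >= minimum
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` is 8: the total is 32, and the two longest sequences already sum to 16, which is half of it.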

KEYWORDS:

Assembly quality; Comparative genomics; Genome synteny; Nematode genomes

PMID: 29382321
PMCID: PMC5791376
DOI: 10.1186/s12859-018-2026-4
[Indexed for MEDLINE]
Free PMC Article
