Genome Res. 2014 Dec;24(12):2077-89. doi: 10.1101/gr.174920.114. Epub 2014 Oct 1.

Alignathon: a competitive assessment of whole-genome alignment methods.

Author information

1. Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA; Biomolecular Engineering Department, University of California Santa Cruz, Santa Cruz, California 95064, USA;
2. School of Computer Science, McGill University, Montreal, QC H3A 0G4, Canada;
3. Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16801, USA;
4. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom;
5. Softberry Inc., Mount Kisco, New York 10549, USA;
6. Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA;
7. Department of Animal Biotechnology, Konkuk University, Seoul 143-701, Korea;
8. Centre For Genomic Regulation (CRG), 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Westfalian Wilhelms University, Institute of Evolution and Biodiversity, 48149 Muenster, Germany;
9. Centre For Genomic Regulation (CRG), 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; Institute of Human Genetics (IGH), UPR 1142, CNRS, Montpellier, France;
10. Centre For Genomic Regulation (CRG), 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain;
11. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA;
12. Department of Computer Science, Northern Illinois University, DeKalb, Illinois 60115, USA;
13. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom; The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, United Kingdom;
14. ithree Institute, University of Technology Sydney, NSW 2007, Australia;
15. Department of Bioengineering and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Illinois 61801, USA;
16. Department of Computer Science and the Donnelly Centre, University of Toronto, Toronto, ON M5S 3G4, Canada; Centre for Computational Medicine and the Genetics and Genome Biology Program, Hospital for Sick Children, Toronto, ON M5G 1X8, Canada;
17. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA; Lawrence Berkeley National Laboratory, Berkeley, California 94710, USA;
18. Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA; Biomolecular Engineering Department, University of California Santa Cruz, Santa Cruz, California 95064, USA; Howard Hughes Medical Institute, Chevy Chase, Maryland 20815-6789, USA.

Abstract

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: two were simulated and based on primate and mammalian phylogenies, and one comprised 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

PMID: 25273068
PMCID: PMC4248324
DOI: 10.1101/gr.174920.114
[Indexed for MEDLINE]
Free PMC Article
