Logo of narLink to Publisher's site
Nucleic Acids Res. Jul 1, 2010; 38(Web Server issue): W14–W18.
Published online May 3, 2010. doi:  10.1093/nar/gkq321
PMCID: PMC2896128

ALTER: program-oriented conversion of DNA and protein alignments

Abstract

ALTER is an open web-based tool to transform between different multiple sequence alignment formats. The originality of ALTER lies in the fact that it focuses on the specifications of mainstream alignment and analysis programs rather than on the conversion among more or less specific formats. In addition, ALTER is capable of identify and remove identical sequences during the transformation process. Besides its user-friendly environment, ALTER allows access to its functionalities in a programmatic way through a Representational State Transfer web service. ALTER’s front-end and its API are freely available at http://sing.ei.uvigo.es/ALTER/ and http://sing.ei.uvigo.es/ALTER/api/, respectively.

INTRODUCTION

Multiple sequence alignments (MSAs) are at the core of many bioinformatic analyses that benefit from the comparison of genomic sequences, from phylogenetic reconstruction to functional prediction (1,2). MSAs can be stored in a large variety of formats (e.g. FASTA, PIR, PHYLIP, NEXUS, etc.), and very often, researchers are obligated to transform between these in order to use different tools. Some conversion utilities have been extremely useful in this regard, the most popular being ReadSeq (http://iubio.bio.indiana.edu/soft/molbio/readseq/java/). Indeed, there are other tools developed mainly for other purposes that can also import and export aligments in several formats, like ReadAl/TrimAl (3), SeaView (4), Se-Al (http://tree.bio.ed.ac.uk/software/seal/) or even ClustalX2 (5), among others. Moreover, projects like BioPython (6) or BioPerl (7) also offer conversion capabilities.

However, the problem with most of these converters is that they—logically—focus on more or less flexible format specifications that are often violated by both developers and users. In fact, during the last years MSA’s formats have ‘evolved’ very much like the sequences they contain, with mutational events consisting of long names, extra spaces, additional carriage returns, etc. Thus, different applications often require or produce particular MSA formats that in fact do not completely fulfill the requirements of the ‘canonical’ formats, often complicating the use of different tools for the analysis of data. For example, ReadSeq and programs like PAML (8) or PAUP* (http://paup.csit.fsu.edu/) fail to read simple alignments produced by ClustalX2 in PHYLIP format. To alleviate these kind of problems, we introduce a web server called ALTER for the program-oriented—rather than format-oriented—conversion between DNA and protein MSA formats. ALTER is free and open to all and there is no login requirement.

FUNCIONALITY

ALTER was designed to accomplish two main objectives: (i) easily convert between MSA formats used by popular tools and (ii) collapse sequences to haplotypes (unique sequences). In order to perform these operations in an intuitive way, ALTER implements a straightforward workflow that easily guides the user through a four-step wizard in which the different options are automatically activated when the required information is available. In addition, ALTER provides an easy-to-follow on-line help as well as many sample MSA data for testing purposes.

Program workflow

The use of ALTER typically implies four simple steps: (i) format/program identification, (ii) data load, (iii) definition of conversion parameters and (iv) storage of the generated file (Figure 1).

Figure 1.
Schematic ALTER workflow. The user can select between different input alignment programs and formats, and obtain a MSA specifically formatted for a particular program.

The process of converting a given MSA in ALTER starts with the selection of the source program and/or the current format. If the user is not confident about this information, the server can try to auto detect the format of the input file.

Next, the user has to specify the operating system (OS) under which the input file was generated and upload it, or alternatively directly paste the data. In order to process the input MSA, ALTER first instantiates an appropriate sequence reader for both the input format and program. For each program/format pair, there is a specific parser generated from a formal grammar via JavaCC technology. Regardless of the possibility to reuse grammars among programs that utilize the same format, ALTER has been designed to be able to associate a different grammar for each program/format pair in order to tackle potential differences. If the user has selected the ‘auto detect’ option, a program-independent grammar is used instead. If there are syntax errors on the input sequences, the parser reports precise information about them and the process aborts.

Once the input MSA has been successfully read, ALTER can perform an optional step to identify redundant sequences and collapse them into haplotypes. Finally, an appropriate writer for the output program/format/OS is instantiated in order to generate the converted MSA, taking into account different parameters. These allow the user to (i) generate sequential or interleaved sequences (in NEXUS and PHYLIP formats), (ii) use lower case for residues, (iii) use match characters (‘.’) to indicate that the same residue is located at the same position of the first sequence and (iii) generate the sum of the number of residues at each sequence line (ALN format). In addition, the collapsing step can be configured to (i) treat gaps as missing data, (ii) consider missing data as differences between sequences and (iii) define a maximum limit of differences to collapse sequences. It is also possible to generate a program-independent conversion using only the canonical format specification.

Every time a new conversion job finished without errors, the output file is displayed and a download button is activated. All the relevant information related to the process of loading and recognizing the input MSA is automatically categorized (info, error, warning) and displayed to the final user by using informative log panels (Figure 2).

Figure 2.
Example of a MSA conversion in ALTER. The ‘Info panel’ in the log area shows information related with the process carried out. Help, support for feedback and contact information options are available from the upper left area. Source code ...

Supported MSA formats/programs

ALTER supports a variety of specific MSA formats provided by popular alignment tools and accepted by a variety of analysis programs. Currently, the focus is on molecular evolution, but different tools can be easily added on request. The list of programs supported include alignment, alignment filtering, sequence edition, model selection, phylogenetic, network and population genetics software (Table 1).

Table 1.
List programs/formats supported by ALTER

Web services

In addition to the functionality provided by the end user front-end, ALTER also implements a web service that allows developers to transform multiple alignment sequences directly in ALTER within their own algorithms and programs (http://sing.ei.uvigo.es/ALTER/api/). Essentially, ALTER’s API offers a unique convert function with multiple parameters plus some metadata functions giving information about the formats and options currently supported. Table 2 summarizes the API functionality.

Table 2.
Core functionality provided by ALTER’s RESTful API

Supported platforms

ALTER runs on a standard Tomcat 5.5 Web application server. Currently, ALTER has been successfully tested in Internet Explorer 7, Firefox 3, Opera 9.62 and Safari 3 browsers working on Windows XP/Vista, Ubuntu Linux 8.04 version and Mac OSX 10.5 of Intel architecture.

IMPLEMENTATION

ALTER is implemented as an AJAX-enabled web application programmed in the J2SE 1.5 Java language. The ZK development framework (http://www.zkoss.org) was used to construct the user interface and to give support to JavaCC for parsing input MSA. JavaCC is a parser and a lexical analyzer generator, that is, it reads a formal description of a language (grammar) and generates code to parse instances of it. It can be see as the Java counterpart of the Lex/Flex and Yacc/Bison tools. Using JavaCC it is possible to (i) isolate the specific sequence format description in independent grammar files and (ii) generate precise error messages during parsing (9).

ALTER also implements a REST-based programming interface. Like any RESTful web service, operations are performed via web queries with a well-defined URL structure. Currently, the server gives access to the main sequence conversion functionality as well as to a set of reflective functions intended to get updated information about the supported programs and formats. This server module was implemented following the JAX-RS 1.0 (Java API for RESTful Web Services) by using the implementation found in the Apache CXF library.

CONCLUSIONS

Current MSA conversion tools understandably focus on the translation among ‘canonical’ formats, but in many instances are not of much help for users, which are interested in working with particular programs that use idiosyncratic format variations. In order to alleviate this drawback, we introduce a web server called ALTER for the program-oriented—rather than format-oriented—conversion between different DNA and protein MSA formats. In addition, ALTER is able to ‘collapse’ sequences to haplotypes—unique sequences—indicating which sequence corresponds to which haplotype. Eliminating this redundancy can be very helpful, for example, to speed up phylogenetic analyses.

FUNDING

European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.); Spanish Ministry of Science and Education (BFU2009-08611 to D.P.); Xunta de Galicia (PGIDIT07PXIB310202PR to D.P.); INBIOMED initiative, Angeles Alvariño fellowship (to D.G-P.); University of Vigo (09VIB10 to F.F-.R.). Funding for open access charge: European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.).

Conflict of interest statement. None declared.

Supplementary Material

[Supplementary Data]

ACKNOWLEDGEMENTS

The authors want to thank all the beta testers, especially those from the Bioinformatics and Molecular Evolution group at the University of Vigo.

REFERENCES

1. Posada D, editor. Bioinformatics for DNA sequence analysis. New York, NY, USA: Humana Press; 2009.
2. Kemena C, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics. 2009;25:2455–2465. [PMC free article] [PubMed]
3. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. [PMC free article] [PubMed]
4. Gouy M, Guindon S, Gascuel O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 2010;27:221–224. [PubMed]
5. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–2948. [PubMed]
6. Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. [PMC free article] [PubMed]
7. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al. The Bioperl toolkit: perl modules for the life sciences. Genome Res. 2002;12:1611–1618. [PMC free article] [PubMed]
8. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. [PubMed]
9. Metsker SJ. Building Parsers With Java. Boston, MA, USA: Addison-Wesley Professional; 2001.
10. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. [PMC free article] [PubMed]
11. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. [PMC free article] [PubMed]
12. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000;302:205–217. [PubMed]
13. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. [PMC free article] [PubMed]
14. Do CB, Mahabhashyam MS, Brudno M, Batzoglou S. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res. 2005;15:330–340. [PMC free article] [PubMed]
15. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 2000;17:540–552. [PubMed]
16. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 1999;41:95–98.
17. Posada D. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 2008;25:1253–1256. [PubMed]
18. Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21:2104–2105. [PubMed]
19. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 2007;24:1596–1599. [PubMed]
20. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. [PubMed]
21. Swofford DL. PAUP*: Phylogenetic analysis using parsimony (*and Other Methods) 2000 Sunderland, MA, USA.
22. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 2003;52:696–704. [PubMed]
23. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. [PubMed]
24. Huson DH. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 1998;14:68–73. [PubMed]
25. Clement M, Posada D, Crandall KA. TCS: a computer program to estimate gene genealogies. Mol. Ecol. 2000;9:1657–1659. [PubMed]
26. Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25:1451–1452. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • PubMed
    PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...