Bioinformation. 2008; 2(10): 452–455.
Published online 2008 Jul 31.
PMCID: PMC2561165

A comparison of MSA tools


Multiple sequence alignment (MSA) is essential in phylogenetic, evolutionary and functional analysis. Several MSA tools are available in the literature. Here, we compare eight MSA tools (ClustalX, Align-m, T-Coffee, SAGA, ProbCons, MAFFT, MUSCLE and DIALIGN) by analysing the phylogenetic trees built from their alignments of two datasets. Results show that no single MSA tool consistently outperforms the rest in producing reliable phylogenetic trees.

Keywords: multiple sequence alignment methods, phylogenetic trees, Robinson-Foulds distance, Neighbor-Joining method


Several multiple sequence alignment (MSA) methods are available in the literature. McClure and colleagues tested the ability of MSA methods to identify short motifs in four datasets of homologous proteins [1]. Henikoff and Henikoff evaluated the ability of multiple alignments to identify new family members in database searches [2]. Thompson and colleagues presented a systematic comparison of several alignment programs using the BaliBASE reference alignments as test cases [3]. Despite these studies, which alignment method produces the phylogenetic test tree (TT) closest to the reference tree (RT) remains an open question. Multiple sequence alignment is a crucial step in phylogenetic analysis, especially for highly divergent datasets (<30% sequence identity), which are difficult to align. Different methods produce non-identical alignments, leading to different phylogenetic trees for a single dataset. Here, we evaluate eight alignment methods, namely ClustalX 1.81 [4], Align-m 2.3 [5], T-Coffee 3.93 [6], SAGA 0.95 [7], ProbCons 1.08 [8], MAFFT 5.743 [9], MUSCLE 3.6 [10] and DIALIGN 2.2.1 [11], for their ability to generate phylogenetic TTs that match the RTs.



We used BaliBASE [12] and Homstrad [13] as sources of reference alignments. Each reference alignment contains more than four sequences.

Dataset #1: DS-BB

We selected 134 reference alignments from BaliBASE. This dataset, hereafter designated DS-BB, is divided into three categories according to the percent sequence identity (ID) between each pair of sequences in the reference alignment. Category 1: BB_10 contains 86 reference alignments at 0-10% ID. Category 2: BB_20 contains 29 reference alignments at 10-20% ID. Category 3: BB_30 contains 19 reference alignments at 20-30% ID.
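
How percent identity is measured when binning alignments is not specified here. As a minimal sketch, assuming identity is computed over columns where neither sequence has a gap and then averaged over all sequence pairs, an alignment could be assigned to a category as follows (all function names are hypothetical):

```python
from itertools import combinations
from typing import List, Optional


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows, over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0  # no comparable columns: treated as 0% identity (arbitrary choice)
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def mean_pairwise_identity(rows: List[str]) -> float:
    """Average percent identity over all pairs of rows (an assumed summary statistic)."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


def identity_category(rows: List[str], prefix: str = "BB") -> Optional[str]:
    """Hypothetical binning into <prefix>_10, _20 or _30; None above 30% identity."""
    identity = mean_pairwise_identity(rows)
    for upper in (10, 20, 30):
        if identity <= upper:
            return f"{prefix}_{upper}"
    return None
```

Passing `prefix="HOM"` applies the same bins to Homstrad alignments. Requiring every pair, rather than the average, to fall within a range would be an equally plausible reading of the criterion.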

Dataset #2: DS-HOM

We downloaded 218 reference alignments from Homstrad. This dataset is hereafter designated DS-HOM. It is divided into three categories in the same way as DS-BB. Category 1: HOM_10 contains 141 reference alignments. Category 2: HOM_20 contains 54 reference alignments. Category 3: HOM_30 contains 23 reference alignments.

Comparison process

The eight alignment methods are run on the DS-BB and DS-HOM datasets with default parameters. Tests were performed on a 1.6-GHz Intel Pentium M with 512 MB RAM. Each method generates 352 test alignments, 134 from DS-BB and 218 from DS-HOM, giving a total of 2816 (352 × 8) test alignments. The 352 test alignments of each method and the 352 reference alignments are given as input to the Neighbor-Joining method of Saitou and Nei [14] to estimate the phylogenetic TTs and RTs, respectively. The 352 TTs of each alignment method are then compared with the corresponding 352 RTs.
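
The Neighbor-Joining step can be illustrated with the sketch below. It is not the authors' pipeline: the distance used before Neighbor-Joining is not stated here, so the sketch assumes uncorrected p-distances over ungapped columns, and it returns only the tree topology (branch lengths are omitted) because the Robinson-Foulds comparison depends on topology alone. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

Dist = Dict[str, Dict[str, float]]


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned rows, ignoring gapped columns."""
    cols = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not cols:
        return 1.0  # no overlap: treated as maximally distant (arbitrary choice)
    return sum(x != y for x, y in cols) / len(cols)


def distance_matrix(names: List[str], rows: List[str]) -> Dist:
    """Symmetric p-distance matrix as a dict of dicts (assumed distance measure)."""
    d: Dist = {n: {} for n in names}
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        d[a][b] = d[b][a] = p_distance(rows[i], rows[j])
    return d


def neighbor_joining(dist: Dist) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted Newick topology."""
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    while len(d) > 3:
        nodes = list(d)
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]  # Q-criterion
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        u = f"({a},{b})"  # new internal node, labelled by its subtree
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        for x in (a, b):
            del d[x]
        for row in d.values():
            row.pop(a, None)
            row.pop(b, None)
    return "(" + ",".join(d) + ");"
```

A call such as `neighbor_joining(distance_matrix(names, rows))` turns one alignment into one tree; ties in the Q-criterion are broken by taking the first pair found.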

The Robinson-Foulds distance (T_dRF) implemented in PAL [15] is used to compare each phylogenetic TT with its corresponding RT. T_dRF defines the distance between two trees as the minimum number of transformations required to obtain the topology of one tree from that of the other (equation 1 in the supplementary material). To evaluate the performance of each alignment method, we defined a score, dRF(M), that counts only the TTs identical to their RTs (equation 2 in the supplementary material). This score gives the proportion of identical TTs produced by a method on each dataset category, expressed as a percentage. Higher values of dRF(M) indicate better performance.
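
The Robinson-Foulds distance can be computed by comparing the non-trivial bipartitions (splits) that the internal edges of two unrooted trees induce on the leaf set. The sketch below is not the PAL implementation: it counts the splits present in exactly one tree (some implementations halve this value) and parses only topology-only Newick strings. Because equation 2 is in the supplementary material and not reproduced here, the hypothetical `drf_score` assumes dRF(M) is the percentage of TTs at distance zero from their RTs within one category.

```python
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def parse_newick(text: str) -> Tree:
    """Parse topology-only Newick such as '((a,b),(c,(d,e)));' (no branch lengths)."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if s[pos] == "(":
            pos += 1
            children = [node()]
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]  # leaf name

    return node()


def splits(newick: str) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return the leaf set and the non-trivial splits of an unrooted tree."""
    clades: List[FrozenSet[str]] = []

    def collect(t: Tree) -> FrozenSet[str]:
        if isinstance(t, str):
            return frozenset([t])
        clade = frozenset().union(*(collect(child) for child in t))
        clades.append(clade)
        return clade

    taxa = collect(parse_newick(newick))
    anchor = min(taxa)  # each split is stored as the side that excludes this leaf
    result: Set[FrozenSet[str]] = set()
    for clade in clades:
        side = taxa - clade if anchor in clade else clade
        if 2 <= len(side) <= len(taxa) - 2:  # drop trivial splits
            result.add(side)
    return taxa, result


def robinson_foulds(tree1: str, tree2: str) -> int:
    """Number of splits found in exactly one of the two trees."""
    taxa1, s1 = splits(tree1)
    taxa2, s2 = splits(tree2)
    if taxa1 != taxa2:
        raise ValueError("trees must share the same leaf set")
    return len(s1 ^ s2)


def drf_score(test_trees: List[str], ref_trees: List[str]) -> float:
    """Hypothetical dRF(M): percentage of test trees identical to their reference trees."""
    if not ref_trees or len(test_trees) != len(ref_trees):
        raise ValueError("need equally long, non-empty lists of trees")
    identical = sum(robinson_foulds(t, r) == 0 for t, r in zip(test_trees, ref_trees))
    return 100.0 * identical / len(ref_trees)
```

Storing each split as the side that excludes a fixed leaf makes the comparison independent of where a Newick string happens to place its root.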

Alignment quality assessment

We used the sum-of-pairs (SP) score implemented in the BaliBASE scoring scheme to estimate the alignment quality of each method. The SP score measures the extent to which a method succeeds in aligning some or all of the sequences in the alignment. The aim is to determine whether the alignment quality of a given method affects the reliability of its phylogenetic TTs.
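
In the sketch below, the SP score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. It simplifies the BaliBASE scheme under stated assumptions: every column is scored, whereas BaliBASE can restrict scoring to annotated core blocks, and the test and reference must list the same sequences in the same order. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int, int, int]  # (seq_i, residue_i, seq_j, residue_j) with seq_i < seq_j


def aligned_pairs(rows: List[str], gap: str = "-") -> Set[Pair]:
    """Residue pairs placed in the same column, indexed by ungapped residue position."""
    if any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("aligned rows must have equal length")
    counters = [0] * len(rows)  # next residue index in each ungapped sequence
    pairs: Set[Pair] = set()
    for col in range(len(rows[0])):
        present = []
        for s, row in enumerate(rows):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        for (s1, r1), (s2, r2) in combinations(present, 2):
            pairs.add((s1, r1, s2, r2))
    return pairs


def sp_score(test_rows: List[str], ref_rows: List[str], gap: str = "-") -> float:
    """Fraction of reference residue pairs reproduced by the test alignment (0 to 1)."""
    if [r.replace(gap, "") for r in test_rows] != [r.replace(gap, "") for r in ref_rows]:
        raise ValueError("test and reference must contain the same sequences in order")
    ref = aligned_pairs(ref_rows, gap)
    if not ref:
        raise ValueError("reference alignment has no aligned residue pairs")
    return len(aligned_pairs(test_rows, gap) & ref) / len(ref)
```

Indexing residues by their position in the ungapped sequence makes the score independent of where gaps are placed, so only the pairing of residues matters.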


Biologists use MSA as a first step in phylogenetic analysis. A number of sequence alignment tools are available on the internet. However, choosing a specific tool is not trivial for a biologist who is not an expert in bioinformatics. Many comparison studies of multiple alignment methods are available [1-3], but these studies do not address phylogenetic analysis. Here, we evaluated eight MSA tools by comparing their phylogenetic TTs. We used the Robinson-Foulds distance to compare the TTs of each alignment method with the RTs, and derived the dRF(M) metric to estimate the percentage of identical TTs generated by each alignment method on each category of the two datasets (DS-BB and DS-HOM). Figure 1 shows the dRF(M) scores of all eight methods. The lower the sequence identity of a category in DS-BB and DS-HOM, the lower the percentage of identical TTs. All the methods show similar trends in dRF(M) scores. However, on categories BB_20 and BB_30 of the DS-BB dataset, MUSCLE gives a higher percentage of identical TTs than all the other methods. MUSCLE also performs better than the other methods on categories HOM_10 and HOM_30 of the DS-HOM dataset.

Figure 1
dRF(M) scores of the eight alignment methods for datasets DS-BB and DS-HOM. Markers indicate individual data values; values are percentages. All the methods show similar results. MUSCLE gives slightly higher performance ...

We performed a Wilcoxon rank test for all pairs of methods (Table 1 in the supplementary material) to assess the significance of the differences in the overall Robinson-Foulds distances (T_dRF) between test and reference trees. The results suggest that the differences between methods are not statistically significant: each method produces phylogenetic TTs as reliable as those of ProbCons, which Do and colleagues [16] describe as the best-performing method for generating accurate multiple alignments.
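
The text does not state which Wilcoxon variant was used. Because every pair of methods is scored on the same alignments, the sketch below assumes the paired signed-rank test with a normal approximation and a tie correction; the input lists would hold the per-alignment T_dRF values of two methods. It is illustrative only, and an established implementation such as `scipy.stats.wilcoxon`, which also offers exact p-values for small samples, would be preferable in practice. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, two-sided p) for paired samples via the normal approximation."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_correction = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        t = j - i + 1  # size of this group of tied |differences|
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        tie_correction += (t ** 3 - t) / 48
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_correction
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return w_plus, p
```

With hundreds of paired tree distances per comparison the normal approximation is usually adequate; for small categories an exact test would be safer.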

Figure 2 shows the SP scores of all the methods on each category of the DS-BB and DS-HOM datasets. ProbCons achieves the best performance on all categories of both datasets. The significance of the differences in overall SP scores, assessed with the Wilcoxon rank test for all pairs of programs, is given in Table 2 (supplementary material). The differences between methods are significant, with ProbCons showing the highest alignment quality. Together, Table 1 and Table 2 (see supplementary material) suggest that differences in alignment quality among the methods do not strongly affect the reliability of their phylogenetic TTs. Notably, all of the methods produce TTs as good as those of ProbCons.

Figure 2
SP scores of the eight alignment methods for datasets DS-BB and DS-HOM. Markers indicate individual data values. ProbCons shows higher SP scores than all the other methods on each category of the DS-BB and DS-HOM datasets.


We compared the phylogenetic TTs produced by eight MSA methods on three categories of two sequence datasets. All methods perform equally well in producing reliable phylogenetic TTs. Despite significant differences in the quality of the alignments produced by the different methods, the statistical difference between the phylogenetic TTs they generate is minimal. Other distances exist for comparing trees, such as the nearest-neighbor interchange distance [17]. Applying such metrics to larger datasets would provide further insight into MSA performance on divergent datasets.

Supplementary material

Data 1:


1. McClure MA, et al. Mol Biol Evol. 1994;11:4. [PubMed]
2. Henikoff S, Henikoff JG. Protein Sci. 1997;6:3. [PMC free article] [PubMed]
3. Thompson JD, et al. Bioinformatics. 1999;15:1. [PubMed]
14. Saitou N, Nei M. Mol Biol Evol. 1987;4:4. [PubMed]
15. Drummond A, Strimmer K. Bioinformatics. 2001;17:7. [PubMed]
16. Do CB, et al. Genome Res. 2005;15:2. [PMC free article] [PubMed]
17. Waterman MS, Smith TF. J Theor Biol. 1978;73:4. [PubMed]

Articles from Bioinformation are provided here courtesy of Biomedical Informatics Publishing Group