Bioinformatics. 2018 Jul 15;34(14):2490-2492. doi: 10.1093/bioinformatics/bty121.

Parallelization of MAFFT for large-scale multiple sequence alignments.

Nakamura T (1,2), Yamada KD (2,3), Tomii K (1,2,4,5), Katoh K (2,6).

Author information

1. Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan.
2. Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
3. Graduate School of Information Sciences, Tohoku University, Sendai, Japan.
4. Biotechnology Research Institute for Drug Discovery (BRD), AIST, Tokyo, Japan.
5. AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), Tokyo, Japan.
6. Research Institute for Microbial Diseases, Osaka University, Suita, Japan.

Abstract

Summary:

We report an update for the MAFFT multiple sequence alignment program to enable parallel calculation of large numbers of sequences. The G-INS-1 option of MAFFT was recently reported to have higher accuracy than other methods for large data, but this method has been impractical for most large-scale analyses, due to the requirement of large computational resources. We introduce a scalable variant, G-large-INS-1, which has equivalent accuracy to G-INS-1 and is applicable to 50 000 or more sequences.

Availability and implementation:

This feature is available in MAFFT versions 7.355 or later at https://mafft.cbrc.jp/alignment/software/mpi.html.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 29506019
PMCID: PMC6041967
DOI: 10.1093/bioinformatics/bty121
[Indexed for MEDLINE]
Free PMC Article
