Copyright © 2008 The Author(s)

Malin: maximum likelihood analysis of intron evolution in eukaryotes

¹Department of Computer Science and Operations Research, University of Montréal, Montréal, Québec, Canada and ²Collegium Budapest Institute for Advanced Study, Budapest, Hungary

Associate Editor: Martin Bishop

Received February 21, 2008; Revised April 14, 2008; Accepted May 6, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Summary: Malin is a software package for the analysis of eukaryotic gene structure evolution. It provides a graphical user interface for various tasks commonly used to infer the evolution of exon–intron structure in protein-coding orthologs. Implemented tasks include the identification of conserved homologous intron sites in protein alignments, as well as the estimation of ancestral intron content, lineage-specific intron losses and gains. Estimates are computed either with parsimony, or with a probabilistic model that incorporates rate variation across lineages and intron sites.

Availability: Malin is available as a stand-alone Java application, as well as an application bundle for MacOS X, at the website http://www.iro.umontreal.ca/~csuros/introns/malin/. The software is distributed under a BSD-style license.

Contact: csuros/at/iro.umontreal.ca

1 INTRODUCTION

An idiosyncratic feature of eukaryotic gene organization is that the genomic sequences of protein-coding genes are frequently interrupted by non-coding sequences, called introns, which are excised (spliced) from the transcripts prior to translation. Fundamental constituents of the splicing machinery are present throughout main eukaryotic lineages (Collins and Penny, 2005).
Intron-containing genes are spread across diverse eukaryotic phyla, and orthologous genes often have similar exon–intron organization even at large evolutionary distances (Rogozin et al., 2003). Accordingly, it is fairly certain that splicing was already present in the last common ancestor of eukaryotes (Rodríguez-Trelles et al., 2006). Gene structures changed to different extents in eukaryotic lineages (Roy and Gilbert, 2006).

Whole-genome sequencing projects have made it possible to perform large-scale phylogenetic analyses that scrutinize the evolution of exon–intron organization. Following the pioneering study by Rogozin et al. (2003), numerous results have appeared (Carmel et al., 2005, 2007; Csűrös, 2005; Csűrös et al., 2007, 2008; Nguyen et al., 2005; Nielsen et al., 2004; Roy and Gilbert, 2005; Roy and Penny, 2006; Stajich et al., 2007; Sullivan et al., 2006) inferring lineage- and gene-specific features of gene structure evolution, and often describing methodological novelties. This note aims to introduce Malin, a software package developed for the analysis of eukaryotic gene structure evolution.

2 FEATURES

Malin provides a graphical user interface for various tasks commonly used to infer the evolution of exon–intron structure in multiple protein-coding ortholog sets (Fig. 1).
Figure 1

Malin uses a rates-across-sites Markov model for intron evolution, with branch-specific gain and loss rates. If no rate variation is assumed across the sites, then every branch has a single gain rate and a single loss rate, with corresponding gain and loss probabilities. Briefly, an intron is lost on an edge of length t with probability $\frac{\mu}{\lambda+\mu}\left(1-e^{-(\lambda+\mu)t}\right)$, where λ and μ are the gain and loss rates; a new intron appears in a previously unoccupied site with probability $\frac{\lambda}{\lambda+\mu}\left(1-e^{-(\lambda+\mu)t}\right)$. The constant rate model (Csűrös et al., 2007) is completely specified by the branch-specific gain/loss rates, and the probability with which intron sites are occupied at the root. The rate variation model (Csűrös et al., 2008) assumes that intron sites belong to discrete rate categories. Each site category is defined by a pair of loss and gain rate factors (α,β), so that a site in that category evolves with loss rate μα and gain rate λβ on an edge whose prototypical rates are μ and λ. Malin optimizes rate factors, and can analyze the same dataset with different models simultaneously.

Malin is written entirely in Java. It can be used on any computer platform with a Java Runtime Environment (implementing J2SE 1.5 or higher), including Microsoft Windows, MacOS X and Linux. Malin is also available as an integrated application on MacOS X. The software is distributed with test data and a detailed User's Guide.

Input files follow commonly used formats: Newick format for the possibly multifurcating species phylogeny, Fasta format for alignments and the syntax used by Rogozin et al. (2003) for intron tables. Intron sites are specified in Fasta headers. Analysis results can be exported into tab-delimited text files.

The software implements previously described computational innovations (Csűrös et al., 2007, 2008), including rate optimization, posterior predictions, fast evaluation of the likelihood function and estimation of statistical confidence through bootstrapping. Malin provides a feature-rich graphical user interface for the analysis tasks.
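To make the two-state gain/loss model of this section concrete, the following is a minimal Python sketch of the transition probabilities on a single edge, assuming the standard two-state continuous-time Markov chain with gain rate λ and loss rate μ. It is an illustration only and is not taken from Malin's Java source; the function names `transition_matrix` and `category_matrix` are hypothetical.

```python
import math


def transition_matrix(gain_rate, loss_rate, t):
    """Two-state (0 = no intron, 1 = intron) transition matrix on an edge.

    Assumes a continuous-time Markov chain with gain rate lambda and
    loss rate mu acting for time t. Entry [i][j] is the probability of
    ending in state j given that the edge starts in state i.
    """
    total = gain_rate + loss_rate
    if total == 0.0:
        # No gains and no losses: the site never changes state.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = 1.0 - math.exp(-total * t)
    p_gain = gain_rate / total * decay  # 0 -> 1
    p_loss = loss_rate / total * decay  # 1 -> 0
    return [[1.0 - p_gain, p_gain], [p_loss, 1.0 - p_loss]]


def category_matrix(gain_rate, loss_rate, t, alpha, beta):
    """Transition matrix for a site category with rate factors (alpha, beta).

    Following the rate variation model in the text, the loss rate is
    scaled by alpha and the gain rate by beta.
    """
    return transition_matrix(gain_rate * beta, loss_rate * alpha, t)


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from any dataset.
    P = transition_matrix(gain_rate=0.2, loss_rate=0.8, t=0.5)
    print("P(gain) =", P[0][1], "P(loss) =", P[1][0])
```

With rate factors (α,β) = (1,1), `category_matrix` reduces to the constant rate case. As t grows, the gain and loss probabilities approach λ/(λ+μ) and μ/(λ+μ), the stationary occupancy probabilities of the chain.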
Ideally, Malin will enable researchers to conduct phylogenetic gene structure analysis with the same ease that is currently available for molecular sequences.

ACKNOWLEDGEMENTS

I am grateful to Péter Csűrös for help with the software integration in Microsoft Windows. I am greatly indebted to Liran Carmel, Eugene Koonin, Jacek Majewski, Igor Rogozin and Scott Roy for helpful advice and discussions about intron evolution.

Funding: This research project has been supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

Conflict of Interest: none declared.

REFERENCES