From: Koonin, Eugene (NIH/NLM/NCBI) [E]
Sent: Friday, December 30, 2005 4:45 PM
To: NLM/NCBI List ncbi-seminar
Cc: Miklós Csürös
Subject: seminar

Wednesday, Jan 4, 2006, 11 AM
Bldg. 38A, B2 conference room

Miklós Csürös
University of Montreal, Quebec, Canada

Probabilistic gain-loss models for intron and gene content evolution

I will describe two probabilistic models with the accompanying likelihood algorithms for the analysis of intron evolution and the evolution of gene family size.

Recently completed genome sequences have been used for comprehensive analyses of exon-intron organization in orthologous genes of diverse organisms. Large sets of intron presence-absence data require rigorous theoretical frameworks in which different hypotheses can be compared and validated. I describe a probabilistic model for intron gains and losses along an evolutionary tree. The model parameters are estimated using maximum likelihood. I propose a method for estimating the number of introns lost or unobserved in all extant organisms in a study, and show how to calculate counts of intron gains and losses along the branches by using posterior probabilities. The methods are used to analyze the most comprehensive intron data set currently available, consisting of 7236 intron sites from eight eukaryotic organisms. The analysis shows a dynamic history with frequent intron losses and gains, and ancestral organisms that were fairly intron-rich, although not as intron-rich as previously postulated.

I introduce a Markov model for the evolution of a gene family along a phylogeny. This work represents the first tractable probabilistic model that simultaneously handles the three main mechanisms that shape gene content: horizontal gene transfer, gene duplication, and gene loss. (The model assumes a linear birth-death-immigration process along each edge.)
The likelihood for the changes in the size of a gene family across different organisms can be calculated very efficiently, in time quadratic in the total number of homologs. I illustrate the model by an application to the evolution of gene content in Proteobacteria using the COG (Clusters of Orthologous Groups) database.

References:

M. Csuros. "Likely scenarios of intron evolution", 3rd RECOMB Satellite Workshop on Comparative Genomics. DOI: http://dx.doi.org/10.1007/11554714_5
http://www.iro.umontreal.ca/~csuros/papers/mle-introns.pdf

M. Csuros and I. Miklos. "A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer." To appear at RECOMB 2006 (Conf. Research in Computational Molecular Biology).
http://www.iro.umontreal.ca/~csuros/papers/gld.pdf

Eugene V. Koonin, PhD, Senior Investigator
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
Bethesda, MD 20894
Telephone 301-435-5913 (office); 301-233-7294 (cellular)
Fax 301-435-7794