From: Spouge, John (NIH/NLM/NCBI) [E]
Sent: Monday, February 13, 2006 2:37 PM
To: NLM/NCBI List ncbi-seminar
Subject: CBB Seminar 11:00 Tue Feb 14 in B2 Library

GLOBAL (GLObal Blocks Aligned Locally)
A General Algorithm (with p-value) for Sequence Classification

The Needleman-Wunsch (1970) algorithm for global alignment could well be considered the first bioinformatics paper. Despite the obvious importance of global alignment methods, in bioinformatics it is common knowledge that there is no global alignment method with an associated p-value.

This talk presents GLOBAL, a general method for combining local alignments into a global alignment. Because GLOBAL is based on local alignments, it is amenable to BLAST heuristics in database searches, which many other methods (e.g., hidden Markov models) are not. The present version of GLOBAL is tailored specifically for use in the CD database, which models protein domains as blocks, separated by gaps of variable length. The blocks represent conserved secondary sequence elements in domains; they are separated by gap regions, which are more mutable and sometimes contain deletions.

I present the basis of the GLOBAL p-value calculation, along with simulations indicating the accuracy of the p-value. The present p-value calculation involves a generalization of Staden's method for computing the distribution of scores from position-specific scoring matrices. The generalized Staden's method has some independent interest.

At present, early evaluation of GLOBAL against rpsBLAST (reverse-position specific BLAST) and HMMer (Hidden Markov Models) suggests retrieval efficacies are in the order rpsBLAST < GLOBAL < HMMer. There is, however, a great deal of unexplored flexibility in GLOBAL. Maricel Kann will present next week specifically on GLOBAL's retrieval performance.

------------------------------
John L. Spouge
NCBI, NLM, NIH
Building 38A, Room 6N 603
Lister Hill Center, NLM
Bethesda, Maryland 20894
Email: spouge@nih.gov
Phone: +1 (301) 402-9310
Fax: +1 (301) 480-2288
------------------------------