From: Park, Yonil (NIH/NLM/NCBI) [E]
Sent: Thursday, February 12, 2009 5:13 PM
To: NLM/NCBI List ncbi-seminar
Subject: CBB seminar, Feb. 17, 11AM

Time: Tuesday, February 17, 2009, 11AM
Location: NCBI Library, B2 floor, Building 38A
Speaker: Yonil Park
Title: The Problem of Spurious Flanks in Sequence Alignment

Pairwise sequence alignment is a ubiquitous tool for learning about the evolution and function of DNA, RNA, and protein sequences. It is therefore essential to avoid spurious alignments, and statistical techniques have been developed to eliminate alignments that arise wholly by chance. However, it is not known how to avoid alignments that are partly spurious, and it is not even known how often partly spurious alignments occur.

In this talk, I show that some commonly used alignment scoring schemes are severely prone to over-extension, often by hundreds of bp/aa. For example, spurious extensions likely comprise more than 18% of the human-fugu genome alignment in the UCSC genome database. I provide guidance on choosing appropriate scoring schemes by showing the tradeoff between over-alignment and under-alignment. Finally, I provide a simple over-alignment p-value, which can identify spurious alignment flanks and help us avoid making false inferences about evolution and function from spurious extensions of alignments.

FLANK, software for estimating over-alignment parameters, is available at
http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html/index/software.html#FLANK

----------
Yonil Park
NCBI/NLM/NIH
Bldg 38A, Room 6N611N
8600 Rockville Pike
Bethesda, MD 20894
Voice: 301-402-1438
Fax: 301-480-2288