Scheduled Seminars on 10/27/2009

David Hurwitz at 11:00

An Update on CDTree
The Conserved Domain Database (CDD) is an NCBI resource composed of ancient, conserved protein domains (CDs). In-house biological experts edit and annotate the CDs to identify and record their structurally and functionally important regions. The CDs are accompanied by written summaries and by links to Entrez PubMed and Entrez Bookshelf, which give access to research papers and to textbook chapters or figures relevant to the domain. Investigators who search CDD with novel protein sequences, perhaps even proteins of unknown structure and function, can therefore learn about their proteins by inference from the annotated domains.

CDTree is a software application that CDD curators use to organize, edit, and annotate the collection of CDs in CDD. In this talk, I will discuss new tools in CDTree that automate some of the tasks of curating a protein domain. In particular, I will discuss a fast refiner, tailored for the block model of a CD, that improves the model's multiple sequence alignment. I will present some results comparing the refiner to an earlier version, and discuss its effectiveness at bringing alignments into conformity with structural alignments in VAST. This work suggests some new directions for more automated tools that will relieve CDD curators of some of the more mundane steps of curating CDs. Another new tool I will mention is a method for automatic detection of sequence outliers that may not be suitable for a CD.

