CDTree: A Software Tool for Analyzing Protein Domains

David Hurwitz and John Jackson
June 19, 2007, 11am
Bldg 38A, B2 library

CDTree is a software tool that has been used at the NCBI for more than four years and was released to the public half a year ago. It is used for analyzing and classifying protein domains, and it has been an important tool in creating NCBI’s Conserved Domain Database (CDD).

CDD is a public resource of ancient conserved protein domains. While the number of known proteins is already in the millions, the number of domain families is estimated to be around 10,000 or fewer. The set of domains in CDD is large enough to account for most protein structure and function, but small enough to allow inspection, editing, and annotation of each domain. In this sense, the CDD project is an attempt to create a comprehensive database of protein structure and function.

The domains in CDD are organized into hierarchies of conserved domains (CDs) related by common evolutionary descent, which gives a picture of the evolutionary history of each domain. Domains are stored as multiple-sequence alignments whose profiles can be searched. Investigators who search CDD can learn about their proteins by inference from the annotated domains, and about related CDs from the CD hierarchy.

CDTree has many features that support the analysis and classification of protein domains. It allows users to edit individual CDs, recruit new sequences to a CD, and visualize the data in a CD. It also allows users to inspect relationships between CDs. With the tools in CDTree, users can create phylogenetic sequence trees, taxonomy trees, and domain architecture graphs. CDTree also serves as a helpful front end to the BLAST server.

In this talk we will highlight many of the features of CDTree by discussing how to get started with the tool and how the CD curation process works.