CBB seminar
Mon, 11 Dec 2000, 11:00
8th floor conference room, Bldg 38A

Protein Sequence Clustering
Yuri Wolf

Two approaches to protein sequence clustering, which differ in the criteria used to define neighbors, will be discussed. Quantitative results of lineage-specific family clustering of eukaryotic proteomes will be presented.

1. Clustering with fixed thresholds (BLASTCLUST). Defines neighbor relationships via thresholds on the degree and extent of similarity, then performs single-linkage clustering. Used to make "non-redundant" sequence sets and to obtain wide clusters of similarly organized proteins.

2. Clustering of lineage-specific families. Consists of two stages. In the first (accretion) stage, sequences belonging to a given lineage are clustered by a single-linkage algorithm, based on a comparison of similarity within the taxonomic lineage to similarity outside it. In the second (refinement) stage, the best "alien" hits are added to the original clusters, and "native"-only subclusters are found using the UPGMA method. Used to obtain sets of lineage-specific paralogs for evolutionary studies.
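
As an illustration of the fixed-threshold approach in item 1, here is a minimal Python sketch of single-linkage clustering over pairwise hits. It is not BLASTCLUST's actual implementation: the function name `single_linkage_clusters`, the `(a, b, identity, coverage)` hit format, and the parameters `min_identity` and `min_coverage` (standing in for the thresholds on similarity degree and extent) are all assumptions made for this example.

```python
from collections import defaultdict


def single_linkage_clusters(ids, hits, min_identity=0.9, min_coverage=0.9):
    """Group sequence ids into single-linkage clusters.

    `hits` is an iterable of (a, b, identity, coverage) tuples; this
    format and both threshold names are hypothetical, chosen for the sketch.
    """
    # Union-find forest: every sequence starts as its own cluster.
    parent = {s: s for s in ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Two sequences are neighbors only if BOTH thresholds are met.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(list)
    for s in ids:
        groups[find(s)].append(s)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


ids = ["A", "B", "C", "D", "E"]
hits = [
    ("A", "B", 0.95, 0.97),  # passes both thresholds
    ("B", "C", 0.93, 0.92),  # passes both thresholds
    ("C", "D", 0.95, 0.50),  # similar enough, but covers too little
    ("D", "E", 0.60, 0.99),  # covers enough, but not similar enough
]
clusters = single_linkage_clusters(ids, hits)
print(clusters)
```

In this toy input, A and C land in the same cluster even though no direct A-C hit is listed, because single linkage joins any two sequences connected by a chain of neighbors. That chaining is what allows fixed thresholds to yield wide clusters, and with strict thresholds the same procedure can serve to collapse near-identical sequences into a "non-redundant" set.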