The FAT-CAT pipeline. The FAT-CAT pipeline starts with the submission of a protein sequence and parameter selection and proceeds through family and subtree HMM scoring to ortholog identification and functional annotation. The FAST-CAT variant differs from the default FAT-CAT pipeline in Stage 3 (indicated by red arrows). In Stage 1, the query is scored against two kinds of family HMMs in the PhyloFacts database: HMMs for proteins sharing the same multi-domain architecture (MDA) (shown at top) and HMMs constructed for Pfam domains (shown at bottom). Families meeting Stage 1 criteria (E-value and alignment statistics) are passed to Stage 2. In this toy example, PhyloFacts trees for two Pfam domains and a tree for the MDA meet Stage 1 criteria and are passed to Stage 2. In Stage 2, we obtain an approximate phylogenetic placement of the query in each tree by scoring the query against all the subtree HMMs in the tree. The subtree node corresponding to the top-scoring HMM is examined to determine its suitability as a source of orthologs to the query: Stage 2 criteria include the query-subtree HMM score, alignment statistics, and whether the subtree appears to be restricted to orthologs. For each top-scoring node that meets these criteria, we identify a (typically larger) enclosing clade supported by one or more orthology methods. Enclosing clades are passed to Stage 3 for ortholog identification. In Stage 3, FAT-CAT and FAST-CAT diverge. FAT-CAT (blue arrows) evaluates the pairwise alignment between the query and each sequence in the enclosing clade and identifies all evidence supporting orthology. FAST-CAT (red arrows) avoids much of this computational cost by using a fast k-tuple comparison to select the most similar sequences from the enclosing clade, constructing a multiple sequence alignment (MSA) that includes the query using MAFFT, estimating a phylogenetic tree using FastTree, and extracting a subtree of the phylogenetically closest sequences (i.e., those closest to the query by tree distance).
Alignment analysis can then be restricted to this smaller subset, based on the multiple sequence alignment. Sequences meeting these criteria are then passed to Stage 4. In Stage 4, we derive a weighted consensus functional annotation for the query based on the orthologs selected in Stage 3. Annotations from close orthologs are given higher weight than those from more distant orthologs, and manually curated annotations are given higher weight than computationally derived ones.
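
The Stage 4 weighting idea can be illustrated with a minimal Python sketch. It is not the FAT-CAT implementation: the caption does not give the weighting formula, so the inverse-distance weight `1 / (1 + distance)`, the `curated_bonus` multiplier, and the names `OrthologAnnotation` and `weighted_consensus` are all assumptions chosen for illustration. The sketch only shows how closer orthologs and manually curated annotations could each contribute more to a consensus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OrthologAnnotation:
    """One annotation carried by one ortholog (hypothetical record layout)."""
    ortholog_id: str
    annotation: str
    distance: float   # assumed: some non-negative distance from the query
    curated: bool     # True if the annotation was manually curated


def weighted_consensus(annotations, curated_bonus=2.0):
    """Rank annotations by their share of the total weight.

    Assumed weighting (not taken from the source):
      weight = 1 / (1 + distance), multiplied by curated_bonus if curated.
    Returns a list of (annotation, fraction_of_total_weight), best first.
    """
    scores = defaultdict(float)
    for a in annotations:
        weight = 1.0 / (1.0 + a.distance)   # closer orthologs weigh more
        if a.curated:
            weight *= curated_bonus         # curated evidence weighs more
        scores[a.annotation] += weight

    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(ann, score / total) for ann, score in scores.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Toy input; identifiers and annotations are made up for the example.
    example = [
        OrthologAnnotation("orth1", "protein kinase", 0.1, True),
        OrthologAnnotation("orth2", "protein kinase", 0.5, False),
        OrthologAnnotation("orth3", "phosphatase", 0.2, False),
    ]
    for annotation, fraction in weighted_consensus(example):
        print(f"{annotation}: {fraction:.2f}")
```

Normalizing by the total weight makes the scores comparable across queries with different numbers of orthologs; any monotonically decreasing function of distance could replace the one assumed here.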