CDTree Displays CDs

The most basic use of CDTree is to view one of the pre-defined CD hierarchies accessible on the summary page for a curated CD record that, for example, can be found at
Figure 1 shows several CDTree and Cn3D 4.2 displays for the CD hierarchy (Figure 1a) of the nucleotidyl transferase superfamily (cd02156). Sequence Tree and Taxonomy Tree views of one child CD of this hierarchy, the catalytic core of the arginyl-tRNA synthetase (ArgRS_core, cd00671), are shown in Figure 1b and 1c.
Figure 1. Hierarchy display for the CD family containing cd00671, the catalytic core domain of the arginyl-tRNA synthetase. Clicking the "Interactive Display with CDTree" button provides the option of loading either the single domain (cd00671) or the entire hierarchy into CDTree. Selected displays in CDTree and Cn3D 4.2 for the CD family containing cd00671: a) the Indent Tree display, showing the CD hierarchy with cd00671 selected; b) the Sequence Tree display, showing a neighbor-joining tree for cd00671 with one branch highlighted; c) the Taxonomy Tree display, showing the taxonomic nodes represented by the sequences in cd00671; d) the Cross Hits viewer, showing the distribution of best BLAST hits for the six immediate child domains of the overall parent, cd00802; e) the Alignment Viewer in Cn3D 4.2, showing the cd00671 alignment with the active site residues highlighted in yellow; f) the Structure Viewer in Cn3D 4.2, showing the two aligned structures in cd00671 with the active site residues shown in yellow and the bound arginine in green. The red highlighting of the 17 sequences selected in panel b is automatically propagated to panels a, c and d.

CDTree analyzes CD records

When launched from a CD web page, the initial CDTree views provide straightforward taxonomic and phylogenetic analysis through a variety of selection commands that highlight subsets of sequences. Highlighted sequences turn red, and the highlight automatically propagates to the same sequences in all open viewers. In Figure 1, one branch of the phylogenetic tree was selected by clicking on the appropriate tree node, which highlighted the corresponding sequences in the Indent Tree and Taxonomy Tree views. The transferred highlights show that all of these sequences are from eubacteria and that they represent 17 of the 43 total sequences in the alignment. Other analysis tools include the Cross Hits viewer, the CDART viewer, and the CDTree Validator.
The Cross Hits viewer shows at a glance the quality of the hierarchy itself. Each CD is represented by a pie chart in which each colored wedge indicates the proportion of sequences in that CD whose best BLAST match is to the CD corresponding to that color. Thus, a well-defined hierarchy will look much like the one in Figure 1d, where each pie for a child CD contains only its own color, while the parent contains representatives for each of its children.

The CDART viewer functions much like the existing CDART web service, showing the domain architectures of sequences in selected CDs. It allows, for example, the selection of sequences based on a shared architecture.

The CDTree Validator performs several consistency checks on the alignments in a CD hierarchy, requiring that child CDs maintain the alignment of the parent and that no sequence violates the block alignment model of the CD.

CDTree updates CD records

Once a CD has been loaded, CDTree can automatically update the alignment with new sequences using Position Specific Iterative BLAST (PSI-BLAST) or standard protein-protein BLAST. After retrieving the BLAST results from NCBI, CDTree distributes the sequences to their best-matching CDs, and these new sequences become "pending" rows, in contrast to the "aligned" rows that are already part of the block model.

CDTree uses Cn3D 4.2 to View and Edit CD Alignments

A mouse click on the double-helix icon next to any CD in the Indent Tree window launches Cn3D 4.2, where the alignment can be viewed and edited (Figure 1e). This function is available even for alignments that have no 3D structures. If the alignment does contain structural data, the structures are shown as a template to help guide the alignment process (Figure 1f). Functional features annotated by CDD curators can also be highlighted on both the alignment and the structures, as is the active site in Figure 1e and 1f.
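As a rough illustration of the best-hit logic behind the Cross Hits pie charts and the update step described above, the following Python sketch tallies, for each CD, the fraction of its member sequences whose best-scoring match is to each CD, and groups new sequences under their best-matching CD as pending rows. This is not CDTree's code: the function names, the CD labels (`child_A`, `child_B`, `parent`), the sequence identifiers, and the scores are all hypothetical, and the BLAST results are assumed to be already reduced to one score per sequence per CD.

```python
from collections import Counter, defaultdict


def best_cd(scores):
    """Return the CD label with the highest score for one sequence."""
    return max(scores, key=scores.get)


def cross_hit_fractions(membership, hit_scores):
    """For each CD, compute the fraction of its sequences whose best hit
    is to each CD (the wedge sizes of one pie chart).

    membership: sequence id -> CD the sequence belongs to
    hit_scores: sequence id -> {CD label: score}
    """
    counts = defaultdict(Counter)
    for seq_id, home_cd in membership.items():
        counts[home_cd][best_cd(hit_scores[seq_id])] += 1
    return {
        cd: {target: n / sum(tally.values()) for target, n in tally.items()}
        for cd, tally in counts.items()
    }


def assign_pending(new_hit_scores):
    """Group newly retrieved sequences under their best-matching CD."""
    pending = defaultdict(list)
    for seq_id, scores in new_hit_scores.items():
        pending[best_cd(scores)].append(seq_id)
    return dict(pending)


# Hypothetical data: two child CDs and their parent.
membership = {
    "s1": "child_A", "s2": "child_A", "s3": "child_B",
    "s4": "parent", "s5": "parent",
}
hit_scores = {
    "s1": {"child_A": 120.0, "child_B": 40.0},
    "s2": {"child_A": 110.0, "child_B": 35.0},
    "s3": {"child_A": 30.0, "child_B": 130.0},
    "s4": {"child_A": 100.0, "child_B": 50.0},
    "s5": {"child_A": 45.0, "child_B": 105.0},
}

fractions = cross_hit_fractions(membership, hit_scores)
pending = assign_pending({"n1": {"child_A": 20.0, "child_B": 75.0}})
```

With these made-up scores, each child pie contains only its own color and the parent pie is split evenly between its two children, which is the pattern the article describes for a well-defined hierarchy.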
If the CD contains pending sequences, these are placed in the Imports window in Cn3D, where they can be aligned to the CD block model using any of the alignment algorithms in Cn3D. Once the alignment is complete, the Cn3D Save function exports the new alignment back into CDTree. By continuing this cycle, an entire hierarchy can be updated, one CD at a time.

The accompanying box, "Selected New Features in Cn3D 4.2," lists and describes several new features of Cn3D 4.2 in addition to those related to the CDTree functions mentioned above. More information on Cn3D is available on the program's homepage.

CDTree creates CD records and CD hierarchies

One of the most powerful features of CDTree is that it can create CD records de novo, either as the beginning of a new hierarchy or as a new child in an existing hierarchy. Any group of sequences can be selected and assigned to a new CD, which can then be assigned a parent if appropriate. Alternatively, CDTree can start with a single protein sequence, run a BLAST search, and use the resulting alignment as the nucleus of a new CD record. A utility named "fa2cd," which comes bundled with CDTree, can create CDs from legacy alignment data in FASTA format; these CDs can then be imported into CDTree as well.

The combination of CDTree and Cn3D 4.2 provides a software platform that integrates literature, sequence, taxonomic, phylogenetic, functional, and structural data for a protein domain in an interlinked set of views, allowing researchers both to visualize these data more fully and to identify new domains and domain features in poorly annotated protein families.
Selected New Features in Cn3D 4.2

Stereoview - Cn3D 4.2 can display customizable stereo images.

PSSM Export - Cn3D 4.2 can export the Position Specific Score Matrix (PSSM) calculated for the current alignment in NCBI scoremat format, which can be added to an RPS-BLAST database or used to begin a PSI-BLAST search.

New Alignment Algorithms - Cn3D 4.2 can now apply the Block Aligner to the entire set of pending sequences at once, and can also use BLAST to align a pending sequence to the most similar sequence in the alignment, rather than to the master sequence. Cn3D 4.2 also contains a new alignment refiner that sequentially realigns each row in the existing alignment.

New Sequence Conservation Coloring Options - Cn3D 4.2 offers three new coloring options to assist in analyzing the quality of a block alignment model:
- Block Fit - colors all blocks across the entire alignment by how well they fit the PSSM
- Normalized Block Fit - like Block Fit, except that the colors are computed separately for each block, thereby showing which sequences have the best and worst scores for each block
- Block Row Fit - like Block Fit, except that the colors are computed separately for each row, thereby showing which block has the best and worst score for each sequence

Improved Highlighting - Cn3D 4.2 can highlight by blocks, can extend existing highlights to aligned columns, and can restrict existing highlights to a single row. Highlights can also be cached, or saved, and recalled later. Clicking in the white space of the sequence window no longer removes the highlighting. Finally, when launched from within CDTree, Cn3D 4.2 can exchange highlight messages with CDTree. For example, clicking on the double-helix icon next to a CD in the Indent Tree window will send highlights from CDTree to Cn3D, if that CD has already been opened with the Cn3D viewer.
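The three conservation coloring options listed above differ only in the scope over which scores are compared: the whole alignment, one block, or one row. The Python sketch below makes that difference concrete under stated assumptions. It takes a hypothetical table `scores[row][block]` of per-block PSSM fit scores and rescales it linearly to the range 0 to 1 in each of the three scopes. The linear min-max scaling, the function names, and the numbers are assumptions for illustration; the article does not specify how Cn3D maps scores to colors.

```python
def rescale(values, lo, hi):
    """Map values linearly onto [0, 1]; a constant range maps to 0.5."""
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def block_fit(scores):
    """Scale every cell against the whole alignment (one global range)."""
    flat = [v for row in scores for v in row]
    lo, hi = min(flat), max(flat)
    return [rescale(row, lo, hi) for row in scores]


def normalized_block_fit(scores):
    """Scale each block (column) separately: compares rows within a block."""
    columns = list(zip(*scores))
    scaled = [rescale(col, min(col), max(col)) for col in columns]
    return [list(row) for row in zip(*scaled)]


def block_row_fit(scores):
    """Scale each row separately: compares blocks within one sequence."""
    return [rescale(row, min(row), max(row)) for row in scores]


# Hypothetical fit scores: 2 sequences (rows) x 2 blocks (columns).
scores = [
    [10.0, 30.0],
    [20.0, 50.0],
]

global_scale = block_fit(scores)
per_block = normalized_block_fit(scores)
per_row = block_row_fit(scores)
```

In this toy table the second row scores higher than the first in both blocks, which only the per-block scaling isolates, while the per-row scaling shows instead that the second block is the better-fitting one within each sequence.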