22166170[PMID] - PMC

Figure 10. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Configuration panel for the Weka Regression worker: The configuration for a three-layer perceptron neural network is selected. Each machine learning method has its own parameter panel for individual configuration.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 1. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Advanced reaction enumeration features: (left) The Variable RGroup feature allows the definition of chemical groups which can be flexibly attached to predefined atoms. (middle) The Atom Alias feature offers the possibility to define a wild card for preconfigured elements. (right) The Expandable Atom feature enables the definition of freely sizeable rings or aliphatic chains.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 11. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Diagrams for machine learning results: (upper left) Scatter plot with experimental versus predicted output values. (upper right) Residuals plot with differences between the predicted and experimental output values. (lower left) Experimental output data are plotted over corresponding sorted predicted output data. (lower right) Characteristic quantities of the predicted model.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.
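
The quantities behind the scatter and residuals plots can be sketched in a few lines. This is a minimal illustration, not CDK-Taverna code: the function names are hypothetical, and the residual sign convention (predicted minus experimental) is an assumption based on the caption's wording.

```python
import math


def residuals(experimental, predicted):
    """Return predicted - experimental for each data point (assumed sign convention)."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    return [p - e for e, p in zip(experimental, predicted)]


def rmse(experimental, predicted):
    """Root mean square error over all data points."""
    res = residuals(experimental, predicted)
    if not res:
        raise ValueError("at least one data point is required")
    return math.sqrt(sum(r * r for r in res) / len(res))
```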


Figure 2. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Workflow for reaction enumeration: After loading a generic reaction (IN REACTION, from an MDL RXN file) and two educt lists (IN REACTANTS 1, IN REACTANTS 2, from MDL SD files), the Reaction Enumerator worker performs the enumeration and stores the results as MDL RXN files. An additional PDF file is created which shows all enumerated reactions in tabular form. The results are stored in the output folder determined by the OUT input port.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 9. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Partitioning into training and test set: The Split Dataset Into Train-/Testset worker splits a regression dataset into a training set and a test set. A regression model is then created by the Weka Regression worker and evaluated by the Evaluate Regression Results as PDF worker, which stores the results in a PDF file. The dataset is read from an XRFF file (IN XRFF). The generated test and training sets are coded as XRFF files and stored on hard disk. The OUT input port determines the result output folder.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.
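
The partitioning step can be illustrated with a short sketch. The caption does not say how the worker chooses which rows go into which set, so the seeded random shuffle, the function name `split_dataset` and the `train_fraction` parameter are all assumptions made for illustration.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and split them into (train, test).

    Assumption: a random split; the actual worker may use another strategy.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the split reproducible, so a model trained on the training set can be re-evaluated against the same test set later.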


Figure 6. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Genetic algorithm for selection of an optimum reduced set of input vector components: The algorithm starts with a random population in which each chromosome consists of a random distribution of enabled/disabled (on/off) input vector components denoted A₁ to Aₙ (where the number of components with "on" status remains fixed during evolution). This distribution is changed by mutation and cross-over. The fitness of each chromosome is evaluated by the inverse square RMSE. The selection process for each generation is performed by Roulette wheel selection, where chromosomes are inherited with probabilities that correspond to their particular fitness.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.
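
The building blocks named in the caption can be sketched as follows. This is an illustrative sketch rather than the GA Attribute Selection worker's implementation: all function names are hypothetical, the swap mutation is one assumed way to keep the number of "on" components fixed, and cross-over is omitted.

```python
import random


def fitness(rmse):
    """Inverse square RMSE: a lower error gives a higher fitness."""
    return 1.0 / (rmse ** 2)


def random_chromosome(n_components, n_on, rng):
    """On/off mask with exactly n_on enabled components."""
    on = set(rng.sample(range(n_components), n_on))
    return [i in on for i in range(n_components)]


def swap_mutation(chromosome, rng):
    """Disable one enabled component and enable one disabled component.

    Assumption: this is how the fixed number of "on" components is preserved.
    """
    on = [i for i, bit in enumerate(chromosome) if bit]
    off = [i for i, bit in enumerate(chromosome) if not bit]
    child = list(chromosome)
    if on and off:
        child[rng.choice(on)] = False
        child[rng.choice(off)] = True
    return child


def roulette_select(population, fitnesses, k, rng):
    """Draw k chromosomes with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=k)
```

In a full loop, each chromosome's RMSE would come from training a model on the enabled components only; that evaluation step is left out here.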


Figure 3. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Capabilities of the advanced reaction enumerator: The sketched generic reaction contains three different generic groups labelled X, Y and Z. Group X defines a Variable RGroup which can freely attach to all atoms of the ring. The Atom Alias group labelled Y is a wild card for the elements carbon, oxygen and nitrogen. The Expandable Atom group Z defines a variable ring size: The ring can be expanded by up to two additional carbon atoms. The enumerated products with the small letters a and b originate from multi-match detection.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 7. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

"Leave-One-Out" analysis to estimate the significance of input vector components: The root mean square error (RMSE) rises with an increasing number of discarded components (i.e. a decreasing number of input vector components used for the machine learning procedure). The relative RMSE shift from step to step may be correlated with the significance of the discarded component. In this case the first fifty components have only a negligible influence on the machine learning result and thus may be excluded from further analysis.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.
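
One way to read this analysis is as a greedy backward elimination, sketched below. The caption does not spell out the selection rule, so discarding at each step the component whose removal gives the lowest RMSE is an assumption, as are the function name and the caller-supplied `evaluate_rmse` callback (which would train and score a model on the given components).

```python
def leave_one_out_ranking(components, evaluate_rmse):
    """Greedily discard components and record the RMSE after each removal.

    Returns a list of (discarded_component, rmse_without_it) in discard order.
    Assumption: the component whose removal hurts least is discarded first.
    """
    remaining = list(components)
    history = []
    while len(remaining) > 1:
        best = None
        for candidate in remaining:
            trial = [c for c in remaining if c != candidate]
            rmse = evaluate_rmse(trial)
            if best is None or rmse < best[1]:
                best = (candidate, rmse)
        remaining.remove(best[0])
        history.append(best)
    return history
```

Plotting the recorded RMSE values against the number of discarded components gives a curve of the kind the caption describes: early removals barely change the error, later ones raise it sharply.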


Figure 8. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Workflow for "Leave-One-Out" analysis: First a regression dataset is generated from a CSV file with UUID and molecular descriptor input data for each molecule (IN QSAR) and a CSV file containing the UUID of the molecule and the corresponding output (regression) value (IN RTID). Then the Leave-One-Out Attribute Selection worker evaluates the significance of the input components and generates a dataset for each evaluation step. Afterwards the composed datasets are coded as XRFF files. A CSV file with the sequence of discarded input vector components is generated. In addition the results are visualised with a PDF output file. Instead of the Leave-One-Out Attribute Selection worker a GA Attribute Selection worker may be used to determine a minimum molecular descriptor subset with maximum predictability. The results are stored in the output folder determined by the OUT input port.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 5. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

NP-likeness scoring workflow: This workflow takes as input atom signature files generated from a user-defined natural products library (NP file), from synthetics (SM file) and from compound libraries (Query file), and scores the compound libraries (Query file) for NP-likeness. The higher the score, the greater the NP-likeness of a molecule. The Query fragments scorer worker generates a score for each compound in the Query file, tagged with the corresponding UUID of the compound. Pairs of compound UUID and score are written to a text file (Score file), which can also be passed to the Plot Distribution As PDF worker to show the score density distribution of the complete query dataset. The Query fragments scorer worker also regenerates the structure for every atom signature and tags it with its fragment score and the UUID of the compound to which it belongs. These fragment structures with scores are written to an SDF file (Fragments SDF), as they help identify fragments with high NP-likeness. This workflow can be freely downloaded at http://www.myexperiment.org/workflows/2121.html.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.


Figure 4. From: New developments on the cheminformatics open workflow environment CDK-Taverna.

Molecule curation and atom signature descriptor generation workflow: The Iterative SDfile Reader takes the Structure-Data File (SDF) of compounds (Input SDF) as input and passes the structures down the workflow for molecule curation and atom signature generation. The number of structures to be read and passed down the workflow can be configured (Iterations). As soon as a molecule is read, the Tag Molecules with UUID worker tags it with a Universally Unique Identifier (UUID) to keep track of it during the process. The Molecule connectivity checker worker checks the connectedness of the structure and removes counter ions and disconnected fragments. The Remove sugar groups worker removes linear and ring sugars from the structures. The Curate Strange Elements worker removes structures containing elements other than non-metals. Finally, the Generate Atom Signatures worker generates an atom signature for each atom in a curated compound, tagged with the respective UUID of the compound. The generated atom signatures are written to a text file (signatures file) using the Text File Writer worker. The SDF of compound structures can be written to a file after tagging with UUID (Tagged SDFile) and also after any curation step (Curated SDF) using the SDFile Writer worker. This workflow can be freely downloaded at http://www.myexperiment.org/workflows/2120.html.

Andreas Truszkowski, et al. J Cheminform. 2011;3:54-54.
