Send to

Choose Destination
J Chem Inf Model. 2014 Dec 22;54(12):3417-38. doi: 10.1021/ci5003922. Epub 2014 Dec 1.

CLCA: maximum common molecular substructure queries within the MetRxn database.

Author information

The Huck Institutes of the Life Sciences, Pennsylvania State University , University Park, Pennsylvania 16802, United States.


The challenge of automatically identifying the preserved molecular moieties in a chemical reaction is referred to as the atom mapping problem. Reaction atom maps provide the ability to locate the fate of individual atoms across an entire metabolic network. Atom maps are used to track atoms in isotope labeling experiments for metabolic flux elucidation, trace novel biosynthetic routes to a target compound, and contrast entire pathways for structural homology. However, rapid computation of the reaction atom mappings remains elusive despite significant research. We present a novel substructure search algorithm, canonical labeling for clique approximation (CLCA), with polynomial run-time complexity to quickly generate atom maps for all the reactions present in MetRxn. CLCA uses number theory (i.e., prime factorization) to generate canonical labels or unique IDs and identify a bijection between the vertices (atoms) of two distinct molecular graphs. CLCA utilizes molecular graphs generated by combining atomistic information on reactions and metabolites from 112 metabolic models and 8 metabolic databases. CLCA offers improvements in run time, accuracy, and memory utilization over existing heuristic and combinatorial maximum common substructure (MCS) search algorithms. We provide detailed examples on the various advantages as well as failure modes of CLCA over existing algorithms.

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for American Chemical Society
Loading ...
Support Center