 Research
 Open Access
 Published:
Atom mapping with constraint programming
Algorithms for Molecular Biology volume 9, Article number: 23 (2014)
Abstract
Chemical reactions are rearrangements of chemical bonds. Each atom in an educt molecule thus appears again in a specific position of one of the reaction products. This bijection between educt and product atoms is not reported by chemical reaction databases, however, so that the “Atom Mapping Problem” of finding this bijection is left as an important computational task for many practical applications in computational chemistry and systems biology. Elementary chemical reactions feature a cyclic imaginary transition state (ITS) that imposes additional restrictions on the bijection between educt and product atoms that are not taken into account by previous approaches. We demonstrate that Constraint Programming is wellsuited to solving the Atom Mapping Problem in this setting. The performance of our approach is evaluated for a manually curated subset of chemical reactions from the KEGG database featuring various ITS cycle layouts and reaction mechanisms.
Background
A chemical reaction describes the transformation of a set of educt molecules into a set of products. In this process, chemical bonds are rearranged, while the atom types remain unchanged. Thus, there is a onetoone correspondence, the socalled atom map (or atomatom mapping), between the atoms of educts and products. Atom maps convey the complete information necessary to disentangle the mechanism, i.e., the bond rearrangement, of a chemical reaction because they unambiguously identify the bonds that differ between educt and product molecules. The changing parts of the molecules are described by a so called imaginary transition state (ITS) [1],[2] that allows, for instance, a classification of chemical reactions [3][5]. Atom maps are a necessary requisite for computational studies of an organism’s metabolism. For instance, they allow for consistency checks within metabolic pathway analyses [6] and play a key role in the global analysis of metabolic networks [7],[8]. Practical applications include, for example, the tracing or design of the metabolic break down of a candidate drug, which constitutes an important issue in drug design studies [9].
Only the product and educt molecules involved in a chemical reaction are directly observable. The atom map therefore often remains unknown and has to be inferred from partial knowledge. Experimental evidence may be available from isotope labeling experiments. Here, special isotopes, i.e. atoms with special variations, are introduced into educt molecules that can then be identified in product molecules by means of spectroscopy techniques [10]. Such data, however, is not available for most reactions. The complete experimental determination of an atom map is in general a complex and tedious endeavor. Reaction databases, such as KEGG, therefore do not generally provide atom maps. The computational construction of atom maps is therefore an important practical problem in chemoinformatics [11].
Several computational approaches for this problem have been developed over the last three decades (for a recent review see [12]). The educts and products are described as two not necessarily connected labeled graphs I and O, respectively. Vertex labels define atom types, while edge labels indicate bond types. The atom map is then determined as the solution of a combinatorial optimization problem resulting in a bijective mapping of all vertices of the educt molecule graph to corresponding vertices in the product molecule graphs. An illustration of a DielsAlder reaction is given in Figure 1.
The most common formulations are variants of the maximum common subgraph (isomorphism) problem [13]. Already the earliest approaches analyzed the adjacency information within educts and products [14],[15]. The Principle of Minimal Chemical Distance, which is equivalent to minimizing an edge edit distance, was invoked in [16], using a branch and bound approach to solve the corresponding combinatorial optimization problem. Maximum Common Edge Subgraph (MCES) algorithms search for isomorphic subgraphs of the educt/product graphs with maximum number of edges [4],[17][20], an NPhard problem. Furthermore, the use of specialized energetic [21],[22] or weighting [23] criteria allows for the identification of the static parts of the reaction and, subsequently, of the atom mapping. A detailed investigation of the MCES from an Integer Linear Programming (ILP) perspective can be found in [24].
Akutsu [25] showed that the MCES approach fails for certain reactions. As an alternative, the Maximum Common Induced Subgraph (MCIS) problem was proposed as a remedy. This problem is also NP complete. Approximation results can be found in [26]. Algorithms for the MCIS iteratively decompose the molecules until only isomorphic subgraphs remain [7],[25],[27],[28]. Recently, an ILP approach incorporating stereochemistry was presented [29].
Neither the solutions of the MCES nor the MCIS necessarily describe the true atom map. Indeed, both optimality criteria are artificial and can not be derived from basic principles of chemical reactions. In fact, it is not hard to construct counterexamples, i.e., chemical reactions whose true atom maps are neither identified by MCES nor by MCIS. The reorganization of chemical bonds in a chemical reaction is far from arbitrary but follows strict rules that are codified e.g. in the theory of imaginary transition states (ITS) [1],[2]. The ITS encodes the redistribution of bond electrons that occurs along a chemical reaction. Bond electrons define the atomconnecting chemical bonds and their according bond orders. Their redistribution is expressed in terms of the deletion or formation of bonds as well as changes of the oxidation state of atoms, the latter resulting from nonbound electrons that are freed from or integrated into bonds. The ITS can be used to cluster, classify, and annotate chemical reactions [1],[2],[30]. These studies revealed that only a limited number of ITS “layouts” are found among single step reactions and that these layouts represent a cyclic electron redistribution pattern usually involving less than 10 atoms [30]. In a most basic case, an elementary reaction, the broken and newly formed bonds form an alternating cycle (see Figure 1) covering a limited even number of atoms [31], usually less than 8 [2]. In the case of homovalent reactions, i.e., those in which the number of nonbound electron pairs of all atoms (defining their oxidation state) remains unchanged, this cycle is elementary. That is, the transition state is a single, connected even cycle, along which bond orders change by ±1 [30]. This property imposes an additional, strong condition of the atom maps that is not captured by the optimization approaches outlined in the previous paragraphs. Here, we explicitly include it into the specification of the combinatorial problem.
A chemically correct atom map is a bijective map between the vertices of the educt and product graphs such that:

1.
The map preserves atom types

2.
The total bond orders (including lone electron pairs) are preserved. Each broken bond thus must be compensated by a newly formed bond or a change in the oxidation number of an atom.

3.
The broken and newly formed bonds constitute a chemically reasonable imaginary transition state (ITS) following [30]. In the case of elementary chemical reactions, the transition state is an alternating cycle.
A formal definition of the combinatorial problem will be given in the following section. While cyclic transition states are very common, more “complex transition states” appear in nonelementary reactions, i.e., compositions of elementary reactions. Furthermore, even in elementary reactions, it is not true that a shortest ITS cycle is necessarily chemically correct. Empirically, transition states are most frequently sixmembered cycles, while cycles of length 4 or 8 are less abundant [1],[2],[31],[32]. As a consequence, we will consider several variants of the chemical reaction mapping problem:

1.
Decision problem: Is there an atom map with cyclic ITS? Of course one may restrict the question to ITS cycles of length k.

2.
Optimization problem: Find the minimal length k of an ITS cycle that enables an atom map.

3.
Enumeration problem: Find all atom maps with cyclic ITS (of length k).
Given a straightforward encoding of molecular graphs in terms of vertex indices, atom labels, and adjacency information, the atom mapping problem is naturally open to be treated as a constraint satisfaction problem with finite integer domains. This approach is particularly appealing when additional information on the ITS, e.g. its size or atoms involved in the ITS, are known. The theory and model of such a constraintbased atom mapping approach was introduced by us in [33]. This manuscript is an extended version of [33]. Here, we provide a more detailed description of the formalisms and evaluate the performance of the approach on a large reaction data set. The latter was manually curated and compiled to enable a validation of the computational predictions.
Constraint programming formulation of the atom mapping problem
We focus on the identification of the cyclic ITS. Once the ITS has been identified the overall atom mapping is easily derived. We formulate separate constraint satisfaction problems for different ITS layouts and cycle lengths. A fast graph matching approach is used subsequently to extend each ITS to a global atom mapping. In this section we follow closely [33]. We first formally define the problem, which is followed by a description of our constraint programming approach for identifying the cyclic ITS. Finally we discuss how to extend an ITS candidate to a complete atom mapping for the chemical reaction.
Problem definition
Both educts and products of a chemical reaction are each represented by a single, not necessarily connected, undirected graph defined by a set of vertices V and a set of edges E⊆{ {v,v^{′}}  v,v^{′}∈V}. The educt (input) graph is denoted by I=(V_{ I },E_{ I }) and the product (output) graph by O=(V_{ O },E_{ O }). Here, each molecule corresponds to a connected component. Vertices represent atoms and are labeled with the respective atom type accessible via the function l(v∈V_{ I }∪V_{ O }). The principle of mass conservation implies V_{ I }=V_{ O }, i.e. no atom can dissolve or appear during a reaction. Edges encode covalent chemical bonds between atoms. For the CSP formulation we label each edge {x,y}∈E_{ I }∪E_{ O } with the number of shared electron pairs, i.e., its bond order: single, double or triple bonds are represented by a single edge with labels 1, 2, or 3, respectively. Note, this molecule representation ignores stereochemistry, i.e. there is no differentiation between the optimal isomers of chiral molecules. Nonbonding electron pairs of an atom, which define its oxidation state, are represented by self loop edges labeled with the according number of unbound pairs.
We use an adjacency matrix to encode the edge labels of the educt graph (and a corresponding matrix for the products). The matrix elements ${\mathcal{I}}_{v,{v}^{\prime}}$ denote the number of shared bond electron pairs for the edge between the atoms v and v^{′} in the educt graph I. In practice ${\mathcal{I}}_{v,{v}^{\prime}}\in \{0,1,2,3\}$, where 0 means no electrons are shared. Nonbonding electron pairs (loops) are represented by the diagonal entries ${\mathcal{I}}_{v,v}$ and ${\mathcal{O}}_{v,v}$. Now consider a bijective function m:V_{ I }→V_{ O } mapping the vertices of I onto the vertices of O. We can use the mapping inversion m^{−1} to make the indexing of compatible with . This is defined by $\mathcal{I}\circ m$, which is the matrix with x,y entries $={\mathcal{I}}_{{m}^{1}\left(x\right),{m}^{1}\left(y\right)}$, i.e. with rows and columns indexed by V_{ O }. Based on that, we define the reaction matrix${\mathcal{R}}^{m}=\mathcal{O}(\mathcal{I}\circ m)$ as the elementwise matrix subtraction of and the reindexed , which encodes the charge and bond electron differences between educts and products.
Definition. An atom mapping or atom map is a bijection m:V_{ I }→V_{ O } such that

1.
${\forall}_{x\in {V}_{I}}:l\left(x\right)=l\left(m\right(x\left)\right)$ (preservation of atom types)

2.
${\mathcal{R}}^{m}\stackrel{\u20d7}{1}=\stackrel{\u20d7}{0}$ (preservation of bond electrons for each atom)
The reaction matrix ${\mathcal{R}}^{m}$ encodes the imaginary transition state (ITS) [1],[2]. This definition of m is a slightly more formal version of the DugundjiUgi theory [14]. Our notation emphasizes the central role of the (not necessarily unique) bijection m. Since we consider I and O as given fixed input, the atom mapping m uniquely determines ${\mathcal{R}}^{m}$. The triple (m,I,O), furthermore, completely defines the chemical reaction. It therefore makes sense to associate properties of the chemical reaction directly with the atom map m.
Equivalently, the ITS can be represented as a graph R=(V_{ R },E_{ R }) so that E_{ R } consists of the “changing” edges that lose or gain bond electrons during the reaction, i.e. ${\mathcal{I}}_{v,{v}^{\prime}}\ne {\mathcal{O}}_{m\left(v\right),v\left({v}^{\prime}\right)}\iff {\mathcal{R}}_{v,{v}^{\prime}}^{m}\ne 0$. The set of atom vertices V_{ R }⊆V_{ O } covers all vertices with at least one adjacent edge in E_{ R }. Each edge {v,v^{′}}∈E_{ R } is labeled by the electron change ${\mathcal{R}}_{v,{v}^{\prime}}^{m}\ne 0$, i.e. its change in bond order. See the following example adjacency matrices and for the reaction given in Figure 1.
The vertices v_{ i }∈V_{ I } and ${v}_{j}^{\prime}\in {V}_{O}$ are numbered in topdownleftright order of their appearance in Figure 1. The atom mapping $m\left({v}_{i}\right)={v}_{i}^{\prime}$ defines ${\mathcal{R}}^{m}$ and thus the ITS graph R covers only vertices ${v}_{2}^{\prime}$ to ${v}_{7}^{\prime}$ since ${v}_{1}^{\prime}$ and ${v}_{8}^{\prime}$ do not show any bond electron changes.
It is important to note that the existence of an atom mapping m as defined above does not necessarily imply that ${\mathcal{R}}^{m}$ is a chemically plausible ITS.
We say that two edges {v,v^{′}},{v^{′},v^{′′}}∈E_{ R } in R are alternating if ${\mathcal{R}}_{v,{v}^{\prime}}^{m}\ne 0$ and ${\mathcal{R}}_{v,{v}^{\prime}}^{m}+{\mathcal{R}}_{{v}^{\prime},{v}^{\mathrm{\prime \prime}}}^{m}=0$. A simple cycle in R of size k>2 is given by the vertex sequence (v_{1},v_{2},…,v_{ k },v_{1}) with v_{ i }∈V_{ R }, {v_{ i },v_{i+1}}∈E_{ R }, {v_{ k },v_{1}}∈E_{ R }, and ∀i<j≤k:v_{ i }≠v_{ j }. Such a simple cycle is called alternating if all successive edges as well as the cycle closure {v_{2},v_{1}},{v_{1},v_{ k }} are alternating.
Definition. An atom map m is homovalent if ${\mathcal{R}}_{v,v}^{m}=0$ for all v∈V_{ R }. A homovalent reaction is elementary if its ITS R is a simple alternating cycle. Thus ${\mathcal{R}}_{v,{v}^{\prime}}^{m}\in \{k,0,+k\}$ with an absolute bond order change of $k\in {\mathbb{N}}^{+}$ holds for all elementary homovalent reactions.
In the following we outline a novel algorithm for finding atom maps for a given ITS graph R that is guaranteed to retrieve all possible mappings given the educt and product graphs and , respectively. To simplify the presentation, first only elementary homovalent reactions with a bond order change of ±1 are considered. Generalizations are discussed in Section ‘Application and evaluation’.
Constraint programming approach
The central problem to find an elementary homovalent atom mapping is to identify the alternating cycle defining the ITS R given the adjacency information of the educts and products . This can be done via solving the Constraint Satisfaction Problem (CSP) as presented below. Note, due to the alternating edge condition within the ITS, we have to consider cycles with an even number of atoms only. In practice, the ITS of elementary homovalent reactions involves V_{ R }=4, 6, or 8 atoms [31].
Basic CSP formulation
In the following, we will present a first basic CSP for an ITS of size k=V_{ R } that we already introduced in [33]. It is given by the triple (X,D,C) defining the set of variables X, according set of domains D, and the set of constraints C. A solution is an assignment A that maps each variable X_{ i }∈X to a value A_{ i }∈D_{ i } from its domain such that all constraints in C are fulfilled.
We construct an explicit encoding of the ITS atom mapping using k variables representing the cycle in I and another set for the k mapped vertices in O, i.e., $X=\left\{{X}_{1}^{I},\dots ,{X}_{k}^{I}\right\}\cup \left\{{X}_{1}^{O},\dots ,{X}_{k}^{O}\right\}$ with domains ${D}_{i}^{I}={V}_{I}$ and ${D}_{i}^{O}={V}_{O}$. Note, we do not directly encode the overall atom mapping problem but the identification of the two ITS subgraphs in the educts and products. Given this information, the overall atom mapping is easily identified as explained later.
To find a bijective mapping we have to ensure $\forall i\ne j:{X}_{i}^{I}\ne {X}_{j}^{I}$ and $\forall i\ne j:{X}_{i}^{O}\ne {X}_{j}^{O}$, i.e., a distinct assignment of all variables. To enforce atom label preservation we require consistency of labels for ${X}_{i}^{I}$ and ${X}_{i}^{O}$, i.e., an assignment A fulfills $l\left({A}_{i}^{I}\right)=l\left({A}_{i}^{O}\right)$.
Analogously, homovalence is represented by $\left({\mathcal{I}}_{{A}_{i}^{I},{A}_{i}^{I}}\phantom{\rule{2.77626pt}{0ex}}{\mathcal{O}}_{{A}_{i}^{O},{A}_{i}^{O}}\right)=0$. Due to the alternating bond condition, each atom can lose or gain at most one edge during a reaction. Thus, we can further constrain the assignment with $\text{degree}\left({A}_{i}^{I}\right)\text{degree}\left({A}_{i}^{O}\right)\le 1$; here degree(v) denotes the outdegree of vertex v.
Finally, we have to encode the alternating cycle structure of the ITS in the mapping, i.e., for the sequence of bonds with indices 12..k1. For all index pairs within the cycle (i,j) we therefore require pairs with even index i to correspond to the formation of a bond, i.e., we enforce $\left({\mathcal{O}}_{{A}_{i}^{O},{A}_{j}^{O}}{\mathcal{I}}_{{A}_{i}^{I},{A}_{j}^{I}}\right)=1$, while all odd indices i are bond breaking $\left({\mathcal{O}}_{{A}_{i}^{O},{A}_{j}^{O}}{\mathcal{I}}_{{A}_{i}^{I},{A}_{j}^{I}}\right)=1$ accordingly.
The homovalent ITS layout is rotation symmetric in itself (see Figure 2). To partially counter this, we introduce order constraints on the input variables: $\left(\forall i>1:{X}_{1}^{I}<{X}_{i}^{I}\right)$ using e.g. an index order on the vertices. This ties the smallest cycle vertex to the first variable ${X}_{1}^{I}$ and prevents the rotationsymmetric assignments of the input variables. Note, since we constrain the bond (1,2) to be a bond breaking $\left({\mathcal{O}}_{{A}_{1}^{O},{A}_{2}^{O}}{\mathcal{I}}_{{A}_{1}^{I},{A}_{2}^{I}}=1\right)$, the direction of the cycle is fixed and all direction symmetries are excluded as well.
As we will show in the evaluation (Section ‘Application and evaluation’), the basic CSP will produce many ITS candidates that do not extend to an atom mapping over the whole educt and product graphs. Therefore, we introduce an extended version of this CSP that incorporates further constraints derived from the input.
Extended CSP formulation
Investigating the given educt and product graph, we can exclude a large set of symmetric solutions that arise due to an exchange of hydrogens. The latter can form at most one single bond to other atoms. Thus, if a hydrogen participates in the ITS, its adjacent atom will do as well (since the bond is to be broken in the ITS). Most adjacent atoms are nonhydrogens, e.g. carbon atoms, that can have multiple adjacent hydrogens. Since there is exactly one bond breaking and formation for each ITS atom, only one such adjacent hydrogen will be part of the ITS. This results in a combinatorial explosion due to the symmetries of adjacent hydrogen atoms. The latter results from the missing chirality information within the molecular graph encoding (see Problem definition). An example is given in Figure 3. To break this type of symmetry, we select for each nonhydrogen one adjacent “master” hydrogen (e.g. the one with lowest vertex index) and remove all other sibling hydrogens from the domains, both for educt and product variables X^{I} and X^{O}, respectively. The hydrogen vertices to remove are respectively given by ${H}_{\text{rem}}^{I}$ and ${H}_{\text{rem}}^{O}$ based on some vertex ordering ≺. They are defined as ${H}_{\text{rem}}^{I}=\left\{\phantom{\rule{2.77626pt}{0ex}}v\phantom{\rule{2.77626pt}{0ex}}\right\phantom{\rule{2.77626pt}{0ex}}v\in {V}_{I}\wedge l\left(v\right)=\mathtt{\text{H}}\wedge {\exists}_{\{v,{v}^{\ast}\}\in {E}_{I}}\wedge {\exists}_{{v}^{\prime}\ne v\in {V}_{I}}:\left(l\right({v}^{\prime})=\mathtt{\text{H}}\wedge {v}^{\prime}\prec v\wedge \{{v}^{\prime},{v}^{\ast}\}\in {E}_{I})\phantom{\rule{2.77626pt}{0ex}}\}$ and ${H}_{\text{rem}}^{O}$ accordingly. Thus, any assignment A of X^{I} and X^{O} has to fulfill ${A}_{i}^{I}\notin {H}_{\text{rem}}^{I}$ and ${A}_{i}^{O}\notin {H}_{\text{rem}}^{O}$, which is implemented as a domain pruning preprocessing.
Furthermore, we can extend and tune the CSP formulation by comparing the graph structure of educts and products. To this end, we generate the multisets (denoted by 〈…〉) N_{ I } and N_{ O } of local neighborhoods of all atoms (vertices) for the educt and product graph, resp., given by
where N(v) is a tuple of the label of atom vertex v and an encoding of the multiset of all adjacent edges for this vertex. Note, ⊕ denotes string concatenation. N_{ O } is derived accordingly. For example, the neighborhood multisets for the reaction from Figure 1 are
Given the number of occurrences of an element x in a multiset N_{∗} by the multiplicity function ${\mathit{\text{occ}}}_{{N}_{\ast}}\left(x\right)$, the multiset subtraction N_{ I }∖N_{ O } is defined by the occurrence reduction for each element x∈N_{ I } to m a x(0,o c c_{ N I }(x)−o c c_{ N O }(x)). This subtraction N_{ I }∖N_{ O } gives the local neighborhoods that are unique within the educts and thus are part of the ITS, i.e. have to be changed during the reaction. Therefore, we can derive a lower bound on the number of atoms of a certain type that are participating in the ITS. In the example this results in N_{ I }∖N_{ O }=〈 3 ×(C,〈2C〉),(C,〈1N,2C〉) 〉 revealing that at least 4 CCatoms of two neighborhood types (〈2C〉 and 〈1N,2C〉) are ITS members. The neighborhood types are educt/product specific, such that both N_{ I }∖N_{ O } as well as N_{ O }∖N_{ I } are computed.
Given this information, we formulate an extended version of the basic CSP. Here, additional auxiliary node label variables ${X}^{L}=\left\{{X}_{1}^{L},\dots ,{X}_{k}^{L}\right\}$ are introduced, which encode the atom labels still possible for X^{I} assignments, i.e. ${D}_{i}^{L}=\left\{l\left(v\right)\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}v\in {D}_{i}^{I}\right\}$. Next, we derive the multiset of atom labels N^{L} to be present in the ITS with N^{L}=〈l(v)  N(v)∈N_{ I }∖N_{ O }〉. In the example we find N^{L}=〈C,C,C,C〉. To enforce the occurence of these atom labels in the ITS, we add for each each label l with ${\mathit{\text{occ}}}_{N}^{L}l>0$ an according global cardinality (count) constraint on X^{L}. The basic atom label preservation constraint was extended to a ternary constraint that also propagates changes in X^{L} to both X^{I} and X^{O} and vice versa. In addition, we enforce that a valid assignment A^{I} of the ITS variables X^{I} reflects the explicit neighborhood N_{ I }∖N_{ O }, i.e., ${N}_{I}\setminus {N}_{O}\subseteq N\left({A}^{I}\right)=\u3008\phantom{\rule{2.77626pt}{0ex}}N\phantom{\rule{0.3em}{0ex}}\left({A}_{i}^{I}\right)\phantom{\rule{2.77626pt}{0ex}}\phantom{\rule{2.77626pt}{0ex}}1\le i\le k\phantom{\rule{2.77626pt}{0ex}}\u3009$. An equivalent constraint is added for X^{O} to preserve the neighborhood N_{ O }∖N_{ I }, respectively. To minimize propagation cost, this is ensured by a simple nary constraint propagation after assignment. The CSP is illustrated in Figure 4.
Although the CSPs introduced above are defined for domains of vertices v∈V_{ I }∪V_{ O }, they can be easily reformulated using integer encodings of the atom indices allowing for the application of standard constraint solvers such as Gecode Gecode[35]. This enables the use of efficient propagators for most of the required constraints, such as the algorithm of Regin [36] for globally unique assignments. Only a few binary constraints, e.g. to ensure atom label preservation or the cyclic bond pattern, require a dedicated implementation as discussed in the Conclusions section.
All solutions for these CSPs are chemically valid ITS candidates. In order to check whether or not a true ITS is found we have to ensure that the remaining atoms, i.e., those that do not participate in the ITS, can be mapped without further bond formation or breaking. This is achieved using a standard graph matching approach as discussed in the following.
Overall atom mapping computation
Given the CSP formulation from above, we can enumerate all valid ITS candidates. For a CSP solution we denote with ${a}_{i}^{I}$ and ${a}_{i}^{O}$ the assigned values of the variables ${X}_{i}^{I}$ and ${X}_{i}^{O}$, respectively. Once the ITS candidate is fixed, we can reduce the problem to a general graph isomorphism problem with a simple relabeling of the ITS edges. Thus, we derive two new adjacency matrices ${\mathcal{I}}^{\prime}$ and ${\mathcal{O}}^{\prime}$ from the original matrices and , resp., as follows: For all atom pairs (i,j) within the cyclic index sequence 12..k1, we change the corresponding adjacency information to a unique label using ${\mathcal{I}}_{{a}_{i}^{I},{a}_{j}^{I}}^{\prime}={\mathcal{O}}_{{a}_{i}^{O},{a}_{j}^{O}}^{\prime}\in \{\phantom{\rule{0.3em}{0ex}}f,b\}$ encoding if a bond between the mapped ITS vertices is formed (f) or broken (b). All other adjacency entries are kept the same as in and , respectively. In the following, we provide the ITSbondencoding adjacency matrices ${\mathcal{I}}^{\prime}$ and ${\mathcal{O}}^{\prime}$ for the example in Figure 1 given a 6cycle ITS mapping (left) resulting from a CSP solution.
Bond formations within the ITS are encoded by f while bond breakings are encoded by b. These matrices in concert with atom label information are target to full graph isomorphism search to identify the complete atom maps. In the example only the atom mapping $m\left({v}_{i}\right)={v}_{i}^{\prime}$ is found.
Given these updated “ITS encoding” adjacency matrices ${\mathcal{I}}^{\prime}$ and ${\mathcal{O}}^{\prime}$, the identification of the overall atom mapping m reduces to the graph isomorphism problem based on ${\mathcal{I}}^{\prime}$ and ${\mathcal{O}}^{\prime}$. Thus, all exact mappings of ${\mathcal{I}}^{\prime}$ onto ${\mathcal{O}}^{\prime}$ are valid atom mappings m of an elementary homovalent reaction, since the encoded ITS respects all constraints due to the CSP formulation.
Extension to other ITS layouts
Of course, not all chemical transformations are based on a homovalent elementary ITS. This will in general be the case for multistep reactions and for the socalled ambivalent reactions, in which the number of nonbonding electron pairs (and thus the oxidation number of some atoms) changes in the course of a reaction [30]. Figure 5, for example, shows a reaction for which it is not possible to find a simple homovalent circular ITS using the presented ITS encoding. Still, the reaction shows a cyclic ITS with alternating bond electron changes for all but one bond [1].
We have extended the CSPbased framework outlined above to reactions with arbitrary cyclic ITS layouts, which allows for any defined bond and atom valence changes (i.e. charge changes) within the ITS. Figure 2 exemplifies odd ITS cycle layouts for ambivalent reactions [32]. The main difference to homovalent reaction CSP is the relaxation of the homovalence constraint, which is not enforced for all participating atoms [32]. Furthermore, the preservation of bond electrons for some ITS bonds instead of a change is enforced. The latter holds for instance for the bond connecting N N^{+} and O O^{−} in Figure 5.
Implementation details
Our C++ implementation of the approach currently takes a chemical reaction in SMILES format [38], identifies chemically correct atom mappings, and returns these in annotated SMILES format. The latter provides a numbering of mapped atoms in the educts and products. It is available as C++ source code package v1.0.0 at http://www.bioinf.unifreiburg.de/Software/.
Molecule parsing, writing, and graph representation uses the chemistry module of the Graph Grammar Library (GGL) [39]. We use an explicit hydrogen representation within the CSP formulation, as in [29], because most homovalent elementary reactions involve the replacement of at least one hydrogen. Unfortunately, the compact string encoding of molecules in SMILES format does not explicitly represent hydrogens. Thus, we use the hydrogen correction procedures of the GGL to complete educt and product molecule input. The CSP formulation and solving is performed within the Gecode Gecode framework on finite integer domains [35]. The final graph matching uses the stateoftheart VF2algorithm [40], which is among the fastest available [41].
The CSP uses standard binary order constraints and the nary distinct and counting constraints provided by the Gecode library. Dedicated binary constraints propagating on unassigned domains have been implemented for preservation of atom label, degree, and homovalence. The alternating cycle is implemented by a sequence of k constraints propagating the edge valence change of ±1. The ITS local neighborhood preservation to be enforced in the extended CSP is implemented by a dedicated nary constraint over all variables propagating on assignments only.
We are using a DepthFirstSearch where the branching strategy chooses first variables with minimal domain size and first assigns nonhydrogen atom indices before hydrogen vertices are considered. The latter increases the performance to find the first solution since most reaction mechanism contain more than 50% nonhydrogen atoms. Once a nonhydrogen atom is selected for a variable, propagation will ensure that atomadjacent hydrogens are considered for the variables adjacent within the ITS cycle encoding if appropriate.
For each ITS mapping identified, a full reaction atom mapping is derived via VF2based graph matching. Therein, the discussed problem of hydrogen interchangeability (see Extended CSP formulation) is faced again and would result in symmetric overall atom mappings. This is countered by first producing intermediate “collapsed” educt/product graphs, where all adjacent nonITS hydrogens are merged into the atom labels of their adjacent nonhydrogens. This preserves the adjacency information and enables a unique mapping via VF2 excluding the hydrogensymmetries. Furthermore, this compression speeds up the graph isomorphism identification since the graph size is approximately halved.
While not described here, the CSPs can be easily extended to find candidates for the entire atom mapping by introducing additional matching variables for all atoms participating in the reaction, all constrained to preserve atom label, vertex degree, and bond valence information. But first tests (not shown) revealed that the increase in CSP size and accordingly search and propagation effort needed does not repay due to the efficiency of the VF2 graph isomorphism approach used. Therefore, we omitted this approach from this work.
Application and evaluation
Benchmark sets for the evaluation of atom mapping methods are not readily available, since wellcurated reaction databases, such as the KEGG REACTION database [42], do not provide detailed atom mapping information. Thus a manual data retrieval and curation was necessary to test the constraintbased atom mapping approach presented above.
Predicting elementary reactions
The manual annotation of all of the about 10,000 reactions compiled in the KEGG REACTION database is infeasible with our resources. A data set comprising 630 manually curated atom maps for a subset of the KEGG database has been provided by [34]. Unfortunately, these atom mappings are restricted to nonhydrogen atoms. Thus they do not cover the whole reaction mechanisms, which usually involve hydrogen replacements. We therefore manually extended the data with the corresponding hydrogen mappings within the reaction center. Furthermore, the data set covers nonelementary reactions showing either multiple reaction centers or noncyclic ITS. We found some atom mappings to be incorrect. We finally compiled a fully annotated data subset containing about 400 atom mappings of elementary reactions. The number of nonhydrogen atoms within the reactions ranges from 5 to 110 with a median of 36.
Studying the ITSs of these reactions, we found basically only 3 different ITS layouts covering 3–8 atoms. This exemplifies the very limited number of such layouts to be expected for elementary reactions. The ITS layouts found are visualized in Figure 2top. Most reactions are homovalent (375) and only 14 are found to be ambivalent reactions that change atomic oxidation states. This shows the prominence of homovalent reactions.
We applied the prototypical implementation of our extended CSP formulation to the data set using the ITS layouts depicted in Figure 2. Runtimes were on average low with a median of 0.5 seconds. Nevertheless, there are about 20 reactions where atom mapping computations took longer than ten minutes. All of them are homovalent reactions of various ITS sizes. The increased runtime correlates with the number of involved atoms (Spearman rank correlation coefficient 0.79). Most such reactions contain large, connected static parts that cover about 90% of the involved molecules. Thus, we plan to incorporate an additional preprocessing to identify the small molecular subgraphs that are likely associated with the ITS and focus the CSP on these parts. This will result in drastically reduced search spaces and thus we can expect a substantial decrease of the running times. Atom mapping computations for ambivalent reactions were fast, which results from the additional constraints for the atomic oxidation state changes.
The resulting atom mappings were compared to the manually annotated data. We found only a single incorrect solution for a homovalent reaction according to the KEGG reaction mechanism classification (see Additional file 1): for R01440, our approach predicted an ITS of k=4, while the true mechanism involves k=6 atoms. Three reactions allowed for various mechanisms where the true atom mapping was contained in the set of alternative solutions predicted by our method. All atom mapping computations for ambivalent reactions were correct.
Impact of the extended model
In order to investigate the impact of our extended CSP formulation over the basic version, we selected a representative subset of homovalent elementary reactions from the KEGG REACTION database. We restrict the evaluation to homovalent reactions due to the much higher computational cost. The latter emerges since we can not as easily identify ITS participating atoms as is the case for ambivalent reactions. The latter show at least one atom that changes its oxidation state, which confines the search space drastically.
The reactions have been chosen to provide various ITS and reaction sizes for evaluation. The average size of the selected reactions, i.e. the average number of atoms, is about 30 (Table 1 column 2) while the whole KEGG database shows an average of 50 atoms per reaction. The example reactions cover homovalent ITS sizes of k=4, 6, and 8 as introduced. Since there is no atom mapping information provided within the KEGG database, the example reactions had to be identified manually based on chemical knowledge. This again highlights the need for an automated identification of chemically feasible atom mappings as provided by our approach. The selected homovalent reactions are given in Table 2 with their respective KEGG ID, educts and products.
For each reaction, we applied our approach using both the basic and extended CSP formulation to evaluate the impact of the latter for various reaction and ITS cycle sizes. In Table 1 we report runtime, search, and solution details for the smallest ITS size k that yields a solution. For smaller values of k, the infeasibility tests were done within fractions of seconds and are therefore omitted.
Our atom mapping approach finds a first atom mapping for homovalent elementary reactions within milliseconds. It is clear that the additional constraints within the extended CSP formulation significantly increase the performance of the approach. This becomes even more striking when considering the timings for full solution enumeration. The extended CSP produces several orders of magnitude less ITS candidates (column “Sol. CSP”). Since the time consumption of the VF2 algorithm is about linear in the number of ITS candidates to test, this results in according speedups of the overall approach. Still there is room for optimization since the symmetry breaking within the CSP solution enumeration is not complete and ITS enumeration still allows for some symmetries (data not shown). The latter result from symmetries within the educt and product molecules, which are not handled by the simple ITS ordering applied so far. We are currently working on an extended generic symmetry identification and breaking for ITS, educts and products.
The strength of the extended CSP comes from the precomputed list of local neighborhoods to be part of the ITS candidate and the “hydrogen symmetry” breaking. For the reactions from Table 1, this list comprises on average about half the ITS resulting in the impressive impact of the constraint. For reaction R00059, the list covers the whole ITS with an according immense reduction in ITS candidates.
As already expected based on the results from other approaches [29], only a single or very few reaction mechanisms, i.e., nonsymmetric atom mappings, are identifiable, see Table 2 column “Sol”.
Conclusions
We have presented here the first constraint programming approach to identify chemically feasible atom mappings based on the identification of a cyclic imaginary transition state (ITS). The incorporation of the cyclic ITS structure within the search ensures the chemical correctness of the mapping that is not guaranteed by standard approaches that attempt to solve Maximum Common Edge Subgraph Problems [25]. To our knowledge, this is the first approach explicitly incorporating the cyclic ITS structure into an atom mapping procedure. The formulation of the CSP using only the atoms involved in the ITS results in a very small CSP that can be solved efficiently. Thus, it is well placed as a filter for ITS candidates for the subsequent, computationally more expensive graph matching approaches. The solutions of such an extended CSP are the desired chemically feasible atom mappings. We apply advanced symmetry breaking strategies and thus can enumerate all possible chemical mechanisms underlying a reaction.
The feasibility of the approach was introduced here for the common case of elementary, homovalent reactions, i.e., for reactions in which the transition state is an elementary cycle with an even number of atoms. We have shown that the CSP formulation can be easily extended to arbitrary cyclic ITS layouts. Usually, such reactions are not homovalent, i.e., at least one atom participating in the ITS is gaining or losing nonbonding electrons, which requires some moderate changes in the formulation of the constraints. We are currently identifying all feasible ITS layouts and are developing a generic CSP formulations for arbitrary layouts. This will result in a powerful approach to identify atom mappings with chemically valid ITSs.
At the moment, we apply a hierarchal combination of ITSfiltering via CP techniques followed by full atom mapping identification using a dedicated graph isomorphism algorithm. As already mentioned, there are also approaches to directly solve the graph isomorphism problem using CP [43][45]. While the used VF2 algorithm was shown to be efficient for first solution identification, other approaches (e.g. CPbased) show better performance for full solution space enumeration [46],[47]. Currently, we are not aware of an available, efficient integration of the approaches in Gecode v4, such that they were not yet considered in this work. A prototypical implementation of graph isomorphism using the introduced constraints and propagators did not enable VF2comparable runtimes (data not shown). Since we are dealing with molecular graphs of relatively simple structural complexity, the use of dedicated graph isomorphism algorithms, e.g. for planar graph [48], could increase the performance as well. Furthermore, other CSP encodings of the problem, e.g. by fusing current dedicated constraints like atom label preservation or homovalence into a single extensional table constraint [49], might improve the ITS identification step.
The current framework is designed to identify chemically feasible atom mappings for elementary, i.e. singlestep, reactions. There are cases where shortlived intermediate molecules are formed that immediately react into the final products. Since these intermediate structures are unknown our present approach cannot be directly applied to such reactions. As noted by Hendriksen [2], often there is only a single unknown intermediate linking two consecutive elementary reactions. We therefore plan to create “fused” ITS layouts based on our singlestep ITS encodings that will allow for the correct identification of atom mappings for multistep reactions and reveal the individual steps and intermediate structures. For the combination of ITS layouts, we are currently investigating the multistep reaction analyses by Fujita [50] and Herges [30].
Summarizing, we see constraint programming as a very promising approach to solve atom mapping problems since it provides a very flexible framework to incorporate combinatorial constraints determined by the underlying rules of chemical transformations.
Additional file
References
 1.
Fujita S:Description of organic reactions based on imaginary transition structures. 1. Introduction of new concepts. J Chem Inf Comput Sci. 1986, 26: 205212. 10.1021/ci00052a009.
 2.
Hendrickson JB:Comprehensive system for classification and nomenclature of organic reactions. J Chem Inf Comput Sci. 1997, 37: 852860. 10.1021/ci970040v.
 3.
Kotera M, Okuno Y, Hattori M, Goto S, Kanehisa M:Computational assignment of the EC numbers for genomicscale analysis of enzymatic reactions. J Am Chem Soc. 2004, 126 (50): 1648716498.
 4.
Leber M, Egelhofer V, Schomburg I, Schomburg D:Automatic assignment of reaction operators to enzymatic reactions. Bioinformatics. 2009, 25: 31353142.
 5.
Yamanishi Y, Hattori M, Kotera M, Goto S, Kanehisa M:Ezyme: predicting potential EC numbers from the chemical transformation pattern of substrateproduct pairs. Bioinformatics. 2009, 25 (12): 179186. 10.1093/bioinformatics/btp223. doi:10.1093/bioinformatics/btp223,
 6.
Arita M:Scalefreeness and biological networks. J Biochem. 2005, 138: 14. doi:10.1093/jb/mvi094,
 7.
Blum T, Kohlbacher O:Using atom mapping rules for an improved detection of relevant routes in weighted metabolic networks. J Comput Biol. 2008, 15: 565576.
 8.
Hogiri T, Furusawaa C, Shinfukua Y, Onoa N, Shimizua H:Analysis of metabolic network based on conservation of molecular structure. Biosystems. 2009, 95 (3): 175178. doi:10.1016/j.biosystems.2008.09.002,
 9.
Rautio J, Kumpulainen H, Heimbach T, Oliyai R, Oh D, Järvinen T, Savolainen J:Prodrugs: design and clinical applications. Nat Rev Drug Discov. 2008, 7 (3): 255270.
 10.
Wiechert W:C^{13}metabolic flux analysis. Meta Eng. 2001, 3: 195206. 10.1006/mben.2001.0187. doi:10.1006/mben.2001.0187,
 11.
Warr WA:A short review of chemical reaction database systems, computeraided synthesis design, reaction prediction and synthetic feasibility. Mol Inform. 2014, 33: 469476. 10.1002/minf.201400052. doi:10.1002/minf.201400052,
 12.
Chen WL, Chen DZ, Taylor KT:Automatic reaction mapping and reaction center detection. WIREs Comput Mol Sci. 2013, 3 (6): 560593. 10.1002/wcms.1140. doi:10.1002/wcms.1140,
 13.
Ehrlich HC, Rarey M:Maximum common subgraph isomorphism algorithms and their applications in molecular science: a review. WIREs Comput Mol Sci. 2011, 1 (1): 6879. 10.1002/wcms.5. doi:10.1002/wcms.5,
 14.
Dugundji J, Ugi I:An algebraic model of constitutional chemistry as a basis for chemical computer programs. Topics Cur Chem. 1973, 39: 1964.
 15.
Lynch M, Willett P:The automatic detection of chemical reaction sites. J Chem Inf Comput Sci. 1978, 18: 154159. 10.1021/ci60015a009.
 16.
Jochum C, Gasteiger J, Ugi I:The principle of minimum chemical distance (PMCD). Angew Chem Int Ed. 1980, 19: 495505. 10.1002/anie.198004953.
 17.
de Groot MJL, van Berlo RJP, van Winden WA, Verheijen PJT, Reinders MJT, de Ridder D:Metabolite and reaction inference based on enzyme specificities. Bioinformatics. 2009, 25 (22): 29752983.
 18.
Hattori M, Okuno Y, Goto S, Kanehisa M:Heuristics for chemical compound matching. Genome Inform. 2003, 14: 144153.
 19.
Heinonen M, Lappalainen S, Mielikäinen T, Rousu J:Computing atom mappings for biochemical reactions without subgraph isomorphism. J Comp Biol. 2011, 18: 4358. 10.1089/cmb.2009.0216.
 20.
Raymond JW, Willett P:Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J ComputerAided Mol Design. 2002, 16: 521533. 10.1023/A:1021271615909.
 21.
Apostolakis J, Sacher O, Körner R, Gasteiger J:Automatic determination of reaction mappings and reaction center information. 2. Validation on a biochemical reaction database. J Chem Inf Mod. 2008, 48: 11901198. 10.1021/ci700433d.
 22.
Körner R, Apostolakis J:Automatic determination of reaction mappings and reaction center information. 1. The imaginary transition state energy approach. J Chem Inf Mod. 2008, 48: 11811189. 10.1021/ci7004324.
 23.
Latendresse M, Malerich JP, Travers M, Karp PD:Accurate atommapping computation for biochemical reactions. J Chem Inf Model. 2013, 52 (11): 29702982. 10.1021/ci3002217. doi:10.1021/ci3002217,
 24.
Bahiense L, Manić G, Piva B, de Souza CC:The maximum common edge subgraph problem: a polyhedral investigation. Discr Appl Math. 2012, 160: 25232541. 10.1016/j.dam.2012.01.026.
 25.
Akutsu T:Efficient extraction of mapping rules of atoms from enzymatic reaction data. J Comp Biol. 2004, 11: 449462. 10.1089/1066527041410337. doi:10.1089/1066527041410337,
 26.
Huang X, Lai J, Jennings SF:Maximum common subgraph: some upper bound and lower bound results. BMC Bioinformatics. 2006, 7 (S4): 610.1186/147121057S4S6.
 27.
Crabtree JD, Mehta DP:Automated reaction mapping. J Exp Algor. 2009, 13 (1.15): 129. doi:10.1145/1412228.1498697,
 28.
Crabtree JD, Mehta DP, Kouri TM:An opensource Java platform for automated reaction mapping. J Chem Inf Model. 2010, 50: 17511756. doi:10.1021/ci100061d,
 29.
First EL, Gounaris CE, Floudas CA:Stereochemically consistent reaction mapping and identification of multiple reaction mechanisms through integer linear optimization. J Chem Inf Model. 2012, 52 (1): 8492. doi:10.1021/ci200351b,
 30.
Herges R:Organizing principle of complex reactions and theory of coarctate transition states. Angewandte Chemie Int Ed. 1994, 33: 255276. 10.1002/anie.199402551.
 31.
Fujita S:Description of organic reactions based on imaginary transition structures. 2. Classification of onestring reactions having an evenmembered cyclic reaction graph. J Chem Inf Comput Sci. 1986, 26: 212223. 10.1021/ci00052a010.
 32.
Fujita S:Description of organic reactions based on imaginary transition structures. 3. Classification of onestring reactions having an oddmembered cyclic reaction graph. J Chem Inf Comput Sci. 1986, 26: 224230. 10.1021/ci00052a011.
 33.
Mann M, Nahar F, Ekker H, Backofen R, Stadler PF, Flamm C:Atom mapping with constraint programming. Proc. of the 19th International Conference on Principles and Practice of Constraint Programming (CP’13). LNCS, vol. 8124 . Edited by: Schulte C. 2013, 805822. doi:10.1007/9783642406270_59, Springer, Berlin,
 34.
Muller C, Marcou G, Horvath D, AiresdeSousa J, Varnek A:Models for identification of erroneous atomtoatom mapping of reactions performed by automated algorithms. J Chem Inf Mod. 2012, 52 (12): 31163122. 10.1021/ci300418q. doi:10.1021/ci300418q,
 35.
Gecode Team: Gecode: Generic Constraint Development Environment. Available as an opensource library from [], 2014.,http://www.gecode.org
 36.
Regin JC:A filtering algorithm for constraints of difference. Proceedings of the 12th National Conference of the American Association for Artificial Intelligence . 1994, 362367. American Association for Artificial Intelligence, Menlo Park,
 37.
Meisenheimer J:Über eine eigenartige Umlagerung des MethylallylanilinNoxyds. Chemische Berichte. 1919, 52: 16671677.
 38.
Weininger D:SMILES:a chemical language and information system. 1 Introduction to methodology and encoding rules. J Chem Inf Comp Sci. 1988, 28 (1): 3136. 10.1021/ci00057a005. doi:10.1021/ci00057a005,
 39.
Mann M, Ekker H, Flamm C:The graph grammar library  a generic framework for chemical graph rewrite systems. Theory and Practice of Model Transformations, Proc. of ICMT 2013. LNCS, vol. 7909 . Edited by: Duddy K, Kappel G. 2013, 5253. doi:10.1007/9783642388835_5. Extended abstract at, ICMT, long version at arXiv [http://arxiv.org/abs/1304.1356], Springer, Berlin,http://arxiv.org/abs/1304.1356
 40.
Cordella LP, Foggia P, Sansone C, Vento M:A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell. 2004, 26 (10): 13671372.
 41.
Cordella LP, Foggia P, Sansone C, Vento M: Performance evaluation of the VF graph matching algorithm Proceedings of the 10th International Conference on Image Analysis and Processing. ICIAP ‘99 . 1999, 11721177. IEEE Computer Society, Washington, D
 42.
Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M:KEGG for integration and interpretation of largescale molecular data sets. Nuc Acids Res. 2012, 40 (Database issue): 109114. 10.1093/nar/gkr988. doi:10.1093/nar/gkr988
 43.
Zampelli S, Mann M, Deville Y, Backofen R: Techniques de decomposition pour l’isomorphisme de sousgraphe. In Proc. of the 4th Journees Francophones de Programmation Par Contraintes (JFPC’08); 2008. An english version of the article is available at [].,http://arxiv.org/abs/0805.1030v1
 44.
Dooms G, Deville Y, Dupont P:CP(Graph)Introducing a graph computation domain in constraint programming. Principles and Practice of Constraint Programming  CP 2005. LNCS, vol. 3709 . 2005, 211225. Springer, Berlin,
 45.
Rudolf M:Utilizing constraint satisfaction techniques for efficient graph pattern matching. Theory and Application of Graph Transformations. LNCS, vol 1764 . 2000, 381394. Springer, Berlin,
 46.
Zampelli S, Deville Y, Solnon C:Solving subgraph isomorphism problems with constraint programming. Constraints. 2010, 15 (3): 327353. 10.1007/s1060100990743.
 47.
Solnon C:Alldifferentbased filtering for subgraph isomorphism. Artif Intell. 2010, 174: 850864. 10.1016/j.artint.2010.05.002.
 48.
Eppstein D:Subgraph isomorphism in planar graphs and related problems. Proceedings of the Sixth Annual ACMSIAM Symposium on Discrete Algorithms. SODA ‘95 . 1995, 632640. Society for Industrial and Applied Mathematics, Philadelphia,
 49.
Gent IP, Jefferson C, Miguel I, Nightingale P:Data structures for generalised arc consistency for extensional constraints. Proceedings of the Twenty Second Conference on Artificial Intelligence (AAAI07), Vancouver, British Columbia, Canada . 2007, 191197. AAAI Press, Menlo Park,
 50.
Fujita S:Description of organic reactions based on imaginary transition structures. 5. Recombination of reaction strings in a synthesis space and its application to the description of synthetic pathways. J Chem Inf Comput Sci. 1986, 26: 238242. 10.1021/ci00052a013.
Acknowledgements
We thank Prof. Alexandre Varnek for providing the manually currated reaction data set from [34] in RDF format. Furthermore, we thank Heinz Ekker for comments and discussions on the initial project design. This work was partially supported by the COST Action CM1304 “Emergence and Evolution of Complex Chemical Systems”. The article processing charge was funded by the German Research Foundation (DFG) and the Albert Ludwigs University Freiburg in the funding programme Open Access Publishing.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Initial project design was done by MM, CF, and PFS, which was implemented by FN and MM. Final study design and manuscript by MM, CF, PFS, and RB. Manual data retrieval and approach evaluation by NS and MM. All authors read and approved the final manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Atomatom mapping
 Constraint programming
 Chemical reaction
 Imaginary transition state