A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming

Analysis of chemical graphs is becoming a major research topic in computational molecular biology due to its potential applications to drug design. One of the major approaches in such a study is inverse quantitative structure activity/property relationship (inverse QSAR/QSPR) analysis, which is to infer chemical structures from given chemical activities/properties. Recently, a novel two-phase framework has been proposed for inverse QSAR/QSPR, where in the first phase an artificial neural network (ANN) is used to construct a prediction function. In the second phase, a mixed integer linear program (MILP) formulated on the trained ANN and a graph search algorithm are used to infer desired chemical structures. The framework has been applied to the case of chemical compounds with cycle index up to 2 so far. The computational results conducted on instances with n non-hydrogen atoms show that a feature vector can be inferred by solving an MILP for up to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=40$$\end{document}n=40, whereas graphs can be enumerated for up to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=15$$\end{document}n=15. When applied to the case of chemical acyclic graphs, the maximum computable diameter of a chemical structure was up to 8. In this paper, we introduce a new characterization of graph structure, called “branch-height” based on which a new MILP formulation and a new graph search algorithm are designed for chemical acyclic graphs. The results of computational experiments using such chemical properties as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method can infer chemical acyclic graphs with around \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=50$$\end{document}n=50 and diameter 30.


Introduction
In computational molecular biology, various types of data have been utilized, which include sequences, gene expression patterns, and protein structures. Graph structured data have also been extensively utilized, which include metabolic pathways, protein-protein interaction networks, gene regulatory networks, and chemical graphs. Much attention has recently been paid to analysis of chemical graphs due to its potential applications to computer-aided drug design. One of the major approaches to computer-aided drug design is quantitative structure activity/property relationships (QSAR/QSPR) analysis, the purpose of which is to derive quantitative relationships between chemical structures and their activities/properties. Furthermore, inverse QSAR/QSPR has been extensively studied [13,19], the purpose of which is to infer chemical structures from given chemical activities/properties. Inverse QSAR/QSPR is often formulated as an optimization problem to find a chemical structure maximizing (or minimizing) an objective function under various constraints.
In both QSAR/QSPR and inverse QSAR/QSPR, chemical compounds are usually represented as vectors of real or integer numbers, which are often called descriptors and correspond to feature vectors in machine learning. Using these chemical descriptors, various heuristic and statistical methods have been developed for finding optimal or nearly optimal graph structures under given objective functions [8,13,17]. Inference or enumeration of graph structures from a given feature vector is a crucial subtask in many of such methods. Various methods have been developed for this enumeration problem [6,10,12,16] and the computational complexity of the inference problem has been analyzed [1,14]. On the other hand, enumeration in itself is a challenging task, since the number of molecules (i.e., chemical graphs) with up to 30 atoms (vertices) C, N, O, and S, may exceed 10 60 [4].
As a new approach, artificial neural network (ANN) and deep learning technologies have recently been applied to inverse QSAR/QSPR. For example, variational autoencoders [7], recurrent neural networks [18,23], and grammar variational autoencoders [11] have been applied. In these approaches, new chemical graphs are generated by solving a kind of inverse problems on neural networks that are trained using known chemical compound/activity pairs. However, the optimality of the solution is not necessarily guaranteed in these approaches. In order to guarantee the optimality mathematically, a novel approach has been proposed [2] for ANNs, using mixed integer linear programming (MILP).
Recently, a new framework has been proposed [3,5,24] by combining two previous approaches; efficient enumeration of tree-like graphs [6], and MILP-based formulation of the inverse problem on ANNs [2]. This combined framework for inverse QSAR/QSPR mainly consists of two phases. The first phase solves (I) Prediction Problem, where a feature vector f (G) of a chemical graph G is introduced and a prediction function ψ N on a chemical property π is constructed with an ANN N using a data set of chemical compounds G and their values a(G) of π. The second phase solves (II) Inverse Problem, where (II-a) given a target value y * of the chemical property π, a feature vector x * is inferred from the trained ANN N so that ψ N (x * ) is close to y * and (II-b) then a set of chemical structures G * such that f (G * ) = x * is enumerated by a graph search algorithm. In (II-a) of the above-mentioned previous methods [3,5,24], an MILP is formulated for acyclic chemical compounds. Afterwards, Ito et al. [9] and Zhu et al. [25] designed a method of inferring chemical graphs with cycle index 1 and 2, respectively by formulating a new MILP and using an efficient algorithm for enumerating chemical graphs with cycle index 1 [20] and cycle index 2 [21,22]. The computational results conducted on instances with n non-hydrogen atoms show that a feature vector x * can be inferred for up to around n = 40 whereas graphs G * can be enumerated for up to around n = 15.
In this paper, we present a new characterization of graph structure, called "branch-height." Based on this, we can treat a class of acyclic chemical graphs with a structure that is topologically restricted but frequently appears in the chemical database, formulate a new MILP formulation that can handle acyclic graphs with a large diameter, and design a new graph search algorithm that generates acyclic chemical graphs with up to 50 vertices. The results of computational experiments using such chemical properties as octanol/water partition coefficient, boiling point and heat of combustion suggest that the proposed method is much more useful than the previous method.
The paper is organized as follows. Section 2 introduces some notions on graphs, a modeling of chemical compounds and a choice of descriptors. Section 3 reviews the framework for inferring chemical compounds based on ANNs and MILPs. Section 4 introduces a new method of modeling acyclic chemical graphs and proposes a new MILP formulation that represents an acyclic chemical graph G with n vertices, where our MILP requires only O(n) variables and constraints when the branch-parameter k and the k-branch-height in G (graph topological parameters newly introduced in this paper) is constant. Section 5 describes the idea of our new dynamic programming type of algorithm that enumerates a given number of acyclic chemical graphs for a given feature vector. Section 6 reports the results on some computational experiments conducted for s chemical properties such as octanol/water partition coefficient, boiling point and heat of combustion. Section 7 makes some concluding remarks. Appendix A provides the statistical feature on structure of acyclic chemical graphs in a chemical graph database. Appendix B describes the details of all variables and constraints in our MILP formulation. Appendix C presents descriptions of our new graph search algorithm.

Preliminary
This section introduces some notions and terminology on graphs, a modeling of chemical compounds and our choice of descriptors.
Let R, Z and Z + denote the sets of reals, integers and non-negative integers, respectively. For two integers a and b, let [a, b] denote the set of integers i with a ≤ i ≤ b.

Graphs
A graph stands for a simple undirected graph, where an edge joining two vertices u and v is denoted by uv (= vu). The sets of vertices and edges of a graph G are denoted by V (G) and E(G), respectively. Let H = (V, E) be a graph with a set V of vertices and a set E of edges. For a vertex v ∈ V , the set of neighbors of v in H is denoted by N H (v), and the degree deg H (v) of v is defined to be |N H (v)|. The length of a path is defined to be the number of edges in the path. The distance dist H (u, v) between two vertices u, v ∈ V is defined to be the minimum length of a path connecting u and v in H. The diameter dia(H) of H is defined to be the maximum distance between two vertices in H; i.e., dia(H) max u,v∈V dist H (u, v). Denote by ℓ(P ) the length of a path P .
Trees For a tree T with an even (resp., odd) diameter d, the center is defined to be the vertex v (resp., the adjacent vertex pair {v, v ′ }) that situates in the middle of one of the longest paths with length d. The center of each tree is uniquely determined.
Rooted Trees A rooted tree is defined to be a tree where a vertex (or a pair of adjacent vertices) is designated as the root. Let T be a rooted tree, where for two adjacent vertices u and v, vertex u is called the parent of v if u is closer to the root than v is. The height height(v) of a vertex v in T is defined to be the maximum length of a path from v to a leaf u in the descendants of v, where height(v) = 0 for each leaf v in T . Figure 1 Degree-bounded Trees For positive integers a, b and c with b ≥ 2, let T (a, b, c) denote the rooted tree such that the number of children of the root is a, the number of children of each non-root internal vertex is b and the distance from the root to each leaf is c. We see that the number of vertices in T (a, b, c) is a(b c − 1)/(b − 1) + 1, and the number of non-leaf vertices in T (a, b, c) is a(b c−1 − 1)/(b − 1) + 1. In the rooted tree T (a, b, c), we denote the vertices by v 1 , v 2 , . . . , v n with a breadth-first-search order, and denote the edge between a vertex v i with i ∈ [2, n] and its parent by e i , where n = a(b c − 1)/(b − 1) + 1 and each vertex v i with i ∈ [1, a(b c−1 − 1)/(b − 1) + 1] is a non-leaf vertex. For each vertex v i in T (a, b, c), let Cld(i) denote the set of indices j such that v j is a child of v i , and prt(i) denote the index j such that v j is the parent of v i when i ∈ [2, n]. Let P prc (a, b, c) be a set of ordered index pairs (i, j) of vertices v i and v j in T (a, b, c). We call P prc (a, b, c) proper if the next conditions hold: (a) For each subtree H = (V, E) of T (a, b, c) with v 1 ∈ V , there is at least one subtree H ′ = (V ′ , E ′ ) such that -H ′ is isomorphic to H by a graph isomorphism ψ : V → V ′ with ψ(v 1 ) = v 1 ; and -for each pair (i, j) ∈ P prc (a, b, c), if v j ∈ V ′ then v i ∈ V ′ ; and (b) For each pair of vertices v i and v j in T (a, b, c) such that v i is the parent of v j , there is a sequence (i 1 , i 2 ), (i 2 , i 3 ), . . . , (i k−1 , i k ) of index pairs in P prc (a, b, c) such that i 1 = i and i k = j.
Note that a proper set P prc (a, b, c) is not necessarily unique.
Branch-height in Trees In this paper, we introduce "branch-height" of a tree as a new measure to the "agglomeration degree" of trees. We specify a non-negative integer k, called a branchparameter to define branch-height. First we regard T as a rooted tree by choosing the center of T as the root. Figure 1(a) and (b) illustrate examples of rooted trees. We introduce the following terminology on a rooted tree T .
-A leaf k-branch: a non-root vertex v in T such that height(v) = k.
-A non-leaf k-branch: a vertex v in T such that v has at least two children u with height(u) ≥ k. We call a leaf or non-leaf k-branch a k-branch. Figure 2(a)-(c) illustrate the k-branches of the rooted tree H 2 in Figure 1(b) for k = 1, 2 and 3, respectively.
-A k-branch-path: a path P in T that joins two vertices u and u ′ such that each of u and u ′ is the root or a k-branch and P does not contain the root or a k-branch as an internal vertex.
-The k-branch-subtree of T : the subtree of T that consists of the edges in all k-branch-paths of T . We call a vertex (resp., an edge) in T a k-internal vertex (resp., a k-internal edge) if it is contained in the k-branch-subtree of T and a k-external vertex (resp., a k-external edge) otherwise. Let V in and V ex (resp., E in and E ex ) denote the sets of k-internal and k-external vertices (resp., edges) in T .
-The k-branch-tree of T : the rooted tree obtained from the k-branch-subtree of T by replacing each k-branch-path with a single edge. Figure 1(c) illustrates the 2-branch-tree of the rooted tree H 2 in Figure 1(b).
-A k-fringe-tree: One of the connected components that consists of the edges not in any kbranch-subtree. Each k-fringe-tree T ′ contains exactly one vertex v in a k-branch-subtree, where T ′ is regarded as a tree rooted at v. Note that the height of any k-fringe-tree is at most k. Figure 2(a)-(c) illustrate the k-fringe-tree of the rooted tree H 2 in Figure 1(b) for k = 1, 2 and 3, respectively.
-The k-branch-height bh k (T ) of T : the maximum number of non-root k-branches along a path from the root to a leaf of T ; i.e., bh k (T ) is the height of the k-branch-tree T * (the maximum length of a path from the root to a leaf in T * ). For the example of trees H i , i = 1, 2 in Figure 1(a) and (b), it holds that bh 0 (H 1 ) = bh 0 (H 2 ) = 5, bh 1 (H 1 ) = bh 1 (H 2 ) = 3, bh 2 (H 1 ) = bh 2 (H 2 ) = 2 and bh 3 (H 1 ) = bh 3 (H 2 ) = 1.
We observe that most chemical graphs G with at most 50 non-hydrogen atoms satisfy bh 2 (G) ≤ 2. See Appendix A for a summary of statistical feature of chemical graphs registered in the chemical database PubChem.

Modeling of Chemical Compounds
We represent the graph structure of a chemical compound as a graph with labels on vertices and multiplicity on edges in a hydrogen-suppressed model. Let Λ be a set of labels each of which represents a chemical element such as C (carbon), O (oxygen), N (nitrogen) and so on, where we assume that Λ does not contain H (hydrogen). Let mass(a) and val(a) denote the mass and valence of a chemical element a ∈ Λ, respectively. In our model, we use integers mass * (a) = ⌊10 · mass(a)⌋, a ∈ Λ and assume that each chemical element a ∈ Λ has a unique valence val(a) ∈ [1,4]. We introduce a total order < over the elements in Λ according to their mass values; i.e., we write a < b for chemical elements a, b ∈ Λ with mass(a) < mass(b). Choose a set Γ < of tuples γ = (a, b, m) ∈ Λ × Λ × [1,3] such that a < b. For a tuple γ = (a, b, m) ∈ Λ × Λ × [1,3], let γ denote the tuple (b, a, m). Set Γ > = {γ | γ ∈ Γ < } and Γ = = {(a, a, m) | a ∈ Λ, m ∈ [1, 3]}. A pair of two atoms a and b joined with a bond-multiplicity m is denoted by a tuple γ = (a, b, m) ∈ Γ, called the adjacency-configuration of the atom pair.
We use a hydrogen-suppressed model because hydrogen atoms can be added at the final stage. A chemical graph over Λ and Γ < ∪ Γ = is defined to be a tuple G = (H, α, β) of a graph H = (V, E), a function α : V → Λ and a function β : For a notational convenience, we denote the sum of bond-multiplicities of edges incident to a vertex as follows: A chemical graph G = (H, α, β) is called a "chemical monocyclic graph" if the graph H is a monocyclic graph. Similarly for other types of graphs for H.
We define the bond-configuration of an edge e = uv ∈ E in a chemical graph G to be a tuple (deg H (u), deg H (v), β(e)) such that deg H (u) ≤ deg H (v) for the end-vertices u and v of e. Let Bc denote the set of bond-configurations µ = (d 1 , d 2 , m) ∈ [1,4]

Descriptors
In our method, we use only graph-theoretical descriptors for defining a feature vector, which facilitates our designing an algorithm for constructing graphs. Given a chemical acyclic graph G = (H, α, β), we define a feature vector f (G) that consists of the following 11 kinds of descriptors. We choose an integer k * ∈ [1, 4] as a branch-parameter.
The number K of descriptors in our feature vector x = f (G) is K = 2|Λ| + 2|Γ| + 50. Note that the set of the above K descriptors is not independent in the sense that some descriptor depends on the combination of other descriptors in the set. For example, descriptor bd in i (G) can be determined by γ=(a,b,m)∈Γ:m=i ac in γ (G).

Framework for the Inverse QSAR/QSPR
We review the framework that solves the inverse QSAR/QSPR by using MILPs [9,25], which is illustrated in Figure 3. For a specified chemical property π such as boiling point, we denote by a(G) the observed value of the property π for a chemical compound G. As the first phase, we solve (I) Prediction Problem with the following three steps.
Phase 1. Stage 1: Let DB be a set of chemical graphs. For a specified chemical property π, choose a class G of graphs such as acyclic graphs or monocyclic graphs. Prepare a data set D π = {G i | i = 1, 2, . . . , m} ⊆ G ∩ DB such that the value a(G i ) of each chemical graph G i , i = 1, 2, . . . , m is available. Set reals a, a ∈ R so that a ≤ a(G i ) ≤ a, i = 1, 2, . . . , m.
Stage 2: Introduce a feature function f : G → R K for a positive integer K. We call f (G) the feature vector of G ∈ G, and call each entry of a vector f (G) a descriptor of G.
Stage 3: Construct a prediction function ψ N with an ANN N that, given a vector in R K , returns a real in the range [a, a] so that ψ N (f (G)) takes a value nearly equal to a(G) for many chemical graphs in D. See Figure 3(a) for an illustration of Stages 1 ,2 and 3 in Phase 1.
In this paper, we use the range-based method to define an applicability domain (AD) [15] to our inverse QSAR/QSPR. Set x j and x j to be the minimum and maximum values of the j-th descriptor x j in f (G i ) over all graphs G i , i = 1, 2, . . . , m (where we possibly normalize some descriptors such as ce in a (G), which is normalized with ce in a (G)/n(G)). Define our AD D to be the set of vectors x ∈ R K such that x j ≤ x j ≤ x j for the variable x j of each j-th descriptor, j = 1, 2, . . . , k.
In the second phase, we try to find a vector x * ∈ R K from a target value y * of the chemical propery π such that ψ N (x * ) = y * . Based on the method due to Akutsu and Nagamochi [2],  Figure 3: (a) An illustration of Phase 1: Stage 1 for preparing a data set D π for a graph class G and a specified chemical property π; Stage 2 for introducing a feature function f with descriptors; Stage 3 for constructing a prediction function ψ N with an ANN N ; (b) An illustration of Phase 2: Stage 4 for formulating an MILP M(x, y, g; C 1 , C 2 ) and finding a feasible solution (x * , g * ) of the MILP for a target value y * so that ψ N (x * ) = y * (possibly detecting that no target graph G * exists); Stage 5 for enumerating graphs G * ∈ G such that f (G * ) = x * .
Chiewvanichakorn et al. [5] showed that this problem can be formulated as an MILP. By including a set of linear constraints such that x ∈ D into their MILP, we obtain the next result. A vector x ∈ R K is called admissible if there is a graph G ∈ G such that f (G) = x [3]. Let A denote the set of admissible vectors x ∈ R K . To ensure that a vector x * inferred from a given target value y * becomes admissible, we introduce a new vector variable g ∈ R q for an integer q. For the class G of chemical acyclic graphs, Azam et al. [3] introduced a set C 2 of new constraints with a new vector variable g ∈ R q for an integer q so that a feasible solution (x * , g * ) of a new MILP for a target value y * delivers a vector x * with ψ N (x * ) = y * and a vector g * that represents a chemical acyclic graph G * ∈ G. Afterwards, for the classes of chemical graphs with cycle index 1 and 2, Ito et al. [3] and Zhu et al. [25] presented such a set C 2 of constraints so that a vector g * in a feasible solution (x * , g * ) of a new MILP can represent a chemical graph G * in the class G, respectively.
As the second phase, we solve (II) Inverse Problem for the inverse QSAR/QSPR by treating the following inference problems.
(II-a) Inference of Vectors Input: A real y * with a ≤ y * ≤ a. Output: Vectors x * ∈ A ∩ D and g * ∈ R q such that ψ N (x * ) = y * and g * forms a chemical graph G * ∈ G with f (G * ) = x * .
(II-b) Inference of Graphs Input: A vector x * ∈ A ∩ D. Output: All graphs G * ∈ G such that f (G * ) = x * .
The second phase consists of the next two steps.

Stage 5:
To solve Problem (II-b), enumerate all (or a specified number) of graphs G * ∈ G such that f (G * ) = x * for the inferred vector x * . See Figure 3(b) for an illustration of Stages 4 and 5 in Phase 2.

Our Target Graph Class
In this paper, we choose a branch-parameter k ≥ 1 and define a class G of chemical acyclic graphs G such that -the maximum degree in G is at most 4; -the k-branch height bh k (G) is bounded for a specified branch-parameter k; and -the size of each k-fringe-tree in G is bounded.
The reason why we restrict ourselves to the graphs in G is that this class G covers a large part of the acyclic chemical compounds registered in the chemical database PubChem. See Appendix A for a summary of the statical feature of the chemical graphs in PubChem in terms of k-branch height and the size of 2-fringe-trees. According to this, over 55% (resp., 99%) of acyclic chemical compounds with up to 100 non-hydrogen atoms in PubChem have the maximum degree 3 (resp., 4); and nearly 87% (resp., 99%) of acyclic chemical compounds with up to 50 non-hydrogen atoms in PubChem has the 2-branch height at most 1 (resp., 2). This implies that k = 2 is sufficient to cover the most of chemical acyclic graphs. For k = 2, over 92% of 2-fringe-trees of chemical compounds with up to 100 non-hydrogen atoms in PubChem obey the following size constraint: n ≤ 2d + 2 for each 2-fringe-tree T with n vertices and d children of the root. (1) We formulate an MILP in Stage 4 that, given a target value y * , infers a vector x * ∈ Z K + with ψ N (x * ) = y * and a chemical acyclic graph G * = (H, α, β) ∈ G with f (G * ) = x * . We here specify some of the features of a graph G * ∈ G such as the number of non-hydrogen atoms in order to control the graph structure of target graphs to be inferred and to simplify MILP formulations. In this paper, we specify the following features on a graph G ∈ G: a set Λ of chemical elements, a set Γ < of adjacency-configuration, the maximum degree, the number of non-hydrogen atoms, the diameter, the k-branch-height and the k-branch-leaf-number for a branch-parameter k.
More formally, given specified integers n * , d max , dia * , k * , bh * , bl * ∈ Z other than Λ and Γ, let H(n * , d max , dia * , k * , bh * , bl * ) denote the set of acyclic graphs H such that the maximum degree of a vertex is at most 3 when d max = 3 (or equal to 4 when d max = 4), the number n(H) of vertices in H is n * , the diameter dia(H) of H is dia * , the k * -branch-height bh k * (H) is bh * , the k * -branch-leaf-number bl k * (H) is bl * and (1) holds.
Design of Stage 5; i.e. generating chemical graphs G * that satisfy f (G * ) = x * for a given feature vector x * ∈ Z K + is still challenging for a relatively large instance with size n(G * ) ≥ 20. There have been proposed algorithms for generating chemical graphs G * in Stage 5 for the classes of graphs with cycle index 0 to 2 [6,20,21,22]. All of these are designed based on the branchand-bound method and can generate a target chemical graph with size n(G * ) ≤ 20. To break this barrier, we newly employ the dynamic programming method for designing an algorithm in Stage 5 in order to generate a target chemical graph G * with size n(G * ) = 50. For this, we further restrict the structure of acyclic graphs G so that the number bl 2 (G) of leaf 2-branches is at most 3. Among all acyclic chemical compounds with up to 50 non-hydrogen atoms in the chemical database PubChem, the ratio of the number of acyclic chemical compounds G with bl 2 (G) ≤ 2 (resp., bl 2 (G) ≤ 3) is 78% (resp., 95%). See Section 5 for the details on the new algorithm in Stage 5.

MILPs for Chemical Acyclic Graphs with Bounded Branchheight
In this section, we formulate an MILP M(x, g; C 2 ) to infer a chemical acyclic graph G in the class G for a given specification (Λ, Γ, n * , d max , dia * , k * , bh * , bl * ) defined in the previous section.

Scheme Graphs
We introduce a directed graph with size O(n * · (d max − 1) max{bh * ,k * } + (d max − 1) bh * +k * ), called a scheme graph SG, so that an acyclic graph H ∈ H(n * , d max , dia * , k * , bh * , bl * ) can be chosen from the scheme graph SG. Let t * , s * and c * be integers such that Let a scheme graph SG(d max , k * , bh * , t * ) consist of a tree T B , a path P t * , a set {S s | s ∈ [1, s * ]} of trees, a set {T t | t ∈ [1, t * ]} of trees, and a set of directed edges between T B and P t * so that an acyclic graph H ∈ H(n * , d max , dia * , k * , bh * , bl * ) will be constructed in the following way: (i) The k * -branch-tree of H will be chosen as a subtree of T B = (V B , E B ); (ii) Each k * -fringe-tree rooted at a vertex u s ∈ V (T B ) of H will be chosen as a subtree of S s ; (iii) Each k-branch-path of H (except for its end-vertices) will be chosen as a subpath of P t * or as an edge in T B ; (iv) Each k * -fringe-tree rooted at a vertex v t ∈ V (P t * ) of H will be chosen as a subtree of T t ; and (v) An edge (u, v) directed from T B to P t * will be selected as an initial edge of a k * -branch-path of H and an edge (v, u) directed from P t * to T B will be selected as an ending edge of a k * -branch-path of H.
More formally each component of a scheme graph SG(d max , k * , bh * , t * ) is defined as follows.
Regard T B as an ordered tree by introducing a total order for each set of siblings and call the first (resp., last) child in a set of siblings the leftmost (resp. rightmost) child, which defines the leftmost (rightmost) path from the root u 1 to a leaf in T B , as illustrated in Figure 4 Regard each edge a i ∈ E B as a directed edge (u s , u s ′ ) from one end-vertex u s of a i to the other end-vertex u s ′ of a i such that s = prt(s ′ ) (i.e., u s is the parent of u s ′ ), where head(i) and tail(i) denote the head u s ′ and tail u s of edge a i ∈ E B , respectively.
Let L B denote the set of indices of leaves of T B , and s left (resp., s right ) denote the index s ∈ L B of the leaf u s at which the leftmost (resp., rightmost) path from the root ends.
For each leaf u s , s ∈ L B , let V B,s (resp., E B,s ) denote the set of indices s of non-root vertices u s (resp., indices i of edges a(i) ∈ E B ) along the path from the root to the leaf u s in the tree T B .
For the example of a base-tree T B with bh * = 2 in Figure 4 (ii) S s , s ∈ [1, s * ] is a tree rooted at vertex u s ∈ V B in T B that is isomorphic to the rooted tree T (d max −1, d max −1, k * ), as illustrated in Figure 4(b). Let u s,i and e ′ s,i denote the vertex and edge in S s that correspond to the i-th vertex and the i-th edge in T (d max −1, d max −1, k * ), respectively. Regard each edge e ′ s,i as a directed edge (u s,prt(i) , u s,i ). For this, each vertex u s ∈ V B is also denoted by u s,1 . Figure 4(c). Let v t,i and e t,i denote the vertex and edge in T t that correspond to the i-th vertex and the i- (v) For every pair (s, t) with s ∈ [1, s * ] and t ∈ [1, t * ], join vertices u s and v t with directed edges (u s , v t ) and (v t , u s ), as illustrated in Figure 4(a) .  Figure 4: An illustration of scheme graph SG(d max , k * , bh * , t * ) with d max = 3, k * = 2, bh * = 2, and t * = 5, where the vertices in T B (resp., in P t * ) are depicted with black (resp., gray) circles: (a) A base-tree T B and a link-path P t * are joined with directed edges between them; (b) A tree S s rooted at a vertex u s = u s,1 ∈ V B ; (c) A tree T t rooted at a vertex v t = v t,1 ∈ V P .  Figure 5(b) illustrates the 2-branch-tree of the acyclic graph H in Figure 5(a). Figure 5(c) illustrates a subgraph H ′ of the scheme graph SG(d max , k * , bh * , t * = n * −bl * −1) such that H ′ is isomorphic to the acyclic graph H in Figure 5(a).
In this paper, we obtain the following result.
Theorem 2. Let Λ be a set of chemical elements, Γ be a set of adjacency-configurations, where |Λ| ≤ |Γ|, and K = |Λ| + |Γ| + 28. Given non-negative integers n * ≥ 3, constraints on x and g such that: Note that our MILP requires only O(n * ) variables and constraints when the branch-parameter k * , the k * -branch height and |Γ| are constant. We formulate an MILP in Theorem 2 so that such a graph H is selected as a subgraph of the scheme graph.
We explain the basic idea of our MILP. The MILP mainly consists of the following three types of constraints. C1. Constraints for selecting an acyclic graph H as a subgraph of the scheme graph SG(d max , k * , bh * , t * ); C2. Constraints for assigning chemical elements to vertices and multiplicity to edges to determine a chemical graph G = (H, α, β); and C3. Constraints for computing descriptors from the selected acyclic chemical graph G.
In the constraints of C1, more formally we prepare the following.
(i) In the scheme graph SG(d max , k * , bh * , t * ), we prepare a binary variable u(s, 1) for each vertex u s = u s,1 ∈ V B , s ∈ [1, s * ] so that vertex u s = u s,1 becomes a k * -branch of a selected graph H if and only if u(s, 1) = 1. The subgraph of the base-tree T B that consists of vertices u s = u s,1 with u(s, 1) = 1 will be the k * -branch-tree of the graph H. We also prepare a binary variable For a pair of a vertex u s,1 and a child u s ′ ,1 of u s,1 such that u(s, 1) = u(s ′ , 1) = 1, either the edge . For example, vertices u 1,1 and u 2,1 are connected by a path Figure 5(c).
, v(t, i) = 1) means that the corresponding vertex u s,i (resp., v t,i ) is used as a vertex in a selected graph H. The (non-empty) subgraph of a tree S s (resp., T t ) that consists of vertices u s,i with u(s, i) = 1 (resp., v t,i with v(t, i) = 1) will be a k * -fringe-tree of a selected graph H.
(iii) In the link-path P t * , we prepare a binary variable e(t), t ∈ [2, t * ] for each edge e t,1 = (v t−1,1 , v t,1 ) ∈ E P so that e(t) = 1 if and only if edge e t,1 is used in some path (iv) For each pair (s, t) of s ∈ [1, s * ] and t ∈ [1, t * ], we prepare a binary variable e(s, t) (resp., e(t, s)) so that e(s, t ′ ) = 1 (resp., e(t ′′ , s) = 1) if and only if directed edge (u s,1 , v t ′ ,1 ) (resp., (v t ′′ ,1 , u s,1 )) is used as the first edge (resp., last edge) of some path Based on these, we include constraints with some more additional variables so that a selected subgraph H is a connected acyclic graph. See constraints (13) to (33) in Appendix B for the details.
In the constraints of C3, we introduce a variable for each descriptor and constraints with some more variables to compute the value of each descriptor in f (G) for a selected chemical graph G. See constraints (49) to (77) in Appendix B for the details.

A New Graph Search Algorithm
The algorithm used in Stage 5 in the previous methods of inferring chemical acyclic graphs [3,5,24] are all based on the branch-and-bound algorithm proposed by Fujiwara et al. [6] where an enormous number of chemical graphs are constructed by repeatedly appending and removing a vertex one by one until a target chemical graph is constructed. Their algorithm cannot generate even one acyclic chemical graph when n(G) is larger than around 20.
This section designs a new dynamic programming method for designing an algorithm in Stage 5. We consider the following aspects: (a) Treat acyclic graphs with a certain limited structure that frequently appears among chemical compounds registered in the chemical data base; and (b) Instead of manipulating acyclic graphs directly, first compute the frequency vectors f f f (G ′ ) (some types of feature vectors) of subtrees G ′ of all target acyclic graphs and then construct a limited number of target graphs G from the process of computing the vectors.
In (a), we choose a branch-parameter k * = 2 and treat acyclic graphs G that have a small 2-branch number such as bl 2 (G) ∈ [2,3]. and satisfy the size constraint (1) on 2-fringe-trees. We design a method in (b) based on the mechanism of dynamic programming wherein the first phase computes some compressed forms of all substructures of target objects before the second phase realizes a final object based on the computation process of the first phase. Section 5.1 defines a frequency vector f f f (G) that represents a feature vector f (G) of a chemical graph G. Section 5.2 presents the idea and a sketch of our new algorithms for generating acyclic graphs G with bl 2 (G) ∈ [2,3]. Detailed descriptions of the algorithms are presented in Appendix C.

Multi-rooted Trees and Frequency Vectors
For a finite set A of elements, let Z A + denote the set of functions w w w : is called a non-negative integer vector (or a vector) on A and the value x x x(a) for an element a ∈ A is called the entry of x x x for a ∈ A. For a vector w w w ∈ Z A + and an element a ∈ A, let w w w + 1 1 1 a (resp., w w w −1 1 1 a ) denote the vector w w w ′ such that w w w ′ (a) = w w w(a) + 1 (resp., w w w ′ (a) = w w w(a) − 1) and w w w ′ (b) = w w w(b) for the other elements b ∈ A \ {a}. For a vector w w w ∈ Z A + and a subset B ⊆ A, let w w w [B] denote the projection of w w w to B; i.e., w w w [B] Figure 6: An illustration of chemical acyclic graphs G with diameter dia * and bl 2 (G) = 2, 3: (a) A chemical acyclic graph G with two leaf 2-branches v 1 and v 2 ; (b) A chemical acyclic graph G with three leaf 2-branches v 1 , v 2 and v 3 .
, where dgi denotes the number of vertices with degree i.
Henceforth we deal with vectors w w w that have their w w w in and w w w ex components, both w w w in , w w w ex ∈ Z Λ∪Γ∪Bc∪Dg + , and for convenience we write w w w = (w w w in , w w w ex ) in the sense of concatenation.
Throughout the section, let k * = 2 be a branch-parameter, , and dia * be an integer. We infer a chemical acyclic graph ). Note that any other descriptors of G ∈ G(x x x * ) can be determined by the entries of vector x x x * .
To infer a chemical acyclic graph G ∈ G(x x x * ), we consider a connected subgraph T of G that consists of -a subtree of the 2-branch-subtree G ′ of G and -the 2-fringe-trees rooted at vertices in G ′ .
Our method first generates a set F T of all possible rooted trees T that can be a 2-fringe-tree of a chemical graph G ∈ G(x x x * ), and then extends the trees T by repeatedly appending a tree in F T until a chemical graph G ∈ G(x x x * ) is formed. In the extension, we actually manipulate the "frequency vectors" of trees defined below.
To specify which part of a given tree T plays a role of 2-internal vertices/edges or 2-external vertices/edges in a chemical graph G ∈ G(x x x * ) to be inferred, we designate at most three vertices r 1 (T ), r 2 (T ) and r 3 (T ) in T as terminals, and call T rooted (resp., bi-rooted and tri-rooted) if the number of terminals is one (resp., two and three). For a rooted tree (resp., bi-or tri-rooted tree) T , let V in denote the set of vertices contained in a path between two terminals of T , E in denote the set of edges in T between two vertices in V in , and define For a bi-or tri-rooted tree T , define the backbone path P T of T to be the path of T between vertices r 1 (T ) and r 2 (T ).
Given a chemical acyclic graph T , define f f f t (T ), t ∈ {in, ex} to be the vector w w w ∈ Z Λ∪Γ∪Bc∪Dg + that consists of the following entries: Our aim is to generate all chemical bi-rooted (resp., tri-rooted) trees T with diameter dia * such that f f f (T ) = x x x * .

The Idea of New Algorithms
This section describes the idea and a sketch of our new graph search algorithms.

Case of bl 2 (G) = 2
We call a chemical graph G ∈ G(x * ) with diameter dia * and bl 2 (G) = 2 a target graph.
A chemical acyclic graph G with bl 2 (G) = 2 has exactly two leaf 2-branches v i , i = 1, 2, where the length of the path between the two leaf 2-branch v 1 and v 2 of a target graph G is dia * − 2k * = dia * − 4. We observe that a connected subgraph T of a target graph G that satisfies (2) for bl 2 (G) = 2 is a chemical rooted or bi-rooted tree. We call such a subgraph T an internalsubtree (resp., end-subtree) of G if neither (resp., one) of u and v is a 2-branch in G. When u = v, we call an internal-subtree (resp., end-subtree) T of G an internal-fringe-tree (resp., end-fringetree) of G. Figure 7(a)-(d) illustrate an internal-subtree, an internal-fringe-tree, an end-subtree and an end-fringe-tree of G. Let We regard a target graph G ∈ G(x * ) with bl 2 (G) = 2 and diameter dia * as a combination of two chemical bi-rooted trees T 1 and T 2 with ℓ(P T i ) = δ i , i = 1, 2 joined by an edge e = r 1 (T 1 )r 1 (T 2 ), as illustrated in Figure 8.
We start with generating chemical rooted trees and then iteratively extend chemical bi-rooted trees T with ℓ(P T ) = 1, 2, . . . , δ 1 before we finally combine two chemical bi-rooted trees T 1 and T 2 with ℓ(P T i ) = δ i . To describe our algorithm, we introduce some notations.
-Let T (x * ) denote the set of all bi-rooted trees T (where possibly r 1 (T ) = r 2 (T )) such that f f f in (T ) ≤ x x x * in and f f f ex (T ) ≤ x x x * ex , which is a necessary condition for T to be an internal-subtree or end-subtree of a target graph G ∈ G(x * ).
-Let F T denote the set of all rooted trees T ∈ T (x * ) that can be a 2-fringe-tree of a target graph G, where T satisfies the size constraint (1) of 2-fringe-trees. Figure 7: An illustration of subtrees T of a chemical acyclic graph G in Figure 6 Figure 8: An illustration of combining two bi-rooted trees T 1 = T w w w 1 and T 2 = T w w w 2 with a new edge with multiplicity m joining vertices r 1 (T 1 ) and r 1 (T 2 ) to construct a target graph G, where a i ∈ Λ, end denote the set of all bi-rooted trees T ∈ T (x * ) that can be an end-subtree of a target graph G such that ℓ(P T ) = h, and each 2-fringe-tree T v rooted at a vertex v in P T belongs to F T .
We remark that the size |T (h) end | of trees will be enormously large for n * ≥ 25 and dia * ≥ 10. This suggests that construction of a target graph G by enumerating trees in T (h) end directly never works for such a large size of instances. The idea of our new algorithm is to compute only the set W (h) end of frequency vectors w w w of these trees, whose size |W (h) end | is much more restricted than that of T (h) end . We compute the set W (h) of frequency vectors w w w of trees in T (h) end iteratively for each integer h ≥ 0. During the computation, we keep a sample of a tree T w w w for each of such frequency vectors w w w so that a final step can construct some number of target graphs G by assembling these sample trees. Based on this, we generate target graphs G ∈ G(x * ) by the following steps: T ∈ T (x x x * ) (where r 1 (T ) = r 2 (T )) that can be a 2-fringe-tree of a target graph G ∈ G(x * ); (ii) Compute the set W (0) of all vectors w w w = (w w w in , w w w ex ) such that w w w in = f f f in (T ) and w w w ex = f f f ex (T ) for some tree T ∈ F T ; (iii) For each vector w w w = (w w w in , w w w ex ) ∈ W (0) , choose a sample tree T w w w ∈ F T such that w w w in = f f f in (T ) and w w w ex = f f f ex (T ), and store these sample trees; 2. For each integer h = 1, 2, . . . , δ 2 , iteratively execute the next: (i) Compute the set W (h) end of all vectors w w w = (w w w in , w w w ex ) such that w w w in = f f f in (T ) and w w w ex = f f f ex (T ) for some bi-rooted tree T ∈ T (h) end , where such a vector w w w is obtained from a combination of vectors w w w ′ ∈ W (0) and w w w ′′ ∈ W end , store a sample tree T w w w , which is obtained from a combination of sample trees T w w w ′ with w w w ′ ∈ W (0) and T w w w ′′ with w w w ′′ ∈ W (h−1) end ; 3. We call a pair of vectors w w w 1 ∈ W (δ 1 ) Find the set W pair of all feasible pairs of vectors w w w 1 and w w w 2 ; 4. For each feasible vector pair (w w w 1 , w w w 2 ) ∈ W pair , construct a corresponding target graph G by combining the corresponding samples trees T w w w 1 and T w w w 2 , as illustrated in Figure 8.
For a relatively large instance with n * ≥ 40 and dia * ≥ 20, the number |W pair | of feasible vector pairs in Step 4 is still very large. In fact, the size |W Our algorithm also delivers a lower bound on the number of all target graphs G ∈ G(x * ) in the following way. In Step 1, we also compute the number t(w w w) of trees T ∈ F T such that w w w = f f f (T ) for each w w w ∈ W (0) . In Step 2, when a vector w w w is constructed from two vectors w w w ′ and w w w ′′ , we iteratively compute the number t(w w w) of trees T such that w w w = f f f (T ) by t(w w w) := t(w w w ′ ) × t(w w w ′′ ). In Step 3, when a feasible vector pair (w w w 1 , w w w 2 ) ∈ W pair is obtained, we know that the number of the corresponding target graphs G is t(w w w 1 ) × t(w w w 2 ). Possibly we compute a subset W pair of W pair in Step 3. Then (1/2) (w w w 1 ,w w w 2 )∈ W pair t(w w w 1 ) ×t(w w w 2 ) gives a lower bound on the number of target graphs G ∈ G(x * ), where we divided by 2 since an axially symmetric target graph G can correspond to two vector pairs in W pair .
Detailed descriptions of the five steps in the above algorithm can be found in Appendix C.

Case of bl 2 (G) = 3
We call a chemical graph G ∈ G(x * ) with diameter dia * and bl 2 (G) = 3 a target graph. Let n * inl a∈Λ x x x * in (a), which is the number of 2-internal vertices in a target graph G ∈ G(x x x * ). A chemical acyclic graph G with bl 2 (G) = 3 has exactly three leaf 2-branches v i , i = 1, 2 and exactly one 2-internal vertex v 4 adjacent to three 2-internal vertices v i , i = 1, 2, 3, as illustrated in Figure 6(b). We call vertex v 4 the joint-vertex of G. Without loss of generality assume that the length of the path P v 1 ,v 2 between v 1 and v 2 is dia * − 4 and that the length of the path P v 1 ,v ′ 1 is not smaller than that of P v 2 ,v ′ 2 . Analogously with the case of bl 2 (G) = 2, we define internal-subtree (resp., end-subtree, internalfringe-tree and end-fringe-tree) of G to be a connected subgraph G ′ that satisfies (2). Observe that G can be partitioned into three end-subtrees T i , i = 1, 2, 3, the 2-fringe-tree T 4 rooted at the jointvertex v 4 and three edges v i v 4 , i = 1, 2, 3, where the backbone path P T i connects leaf 2-branch v i and vertex v ′ i . In particular, we call the end-subtree of G that consists of T 1 , T 2 , T 4 and edges v i v 4 , i = 1, 2 the main-subtree of G, which consists of the path P v 1 ,v 2 and all the 2-fringe-trees rooted at vertices in P v 1 ,v 2 . We call T 3 the co-subtree of G.
Let δ i , i = 1, 2, 3 denote the length of the backbone path of T i . Note that We regard a target graph G ∈ G(x * ) with bl 2 (G) = 3 and diameter dia * as a combination of the main-subtree and the co-subtree joined with an edge. We represent the co-subtree as a chemical bi-rooted tree T with ℓ(P T ) = δ 3 . We represent the main-subtree of a target graph G as a tri-rooted tree T with ℓ(P T ) = dia − 4 so that terminals r 1 (T ), r 2 (T ) and r 3 (T ) correspond to the two leaf 2-branches and the joint-vertex of G, respectively.  Figure 9: An illustration of combining a tri-rooted T 1 = T w w w 1 and a bi-rooted tree T 2 = T w w w 2 with a new edge joining vertices r 1 (T 1 ) and r 1 (T 2 ) to construct a target graph G.
We start with generating chemical rooted trees and then iteratively extend chemical bi-rooted trees T with ℓ(P T ) = 1, 2, . . . , dia * − 6 − δ 3 before we combine two chemical bi-rooted trees T ′ 1 and T ′ 2 to obtain a chemical tri-rooted tree T 1 with ℓ(P T 1 ) = δ i and finally combine a chemical tri-rooted tree T 1 and a chemical bi-rooted trees T 2 with ℓ(P T 2 ) = δ 3 , to obtain a target graph G ∈ G(x * ).
Analogously with the case of bl 2 (G) = 2, we define the set T (x * ) of all bi-rooted trees T , the set F T of all rooted trees T ∈ T (x * ) that can be a 2-fringe-tree of a target graph G and the set T (h) end , h ∈ [1, dia * − 6 − δ 3 ]) of all bi-rooted trees T ∈ T (x * ) that can be an end-subtree of a target graph G such that ℓ(P T ) = h.
We generate target graphs G ∈ G(x * ) by the following steps: 1. Analogously with Step 1 for the case of bl 2 (G) = 2, compute the set F T and the set W (0) of all vectors w w w = (w w w in , w w w ex ) such that w w w in = f f f in (T ) and w w w ex = f f f ex (T ) for some tree T ∈ F T . For each vector w w w ∈ W (0) , store a sample tree T w w w ∈ F T ; 2. For each integer h = 1, 2, . . . , dia * − 6 − δ 3 , compute the set W of all vectors w w w = (w w w in , w w w ex ) such that w w w in = f f f in (T ) and w w w ex = f f f ex (T ) for some tri-rooted tree T that represents the main-subtree such that the length of the path P r 2 (T ),r 3 (T ) between terminals r 2 (T ) and r 3 (T ) is δ 1 + 1. For each vector w w w ∈ W (δ 1 +1) main , store a sample tree T w w w ;

We call a pair of vectors
Find the set W pair of all feasible pairs of vectors w w w 1 and w w w 2 ; 6. For each feasible vector pair (w w w 1 , w w w 2 ) ∈ W pair , construct a corresponding target graph G by combining the samples trees T w w w 1 and T w w w 2 , which correspond to the main-subtree and the co-subtree of a target graph G, respectively, as illustrated in Figure 9.
Detailed descriptions of the six steps in the above algorithm can be found in Appendix C.

Experimental Results
We implemented our method of Stages 1 to 5 for inferring chemical acyclic graphs and conducted experiments to evaluate the computational efficiency for three chemical properties π: octanol/water partition coefficient (Kow), boiling point (Bp) and heat of combustion (Hc Results on Phase 1. We implemented Stages 1, 2 and 3 in Phase 1 as follows. Stage 1. We set a graph class G to be the set of all chemical acyclic graphs, and set a branchparameter k * to be 2. For each property π ∈ {Kow, Bp, Hc}, we first select a set Λ of chemical elements and then collected a data set D π on chemical acyclic graphs over the set Λ of chemical elements provided by HSDB from PubChem. To construct the data set, we eliminated chemical compounds that have at most three carbon atoms or contain a charged element such as N + or an element a ∈ Λ whose valence is different from our setting of valence function val. Table 1 shows the size and range of data sets that we prepared for each chemical property in Stage 1, where we denote the following: Stage 2. We used a feature function f that consists of the descriptors defined in Section 2. π is used for a test set in five trials i ∈ [1,5]. For a set {y 1 , y 2 , . . . , y N } of observed values and a set {ψ 1 , ψ 2 , . . . , ψ N } of predicted values, we define the coefficient of determination to be R 2 1 − j∈ [1,N] Table 2 shows the results on Stages 2 and 3, where -K: the number of descriptors for the chemical compounds in data set D π for property π; -test R 2 (best): the largest value of coefficient of determination over the five test sets.
From Table 2, we see that the execution of Stage 3 was successful, where the average of test R 2 is over 0.9 for all three chemical properties.
For each chemical property π, we selected the ANN N that attained the best test R 2 score among the five ANNs to formulate an MILP M(x, y, z; C 1 ) which will be used in Phase 2.
Results on Phase 2. We implemented Stages 4 and 5 in Phase 2 as follows.
Stage 4. In this step, we solve the MILP M(x, y, g; C 1 , C 2 ) formulated based on the ANN N obtained in Phase 1. To solve an MILP in Stage 4, we use CPLEX version 12.8. In our experiment, we choose a target value y * ∈ [a, a]. and fix or bound some descriptors in our feature vector as follows: -Set the 2-leaf-branch number bl * to be each of 2 and 3; -Fix the instance size n * = n(G) to be each integer in {26, 32, 38, 44, 50}; -Set the diameter dia * = dia(G) be one of the integers in {⌈(2/5)n * ⌉, ⌈(3/5)n * ⌉}.
Based on the above setting, we generated six instances for each instance size n * . We set ε = 0.02 in Stage 4.
Observe that most of the MILP instances with bl * = 2, n * ≤ 50 and dia * ≤ 30 (resp., bl * = 3, n * ≤ 50 and dia * ≤ 30) in one minute (resp., in a few minutes). The previously most efficient MILP formulation for inferring chemical acyclic graphs due to Zhang et al. [24] could solve an instance with only up to n * = 20 for the case of d max = 4 and dia * = 9. Our new MILP formulation on chemical acyclic graphs with bounded 2-branch height considerably improved the tractable size of chemical acyclic graphs in Stage 4 for the inference problem (II-a). Figure 10(a)-(c) illustrate some chemical acyclic graphs G with bl 2 (G) = 2 obtained in Stage 4 by solving an MILP. Remember that these chemical graphs obey the AD D defined in Appendix A.  Stage 5. In this stage, we execute our new graph search algorithms for generating target graphs G ∈ G(x x x * ) with bl 2 (G) ∈ {2, 3} for a given feature vector x x x * obtained in Stage 4.
We introduce a time limit of 10 minute for each iteration h in Step 2 and an execution of Steps 1 and 3 for bl * = 2 (resp., each iteration h in Steps 2 and 3 and δ 1 in Step 4 and an execution of Steps 1 and 5 for bl * = 3). In the last step, we choose at most 100 feasible vector pairs and generate a target graph from each of these feasible vector pairs. We also impose an upper bound UB on the size |W| of a vector set W that we maintain during an execution of the algorithm. We executed the algorithm for each of the three bounds UB = 10 6 , 10 7 , 10 8 until a feasible vector pair is found or the running time exceeds a global time limitation of two hours. When no feasible vector pair is found by the graph search algorithms, we output the target graph G * constructed from the vector g * in Stage 4.
Tables 3 to 4 (resp., Tables 5 to 6) show the results on Stage 5 for bl * = 2 (resp., bl * = 3), where we denote the following: -#FP: the number of feasible vector pairs obtained by an execution of graph search algorithm for a given feature vector x x x * ; -G-LB: a lower bound on the number of all target graphs G ∈ G(x x x * ) for a given feature vector x x x * ; -#G: the number of all (or up to 100) chemical acyclic graphs G such that f (G) = x * (where at least one such graph G has been found from the vector g * in Stage 4); -G-time: the running time (sec.) to execute Stage 5 for a given feature vector x x x * . "> 2 hours" means that the running time exceeds two hours.
Previously an instance of chemical acyclic graphs with size n * up to 16 was solved in Stage 5 by Azam et al. [3]. For the classes of chemical graphs with cycle index 1 and 2, the maximum size of instances solved in Stage 5 by Ito et al. [3] and Zhu et al. [25] was around 18 and 15, respectively. Our new algorithm based on dynamic programming solve instances with n * = 50. In our experiments, we also computed a lower bound G-LP on the number of target graphs. Observe that there are over 10 10 or 10 14 target graphs in some cases. Remember that these lower bounds are computed without actually generating each target graph one by one. So when a lower bound is enormously large, this would suggest that we may need to impose some more constraints on the structure of graphs or the range of descriptors to narrower a family of target graphs to be inferred. An Additional Experiment. We also conducted some additional experiment to demonstrate that our MILP-based method is flexible to control conditions on inference of chemical graphs. In Stage 3, we constructed an ANN N π for each of the three chemical properties π ∈ {Kow, Bp, Hc}, and formulated the inverse problem of each ANN N π as an MILP M π . Since the set of descriptors is common to all three properties Kow, Bp and Hc, it is possible to infer a chemical acyclic graph G that satisfies a target value y * π for each of the three properties at the same time (if one exists). We specify the size of graph so that n * = 50, bl * = 2, dia * = 25 and d max = 4, and set target values with y * Kow = 4.0, y * Bp = 400.0 and y * Hc = 13000.0 in an MILP that consists of the three MILP M Kow , M Hc and M Bp . The MILP was solved in 18930 (sec) and we obtained a chemical acyclic graph G illustrated in Figure 12. We continued to execute Stage 5 for this instance to generate more target graphs G * . Table 7 shows that 100 target graphs are generated by our new dynamic programming algorithm.

Concluding Remarks
In this paper, we introduced a new measure, branch-height of a tree, and showed that many of chemical compounds in the chemical database have a simple structure where the number of 2branches is small. Based on this, we proposed a new method of applying the framework for inverse QSAR/QSPR [3,5,24] to the case of acyclic chemical graphs where Azam et al. [3] inferred chemical graphs with around 20 non-hydrogen atoms and Zhang et al. [24] solved an MILP of inferring a feature vector for an instance with up to around 50 non-hydrogen atoms and diameter 8. In our method, we formulated a new MILP in Stage 4 specialized for acyclic chemical graphs with a small branch number and designed a new graph search algorithm in Stage 5 that computes frequency vectors of graphs in a dynamic programming scheme. We implemented our new method and conducted some experiments on chemical properties such as octanol/water partition coefficient, boiling point and heat of combustion. The resulting method improved the performance so that chemical graphs with around 50 non-hydrogen atoms and around diameter 30 can be inferred. Since there are many acyclic chemical compounds having large diameters, this is a significant improvement.
It is left as a future work to design MILPs and graph search algorithms based on the new idea of the paper for classes of graphs with a higher rank. [20] Suzuki M, Nagamochi H, Akutsu T, Efficient enumeration of monocyclic chemical graphs with given path frequencies, Journal of Cheminformatics, vol. 6, no. 1, p. 31, 2014.
[24] Zhang F, Zhu J, Chiewvanichakorn C, Shurbevski A, Nagamochi H, Akutsu T, A new integer linear programming formulation to the inverse QSAR/QSPR for acyclic chemical compounds using skeleton trees, The 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, September 22-25, 2020 Kitakyushu, Japan (to appear).
[25] Zhu J, Wang C Shurbevski A, Nagamochi H, Akutsu T, A novel method for inference of chemical compounds of cycle index two with desired properties based on artificial neural networks and integer programming, Algorithms, vol. 13, no. 5, 124, 2020.

A Statistical Feature of Molecular Structure
We observe the following features of the graph-theoretical structure of chemical graphs registered in the chemical database PubChem. Let DB (≤n) denote the set of chemical graphs with at most n non-hydrogen atoms that are registered in chemical database PubChem. The cycle index (or rank) of a chemical graph G = (H = (V, E), α, β) is defined to be |E| − (|V | − 1) (i.e., the minimum number of edges to be removed to make the graph H acyclic). We call a chemical graph a rank-r chemical graph if the rank of the graph is r. The core of a chemical cyclic graph G is defined to be the induced subgraph G ′ of G such that G ′ consists of vertices in a cycle or vertices in a path joining two cycles. A vertex in the core (not in the core) is called a core vertex (resp., a non-core vertex). The edges not in the core of a chemical cyclic graph G form a collection of trees T , which we call a non-core tree. Each non-core tree contains exactly one core vertex and is regarded as a tree rooted at the core vertex. The k-branch height of a chemical cyclic graph G is defined to be the maximum of k-branch heights over all non-core trees. Let ρ r (%) denote the ratio of the number of chemical graphs with rank at most r ∈ [0, 4] to the number of all chemical graphs in PubChem. See Table 8. (%) denote the ratio of the number of chemical graphs in DB (≤100) such that the maximum degree is at most d ∈ [3,4] to the number of all chemical graphs in DB (≤100) . Let ρ (d) r (%), r ∈ [1,4] denote the ratio of the number of rank-r chemical graphs in DB (≤100) such that the maximum degree of a non-core vertex is at most d ∈ [3,4] to the number of all rank-r chemical graphs in DB (≤100) . See Table 9.   Let ρ r (k, h) (%), r ∈ [0, 4], k = 2, h ∈ [1, 2] denote the ratio of the number of rank-r chemical graphs in DB (≤50) such that the k-branch height is at most h to the number of all rank-r chemical graphs in DB (≤50) . See Table 10. We see that most chemical graphs G with at most 50 nonhydrogen atoms satisfy bh 2 (G) ≤ 2. Table 10: The percentage ρ r (k, h) (%) of the number of rank-r chemical graphs in DB (≤50) such that the k-branch height is at most h to the number of all rank-r chemical graphs in DB (≤50) . ρ 0 (2, 1) ρ 0 (2, 2) ρ 1 (2, 1) ρ 1 (2, 2) ρ 2 (2, 1) ρ 3 (2, 1) ρ 4 (2, 1) 87.23% 99.46% 88.13% 98.76% 96.39% 99.17% 99.43% We show the distribution of 2-branch-height over alkans C n H 2n+2 . Let Aln(n) denote the set of all alkans with n carbon atoms, where |Aln(25)| = 36, 797, 588. Let ρ Aln (2, h) (%), h ∈ [1,4] denote the ratio of the number of alkans in Aln(25) such that the 2-branch height is at most h to the number of alkans in Aln(25). See Table 11. Let ρ 2bt (δ) denote the ratio of the number of acyclic chemical graphs in DB (≤50) such that the degree of the root of the 2-branch-tree is δ ∈ [1,4] to the number of all acyclic chemical graphs in DB (≤50) . See Table 12. Table 12: The percentage ρ 2bt (δ) of the number of acyclic chemical graphs in DB (≤50) such that the degree of the root of the 2-branch-tree is δ ∈ [1,4] to the number of all acyclic chemical graphs in DB (≤50) . (4) 6.39% 83.58% 9.30% 0.73% Among the 2-fringe-trees T of all acyclic chemical graphs in DB (≤100) , over 90% of them satisfy n ≤ 2d + 2 for the number n = |V (T )| of non-hydrogen atoms in a 2-fringe-tree T and the number d of non-hydrogen atoms adjacent to the root in T .

B All Constraints in an MILP Formulation for Chemical Acyclic Graphs
To formulate an MILP that represents a chemical graph, we distinguish a tuple (a, b, m) from a tuple (b, a, m).
where the latter is assumed because otherwise G must consist of two atoms of a = b. Assume that each tuple γ ∈ Γ is proper. Let ǫ be a fictitious chemical element that represents null, call a tuple (a, b, 0) with a, b ∈ Λ ∪ {ǫ} fictitious, and define Γ 0 to be the set of all fictitious tuples; i.e., To represent chemical elements e ∈ Λ ∪ {ǫ} ∪ Γ in an MILP, we encode these elements e into some integers denoted by [e]. Assume that, for each element a ∈ Λ, [a] is a positive integer and that [ǫ] = 0.
We use the range-based method to define an applicability domain for our method. For this, we find the range (the minimum and maximum) of each descriptor over all relevant chemical compounds and represent each range as a set of linear constraints in the constraint set C 1 of our MILP formulation. Recall that D π stands for a set of chemical graphs used for constructing a prediction function. However, the number of examples in D π may not be large enough to capture a general feature on the structure of chemical graphs. For this, we also use some data set from the whole set DB of chemical graphs in a data base. Let DB (i) G denote the set of chemical graphs G ∈ DB ∩ G such that n(G) = i for each integer i ≥ 1. Based on this, we assume that the given lower and upper bounds on the above descriptors satisfy the following. For each t ∈ {in, ex},

B.2 Construction of Scheme Graph
We infer a subgraph H such that the maximum degree is d max ∈ {3, 4}, n(H) = n * , bh k * (H) = bh * and bl k * (H) = bl * . For this, we first construct the scheme graph SG(d max , k * , bh * , t * ). We then prepare a binary variable u(s, i) (resp., v(t, i)) for each vertex u s,i in tree S s (resp., v t,i in tree T t ).
Recall that when the two end-vertices of edge a i = (u s,1 , u s ′ ,1 ) ∈ E B = {a 1 , a 2 . . . , a c * } is connected in a selected subgraph H, either edge a i is directly used in H or a path P i = (u s,1 , v t ′ ,1 , v t ′ +1,1 , . . . , v t ′′ ,1 , u s ′ ,1 ) from u s,1 to u s ′ ,1 visiting some vertices in P t * is constructed in H. We regard the index i of each edge a i ∈ E B = {a 1 , a 2 . . . , a c * } as the "color" of the edge, and define the color set of E B to be [1, c * ]. To introduce necessary linear constraints that can construct such a path P i properly in our MILP, we assign the color i to the vertices v t ′ ,1 , v t ′ +1,1 , . . . , v t ′′ ,1 in P t * when a path P i = (u s,1 , v t ′ ,1 , v t ′ +1,1 , . . . , v t ′′ ,1 , u s ′ ,1 ) is used in H.

B.4 Assigning Multiplicity
We prepare an integer variable β(e) or β(e) for each edge e in the scheme graph SG(d max , k * , bh * , t * ) to denote the multiplicity of e in a selected graph H and include necessary constraints for the variables to satisfy in H.

constants:
Prepare functions tail and head such that a i = (u tail(i) , u head(i) ) ∈ E B ; Assume that each edge in a tree S s , s ∈ [1, s * ] (resp., T t , t ∈ [1, t * ]) is denoted by e ′ s,i (resp., e t,i ) with the integer i ∈ [2, n S tree ] of the head u s,i (resp., v t,i ) of the edge.

B.5 Assigning Chemical Elements and Valence Condition
We include constraints so that each vertex v in a selected graph H satisfies the valence condition; i.e., uv∈E(H) β(uv) ≤ val(α(u)). With these constraints, a chemical acyclic graph G = (H, α, β) on a selected subgraph H will be constructed.

C.2 Sets of Frequency Vectors
For an element a ∈ Λ and integers d ) of a chemical rooted tree T such that r 1 (T ) = r 2 (T ), the height of T is at most 2, α(r 1 (T )) = a, deg T (r 1 (T )) = d, and β(r 1 (T )) = m.
Step 1 first computes the set F T of all possible chemical rooted trees T ∈ T (x x x * ) (where r 1 (T ) = r 2 (T )) that can be a 2-fringe-tree of a target graph G ∈ G(x * ). For this, we design a branch-and-bound procedure where we append a new vertex one by one to construct a rooted tree We next enumerate chemical trees T ∈ T (x * ) rooted a vertex r with α(r) = a that has two or three children by generating a combination of two or three graphs in T a . During generating graphs, our bounding procedure tests whether the current graph satisfies the necessary condition in Lemma 3(ii).
Observe that each vector w w w = (w w w in , w w w ex ) ∈ W (h) end (a, d, m) is obtained from a combination of vectors w w w ′ = (w w w ′ in , w w w ′ ex ) ∈ W    inl (a, d − 1, m ′ ) and w w w ′′ ∈ W (h−1) end (b, d ′′ , m ′′ ), we construct a sample tree T w w w from their sample trees T w w w ′ and T w w w ′′ .

C.3.3 Step 3: Enumeration of Feasible Vector Pairs
A feasible pair of vectors is defined to be a pair of vectors w w w i = (w w w i in , w w w i ex ) ∈ W x x x * in = w w w 1 in + w w w 2 in + 1 1 1 γ + 1 1 1 µ and x x x * ex = w w w 1 ex + w w w 2 ex , or equivalently w w w 1 is equal to the vector (x x x * in − w w w 2 in − 1 1 1 γ − 1 1 1 µ , x x x * ex − w w w 1 ex ), which we call the (γ, µ)complement of w w w 2 , and denote it by w w w 2 .
The main task of Step 3 is to enumerate all feasible vector pairs (w w w 1 , w w w 2 ), w w w i ∈ W To efficiently search for a feasible pair of vectors in two sets W (δ i ) end (a i , d i , m i ), i = 1, 2, we first compute the (γ, µ)-complement vector w w w of each vector w w w ∈ W (δ 2 ) end (a 2 , d 2 , m 2 ) for each pair of γ = (a 1 , a 2 , m) ∈ Γ and µ = (d 1 +1, d 2 +1, m) ∈ Bc with m ∈ [1, min{3, val(a 1 )−m 1 , val(a 2 )−m 2 }], and denote by W (δ 2 ) end the set of the resulting (γ, µ)-complement vectors. Observe that (w w w 1 , w w w 2 ) is a feasible vector pair if and only if w w w 1 = w w w 2 . To find such pairs, we merge the sets W (δ 1 ) end (a 1 , d 1 , m 1 ) and W (δ 2 ) end into a sorted list L γ,µ . Then each consecutive pair of vectors z z z 1 , z z z 2 ∈ L γ,µ gives a feasible pair of vectors z z z 1 and z z z 2 .

C.3.4 Step 4: Construction of Chemical Graphs
The task of Step 4 is to construct for each feasible vector pair w w w i ∈ W (δ i ) end (a i , d i , m i ), i = 1, 2 such that w w w 1 is equal to the (γ = (a 1 , a 2 , m), µ)-complement vector w w w 2 of w w w 2 , construct a target graph T (w w w 1 ,w w w 2 ) ∈ G(x x x * ) by combining the sample trees T i = T w w w i of vectors w w w i with an edge e = r 1 (T 1 )r 1 (T 2 ) such that β(e) = m. Figure 8 illustrates two sample trees T i , i = 1, 2 to be combined with a new edge e = r 1 (T 1 )r 1 (T 2 ). ) and f f f (T [+3])) of chemical rooted trees T such that r 1 (T ) = r 2 (T ), α(r 1 (T )) = a, deg T (r 1 (T )) = d and β(r 1 (T )) = m. For each vector w w w ∈ W (0) end (a, d, m) (resp., w w w ∈ W (0) inl (a, d, m) and w w w ∈ W (0) inl+3 (a, d, m)), we store a sample tree T w w w . This step can be designed in a similar way of Step 1 for the case of bl 2 (G) = 2.  For each vector w w w ∈ W (h) end (a, d, m), we construct a sample tree T w w w from their sample trees T w w w ′ and T w w w ′′ .

C.4.3 Step 3: Generation of Frequency Vectors of End-subtrees with Two Fictitious Edges
The main task of Step 3 is to compute the following sets: for elements a ∈ Λ, integers d ∈
See Figure 9 for the structure of a main-tree. Such a chemical tri-rooted graph T corresponds to the main-subtree of a target graph G ∈ G(x * ).
The main task of Step 4 is to compute the sets W end (a ′ , d ′ , m ′ ) such that δ 1 +δ 2 = dia * −4 and δ 1 ≥ δ 2 , as illustrated in Figure 17. For each vector w w w ∈ W (δ 1 +1) main (a, d, m), we store a sample tree T w w w . This step can be designed in a similar way of Step 3 for the case of bl 2 (G) = 2.
Step 5 computes the set all feasible vector pairs (w w w 1 , w w w 2 ) by using a sorting algorithm as in the Step 4 for the case of bl 2 (G) = 2.

C.4.6 Step 6: Construction of Chemical Graphs
Analogously with Step 4 for the case of bl 2 (G) = 2, Step 6 constructs a target graph T (w w w 1 ,w w w 2 ) ∈ G(x x x * ) for each feasible vector pair (w w w 1 , w w w 2 ) by combining the sample trees T i = T w w w i of vectors w w w i with a new edge e = r 1 (T 1 )r 1 (T 2 ).