Inverse bifurcation analysis: application to simple gene systems

Background Bifurcation analysis has proven to be a powerful method for understanding the qualitative behavior of gene regulatory networks. In addition to the more traditional forward problem of determining the mapping from parameter space to the space of model behavior, the inverse problem of determining model parameters to result in certain desired properties of the bifurcation diagram provides an attractive methodology for addressing important biological problems. These include understanding how the robustness of qualitative behavior arises from system design as well as providing a way to engineer biological networks with qualitative properties. Results We demonstrate that certain inverse bifurcation problems of biological interest may be cast as optimization problems involving minimal distances of reference parameter sets to bifurcation manifolds. This formulation allows for an iterative solution procedure based on performing a sequence of eigen-system computations and one-parameter continuations of solutions, the latter being a standard capability in existing numerical bifurcation software. As applications of the proposed method, we show that the problem of maximizing regions of a given qualitative behavior as well as the reverse engineering of bistable gene switches can be modelled and efficiently solved.


Background
The use of mathematical models provides tools for the analysis of complex molecular interactions aiming at an understanding of processes occurring in living cells. For many problems in cellular control, stochastic effects and time-delays can be ignored and systems of first-order ordinary differential equations (ODEs) can adequately model the underlying processes. Denoting by x and p the biochemical concentrations and parameters, respectively, the instantaneous change in x is described by the vector field f: (x, p). (1) In the study of such systems, an important goal is to understand how the observed physiological behavior arises out of gene network topology and parameters p. Some of these questions may be studied via examining the bifurcation manifolds Σ of the ODE system, which partition the parameter space into regions of different qualitative behavior (see e.g., [1] for a general overview to bifurcation theory). From ODE models and measured parameters, the forward problem of computing the bifurcation diagram has contributed significantly towards elucidating the complex mechanisms underlying cellular processes. For instance, mathematical and symbolic bifurcation analysis has led to an understanding of the possible dynamical behaviors that may arise out of simple gene systems (for a monograph, see [2], examples of more recent papers dealing with natural, designed, and model systems are [3][4][5][6][7]). For cell cycle models, bifurcation diagrams have given biologists a systems-level perspective of the roles played by the various constituent modules, as well as providing the ability to predict the behavior of mutant cells [8,9]. The desire to locate regions in parameter space exhibiting interesting dynamics has led to the development of computational tools for the automatic discovery of bifurcation sets [10].
In contrast, inverse problems have only recently attracted attention in biology as a way to unravel the workings of cellular mechanisms. In inverse problems one looks for causes for observed or desired effects [11]. Mathematically, such problems are typically ill-posed, in particular unstable; special mathematical techniques, called "regularization methods", have to be used to cope with this illposedness. Many variational and iterative regularization techniques have been developed over the years and applied to a variety of problems in science, engineering and finance (see [11] and some references quoted there, and [12]).
In the current context of cell biology, one would like to address problems such as: which parameter configurations lead to an observed qualitative behaviour of the system ("identification")? How can one introduce a certain qualitative behaviour into the system via parameter variations ("design")? We summarize such problems under the name of inverse bifurcation problems, where the task is to map the space of bifurcation diagrams back to the space of parameters. In particular, we consider inverse bifurcation problems of two types: identification and design. For the former, one would like to infer parameter values from observed bifurcation diagrams and hence the issue of uniqueness is typically of concern. For the latter, one is interested in parameter values that produce the desired outcome, hence uniqueness is not an issue. The notion of "inverse bifurcation" was first introduced into biology in [13]; more recently, another inverse bifurcation problem from mathematical ecology was studied in [14] by integral equation methods. These papers are concerned with the existence and uniqueness of nonlinear terms in equations realizing prescribed bifurcation diagrams and use analytical methods. In contrast, given the size and complexity of gene networks, we take a computational approach in addressing problems of inverse bifurcation. In this paper, we consider problems of moderate size for which the illposedness is not yet a crucial issue. Since ill-posedness increases with dimension, a major issue for larger problems will be to use appropriate regularization techniques.
For the design type of inverse bifurcation problems, there exists previous work in the engineering literature: The dis-tance to bifurcation manifolds has been introduced to quantify the "parametric robustness" of system designs in [15]; in the context of design of chemical processes, optimization problems with constraints involving distance to bifurcation manifolds have been treated in [16]. For a recent review see [17]. For biological applications as we have them in mind, this issue of parametric robustness is also important. In addition, other geometric properties of the bifurcation diagram are of interest. These include the size of the parameter region resulting in bistability of solutions and the parametric distance between regions of different qualitative behavior. We will develop methodolgies by which inverse bifurcation problems involving the optimization of such quantities can be solved.
In this paper, we show the applicability of our inverse bifurcation algorithms to low dimensional gene systems. We formulate the inverse problems as constrained optimization problems, whose objective function and constraints involve geometric properties of bifurcation diagrams. We demonstrate that these problems can be solved efficiently by applying gradient-based nonlinear optimization algorithms in combination with oneparameter continuation methods to locate bifurcation points. The latter is a standard capability provided by existing bifurcation analysis software (see [1] for references to state-of-the-art numerical implementations).

Inverse bifurcation analysis
The ODE model (1) defines a mathematical relationship between the parameters p and the time course of biochemical concentrations, x(t). Of the set of all parameters, some describe the biochemical mechanisms constituting the machinery of the regulatory system, while others are parameters to whose variation the gene system should respond. In this paper, we refer to the former as the system parameters and the latter as the control inputs.
In situations where the qualitative behavior of the regulatory system changes in response to shifts in the control input, the corresponding bifurcation diagram is of importance. In particular, the proper working of the regulatory system may depend on the locations, shapes and sizes of the various regions of qualitative behavior. For instance, in the cell cycle model [8] the correct spatial relationship (in parameter space) of the transitions points is crucial for ensuring the survival of cells. In fact, the model predicts that "dynamically challenged" mutants with shifted bifurcation diagrams may suffer irrecoverably. The resulting effects include difficulties exiting mitosis and decreases in cell mass after each cell division. For this particular model, the inverse bifurcation problem is to map geometric as well as topological relationships in the bifurcation diagrams to conditions on the parameters. Another application of inverse bifurcation in the cell cycle model is for the so-called "Pinocchio effect" involving check points. This effect occur when the necessary conditions for safe progression to the next cell phase are not satisfied hence the regulatory system should delay the state transition as far as possible. Here, the inverse bifurcation problem is to find out how the system may be constructed so that, when triggered by the presence or absence of certain chemicals, the range of bistability is maximized.
In the case of system parameters, they are either regulated by other control mechanisms in the cell, or the behavior of the system is constructed to be insensitive to variation in these parameters [18]. Within an operating region in the control input space, the minimal distance in the system parameter space to bifurcation manifolds provides a quantitative measure of the robustness of the system to environmental perturbations. Here, the inverse bifurcation problem would involve finding the parameter combinations so that the system bifurcation diagram is insensitive to the unregulated parameters. Biological applications of this class of problems include homeostasis and rhythmic pacemakers. Again, the problem of interest is to map some shape property of the bifurcation diagram to conditions on the parameter space.
For inverse bifurcation problems of biological interest such as those described above, the question arises as to how to formulate them mathematically so that the solution can be obtained in a computationally tractable and stable manner. Typically, in biological applications the parameter space is of high dimension. Furthermore, in carrying out inverse analysis, the (forward) bifurcation analysis usually has to be applied a large number of times. In most cases, the former condition precludes formulations based on the use of multi-parameter continuation techniques [19]. Instead, formulations based on continuation along rays in parameter space are preferred. Below, we formulate distance to bifurcation manifolds (first introduced in [15]) in terms of a forward operator and an l 2 functional. Subsequently, we describe the sensitivity of the minimal distance with respect to parameters by adjoint methods.
Consider the splitting of m-dimensional parameter space P ⊂ ‫ޒ‬ m into input and system parameters, p = (p i , p s ) ∈ P i × p s . For an ODE system, let Σ denote a bifurcation manifold of interest, consisting of sets in parameter space P for which structural stability breaks down [1]. For a given system parameter p s , we further define Σ(p s ) ≡ Σ ∩ {p s } as the intersection of Σ with the p s -plane. In Figure 2, the geometric relationship between Σ and Σ(p s ) is illustrated. Let the forward operator F :P → P be a mapping in parameter space, taking a given point to its orthogonal projection on Σ(p s ), assumed to be well-defined. That is, The notation is illustrated in Figure 1.
Using F(p), the inverse bifurcation problems examined in this paper are mathematically formulated as: (2) where ||·|| denotes the l 2 -norm and c : P i → ‫ޒ‬ k represents k-dimensional nonlinear constraints. In the rest of this section, we discuss adjoint sensitivity analysis of F(p), which is important for applying gradient-based optimization methods for solving (2) as well as for computing F(p) iteratively. Denote the linearization of the above map at p as F'(p). The adjoint operator F'*(p) is defined to satisfy the following: for all δp, δ ∈ ‫ޒ‬ m , <F'(p)δp, δ > = <δp, F'*(p)δ >, (3) where the notation <·, ·> denotes the l 2 -inner product. Suppose <·, l> is a linearized functional of interest on P.
Given a parameter perturbation δp, the forward functional sensitivity is then given by <F'(p)δp, l>. The same sensitivity can be obtained via the adjoint solution, ψ ≡ F'*(p)l.
From the definition of the adjoint operator (3), it follows that: That is, for all δp ∈ P the functional sensitivities can be computed via a single linear solution for ψ rather than repeated application of the linearized forward operator, For the case where the functional of interest is the l 2 -distance J(p) = ||F(p) i -p i ||, the adjoint solution for the linearization J'(p)(·) is given in terms of the vector normal to the manifold Σ. Denoting N i and N s as the components of the normal vector in P i and P s respectively, it can be shown that (up to sign), To fix the sign of ψ, the component N i is chosen to be p i - Figure 2 illustrates the components of the vector N normal to the manifold Σ.
Thus, obtaining expressions for the adjoint vector for various bifurcations of interest reduces to the problem of deriving the associated normal vectors, N s . Under certain transversality conditions, the normal vectors for several codimension-one and higher bifurcations have been derived [16,17]. Here we consider the generic codimension-one bifurcations, namely saddle-node and Hopf bifurcations. Let the left and right critical eigenvectors of f x at the given bifurcation point be denoted by w and v respectively. That is, w and v solve the following eigen-systems for the critical eigenvalue ω crit and its conjugate crit , The expressions for normal vectors are given as: where superscript H denotes conjugate transpose and λ ≡ (w H f xx v). These expressions above prescribe the components of the adjoint solution, thus enabling efficient gradient calculation via (4). Now, we briefly mention methods for computing the projection F(p). Methods of iterative and direct type for finding the (locally) closest bifurcation point have been derived [17]. In the current work we use the former approach, based on using the component of the normal vector in the input plane, N i .  Figure 3 illustrates the method in a simple example, producing a sequence of iterates ( i ) converging to the point F(p) i that is closest to a (non-convex) neighboring region with respect to p i .
Finally, we mention that the minimal distance functional can be used to model many other problems of interest. For instance, it can serve as an estimate of the separation between a reference manifold Σ ref and a given region of qualitative behavior, or the size of the region of a qualitative behavior via the maximum radius of the inscribed sphere.

Algorithm and software implementation
Here we give an outline of the algorithm for general inverse bifurcation problems. The main ingredients are applications of the projection operator F(p), as well as the adjoint, F'*(p). The computation of the former is denoted by the routine APPLYF (see Figure 4). Each time APPLYF is called, corrector steps (using, for instance, Newton's method) have to be carried out on the previously computed x init to find the initial solution for the current value of p s . Once the corrected solution x corr is computed, the iterative procedure LOCMINDIST (see Figure 5) is called to compute the nearest point on the bifurcation manifold of interest. This procedure is based on a series of oneparameter continuations and gradient calculations according to (6). The inputs include the following: v the initial search direction in parameter space; ε proj the relative tolerance on the iterative proceure. In general, several search directions (denoted by V = {v 1 ,ʜ, v max }) have to be used to approximate the globally closest point. However, for the examples shown in the paper the initial search space V is only 1 dimensional. Once F(p) and the corresponding solution are obtained, F'*(p) is then computed. The derivative information, together with constraints (p low , p upp , c(p)) and Hessian approximation (HessA) are   Figure 6 describes the inverse bifurcation algorithm.
Our Inverse Bifurcation Toolbox is an implementation of the above described algorithm. It combines the capability of several packages: the MATCONT package [20] for performing the underlying one-parameter continuations, the Algorithm LOCMlNDlST Figure 5 Algorithm LOCMlNDlST. Figure 4 Algorithm APPLYF.

Algorithm APPLYF
MATLAB Optimization Toolbox [21] for performing gradient-based constrained optimization, the MathSBML package [22] for reading in biological models in the SBML format and finally the Mathematica Symbolic Toolbox for MATLAB [23] for communications. The combination of MATLAB and Mathematica has the advantage of allowing building on existing freely-available software.
After the SBML model is read in via MathSBML, the vector fields are differentiated symbolically to obtain expressions for f x , f p , f xp and f xx . Subsequently, they are numerically evaluated when called by MATCONT. Reparametrization of parameters is done automatically to allow continuation in arbitrary directions. Presently, the toolbox is able to handle bifurcations of saddle-node and Hopf type. Augmentation to include bifurcations of limit cycles is currently underway.
Given the limitations of the current implementation, our software is suitable for handling problems involving tens of variables and parameters but possibly not suited for problem dimensionality on the order of hundreds or higher. Further algorithm development is needed to significantly upscale the method. In addition, in high dimensional applications instabilities may appear and hence appropriate regularization techniques need to be developed (e.g., stopping rules in the case of iterative methods). Figure 6 Algorithm INVERSE BIFURCATION.

Finding distance to oscillation in a 4 gene model
To understand design principles underlying biological systems, simple oscillatory and switch-like systems have been constructed experimentally. Several systems based on E. coli [24,25] have been successfully demonstrated. Recently, components of the Lac and Ntr systems have been used to construct a genetic clock that upon change in connectivity, exhibits switch behavior [26]. After ignoring the dynamics of the read-out gene, the system of three equations for the concentrations of mRNA (x 1 , x 3 , x 5 ) and proteins (x 2 , x 4 , x 6 ) is given as, where β i are rate constants. The rates of transcription are represented by the following tri-phasic functions, where g jk are kinetic-order parameters, typically less than 4.
Given a stable, steady solution for a gene system with a set of nominal parameters, a relevant engineering question is how to construct a genetic oscillator. For the current system, the analytical solutions for the bifurcation manifolds are available via the Routh-Hurwitz stability criterion ( [26], supplementary data). However, such analysis cannot be easily carried out for more complex models. Instead, computational methods have to be used. Suppose the stable system has rate parameter β 1 = β 3 = 22.2, β 2 = β 4 = 1.39, kinetic-order parameters g 32 = -2, g 12 = g 14 = g 54 = 1. If in practice only the rate parameters β i can be varied, the question is how best to construct an oscillatory system. The bifurcation diagram for the model is shown in Figure 7, where it can be seen that for the nominal values of g jk the system is stable and far away from the line of Hopf bifurcation. The inverse bifurcation question is: do there exist parameters β i such that the system is oscillatory? We consider the problem with the following parameter constraints: By minimizing the distance from the nominal parameter point (g 12 = 1, g 32 g 14 = -2) to the Hopf bifurcation line, the system can be made to lie on the boundary of the stable regime; see Figure 8. Table 1 shows the iteration counts required to obtain the result, using Sequential Quadratic Programming (SQP) with line-search [27]. Owing to the use of line-search, each optimization iteration entails a number of functional evaluations, this in turn requiring a number of one-parameter continuation iterations to find the (locally) closest point on Σ(p s ). The optimality tolerance on the objective is set as 10 -3 relative to its initial value and the relative parameter tolerance for approximating F(p) is set to 10 -4 . Clearly, the number of one-parameter continuations is significantly higher than the optimization iterations. Thus, for high dimensional examples it is important that the former can be carried out efficiently.

Maximizing regime of oscillations in repressilator system
Motivated by biological applications such as increasing the robustness of rhythmic pacemakers, we consider maximizing the region of oscillations with respect to a given operating point. We utilize a case of the generalized repressilator [7] as our test system. After non-dimensionalizing, an ODE system for the protein and RNA concentrations x i and y i is obtained, with the following dimensionless parameters: α i , β i are the ratios of degradation rates, δ i represents the leakiness of the gene transcription, h i the Hill-coefficient reflecting the degree of cooperativity of gene transcription, In this example, we take n = 3 and examine the symmetric case, i.e., α i = α, β = β, etc. Analysis shows that for ranges of parameters α and β, the system can exhibit oscillations or be in a stable, steady-state [7]. Thus, we take p i = (α, β) as the input parameters and seek values of p s = (δ, h) within the constraint set,  Figure 9 shows the initial bifurcation diagram and the computed minimal distance from the reference point. The result of the inverse bifurcation analysis shows that the optimum solution is found at (δ, h) = (10 -4 , 2); thus, the inequality constraints are active equality constraints. Moreover, the signs of the Lagrange multipliers for the inequality constraints show that the (local) optimum distance to the Hopf bifurcation manifold increases if the upper bound h upp is increased and/or the lower bound δ low is decreased. Figure 10 shows the bifurcation diagram with the optimized parameters. Table 2 shows the iteration counts required to obtain the result. The tolerances for optimization are the same as those of Section 4.1.

Engineering bistable switch in G 1 /S transition
We consider the reverse engineering of check points in mammalian cell phase transition. In particular, we use a 9 state, 40 parameter differential equation model [28]. In the publication describing the model, parameter scanning is done to find out parameter values that achieve certain behaviors, such as changing the threshold for the onset of synthesis (S) phase from the G 1 state and conditions ensuring irreversibility of the gene switch. Using our inverse bifurcation analysis, specifying a desired behavior on the transition points gives rise to conditions on the parameters in a more systematic and efficient manner. Figure 11 shows the bifurcation diagram for the regulatory module of the mammalian G 1 /S transition. The bifurcation parameter is the strength of mitogenic stimulation, F m . The quantity of primary interest is the level of E2F1, which is a transcription factor targeting genes underlying Initial bifurcation diagram for 4 gene model: β 1 = β 3 = 222, β 2 = β 4 = 1.39 Hopf Saddle−Node cell cycle progression. For sufficiently low level of F m , the level of E2F1 stays at a constant low value. At a critical value of F m a transcritical point is reached, above which the level of E2F1 starts to increase with F m . As the level of F m increases further, a saddle-node (SN 1 ) is reached and the level of E2F1 undergoes a discontinuous jump upwards, thereby triggering the S phase of the cell cycle.

Example 1: engineer irreversibility
First, we mention a number of main species and their interactions. The regulatory system consists of a core double inhibitor-activator module as well as several positive-feedback loops. The transcription factor E2F1 activates itself as well as pRB, a tumor suppressor. In turn, pRB inhibits itself as well as E2F1. The following equations describe this core module:  Another important collection of species is AP-1, which is used to denote a family of transcription factors that mediate the mitogenic signal F m . The family AP-1 is also activated by E2F1, with feedback strength k 25 . The dynamics for the concentration of the family [AP-1] is modelled as: In the first example, we consider the construction of an irreversible gene switch via parameter changes. It has been demonstrated [28] that for some value of k 25 in (8), the G 1 /S transition becomes irreversible. That is, if F m is increased beyond a certain point, a subsequent decrease to zero will not lead to a G 0 state with a low level of E2F1.
Geometrically, the inverse problem is to find the value of k 25 such that the x-abscissa of the upper saddle node (SN 2 ) is as close to zero as possible. As inequality constraints, we take 0.1 ≤ k 25 ≤ 1.5. The result of the inverse analysis shows that the feedback strength should increase from its initial value of k 25 = 0.9 to k 25 = 1.099, resulting in a change of bifurcation diagram shown in Figure 12. Row 1 of Table 3 shows the number of optimization iterations as well as the number of functional evaluations (which in this case equals the number of one-parameter continuations) required to reach a tolerance of 10 -3 on the function value.

Figure 9
Initial bifurcation diagram for symmetric repressilator: δ = 10 -3 , h = 1.5. the G 1 /S transition point SN 1 should be fixed at the value of the nominal system. In addition to allowing 0.1 ≤ k 25 ≤ 1.5 as before, we also modify the stability of E2F1 and AP-1. That is, we relax the values of φ E2F1 and φ AP-1 in (7) and (8)  Mathematically, the inverse problem is formulated as an nonlinear optimization problem with equality constraint on the x-abscissa of the G 1 /S point, SN 1 . The bifurcation diagram corresponding to the solution (k 25 = 1.182, φ E2F1 = 0.0992, φ AP-1 = 0.0109) is given in Figure 13. Row 2 of Table 3 summarizes the number of iterations required.

Conclusion
We propose an inverse problems methodology to address a class of problems arising in the study and design of gene systems. The methodology is based on the (inverse) map from the space of bifurcation diagrams to the space of bio- Optimized bifurcation diagram for symmetric repressilator: δ = 10 -4 , h = 2

Figure 10
Optimized bifurcation diagram for symmetric repressilator: δ = 10 -4 , h = 2.  chemical parameters. Within this general inverse bifurcation framework, we show that many questions of biological interest may be formulated as optimization problems involving minimal distances to bifurcation manifolds. Adjoint sensitivity analysis is carried out to allow for efficient solution using gradient-based optimization methods.
Several general questions remain for the methodology: what biological problems can be formulated in terms of geometric properties of the bifurcation diagrams? Given desired geometric properties for bifurcation diagrams, how best can the problem be formulated mathematically to allow for tractable solution? Finally, we remark that the development of appropriate numerical methods, novel Example 2: initial and optimized gene switches Figure 13 Example 2: initial and optimized gene switches.