- Research
- Open Access
- Published:

# Probability-based model of protein-protein interactions on biological timescales

*Algorithms for Molecular Biology*
**volume 1**, Article number: 25 (2006)

## Abstract

### Background

Simulation methods can assist in describing and understanding complex networks of interacting proteins, providing fresh insights into the function and regulation of biological systems. Recent studies have investigated such processes by explicitly modelling the diffusion and interactions of individual molecules. In these approaches, two entities are considered to have interacted if they come within a set cutoff distance of each other.

### Results

In this study, a new model of bimolecular interactions is presented that uses a simple, probability-based description of the reaction process. This description is well-suited to simulations on timescales relevant to biological systems (from seconds to hours), and provides an alternative to the previous description given by Smoluchowski. In the present approach (TFB) the diffusion process is explicitly taken into account in generating the probability that two freely diffusing chemical entities will interact within a given time interval. It is compared to the Smoluchowski method, as modified by Andrews and Bray (AB).

### Conclusion

When implemented, the AB & TFB methods give equivalent results in a variety of situations relevant to biology. Overall, the Smoluchowski method as modified by Andrews and Bray emerges as the most simple, robust and efficient method for simulating biological diffusion-reaction processes currently available.

## Background

Molecular biology is moving to an age where the amount of data and its complexity challenge our efforts to understand it. Many recent experimental studies have concentrated on obtaining accurate protein-protein interaction maps for genomes, ranging from unicellular organisms to human. Combining experimental data with modelling makes it possible to tackle this new wealth of information and study the way function emerges from protein interaction networks (for reviews of this field see references [1–3]).

An effective approach to simulating interaction networks, and one which has been used extensively in building cellular models, is through the use of ordinary differential equations (ODEs) (see review by Tyson *etal* [4] and references therein). ODEs, however, suffer from two important limitations.

The first limitation is that they are designed to follow the bulk concentration of the different molecules. In many cases, where small quantities of molecules are involved, the dynamics of the system are known to deviate substantially from the deterministic prediction of the ODEs and are better described by stochastic laws [5]. This can be overcome by implementing stochasticity into the models, which can be achieved in three ways: a first way is to use ODEs where stochastic perturbations have been added, mimicking the way the concentration of molecules fluctuates in time [6]; a second way is to use the method developed by Gillespie which follows reactions as discreet events in time [7]; and a third way – the one taken in the present work – is to explicitly follow the state of all the different molecules in the system independently [8].

A second limitation of the ODE approach, and of subsequent stochastic improvements, is that diffusion is not explicitly taken into account, which means that the effect of concentration gradients cannot be followed [9–11]. Concentration gradients can themselves be modelled, however it then becomes problematic to include the stochastic components (Virtual Cell approach [12–14] and E-Cell [15, 16]).

One way of modelling the stochastic as well as the diffusive aspect of the problem is by explicitly modelling the diffusion and interactions of the individual molecules contained in the system. Such spatial simulations have been performed by Franks *etal*. to study the synaptic cleft using their software M-Cell [10]. Also, recent simulations by Lipkow *etal*. have successfully modelled the individual molecules and their diffusion to show the presence of a protein concentration gradient in the motor response in *Escherichia coli* using their software *SmolDyn* [17, 18]. Another way is to discretise space on a lattice and to use extensions of the Gillespie algorithm such as in SmartCell [11, 19] and MesoRD [20, 21].

Bimolecular interactions have previously been modelled by considering a simple local contact criteria, such a scheme is used in M-Cell [10]. A more formal approach to modelling these interactions follows the description of diffusion limited chemical processes published by Smoluchowski in 1916 [22]. In this approach a chemical reaction is considered to take place when two chemically reactive entities A and B come within a certain distance, *σ*_{
b
}, from one another. This distance, called the reaction radius, is determined by the reaction rate and the diffusion constants of the two species, such that the reaction rate, *k,* is given by:

*k* = 4 *πD*_{+} *σ*_{
b
} (1)

where *D*_{+} = *D*_{
A
}+ *D*_{
B
}, and *D*_{
A
}and *D*_{
B
}are the diffusion constants of A and B.

The Smoluchowski approach requires the diffusion process to be followed using very short timesteps as the distance between the two entities must be precisely monitored over time. However, since the detailed diffusion process is of little interest in biological terms, this requirement translates into an unnecessary computational overhead, as illustrated in Figure 1. In order to circumvent this problem, Andrews and Bray recently devised a scheme which corrects *σ*_{
b
}for longer timesteps, making it more useful for simulating biological systems [17].

The Smoluchowski approach seems to be the most appropriate method currently available to study many important biological systems, however a potential weakness of the Smoluchowski approach is the presence of a sharply defined reaction zone (cf. Figure 1). The aim of the present study is to investigate the potential benefit of replacing this reaction zone by a more realistic probability distribution of interaction between two chemical objects. In this scheme, the reaction is not automatic when two reacting objects come within a certain range of each other. Instead, the decision whether to allow this interaction is made based on a probability. This probability of interaction is dependent upon – among other factors – the distance between the objects. Potential benefits of the study include more accurate results and lower computational costs, thereby allowing for more complicated systems to be investigated.

The approach has been implemented into a freely available simulation package, SoftCell. In the SoftCell software cellular membranes are defined by tessellation using triangles and rates of import/export are assigned to each chemical entity. This tessellation approach makes it possible to define complicated surfaces and any number of internal organelles one might wish to include. The program is written in C++ and is linked with the scripting language, Python, allowing for control and ease of analysis of the data generated. Files defining protein objects, reactions, and membranes use an XML format.

## The model

In the present approach we consider proteins to be freely diffusing point-like objects. On the scale of a whole cell which is the scale we are interested in, long range forces between the objects are shielded by the solvent and can therefore safely be ignored. Diffusion is formally modelled by Brownian dynamics, taking intermolecular forces explicitly into account, and integrating over the velocities of the object [23]. In the absence of long-range forces, the Brownian dynamics treatment of diffusion reduces to a random walk process. The random walk process only considers the position of the objects and not their velocities thereby reducing considerably the computational cost; this approach was therefore used in this work. It is also assumed that any differences in reaction kinetics resulting from the different possible orientations of two reacting molecules relative to each other at the time of encounter can safely be integrated into an average reaction kinetic, so that the objects can be treated as point-like. Interactions between these point-like objects are governed by a set of reaction rules (described in detail below) that are designed to emulate the biological system of interest as closely as possible, as illustrated in Figure 2.

### Reaction rules

We are interested in the reaction probability: the probability that two entities interact during a time step given that they can interact with reaction rate *k*, their diffusion rates are *D*_{1} and *D*_{2} and they start a distance *d* apart at the beginning of the time interval Δ*t*. This probability is illustrated in Figure 1C. The reaction between two diffusing particles can be considered to occur in two steps: firstly the encounter of the two entities through diffusion, followed by the actual chemical reaction. Let us consider two freely diffusing chemical entities, A and B, starting a distance *d*_{0} from each other at time *t*_{0}. At any time *t* later the rate *κ*^{AB}(*t*|*d*_{0}, *t*_{0}) of the reaction between entities A and B can be expressed as:

${\kappa}^{AB}(t|{d}_{0},{t}_{0})={p}_{C}^{AB}(t|{d}_{0},{t}_{0})\cdot {k}_{R}^{AB}\left(2\right)$

where ${p}_{C}^{AB}$ (*t*|*d*_{0}, *t*_{0}) is the probability of the two entities coming into contact at time *t* and ${k}_{R}^{AB}$ is the rate of the reaction once in contact, averaged over all possible orientations of the two entities relative to each other. Both parts of this equation: ${p}_{C}^{AB}$ (*t*|*d*_{0}, *t*_{0}) and ${k}_{R}^{AB}$ can be estimated as described below.

The reaction rate *κ*^{AB}(*t*|*d*_{0}, *t*_{0}) can be integrated over a simulation timestep Δ*t* to provide the probability of at least one reaction taking place in that timestep. We are interested in the probability of at least one event taking place during the time interval Δ*t*, i.e. 1 - P(no event during Δ*t*). The process under consideration is a Poisson process with a time dependent rate of the event taking place. Given the rate *κ*(*t*) of an event taking place at a time *t*, the probability, *P*^{AB}, of at least one reaction taking place during that time interval takes the general form [24, 25]:

*P*^{AB}(Δ*t*) = 1 - *e*^{-I(Δt)} (3)

where

$I(\Delta t)={\displaystyle {\int}_{0}^{\Delta t}\kappa (t)\text{d}t}\left(4\right)$

Such that the probability of a reaction taking place during timestep *δt* can be expressed as:

${P}^{AB}(\Delta t)=1-{e}^{-{\displaystyle {\int}_{0}^{\Delta t}{\kappa}^{AB}(t)\text{d}t}}\left(5\right)$

where *κ*^{AB}(*t*) is given in equation (2).

### The contact probability: ${p}_{C}^{AB}$ (*t*|*d*_{0}, *t*_{0})

The probability of contact, ${p}_{C}^{AB}$ (*t*|*d*_{0}, *t*_{0}), is determined by the diffusion process of the two entities A and B during the time interval Δ*t* = *t* - *t*_{0}. The interacting bodies follow the laws of diffusion such that the probability of finding a given entity in an infinitesimal volume element *dV*, a distance *d* away from its starting position a time Δ*t* later, is given by the well known Gaussian distribution: ${(4\pi D\Delta t)}^{-3/2}{e}^{-\frac{{d}^{2}}{4D\Delta t}}dV$, where *D* is the diffusion constant of the entity [26].

The present approach is illustrated in Figure 3. The two entities diffuse freely starting a distance *d*_{0} apart. The probability of them coming into contact increases with time and reaches a maximum. Subsequently the two entities diffuse further and the probability of them coming into contact decreases with time. A mathematically equivalent description is given if A is considered to be diffusing with diffusion constant *D*_{+} = *D*_{
A
}+ *D*_{
B
}while B remains stationary. It follows that the probability of contact, ${p}_{C}^{AB}$ (*t*|*r*_{0}, *t*_{0}), is given by:

${p}_{C}^{AB}(t|{d}_{0},{t}_{0})={(4\pi {D}_{+}\Delta t)}^{-3/2}{e}^{-\frac{{d}_{0}^{2}}{4{D}_{+}\Delta t}}\delta {V}_{C}\left(6\right)$

where *D*_{+} = *D*_{
A
}+ *D*_{
B
}, Δ*t* = *t* - *t*_{0} and *δV*_{
C
}is a small contact volume defined such that if two entities are found to be within this volume, they are considered to be in contact. This small contact volume, *δV*_{
C
}, will be considered further below.

### The reaction rate: ${k}_{R}^{AB}$

In order to get a good first approximation for ${p}_{C}^{AB}$ (*t*|*d*_{0}, *t*_{0}), we initially consider the well-mixed limit. In this limit the distribution of the two entities A and B is uniform over space. This approximation is thus only valid in the long timestep limit. For short timesteps, the approximations break down and a correction to the reaction rate, ${k}_{R}^{AB}$, has to be introduced. An exact analytical solution has recently been presented that is also correct for short time steps [27]. However, the mathematics involved are complex and difficult to implement; the approach used here, although approximate, is simpler and is not expected to alter the findings significantly.

#### The well-mixed limit

Let us consider the simple reaction: $\text{A}+\text{B}\stackrel{k}{\to}\text{Y}$ occurring in a finite volume *V*. To first order, the rate of change in the number of molecules Y is:

$\frac{\text{d}{n}_{\text{Y}}}{\text{d}t}=\frac{k}{{N}_{A}}\frac{{n}_{\text{A}}{n}_{\text{B}}}{V}\left(7\right)$

where *n*_{A}, *n*_{B} and *n*_{Y} are the number of molecules of A, B and Y (respectively), *N*_{
A
}is Avogadro's constant and k is rate of the reaction. The rate of change in *n*_{Y} can also be expressed as:

$\frac{\text{d}{n}_{\text{Y}}}{\text{d}t}={\widehat{p}}_{C}^{AB}\cdot {k}_{R}^{AB}\left(8\right)$

where ${\widehat{p}}_{C}^{AB}$ is here the ensemble average probability of any two entities A and B being in contact in volume *V* and ${k}_{R}^{AB}$ the rate of reaction if they are in contact.

The objects are considered to be uniformly distributed over the volume V such that the probability of A and B occupying the same contact volume *δV*_{
C
}, ${\widehat{p}}_{C}^{AB}$, can be expressed as:

${\widehat{p}}_{C}^{AB}=\frac{{n}_{\text{A}}{n}_{\text{B}}}{V}\delta {V}_{C}\left(9\right)$

where *δV*_{
C
}is the same infinitesimal volume as in equation (6). By combining equations (7), (8), and (9), we can extract the rate of the reaction if A and B are in contact:

${k}_{R,\text{mixed}}^{AB}=\frac{k}{{N}_{A}\delta {V}_{C}}\left(10\right)$

Combining equation (10) and ${p}_{C}^{AB}$ from equation (6) into equation (2), the rate of the reaction between entities A and B, starting a distance *d*_{0} apart at a time *t* later, is given by:

${\kappa}_{\text{mixed}}^{AB}(t|{d}_{0},{t}_{0})={(4\pi {D}_{+}\Delta t)}^{-3/2}{e}^{-\frac{{d}_{0}^{2}}{4{D}_{+}\Delta t}}\cdot \frac{k}{{N}_{A}}\left(11\right)$

In doing this, notice that the contact volume *δV*_{
C
}cancels out of the equations. This effectively removes any information about the size of the particles from subsequent considerations.

Inserting ${k}_{\text{mixed}}^{AB}$ from equation (11) into equation (4), *I*(Δ*t*) can be solved analytically using the standard integral:

${\int}_{0}^{\Delta t}{t}^{-\frac{3}{2}}}{e}^{-\frac{1}{t}}\text{d}t=\sqrt{\pi}\text{Erfc}\left(\frac{1}{\sqrt{\Delta t}}\right)\left(12\right)$

where Erfc is the complimentary error function defined by:

$\text{Erfc}(x)=\frac{2}{\pi}{\displaystyle {\int}_{x}^{\infty}{e}^{-{z}^{2}}}\text{d}z\left(13\right)$

such that *I*(Δ*t*) has the analytical form:

$I(\Delta t)=\frac{k}{{N}_{A}}\cdot \frac{1}{4d\pi {D}_{+}}\text{Erfc}\left(\frac{d}{\sqrt{4\Delta t{D}_{+}}}\right)\left(14\right)$

Finally we can express the probability ${P}_{\text{mixed}}^{AB}$ of a reaction taking place between entities A and B, starting a distance *d*_{0} apart, during the time interval Δ*t* as:

${P}_{\text{mixed}}^{AB}(t|{d}_{0},{t}_{0})=1-{e}^{-\frac{k}{{N}_{A}}\cdot \frac{1}{4{d}_{0}\pi {D}_{+}}\text{Erfc}\left(\frac{{d}_{0}}{\sqrt{4\Delta t{D}_{+}}}\right)}\left(15\right)$

The probability of two chemical entities interacting in a timestep Δ*t* is thus expressed in equation (15) in terms of the reaction rate *k*, the sum of the diffusion constants of the two entities *D*_{+} and the time interval Δ*t*. This approach provides a good description of the interactions in terms of the underlying diffusion process and reactivity of the two entities.

#### Short timestep correction

The equations above hold for situations where the system can be considered to be well-mixed. However, this assumption breaks down for small timesteps: as chemical entities react over time, there tend to be fewer potentially interacting partners close to each other so that the distribution of the two entities is no longer uniform. In the long timescale limit this is not a problem as the system is well-mixed by each diffusion step and the approximations hold. At each timestep, the reaction process creates 'dips' in the probability distribution of the entities, the spatial extent of these 'dips' is comparable to the spatial extent of the probability of reaction. In order to remain well-mixed the distance covered by one step of diffusion must be greater than the spatial extent of the 'dips' created by the reaction process. For diffusion constants typical of biomolecules, the spatial width at half-maximum of ${P}_{\text{mixed}}^{AB}$ (*t*|*d*_{0}, *t*_{0}) goes to ~0.1 *μ* m for Δ*t* of the order of seconds. Considering this distance as being covered by diffusion, this gives us a typical timescale of Δ*t* ≥ 0.01 s. The system can therefore be assumed to be well-mixed for timesteps of Δ*t* ≥ 0.01 s. For shorter timesteps, this effect can be corrected for, as will be shown below.

Due to the reaction process, the average concentration around a chemical entity is less than predicted by the uniform distribution. The desired rate can be derived by correcting ${k}_{R,\text{mixed}}^{AB}$ by a scaling factor as follows:

${k}_{R}^{AB}=C\cdot {k}_{R,\text{mixed}}^{AB}\left(16\right)$

The procedure we used for doing this is very similar to that used by Andrews and Bray [17] to correct for the same effect in the Smoluchowski approach and is outlined below. More elaborate mathematical considerations of this process can be found in the recent paper by Zon and Wolde 2005 [27].

Using the rate of reaction upon encounter from equation (16), the probability of the reaction taking place after each diffusion step is now given by:

${P}^{AB}({t}_{0}+\Delta t|{d}_{0},{t}_{0})=1-{e}^{-C\cdot \frac{k}{{N}_{A}}\cdot \frac{1}{4{d}_{0}\pi {D}_{+}}\text{Erfc}\left(\frac{{d}_{0}}{2s}\right)}\left(17\right)$

where Δ*t* is the timestep of the simulation and we use the substitution $s=\sqrt{2{D}_{+}\Delta t}$.

For the purposes of the correction, we are interested in average effects, so from now on we consider the average concentrations of entities A and B and not the positions of entities A and B. Let us consider the radial concentration *ρ*_{
B
}(*r*, *t*) of entity B around entity A, with A considered to be static at *r* = 0, while entity B has diffusion constant *D*_{+} = *D*_{
A
}+ *D*_{
B
}.

The radial concentration *ρ*_{
B
}(*r*, *t*) of entity B around entity A, at time *t* is propagated for a simulation timestep, Δ*t*, to give *ρ*_{
B
}(*r*, *t* + Δ*t*) using a Green's function [28]:

${\rho}_{B}(r,t+\Delta t)={\displaystyle {\int}_{0}^{\infty}{\rho}_{B}}({r}^{\prime},t){G}_{s}(r,{r}^{\prime})4\pi {{r}^{\prime}}^{2}\text{d}{r}^{\prime}\left(18\right)$

where the Green's function *G*_{
s
}(*r*, *r'*) is given by:

${G}_{s}(r,{r}^{\prime})=\frac{1}{4\pi r{r}^{\prime}s\sqrt{2\pi}}({e}^{-\frac{{(r-{r}^{\prime})}^{2}}{2{s}^{2}}}-{e}^{-\frac{{(r+{r}^{\prime})}^{2}}{2{s}^{2}}})\left(19\right)$

The entity A is then allowed to interact with entity B such that the new concentration of B is given by:

${{\rho}^{\prime}}_{B}(r,t+\Delta t)=(1-{P}^{AB}(r))\cdot {\rho}_{B}(r,t+\Delta t)\left(20\right)$

where *P*^{AB}(*r*) is the probability of A and B interacting in the following timestep from equation (17).

The reaction step acts as a sink for the concentration of B, while the concentration of B is assumed to be constant at *r* = ∞. The long-distance equilibrium solution for *ρ*_{
B
}(*r*) is known to be of the form [26, 29]:

${\rho}_{B}(r)=1+\frac{a}{r}\left(21\right)$

This allows us to solve numerically for *ρ*_{
B
}(*r*, *t* + Δ*t*) around *r* = 0 while using an analytical extension for long distances (long distance was defined as *r* > *r*_{
P
}, *r*_{
P
}such that *P*^{AB}(*r*) = 10^{-6}). Equation (18) is then split into a numerical and an analytical part:

${\rho}_{B}(r,t+\Delta t)={\displaystyle {\int}_{0}^{{r}_{P}}{\rho}_{B}}({r}^{\prime},t){G}_{s}(r,{r}^{\prime})4\pi {{r}^{\prime}}^{2}\text{d}{r}^{\prime}+{\displaystyle {\int}_{{r}_{P}}^{\infty}{\rho}_{B}^{\text{ana}}}({r}^{\prime},t){G}_{s}(r,{r}^{\prime})4\pi {{r}^{\prime}}^{2}\text{d}{r}^{\prime}\left(22\right)$

${\rho}_{B}(r,t+\Delta t)={\displaystyle {\int}_{0}^{{r}_{P}}{\rho}_{B}}({r}^{\prime},t){G}_{s}(r,{r}^{\prime})4\pi {{r}^{\prime}}^{2}\text{d}{r}^{\prime}+{\rho}_{B}^{\text{ana}}(r,t+\Delta t)\left(23\right)$

By inserting (21), the analytical extension, ${\rho}_{B}^{\text{ana}}$ (*r*, *t* + Δ*t*), can be derived and is given by:

${\rho}_{B}^{\text{ana}}(r,t+\Delta t)=\frac{s}{r\sqrt{2\pi}}{G}_{s}(r,{r}_{P})+\frac{1}{2}({E}_{+}+{E}_{-})+\frac{a}{2r}({E}_{-}-{E}_{+})\left(24\right)$

where: ${E}_{\pm}=\text{Erfc}\left(\frac{{r}_{P}\pm r}{s\sqrt{2}}\right)$

A diffusion step is performed using numerical integration for the remainder of equation (23). The value of the constant *a* in ${\rho}_{B}^{\text{ana}}$ (*r*, *t*) is found by fitting the last 10% of *ρ*_{
B
}(*r*, *t*) relative to *r*_{
P
}after each diffusion step. After each diffusion step the two entities are allowed to interact following the probability given in equation (17). In order to achieve a steady state, entity A is left unaffected by the reaction process. The process of diffusion/reaction is repeated until the radial concentration *ρ*_{
B
}(*r*, *t*) reaches a steady state: *ρ*_{
B
}(*r*, ∞).

The effective rate of the reaction is given by:

${k}_{\text{eff}}={\displaystyle {\int}_{0}^{\infty}{P}^{AB}}(r){\rho}_{B}(r,\infty )4\pi {r}^{2}\text{d}r\left(25\right)$

The values of *k*_{eff} were determined for a range of values of the correction factor, *C*, and variable *s*. Using this array of data it is then possible to find the correct value of the correction factor, *C*, for any pair of values of the desired reaction rate and simulation timestep: (*k*_{eff}, Δ*t*).

Figure 4 shows examples of the probability distribution and corresponding reaction radius, *σ*_{
b
}, using the Smoluchowski approach for different timesteps. As can be seen, at large timesteps, the distribution of B before the reaction is close to uniform and the correction factor correspondingly small. On the other hand at small timesteps, the probability distribution of *B*, before reaction, is far from uniform and the correction factor, *C*, is large. For *k* = 10^{6} M^{-1}s^{-1} and *D*_{
A
}= *D*_{
B
}= 1 *μ* m^{2}s^{-1} the correction factors, *C*, are 1.12, 1.46, 4.48 and 4.44 10^{3} for timesteps of 10^{-1}s, 10^{-2}s, 10^{-3}s and 10^{-4}s, respectively. Figure 4 also illustrates the fact that as the timesteps diminish the corrected reaction probability converges with the Smoluchowski cutoff at *σ*_{
b
}. For short timesteps the two approaches appear to be equivalent.

Figure 4 also shows the radial probability distribution of the reactants for the different timesteps. These distributions are continuous throughout the reaction region. In contrast, those reported by Andrews and Bray show a strong discontinuity due to the sharply defined reaction radius [17].

### The challenge of long timesteps

Relation (15) holds only for a pair of interacting molecules. In the more general context of an actual chemical reaction, the number of potential reactive partners at any one time can be much higher. The assumption is that during a time step Δ*t* the probability of an entity interacting with its closest neighbour, *P*_{closest}, is much higher than the probability of it interacting with its next nearest neighbour, *P*_{next nearest}: *P*_{closest} ≫ *P*_{next nearest}.

This condition will clearly be fulfilled if the average distance, *d*_{travel}, travelled by an entity during the time interval Δ*t* is less than the average distance between particles, *d*. The average travel distance is: *d*_{travel} = ${d}_{\text{travel}}=\sqrt{6D\Delta t}$, where *D* is the diffusion constant. The average interparticle distance given by: $d={(\frac{V}{N})}^{1/3}$, where *V* is the volume and *N* is the number of particle, can also be expressed in terms of the concentration as: *d* ∝ *C*^{-1/3}. where *C* is the concentration. For typical biological situations, *C* ≃ *μ* M ml^{-1} and *D* ≃ *μ* m^{2}s^{-1}, such that the condition reduces to Δ*t* <~0.002 s.

When this condition is not fulfilled, a given entity can in principle interact with several other entities at any given timestep. This will affect the probability of the reaction with any given particle as reactions are considered mutually exclusive. In principle these probabilities can be calculated at each timestep during the simulation so that the correct statistics are reproduced. However, it was decided that the resulting extension to the code would introduce extra computational overhead, without significant benefit, and was not implemented. On the other hand, simply assuming that entities can interact with at most one other entity in the following timestep, when in fact they can interact with several, leads to the simulated reaction taking place at a rate greater than the expected rate.

Another problem which emerges for long timesteps concerns boundaries: for reactions, the algorithm assumes free diffusion in the space around the entities. This assumption is mostly correct when the timestep is small but at large timestep, the chance of encountering a boundary during that timestep become significant. At that point the free diffusion assumption assumed in the previous equations breaks down leading the reaction happening too fast in the simulation. Simply put, entities close to boundaries have less volume in which to diffuse and, therefore, a higher chance of encounter than entities far from any boundary. We can expect this effect to become important when the scale of the system becomes comparable to the typical distance travelled during a timestep. For biological systems on the *μ* m scale, and chemical entities diffusing with diffusion constants ~*μ* ms^{-1}, this sets an absolute upper limit on the timesteps at ~0.1 s. Properly taking into account these boundary effects is beyond the scope of the present work. However, boundary effects are not expected to play an important role as the timescale for these effects is ~0.1 s, which is a much longer timescale than the limit previously set by single particle interaction at ~0.002 s.

## Validation of the model

The model was tested in a number of ways. The first test was performed by simulating an enzymatic reaction: *A* + *E* → *B* + *E*. 1000 molecules of A were simulated with 10 molecules of E and the effective reaction rate was measured by fitting the change in the concentration of *A* over time. We ran 4 sets of simulations with the following parameters for the diffusion rate *d* of the chemical objects and timestep Δ*t*: 1) d = 1, Δ*t* = 0.01; 2) d = 1, Δ*t* = 0.001; 3) d = 5, Δ*t* = 0.01; 4) d = 5, Δ*t* = 0.001. For each set of d and Δ*t*, the reaction rate *k* was successively set to 10^{6.0}, 10^{6.2}, 10^{6.4} and 10^{6.6} M^{-1}s^{-1}. For comparison purposes, these 4 sets of runs were performed using both the present model and the Andrews and Bray model. The corrected binding radius used in the Andrews and Bray approach was calculated using the code provided by the authors.

As has been pointed out by Andrews and Bray [17], in such simulations the measured rate of the reaction varies with time. This is due to the fact that the simulation starts, the local concentration gradient is not yet established, and the initial reaction rate is, therefore, higher than the desired rate as the local concentration gradient is not yet established. Subsequently, the system tends towards a steady state, and the reaction rate is correctly predicted by both methods with 99% accuracy when using the correction term. The two methods were statistically indistinguishable over the four sets of runs (slope and intercept of *k*_{
measured
}= *f* (*k*_{
desired
}) were identical with *p* > 0.55).

The model was also tested for reactions at low numbers of reactants (*n*_{
A
}= 10) where the effective rate of the reaction becomes subject to significant stochastic fluctuations. 10000 runs were performed using both the present and corrected Smoluchowski approaches; the reaction rate was determined for each run. Again four sets of runs were performed using the same diffusion constants and timesteps as described above. The reaction rate used was 10^{6} M^{-1}s^{-1}. The distribution of the reaction rates at low concentrations produced by the present and corrected Smoluchowski approaches were compared and found to be indistinguishable (*p* > 0.2 on t-test).

Finally, the present and corrected Smoluchowski approaches were also compared in a situation containing a concentration gradient. The concentration gradient was produced by a point source of a molecule A (*k* = 2 *s*^{-1}) which reacts with an enzyme E ([*E*] = 40 nM) with *k*_{
E
}= 10^{6} M^{-1}s^{-1}. The diffusion constants and timestep parameters where again varied as previously. The gradient generated were found to be identical (*p* > 0.9 on U-test). All statistical analyses were performed using the R package [30].

## Example

In a typical cell, the concentration of a given protein inside the nucleus depends on the balance between the rate at which it is being translated from mRNA in the cytosol, the rate at which it is being transported into the nucleus and the rate at which it is being degraded. In turn, the amount of mRNA present in the cytosol depends on the rate at which the gene is being transcribed in the nucleus, the rate at which it is being exported and the rate at which it is being degraded. This mechanism enables the concentration of protein in the nucleus to be tightly controlled.

As an illustration of the versatility of the present approach this simple system was simulated. Our model system, illustrated in Figure 2, consisted of a rod-shaped cell with a nucleus at its centre. Inside the nucleus, a gene is switched on at time *t* = 0, for 20 minutes. Transcription events then take place, generating mRNA molecules. These mRNA molecules diffuse out of the nucleus and encounter ribosomes which translate them into proteins. These proteins are considered to be tagged for the nucleus and therefore are allowed to pass through the membrane and accumulate in the nucleus. Both the mRNA and the protein have ubiquitination/destruction pathways that regulate their lifetime inside the cell such that the system reaches a steady state with a finite concentration of mRNA and protein. At time *t* = 20 min the gene is turned off and the concentration of mRNA and protein drops rapidly.

Figure 5 presents the average concentrations of protein in the nucleus and mRNA molecules in the cell over the time course of the simulation. Data obtained using the Smoluchowski approach as modified by Andrews and Bray are virtually indistinguishable from the ones produced using the present approach and are not shown. As expected, a delay occurs before the protein concentration in the nucleus increases. The mRNA concentration quickly reaches a maximum value of ~0.06 nM over the first 2 minutes. The nuclear protein concentration increases sharply over the first 10 minutes and then starts to plateau at values of ~13 nM. At *t* = 20 min the gene is turned off and the mRNA concentration quickly falls off (half-life ~1 min) with the protein concentration quickly following (half-life ~5 min) due to degradation. Figure 5 also shows the timecourses of an individual run. As expected, individual runs present stochastic behaviour characteristic of such biological systems. These dynamics are typical of what one would expect of such a system [31].

## Discussion & conclusion

We have presented a formal, theoretically sound framework that provides reliable and accurate simulations of the diffusion-reaction process for biological systems. We compared it with the methods of Smoluchowski [22] and its extension by Andrews and Bray [17]. Figure 1 illustrates the different approaches compared in this study. The first case (A) is the original Smoluchowski approach. In this approach, at each short timestep *δt* in the diffusion process the distance between the chemical entities is checked and if they come into close proximity (distance *d* <*σ*_{
b
}) the two entities are said to have reacted together. The downside of the approach is that many diffusion steps need to be computed to simulate the reaction kinetics accurately. The second approach (B) is that of Andrews and Bray [17]. In their scheme, the reaction radius, *σ*_{
b
}, is adjusted so that the correct reaction kinetics are reproduced for timesteps Δ*t* ≥ 100 × *δt*. This approach produces an efficient algorithm that yields the correct reaction kinetics while using larger timesteps. Finally, (C) illustrates the present approach where the reaction radius is replaced by a smooth interaction probability. The two entities are considered to diffuse freely during the timestep Δ*t* thereby producing a probability *P*^{AB}(*d*, Δ*t*) of interaction.

Although differences were expected to appear between the Andrews and Bray and the current approach in certain circumstances (such as low reactant concentrations, or in the presence of concentration gradients), the results indicate that the reactions rates produced by both methods converge. This is thought to be essentially due to the averaging that takes place as the number of interactions increases. Hence the two methods are for practical purposes equivalent (*p* > 0.55). It cannot be ruled out, however, that differences will appear for more complex systems. For example, in the context of reversible reactions, recombination effects might be best modelled using a probability based method. Overall, the Andrews and Bray method for simulating diffusion-reaction processes appears robust at low concentration and gradient effects. However, a possible improvement on this method would be the analytical derivation of the radius of reaction for long timesteps, in place of its present approximation. The Andrews and Bray method was consistently computationally more efficient, running up to ~15% faster depending upon the system being simulated.

An in depth theoretical analysis of the diffusion-reaction approach in the context of event driven simulations has recently been published by Zon and Wolde [27]. Here again the aim is to increase the reach of present simulations by using longer timesteps. Using event driven simulations, the timestep can be increased substantially when reactive species are far apart or present at very low concentrations. However, as in the present work a limit on the length of the timestep is set by the requirement that they have to be short enough to ensure that an object can only interact with one other object during a timestep; this sets an upper limit to how large a timestep can be, and it remains to be shown whether they offer any clear computational advantage.

We have shown that the modified Smoluchowski method provides results that are indistinguishable from those produced using the much more elaborate and realistic model presented here, at a lower computational cost. The Andrews and Bray, radius-based, method thus appears to be the most simple, robust and efficient method for simulating diffusion-reaction processes currently available.

## References

- 1.
Bhalla US: Models of cell signaling pathways. Curr Opin Genet Dev. 2004, 14 (4): 375-381. 10.1016/j.gde.2004.05.002

- 2.
Ma'ayan A, Blitzer RD, lyengar R: Toward predictive models of mammalian cells. Annu Rev Biophys Biomol Struct. 2005, 34: 319-349. 10.1146/annurev.biophys.34.040204.144415

- 3.
Takahashi K, Arjunan SNV, Tomita M: Space in systems biology of signaling pathways – towards intracellular molecular crowding in silico. FEBS Lett. 2005, 579 (8): 1783-1788. 10.1016/j.febslet.2005.01.072

- 4.
Tyson JJ, Chen K, Novak B: Network dynamics and cell physiology. Nat Rev Mol Cell Biol. 2001, 2 (12): 908-916. 10.1038/35103078

- 5.
Firth CA, Bray D: Computational modeling of genetic and biochemical networks. MIT Press 2001

- 6.
Gibson MA, Bruck J: Efficient exact stochastic simulation of chemical systems with many species and many channels. J Phys Chem A. 2000, 104: 1876-1889. 10.1021/jp993732q. 10.1021/jp993732q

- 7.
Gillepsie DT: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics. 1976, 403-434.

- 8.
Morton-Firth CJ, Bray D: Predicting temporal fluctuations in an intracellular signalling pathway. J Theor Biol. 1998, 192: 117-128. 10.1006/jtbi.1997.0651

- 9.
Slepchenko BM, Schaff JC, Carson JH, Loew LM: Computational cell biology: spatiotemporal simulation of cellular events. Annu Rev Biophys Biomol Struct. 2002, 31: 423-441. 10.1146/annurev.biophys.31.101101.140930

- 10.
Franks KM, Bartol TMJ, Sejnowski TJ: A Monte Carlo model reveals independent signaling at central glutamatergic synapses. Biophys J. 2002, 83 (5): 2333-2348.

- 11.
Ander M, Beltrao P, DiVentura B, Ferkinghoff-Borg J, Foglierini M, Kaplan A, Lemerle C, Tomas-Oliveira I, Serrano L: SmartCell: a framework to simulate cellular processes that combines stochastic approximation with diffusion and localisation: analysis of simple gene networks. Systems Biology. 2004, 1: 129-139. 10.1049/sb:20045017

- 12.
Schaff J, Fink CC, Slepchenko B, Carson JH, Loew LM: A general computational framework for modeling cellular structure and function. Biophys J. 1997, 73 (3): 1135-1146.

- 13.
Loew LM, Schaff JC: The Virtual Cell: a software environment for computational cell biology. Trends Biotechnol. 2001, 19 (10): 401-406. 10.1016/S0167-7799(01)01740-1

- 14.
National Resource for Cell Analysis and Modeling. http://www.nrcam.uchc.edu

- 15.
Tomita M, Hashimoto K, Takahashi K, Shimizu TS, Matsuzaki Y, Miyoshi F, Saito K, Tanida S, Yugi K, Venter JC, Hutchison CAr: E-CELL: software environment for whole-cell simulation. Bioinformatics. 1999, 15: 72-84. 10.1093/bioinformatics/15.1.72

- 16.
E-CELL Project. http://www.e-cell.org

- 17.
Andrews SS, Bray D: Stochastic simulation of chemical reactions with spatial resolution and single molecule detail. Phys Biol. 2004, 1 (3–4): 137-151. 10.1088/1478-3967/1/3/001

- 18.
Lipkow K, Andrews SS, Bray D: Simulated diffusion of phosphorylated CheY through the cytoplasm of Escherichia coli. J Bacterial. 2005, 187: 45-53. 10.1128/JB.187.1.45-53.2005. 10.1128/JB.187.1.45-53.2005

- 19.
Smartcell: A Cell Network Simulation Program. http://smartcell.embl.de

- 20.
Hattne J, Fange D, Elf J: Stochastic reaction-diffusion simulation with MesoRD. Bioinformatics. 2005, 21 (12): 2923-2924. 10.1093/bioinformatics/bti431

- 21.
MesoRD – Mesoscopic Reaction Diffusion Simulator. http://mesord.sourceforge.net

- 22.
Smoluchowski MV: Versuch einer mathematischen theorie der koagulationskinetic kolloider lösugen. Zeitschrift f physik chemie. 1916, 92: 129-168.

- 23.
Ermak DL, McCammon JA: Brownian dynamics with hydrodynamic interactions. J Chem Phys. 1978, 69 (4): 1352-1360. 10.1063/1.436761.

- 24.
Cox DR: The theory of stochastic processes. Methuen. 1965.

- 25.
Ross SM: Stochastic Processes. Wiley Series in Probability and Mathematical Statistics, Wiley 1983.

- 26.
Berg HC: Random walks in Biology. Princeton University Press 1993.

- 27.
van Zon JS, ten Wolde PR: Green's-function reaction dynamics: a particle-based approach for simulating biochemical networks in time and space. J Chem Phys. 2005, 123 (23): 234910-10.1063/1.2137716

- 28.
Rice SA: Diffusion-limited reactions. Elsevier. 1985.

- 29.
Crank J: The Mathematics of Diffusion. Oxford University Press 1979.

- 30.
The R Project for Statistical Computing. http://www.r-project.org

- 31.
Fersht A: Structure and mechanism in protein science: a guide to enzyme catalysis and protein folding. Freeman, W H 1999.

## Acknowledgements

The authors thank Rafael Carazo-Salas and the members of the BMM Laboratory for useful discussions and insights. The authors also thank Gavin Kelly for help with the statistical analysis. This work was supported by Cancer Research UK and partially funded by an EMBO long-term research fellowship (awarded to A. L. Tournier).

## Author information

### Affiliations

### Corresponding authors

## Additional information

### Competing interests

The authors declare that they have no competing interests.

### Authors' contributions

ALT designed the new methodology and mathematics, PWF helped with implementation and in checking the derivations, PAB provided the initial impetus and supported the project through its different stages. All authors read and approved the final manuscript.

## Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

## Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## About this article

### Cite this article

Tournier, A.L., Fitzjohn, P.W. & Bates, P.A. Probability-based model of protein-protein interactions on biological timescales.
*Algorithms Mol Biol* **1, **25 (2006). https://doi.org/10.1186/1748-7188-1-25

Received:

Accepted:

Published:

### Keywords

- Diffusion Constant
- Chemical Entity
- Brownian Dynamic
- Random Walk Process
- Diffusion Step