Probability-based model of protein-protein interactions on biological timescales
© Tournier et al; licensee BioMed Central Ltd. 2006
Received: 25 September 2006
Accepted: 11 December 2006
Published: 11 December 2006
Skip to main content
© Tournier et al; licensee BioMed Central Ltd. 2006
Received: 25 September 2006
Accepted: 11 December 2006
Published: 11 December 2006
Simulation methods can assist in describing and understanding complex networks of interacting proteins, providing fresh insights into the function and regulation of biological systems. Recent studies have investigated such processes by explicitly modelling the diffusion and interactions of individual molecules. In these approaches, two entities are considered to have interacted if they come within a set cutoff distance of each other.
In this study, a new model of bimolecular interactions is presented that uses a simple, probability-based description of the reaction process. This description is well-suited to simulations on timescales relevant to biological systems (from seconds to hours), and provides an alternative to the previous description given by Smoluchowski. In the present approach (TFB) the diffusion process is explicitly taken into account in generating the probability that two freely diffusing chemical entities will interact within a given time interval. It is compared to the Smoluchowski method, as modified by Andrews and Bray (AB).
When implemented, the AB & TFB methods give equivalent results in a variety of situations relevant to biology. Overall, the Smoluchowski method as modified by Andrews and Bray emerges as the most simple, robust and efficient method for simulating biological diffusion-reaction processes currently available.
Molecular biology is moving to an age where the amount of data and its complexity challenge our efforts to understand it. Many recent experimental studies have concentrated on obtaining accurate protein-protein interaction maps for genomes, ranging from unicellular organisms to human. Combining experimental data with modelling makes it possible to tackle this new wealth of information and study the way function emerges from protein interaction networks (for reviews of this field see references [1–3]).
An effective approach to simulating interaction networks, and one which has been used extensively in building cellular models, is through the use of ordinary differential equations (ODEs) (see review by Tyson etal  and references therein). ODEs, however, suffer from two important limitations.
The first limitation is that they are designed to follow the bulk concentration of the different molecules. In many cases, where small quantities of molecules are involved, the dynamics of the system are known to deviate substantially from the deterministic prediction of the ODEs and are better described by stochastic laws . This can be overcome by implementing stochasticity into the models, which can be achieved in three ways: a first way is to use ODEs where stochastic perturbations have been added, mimicking the way the concentration of molecules fluctuates in time ; a second way is to use the method developed by Gillespie which follows reactions as discreet events in time ; and a third way – the one taken in the present work – is to explicitly follow the state of all the different molecules in the system independently .
A second limitation of the ODE approach, and of subsequent stochastic improvements, is that diffusion is not explicitly taken into account, which means that the effect of concentration gradients cannot be followed [9–11]. Concentration gradients can themselves be modelled, however it then becomes problematic to include the stochastic components (Virtual Cell approach [12–14] and E-Cell [15, 16]).
One way of modelling the stochastic as well as the diffusive aspect of the problem is by explicitly modelling the diffusion and interactions of the individual molecules contained in the system. Such spatial simulations have been performed by Franks etal. to study the synaptic cleft using their software M-Cell . Also, recent simulations by Lipkow etal. have successfully modelled the individual molecules and their diffusion to show the presence of a protein concentration gradient in the motor response in Escherichia coli using their software SmolDyn [17, 18]. Another way is to discretise space on a lattice and to use extensions of the Gillespie algorithm such as in SmartCell [11, 19] and MesoRD [20, 21].
Bimolecular interactions have previously been modelled by considering a simple local contact criteria, such a scheme is used in M-Cell . A more formal approach to modelling these interactions follows the description of diffusion limited chemical processes published by Smoluchowski in 1916 . In this approach a chemical reaction is considered to take place when two chemically reactive entities A and B come within a certain distance, σ b , from one another. This distance, called the reaction radius, is determined by the reaction rate and the diffusion constants of the two species, such that the reaction rate, k, is given by:
k = 4 πD+ σ b (1)
where D+ = D A + D B , and D A and D B are the diffusion constants of A and B.
The Smoluchowski approach seems to be the most appropriate method currently available to study many important biological systems, however a potential weakness of the Smoluchowski approach is the presence of a sharply defined reaction zone (cf. Figure 1). The aim of the present study is to investigate the potential benefit of replacing this reaction zone by a more realistic probability distribution of interaction between two chemical objects. In this scheme, the reaction is not automatic when two reacting objects come within a certain range of each other. Instead, the decision whether to allow this interaction is made based on a probability. This probability of interaction is dependent upon – among other factors – the distance between the objects. Potential benefits of the study include more accurate results and lower computational costs, thereby allowing for more complicated systems to be investigated.
The approach has been implemented into a freely available simulation package, SoftCell. In the SoftCell software cellular membranes are defined by tessellation using triangles and rates of import/export are assigned to each chemical entity. This tessellation approach makes it possible to define complicated surfaces and any number of internal organelles one might wish to include. The program is written in C++ and is linked with the scripting language, Python, allowing for control and ease of analysis of the data generated. Files defining protein objects, reactions, and membranes use an XML format.
We are interested in the reaction probability: the probability that two entities interact during a time step given that they can interact with reaction rate k, their diffusion rates are D1 and D2 and they start a distance d apart at the beginning of the time interval Δt. This probability is illustrated in Figure 1C. The reaction between two diffusing particles can be considered to occur in two steps: firstly the encounter of the two entities through diffusion, followed by the actual chemical reaction. Let us consider two freely diffusing chemical entities, A and B, starting a distance d0 from each other at time t0. At any time t later the rate κ AB (t|d0, t0) of the reaction between entities A and B can be expressed as:
where (t|d0, t0) is the probability of the two entities coming into contact at time t and is the rate of the reaction once in contact, averaged over all possible orientations of the two entities relative to each other. Both parts of this equation: (t|d0, t0) and can be estimated as described below.
The reaction rate κ AB (t|d0, t0) can be integrated over a simulation timestep Δt to provide the probability of at least one reaction taking place in that timestep. We are interested in the probability of at least one event taking place during the time interval Δt, i.e. 1 - P(no event during Δt). The process under consideration is a Poisson process with a time dependent rate of the event taking place. Given the rate κ(t) of an event taking place at a time t, the probability, P AB , of at least one reaction taking place during that time interval takes the general form [24, 25]:
P AB (Δt) = 1 - e-I(Δt) (3)
Such that the probability of a reaction taking place during timestep δt can be expressed as:
where κ AB (t) is given in equation (2).
The probability of contact, (t|d0, t0), is determined by the diffusion process of the two entities A and B during the time interval Δt = t - t0. The interacting bodies follow the laws of diffusion such that the probability of finding a given entity in an infinitesimal volume element dV, a distance d away from its starting position a time Δt later, is given by the well known Gaussian distribution: , where D is the diffusion constant of the entity .
where D+ = D A + D B , Δt = t - t0 and δV C is a small contact volume defined such that if two entities are found to be within this volume, they are considered to be in contact. This small contact volume, δV C , will be considered further below.
In order to get a good first approximation for (t|d0, t0), we initially consider the well-mixed limit. In this limit the distribution of the two entities A and B is uniform over space. This approximation is thus only valid in the long timestep limit. For short timesteps, the approximations break down and a correction to the reaction rate, , has to be introduced. An exact analytical solution has recently been presented that is also correct for short time steps . However, the mathematics involved are complex and difficult to implement; the approach used here, although approximate, is simpler and is not expected to alter the findings significantly.
Let us consider the simple reaction: occurring in a finite volume V. To first order, the rate of change in the number of molecules Y is:
where nA, nB and nY are the number of molecules of A, B and Y (respectively), N A is Avogadro's constant and k is rate of the reaction. The rate of change in nY can also be expressed as:
where is here the ensemble average probability of any two entities A and B being in contact in volume V and the rate of reaction if they are in contact.
The objects are considered to be uniformly distributed over the volume V such that the probability of A and B occupying the same contact volume δV C , , can be expressed as:
In doing this, notice that the contact volume δV C cancels out of the equations. This effectively removes any information about the size of the particles from subsequent considerations.
where Erfc is the complimentary error function defined by:
such that I(Δt) has the analytical form:
Finally we can express the probability of a reaction taking place between entities A and B, starting a distance d0 apart, during the time interval Δt as:
The probability of two chemical entities interacting in a timestep Δt is thus expressed in equation (15) in terms of the reaction rate k, the sum of the diffusion constants of the two entities D+ and the time interval Δt. This approach provides a good description of the interactions in terms of the underlying diffusion process and reactivity of the two entities.
The equations above hold for situations where the system can be considered to be well-mixed. However, this assumption breaks down for small timesteps: as chemical entities react over time, there tend to be fewer potentially interacting partners close to each other so that the distribution of the two entities is no longer uniform. In the long timescale limit this is not a problem as the system is well-mixed by each diffusion step and the approximations hold. At each timestep, the reaction process creates 'dips' in the probability distribution of the entities, the spatial extent of these 'dips' is comparable to the spatial extent of the probability of reaction. In order to remain well-mixed the distance covered by one step of diffusion must be greater than the spatial extent of the 'dips' created by the reaction process. For diffusion constants typical of biomolecules, the spatial width at half-maximum of (t|d0, t0) goes to ~0.1 μ m for Δt of the order of seconds. Considering this distance as being covered by diffusion, this gives us a typical timescale of Δt ≥ 0.01 s. The system can therefore be assumed to be well-mixed for timesteps of Δt ≥ 0.01 s. For shorter timesteps, this effect can be corrected for, as will be shown below.
Due to the reaction process, the average concentration around a chemical entity is less than predicted by the uniform distribution. The desired rate can be derived by correcting by a scaling factor as follows:
The procedure we used for doing this is very similar to that used by Andrews and Bray  to correct for the same effect in the Smoluchowski approach and is outlined below. More elaborate mathematical considerations of this process can be found in the recent paper by Zon and Wolde 2005 .
Using the rate of reaction upon encounter from equation (16), the probability of the reaction taking place after each diffusion step is now given by:
where Δt is the timestep of the simulation and we use the substitution .
For the purposes of the correction, we are interested in average effects, so from now on we consider the average concentrations of entities A and B and not the positions of entities A and B. Let us consider the radial concentration ρ B (r, t) of entity B around entity A, with A considered to be static at r = 0, while entity B has diffusion constant D+ = D A + D B .
The radial concentration ρ B (r, t) of entity B around entity A, at time t is propagated for a simulation timestep, Δt, to give ρ B (r, t + Δt) using a Green's function :
where the Green's function G s (r, r') is given by:
The entity A is then allowed to interact with entity B such that the new concentration of B is given by:
where P AB (r) is the probability of A and B interacting in the following timestep from equation (17).
The reaction step acts as a sink for the concentration of B, while the concentration of B is assumed to be constant at r = ∞. The long-distance equilibrium solution for ρ B (r) is known to be of the form [26, 29]:
This allows us to solve numerically for ρ B (r, t + Δt) around r = 0 while using an analytical extension for long distances (long distance was defined as r > r P , r P such that P AB (r) = 10-6). Equation (18) is then split into a numerical and an analytical part:
By inserting (21), the analytical extension, (r, t + Δt), can be derived and is given by:
A diffusion step is performed using numerical integration for the remainder of equation (23). The value of the constant a in (r, t) is found by fitting the last 10% of ρ B (r, t) relative to r P after each diffusion step. After each diffusion step the two entities are allowed to interact following the probability given in equation (17). In order to achieve a steady state, entity A is left unaffected by the reaction process. The process of diffusion/reaction is repeated until the radial concentration ρ B (r, t) reaches a steady state: ρ B (r, ∞).
The effective rate of the reaction is given by:
The values of keff were determined for a range of values of the correction factor, C, and variable s. Using this array of data it is then possible to find the correct value of the correction factor, C, for any pair of values of the desired reaction rate and simulation timestep: (keff, Δt).
Figure 4 also shows the radial probability distribution of the reactants for the different timesteps. These distributions are continuous throughout the reaction region. In contrast, those reported by Andrews and Bray show a strong discontinuity due to the sharply defined reaction radius .
Relation (15) holds only for a pair of interacting molecules. In the more general context of an actual chemical reaction, the number of potential reactive partners at any one time can be much higher. The assumption is that during a time step Δt the probability of an entity interacting with its closest neighbour, Pclosest, is much higher than the probability of it interacting with its next nearest neighbour, Pnext nearest: Pclosest ≫ Pnext nearest.
This condition will clearly be fulfilled if the average distance, dtravel, travelled by an entity during the time interval Δt is less than the average distance between particles, d. The average travel distance is: dtravel = , where D is the diffusion constant. The average interparticle distance given by: , where V is the volume and N is the number of particle, can also be expressed in terms of the concentration as: d ∝ C-1/3. where C is the concentration. For typical biological situations, C ≃ μ M ml-1 and D ≃ μ m2s-1, such that the condition reduces to Δt <~0.002 s.
When this condition is not fulfilled, a given entity can in principle interact with several other entities at any given timestep. This will affect the probability of the reaction with any given particle as reactions are considered mutually exclusive. In principle these probabilities can be calculated at each timestep during the simulation so that the correct statistics are reproduced. However, it was decided that the resulting extension to the code would introduce extra computational overhead, without significant benefit, and was not implemented. On the other hand, simply assuming that entities can interact with at most one other entity in the following timestep, when in fact they can interact with several, leads to the simulated reaction taking place at a rate greater than the expected rate.
Another problem which emerges for long timesteps concerns boundaries: for reactions, the algorithm assumes free diffusion in the space around the entities. This assumption is mostly correct when the timestep is small but at large timestep, the chance of encountering a boundary during that timestep become significant. At that point the free diffusion assumption assumed in the previous equations breaks down leading the reaction happening too fast in the simulation. Simply put, entities close to boundaries have less volume in which to diffuse and, therefore, a higher chance of encounter than entities far from any boundary. We can expect this effect to become important when the scale of the system becomes comparable to the typical distance travelled during a timestep. For biological systems on the μ m scale, and chemical entities diffusing with diffusion constants ~μ ms-1, this sets an absolute upper limit on the timesteps at ~0.1 s. Properly taking into account these boundary effects is beyond the scope of the present work. However, boundary effects are not expected to play an important role as the timescale for these effects is ~0.1 s, which is a much longer timescale than the limit previously set by single particle interaction at ~0.002 s.
The model was tested in a number of ways. The first test was performed by simulating an enzymatic reaction: A + E → B + E. 1000 molecules of A were simulated with 10 molecules of E and the effective reaction rate was measured by fitting the change in the concentration of A over time. We ran 4 sets of simulations with the following parameters for the diffusion rate d of the chemical objects and timestep Δt: 1) d = 1, Δt = 0.01; 2) d = 1, Δt = 0.001; 3) d = 5, Δt = 0.01; 4) d = 5, Δt = 0.001. For each set of d and Δt, the reaction rate k was successively set to 106.0, 106.2, 106.4 and 106.6 M-1s-1. For comparison purposes, these 4 sets of runs were performed using both the present model and the Andrews and Bray model. The corrected binding radius used in the Andrews and Bray approach was calculated using the code provided by the authors.
As has been pointed out by Andrews and Bray , in such simulations the measured rate of the reaction varies with time. This is due to the fact that the simulation starts, the local concentration gradient is not yet established, and the initial reaction rate is, therefore, higher than the desired rate as the local concentration gradient is not yet established. Subsequently, the system tends towards a steady state, and the reaction rate is correctly predicted by both methods with 99% accuracy when using the correction term. The two methods were statistically indistinguishable over the four sets of runs (slope and intercept of k measured = f (k desired ) were identical with p > 0.55).
The model was also tested for reactions at low numbers of reactants (n A = 10) where the effective rate of the reaction becomes subject to significant stochastic fluctuations. 10000 runs were performed using both the present and corrected Smoluchowski approaches; the reaction rate was determined for each run. Again four sets of runs were performed using the same diffusion constants and timesteps as described above. The reaction rate used was 106 M-1s-1. The distribution of the reaction rates at low concentrations produced by the present and corrected Smoluchowski approaches were compared and found to be indistinguishable (p > 0.2 on t-test).
Finally, the present and corrected Smoluchowski approaches were also compared in a situation containing a concentration gradient. The concentration gradient was produced by a point source of a molecule A (k = 2 s-1) which reacts with an enzyme E ([E] = 40 nM) with k E = 106 M-1s-1. The diffusion constants and timestep parameters where again varied as previously. The gradient generated were found to be identical (p > 0.9 on U-test). All statistical analyses were performed using the R package .
In a typical cell, the concentration of a given protein inside the nucleus depends on the balance between the rate at which it is being translated from mRNA in the cytosol, the rate at which it is being transported into the nucleus and the rate at which it is being degraded. In turn, the amount of mRNA present in the cytosol depends on the rate at which the gene is being transcribed in the nucleus, the rate at which it is being exported and the rate at which it is being degraded. This mechanism enables the concentration of protein in the nucleus to be tightly controlled.
As an illustration of the versatility of the present approach this simple system was simulated. Our model system, illustrated in Figure 2, consisted of a rod-shaped cell with a nucleus at its centre. Inside the nucleus, a gene is switched on at time t = 0, for 20 minutes. Transcription events then take place, generating mRNA molecules. These mRNA molecules diffuse out of the nucleus and encounter ribosomes which translate them into proteins. These proteins are considered to be tagged for the nucleus and therefore are allowed to pass through the membrane and accumulate in the nucleus. Both the mRNA and the protein have ubiquitination/destruction pathways that regulate their lifetime inside the cell such that the system reaches a steady state with a finite concentration of mRNA and protein. At time t = 20 min the gene is turned off and the concentration of mRNA and protein drops rapidly.
We have presented a formal, theoretically sound framework that provides reliable and accurate simulations of the diffusion-reaction process for biological systems. We compared it with the methods of Smoluchowski  and its extension by Andrews and Bray . Figure 1 illustrates the different approaches compared in this study. The first case (A) is the original Smoluchowski approach. In this approach, at each short timestep δt in the diffusion process the distance between the chemical entities is checked and if they come into close proximity (distance d <σ b ) the two entities are said to have reacted together. The downside of the approach is that many diffusion steps need to be computed to simulate the reaction kinetics accurately. The second approach (B) is that of Andrews and Bray . In their scheme, the reaction radius, σ b , is adjusted so that the correct reaction kinetics are reproduced for timesteps Δt ≥ 100 × δt. This approach produces an efficient algorithm that yields the correct reaction kinetics while using larger timesteps. Finally, (C) illustrates the present approach where the reaction radius is replaced by a smooth interaction probability. The two entities are considered to diffuse freely during the timestep Δt thereby producing a probability P AB (d, Δt) of interaction.
Although differences were expected to appear between the Andrews and Bray and the current approach in certain circumstances (such as low reactant concentrations, or in the presence of concentration gradients), the results indicate that the reactions rates produced by both methods converge. This is thought to be essentially due to the averaging that takes place as the number of interactions increases. Hence the two methods are for practical purposes equivalent (p > 0.55). It cannot be ruled out, however, that differences will appear for more complex systems. For example, in the context of reversible reactions, recombination effects might be best modelled using a probability based method. Overall, the Andrews and Bray method for simulating diffusion-reaction processes appears robust at low concentration and gradient effects. However, a possible improvement on this method would be the analytical derivation of the radius of reaction for long timesteps, in place of its present approximation. The Andrews and Bray method was consistently computationally more efficient, running up to ~15% faster depending upon the system being simulated.
An in depth theoretical analysis of the diffusion-reaction approach in the context of event driven simulations has recently been published by Zon and Wolde . Here again the aim is to increase the reach of present simulations by using longer timesteps. Using event driven simulations, the timestep can be increased substantially when reactive species are far apart or present at very low concentrations. However, as in the present work a limit on the length of the timestep is set by the requirement that they have to be short enough to ensure that an object can only interact with one other object during a timestep; this sets an upper limit to how large a timestep can be, and it remains to be shown whether they offer any clear computational advantage.
We have shown that the modified Smoluchowski method provides results that are indistinguishable from those produced using the much more elaborate and realistic model presented here, at a lower computational cost. The Andrews and Bray, radius-based, method thus appears to be the most simple, robust and efficient method for simulating diffusion-reaction processes currently available.
The authors thank Rafael Carazo-Salas and the members of the BMM Laboratory for useful discussions and insights. The authors also thank Gavin Kelly for help with the statistical analysis. This work was supported by Cancer Research UK and partially funded by an EMBO long-term research fellowship (awarded to A. L. Tournier).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.