MORPH-PRO: a novel algorithm and web server for protein morphing
© Castellana et al.; licensee BioMed Central Ltd. 2013
Received: 7 January 2013
Accepted: 29 June 2013
Published: 11 July 2013
Skip to main content
© Castellana et al.; licensee BioMed Central Ltd. 2013
Received: 7 January 2013
Accepted: 29 June 2013
Published: 11 July 2013
Proteins are known to be dynamic in nature, changing from one conformation to another while performing vital cellular tasks. It is important to understand these movements in order to better understand protein function. At the same time, experimental techniques provide us with only single snapshots of the whole ensemble of available conformations. Computational protein morphing provides a visualization of a protein structure transitioning from one conformation to another by producing a series of intermediate conformations.
We present a novel, efficient morphing algorithm, Morph-Pro based on linear interpolation. We also show that apart from visualization, morphing can be used to provide plausible intermediate structures. We test this by using the intermediate structures of a c-Jun N-terminal kinase (JNK1) conformational change in a virtual docking experiment. The structures are shown to dock with higher score to known JNK1-binding ligands than structures solved using X-Ray crystallography. This experiment demonstrates the potential applications of the intermediate structures in modeling or virtual screening efforts.
Visualization of protein conformational changes is important for characterization of protein function. Furthermore, the intermediate structures produced by our algorithm are good approximations to true structures. We believe there is great potential for these computationally predicted structures in protein-ligand docking experiments and virtual screening. The Morph-Pro web server can be accessed at http://morph-pro.bioinf.spbau.ru.
The number of solved protein structures in PDB  has grown enormously in recent years. However, the function of many proteins is highly correlated with their movement. X-Ray crystallography, which contributes most of the structures in PDB, gives us only a static view of protein structure. Recent developments in computational protein morphing [2–4] provide visualization of a molecule transitioning from one conformation to another by producing a series of intermediate conformations. In this paper we present a novel, computationally efficient algorithm for generating intermediate structures between two solved conformations of the same protein. In addition, we explore the possibility that intermediate structures generated in the morphing procedure may also represent realistic approximations of the actual protein conformational change, including the structures of the intermediate conformations.
Various attempts to predict the trajectory of proteins through conformational space have been made. Some success has been achieved through the use of elastic network models [5, 6]. However, the accuracy of these methods depends on the chosen starting conformation (either apo- or holo-) and collectivity of the atoms in the motion . Other attempts require numerous iterations of energy-minimization , which can be computationally expensive. Molecular dynamics simulations  may also be useful in determining the nature of conformational changes, but currently require significant computing power. Furthermore, motion planning techniques can be adapted to model molecular motions [10–12], providing an attractive alternative to the mentioned approaches due to their efficiency.
The most widely-used application to produce protein morphs is the Morph Server developed by Krebs and Gerstein . The goal of the Morph Server is to provide visualization and classification of protein movements. Our emphasis is on the fast generation of intermediate structures that represent realistic conformations.
Given two aligned proteins as input, our MORPH-PRO algorithm produces a series of intermediate conformations. We use linear interpolation, so that at each step every residue will move along the straight line between its current position and its ending position. Unfortunately, this can lead to biologically infeasible intermediate structures with atoms occupying the same space, incorrect bond lengths, and incorrect bond angles. Therefore, we use the atom positions generated by linear interpolation as a first approximation to the correct solution, and use a dynamic programming algorithm to ensure that certain biological constraints are satisfied. This produces structures which better resemble real proteins. Because these techniques are very efficient, our algorithm can produce many intermediate structures very quickly.
The intermediate structures produced by morphing algorithms show great promise in molecular docking . Molecular docking, which uses computer simulations to model and score protein-ligand binding, is a critical tool for drug discovery. Protein flexibility is believed to play a significant role in ligand binding . One method for including flexibility in the docking experiment is to perform ensemble docking , which uses multiple conformations of the protein for evaluation. Performing docking against several conformations of a protein has been shown to provide better screening results, than against a single static structure . The intermediate structures produced by morphing algorithms may improve our ability to detect these ligands, and therefore aide in the development of drug-like molecules .
In this section we analyze the simplest form of the morphing problem and present our MORPH-PRO algorithm. We designate P s t a r t and P e n d as the sequences of 3-D coordinates of the C α atoms for the starting and ending conformations. For simplicity, we assume that proteins P s t a r t and P e n d have an equal number of residues, and are aligned in 3-D. Later we will discuss the situation where P s t a r t and P e n d do not meet these conditions and will address various extensions to the simplest model of the protein morphing problem.
We represent a sequence of n points in 3-D (n-tuple) as a 3·n matrix (p i j ), where p i j is the i-th coordinate of the j-th point. Let n be the number of residues in P s t a r t and P e n d . Given a parameter α, we define the α-intermediate of proteins P and P′ as (1−α)·P+α·P′. The simplest way to morph P s t a r t into P e n d is to generate intermediate reconstructions (1−α)·P s t a r t +α·P e n d for 0<α<1. However, some α-intermediates may not look like real proteins, for example they may consist of consecutive C α atoms at biologically impossible distances. Below we show how to solve the protein morphing problem thereby transforming every intermediate reconstruction (being a sequence of n points) into a protein-like sequence of points. At each iteration, every point first moves by an appropriate distance towards its ending position, and then the obtained sequence of points is adjusted to become protein-like.
Below we describe the algorithm for transforming a sequence of points P into a protein-like structure P r o t e i n i z e(P).
Given a sequence P of n points, we define d j (P) as the distance between the (j)-th and the (j+1)-th points in P: . A sequence P is (a,ϵ)-equidistant if a−ϵ ≤ d j (P) ≤ a+ϵ for 1 ≤ j ≤ n−1. Protein structures exhibit a strict distance constraint between consecutive C α atoms that are 3.8 Å apart within an error margin of 0.1 Å. A sequence of points is protein-like if it is (3.8,0.1)-equidistant. We note that the consecutive C α atoms in cis-proline do not adhere to this distance rule, and these cases are not handled by our algorithm.
We define the distance d(P,P′) between two sequences P and P′, of n points each, as . An (a,ϵ)-equidistant sequence P′ is called an optimal ( a,ϵ )-equidistant approximation of P if d(P,P′) is minimum among all possible (a,ϵ)-equidistant sequences P′. Below we describe an approximate solution to the following problem:
Optimal Equidistant Sequence Problem (OESP): Given a sequence of points, find its optimal equidistant approximation.
The solution of OESP can be determined by backtracking. The time complexity of generating a protein-like conformation of C α atoms from a collection of n points, if one exists, is O(n Q2).
The above approach solves OESP and produces a (3.8,0.1)-equidistant sequence. There is more, however, to consider when defining a protein-like structure than consecutive residue distance. We now redefine the notion of a protein-like sequence of points to take into account consecutive residue angles and proximity constraints.
Given 3-D points q1, q2, and q3, a function ang(q1,q2,q3) is defined as the minor angle in degrees created by the lines through q1 and q2 and through q2 and q3, respectively. Given a sequence P of n points, we let a n g j (P)=a n g(pj−1,p j ,pj+1) for 2 ≤ j ≤ n−1. A sequence P is (a,b)-angle consistent if a° ≤ a n g j (P) ≤ b° for 2 ≤ j ≤ n−1. We observed that most C α angles in real proteins fall in the range of 70° to 120°.
Furthermore, a sequence P of points is z-distance consistent if the distance between any two non-consecutive points in P is at least z Å. We determined that a distance of 2.0 Å was typical in real proteins.
Finally, a sequence P is protein-like if it is (3.8,0.1)-equidistant, (70,120)-angle consistent, and 2.0-distance consistent.
We chose the multiplier 100 because it worked well to prevent C α clashes in our morphs. The addition of the angle and distance constraints requires O(n2Q3).
and thus, the time complexity of the simplified strategy is also O(n Q2).
The simplified strategy may provide a sub-optimal intermediate structure. However, if a structure is produced, it obeys both the angle and proximity constraints. It should be noted that the simplified strategy may fail to find a solution to OESP instances, even when a solution can be found by the advanced algorithm. The advanced algorithm looks for an optimal path among all feasible ones stretching from the first to the last lattice, while the former takes into consideration only a subset of paths. In addition, the simplified strategy may require an increase of the lattice size (see Parameter Selection), thus reducing the difference in the running time in practice of the algorithms.
Our experiments described in detail below were carried out using the simplified strategy.
Our algorithm only interpolates intermediate positions for residues which are aligned. Therefore, if the input proteins have different lengths we use the Needleman-Wunsch global sequence alignment algorithm  to align them, and reduce our starting and ending conformations to include only positions that are aligned. We chose to use a sequence-based alignment method because P s t a r t and P e n d are likely related proteins and will have similar sequences. The output of this phase of the algorithm is a set of coordinates of aligned C α’s for P s t a r t and P e n d . In this situation, the i t h residue in the alignment may not correspond to the i t h residue in P s t a r t . If the i t h and (i+1) s t residues produced from the alignment are not consecutive in P s t a r t then E S c o r e for the edge connecting them is 0. Similarly, if either the (i−1) t h and i t h or the i t h and (i+1) s t residues are not consecutive in P s t a r t then A S c o r e for the angle at the i t h residue is 0.
In order for the morphing algorithm to work, the proteins should be aligned in 3-D using a structure alignment program. In the implementation we used for the experiments described in this paper, this task is accomplished by Kabsch’s algorithm  (also see ). Our server uses the Quaternion Characteristic Polynomial (QCP) method recently proposed by .
For our experiments we set the number of intermediate structures, K, to be the rounded displacement of the largest C α movement. For example, if the greatest movement of any C α from the starting conformation and the ending conformation is 15.2 Å, then K = 15. This results in only small differences between consecutive structures.
We selected the edge length and point density for the lattices based on experimental evidence. Increasing the density of vertices in the lattice allows for a finer grained set of possible coordinates, but we found that a density higher than 6 points per Å (216 points per Å3) does not produce significantly better intermediate structures. Consequently, we fixed the density at 6 points per Å. The length of the lattice edge is set initially to 1 Å. However, if OESP solution cannot be found at this lattice size, we increase the lattice edge length (to 1.5 Å and then to 2.0 Å). If an OESP solution cannot be found with lattice edge length of 2.0 Å then our algorithm will not produce a morph.
We implemented the MORPH-PRO server using an open source web framework Ruby on Rails and SQLite3 database engine, and a new 3D graphics standard WebGL. The algorithm for protein morphing was implemented in ANSI C. We used BioRuby  – an open source bioinformatics library for Ruby – for parsing PDB files, and the QCProt 1.3 realization of the QCP algorithm for aligning proteins in 3D, distributed under a BSD open source license.
The interface allows a user to upload two PDB files containing the starting and the ending conformations, and either to explicitly indicate the number of intermediate conformations or to let it be determined automatically (based on the maximum C α displacement, as described in Parameter Selection). After the intermediate conformations are computed, the morphing process can be visualized either as a movie or step-by-step. A transformation between two consecutive conformations is accomplished via linear interpolation. A 3D chain representing a conformation can be rotated, and zoomed in and out. In addition, a user can choose an appropriate level of detail for rendering and elect to use the full algorithm or the simplified version. A publicly available archive of submitted morph requests is stored on the server in an SQLite3 database, making it easier to re-run the algorithm on the same input.
We evaluate our morphs by looking at both the biological feasibility of each individual structure, as well as the series of structures as a whole. We evaluate our morphs by comparing to proteins which have 3 or more solved structures in PDB, as proposed by . In many instances, multiple conformations of the same protein are not available. Instead, we used proteins from the same family with nearly identical sequences as endpoints in our morph.
We created a morph between two members of the pyrophosphokinase family (PDB codes: 1DY3, 1RAO). The alignment produced 158 residues with a maximum C α displacement of 22 Å. The RMSD between the starting structure and the ending structure is 4.07 Å.
It is also beneficial to look at the intermediate structures in the context of the entire morph. We have shown that our intermediates are protein-like, and we now demonstrate that the series of intermediate structures closely mimics the series of conformations a protein would visit. If multiple conformations of the same protein are known, then we can compare our predicted trajectory to the solved trajectory by calculating the RMSD between our intermediates and the experimentally solved intermediates. However, alternate conformations were not available for these proteins, so instead we used solved structures for proteins in the pyrophosphokinase family.
RMSD of predicted structures to solved intermediate structures
RMSD to 1RB0(Å)
RMSD to 1HKA(Å)
The technique of looking at RMSD of the intermediate structures to known structures is most useful when X-Ray structures of actual intermediate conformations are available. There are three conformations solved for the F1-ATPase molecular motor (PDB code: 1E79) which exhibit a subtle change. The RMSD between the starting and ending conformations is 1.78 Å. The protein has 492 residues and the largest movement of a C α is 11 Å. We produce a morph of 11 total structures from 1E79A to 1E79C.
Virtual screening  is a technique which simulates the binding of a protein and a ligand, in order to determine the best ligand candidates from a large database. Most often, virtual screening is used as part of a drug development pipeline, guiding the selection of likely drug candidates. The predicted binding affinity of a ligand for a protein is determined by a docking algorithm, which finds the orientation and location of the ligand with respect to the protein. Modeling protein flexibility is very difficult due to the large degrees of freedom of a protein structure [13, 31]. One promising approach to implicitly incorporating protein flexibility is to dock against an ensemble of static protein structures .
If multiple conformations of the target protein are solved using NMR or X-Ray studies, these are good candidates for ensemble docking. However, in the more common case of unknown intermediate conformations a computational method can provide accurate models more quickly. Use of computationally-produced intermediates in virtual screening has shown promising results .
To test the potential for our intermediate structures in virtual screening we examined docking scores of our structures versus those solved experimentally against a small database of ligands. First, we produced a morph of the c-Jun N-terminal kinase 1 (JNK1). The starting conformation of this protein (1UKH) was solved complexed with a peptide (pepJIP1) derived from the binding portion of the scaffolding protein JIP1. The ending conformation (1UKI) was solved complexed with pepJIP1 and the ATP mimic SP600125. The binding of pepJIP1 to the JIP1 binding site on JNK1 causes a small conformational change at the ATP site. Though the movement is small, it produces a morph of 3 intermediates (P2,P3,P4) in addition to the starting and ending conformations. The absent backbone atoms and side chains of each intermediate structure were reconstructed using Maxsprout , and energy minimization was performed using Swiss-PDB viewer . As a basis for comparison, the X-Ray structures of 1UKH and 1UKI were also reduced to their C α’s and then reconstructed in the same manner to produce P1 and P5, respectively.
Binding affinities for 5 JNK1 putative ligands
1U K I
1U K H
1U K I
1U K I
1U K H
1U K I
1U K I
1U K H
1U K H
1U K H
The first column behaves as expected. The structure which has the highest binding affinity for SP600125 is 1UKI which is the structure of JNK1 complexed with SP600125. The X-Ray structures docked with SP600125 rank significantly higher than the reconstructed P1 and P5. This suggests that better side chain reconstruction could greatly improve the docking results.
For three of the other ligands, the second intermediate structure, P2 scores higher than any other intermediate structure as well as any X-Ray structure. This demonstrates that our intermediate structures would be more likely to identify ligands which bind to JNK1 than either of the two X-Ray structures.
It is clear that there is much to learn about the nature of protein structure dynamics that is not addressed in the static information contained in PDB. The intermediate structures representing a protein as it moves from one conformation to another may yield much information about how a protein functions. Experimental techniques are inadequate for this task due to practical and technological limitations. For this reason, structural biology is in great need of algorithms which can accurately predict the intermediate structures as a protein undergoes a conformational change.
While other morphing algorithms require computationally expensive energy and elastic network modeling calculations, our morphing algorithm is based on a few simple observations of protein structure, and therefore produces multiple intermediate conformations very quickly. Our intermediate structures represent possible protein structures, and demonstrate the motion of a protein as it changes between conformations. In the case of morphing between homologs, the intermediate structures give us clues to how protein structures have evolved.
The morphed structures also show promise in the area of virtual screening. Most techniques limit protein flexibility to the side chain atoms, and may allow limited flexibility of the substrate. Our morph produces intermediate structures which are hypotheses for possible backbone movements. For this reason, some ligands bound more favorably to our intermediate structures than the solved structures. These are strong implications for the potential of morphs in guiding drug development.
Like all other approaches, our algorithm also has limitations. Linear interpolation, with only small corrections, prevents our method from correctly producing a morph for proteins with very large or complex movements. Many of these morphs could be solved by allowing a larger movement from the first approximation (a larger lattice), or allowing higher granularity of possible C α positions (more points in each lattice) but the time cost would be significant. Clearly, in protein morphing there is a trade-off between speed and accuracy.
Root mean square deviation
Protein Data Bank.
We would like to acknowledge the valuable input provided by Mallika Veeramalai, Piotr Cieplak, and Lukasz Jaroszewski (Joint Center for Structural Genomics). We thank Hongbin Yuan and Maurizio Pellechia at the Sanford-Burnham Medical Research Institute for input on protein docking. We also thank all the members of the Joint Center for Molecular Modeling for helpful discussions. The Joint Center for Molecular Modeling is funded by the National Institute of General Medical Sciences [P20 GM07622] and is a part of the Protein Structure Initiative. NEC was supported in part by the NSF IGERT training grant [DGE-0504645]. PAP, AL and KV were supported by the Government of the Russian Federation (grant 11.G34.31.0018).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.