An efficient algorithm for protein structure comparison using elastic shape analysis
 S. Srivastava^{1, 2},
 S. B. Lal^{1},
 D. C. Mishra^{1},
 U. B. Angadi^{1},
 K. K. Chaturvedi^{1},
 S. N. Rai^{2, 3} and
 A. Rai^{1, 4}
DOI: 10.1186/s13015-016-0089-1
© The Author(s) 2016
Received: 9 March 2016
Accepted: 21 September 2016
Published: 29 September 2016
Abstract
Background
Protein structure comparison plays an important role in the in silico functional prediction of new proteins. It is also used to understand evolutionary relationships among proteins. A variety of methods have been proposed in the literature for comparing protein structures, but each has limitations in terms of accuracy and computational complexity in time and space. There is a need to reduce the computational complexity of protein comparison/alignment while incorporating important biological and structural properties into the existing techniques.
Results
An efficient algorithm has been developed for comparing protein structures using elastic shape analysis, in which the sequence of 3D coordinates of atoms of protein structures, supplemented by auxiliary information from side-chain properties, is incorporated. The protein structure is represented by a special function called the square-root velocity function. Singular value decomposition and dynamic programming are employed for optimal rotation and optimal matching of the proteins, respectively. The geodesic distance is calculated and used as the dissimilarity score between two protein structures. The developed algorithm was tested and found to be more efficient than existing methods: running time was reduced by 80–90 % without compromising the accuracy of comparison. Source code for the different functions has been developed in R. A user-friendly web-based application called ProtSComp has also been developed using this algorithm for comparing protein 3D structures and is freely accessible.
Conclusions
The methodology and algorithm developed in this study take considerably less computational time without loss of accuracy (Table 2). The proposed algorithm considers different criteria for representing protein structures using the 3D coordinates of atoms and includes residue-wise molecular properties as auxiliary information.
Keywords
Protein structure comparison, Backbone atoms, Geodesic distance, Side chain properties
Background
Comparison of protein structures is important for understanding structural, functional and evolutionary relationships among proteins, especially in the case of novel proteins [1]. In addition, it is extensively used for identifying homologous residues [2, 3], finding recurrent folds [4], identifying structural motifs and functional sites, searching for similar structures in structural databases, predicting interactions among residues/proteins, and hierarchical classification of proteins [5–10]. Structural analysis of proteins is more informative than sequence analysis because protein structures are more conserved than sequences [1, 11]. Protein structure comparison can also be used to evaluate sequence alignment methods [12, 13], to predict unknown protein structures and to evaluate predicted 3D structures of proteins.
In the last two decades, research on protein structure comparison has gained momentum, but the problem of finding a biologically meaningful optimal alignment remains open [1]. A number of methods for comparing two protein structures have been proposed in the literature, based on either distance measures or scoring schemes. There is a strong need for a standard scoring function [14, 15] with a sound theoretical foundation, as the majority of existing techniques are heuristic in nature [1]. Many of these techniques are not only less accurate but also have high computational time and space complexity [16]. Hence, there is scope for improving existing methods for better comparison of protein structures [1, 15, 17].
Algorithms for comparing two protein 3D structures can be broadly classified into two categories: (1) rigid-body alignment, in which the structures are superposed heuristically after scaling, rotation and translation [18], and (2) fragmentation of the structures followed by assembly through non-sequential alignment [18, 19]. Techniques in the first category perform better when the protein structures are small and have an equal number of residues. The main limitations of the second category are the selection of an appropriate fragment size and the computational time and space complexity of the alignments. Various metrics for comparing and scoring the identity between two protein structures are employed in both categories, the most common being p values and root mean square deviation (RMSD). However, these metrics are rarely used for protein structure comparison within a single technique. Furthermore, methods such as Distance mAtrix aLIgnment (DALI) employ a similarity score that is not a metric; instead, DALI uses a heuristic rule to search neighborhoods based on strong matches [20]. In terms of implementation and practical utility, these methods are difficult to use because of their space and time complexity [21].
Recently, protein structures have been compared using the geodesic distance, based on a particular Riemannian metric, as the dissimilarity score [22]. In this technique, the 3D coordinates of backbone atoms are used to derive a parameterized curve in three-dimensional space, \(\mathbb{R}^{3}\), representing the protein structure. The alignment of two protein structures is then defined as the alignment of the two curves derived from their backbone atoms, one from each protein. Each parameterized curve is represented by a special function called the square root velocity function (SRVF). Shapes are then compared after removing all shape-preserving transformations from these curves. It has been pointed out that this comparison can be improved further by using higher-dimensional composite curves that concatenate the geometric (3D) coordinates with primary and secondary structure as auxiliary coordinates [23, 24], and with side chain atoms. Side chain atoms play an important role in determining protein structure and, consequently, protein function. The orientations of side chains and the molecular properties of residues have a significant effect on protein conformational dynamics and hence on protein function [25]. Therefore, the inclusion of side chain atoms and molecular properties is likely to improve the comparative analysis of protein structures and may lead to better alignments than those obtained from existing techniques.
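To make the SRVF concrete, the following Python sketch computes it for a polygonal curve, such as one built from successive backbone coordinates. It assumes a uniform parameterization on [0, 1] (the parameterization used by the authors is not given here), so the SRVF \(q = \dot{\beta} / \sqrt{\lVert \dot{\beta} \rVert}\) is constant on each segment. The function names `srvf` and `l2_norm_sq` are illustrative, not part of the authors' R code.

```python
import math


def srvf(points):
    """Piecewise-constant SRVF of a polygonal curve (illustrative sketch).

    points: list of equal-length coordinate tuples, e.g. (x, y, z, aux1, ...).
    The curve is parameterized uniformly on [0, 1]. Returns the SRVF value on
    each segment and the breakpoints t_0 = 0 < ... < t_{n-1} = 1.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    breakpoints = [i / (n - 1) for i in range(n)]
    dt = 1.0 / (n - 1)
    q = []
    for a, b in zip(points, points[1:]):
        v = [(bj - aj) / dt for aj, bj in zip(a, b)]   # velocity on this segment
        speed = math.sqrt(sum(c * c for c in v))
        if speed == 0.0:
            q.append([0.0] * len(v))                   # degenerate (repeated) point
        else:
            q.append([c / math.sqrt(speed) for c in v])  # q = v / sqrt(|v|)
    return q, breakpoints


def l2_norm_sq(q, breakpoints):
    """Squared L2 norm of a piecewise-constant SRVF; equals the curve length."""
    return sum(sum(c * c for c in qi) * (t1 - t0)
               for qi, t0, t1 in zip(q, breakpoints, breakpoints[1:]))
```

A useful check of the construction is that the squared L² norm of q equals the length of the curve, which is why rescaling a curve to unit length places its SRVF on the unit sphere.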
Therefore, in this study a method/algorithm based on elastic shape analysis [26–29] has been developed that considers both the geometric and the molecular properties of proteins. In the proposed algorithm, side chain atoms are included along with molecular properties such as hydrophobicity, polarity, orientation (dihedral angles), mass of residues, functional group type (aliphatic, acyclic, hydroxyl- or sulphur-containing, aromatic) and number of side-chain atoms as auxiliary information. The proposed technique requires significantly less time for comparing protein structures without compromising accuracy. The algorithm has been implemented in the open-source R software. The method is described stepwise in the “Proposed algorithm” section. Its performance was compared with existing methods, namely ESA [22, 23], combinatorial extension (CE) [30], jFATCAT [31], Matt [32] and the multiple structural alignment algorithm MUSTANG [33]; details are given in the “Results and discussion” section. Our method was found to be more accurate for classification purposes and more efficient in terms of computational time.
Proposed algorithm
The elastic shape metric has been employed to calculate the deformation between, and quantify the difference between, two 3D protein structures. The concepts of shape and shape metrics were developed by Kendall [34] for the quantification and modelling of shapes. Their applications include shape analysis, detecting and tracking patterns in images, classification and clustering of images, finding the trajectories and paths of objects, and studying morphological changes in objects. It has also been observed that the SRVF and the elastic metric perform better than their counterparts in many fields, such as image analysis, video analysis, and RNA and protein 3D structure comparison [22, 35]. Hence, in this study, the SRVF and the elastic shape metric have been employed for comparing protein 3D structures.
An algorithm for comparing two protein 3D structures based on elastic shape analysis [22, 34, 35] has been developed and implemented as a web-based tool. The tool takes PDB files [36] as input and outputs the geodesic distance, along with a graphical display of the optimal matching and the superposed protein curves for visualization.
a. Algorithm
In the proposed algorithm, both geometric properties from the 3D coordinates of atoms and molecular properties that play a significant role in protein folding are considered when deriving a curve from a protein structure (PDB file). Geometric properties are derived from the 3D coordinates of the atoms of each residue under three criteria: (1) using the backbone (N, C_{α} and C) atoms (ESA-BB), (2) using C_{α} atoms only (ESA-CA) and (3) using the mean coordinates of the backbone atoms of each residue (ESA-MCBB). Additionally, the dihedral angles (phi, psi and omega) are always included in criteria (2) and (3). The molecular properties considered are hydrophobicity, polarity, mass of residues, functional group type (aliphatic, acyclic, hydroxyl- or sulphur-containing, aromatic) and number of side-chain atoms; these are used as auxiliary information [37–40]. Glycine is an exception: since it has no side-chain heavy atoms, only its backbone atoms are considered.
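Criteria (2) and (3) attach the backbone dihedral angles to each residue. As a minimal sketch (not the authors' R code), the standard atan2 formula for the torsion angle defined by four consecutive atoms is shown below; phi uses C(i−1), N, C_{α}, C and psi uses N, C_{α}, C, N(i+1), while omega is taken here as C_{α}, C, N(i+1), C_{α}(i+1), which is one of the conventions in use. The helper names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[0] - b[0], a[1] - b[1], a[2] - b[2]]


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, in (-180, 180]) defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)      # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def backbone_dihedrals(prev_c, n, ca, c, next_n, next_ca):
    """phi, psi and omega of one residue from its backbone atoms and neighbours."""
    phi = dihedral(prev_c, n, ca, c)
    psi = dihedral(n, ca, c, next_n)
    omega = dihedral(ca, c, next_n, next_ca)
    return phi, psi, omega
```

For a planar zig-zag, the cis arrangement gives 0° and the trans arrangement gives ±180°, which is a quick sanity check on the sign conventions.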
Step 1
Here, the superscript j (j = 1, 2) refers to protein 1 and protein 2, respectively. The subscript (3 + k) indicates that the first 3 rows hold the x, y and z coordinates of atoms and the remaining k rows hold auxiliary information.
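The (3 + k) × n_j representation of Step 1 can be sketched for the ESA-MCBB criterion with a single auxiliary row (k = 1). The sketch assumes the Kyte–Doolittle hydropathy scale [37] for hydrophobicity and a hypothetical per-residue dictionary as input; the authors' full feature set (dihedral angles, polarity, mass, functional group type, side-chain atom count) and any scaling of these rows are not reproduced.

```python
# Kyte-Doolittle hydropathy values [37]; using this scale here is an assumption.
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}


def residue_matrix(residues):
    """Build a (3 + k) x n matrix with k = 1 (hydrophobicity).

    residues: list of dicts such as {"name": "ALA", "N": (x, y, z),
    "CA": (x, y, z), "C": (x, y, z)} -- a hypothetical input format.
    Column i holds the centroid of residue i's backbone atoms (ESA-MCBB)
    followed by its hydropathy value.
    """
    columns = []
    for res in residues:
        atoms = [res["N"], res["CA"], res["C"]]
        centroid = [sum(a[d] for a in atoms) / 3.0 for d in range(3)]
        columns.append(centroid + [KYTE_DOOLITTLE[res["name"]]])
    # Transpose so that rows are coordinates and columns are residues.
    return [list(row) for row in zip(*columns)]
```

Storing coordinates as rows and residues as columns matches the (3 + k) × n orientation used in the subsequent steps.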
Step 2
Step 3
These values are arranged in increasing order and duplicate values are merged. Note that n lies between \(\max(n_1, n_2)\) and \(n_1 + n_2 - 2\). The recalculated SRVFs \({\bf{Q}}_{(3 + k) \times n}^{(1)}\) and \({\bf{Q}}_{(3 + k) \times n}^{(2)}\) corresponding to the new T are denoted \(\mathbf{Q}_1\) and \(\mathbf{Q}_2\) for protein 1 and protein 2, respectively.
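Steps 2 and 3 place both curves on a common set of change points. Assuming each protein's change-point vector is sorted and starts at 0 and ends at 1 (an assumption consistent with the stated bound \(n \le n_1 + n_2 - 2\)), the merge and the re-expression of a piecewise-constant SRVF on the merged grid can be sketched as follows; both function names are hypothetical.

```python
import bisect


def merge_change_points(t1, t2, tol=1e-12):
    """Sorted union of two change-point vectors on [0, 1], duplicates merged."""
    merged = []
    for t in sorted(t1 + t2):
        if not merged or t - merged[-1] > tol:
            merged.append(t)
    return merged


def resample_srvf(q, breakpoints, new_breakpoints):
    """Re-express a piecewise-constant SRVF on a finer set of breakpoints.

    Each new interval takes the SRVF value of the old segment that
    contains its midpoint.
    """
    out = []
    for a, b in zip(new_breakpoints, new_breakpoints[1:]):
        mid = 0.5 * (a + b)
        seg = bisect.bisect_right(breakpoints, mid) - 1
        out.append(list(q[min(seg, len(q) - 1)]))
    return out
```

Because the shared endpoints 0 and 1 are counted only once, the merged vector has at most \(n_1 + n_2 - 2\) entries and at least as many as the longer input.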
Step 4
 4.1
\(\mathrm{SVD}(\mathbf{A}) = \mathbf{U}\mathbf{S}\mathbf{V}^{T}\), where \(\mathbf{A} = \mathbf{Q}_1 \mathbf{Q}_2^{T}\)
 4.2
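Optimal rotation matrix, \(\mathbf{R}_{3\times 3} = \mathbf{U}\mathbf{V}^{T}\); if \(\det(\mathbf{U}\mathbf{V}^{T}) = -1\), the sign of the last column of \(\mathbf{U}\) is flipped so that \(\mathbf{R}_{3\times 3}\) is a proper rotation rather than a reflection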
 4.3
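The final optimal rotation matrix, \(\mathbf{R}_A\), of dimension (3 + k) × (3 + k), which rotates the geometric coordinates and leaves the k auxiliary coordinates unchanged: \(\mathbf{R}_A = \begin{pmatrix} \mathbf{R}_{3\times 3} & \mathbf{0}_{3\times k} \\ \mathbf{0}_{k\times 3} & \mathbf{I}_{k} \end{pmatrix}\)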
 4.4
Rotate the second curve with respect to the first curve, i.e., \(\mathbf{Q}_{2R} = \mathbf{R}_A \mathbf{Q}_2\)
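Step 4 solves an orthogonal Procrustes problem. So that the sketch below runs with the Python standard library alone, it swaps the SVD for Horn's quaternion method, which yields the same least-squares optimal proper rotation; a small Jacobi eigen-solver supplies the dominant eigenvector. The moving points would be the columns of the x, y, z rows of \(\mathbf{Q}_2\) and the fixed points those of \(\mathbf{Q}_1\). `rotate_srvf` then applies \(\mathbf{R}_A = \mathrm{diag}(\mathbf{R}, \mathbf{I}_k)\), leaving the auxiliary rows unchanged, which is our reading of step 4.3. All function names are hypothetical.

```python
import math


def _jacobi_eigen(a, sweeps=50):
    """Eigenvalues/eigenvectors of a small symmetric matrix (cyclic Jacobi).

    Returns (values, vecs); column j of vecs is the eigenvector of values[j].
    """
    n = len(a)
    a = [row[:] for row in a]
    vecs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # Rotation that zeroes a[p][r] (Numerical Recipes form).
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # A <- J^T A
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkr = vecs[k][p], vecs[k][r]
                    vecs[k][p], vecs[k][r] = c * vkp - s * vkr, s * vkp + c * vkr
    return [a[i][i] for i in range(n)], vecs


def optimal_rotation(moving, fixed):
    """3x3 rotation R minimising sum ||R m_i - f_i||^2 (Horn's quaternion method)."""
    s = [[0.0] * 3 for _ in range(3)]            # S_ab = sum_i m_ia * f_ib
    for m, f in zip(moving, fixed):
        for a in range(3):
            for b in range(3):
                s[a][b] += m[a] * f[b]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    vals, vecs = _jacobi_eigen(n)
    j = max(range(4), key=lambda i: vals[i])     # largest eigenvalue
    w, x, y, z = (vecs[i][j] for i in range(4))  # unit quaternion
    return [[w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
            [2*(y*x + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
            [2*(z*x - w*y), 2*(z*y + w*x), w*w - x*x - y*y + z*z]]


def rotate_srvf(q_rows, r):
    """Apply diag(R, I_k) to a (3 + k) x n SRVF stored as a list of rows."""
    x, y, z = q_rows[0], q_rows[1], q_rows[2]
    rotated = [[r[i][0] * x[t] + r[i][1] * y[t] + r[i][2] * z[t]
                for t in range(len(x))] for i in range(3)]
    return rotated + [row[:] for row in q_rows[3:]]   # auxiliary rows unchanged
```

Horn's method always returns a proper rotation, so no separate reflection check is needed, unlike the SVD route, where the sign of the last singular vector must be corrected when the determinant is negative.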
Step 5
 5.1
First, compute the weights of all edges:
\(EW(r, s)\), the weight of the edge between vertex r of \(\mathbf{Q}_1\) (r = 1, …, n) and vertex s of \(\mathbf{Q}_{2R}\) (s = 1, …, n)
 5.2
Find the shortest path using the Floyd–Warshall all-pairs shortest-path algorithm and match the edge weights
 5.3
Obtain G (values of the optimal reparameterization function γ), \(T_g\) (change-point parameter values of γ) and the minimum distance (squared L^{2} distance between the matched curves)
 5.4
Obtain the second curve \(\mathbf{Q}_{2R}^{*}\) after optimal reparameterization.
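Step 5 names the Floyd–Warshall algorithm but does not spell out how the edge weights EW(r, s) are arranged into a graph. The sketch below therefore shows only the generic all-pairs shortest-path algorithm, with path reconstruction, on a small illustrative graph; it is not the authors' matching procedure, and the function names are hypothetical.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest paths on a dense weighted digraph.

    weights[u][v] is the weight of edge u -> v (math.inf if absent).
    Returns (dist, nxt); nxt[u][v] is the vertex after u on a shortest
    path to v, or None if v is unreachable. Negative cycles are not handled.
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    nxt = [[v if weights[u][v] < math.inf else None for v in range(n)] for u in range(n)]
    for u in range(n):
        dist[u][u] = 0.0
        nxt[u][u] = u
    for k in range(n):                 # allow vertex k as an intermediate
        for i in range(n):
            dik = dist[i][k]
            if dik == math.inf:
                continue
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def shortest_path(nxt, u, v):
    """Vertex sequence of a shortest path from u to v, or [] if unreachable."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path
```

The triple loop costs O(V³) time for V vertices, which is worth keeping in mind when V grows with the number of residues.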
Step 6
The same procedure as in Step 3 is used to calculate a new change-point parameter \(\mathbf{T}_r\), and the corresponding SRVFs \(\mathbf{Q}_{1r}\) and \(\mathbf{Q}_{2r}\) are recalculated. Finally, the geodesic distance between the curves is obtained as \(\theta = \cos^{-1}(d)\), where \(d = (\mathbf{Q}_{1r} \cdot \mathbf{Q}_{2r})\,\mathbf{T}_{r}^{T}\) and ‘·’ denotes the dot product of the matrices.
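A minimal sketch of the final distance, assuming that both aligned SRVFs are piecewise constant on the common change points and already scaled to unit L² norm, so that d is their L² inner product and lies in [−1, 1]. Reading the product with \(\mathbf{T}_r\) as weighting each interval by its length is our interpretation, not something the text states; `geodesic_distance` is a hypothetical name.

```python
import math


def geodesic_distance(q1, q2, breakpoints):
    """theta = arccos(<q1, q2>) for unit-norm piecewise-constant SRVFs.

    q1, q2: per-interval SRVF vectors defined on the same breakpoints
    (after rotation and reparameterization).
    """
    inner = 0.0
    for a, b, t0, t1 in zip(q1, q2, breakpoints, breakpoints[1:]):
        inner += sum(x * y for x, y in zip(a, b)) * (t1 - t0)
    inner = max(-1.0, min(1.0, inner))   # guard against rounding outside [-1, 1]
    return math.acos(inner)
```

Identical curves give θ = 0, and orthogonal SRVFs give θ = π/2; clamping the inner product avoids a domain error from `math.acos` when rounding pushes it slightly above 1.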
b. Evaluation criteria
The proposed algorithm has been implemented in R. To evaluate its performance for protein 3D structure comparison against existing algorithms, namely (1) CE, (2) jFATCAT and (3) ESA, benchmark data were collected from the literature [23]. Distance matrices were obtained for the benchmark data with all four structure comparison algorithms. The performance of the algorithms was then evaluated through cluster analysis of these distance matrices, using statistical performance measures such as the Rand index, precision, recall and F-measure.
R package development
Benchmark data
Two datasets of protein structures from the Structural Classification of Proteins (SCOP) [6, 7] database were taken as benchmark datasets. These datasets were also used by Liu et al. [23] to evaluate algorithms for protein 3D structure comparison. The first dataset comprises 50 proteins from five important SCOP classes, with 10 proteins from each class: class I [all-α proteins], class II [all-β proteins], class III [α and β proteins (α/β)], class IV [α and β proteins (α + β)] and class V [multi-domain proteins]. The second dataset consists of 100 protein structures from three important SCOP classes: 45 proteins from class I, 40 from class II and 15 from class III.
Computation of distance matrix
Distance matrices of size N × N for N protein structures were computed with all four algorithms: (1) CE, (2) jFATCAT, (3) ESA and (4) the proposed algorithm. The distance matrices of the three existing algorithms are based on the 3D coordinates of backbone atoms, whereas the proposed method also incorporates auxiliary information along with these 3D coordinates. To make the distance matrices comparable, a sigmoid function was used to convert geodesic distances and RMSD values into a common similarity measure between 0 and 1.
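The form and parameters of the sigmoid are not given. One choice that fits the description, shown here purely as an assumption, is \(s(d) = 2/(1 + e^{d})\), which maps a non-negative distance d (geodesic distance or RMSD) to a similarity in (0, 1] with s(0) = 1. It is computed via \(e^{-d}\) to avoid overflow for large d; both function names are hypothetical.

```python
import math


def to_similarity(distance):
    """Logistic map of a non-negative distance to a similarity in (0, 1].

    s(d) = 2 / (1 + exp(d)) = 2 * exp(-d) / (1 + exp(-d)); s(0) = 1 and s
    decreases towards 0 as d grows. The authors' exact sigmoid is not
    specified, so this form is an assumption.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    e = math.exp(-distance)
    return 2.0 * e / (1.0 + e)


def similarity_matrix(distances):
    """Apply to_similarity element-wise to an N x N distance matrix."""
    return [[to_similarity(d) for d in row] for row in distances]
```

Since the map is monotone decreasing, it preserves the ranking of pairs within each method while putting all methods on the same [0, 1] scale.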
Performance measures
Confusion matrix
Group  Predicted class 1  Predicted class 2  …  Predicted class i  …  Predicted class n 

True class 1  M_{11}  M_{12}  …  M_{1i}  …  M_{1n} 
True class 2  M_{21}  M_{22}  …  M_{2i}  …  M_{2n} 
:  :  :  …  :  …  : 
True class i  M_{i1}  M_{i2}  …  M_{ii}  …  M_{in} 
:  :  :  …  :  …  : 
True class n  M_{n1}  M_{n2}  …  M_{ni}  …  M_{nn} 
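For reference, the measures can be computed from a confusion matrix of the form shown above, once clusters have been matched to classes. The sketch assumes macro-averaged precision and recall, an F-measure taken as the harmonic mean of the two (consistent with the reported values, e.g. 2 × 0.9600 × 0.9333 / (0.9600 + 0.9333) ≈ 0.9465 for CE with spectral K-means), and the Rand index from pair counts in the contingency table. The averaging scheme actually used is not stated, and `scores_from_confusion` is a hypothetical name.

```python
from math import comb


def scores_from_confusion(m):
    """Macro precision, recall, F-measure and Rand index from a confusion matrix.

    m[i][j] counts proteins of true class i assigned to predicted class j,
    with clusters already matched to classes. Macro-averaging is an assumption.
    """
    n = len(m)
    row = [sum(m[i]) for i in range(n)]                       # true class sizes
    col = [sum(m[i][j] for i in range(n)) for j in range(n)]  # predicted class sizes
    precision = sum(m[i][i] / col[i] if col[i] else 0.0 for i in range(n)) / n
    recall = sum(m[i][i] / row[i] if row[i] else 0.0 for i in range(n)) / n
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Rand index via pair counts: pairs together in both partitions (tp),
    # pairs together by true class, and pairs together by predicted class.
    total = comb(sum(row), 2)
    tp = sum(comb(x, 2) for r in m for x in r)
    same_class = sum(comb(x, 2) for x in row)
    same_cluster = sum(comb(x, 2) for x in col)
    rand_index = (total - same_class - same_cluster + 2 * tp) / total
    return precision, recall, f_measure, rand_index
```

The Rand index formula counts agreeing pairs as total pairs minus the disagreeing ones, where disagreements are pairs grouped together in only one of the two partitions.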
Results and discussion
Performance measures on the 100-protein dataset for CE, jFATCAT, the original ESA and the proposed variants at class level, with computational time (HP, hydrophobicity; POL, polarity as auxiliary information)
Method  Time (h) for N × N comparison  Measure  Spectral K-means  K-means  Fuzzy C-means

CE  126.18  Precision  0.9600  0.8622  0.7141
Recall  0.9333  0.7573  0.9792
F-measure  0.9465  0.8064  0.8259
RI  0.9694  0.9538  0.9226
jFATCAT  19.14  Precision  0.6653  0.4929  0.5058
Recall  0.6043  0.5019  0.6741
F-measure  0.6333  0.4974  0.5780
RI  0.8554  0.8430  0.8154
Original ESA  20.40  Precision  0.8396  0.5075  0.4812
Recall  0.7563  0.7744  0.6347
F-measure  0.7957  0.6132  0.5474
RI  0.9420  0.8248  0.8032
ESA-MCBB  2.20  Precision  0.7767  0.5523  0.5710
Recall  0.9275  0.6277  0.5232
F-measure  0.8454  0.5876  0.5461
RI  0.9359  0.8440  0.8338
ESA-MCBB + HP  2.20  Precision  0.9168  0.5058  0.5699
Recall  0.8400  0.7925  0.5307
F-measure  0.8767  0.6175  0.5496
RI  0.9557  0.8298  0.8369
ESA-MCBB + POL  2.20  Precision  0.8974  0.5416  0.5576
Recall  0.8165  0.6000  0.5088
F-measure  0.8551  0.5693  0.5321
RI  0.9444  0.8159  0.8322
ESA-CA  2.20  Precision  0.8572  0.5075  0.5322
Recall  0.7621  0.7744  0.4800
F-measure  0.8069  0.6132  0.5048
RI  0.9364  0.8961  0.8234
ESA-CA + HP  2.20  Precision  0.8495  0.7588  0.5576
Recall  0.7525  0.6997  0.5088
F-measure  0.7981  0.7281  0.5321
RI  0.9411  0.9020  0.8322
ESA-CA + POL  2.20  Precision  0.8572  0.5058  0.5205
Recall  0.7621  0.7925  0.4672
F-measure  0.8069  0.6175  0.4924
RI  0.9297  0.8388  0.8194
The proposed algorithm was compared with existing algorithms in terms of computational time (Table 2). The table shows that CE, jFATCAT and the original ESA require 126.18, 19.14 and 20.40 h, respectively, to compare the 100-protein dataset, whereas the proposed algorithm takes only 2.20 h. The proposed algorithm is therefore considerably more efficient in terms of computational time.
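The relative saving implied by these timings is simple arithmetic; the snippet below recomputes it from the table values (`reduction_percent` is an illustrative helper).

```python
def reduction_percent(baseline_hours, new_hours):
    """Percentage of running time saved relative to a baseline."""
    return 100.0 * (1.0 - new_hours / baseline_hours)


# Hours for the N x N comparison of the 100-protein dataset (table above).
baselines = {"CE": 126.18, "jFATCAT": 19.14, "Original ESA": 20.40}
proposed_hours = 2.20

for method, hours in baselines.items():
    print(f"{method}: {reduction_percent(hours, proposed_hours):.1f} % less time")
# CE: 98.3 % less time
# jFATCAT: 88.5 % less time
# Original ESA: 89.2 % less time
```

The savings of about 88–89 % against jFATCAT and the original ESA correspond to the 80–90 % reduction quoted in the abstract; against CE the saving is larger.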
It has been reported earlier [23] that the original ESA, which is based on all backbone atoms of the protein structures, took 59 h to compare 100 protein structures on a desktop computer (8 GB RAM; 64-bit Windows 7; MATLAB version 7.9.0); implemented in R, it took 20.40 h. Under the same setup, the computing time of the proposed algorithm implemented in R (version 3.1.3) varied from 2.80 to 3.00 h. As described in the algorithm, three criteria were used to evaluate variation in the results based on different geometric properties: (1) backbone atoms, (2) C_{α} atoms and (3) the centroid of the backbone atoms, along with orientation (dihedral angles). In addition, auxiliary information (hydrophobicity and polarity of each amino acid) was considered. Regarding time complexity, the earlier ESA method used the 3D coordinates of all backbone atoms (N, C_{α} and C) [23]; for a protein with n amino acids (residues), the curve therefore has 3n points, since each residue is represented by three atoms. Criteria (2) and (3), in contrast, are based on n 3D coordinates (one per residue), five molecular properties and three geometric properties, namely the dihedral angles phi, psi and omega of each residue. Hence, the proposed algorithm is faster than the existing ESA [23] because its curves have n points instead of 3n, which substantially reduces computational time without much compromise in performance.
Among the clustering techniques, spectral K-means gives the best precision for all the algorithms compared, followed by fuzzy C-means. The results obtained with K-means clustering are not satisfactory in terms of precision.
CE performs much better than the other existing methods (jFATCAT, original ESA, etc.) in terms of recall, F-measure and RI. However, the recall and RI of the proposed algorithm (ESA-MCBB) with spectral K-means clustering are comparable with those of CE. Note that CE takes 126.18 h, whereas the proposed algorithm takes about 2.20 h for the same task.
In terms of RI, the accuracy for the first and second sets of proteins increased to 88.72 and 95.57 %, respectively, when hydrophobicity was included as auxiliary information. The RI for the second set was 94.11 % when distances were calculated using the 3D coordinates of C_{α} atoms with hydrophobicity as auxiliary information. This may be because the proposed algorithm uses a single coordinate per residue (the centroid of the backbone atoms or the C_{α} atom) together with the dihedral angles (phi, psi and omega). The dihedral angles indirectly capture the information in all backbone coordinates through three additional parameters per data point. The proposed algorithm also includes the molecular properties of each residue, and hence its results are comparable with those of ESA using all backbone atoms.
Computational time (in seconds) required in comparing two protein structures using different methods
Method  ~100 residues  ~200 residues  ~300 residues 

Matt  1.300  3.000  5.100 
MUSTANG  0.160  2.300  2.100 
ESA  1.200  2.600  15.000 
Proposed method (ESA-MCBB)  0.740  1.040  1.540
Proposed method (ESA-CA)  0.556  0.745  1.466
The proposed method performed better in terms of classification accuracy owing to the inclusion of side chain/amino acid properties, which give a more appropriate representation of protein structures for elastic shape analysis. In particular, hydrophobicity plays an important role in protein folding: hydrophobic residues tend to move towards the interior of the protein, whereas hydrophilic residues move towards the protein surface [37]. Incorporating this feature as auxiliary information therefore improved the classification of proteins.
Web server implementation
Conclusions
With the advent of high-throughput methods, structural information on proteins is becoming available at an accelerating pace. Automatic annotation and classification of proteins are required to save time and resources. A fast and efficient algorithm has therefore been developed to find the best alignment between two protein structures.
In this study, a computationally efficient algorithm, in terms of run time, has been developed for comparing protein structures based on the ESA approach. The 3D coordinates of protein backbone atoms are used under different criteria, together with residue-wise auxiliary information based on side-chain properties. The algorithm has been implemented in R.
The proposed algorithm performed as well as existing techniques in terms of accuracy owing to the inclusion of side chain and amino acid properties. Including hydrophobicity as auxiliary information gave better results, since hydrophobicity plays an important role in protein folding. Incorporating molecular properties as auxiliary information improved the comparison of two protein 3D structures. The proposed algorithm is faster than the existing algorithm because it works with n data points instead of the 3n used by existing algorithms.
Abbreviations
RMSD: root mean square deviation
DALI: Distance mAtrix aLIgnment
SRVF: square root velocity function
SVD: singular value decomposition
GUI: graphical user interface
UPGMA: unweighted pair group method with arithmetic mean
RI: Rand index
CE: combinatorial extension
ESA: elastic shape analysis
SCOP: structural classification of proteins
Declarations
Authors’ contributions
Conceptualization of research work by SS, DCM and AR. Coding and implementation by SS, UBA and SBL. Analysis of algorithm and writing of the manuscript by SS, UBA and KKC. Valuable suggestions to improve the manuscript by AR and SNR. SNR was supported partially by Dr. Miller, Director James Graham Brown Cancer Center and Wendell Cherry Chair in Clinical Trial Research. All authors read and approved the final manuscript.
Acknowledgements
Authors wish to acknowledge the Indian Council of Agricultural Research (ICAR) for providing ASHOKA (Advanced Supercomputing Hub for Omics Knowledge in Agriculture) to conduct the study.
Competing interests
The authors declare that they have no competing interests.
Availability
The web-based application is accessible from http://www.backwin.cabgrid.res.in:8080/ProtSComp.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Kolodny R, Koehl P, Levitt M. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J Mol Biol. 2005;346(4):1173–88.
 Thompson JD, Plewniak F, Poch O. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics. 1999;15:87–8.
 Sauder JM, Arthur JW, Dunbrack RL. Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins Struct Funct Genet. 2000;40:6–22.
 Novotny M, Madsen D, Kleywegt GJ. Evaluation of protein-fold-comparison servers. Proteins Struct Funct Genet. 2004;54:260–70.
 Holm L, Ouzounis C, Sander C, Tuparev G, Vriend G. A database of protein structure families with common folding motifs. Protein Sci. 1992;1(12):1691–8.
 Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–40.
 Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, et al. SCOP: a structural classification of proteins database. Nucl Acids Res. 2000;28:257–9.
 Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, et al. CATH—a hierarchic classification of protein domain structures. Structure. 1997;5:1093–108.
 Orengo CA, Pearl FM, Thornton JM. The CATH domain structure database. Methods Biochem Anal. 2003;44:249–71.
 Shindyalov IN, Bourne PE. An alternative view of protein fold space. Proteins Struct Funct Genet. 2000;38:247–60.
 Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North ACT. Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5 Angstrom resolution, obtained by X-ray analysis. Nature. 1960;185:416–22.
 Domingues F, Lackner P, Andreeva A, Sippl M. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. J Mol Biol. 2000;297:1003–13.
 Friedberg I, Kaplan T, Margalit H. Evaluation of PSI-BLAST alignment: accuracy in comparison to structural alignments. Protein Sci. 2000;9:2278–84.
 Mayr G, Domingues FS, Lackner P. Comparative analysis of protein structure alignments. BMC Struct Biol. 2007;7:50.
 Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol. 2009;19:341–8.
 Wohlers I, Malod-Dognin N, Andonov R, Klau GW. CSA: comprehensive comparison of pairwise protein structure alignments. Nucl Acids Res. 2012;40:W303–9.
 Levitt M, Gerstein M. A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci USA. 1998;95:5913–20.
 Eidhammer I, Jonassen I, Taylor WR. Protein bioinformatics: an algorithmic approach to sequence and structure analysis. New York: Wiley; 2004. ISBN 0470848391.
 Salem S, Zaki MJ, Bystroff C. FlexSnap: flexible non-sequential protein structure alignment. Algorithms Mol Biol. 2010;5:12. doi:10.1186/1748-7188-5-12.
 Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993;233(1):123–38.
 Li SC. The difficulty of protein structure alignment under the RMSD. Algorithms Mol Biol. 2013;8:1.
 Liu W, Srivastava A, Zhang J. A mathematical framework for protein structure comparison. PLoS Comput Biol. 2011;7(2):e1001075. doi:10.1371/journal.pcbi.1001075.
 Liu W, Srivastava A, Zheng J. Protein structure alignment using elastic shape analysis. In: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology (BCB ’10). New York: ACM; 2010. p. 62–70. doi:10.1145/1854776.1854790.
 Schenk GF, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol. 2008;3:4. doi:10.1186/1748-7188-3-4.
 Gajula MNVP. Computer simulation meets experiment: molecular dynamics simulations of spin labeled proteins. PhD thesis. Osnabrueck: University of Osnabrueck; 2008.
 Klassen E, Srivastava A, Mio W, Joshi SH. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Trans Pattern Anal Mach Intell. 2004;26:372–83.
 Joshi SH, Srivastava A, Klassen E, Jermyn I. A novel representation for computing geodesics between n-dimensional elastic curves. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); June 2007; Minneapolis.
 Mio W, Srivastava A, Joshi S. On shape of plane elastic curves. Int J Comput Vision. 2007;73:307–24.
 Srivastava A, Klassen E, Joshi SH, Jermyn IH. Shape analysis of elastic curves in Euclidean spaces. IEEE Trans Pattern Anal Mach Intell. 2011;33:1415–28.
 Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998;11:739–47.
 Ye Y, Godzik A. FATCAT: a web server for flexible structure comparison and structure similarity searching. Nucl Acids Res. 2004;32(Web Server issue):W582–5. doi:10.1093/nar/gkh430.
 Menke M, Berger B, Cowen L. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol. 2008;4:e10.
 Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins. 2006;64:559–74.
 Kendall DG. Shape manifolds, procrustean metrics and complex projective spaces. Bull London Math Soc. 1984;16(2):81–121.
 Laborde J, Robinson D, Srivastava A, Klassen E, Zhang J. RNA global alignment in the joint sequence-structure space using elastic shape analysis. Nucl Acids Res. 2013;41(11):e114. doi:10.1093/nar/gkt187.
 Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. The Protein Data Bank. Nucl Acids Res. 2000;28:235–42.
 Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1983;157(1):105–32. PMID: 7108955.
 Zimmerman JM, Eliezer N, Simha R. The characterization of amino acid sequences in proteins by statistical methods. J Theor Biol. 1968;21:170–201. PMID: 5700434.
 Grantham R. Amino acid difference formula to help explain protein evolution. Science. 1974;185:862–4. PMID: 4843792.
 Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins. 1992;12(4):345–64.
 R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2014. http://www.R-project.org/.
 Grant BJ, Rodrigues APC, ElSawy KM, McCammon JA, Caves LSD. Bio3D: an R package for the comparative analysis of protein structures. Bioinformatics. 2006;22:2695–6.
 Idé J. Rpdb: read, write, visualize and manipulate PDB files. R package version 2.2. 2014. http://www.CRAN.R-project.org/package=Rpdb.
 Adler D, Murdoch D, et al. rgl: 3D visualization device system (OpenGL). R package version 0.95.1201. 2014. http://www.CRAN.R-project.org/package=rgl.