Efficient unfolding pattern recognition in single molecule force spectroscopy data
© Andreopoulos and Labudde; licensee BioMed Central Ltd. 2011
Received: 9 July 2010
Accepted: 6 June 2011
Published: 6 June 2011
Single-molecule force spectroscopy (SMFS) is a technique that measures the force necessary to unfold a protein. SMFS experiments generate Force-Distance (F-D) curves. A statistical analysis of a set of F-D curves reveals different unfolding pathways. Information on protein structure, conformation, functional states, and inter- and intra-molecular interactions can be derived.
In the present work, we propose a pattern recognition algorithm and apply our algorithm to datasets from SMFS experiments on the membrane protein bacterioRhodopsin (bR). We discuss the unfolding pathways found in bR, which are characterised by main peaks and side peaks. A main peak is the result of the pairwise unfolding of the transmembrane helices. In contrast, a side peak is an unfolding event in the alpha-helix or other secondary structural element. The algorithm is capable of detecting side peaks along with main peaks.
We can therefore describe each individual unfolding pathway as a sequence of events, labeled with the occurrences and co-occurrences specific to bR's unfolding. We find that side peaks do not co-occur with one another in curves as frequently as main peaks do, which may imply a synergistic effect occurring between helices. While main peaks co-occur as pairs in at least 50% of curves, the side peaks co-occur with one another in fewer than 10% of curves. Moreover, the algorithm runtime scales well as the dataset size increases.
Our algorithm satisfies the requirements of an automated methodology that combines high accuracy with efficiency in analyzing SMFS datasets. The algorithm tackles the force spectroscopy analysis bottleneck, leading to more consistent and reproducible results.
Keywords: protein unfolding; single-molecule force spectroscopy; pattern recognition; Force-Distance curve
Mutations cause structural instabilities in a protein, leading it to misfold. The misfolded protein conformation may interrupt ion transport and signal transduction. Protein instability and misfolding cause disease states, including cystic fibrosis, Charcot-Marie-Tooth disease, arrhythmias, hearing loss and retinitis pigmentosa.
To distinguish F-D curves showing different protein unfolding pathways, and to draw statistical conclusions on the locations of the unfolding events (amino acids), their occurrences, and their co-occurrences with other events, one must be able to analyse a large number of F-D curves by objective procedures. Manual analysis is known to be slow and subject to human error. There is a need for data analysis and pattern recognition algorithms that offer fully automated processing of large SMFS datasets on the basis of objective criteria. The scientific analysis of F-D curves should reveal the molecular interactions and different unfolding pathways. So far, various software packages have been developed to analyze SMFS data [10–12]. In this paper, we propose an algorithm for the automated classification and analysis of F-D curves. We apply and evaluate our method on a dataset of unfolding experiments performed on the bacterioRhodopsin (bR) membrane protein.
2 Biological datasets
2.1 Structure of the bacterioRhodopsin trimer/lipid complex
Figure 1 shows that the maximum rupture length of the unfolded bR molecule would be 92 aa (~29 nm) if the tip binds to the CD loop, and 158 aa (~50 nm) if the tip binds to the AB loop; the last potential barrier would be built by the G-helix. By selecting the F-D curves exhibiting an overall length between 180 and 220 aa (~60-70 nm), we are sure to analyze only curves from bR molecules that were attached by their C-terminus to the SMFS tip [16, 18].
2.2 Analysis of bR unfolding pathway
To evaluate the quality and performance of our method, we used a dataset on the bR protein comprising 26 F-D curves. Our goal is the detection of possible unfolding pathways in bR [19–21]. Figure 1 shows a typical F-D curve. The force (pN) is either output by the AFM or computed by multiplying the cantilever deflection (nm) by the spring constant (pN/nm). The distance is the tip-sample separation (nm) between the cantilever tip and the sample surface (the length of the extended protein); it is either output by the AFM or computed by subtracting the deflection from the Z-sensor reading (nm).
The main unfolding pathway of bR is characterised by the presence of three main peaks, which suggest a pairwise unfolding of the transmembrane helices. Manual analysis of bR unfolding pathways found that, besides the three main peaks that occur in most F-D curves, other peaks, referred to as side peaks, occur with smaller probabilities, indicating that bR can exhibit different unfolding intermediates. The goal of our algorithm is to match the peaks between different curves if they correspond to the same unfolding events; unfolding pathways can then be distinguished on the basis of unfolding events.
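The two conversions described above can be sketched as follows. This is a minimal illustration rather than the processing code used for the bR dataset; the function name `to_force_distance` and the numerical readings are hypothetical.

```python
def to_force_distance(deflection_nm, z_sensor_nm, spring_constant_pn_per_nm):
    """Convert raw AFM channels into (distance, force) pairs.

    force [pN]    = deflection [nm] * spring constant [pN/nm]
    distance [nm] = Z-sensor [nm] - deflection [nm]  (tip-sample separation)
    """
    if len(deflection_nm) != len(z_sensor_nm):
        raise ValueError("deflection and Z-sensor traces must have equal length")
    curve = []
    for d, z in zip(deflection_nm, z_sensor_nm):
        force_pn = d * spring_constant_pn_per_nm
        distance_nm = z - d
        curve.append((distance_nm, force_pn))
    return curve


# Hypothetical readings; the spring constant of 50 pN/nm is illustrative only.
example = to_force_distance([0.0, 1.0, 2.0], [10.0, 20.0, 30.0], 50.0)
```

Each returned pair is one point of the F-D curve, with the distance on the x-axis and the force on the y-axis.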
3 Methodology for Force-Distance pattern recognition
3.1 Step 1: denoise the F-D curves
3.2 Step 2: find the derivatives of the F-D curves
To get the derivatives, we treat each F-D curve as an arc length parameterised curve c(x) = [dist(x), force(x)], such that |c'(x)| = 1, which implies dist'(x)² + force'(x)² = 1, which in turn implies |dist'(x)| ≤ 1 and |force'(x)| ≤ 1. In other words, arc length parameterised curves do not change abruptly, and this parameterisation makes it feasible for us to discretise the space of derivatives, since all derivative values lie in the range [-1, 1].
Without such a bound on the space of derivatives this approach would run into problems, since it would be difficult to appropriately discretise a curve.
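The bound on the derivatives can be checked numerically on a sampled curve. The sketch below is an illustration under the assumption that the curve is given as a list of (distance, force) samples; it approximates the derivative with respect to arc length by dividing each chord by its own length. The name `unit_tangents` and the sample points are hypothetical.

```python
import math


def unit_tangents(points):
    """Approximate c'(s) under arc-length parameterisation.

    For each pair of consecutive samples, the derivative with respect to
    arc length is the chord vector divided by its length, so the vector
    has norm 1 and each component lies in [-1, 1].
    """
    tangents = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg = math.hypot(dx, dy)
        if seg == 0.0:
            continue  # repeated samples carry no direction
        tangents.append((dx / seg, dy / seg))
    return tangents


# Illustrative samples, including a very steep segment (a jump of 100 in force).
curve = [(0.0, 0.0), (3.0, 4.0), (3.0, 104.0), (10.0, 104.0)]
tangents = unit_tangents(curve)
```

Even the steep middle segment yields a derivative of (0, 1), so no value leaves the interval [-1, 1] regardless of how abruptly the raw force changes.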
We discretise the space of derivatives for the x-axis (distance) and y-axis (force) into 1,000 bins. We then represent the curve as a sequence of tuples (dx_i, dy_i), each of which denotes the current derivative cell in which the curve is located. A new tuple (dx_i, dy_i) is added to the sequence of tuples whenever the curve's derivative changes significantly enough to warrant a new derivative cell (Figure 5). Therefore, a linear curve would be encoded by a single derivative cell, since its slope is constant.
With each derivative cell we also associate the arc length (distance) in the denoised curve that the cell covers. The arc length of a curve can be thought of as the "length" of a piece of string laid upon the curve. Let t be the absolute length of an F-D curve segment, that is, the length of a string laid along that segment. We use the arc length to ignore any cells that cover small F-D curve segments, as determined by a minimum threshold t_small. The arc length of a curve c(x) from point t0 to t is defined to be ∫_{t0}^{t} |c'(x)| dx, where |c'(x)| is the norm of the vector c'(x).
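A possible encoding of a sampled curve into derivative cells with their arc lengths is sketched below. It is a simplified illustration, not the implementation evaluated in this paper: the bin mapping, the merging of consecutive segments, and the names `bin_index` and `encode_cells` are assumptions, and the discrete arc length is taken as the sum of the chord lengths.

```python
import math

N_BINS = 1000  # bins per axis, as stated in the text


def bin_index(value, n_bins=N_BINS):
    """Map a derivative value in [-1, 1] to a bin index in [0, n_bins - 1]."""
    idx = int((value + 1.0) / 2.0 * n_bins)
    return min(max(idx, 0), n_bins - 1)


def encode_cells(points, t_small, n_bins=N_BINS):
    """Return [((dx_bin, dy_bin), arc_length), ...] for a sampled curve.

    Consecutive segments that fall into the same derivative cell are merged
    and their lengths summed; cells covering less than t_small are dropped.
    """
    cells = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue
        cell = (bin_index((x1 - x0) / seg, n_bins),
                bin_index((y1 - y0) / seg, n_bins))
        if cells and cells[-1][0] == cell:
            cells[-1] = (cell, cells[-1][1] + seg)  # same cell: extend it
        else:
            cells.append((cell, seg))               # derivative changed
    return [(cell, length) for cell, length in cells if length >= t_small]


# A straight line is encoded by a single cell.
line_cells = encode_cells([(0, 0), (1, 1), (2, 2), (3, 3)], t_small=0.0)

# A tiny vertical step (length 0.1) is ignored when t_small = 1.
kinked_cells = encode_cells([(0.0, 0.0), (10.0, 0.0), (10.0, 0.1), (20.0, 0.1)],
                            t_small=1.0)
```

The first example reproduces the statement that a linear curve maps to one derivative cell; the second shows how the threshold t_small removes cells that cover only a short stretch of the curve.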
3.2.1 Translational Invariance
3.3 Step 3: unfolding events
3.4 Step 4: matching unfolding events between curves
Step 4 supports finding patterns of unfolding events in the F-D curves, rather than simple peaks. To describe the unfolding patterns of the F-D curves we match the unfolding events between curves. For this purpose we use a progressive alignment, the aim of which is to align the F-D curves by a pairwise matching of detected unfolding events. Unfolding events are matched between F-D curves if they likely correspond to the same helices unfolding.
3.4.1 Main peaks and side peaks
The alignment allows matching unfolding events between curves. After the alignment, we represent an F-D curve as a sequence of (0, 1) signs, one sign per possible event: 1 if the event occurs in the curve and 0 if it does not. All F-D curves have the same maximum number of possible events. Aligning the curves on the basis of the detected events allows us to find the unfolding pathways for bR.
By examining the frequency of an event over all curves we categorise it as a main peak or side peak. A peak with the highest frequency is a main peak, while peaks of lower frequency are side peaks. It is possible for both a side and main peak to be found in an unfolding event of a curve, in which case the side peak is the cliff before or after the main peak ("CAB" or "ACB" in Figure 7).
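The (0, 1) representation can be illustrated with a short sketch. Note that the matching rule used here, a fixed tolerance around reference positions, is a simplifying assumption that stands in for the progressive alignment and is not the matching procedure of this paper. The reference positions are the main-peak contour lengths reported for bR; the detected events, the tolerance and the name `encode_events` are hypothetical.

```python
def encode_events(curve_events_aa, reference_events_aa, tolerance_aa):
    """Return a tuple of 0/1 signs, one per reference event.

    A sign is 1 if the curve contains a detected event within
    tolerance_aa of the reference position, and 0 otherwise.
    """
    signs = []
    for ref in reference_events_aa:
        hit = any(abs(pos - ref) <= tolerance_aa for pos in curve_events_aa)
        signs.append(1 if hit else 0)
    return tuple(signs)


# Main-peak contour lengths for bR, in amino acids.
MAIN_PEAKS_AA = (80, 143, 215)

# Hypothetical detected events for two curves (illustrative values only).
curve_a = [81.5, 214.0]
curve_b = [79.0, 144.2, 216.1]

signs_a = encode_events(curve_a, MAIN_PEAKS_AA, tolerance_aa=5)
signs_b = encode_events(curve_b, MAIN_PEAKS_AA, tolerance_aa=5)
```

Here `signs_a` is (1, 0, 1) and `signs_b` is (1, 1, 1), so both curves have the same length and can be compared position by position.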
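The frequency-based categorisation can be sketched as follows, assuming the curves are already available as aligned (0, 1) sequences. The cut-off `main_threshold` is a hypothetical parameter introduced for illustration; the text only states that main peaks are the most frequent events. The example curves are invented.

```python
def event_frequencies(sign_vectors):
    """Fraction of curves in which each possible event occurs."""
    n_curves = len(sign_vectors)
    n_events = len(sign_vectors[0])
    return [sum(v[i] for v in sign_vectors) / n_curves for i in range(n_events)]


def classify_peaks(sign_vectors, main_threshold=0.5):
    """Label each event 'main' or 'side' by its frequency over all curves."""
    return ["main" if f >= main_threshold else "side"
            for f in event_frequencies(sign_vectors)]


# Hypothetical aligned curves with four possible events each.
curves = [(1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 1, 0), (1, 0, 0, 0)]
frequencies = event_frequencies(curves)
labels = classify_peaks(curves)
```

In this toy dataset the first and third events occur in most curves and are labelled as main peaks, while the second and fourth events occur in one curve out of four and are labelled as side peaks.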
4 Results and Discussion
Table 1: Unfolding of transmembrane helices in bR results in different unfolding pathways.
(1 0 0) | (1 0 0) | 100 100 10/11
(1 1 0) | (1 1 0) | 100 110 10/11
(1 0 1) | (1 0 1) | 100 101 10/11
(1 1 1) | (1 1 1) | 100 111 10/11
Table 2: The co-occurrences of all main peaks in the curves.
Main peaks | Co-occurrence frequency (out of 26 curves) | Contour length [aa]
1 & 2 | | 80 & 143
1 & 3 | | 80 & 215
2 & 3 | | 143 & 215
1 & 2 & 3 | | 80 & 143 & 215
Table 3: The side peaks do not co-occur frequently in the same curves.
Side peaks | Co-occurrence frequency (out of 26 curves) | Contour length [aa]
1 & 1 | | 39 & 97
1 & 2 | | 39 & 167
1 & 2 | | 97 & 167
1 & 3 | | 97 & 201
2 & 3 | | 167 & 201
1 & 2 & 3 | | 39 & 97 & 167 & 201
4.1 Matching unfolding events in F-D curves
Our analysis provides several advantages over simply detecting minima in the derivatives of the smoothed force curves. After matching unfolding events in all included F-D curves, it is possible to fit the Worm-like Chain (WLC) model, as Figure 9a shows. The tables show the contour lengths. Besides computing the contour lengths of the WLCs, we can also distinguish the different unfolding pathways directly in the process. The unfolding pathways we find give hints on the stability inside proteins. Moreover, we can compare the wildtype protein's unfolding pathways with those of mutants of the protein under study, or we can study the effect of a ligand.
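For reference, the WLC model is commonly written with the interpolation formula F(x) = (k_B T / p) [1 / (4 (1 - x/L)²) - 1/4 + x/L], where x is the extension, L the contour length and p the persistence length. The sketch below evaluates this formula and recovers the contour length from a single (extension, force) point by bisection. It is an illustration only: the persistence length of 0.4 nm, the thermal energy of 4.11 pN nm (room temperature) and both function names are assumptions, and a full fit over all points of a peak would normally be used in practice.

```python
KT_PN_NM = 4.11  # thermal energy k_B*T near room temperature, in pN*nm
P_NM = 0.4       # persistence length in nm (assumed value, not from the text)


def wlc_force(extension_nm, contour_nm, p_nm=P_NM, kt=KT_PN_NM):
    """WLC interpolation formula: force (pN) at a given extension (nm)."""
    r = extension_nm / contour_nm
    if not 0.0 <= r < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kt / p_nm) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)


def contour_length(extension_nm, force_pn, p_nm=P_NM, kt=KT_PN_NM):
    """Contour length (nm) whose WLC curve passes through one (x, F) point.

    At fixed extension the WLC force decreases as the contour length grows,
    so a bisection on the contour length converges to the solution.
    """
    lo = extension_nm * (1.0 + 1e-9)  # chain almost fully stretched: huge force
    hi = extension_nm * 1e3           # chain barely stretched: force near zero
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if wlc_force(extension_nm, mid, p_nm, kt) > force_pn:
            lo = mid  # predicted force too high: the chain must be longer
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip on hypothetical values: a 30 nm chain stretched to 25 nm.
force_at_25 = wlc_force(25.0, 30.0)
recovered = contour_length(25.0, force_at_25)
```

The round trip returns the contour length of 30 nm that was used to generate the force, which is a quick consistency check for the two functions.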
4.2 Side peaks: co-occurrences analysis
The main peaks appear in most of the included F-D curves and have a relatively high co-occurrence with one another in the curves. However, the different unfolding pathways are defined by the side peaks that occur in a minority of curves. Different co-occurrences are observed for various main and side peak pairs, which define the unfolding pathways. The helices in transmembrane proteins often stabilise one another. Intermediate side peaks between main peaks reflect stepwise unfolding of helix pairs and helices alone, such as helices E and D, or B and C [25–27].
Table 2 shows that the main peaks frequently co-occur with one another in F-D curves.
Table 3 shows that the side peaks co-occur less frequently with one another.
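Co-occurrence counts of this kind can be derived directly from the (0, 1) representation of the aligned curves. The following sketch counts, for every pair of events, the number of curves in which both occur; the example curves are invented and the function name `co_occurrence_counts` is hypothetical.

```python
from itertools import combinations


def co_occurrence_counts(sign_vectors):
    """Count, for every pair of events, the curves in which both occur."""
    n_events = len(sign_vectors[0])
    counts = {}
    for i, j in combinations(range(n_events), 2):
        counts[(i, j)] = sum(1 for v in sign_vectors if v[i] and v[j])
    return counts


# Hypothetical aligned curves with three possible events each.
curves = [(1, 1, 1), (1, 0, 1), (1, 1, 0), (1, 0, 0)]
pairs = co_occurrence_counts(curves)
```

Dividing each count by the number of curves gives the co-occurrence frequency of the corresponding pair of peaks.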
Table 4: The co-occurrences of side peaks and main peaks in curves.
Side & Main peak | Co-occurrence frequency (out of 26 curves) | Contour length [aa]
1 & 1 | | 39 & 80
1 & 2 | | 39 & 143
1 & 1 | | 97 & 80
1 & 2 | | 97 & 143
2 & 2 | | 167 & 143
2 & 3 | | 167 & 215
3 & 2 | | 201 & 143
3 & 3 | | 201 & 215
We have also analyzed four bR mutants, as well as the ompG protein, with this algorithm. Even though the mutant proteins are known to have different unfolding patterns, we could detect the known unfolding events. Our results for mutant proteins corresponded to the results of Sapra et al. [20, 22].
4.3 Comparison to previous methods and runtime
Our method has similar precision and recall to the method published previously by Marsico et al. [19]. However, our algorithm has the advantage of faster detection of protein unfolding patterns. For the 26 bR curves the method by Marsico et al. took several hours. Our method's total runtime for denoising, getting the derivatives, discretising, detecting the unfolding events and aligning the 26 curves was less than one second.
Moreover, we attempted to evaluate Punias [10] and Hooke [12] on the manually annotated bR dataset. These algorithms focus on fitting the Worm-like Chain model to F-D curves in an automated fashion and, unlike our algorithm, do not focus on finding the unfolding pathways; therefore a complete comparison cannot be made. On fitting the WLC to the manually annotated bR dataset, Punias achieved 79% precision, 53% recall and 64% F-measure. Hooke achieved 73% precision, 45% recall and 56% F-measure. These results indicate that our method is at least as effective as Punias and Hooke.
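The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R). As a small worked example, the sketch below recomputes it from the rounded precision and recall values quoted above; the function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


punias_f = f_measure(0.79, 0.53)
hooke_f = f_measure(0.73, 0.45)
```

Recomputed from the rounded inputs, the values are about 0.63 for Punias and 0.56 for Hooke, in line with the reported figures; the small difference for Punias presumably stems from rounding of the precision and recall before publication.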
Single-molecule force spectroscopy is a promising method for measuring the unfolding forces of single molecules and cells. SMFS can analyze membrane proteins in their natural membrane environment. Our main contribution is a novel method for analyzing and classifying SMFS data. Our pattern recognition algorithm is successful in finding unfolding pathways of bR. Our method for finding and aligning unfolding events is much faster than manual selection and annotation. With our automated approach, the detection of unfolding events does not depend on the subjective judgement of a manual annotator, but is based on objective criteria. Overall, our algorithm gives a high success rate in the observation of bR unfolding pathways. The method also has the advantages of discovering side and main peaks along with unfolding patterns, fitting the WLC model to the peaks, and computing the amino acid distances between contour lengths. As future work, we plan to link the unfolding events to structural features, such as residue-residue contacts and membrane topology.
We thank Daniel Mueller and his group for providing the experimental data and fruitful discussions. We thank Alexander Andreopoulos for providing help with the derivatives and discretisation. We acknowledge funding by EU projects Sealife and REWERSE, dresden-exists, BMBF, and Canada's NSERC.
1. Engel A, Gaub HE: Structure and mechanics of membrane proteins. Annual review of biochemistry. 2008, 127-48.
2. Tsaousis GN, Tsirigos KD, Andrianou XD, Liakopoulos TD, Bagos PG, Hamodrakas SJ: ExTopoDB: A database of experimentally derived topological models of transmembrane proteins. Bioinformatics (Oxford, England). 2010.
3. Bosshart PD, Casagrande F, Frederix PLTM, Ratera M, Bippes CA, Mueller DJ, Palacin M, Engel A, Fotiadis D: High-throughput single-molecule force spectroscopy for membrane proteins. Nanotechnology. 2008, 19 (38): 384014. http://stacks.iop.org/0957-4484/19/i=38/a=384014 doi:10.1088/0957-4484/19/38/384014
4. Puech PH, Poole K, Knebel D, Müller DJ: A new technical approach to quantify cell-cell adhesion forces by AFM. Ultramicroscopy. 2006, 106 (8-9): 637-644. doi:10.1016/j.ultramic.2005.08.003
5. Müller DJ, Kessler M, Oesterhelt F, Müller C, Oesterhelt D, Gaub H: Stability of bacteriorhodopsin alpha-helices and loops analyzed by single-molecule force spectroscopy. Biophys J. 2002, 83 (6): 3578-3588. doi:10.1016/S0006-3495(02)75358-7
6. Müller DJ, Heymann JB, Oesterhelt F, Müller C, Gaub H, Büldt G, Engel A: Atomic force microscopy of native purple membrane. Biochim Biophys Acta. 2000, 1460: 27-38. doi:10.1016/S0005-2728(00)00127-4
7. Dietz H, Rief M: Detecting Molecular Fingerprints in Single Molecule Force Spectroscopy Using Pattern Recognition. Japanese Journal of Applied Physics. 2007, 46: 5540-2. doi:10.1143/JJAP.46.5540
8. Kuhn M, Janovjak H, Hubain M, Müller DJ: Automated alignment and pattern recognition of single-molecule force spectroscopy data. J Microsc. 2005, 218 (Pt 2): 125-132.
9. Puchner EM, Franzen G, Gautel M, Gaub HE: Comparing proteins by their unfolding pattern. Biophysical journal. 2008, 95: 426-34. doi:10.1529/biophysj.108.129999
10. Carl P, Dalhaimer P: Protein unfolding and nano-indentation software. 2004, http://site.voila.fr/punias
11. Struckmeier J, Wahl R, Leuschner M, Nunes J, Janovjak H, Geisler U, Hofmann G, Jaehnke T, Mueller DJ: Fully automated single-molecule force spectroscopy for screening applications. Nanotechnology. 2008, 19 (38): 384020. http://stacks.iop.org/0957-4484/19/i=38/a=384020 doi:10.1088/0957-4484/19/38/384020
12. Sandal M, Benedetti F, Brucale M, Gomez-Casado A, Samori B: Hooke: an open software platform for force spectroscopy. Bioinformatics (Oxford, England). 2009, 25 (11): 1428-30. doi:10.1093/bioinformatics/btp180
13. Müller DJ, Sass HJ, Müller SA, Büldt G, Engel A: Surface structures of native bacteriorhodopsin depend on the molecular packing arrangement in the membrane. J Mol Biol. 1999, 285 (5): 1903-1909. doi:10.1006/jmbi.1998.2441
14. Janovjak H, Kessler M, Oesterhelt D, Gaub H, Müller DJ: Unfolding pathways of native bacteriorhodopsin depend on temperature. EMBO J. 2003, 22 (19): 5220-5229. doi:10.1093/emboj/cdg509
15. Janovjak H, Struckmeier J, Hubain M, Kedrov A, Kessler M, Müller DJ: Probing the energy landscape of the membrane protein bacteriorhodopsin. Structure. 2004, 12 (5): 871-879. doi:10.1016/j.str.2004.03.016
16. Kessler M, Gaub HE: Unfolding barriers in bacteriorhodopsin probed from the cytoplasmic and the extracellular side by AFM. Structure. 2006, 14 (3): 521-527. doi:10.1016/j.str.2005.11.023
17. Cisneros DA, Oberbarnscheidt L, Pannier A, Klare JP, Helenius J, Engelhard M, Oesterhelt F, Muller DJ: Transducer binding establishes localized interactions to tune sensory rhodopsin II. Structure (London, England: 1993). 2008, 16 (8): 1206-13.
18. Kessler M, Gottschalk KE, Janovjak H, Müller DJ, Gaub HE: Bacteriorhodopsin folds into the membrane against an external force. J Mol Biol. 2006, 357 (2): 644-654. doi:10.1016/j.jmb.2005.12.065
19. Marsico A, Labudde D, Sapra T, Müller DJ, Schröder M: A novel pattern recognition algorithm to classify membrane protein unfolding pathways with high-throughput single-molecule force spectroscopy. Bioinformatics. 2007, 23 (2): e231-e236. doi:10.1093/bioinformatics/btl293
20. Sapra T, Besir H, Oesterhelt D, Müller DJ: Characterizing molecular interactions in different bacteriorhodopsin assemblies by single-molecule force spectroscopy. J Mol Biol. 2006, 355 (4): 640-650. doi:10.1016/j.jmb.2005.10.080
21. Oesterhelt F, Oesterhelt D, Pfeiffer M, Engel A, Gaub HE, Müller DJ: Unfolding pathways of individual bacteriorhodopsins. Science. 2000, 288 (5463): 143-146. doi:10.1126/science.288.5463.143
22. Sapra T, Balasubramanian P, Labudde D, Bowie J, Müller D: Point mutations in membrane proteins change energy landscape and populate different unfolding pathways. Journal of Molecular Biology. 2008.
23. Cleveland W: Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association. 1979, 74: 829-836. doi:10.2307/2286407
24. Loeytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102 (30): 10557-62. doi:10.1073/pnas.0409137102
25. Wright CF, Lindorff-Larsen K, Randles LG, Clarke J: Parallel protein-unfolding pathways revealed and mapped. Nature structural biology. 2003, 10 (8): 658-62. doi:10.1038/nsb947
26. Cieplak M, Filipek S, Janovjak H, Krzysko KA: Pulling single bacteriorhodopsin out of a membrane: Comparison of simulation and experiment. Biochimica et biophysica acta. 2006, 1758 (4): 537-44. doi:10.1016/j.bbamem.2006.03.028
27. Janovjak H, Sapra KT, Kedrov A, Mueller DJ: From valleys to ridges: exploring the dynamic energy landscape of single membrane proteins. Chemphyschem: a European journal of chemical physics and physical chemistry. 2008, 9 (7): 954-66.
28. Damaghi M, Sapra KT, Köster S, Yildiz O, Kühlbrandt W, Müller DJ: Dual energy landscape: The functional state of the beta-barrel outer membrane protein G molds its unfolding energy landscape. Proteomics. 2010, 10 (23): 4151-62. doi:10.1002/pmic.201000241
29. Essen L, Siegert R, Lehmann WD, Oesterhelt D: Lipid patches in membrane protein oligomers: crystal structure of the bacteriorhodopsin-lipid complex. Proceedings of the National Academy of Sciences of the United States of America. 1998, 95 (20): 11673-8. doi:10.1073/pnas.95.20.11673
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.