Bistructures
We present an RNA secondary structure as an arc diagram, a graph whose vertices are drawn on a horizontal line and whose Watson-Crick and Wobble base pairs are drawn as arcs in the upper half-plane [42,43,44], see Fig. 4B. The vertices are labeled by \(V=\{1,2,\ldots , n\}\) from left to right and represent the nucleotides. The linear order of the vertices indicates the direction of the RNA strand from the \(5'\)-end to the \(3'\)-end. Here we consider only the canonical Watson-Crick and Wobble base pairs in an RNA secondary structure. As a result, any pair of nucleotides forms at most one such base pair, and each vertex is incident to at most one arc.
An arc, (i, j), represents the base pair between the ith and jth nucleotides. Two arcs (i, j) and (r, s) are called crossing if and only if \(i<r<j<s\) holds. An RNA structure is called a secondary structure, if it does not contain any crossing arcs. Furthermore, the arcs of a secondary structure can be endowed with the partial order: \((r,s) \prec (i,j)\) if and only if \(i<r<s<j\). We shall introduce two “formal” vertices associated with positions 0 and \((n+1)\), respectively and add the formal arc \((0, n + 1)\), referred to as the rainbow. An interval, [i, j], is the set of vertices \(\{i,i+1,\ldots ,j-1,j\}\).
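The definitions above translate directly into a few predicates. The following is a minimal sketch, assuming arcs are given as tuples (i, j) with i < j; the function names are our own and purely illustrative.

```python
from itertools import combinations

def crossing(a, b):
    """True if the arcs a and b cross, i.e. i < r < j < s after ordering by left endpoint."""
    (i, j), (r, s) = sorted([a, b])
    return i < r < j < s

def nested_in(inner, outer):
    """True if inner = (r, s) precedes outer = (i, j) in the partial order, i.e. i < r < s < j."""
    (r, s), (i, j) = inner, outer
    return i < r < s < j

def is_secondary_structure(arcs):
    """Each vertex is incident to at most one arc and no two arcs cross."""
    endpoints = [v for arc in arcs for v in arc]
    if len(endpoints) != len(set(endpoints)):
        return False
    return not any(crossing(a, b) for a, b in combinations(arcs, 2))
```

For instance, the arcs (1, 10), (2, 5), (6, 9) form a secondary structure, whereas (1, 5) and (3, 8) cross.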
In a loop-based energy model [38, 45], arcs and unpaired vertices are organized in loops contributing to the energy. A loop, L, is a subset of vertices, represented as a disjoint union of S-intervals, \(L={\dot{\bigcup }}_{i=1}^k [a_i,b_i]\), such that \((a_1,b_k)\) and \((b_i,a_{i+1})\), for \(1\le i\le k-1\), are arcs (including the rainbow arc \((0, n+1)\)) and where any other interval-vertices are unpaired, see Fig. 4D. It can be represented by a maximal arc \((a_1,b_k)\) with respect to the partial order \(\prec \). Given a loop, this maximal arc is unique, whence a loop can be represented by \(L_{(a_1, b_k)}\). In particular, the rainbow arc, \((0,n+1)\), represents an exterior loop, that is not nested in any arc in the arc diagram, see Fig. 4C. Furthermore, each non-rainbow arc appears in exactly two loops, being maximal for exactly one of them. Loops correspond to the boundary components of the secondary structure viewed as a fatgraph [46]. In the following, we shall identify loops with their sets of vertices.
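To make the loop definition concrete, the following sketch lists, for each arc of a noncrossing structure (including the rainbow), the vertex set of the loop for which that arc is maximal. It assumes arcs are tuples (i, j) with i < j over the vertices 1, ..., n; the helper name `loops_of` is hypothetical.

```python
def loops_of(arcs, n):
    """Map each arc (including the rainbow (0, n+1)) to the vertex set of its loop."""
    partner = {i: j for i, j in arcs}  # left endpoint -> right endpoint
    result = {}
    for a, b in [(0, n + 1)] + sorted(arcs):
        members = {a, b}
        v = a + 1
        while v < b:
            members.add(v)
            if v in partner:
                # v opens a child arc: keep both endpoints, skip its interior
                v = partner[v]
                members.add(v)
            v += 1
        result[(a, b)] = members
    return result
```

In this sketch every non-rainbow arc has both endpoints in exactly two of the returned vertex sets, namely the loop it is maximal for and the loop of its parent arc.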
Given two secondary structures, R and S, having the same vertex set \(V=\{1,\dots ,n\}\), we draw the vertices on a horizontal line, the arcs of R in the upper and the arcs of S in the lower half-plane. We refer to this arc diagram as a bistructure, B(R, S). Pairs of secondary structures of a given length have been studied in [47, 48]. Here we shall distinguish the R-arcs from the S-arcs even though they might have the exact same endpoints. For example, an arc (i, j) in R is denoted by \((i,j)_R\) and an arc (i, j) in S is denoted by \((i,j)_S\). In a loop-based model, the R-loops and the S-loops are distinct, since the R-arcs and S-arcs representing them are distinct. Hence, a bistructure B(R, S) can be considered as the set of loops \(B = \{L_{p_i} \mid p_i \in B(R,S), 1\le i \le m\}\), where \(p_i\), \(1\le i \le m\), is an arc in B(R, S), see Fig. 4E.
A substructure of B, denoted by \(B'\), is a subset of loops, \(B' \subseteq B\). The vertex set of \(B'\), denoted by \(V^{B'}\), is the union of the vertices of all loops contained in \(B'\). The complement of \(B'\) is \({\overline{B'}} = B \setminus B'\), and its vertex set is denoted by \(\overline{V^{B'}}\), see Fig. 4. Accordingly, (a) \(V^{B'}\cup \overline{V^{B'}}\) contains all vertices in B(R, S) and (b) \(V^{B'}\cap \overline{V^{B'}}\) is not necessarily empty, since paired and unpaired vertices can be contained in the intersection of the \(B'\)- and \({\overline{B'}}\)-loops. Furthermore, for a given substructure \(X=\{L_1,\cdots ,L_k\}\), we define the boundary of X by \(X^C = \{ L\in {\overline{X}}| \exists L_i\in X, L\cap L_i\ne \varnothing \}\), i.e., \(X^C\) is the set of all loops in the complement of X that have nontrivial intersection with X. We call \({\tilde{X}} = X\cup X^C\) the closure of X. A substructure X is called reducible if its loop set can be partitioned into two nonempty sets of loops \(X_1 = \{L_{i_1}, \ldots , L_{i_m}\}\) and \(X_2 = \{L_{j_1}, \ldots , L_{j_n}\}\), such that \(L_{i_t} \cap L_{j_s} =\varnothing \) for all \(1\le t \le m\), \(1\le s \le n\). Otherwise we call X irreducible, see Fig. 4E.
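Reducibility can be read as a connectivity property: a substructure is irreducible exactly when the graph on its loops, with an edge between any two loops sharing a vertex, is connected. The sketch below makes this explicit under the assumption that loops are given as a dictionary from hypothetical loop names to vertex sets.

```python
def irreducible_components(loops):
    """Split a loop set into maximal irreducible parts.

    loops: dict mapping a loop name to its set of vertices.
    Returns a list of sets of loop names, one per connected component
    of the loop-intersection graph.
    """
    names = list(loops)
    seen, components = set(), []
    for start in names:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            # follow every loop that shares at least one vertex with cur
            stack.extend(o for o in names
                         if o not in comp and loops[cur] & loops[o])
        seen |= comp
        components.append(comp)
    return components

def is_irreducible(loops):
    return len(irreducible_components(loops)) == 1
```

A substructure with more than one component is reducible, and any grouping of the components into two nonempty classes yields a bipartition as in the definition.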
The intersection \(E^{B'}=V^{B'}\cap \overline{V^{B'}}\) is called the set of exposed vertices of \(B'\). The exposed vertices are key elements in computing the partition function of a bistructure, since the vertices are contained in multiple loops and their nucleotide information needs to be remembered until the energies of the loops containing the exposed vertices are calculated.
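The set operations defining vertex sets, exposed vertices, boundary and closure can be sketched as follows. Loops are again assumed to be given as a dictionary from hypothetical loop names to vertex sets; the example data are the four loops of a small single structure and serve only as an illustration.

```python
def vertex_set(names, loops):
    """Union of the vertices of the named loops."""
    return set().union(*(loops[name] for name in names))

def exposed_vertices(sub, loops):
    """Vertices shared by the substructure `sub` and its complement."""
    complement = set(loops) - set(sub)
    return vertex_set(sub, loops) & vertex_set(complement, loops)

def boundary(sub, loops):
    """Loops of the complement that intersect some loop of `sub`."""
    return {name for name in set(loops) - set(sub)
            if any(loops[name] & loops[member] for member in sub)}

def closure(sub, loops):
    return set(sub) | boundary(sub, loops)

# Illustrative loops of the structure with arcs (1,10), (2,5), (6,9) on 10 vertices.
LOOPS = {"L1": {0, 1, 10, 11}, "L2": {1, 2, 5, 6, 9, 10},
         "L3": {2, 3, 4, 5}, "L4": {6, 7, 8, 9}}
```

Here `exposed_vertices({"L3"}, LOOPS)` yields the two endpoints 2 and 5 of the arc shared with the enclosing loop.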
Partition function and Boltzmann sampler
We first recall the notion of a partition function for sequences that are compatible with a single structure R [34]:
$$ Q(R) = \sum _{\sigma \in {\mathbb {C}}_n(R)} e^{-\frac{\eta (\sigma , R)}{KT}}. $$
Here \({\mathbb {C}}_n(R)\) denotes the set of R-compatible sequences while \(\eta (\sigma , R)\) is the energy of the sequence-structure pair \((\sigma , R)\). Lastly, K is the Boltzmann constant and T the temperature. In Turner’s model [38, 45], \(\eta (\sigma , R) = \sum _{L\in R} \eta (\sigma , L)\), where L is a loop contained in the secondary structure R. The energy of a loop L is a function of its type and of the nucleotides associated to the arcs and the unpaired bases it contains. In practice, the energy computation takes into account a maximum of two specific arcs and four unpaired vertices, as well as the number of arcs and the number of unpaired bases.
For a bistructure B(R, S) and a sequence \(\sigma \), we set \(\eta (\sigma , B(R,S)) = \frac{1}{2}(\eta (\sigma ,R) + \eta (\sigma ,S))\). Then we define the partition function of sequences bicompatible to R and S by
$$ Q(R ,S) = \sum _{\sigma \in {\mathbb {C}}_n(R,S)} e^{-\frac{\eta (\sigma , B(R,S))}{KT}}, $$
where \({\mathbb {C}}_n(R,S)\) denotes the set of sequences that are bicompatible with both R and S.
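For very short sequences, Q(R) and Q(R, S) can be evaluated by plain enumeration, which is useful as a reference when checking a recursive implementation. The sketch below is not the loop-based Turner model: it assumes a hypothetical toy energy in which every canonical base pair contributes a fixed amount, and an assumed value of KT.

```python
import math
from itertools import product

# Hypothetical toy energies (kcal/mol) per base pair; NOT the Turner loop model.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}
KT = 0.616  # assumed value of KT in kcal/mol (roughly 37 degrees Celsius)

def energy(seq, arcs):
    """Toy energy of (seq, arcs); None if seq is incompatible. Positions are 1-based."""
    total = 0.0
    for i, j in arcs:
        pair = (seq[i - 1], seq[j - 1])
        if pair not in PAIR_ENERGY:
            return None
        total += PAIR_ENERGY[pair]
    return total

def partition_function(n, R, S=None):
    """Q(R) if S is None, otherwise Q(R, S), by enumerating all 4^n sequences."""
    q = 0.0
    for seq in product("AUGC", repeat=n):
        e_r = energy(seq, R)
        e_s = e_r if S is None else energy(seq, S)
        if e_r is None or e_s is None:
            continue  # not (bi)compatible
        q += math.exp(-0.5 * (e_r + e_s) / KT)
    return q
```

The enumeration costs \(4^n\) energy evaluations and is only meant for toy instances such as `partition_function(4, [(1, 4)], [(1, 3)])`.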
A decomposition of B is a block sequential loop removal of the bistructure. Let us first illustrate the computation of Q(R, S) when a specific decomposition is given. Suppose \(X = \{L_1,\ldots , L_k\}\) is a substructure of B(R, S) with vertex set V, and exposed vertex set \(E^X\). \({\overline{X}} =B \setminus X\) denotes the complement of X. Let \(\sigma _X = (\sigma _v)_v\) denote a subsequence with \(v\in V\), \(\sigma _v \in \{{{\mathbf{A}},{\mathbf{U}},{\mathbf{G}},{\mathbf{C}}}\}\). Then we can compute the energy \(\eta (\sigma _X, X)\) since the nucleotide information of the vertices contained in V is specified. Let further \(\tau _X = (\tau _v)_v\) be a subsequence where \(v\in E^X\), \(\tau _v \in \{{{\mathbf{A}},{\mathbf{U}},{\mathbf{G}},{\mathbf{C}}}\}\). Clearly, \(\tau _X \subseteq \sigma _X\). For \(\ell = |V|\), we define a partition function for X that is parameterized by \(\tau _X\)
$$ Q(X, \tau _X) = \sum _{\sigma _X \in \Sigma _\ell } e^{-\frac{\eta (\sigma _X, X)}{KT}}.$$
Here, \(\Sigma _\ell \) is the collection of RNA sequences of length \(\ell \).
By definition, if X is an irreducible substructure, then removing a loop L from X produces a set of irreducible substructures \(X_1,\ldots , X_k\). We investigate how the exposed vertex set evolves with a loop removal. To this end let \(x\in E^X\) be an exposed vertex. If \(x\in L\), then either (a) there exists no \(L'\in X\), \(L'\ne L\), such that \(x\in L'\), or (b) at least one such loop \(L'\) exists. In case (a), x is no longer exposed, while in case (b) we have \(x\in E^{X_i}\) for some \(1\le i\le k\). Finally, if \(x\notin L\) to begin with, then \(x\in E^{X_i}\) for some \(1\le i\le k\) after removing L from X.
Let \(\tau _X\) denote a fixed subsequence over \(E^X\), \(\tau _{X_i}\) a subsequence over \(E^{X_i}\), \(1\le i \le k\), and \(\sigma _L\) a subsequence over the loop L. We consider all possible subsequences \((\sigma _v)_v\) where \(\sigma _v \in \{{{\mathbf{A}},{\mathbf{U}},{\mathbf{G}},{\mathbf{C}}}\}\), \(v\in \left( L \cup _{i=1}^k E^{X_i} \right) \setminus E^X\). Then, the partition function \(Q(X, \tau _X)\) can be computed recursively by
$$ Q(X, \tau _X) = \sum _{(\sigma _v)_v} e^{-\frac{\eta (\sigma _L, L)}{KT}} \prod _{i=1}^k Q(X_i, \tau _{X_i}). $$
(2)
For a given decomposition, the terms \(Q(X_i, \tau _{X_i})\), for \(1\le i \le k\), can be computed in parallel. We illustrate recursion (2) in Fig. 5.
Once Q(R, S) has been computed, we can Boltzmann sample RNA sequences following the classical stochastic backtracking method introduced by [49], which is of linear time complexity. Let X be an irreducible substructure that is decomposed into a loop L and a set of irreducible substructures \(X_1, \ldots , X_k\). Assume the nucleotides in \(X_i\) are sampled. Then, with a fixed subsequence \(\tau _X\) over the exposed vertex set \(E^X\), the subsequence \((\sigma _v)_v\), \(v\in L \setminus \cup _i E^{X_i}\), is sampled with probability
$$ \frac{ e^{-\frac{\eta ((\sigma _v)_v, L)}{KT}} \prod _{i=1}^k Q(X_i, \tau _{X_i}) }{ Q(X, \tau _X) }. $$
Multiplying all inside probabilities of each iteration, we conclude that a sequence is sampled with probability \({\mathbb {P}}(\sigma ) = e^{-\frac{\eta (\sigma , B(R,S))}{KT}}/Q(R,S)\).
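The telescoping argument can be checked numerically on toy instances. The sketch below is a brute-force stand-in for stochastic backtracking: instead of the loop-based recursion it samples one position at a time, with conditional probabilities obtained from restricted partition functions computed by enumeration. It is exponential in n, uses a hypothetical pair-based toy energy and an assumed value of KT, and is not the linear-time sampler of [49].

```python
import math
import random
from itertools import product

KT = 0.616  # assumed value of KT in kcal/mol
# Hypothetical toy energies per base pair; NOT the Turner loop model.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}

def weight(seq, R, S):
    """Boltzmann weight of seq for eta = (eta_R + eta_S) / 2; 0 if not bicompatible."""
    total = 0.0
    for arcs in (R, S):
        for i, j in arcs:
            e = PAIR_ENERGY.get((seq[i - 1], seq[j - 1]))
            if e is None:
                return 0.0
            total += 0.5 * e
    return math.exp(-total / KT)

def restricted_q(prefix, n, R, S):
    """Sum of weights over all length-n sequences that start with prefix."""
    return sum(weight(prefix + tail, R, S)
               for tail in product("AUGC", repeat=n - len(prefix)))

def sample(n, R, S, rng):
    """Return a sampled sequence and the product of its conditional probabilities."""
    prefix, prob = (), 1.0
    for _ in range(n):
        total = restricted_q(prefix, n, R, S)
        options = [prefix + (x,) for x in "AUGC"]
        weights = [restricted_q(o, n, R, S) for o in options]
        k = rng.choices(range(4), weights=weights)[0]
        prefix, prob = options[k], prob * weights[k] / total
    return "".join(prefix), prob
```

For a sampled sequence, the returned product of conditional probabilities agrees with its Boltzmann weight divided by `restricted_q((), n, R, S)`, mirroring the statement above.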
The topology of a bistructure
The partition function Q(R, S) is computed recursively based on substructures, where a loop is removed from the substructure to calculate its energy. The removal yields a collection of substructures having fewer loops. In this recursion, a nucleotide \(\tau _i\) at position i needs to be stored until the energies of all loops containing position i have been calculated. For a fixed decomposition \(D = \{X_0,\ldots , X_k\}\), the number of nucleotides that need to be stored at each step of recursion (2) is \(|L \cup _{i=1}^k E^{X_i}|\). We denote the maximum of \(|L \cup _{i=1}^k E^{X_i}|\) over all recursion steps by \(\kappa _D(B)\). For a fixed decomposition D, \(\kappa _D(B)\) is a constant.
Assume D is a decomposition of B(R, S). Then we can implement a dynamic programming (DP) routine to compute Q(R, S) recursively. The time complexity of the algorithm is \(O(4^{\kappa _D(B)} n)\) since for every \(\sigma _v\), \(v\in L \cup _{i=1}^k E^{X_i}\), we have four nucleotide choices \({{\mathbf{A}},{\mathbf{U}},{\mathbf{G}},{\mathbf{C}}}\). Therefore, the algorithm to compute Q(R, S) is an FPT algorithm: it runs in polynomial time (as a function of n) when \(\kappa _D(B)\) is assumed to be a constant. However, \(\kappa _D(B)\) can be very large, thus contributing a significant factor to the time complexity. Thus, a decomposition D that minimizes \(\kappa _D(B)\) is desired. Let further \(\kappa (B) = \min _D \kappa _D(B)\), i.e. the exponent that governs the minimum time complexity over all possible decompositions D. Clearly, \(\kappa (B)\) depends only on the bistructure B.
It is infeasible to consider all decompositions, since there are exponentially many of them. The algorithm introduced in [29] computes the partition function following a DP-routine analogous to Eq. 2. The key question is then to find a “smart” decomposition of the bistructure. The authors of [29] develop a hyper-graph model to interpret the overlapping relationships among all loops. In their hyper-graph model, a labeled vertex represents a nucleotide at a fixed position, and a hyper-edge represents a loop. Then a tree decomposition of the hyper-graph induces a hierarchical tree structure for the loops. Following the tree decomposition, one can derive a removal order of loops in the bistructure, and by construction the tree-width is \(w = \kappa _D(B) -1\).
Finding a tree decomposition with minimum tree-width for a general hyper-graph is NP-hard. However, for the simple energy model in [29], the hyper-graph may be simple enough such that a tree decomposition with minimum tree-width can be found by approximation algorithms [37]. It is not clear whether this is still feasible when passing to a full-loop energy model.
Loop intersections are studied in [39] via a simplicial complex, where a loop in a bistructure B(R, S) is represented by an abstract 0-simplex, i.e., a vertex. This line of work goes beyond the hyper-graph approach in that intersections of multiple loops can be consistently expressed and unlocks powerful concepts from algebraic topology. If d loops have nonempty mutual intersection, they are represented by a \((d-1)\)-simplex. The collection of all \(d\ge 0\)-simplices forms a simplicial complex, called a loop nerve, see Fig. 2. A loop removal is tantamount to deleting the corresponding 0-simplex as well as all higher dimensional simplices that contain it. We shall show that understanding the structure of the topological space provides insight into designing an optimal decomposition.
We first give an overview of how to design the decomposition of a bistructure via the topological framework. In [39], the authors show that the induced loop nerve of a bistructure B(R, S) has very specific properties. The topological space is uniquely classified by the rank of its second homology group \(r_2(B)\), which counts the number of 2-dimensional holes. As mentioned before, the geometric realization is comprised of ribbons glued to filled tetrahedra and spheres. Each sphere has a combinatorial interpretation within the bistructure as a crossing component, and the number of spheres equals the rank of the second homology group. We shall show that the challenge of the decomposition problem stems from the spheres, as the ribbons and tetrahedra are organized in a tree-like fashion. The global tree-like structure induces a tree decomposition naturally, while the spheres can be resolved locally. To resolve the spheres we can map the problem to a well-known NP-hard problem such as, for instance, the traveling salesman problem (TSP). This allows us to resolve the spheres via approximation algorithms [50]. We illustrate this idea in Fig. 6.
Topological framework
In the following we discuss the results in [39].
Definition 1
Suppose B(R, S) is a bistructure having n loops \(B = \{L_1,\dots ,L_n\}\). We call \(Y =\{ L_{i_0}, \ldots , L_{i_d}\}\) a d-simplex of B if and only if \(\bigcap _{k=0}^d L_{i_k} \ne \varnothing \). Let \(K_d(B)\) be the set of all d-simplices of B. Then the nerve of B is
$$ K(B) = \dot{\bigcup }_{d=0}^{\infty } K_d(B)\subseteq 2^B. $$
The loop nerve K(B) has the topological space T(B) as its quotient space [51], see Fig. 7. The 0-simplices correspond to hyper-edges in [29]. In the loop nerve, the collection of d-simplices encapsulates the information on loop intersections that is not articulated in the hyper-graph model.
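Definition 1 can be evaluated directly on small examples. The following sketch enumerates the d-simplices of the nerve up to dimension 3 and implements loop removal as deletion of all simplices containing the removed loop. Loops are assumed to be given as a dictionary from hypothetical loop names to vertex sets, and the function names are our own.

```python
from itertools import combinations

def loop_nerve(loops, max_dim=3):
    """Return {d: list of d-simplices}, each simplex a frozenset of loop names.

    A set of d+1 loops is a d-simplex iff the loops have a common vertex.
    """
    names = sorted(loops)
    nerve = {}
    for d in range(max_dim + 1):
        nerve[d] = [frozenset(c) for c in combinations(names, d + 1)
                    if set.intersection(*(loops[x] for x in c))]
    return nerve

def remove_loop(nerve, name):
    """Delete the 0-simplex `name` and every higher simplex containing it."""
    return {d: [s for s in simplices if name not in s]
            for d, simplices in nerve.items()}
```

The enumeration over all subsets of at most four loops is quadratic to quartic in the number of loops and is intended for illustration only.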
The loop nerve of a bistructure B contains no d-simplex for \(d>3\) [39] and there are only two nontrivial homology groups of T(B), both being free and abelian: \(H_0(T(B)) \cong {\mathbb {Z}}\) and \(H_2(T(B)) \cong \bigoplus _{k=1}^{r} {\mathbb {Z}}\). In view of connectivity, the rank of \(H_2(T(B))\), \(r_2(B)\), is the only determinant. An R-arc (i, j) and an S-arc (r, s) are called crossing if \(i<r<j<s\) holds. We shall proceed by discussing overlaps and crossing components.
An overlap is a vertex of degree four in the arc diagram of a bistructure. An overlap corresponds to a 3-simplex in \(K_3(B)\) of the loop nerve. Assume x is an overlap being the endpoint of the arcs \(p_1 \in R\) and \(p_2\in S\). We split x into two adjacent vertices \(x_1\) and \(x_2\), where \(x_1\) carries the endpoint of \(p_1\) and \(x_2\) the endpoint of \(p_2\). This is done such that after the split \(p_1\) does not cross \(p_2\), see Fig. 8.
We next convince ourselves that the split does not “really” affect the induced topological space, where “really” means “up to homotopy”. Let x be an overlap. This vertex is contained in four loops and is the endpoint of two arcs \(p_1\) and \(p_2\). Let \(L_1\) and \(L_2\) be the loops in R that contain \(p_1\), where \(p_1\) is the maximal arc of \(L_2\). Furthermore let \(L_3\) and \(L_4\) be the loops in S that contain \(p_2\), where \(p_2\) is the maximal arc of \(L_4\). Clearly, \(\cap _{i=1}^4 L_i = \{x\}\). Splitting x into \(x_1\) and \(x_2\) [39] is tantamount to removing an edge of the corresponding filled tetrahedron as well as its interior. We end up with two triangles that are still glued along the edge opposite to the one we removed, see Fig. 8. This splitting does not change \(r_2(B)\), whence we can restrict ourselves to the non-overlap case.
In the arc diagram of a bistructure we adopt the notion of crossing component as in [39].
Lemma 2
Let \(X^O\) be the substructure induced by a crossing component O, and let \(\tilde{X^O}\) be its closure. Then the induced topological space of \(\tilde{X^O},\) \(T(X^O),\) is homeomorphic to an empty sphere, and thus contributes 1 to the rank of \(H_2(B).\)
We define the \(*\)-graph of the loop nerve to be the graph \(\Delta (B)=(K_2(B),E)\) with edges given by
$$ E\ni e=(\Delta _1,\Delta _2)\Leftrightarrow \Delta _1\cap \Delta _2\in K_1(B). $$
Each vertex in the \(*\)-graph represents a filled triangle in T(B), and there is an edge between two vertices if their respective triangles have nonempty intersection along an edge. Then we have:
Lemma 3
Let X be a substructure without crossing arcs and overlaps, i.e., \(H_2(B)=0\) and \(K_3(B)=\varnothing. \) Then its \(*\)-graph \(\Delta (B)\) is a tree.
We illustrate the \(*\)-graph of a bistructure without overlaps and crossing arcs in Fig. 9. Note that, by Lemma 3, if B has no crossings, the induced topological space T(B) is a “ribbon tree”. Namely, each ribbon is obtained by gluing a sequence of triangles along their edges such that each triangle has at most two edges glued to other triangles. These ribbons are then glued together along some of the edges of their constituent triangles such that no closed bands appear.
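The \(*\)-graph and the tree property of Lemma 3 can be checked mechanically on small loop sets. In the sketch below, loops are assumed to be given as a dictionary from hypothetical loop names to vertex sets. Two 2-simplices are joined by an edge when they share exactly two loops; in a nerve such a shared pair is automatically a 1-simplex. The toy data used to exercise the code are arbitrary set systems, not actual bistructures.

```python
from itertools import combinations

def star_graph(loops):
    """Return (triangles, edges) of the *-graph built from the 2-simplices."""
    names = sorted(loops)
    triangles = [frozenset(c) for c in combinations(names, 3)
                 if loops[c[0]] & loops[c[1]] & loops[c[2]]]
    edges = [(s, t) for s, t in combinations(triangles, 2) if len(s & t) == 2]
    return triangles, edges

def is_tree(vertices, edges):
    """A graph is a tree iff it is connected and has |V| - 1 edges."""
    if len(edges) != len(vertices) - 1:
        return False
    adjacent = {v: [] for v in vertices}
    for s, t in edges:
        adjacent[s].append(t)
        adjacent[t].append(s)
    stack, seen = [vertices[0]], set()
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adjacent[v])
    return len(seen) == len(vertices)
```

Three triangles that pairwise share an edge, as on the boundary of a tetrahedron with one face missing, form a cycle in the \(*\)-graph and are therefore rejected by `is_tree`.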
Now we are in a position to describe the structure of the topological space T(B). If an irreducible substructure X is induced by a crossing component, then the induced topological space is “sphere”-like. Otherwise, if X is noncrossing, the induced topological space is “ribbon tree”-like. T(B) is a ribbon tree modulo edge contraction of spheres [39]. Finally, we have the following combinatorial interpretation of \(r_2(B)\): given a bistructure B(R, S) with r crossing components, \(r_2(B) = r\).
Scheduling
We next discuss how to design a decomposition based on the properties of the loop nerve. The global tree-like structure induces a tree decomposition naturally, while the spheres will be resolved locally. We first consider the case where B contains no crossing arc. In this case, we extend the partial order \(\prec \) to a bistructure as follows: for any two arcs \((i,j),(r,s)\in B\) we say \((r,s) \prec _B (i,j)\) if and only if \( i<r<s<j\). Then, we show in the Supplementary Material (SM) that an irreducible substructure \(X\subseteq B\) contains a unique maximal arc with respect to \(\prec _B\).
We decompose X by removing the loop \(L_{m}\), where m is the maximal arc of X. The loop removal produces a set of irreducible substructures \(X_1, \ldots , X_k\). Repeating this loop removal for every irreducible substructure produced gives a unique loop removal order \(D_0(B)\). We show:
Lemma 4
Let B(R, S) be a bistructure without crossing arcs or overlaps. Let \(D_0\) be the loop removal order discussed above. For any loop removal order \(D\ne D_0,\) we have \(\kappa _{D_0}(B) \le \kappa _{D}(B),\) i.e., \(D_0\) is a decomposition that minimizes \(\kappa _D(B).\)
A bistructure B with overlaps can be mapped to a bistructure \(B'\) without overlaps by the above splitting of overlapping vertices. The decomposition \(D_0\) on \(B'\) induces a natural decomposition D on B by the one-to-one correspondence between the B-arcs and the \(B'\)-arcs. We illustrate in Fig. 10 how to derive a decomposition of a hyper-graph with minimum tree-width for the case where B(R, S) is a bistructure without crossing arcs or overlaps.
We next discuss how to resolve spheres. Recall that the NP-hardness of the decomposition problem stems from the spheres. In this case, we consider mapping the problem to a well-known NP-hard problem such as, for instance, the traveling salesman problem (TSP). To this end we remove from X a set of loops with a minimum number of exposed vertices, such that the remaining substructure has no crossing arcs. The remaining noncrossing substructure can be decomposed using the optimal algorithm presented before, see Fig. 6. This allows us to solve the problem via approximation algorithms for the TSP [50]. The approximation approach is the subject of future work and beyond the scope of this paper. For the analysis presented below, we employ a greedy approach to resolve the spheres.
Structural adaptability
In this section, we introduce three measures for quantifying the structural adaptability using the bicompatible sequence sampler. Given two structures R and S, we first sample sequences with single-compatible and bicompatible constraints respectively, and compare their energy spectrum of sampled sequences paired with R and S. In case of bicompatibility not affecting the energy spectrum significantly, we can conclude that switching from R to S (and vice versa) is feasible. Secondly, we investigate the energy ranking of \((\sigma ,R)\) and \((\sigma ,S)\), within the partition function of \(\sigma \). This shows how stable R and S are in the Boltzmann ensemble of a sampled sequence. Finally, we introduce an index, called the adaptability, measuring the capability of a structure R to transform into S. The adaptability is obtained by comparing the proportion of the partition function for bicompatible sequences w.r.t. the partition function of single-compatible sequences. In case of all sequences being bicompatible, the index equals 1, while in the case of sequences not being bicompatible, the index equals 0.
Energy spectra
Let us begin by introducing the spectrum over a partition function. Let Q(X) be a partition function of sequences compatible with X, where X is a secondary or bistructure and \(Q(X)=\sum _\sigma e^{-\frac{\eta (\sigma , X)}{KT}}\). To simplify notation, we shall write Q instead of Q(X), if we do not need to emphasize the context of the underlying structure X. Naturally, Q induces the discrete probability space \((\Sigma _n,{\mathbb {P}}_{Q})\), where \({\mathbb {P}}_{Q}(\sigma )=e^{-\frac{\eta (\sigma ,X)}{KT}} / Q\). We consider a real-valued random variable \(f :(\Sigma _n,{\mathbb {P}}_{Q}) \longrightarrow {\mathbb {R}}\) and refer to the induced measure \({\mathbb {P}}_{f}\) on \({\mathbb {R}}\), \({\mathbb {P}}_{f}(r)= \sum _{\{\sigma \mid f(\sigma )=r\}} {\mathbb {P}}_{Q} (\sigma )\), as the f-spectrum over Q.
For practical purposes, an f-spectrum, \({\mathbb {P}}_f(r)\), cannot be computed directly, since we would have to consider all \(\sigma \in \Sigma _n\) and potentially infinitely many \(r\in {\mathbb {R}}\). To approximate the f-spectrum we first discretize by means of a monotonically increasing sequence \((a_s)\) with constant step size \(\Delta =a_s-a_{s-1}\), setting \({\mathbb {P}}_f(a_s) = \sum _{a_{s-1}<r \le a_s} {\mathbb {P}}_f(r)\). We employ a Boltzmann sampler to generate sequences from the probability space \((\Sigma _n,{\mathbb {P}}_Q)\) and approximate \({\mathbb {P}}_{f}(a_s)\) by \({\mathbb {P}}_{f}(a_s) \approx \frac{1}{m} \left| \{\sigma \mid a_{s-1} < f(\sigma ) \le a_s\} \right| \), where \(\sigma \) ranges over the sequences sampled from the partition function Q and m denotes the sample size. Here we set \(m=10^4\).
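The discretization step amounts to a histogram with half-open bins \((a_{s-1}, a_s]\). A minimal sketch follows, assuming that the grid is anchored at zero, i.e. \(a_s = s\Delta \), and that `values` already holds \(f(\sigma )\) for the sampled sequences; both the function name and the grid anchoring are our own choices.

```python
import math
from collections import Counter

def approximate_spectrum(values, delta):
    """Map each bin edge a_s = s * delta to the fraction of values in (a_s - delta, a_s]."""
    m = len(values)
    # ceil(v / delta) = s exactly when (s - 1) * delta < v <= s * delta
    counts = Counter(math.ceil(v / delta) for v in values)
    return {s * delta: c / m for s, c in sorted(counts.items())}
```

With energies in kcal/mol, a call such as `approximate_spectrum(energies, 0.5)` returns relative frequencies that sum to one. Values lying very close to a bin edge may be assigned to the neighbouring bin because of floating-point rounding.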
We proceed by introducing some particular choices for the pair (f, Q), which we shall denote by \(f_Q\):
$$ f^R_{Q}(\sigma )=\eta (\sigma ,R), \quad f^S_Q(\sigma )= \eta (\sigma , S). $$
We call \({\mathbb {P}}_{f_Q^R}(r)\) the R-spectrum of Q and \({\mathbb {P}}_{f_Q^S}(r)\) the S-spectrum of Q.
Ranking
Next we investigate how stable R and S are in the Boltzmann ensemble of \(Q(\sigma )\), where \(\sigma \in (\Sigma _n,{\mathbb {P}}_{Q(R,S)})\) is a sequence sampled via Q(R, S). We compare the energies, \(\eta (\sigma , R)\) and \(\eta (\sigma , S)\) to \(\eta (\sigma , M(\sigma ))\), where \(M(\sigma )\) denotes the mfe-structure of \(\sigma \). Then we consider the ratios
$$ r_R = \frac{\eta (\sigma , R)}{\eta (\sigma , M(\sigma ))}, \quad r_S = \frac{\eta (\sigma , S)}{\eta (\sigma , M(\sigma ))}. $$
For a fixed sequence \(\sigma \), the ratios reflect the gap between the energies obtained when considered with R (or with S) and the mfe-structure.
Adaptability
We discussed the energy spectrum over a partition function Q as an induced measure of a random variable f. By construction, working with the probability measure \({\mathbb {P}}_{Q}(\sigma )\) normalizes by the value of Q. As a result, the absolute values of the different partition functions, for instance when comparing Q(R) and \(Q(R)|_S\), are not a factor.
Comparing a plethora of riboswitches, as well as sequences of various random structure pairs, we end up with relating the partition functions of sequences over an entire spectrum of lengths. The free energy of any sequence is however the sum of loop energies, and each loop has a unique maximal arc. In Jin and Reidys [52] it is shown that the number of arcs in random structures satisfies a central limit theorem, whence its mean scales linearly with n. This implies that the number of loops grows linearly with n, which in turn suggests that the free energy of a sequence grows linearly with n.
Accordingly, we consider the scaled partition function
$$ {\tilde{Q}}(R)=\sum _{\sigma }e^{-\frac{1}{n}\frac{\eta (\sigma ,R)}{KT}}, \quad {\tilde{Q}}(R)|_S=\sum _{\sigma \in {\mathbb {C}}_n(S)} e^{-\frac{1}{n}\frac{\eta (\sigma ,R)}{KT}} $$
and set
$$ w_R = \log \left( \frac{{\tilde{Q}}(R)|_S}{{\tilde{Q}}(R)}\right) , \quad w_S = \log \left( \frac{{\tilde{Q}}(S)|_R}{{\tilde{Q}}(S)}\right) . $$
We call \(w_R\) and \(w_S\) the adaptabilities of the structures R and S, respectively. The ratio \({\tilde{Q}}(R)|_S/{\tilde{Q}}(R)\) underlying \(w_R\) is a real number in [0, 1] that measures the proportion of the scaled partition function of R contributed by bicompatible sequences, relative to the contribution of all sequences compatible with R. The closer this ratio is to 1, the more energetically favorable bicompatible sequences there are, suggesting that the structure R can change into the structure S more easily. Note that by construction \(w_R\) and \(w_S\) are asymmetric, namely, the values for the transition from R to S and for the transition from S to R are not necessarily equal.
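On toy instances the quantities entering \(w_R\) can be computed by enumeration. The sketch below returns both the ratio \({\tilde{Q}}(R)|_S/{\tilde{Q}}(R)\) and its logarithm. It assumes a hypothetical pair-based toy energy instead of the Turner model, an assumed value of KT, and the Boltzmann sign convention of Q(R), i.e. a weight \(e^{-\eta /(nKT)}\) per sequence.

```python
import math
from itertools import product

KT = 0.616  # assumed value of KT in kcal/mol
# Hypothetical toy energies per base pair; NOT the Turner loop model.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}

def pair_energy(seq, arcs):
    """Toy energy of (seq, arcs); None if seq is incompatible. Positions are 1-based."""
    total = 0.0
    for i, j in arcs:
        e = PAIR_ENERGY.get((seq[i - 1], seq[j - 1]))
        if e is None:
            return None
        total += e
    return total

def adaptability(n, R, S):
    """Return (ratio, log(ratio)) for the scaled partition functions of R."""
    restricted = full = 0.0
    for seq in product("AUGC", repeat=n):
        e_r = pair_energy(seq, R)
        if e_r is None:
            continue  # seq is not compatible with R
        w = math.exp(-e_r / (n * KT))  # assumed sign convention
        full += w
        if pair_energy(seq, S) is not None:
            restricted += w  # seq is also compatible with S
    ratio = restricted / full
    return ratio, math.log(ratio)
```

For R with arcs (1, 4), (2, 3) and S with the single arc (1, 4), every R-compatible sequence is S-compatible, so the ratio from R to S equals 1, while the ratio from S to R is strictly smaller, which illustrates the asymmetry.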