An improved FourRussians method and sparsified FourRussians algorithm for RNA folding
 Yelena Frid^{1}Email author and
 Dan Gusfield^{1}
https://doi.org/10.1186/s1301501600819
© The Author(s) 2016
Received: 15 December 2015
Accepted: 28 June 2016
Published: 5 August 2016
Abstract
Background
The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now wellknown \(O(n^3)\)time dynamic programming method. Recently three methodologies—Valiant, FourRussians, and Sparsification—have been applied to speedup RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequence Z with the endpoints belonging to the optimal folding set and the maximum number basepairs L. These sparsity properties satisfy \(0 \le L \le n / 2\) and \(n \le Z \le n^2 / 2\), and the method reduces the algorithmic running time to O(LZ). While the FourRussians method utilizes tabling partial results.
Results
In this paper, we explore three different algorithmic speedups. We first expand the reformulate the single sequence folding FourRussians \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\)time algorithm, to utilize an ondemand lookup table. Second, we create a framework that combines the fastest Sparsification and new fastest ondemand FourRussians methods. This combined method has worstcase running time of \(O(\tilde{L}\tilde{Z})\), where \(\frac{{L}}{\log n} \le \tilde{L}\le min\left({L},\frac{n}{\log n}\right)\) and \(\frac{{Z}}{\log n}\le \tilde{Z} \le min\left({Z},\frac{n^2}{\log n}\right)\). Third we update the FourRussians formulation to achieve an ondemand \(O( n^2/ \log ^2n )\)time parallel algorithm. This then leads to an asymptotic speedup of \(O(\tilde{L}\tilde{Z_j})\) where \(\frac{{Z_j}}{\log n}\le \tilde{Z_j} \le min\left({Z_j},\frac{n}{\log n}\right)\) and \(Z_j\) the number of subsequence with the endpoint j belonging to the optimal folding set.
Conclusions
The ondemand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but leads us to take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the basepair maximization variant and a more biologically informative scoring scheme, we show that this Sparse FourRussians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by the minimum of the two methods alone.
Keywords
Background
Noncoding RNA (ncRNA) affects many aspects of gene expression, regulation of epigenetic processes, transcription, splicing, and translation [14]. It has been observed that in eukaryotic genomes the ncRNA function is more clearly understood from the structure of the molecule, than from sequence alone. While there have been advances in methods that provide structure experimentally, the need for computational prediction has grown as the gap between sequence availability and structure has widened. In general, RNA folding is a hierarchical process in which tertiary structure folds on top of thermodynamically optimal^{1} secondary structure, secondary structure is a key component of structure prediction [14].
Efficient \(O(n^3)\)time dynamic programming algorithms were developed more than thirty years ago to find noncrossing secondary structure of a single RNA molecule with n bases [22, 23, 27, 29, 38, 39]. We call this basic folding or single sequence folding (SSF) problem. In addition, McCaskill [19] created an \(O(n^3)\)time algorithm for the partition function for RNA secondary structure. Based on these algorithms, software has been developed and widely used [15, 16, 25, 36, 37]. Probabilistic methods, employing Stochastic contextfree grammar (SFCG), were also developed to solve the basic folding problem [7, 8].
The accuracy of all these methods is based on the parameters given by the scoring function. Thermodynamic parameters [17, 18, 28, 33] and statistical parameters [6, 7], or a combination of the two [2, 13] are currently employed.
The Valiant [1, 34], Sparsification [4, 30], and the FourRussians (FR) [9, 24] methods where previously applied to improve on the computation time for secondary structure prediction. For SSF, the Valiant method achieves the asymptotic time bound of \(O\left(\frac{n^3}{2^{\Omega {\log (n)}}}\right)\) by incorporating the current fastest min/maxplus matrix multiplication algorithm [32, 34]. The FourRussians method was applied to single sequence [10, 24], cofolding [11] and pseudoknotted [12] folding problems. The Sparsification method, was developed to improve computation time in practice for a family of RNA folding problems, while retaining the optimal solution matrix [4, 20, 21, 26, 30, 35].
Methods
In this paper, we combine the FourRussians method [24] and the Sparsification method [4]. While the former method reduces the algorithm’s asymptotic running time to \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\), the latter eliminates many redundant computations. To combine these methods, we use an ondemand tabulation (instead of a preprocessing approach which is typically applied in FR algorithms), removing any redundant computation and guaranteeing the combined method is at least as fast as each individual method, and in certain cases even faster. First, we reformulate SSF FourRussians \(\Theta \left(\frac{n^3}{\log ^2 n}\right)\)time algorithm [24] to utilizes ondemand lookup table creation. Second, we combine the fastest Sparsification and FourRussians SSF speedup methods. The Sparse Four Russians speedup presented here leads to a practical and asymptotically fastest combinatorial algorithm (even in the worstcase). The new algorithm has an \(O(\tilde{L}\tilde{Z})\) run time where \(\frac{{LZ}}{\log ^2 n}\le \tilde{L}\tilde{Z} \le \min \left( \frac{n^3}{\log ^2 n}, {LZ}\right) \). In practice, when accounting for every comparison operation the Sparse Four Russians outperforms both the FourRussians and Sparsification methods. Third, we extended the FourRussian SSF algorithm to be computed in \(O(n^2/\log ^2n)\)time. The simulated results for this formulation and O(n) processors achieve a practice speedup on the number of comparison operations performed.
Results
Problem definition and basic algorithm
Let \(s = s_0 s_1 \ldots s_{n1}\) be an RNA string of length n over the fourletter alphabet \(\Sigma = \{A,U,C,G\}\), such that \(s_i \in \Sigma \) for \(0 \le i < n\). Let \(\varvec{ s_{i,j}}\) denote the substring \(s_i s_{i+1} \ldots s_{j1}\). We note that for simplicity of exposition substring \(s_{i,j}\) does not contain the nucleotide j. A folding (or a secondary structure) of s is a set M of position pairs (k, l), such that: (1) \(0 \le k< l < n\); (2) and there are no two different pairs \((k,l),(k', l') \in M\) such that \(k \le k' \le l \le l'\) (i.e. each position participates in at most one pair, and the pairs are noncrossing).
The above recurrence may be efficiently implemented using a dynamic programming (DP) algorithm. Essentially, the DP algorithm computes and maintains values of the form \(L[i, j], L^p[i, j]\) and \(L^c[i, j]\) for every \(0 \le i \le j \le n\) in three \(n+1 \times n+1\) matrices. The algorithm traverses the matrices in increasing column order index j from 1 to n. Within each column, the cell L[k, j] is computed in decreasing index order k from \(j1\) to 0. Once L[k, j] is computed, \(L^p[i,j]\) is updated for all \(i<k\) such that \(L^p[i,j]=max(L^p[i,j],L[i,k]+L[k,j])\). The solution L(s, M) is stored in cell L[0, n]. Clearly, computing \(L^p\) is the bottleneck of the computation, since for a given i, j, there may be \(\Theta (n)\) split points to examine.
Extending the notation and moving towards a vector by vector computation of L
For a matrix A and some integer intervals I, J, denote by A[I, J] the submatrix of A obtained by projecting it onto the row interval I and column interval J. When \(I = [i]\) or \(J = [j]\), we simplify the notation by writing A[i, J] or A[I, j].
Definition 1
We divide the solution matrix L in two ways: \(q \times q\) submatrices (Fig. 1) and size q sub column vectors (the value of q will be determined later). Let \(\varvec{K_g}\) be the gth interval such that \(K_g=\{q \cdot g , q \cdot g+1, \ldots , q \cdot g+q1\}\). We call these sets Kgroups, and use \(K_g\) as the interval starting at index \(g\cdot q\). For an index i, define \(\varvec{g_i} = \left\lfloor \frac{i}{q}\right\rfloor \). It is clear that \(i \in K_{g_i}\).
Similarly, we break up the row indices into groups of size q, denoted by \(\varvec{I_g}\) where \(I_g=\{k= q \cdot g , k+1, ...k+q1\}\). (Clearly, row index set \(I_g\) is equivalent to the Kgroup \(K_g\). We only introduce this extra notation for simplicity of the exposition).
Sparsification of the SSF algorithm
The Sparsification method achieves a speedup by reducing the number of split points examined during the computation of \(L^P[i,j]\). As Fig. 2 shows the focus of Sparsified Four Russians algorithm will narrow down only on those submatrices whose split points are stepoct for a particular \(i,j\) [4, 30].
OCT and STEP subinstances of sequence s
Subinstance \(s_{i,j}\) is optimally coterminus (OCT) if every optimal folding of \(s_{i,j}\) is coterminus. We introduce the extra notation below
if \(L[i,j]=L^c[i,j]>L^p[i,j]\) then we say L[i, j] is OCT.
Subinstance \(s_{i,j}\) is STEP, if \(L[i,j]>L[i+1,j]\) where \(L[i,j]=L(s_{i,j})\) and \(L[i+1,j]=L(s_{i+1,j})\). For ease of exposition we also say L[i, j] is STEP when \(s_{i,j}\) is STEP. A STEP subinstance \(s_{i,j}\) implies that nucleotide i is paired in every optimal folding of \(s_{i,j}\).
Fact 1
For every subinstance \(s_{i,j}\) with \(j>i\) there is an optimal split point \(k\in (i,j)\) such that either \(k=i+1\) or L[i, k] is STEP and L[k, j] is OCT [4].
Notation: For the index set \(K=\{k,k+1, \ldots k'\}\) and column j, let \(\varvec{K^{oct_j}}\) be the set of indices such that \(K^{oct_j}\subset K\) and \(\forall _{ k \in K^{oct_j}} \;\; L[k,j]\) is OCT. Given the row interval \(I=\{i,i+1, \ldots i'\}\), let \(I^{step_k}\) be the set of rows such that \(I^{step_k} \subset I\), and for all \({i \in I^{step_k}}\) L[i, k] is STEP.
Asymptotic time bound of sparsified SSF When computing matrix \(L^p[i,j]\) we examine value L[i, k] only if L[k, j] is OCT. Let Z, be the total number of subinstances in s or cells in matrix L that are OCT. Given that L[k, j] is OCT, \(L^p[i,j]\) must examine the split point k, for all \(i \in \{0,1, \ldots k\}\) such that L[i, k] is STEP. Let \(\varvec{{L}}\) be the total number of STEP subinstances in column k. More precisely \({L}=\{0,1, \ldots k\}^{step_k}\) (Creating the list of splitpoints that correspond to STEP incidence requires no additional computation time [4]). The total time to compute SSF when examining only STEP, OCT combinations (Sparsification method), is O(LZ). As shown in Backofen et al. [4] Z is bounded by \(Z\le n^2\) and L is bounded by \({L} \le \frac{n}{2}\). The overall asymptotic time bound of the Sparsification method is O(LZ) remains \(O(n^3)\).
Ondemand Four Russians speedup
Presented here is an ondemand version of the \(\Omega (\log ^2 n)\)time FourRussians algorithm implied by Pinhas et al. [24].
Observation 1
The scores stored in L[k, j] and \(L[k+1,j]\) differ by the effect of adding only one more nucleotide (i.e., \(s_k\)). Therefore, \(L[k,j]L[k+1,j]\) belongs to a finite set of differences \(\mathbb {D}\), where \(\mathbb {D}\) is the set of scores created as the result of the scoring scheme \(\beta \). The cardinality of the set of differences, \(D=\mathbb {D}\), is O(1) when \(\beta \) is discrete. For the simple \(\beta \) scoring function (+1 for every permitted pair, and 0 otherwise), the set \(\mathbb {D}\) is equal to \(\{0,1\}\) and therefore \(\mathbb {D}=2\) [23].
Let \(\vec {x} = [x_0, x_1, \ldots , x_{q1}]\) be an integer vector of length q. We say that \(\vec {x}\) is Ddiscrete if \(\forall _{ l \in (0,q)} x_{l1}  x_{l} \in \mathbb {D}\). We define the \(\Delta \) encoding of 2discrete vector \(\vec {x}\) to be a pair of integers \((x_0, \Delta _{{x}})\) such that \(x_0\) is the first element in \(\vec {x}\) and \(\Delta _{{x}}\) is the integer representation of the binary vector \([x_0x_1, x_1x_2, \ldots , x_{q2}  x_{q1}]\). Note that \(0 \le \Delta _{{x}} < 2^{q1}\). For simplicity, we will interchangeably use \(\vec {x}\) to imply either \((x_0, \Delta _{x})\) or \([x_0, x_1, \ldots , x_{q1}]\). Clearly, \(\Delta \)encoding takes O(q) time to compute.

Let \( (x_0,\Delta _{\vec {x}})+ c= (x_0+c,\Delta _{\vec {x}})\) be equivalent to \(\vec {x}+c =[x_0+c, x_1+c, \ldots , x_{q1}+c]\).

Let \(B\otimes (x_0,\Delta _{x})\) be equivalent to \(B\otimes \vec {x}\).

Let \(\max ((x_0,\Delta _x),(y_0,\Delta _y))\) be equivalent to \(\max (\vec {x},\vec {y})\).
MUL lookup table
Based on Observation 1, any column vector in matrix L is 2discrete. Given vector \(L[K_g,j]\) and its \(\Delta \) encoding (\(x_0=L[gq,j]\), \(\Delta _x= \Delta _{L[K_g,j]}\)), it is clear that \(\Delta _x \in [0,2^q1]\).
Fact 2
\(L[I_{g'},K_g] \otimes L[K_g,j] \text { is equivalent to }L[I_{g'},K_g] \otimes (0,\Delta _{L[K_g,j]}) + L[gq,j]\) [24].
There are \(O\left(\frac{n^2}{q^2}\right)\) submatrices within L. For each submatrix the maximum number of entries we compute for lookup table MUL is \(2^{q1}\). In total, the asymptotic time bound to populate lookup table MUL is \(O\left(\frac{n^2}{q^2}\cdot 2^{q1}\cdot q^2)=O(n^2 \cdot 2^q\right)\).
MAX lookup table
Let the max of two 2discrete qsize vectors \(\vec {v}\) and \(\vec {w}\), denoted \(max(\vec {v},\vec {w})\), result in a qsize vector \(\vec {z}\), where \(\forall _{0\le k < q}\, z_k=\max (v_k,w_k)\). Without loss of generality, let \(w_0 \ge v_0\). Comparing the first element in each vector there are two possibilities either (1) \(w_0v_0>q1\) or (2) \(w_0v_0\le q1\). In the first case, (\(w_0v_0>q1\)), it is clear that \(\max (\vec {v},\vec {w})\) is equal to \(\vec {w}\). In the second case, we make use of the following fact [24].
Fact 3
Given two vectors \((w_0,\Delta _w)\) and \((v_0,\Delta _v)\), if \(w_0v_0\le q1\) then \(\max (\vec {v},\vec {w})= \max \left( (0,\Delta _v),(w_0v_0,\Delta _w)\right) +v_0\).
Lets define lookup table MAX such that entry
Setting \(q=\epsilon \log n\), where \(\epsilon \in (0,.5)\) [31], the total computation time is equal to \(\Theta (\frac{n^3}{\log ^2 n})\), which achieves a speedup by a factor of \(\Omega {(\log ^2 n)}\), compared to the original \(O(n^3)\)time solution method.
Extending to Ddiscrete vectors
Sparse FourRussian method
With the FourRussians method, a speedup is gained by reducing q split point index comparisons for q subsequences to a single O(1) time lookup. The Sparsification method reduces the comparison to only those indices which correspond to STEP–OCT folds.
STEP–OCT condition for sets of split points
In this section, we achieve a Sparsified FourRussian speedup for the computation of the \(L^p\) matrix. As in the Four Russians method, we will conceptually break up the solution matrix L in two ways: in \(q \times q\) size submatrices, and q size subcolumn vectors. The submatrices are indexed by \(g'\) and g such that the corresponding submatrix is \(L[I_{g'},K_g]\) . The subcolumn vectors are indexed by g and j , such that the corresponding subcolumn vector is \(L[K_g,j]\).
We augment the FourRussians SSF to reduce the number of entries, and lookups into the MUL table. If and only if, the matrix \(L[I_{g'},K_g]\) contains at least one cell L[i, k] that is STEP and within vector \(L[K_g,j]\) the cell L[k, j] is OCT we will lookup \(MUL_{L[I_{g'},K_g]}[\Delta _{L[K_g,j]}]\). If such an entry does not exist we will compute \(L[I_{g'},K_g]\otimes (0,\Delta _{L[K_g,j]})\) and store the result into lookup table MUL.
The following notation will be used to help determine if a split point Kgroup should be examined in the computation.
OCT subcolumn vector
Given the vector \(L[K_g,j]\) let \(\vec {m}\) be a q size binary vector such that \(\forall _{0\le x \le q1} m[x]=1\) if \(L[gq+x,j]\) is OCT. Let the sigOct of the vector \(L[K_g,j]\), written \(sigOct(L[K_g,j])\), be equal to m the integer representation of the binary vector \(\vec {m}\). Clearly \(0 \le m < 2^q\), and if and compute the dot product in\(m>0\) then \(L[K_g,j]\) contains at least one OCT instance. Let \(O(\tilde{Z})\) be the total number of subcolumn vectors which contain an instance that is OCT. Clearly, \( \frac{{Z}}{q} \le \tilde{Z} \le \min \left(\frac{n^2}{q},Z\right)\).
STEP submatrix
Given the submatrix \(L[I_{g'},K_g]\), let \(\vec {m'}\) be a q size binary vector such that \(\forall _{ x\in [0,q) }m'[x]=1\) if \(\exists _{0 \le i \le q1}\) \(L[qg'+i,qg+x]\) is STEP. Let sigStep of a submatrix, written \(sigStep(L[I_{g'},K_g])\), be equal to \(m'\) the integer representation of the binary vector \(\vec {m'}\). Clearly \(0\le m' < 2^q\). Let \(\tilde{L}\) be the total number of submatrices which contain an instance that is STEP within \(L[[0,n],K_g]\). Clearly, \( \frac{{L}}{q} \le \tilde{L} \le \min (\frac{n}{q},L) \).
Observation 2
Suppose that, \(s_{i,k}\) is STEP, and integer
\(m' = sigStep(L[I_{g'},K_g])\) such that \(i \in I_{g'}\) (or \(I_{g'}=I_{g_i}\) ) and \(k \in K_g\) (or \(K_g=K_{g_k}\) ). Then, the corresponding binary vector \(\vec {m'}\) must be set to 1 in position x where x is an index such that \(k=qg+x\). More precisely, if L[i, k] is STEP then \(m'[x]=1\) by the definition of sigStep.
Observation 3
Suppose \(s_{k,j}\) is OCT , and suppose integer
\(m=sigOct(L[K_g,j])\) such that \(k \in K_g\). Then, the corresponding binary vector \(\vec {m}\) must be set to 1 in position x, where x is an index such that \(k=qg+x\). More precisely, if \(s_{k,j}\) is OCT then m[x] = 1 by the definition of sigOct.
Given two binary vectors v and w the dot product of their integer representation is equal to a binary number x such that \(x=v \odot w= v_0 \wedge w_0 \vee v_1 \wedge w_1 \vee ...\vee v_{q1} \wedge w_q\) where \(v=w=q1\)
Theorem 1
For any subinstance \(s_{i,j}\) either \(i+1\) is the optimal split point, or there is an optimal split point \(k \in (i,j)\), such that \(sigStep(L[I_{g_i},K_{g_k}]) \odot sigOct(L[K_{g_k},j])\) equals 1.
Proof
Based on Fact 1 for any subinstance \(s_{i,j}\) there is an optimal split point k such that either \(k=i+1\) or \(s_{i,k}\) is STEP and \(s_{k,j}\) is OCT. If \(s_{i,k}\) is STEP and \(s_{k,j}\) is OCT then L[i, k] is STEP and L[k, j] is OCT. The cell L[i, k] belongs to submatrix \(L[I_{g_i},K_{g_k}]\) and the cell L[k, j] belongs to the vector \(L[K_{g_k},j]\). Let x be an index such that \(k=qg_k+x\). Let \(\vec {m'}\) be a binary vector that corresponds to \(sigStep(L[I_{g_i},K_{g_k}])\). Based on Observation 2, \(m'[x]\) must equal 1. Let \(\vec {m}\) be the binary vector that corresponds to \(sigOct(L[K_{g_k},j])\). Based on Observation 3, m[x] equals 1. Therefore, \(m[x]\wedge m'[x]=1\) and \(sigStep(L[I_{g_i},K_g])\odot sigOct(L[K_g,j])= 1\). \(\square \)
Notation: The index g is STEP–OCT if given the set of rows \(I_{g'}\) and the column j if \(sigStep(\;L[I_{g'},K_g]\;) \varvec{\odot } sigOct(\;L[K_g,j]\;)=1\).
\(sigStep(\;L[I_{g'},K_g]\;) \varvec{\odot } sigOct(\;L[K_g,j]\;)=1\).
Discussion
Asymptotic analysis of sparsified FourRussians.
We assume O(1)time RAM access for \(\log (n)\) bits. The calculation for column j can be broken down into \(L^P_{K=[qg_j,j)}[i,j]\) and \(L^P_{K=[0,qg_j)}[i,j]\) for all \(i<j\). The computation of
\(L^P_{[qg_j,j)}[[0,n],j]\) occurs when Kgroup \(K_{g_j}\) is not full, and follows the Sparsification algorithm maximizing over STEP–OCT split points only. This reduces the comparisons made from \(O(n\cdot q)\) to \(O({L}\tilde{q})\) where \(\tilde{q}<q\) is the total number OCT instances within the interval \([qg_,j)\). The computation of \(L^P_{[0,qg_j)}[[0,n],j]\) employs Sparsified Four Russians speedup. The MUL table entries are created and references only for STEP–OCT submatrix vector combinations. This reduces the comparisons made to \(O(\tilde{L}\tilde{Z})\).
Asymptotic analysis of ondemand lookup tables calculation
We compute the lookup tables G, MUL, and \(M\!A\!X\) ondemand. For each vector \(L[K_g,j]\) containing an OCT instance (where \(m=sigOct(L[K_g,j])\)), if G[g][m] does not exist then we directly compute it. For the computation of a single entry into lookup table G, we iterate through \(O(\tilde{L})\) submatrices and compute the dot product in O(q) time.^{2}In total, an update is called to lookup table G at most \(O(\tilde{C}=min(2^q,\tilde{Z}))\) times. The entire G lookup table ondemand computation takes \(O(\text {ondemand} G)=O(\tilde{L}\tilde{C}\cdot q)\) or \(\varvec{O(G)} \le O\left( \min (\tilde{L} 2^q,\tilde{L}\tilde{Z})\cdot q \right)\le O\left(min\left(\frac{n2^q}{q},\frac{{LZ}}{q}\right)\right)\).
For each vector containing an OCT instance if an entry doesn’t exist in the lookup table MUL it is computed ondemand. Each entry takes \(O(\tilde{L}\cdot q^2)\) time to compute. There are \(min(2^q,\tilde{Z)}\) such computation. In total, lookup table MUL takes \(O(\tilde{L}q^2\cdot min(2^q,\tilde{Z}))\)time. Setting \(q=\epsilon \log {n}\) where \(\epsilon \in (0,.5)\) the asymptotic runtime for ondemand computation is \(O(\tilde{L}\tilde{Z})\).
The entire algorithm takes \(O(\tilde{L}\tilde{Z})\) where \(\frac{{LZ}}{\log ^2 n}\le \tilde{L}\tilde{Z} \le \min \left(\frac{n^3}{\log ^2 n}, {LZ}\right)\).
Empirical results
We tested 20 randomly generated sequences for each size \(N=64,128,256,512\).
Number of all comparisons computed
Size  \(O(n^3)\)  FR  SP  SFR 

64  43,680  12,014  2733  1837 
128  349,504  49,456  13,196  9982 
256  2,796,160  346,692  79,544  41,393 
512  22,500,863  5,746,853  650,691  503,425 
For \(N=128\) the Sparse FourRussians(SFR) algorithm performs 25 % less comparisons than the Sparsified(SP) SSF algorithm and 80 % less comparison than the FourRussians (FR) algorithm. In all test cases, the Sparse FourRussians performed better than the minimum of either method alone.
An \(O(n^2/\log ^2(n))\) simple parallel FourRussians RNA folding algorithm
Lets solve the recurrence relation (Eq. 1–3) in increasing index j order and then move up the column j computing one cell at a time in decreasing i order. Each cell L[i, j] is solved by calculating Eq. 1–3 for all \(i<k\le j\).
Given this j, i, k order, let us reformulate the computation by moving up each column in O(n/q) qsize subcolumn vectors instead of n cells.
Utilizing n processors
Invariant 1
Given \(g_i\) and \(g_j\) \(\forall _{i\in I_{g_i}} \forall _{k \in K_g} L[i,k]=L(s_{i,k})\). In other words, submatrix \(L[I_{g_i},K_g]\) is computed. Similarly \(L[K_g,j]\) is computed or \(\forall _{k \in K_g} L[k,j]=L(s_{k,j})\).
Replacing the \( \max (L^p[I_{g_i},j], L[I_{g_i}, K_g])\otimes L[K_g,j])\) computation with lookups into MUL and MAX tables would reduces the runtime for finding the solution matrix L to \(O(n^2/log^2n)\). As stated in "Extending to Ddiscrete vectors" section it is possible to create lookup tables ondemand and achieve a reduction in computation time of \(\Omega (log^2 n)\) factor.
The preprocessing can also be achieve in parallel reducing the asymptotic cost form \(O(n^3/\log ^2 n)\) to \(O(n^2/\log ^2 n)\). If entry \(MUL_{L[I_{g_i},K_g]}[\Delta _{L[K_g,j]}]\) does not exist we compute \(L[I_{g_i},K_g]\otimes (0,\Delta _{L[K_g,j]})\) directly in \(O(q^2)\).
There are \(O\left(\frac{n^2}{q^2}\right)\) submatrices within L. For each submatrix the maximum number of entries we compute for lookup table MUL is \(D^{q1}\). However, in each iteration at worse O(n) of the entries are computed simultaneously. In total, the asymptotic time bound to populate lookup table MUL is \(O\left(\displaystyle \frac{{\frac{n^2}{q^2}\cdot D^{q1}\cdot q^2} }{n} \right)=O\left(\frac{n^2 \cdot D^q}{n}\right)=O(n \cdot D^q)\).
Parallel sparisified FourRussians single sequence folding algorithm
Let \(Z_x\) be the number of OCT cells in column x. Let \(\forall _{x\in [0,n]}Z_j \ge Z_x\).
The parallel algorithm would take as long as would take as a it takes for the last processor to complete.
In order to extend the parallel FourRussians single sequence folding algorithm to utilize the Sparsification speedup we will limit the call to MUL table only if \(sigSTEP(L[I_{g_i},K_g]) \odot sigOCT(L[K_g,j]) >0\). As result given \(Z_j\) the total time to compute for processor j is \(O(\tilde{L}\tilde{Z_j})\) where \(\frac{{Z_j}}{\log n}\le \tilde{Z_j} \le min \left({Z_j},\frac{n}{\log n}\right)\).
Conclusion
This work combines the asymptotic speedup of FourRussians with the very practical speedup of Sparsification. The ondemand formulation of the FourRussians not only removes all extraneous computation. This approach allows the FourRussians SSF to achieve a speedup in practice for realistic scoring schemes. This also leads us to take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the basepair maximization variant and a more biologically informative scoring scheme, we show that the Sparse FourRussians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by the minimum of the two methods alone. We also showed that through some reorganization we could apply the FourRussians speedup to parallel algorithm and achieve and asymptotic time of \(O(n^2/\log ^2 n)\). The algorithm created here can be implemented in CUDA to compute on multiprocessor GPUs. Because the algorithm allows for memory cell independence one can apply memory and cache optimization without affecting the algorithm. The utility in this framework lies not only on its ability to speedup single sequence folding but its ability to speedup the family of RNA folding problems for which both Sparsification and FourRussians have bene applied separately.
Future work in this area would be to examine the ability to sparsify memory [3], as FourRussians at worst case requires an additional factor of \(2^{log(n)}\) in memory. Another open question is wether it is possible to apply the \(\Omega (\log ^3 n)\) [5] speedup of boolean matrix multiplication to RNA folding.
Declarations
Authors’ contributions
YF contributed to the conception and analysis of the framework. YF and DG jointly contributed to the design and interpretation of the framework, and jointly contributed to the writing and editing of the manuscript. Implementation and testing was done by YF. All authors read and approved the final manuscript.
Acknowledgements
We would like to sincerely thank Shay Zakov and Michal ZivUkelson for their their many helpful comments and suggestions. This research was partially supported by the IIS1219278 Grant. uld like to sincerely thank Shay Zakov and Michal ZivUkelson for their their many helpful comments and suggestions. This research was partially supported by the IIS1219278 Grant.
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
 Akutsu T. Approximation and exact algorithms for RNA secondary structure prediction and recognition of stochastic contextfree languages. J Comb Optim. 1999;3(2–3):321–36.View ArticleGoogle Scholar
 Andronescu M, Condon A, Hoos H, Mathews D, Murphy KP. Efficient parameter estimation for RNA secondary structure prediction. Bioinformatics. 2007;23(13):i19–28. doi: 10.1093/bioinformatics/btm223. http://bioinformatics.oxfordjournals.org/content/23/13/i19.abstract
 Backofen R, Tsur D, Zakov S, ZivUkelson M (2009) Sparse RNA folding: time and space efficient algorithms. In: CPM09; 2009. p. 249–62Google Scholar
 Backofen R, Tsur D, Zakov S, ZivUkelson M. Sparse RNA folding: time and space efficient algorithms. J Discrete Algorithms. 2011;9(1):12–31. doi:10.1016/j.jda.2010.09.001.View ArticleGoogle Scholar
 Chan T. Speeding up the Four Russians algorithm by about one more logarithmic factor. In: SODA; 2015. p. 212–17Google Scholar
 Do C, Woods D, Batzoglou S. Contrafold: RNA secondary structure prediction without physicsbased models. Bioinformatics. 2006;22(14):e90–8. doi:10.1093/bioinformatics/btl246. http://bioinformatics.oxfordjournals.org/content/22/14/e90.abstract.
 Dowell R, Eddy S. Evaluation of several lightweight stochastic contextfree grammars for RNA secondary structure prediction. BMC Bioinformatics. 2004;5(1):71. doi:10.1186/14712105571. http://www.biomedcentral.com/14712105/5/71.
 Durbin R, Eddy S, Krogh A, Mitchison G . Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press; 1998. http://www.amazon.com/BiologicalSequenceAnalysisProbabilisticProteins/dp/0521629713
 Frid Y, Gusfield D. A simple, practical and complete O(n $$^{\text{3}}$$ 3 /log(n)) time algorithm for RNA folding using the fourRussians speedup. In: WABI; 2009. p. 97–107Google Scholar
 Frid Y, Gusfield D. A simple, practical and complete O(n\(^{\text{3 }}\)/log(n))time algorithm for RNA folding using the [fourrussians] speedup. Algorithms Mol Biol. 2010a;5(1):13.View ArticlePubMedPubMed CentralGoogle Scholar
 Frid Y, Gusfield D. A worstcase and practical speedup for the RNA cofolding problem using the fourRussians idea. In: Moulton V, Singh M, editors. Algorithms in bioinformatics. Heidelberg: Springer; 2010b. p. 1–12.View ArticleGoogle Scholar
 Frid Y, Gusfield D. Speedup of RNA pseudoknotted secondary structure recurrence computation with the fourRussians Method. In: COCOA; 2012. p. 176–87Google Scholar
 Juan V, Wilson C. RNA Secondary structure prediction based on free energy and phylogenetic analysis. J Mol Biol. 1999;289(4):935–47.View ArticlePubMedGoogle Scholar
 Leontis NB, Westhof E. RNA 3D structure analysis and prediction. Berlin: Springer; 2012.View ArticleGoogle Scholar
 Markham NR, Zuker M. UNAFold. In: Keith JM, editor. Bioinformatics, methods in molecular biology. New York: Humana Press; 2008. p. 3–31.Google Scholar
 Mathews D, Andre T, Kim J, Turner D, Zuker M. An updated recursive algorithm for RNA secondary structure prediction with improved thermodynamic parameters. Mol Modeling Nucleic Acids: 246–57Google Scholar
 Mathews DH, Sabina J, Zuker M, Turner D. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288(5):911–40. doi:10.1006/jmbi.1999.2700. http://www.sciencedirect.com/science/article/pii/S0022283699927006.
 Mathews DH, Disney MD, Childs J, Schroeder S, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(19):7287–92. doi:10.1073/pnas.0401799101. http://www.pnas.org/content/101/19/7287.abstract.
 McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990;29(6–7):1105–19. doi:10.1002/bip.360290621. http://dx.doi.org/10.1002/bip.360290621.
 Møhl M, Salari R, Will S, Backofen R, Sahinalp S. Sparsification of RNA structure prediction including pseudoknots. Algorithms Mol Biol. 2010;5:39.View ArticlePubMedPubMed CentralGoogle Scholar
 Salari R, Will S, Backofen R, Sahinalp S, Möhl M. Sparsification of RNA structure prediction including pseudoknots. In: Moulton V, Singh M, editors. WABI. Berlin: Springer; 2010. p. 40–51.Google Scholar
 Nussinov R, Jacobson A. Fast algorithm for predicting the secondary structure of singlestranded RNA. PNAS. 1980;77(11):6309–13. doi:10.1073/pnas.77.11.6309. http://dx.doi.org/10.1073/pnas.77.11.6309.
 Nussinov R, Pieczenik G, Griggs JR, Kleitman DJ. Algorithms for loop matchings. SIAM J Applied Math. 1978;35(1):68–82. doi:10.1137/0135006. http://link.aip.org/link/?SMM/35/68/1.
 Pinhas T, Zakov S, Tsur D, ZivUkelson M. Efficient edit distance with duplications and contractions. Algorithms Mol Biol. 2013;8:27.View ArticlePubMedPubMed CentralGoogle Scholar
 Reuter J, Mathews D. RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics. 2010;11(1):129. doi:10.1186/1471210511129. http://www.biomedcentral.com/14712105/11/129.
 Salari R, Möhl M, Will S, Sahinalp S, Backofen R. Time and space efficient RNA–RNA interaction prediction via sparse folding. In: RECOMB; 2010. p. 473–90Google Scholar
 Sankoff D, Kruskal JB, Mainville S, Cedergreen R. Fast algorithms to determine RNA secondary structures containing multiple loops. In: Sankoff D, Kruskal JB, editors. Time warps, string edits and macromolecules: the theory and practice of sequence comparison. Boston: AddisonWesley; 1983. p. 93–120.Google Scholar
 Tinoco I, Borer P, Dengler B, Levine M, Uhlenbec O, Crothers D, Gralla J. Improved estimation of secondary structure in ribonucleicacids. Nature. 1973;246(150):40–1.Google Scholar
 Waterman MS, Smith TF. RNA secondary structure: a complete mathematical analysis. Math Biosc. 1978;42:257–66.View ArticleGoogle Scholar
 Wexler Y, Zilberstein CBZ, ZivUkelson M. A study of accessible motifs and RNA folding complexity. J Comput Biol. 2007;14(6):856–72.View ArticlePubMedGoogle Scholar
 Williams R. Matrixvector multiplication in subquadratic time: (some preprocessing required). In: Pruhs K, Stein C, editors. Bansal N. SIAM: SODA; 2007. p. 995–1001.Google Scholar
 Williams R. Faster allpairs shortest paths via circuit complexity. In: Symposium on theory of computing. STOC: New York; 2014. p. 664–73. doi:10.1145/2591796.2591811. http://doi.acm.org/10.1145/2591796.2591811
 Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D. Thermodynamic parameters for an expanded nearestneighbor model for formation of RNA duplexes with watsoncrick base pairs. Biochemistry. 1998;37(42):14,719–14,735. doi:10.1021/bi9809425. http://pubs.acs.org/doi/abs/10.1021/bi9809425
 Zakov S, Tsur D, ZivUkelson M. Reducing the worst case running times of a family of RNA and CFG problems, using valiant’s approach. In: WABI; 2010. p. 65–77Google Scholar
 ZivUkelson M, GatViks I, Wexler Y, Shamir R. A faster algorithm for RNA Cofolding. In: Proceedings of the 8th International workshop on algorithms in bioinformatics. Waterville: WABI; 2008. p. 174–85Google Scholar
 Zuker M. The use of dynamic programming algorithms in RNA secondary structure prediction. In: Waterman M, editor. Mathematical methods for DNA sequences. Boca Raton: CRC Press, Inc.; 1989. p. 159–84.Google Scholar
 Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406–3415. doi:10.1093/nar/gkg595. http://nar.oxfordjournals.org/content/31/13/3406.full.pdf+html
 Zuker M, Sankoff D. RNA secondary structures and their prediction. Bull Math Biol. 1984;46(4):591–621.View ArticleGoogle Scholar
 Zuker M, Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9(1):133–48.View ArticlePubMedPubMed CentralGoogle Scholar