 Research
 Open Access
A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups
Algorithms for Molecular Biology volume 15, Article number: 17 (2020)
Abstract
Background
Data about herpesvirus microRNA motifs on human circular RNAs suggested the following statistical question. Consider independent random counts, not necessarily identically distributed. Conditioned on the sum, decide whether one of the counts is unusually large. Exact computation of the p-value leads to a specific algorithmic problem. Given \(n\) elements \(g_{0} ,g_{1} , \ldots ,g_{n - 1}\) in a set \(G\) with the closure and associative properties and a commutative product without inverses, compute the jackknife (leave-one-out) products \(\bar{g}_{j} = g_{0} g_{1} \cdots g_{j - 1} g_{j + 1} \cdots g_{n - 1}\) (\(0 \le j < n\)).
Results
This article gives a linear-time Jackknife Product algorithm. Its upward phase constructs a standard segment tree for computing segment products like \(g_{{\left[ {i,j} \right)}} = g_{i} g_{i + 1} \cdots g_{j - 1}\); its novel downward phase mirrors the upward phase while exploiting the symmetry of \(g_{j}\) and its complement \(\bar{g}_{j}\). The algorithm requires storage for \(2n\) elements of \(G\) and only about \(3n\) products. In contrast, the standard segment tree algorithms require about \(n\) products for construction and \(\log_{2} n\) products for calculating each \(\bar{g}_{j}\), i.e., about \(n\log_{2} n\) products in total; and a naïve quadratic algorithm using \(n - 2\) element-by-element products to compute each \(\bar{g}_{j}\) requires \(n\left( {n - 2} \right)\) products.
Conclusions
In the herpesvirus application, the Jackknife Product algorithm required 15 min; standard segment tree algorithms would have taken an estimated 3 h; and the quadratic algorithm, an estimated 1 month. The Jackknife Product algorithm has many possible uses in bioinformatics and statistics.
Background
A biological question
Circular RNAs (circRNAs) are single-stranded non-coding RNAs that can inhibit another RNA molecule by binding to it, mopping it up like a sponge. During herpesvirus infection, human hosts produce circRNAs with target sites that may bind herpesvirus microRNA (miRNA) [1] (see Fig. 1). Given a sequence motif, e.g., a target site for a miRNA, researchers counted how many times the motif occurs in each circRNA sequence. They then posed a question: is the motif unusually enriched in any of the circRNAs, i.e., does any circRNA have too many occurrences of the motif to be explained by chance alone? If “yes”, the researchers could then focus their further experimental efforts on those circRNAs.
A statistical answer
Figure 1 illustrates a set of circRNAs with varying length, with a single miRNA motif occurring as indicated on each circRNA. Let \(i = 0,1, \ldots ,n - 1\) index the circRNAs; the random variate \(X_{i}\) count the motif occurrences in the \(i\)th circRNA; \(k\left( i \right)\) equal the observed count for \(X_{i}\); and the sum \(S = \sum\nolimits_{i = 0}^{n - 1} {X_{i} }\) count the total motif occurrences among the circRNAs, with observed total \(K = \sum\nolimits_{i = 0}^{n - 1} {k\left( i \right)}\).
The following setup provides a general statistical test for deciding the biological question. Let \(\left\{ {X_{i} :i = 0,1, \ldots ,n - 1} \right\}\) represent independent random counts (i.e., nonnegative integer random variates), not necessarily identically distributed, with sum \(S = \sum\nolimits_{i = 0}^{n - 1} {X_{i} }\). Given observed values \(\left\{ {X_{i} = k\left( i \right):i = 0,1, \ldots ,n - 1} \right\}\) with observed sum \(K = \sum\nolimits_{i = 0}^{n - 1} {k\left( i \right)}\), consider the computation of the conditional p-values \({\mathbb{P}}\left\{ {X_{i} \ge k\left( i \right)\left| {S = K} \right.} \right\}\) (\(i = 0,1, \ldots ,n - 1\)). The conditional p-values can decide the question: “Is any term in the sum unusually large relative to the others?”
The abstract question in the previous paragraph generalizes some common tests. For example, the standard 2 × 2 Fisher exact test [2, p. 96] answers the question in the special case of \(n = 2\) categories: each \(X_{i}\) has a binomial distribution with common success probability \(p\), conditional on known numbers of trials \(N_{i}\) (\(i = 0,1\)). Although the Fisher exact test generalizes directly to a single exact pvalue for a \(2 \times n\) table [3], the generalization can require prohibitive amounts of computation. The abstract question corresponds to a computationally cheaper alternative that also decides which columns in the \(2 \times n\) table are unusual [4].
To derive an expression for the conditional p-value, therefore, let \(g_{i} \left[ k \right] = {\mathbb{P}}\left\{ {X_{i} = k} \right\}\) be given, so the array \(g_{i} = \left( {g_{i} \left[ 0 \right],g_{i} \left[ 1 \right], \ldots ,g_{i} \left[ K \right]} \right)\) gives the distribution of \(X_{i}\), truncated at the observed total \(K = \sum\nolimits_{i = 0}^{n - 1} {k\left( i \right)}\). Because \(g_{i}\) is a truncated probability distribution, \(g_{i} \in G\), the set of all real (\(K + 1\))-tuples \(\left( {g\left[ 0 \right],g\left[ 1 \right], \ldots ,g\left[ K \right]} \right)\) satisfying \(g\left[ k \right] \ge 0\) (\(k = 0,1, \ldots ,K\)) and \(\sum\nolimits_{k = 0}^{K} {g\left[ k \right]} \le 1\). The truncation still permits exact calculation of the probabilities below. To calculate the distribution of the sum \(S = \sum\nolimits_{i = 0}^{n - 1} {X_{i} }\) for \(S \le K\), define the truncated convolution operation \(g = g^{\prime} \circ g^{\prime\prime}\), for which \(g\left[ k \right] = \sum\nolimits_{j = 0}^{k} {g^{\prime}\left[ j \right]g^{\prime\prime}\left[ {k - j} \right]}\) (\(k = 0,1, \ldots ,K\)). Hereafter, the operation “\(\circ\)” is often left implicit: \(g^{\prime} \circ g^{\prime\prime} = g^{\prime}g^{\prime\prime}\).
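As a concrete illustration, the truncated convolution can be sketched in a few lines of Python (a hypothetical helper for illustration, not the article’s implementation; a truncated distribution is a plain list of \(K + 1\) numbers):

```python
def convolve_truncated(g1, g2):
    """Truncated convolution: (g1 o g2)[k] = sum_{j=0..k} g1[j] * g2[k - j],
    keeping only the terms k = 0, 1, ..., K."""
    K1 = len(g1)  # K + 1 entries, indices 0..K
    return [sum(g1[j] * g2[k - j] for j in range(k + 1)) for k in range(K1)]

# The identity element e = (1, 0, ..., 0) leaves any truncated distribution unchanged:
e = [1.0, 0.0, 0.0]
a = [0.5, 0.25, 0.25]   # a truncated distribution with K = 2
b = [0.5, 0.5, 0.0]
```

With dyadic entries such as these, the floating-point products and sums are exact, so the commutative and identity properties can be checked directly; the operation is also associative, so the truncated distributions form a commutative semigroup.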
Let \(\bar{g} = g_{0} g_{1} \cdots g_{n - 1}\), so \(\bar{g}\left[ k \right] = {\mathbb{P}}\left\{ {S = k} \right\}\) (\(k = 0,1, \ldots ,K\)). Define the “jackknife products” \(\bar{g}_{j} = g_{0} g_{1} \cdots g_{j - 1} g_{j + 1} \cdots g_{n - 1}\) (\(0 \le j < n\)) (implicitly including the products \(\bar{g}_{0} = g_{1} g_{2} \cdots g_{n - 1}\) and \(\bar{g}_{n - 1} = g_{0} g_{1} \cdots g_{n - 2}\)). The jackknife products contain the same products as \(\bar{g}\), except that each in turn skips over \(g_{j}\) (\(0 \le j < n\)). Like the jackknife procedure in statistics, therefore, jackknife products successively omit each datum in a dataset [5].
With the jackknife products in hand, the conditional p-values are a straightforward computation:

\({\mathbb{P}}\left\{ {X_{i} \ge k\left( i \right)\left| {S = K} \right.} \right\} = \sum\nolimits_{k = k\left( i \right)}^{K} {g_{i} \left[ k \right]\bar{g}_{i} \left[ {K - k} \right]} /\bar{g}\left[ K \right]\) (\(i = 0,1, \ldots ,n - 1\)). (1)
With respect to Eq. (1) and the biological question in Fig. 1, Appendix B gives the count \(\bar{g}\left[ K \right]\) of the ways that \(n\) circRNAs of known but varying length may contain \(K\) miRNA motifs of equal length, the count \(g_{i} \left[ k \right]\) of the ways that the \(i\)th circRNA may contain \(k\) motifs, and the count \(\bar{g}_{i} \left[ {K - k} \right]\) of the ways that all circRNAs but the \(i\)th may contain \(K - k\) motifs. Appendix B derives the count \(g_{i} \left[ k \right]\) for circRNAs from the easier count for placing motifs on a linear RNA molecule. For combinatorial probabilities like \({\mathbb{P}}\left\{ {X_{i} \ge k\left( i \right)\left| {S = K} \right.} \right\}\), Eq. (1) remains relevant, even if \(\left\{ {g_{i} \left[ k \right]} \right\}\) are counts instead of probabilities. The biological question therefore exemplifies a commonplace computational need in applied combinatorial probability.
The Discussion indicates that in our application, transform methods can encounter substantial obstacles when computing Eq. (1) (e.g., see [6]), because the quantities in Eq. (1) can range over many orders of magnitude. This article therefore pursues direct exact calculation of \({\mathbb{P}}\left\{ {X_{i} \ge k\left( i \right)\left| {S = K} \right.} \right\}\). The product forms of \(\bar{g}\) and \(\left\{ {\bar{g}_{j} } \right\}\) suggest that any efficient algorithm may be abstracted to broaden its applications, as follows.
Semigroups, groups, and commutative groups
Let \(\left( {G, \circ } \right)\) denote a set \(G\) with a binary product \(g \circ g^{\prime}\) on its elements. Let “\(g \in G\)” denote “\(g\) is an element of \(G\)”, and consider the following properties [7].

(1) Closure: \(g \circ g^{\prime} \in G\) for every \(g,g^{\prime} \in G\).

(2) Associative: \(\left( {g \circ g^{\prime}} \right) \circ g^{\prime\prime} = g \circ \left( {g^{\prime} \circ g^{\prime\prime}} \right)\) for every \(g,g^{\prime},g^{\prime\prime} \in G\).

(3) Identity: there exists an identity element \(e \in G\), such that \(e \circ g = g \circ e = g\) for every \(g \in G\).

(4) Commutative: \(g \circ g^{\prime} = g^{\prime} \circ g\) for every \(g,g^{\prime} \in G\).
If the Closure and Associative properties hold, \(\left( {G, \circ } \right)\) is a semigroup. Without loss of generality, we assume below that the Identity property holds. If not, adjoin a new identity element \(e\) to \(G\), such that \(e \circ g = g \circ e = g\) for every \(g \in G\). In addition, if the Commutative property holds for every \(g,g^{\prime} \in G\), the semigroup \(\left( {G, \circ } \right)\) is commutative. Unless stated otherwise hereafter, \(\left( {G, \circ } \right)\) denotes a commutative semigroup. The Jackknife Product algorithm central to this article is correct in any commutative semigroup.

(5) Inverse: for every \(g \in G\), there exists an inverse \(g^{ - 1} \in G\), such that \(g \circ g^{ - 1} = g^{ - 1} \circ g = e\).
As shown later, the Jackknife Product algorithm does not require the Inverse property. In passing, note that the convolution semigroup relevant to the circRNA–miRNA application lacks the Inverse property, as does any convolution semigroup for calculating p-values, e.g., the ones relevant to sequence motif matching [6]. To demonstrate, let \(X,Y \ge 0\) be independent integer random variates. The identity \(e\) for convolution corresponds to the variate \(Z = 0\), because \(0 + X = X + 0 = X\) for every variate \(X\). If \(X + Y = 0\), however, the independence of \(X\) and \(Y\) implies that both are constant and therefore \(X = Y = 0\). In the relevant convolution semigroup, therefore, all elements except the identity \(e\) lack an inverse.
The nonzero real numbers under ordinary multiplication form a commutative semigroup \(\left( {G, \circ } \right)\) with the Inverse property. They provide a familiar setting for discussing some algorithmic issues when computing \(\left\{ {\bar{g}_{j} } \right\}\). Let \(\bar{g} = g_{0} g_{1} \cdots g_{n - 1}\) be the usual product of \(n\) real numbers, and consider the toy problem of computing all jackknife products \(\left\{ {\bar{g}_{j} } \right\}\) that omit a single factor \(g_{j}\) (\(0 \le j < n\)) from \(\bar{g}\). Inverses \(\left\{ {g_{j}^{ - 1} } \right\}\) are available, so an obvious algorithm computes \(\bar{g}\) and then \(\left\{ {\bar{g}_{j} = \bar{g}g_{j}^{ - 1} } \right\}\) with \(n\) inverses and \(2n - 1 = \left( {n - 1} \right) + n\) products. If the inverses were unavailable, however, the naïve algorithm using \(n - 2\) element-by-element products to compute each \(\left\{ {\bar{g}_{j} } \right\}\) would require \(n\left( {n - 2} \right)\) products. The quadratic time renders the naïve algorithm impractical for many applications.
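For the toy problem over the nonzero reals, the two approaches in the paragraph above can be sketched as follows (hypothetical helpers for illustration; the names are not from the article’s code):

```python
from math import prod

def jackknife_with_inverses(g):
    """n - 1 products for gbar, then n inverses and n products: 2n - 1 products."""
    gbar = prod(g)
    return [gbar * (1.0 / x) for x in g]

def jackknife_naive(g):
    """No inverses: n - 2 products per element, n(n - 2) products in total."""
    return [prod(g[:j] + g[j + 1:]) for j in range(len(g))]
```

On exactly representable inputs the two functions agree; the naïve version works in any commutative semigroup but is quadratic, which motivates the linear-time algorithm below.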
Figure 1 illustrates a standard data structure called a segment tree, omitting the root at the top of the segment tree. Algorithms based solely on a segment tree can calculate the jackknife products \(\left\{ {\bar{g}_{j} } \right\}\) in time \(O\left( {n\log n} \right)\), fast enough for many applications. The segment tree computes segment products like \(g_{{\left[ {i,j} \right)}} = g_{i} g_{i + 1} \cdots g_{j - 1}\) without using the commutative property, so it can similarly compute jackknife products like \(\left\{ {\bar{g}_{j} = g_{{\left[ {0,j} \right)}} g_{{\left[ {j + 1,n} \right)}} } \right\}\). If the semigroup \(\left( {G, \circ } \right)\) is commutative, however, a Jackknife Product algorithm can avoid inverses and reduce the computational time further, from \(O\left( {n\log n} \right)\) to \(O\left( n \right)\). With in-place computations requiring only the space for the segment tree, the Jackknife Product algorithm avoids inverses yet still requires only about \(3n\) products and storage for \(2n\) numbers. It is therefore surprisingly economical, even when compared to the obvious algorithm using inverses. Indeed, our application to circular RNA required some economy, with its convolution of \(n = 3086\) distributions, some truncated only after \(K = 997\) terms. In a general statistical setting, convolutions form a commutative semigroup \(\left( {G, \circ } \right)\) without inverses, so our application already indicates that the Jackknife Product algorithm has broad applicability.
Theory
Appendix A proves the correctness of the Jackknife Product algorithm given below.
The Jackknife Product algorithm
Let \(\left( {G, \circ } \right)\) be a commutative semigroup. The Jackknife Product algorithm has three phases: upward, downward, and transposition. Its upward phase simply constructs a segment tree (see Fig. 1); its downward phase exploits the symmetry of \(g_{j}\) and its complement \(\bar{g}_{j}\) to mirror the upward phase while computing \(\left\{ {\bar{g}_{j} } \right\}\) (see Fig. 2); and its final transposition phase then swaps successive pairs in an array (not pictured). As Figs. 1 and 2 suggest, the three phases yield a simpler algorithm if \(n = n^{ * } = 2^{m}\) is a binary power. To apply the \(n^{ * } = 2^{m}\) algorithm to general \(n\), pad \(\left\{ {g_{j} } \right\}\) on the right with copies of the identity \(e\) up to \(n^{ * }\) elements, where \(n^{ * } = 2^{m}\) is the smallest binary power greater than or equal to \(n\), i.e., replace \(\left\{ {g_{j} } \right\}\) with \(\left\{ {g_{0} ,g_{1} , \ldots ,g_{n - 1} ,e, \ldots ,e} \right\}\), with \(n^{ * } - n\) copies of \(e\). The \(n^{ * } = 2^{m}\) algorithm can therefore handle any input of \(n\) elements without loss of generality. The algorithm given below for general \(n\) is slightly more intricate than the \(n^{ * } = 2^{m}\) algorithm, but it may save almost a factor of 2 in storage and time by omitting the padded copies of \(e\). In any case, the simpler algorithm can always be recovered from the phases for general \(n\) given here, if desired.
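The padding step can be sketched as follows (a hypothetical helper, not from the article’s code; \(e\) denotes the semigroup identity, e.g., \(\left( {1,0, \ldots ,0} \right)\) for truncated convolutions):

```python
def pad_to_binary_power(g, e):
    """Pad g on the right with copies of the identity e, up to the smallest
    binary power n* = 2**m with n* >= len(g) (assumes len(g) >= 1)."""
    n = len(g)
    n_star = 1 << (n - 1).bit_length()   # smallest 2**m >= n
    return list(g) + [e] * (n_star - n)
```

Because \(e \circ g = g \circ e = g\), the jackknife products of the padded input, restricted to the first \(n\) positions, equal the jackknife products of the original input.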
We start with notational preliminaries. Define the floor function \(\left\lfloor x \right\rfloor = \hbox{max} \left\{ {j:j \le x} \right\}\) and the ceiling function \(\left\lceil x \right\rceil = \hbox{min} \left\{ {j:x \le j} \right\}\) (both standard); and the binary right-shift function \(\rho \left( j \right) = \left\lfloor {j/2} \right\rfloor\). Other quantities also smooth our presentation. Given a product \(\bar{g} = g_{0} g_{1} \cdots g_{n - 1}\) of interest, define \(m = \left\lceil {\log_{2} n} \right\rceil\) and \(n_{k} = \left\lceil {n2^{ - k} } \right\rceil\) for \(0 \le k < m\). Below, the symbol “□” connotes the end of a proof.
The upward phase
The upward phase starts with the initial array \(L_{0} \left[ j \right] = g_{j}\) (\(0 \le j < n\)) and simply computes a standard (but rootless) segment tree consisting of segment products \(L_{k} \left[ j \right]\) for \(j = 0,1, \ldots ,n_{k} - 1\) and \(k = 0,1, \ldots ,m - 1\).
Comments
(1) If \(n = n^{ * } = 2^{m}\) is a binary power, \(\rho \left( {n_{k - 1} } \right) = n_{k} = 2^{m - k}\) and the final line in the upward phase can be omitted. (2) Of some peripheral interest, Laaksonen [8] gives the algorithm in a different context, embedding a binary tree in a single array of length \(O\left( n \right)\). If any \(L_{0} \left[ j \right] = g_{j}\) changes, he also shows how to update the single array with \(O\left( {\log n} \right)\) multiplications. If the downward phase (next) does not overwrite the segment tree \(\left\{ {L_{k} } \right\}\) by using in-place computation, it permits a similar update.
The downward phase
The transposition function \(\tau \left( j \right) = j + \left( { - 1} \right)^{j}\) transposes adjacent indices, e.g., \(\left( {L_{\tau \left( 0 \right)} ,L_{\tau \left( 1 \right)} ,L_{\tau \left( 2 \right)} ,L_{\tau \left( 3 \right)} } \right) = \left( {L_{1} ,L_{0} ,L_{3} ,L_{2} } \right)\). We also require \(\alpha_{k} \left( j \right) = \hbox{min} \left\{ {\tau \rho \left( j \right),n_{k} - 1} \right\}\) for \(0 \le j < n_{k - 1}\) and \(1 \le k < m\), the index of “aunts”, as illustrated by Fig. 2. Just as Fig. 1 illustrates a rootless segment tree in the upward phase, Fig. 2 illustrates the corresponding rootless complementary segment tree in the downward phase.
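The index functions can be made concrete in a few lines (a sketch directly from the definitions above; the names are ours):

```python
def rho(j):                 # binary right-shift: the parent index
    return j // 2

def tau(j):                 # transposition: swaps each even index with its successor
    return j + (-1) ** j

def alpha(j, n_k):          # "aunt" index at the level above, clamped into bounds
    return min(tau(rho(j)), n_k - 1)
```

For example, with \(n = 6\) and \(n_{1} = 3\), node \(j = 4\) has parent \(\rho \left( 4 \right) = 2\); the parent’s sibling \(\tau \left( 2 \right) = 3\) lies out of bounds at the level above, so the aunt index clamps to \(n_{1} - 1 = 2\).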
The downward phase computes complementary segment products \(\bar{L}_{k} \left[ j \right]\) for \(j = 0,1, \ldots ,n_{k} - 1\) and \(k = m - 1,m - 2, \ldots ,0\).
Comments
(1) If \(n = n^{ * } = 2^{m}\) is a binary power, \(\rho \left( {n_{k - 1} } \right) = n_{k} = 2^{m - k}\), \(\alpha_{k} \left( j \right) = \tau \rho \left( j \right)\), and the final line in the downward phase can be omitted. (2) The downward phase can be modified in the obvious fashion to permit in-place calculation of \(\bar{L}_{k - 1} \left[ j \right]\) from \(L_{k - 1} \left[ j \right]\), reducing total memory allocation by about a factor of 2.
As Appendix A proves, the final array \(\bar{L}_{0}\) has elements \(\bar{L}_{0} \left[ {\tau \left( j \right)} \right] = \bar{g}_{j}\) (\(0 \le j < 2\rho \left( n \right)\)), with an additional final element \(\bar{L}_{0} \left[ {n - 1} \right] = \bar{g}_{n - 1}\) if \(n_{0} = n\) is odd, so the Jackknife Product algorithm ends with a straightforward transposition phase.
Comments
The transposition phase can permit an in-place calculation of \(\left\{ {\bar{g}_{j} } \right\}\) to overwrite \(\bar{L}_{0}\).
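Assembling the three phases gives the following sketch in Python (our own reconstruction from the description above, not the article’s `jls_jackknifeproduct.py`; for clarity it allocates fresh rows instead of computing in place):

```python
from operator import mul

def jackknife_products(g, op=mul):
    """Compute all leave-one-out products of g under the commutative
    semigroup product `op`, using about 3n products and no inverses."""
    n = len(g)
    if n < 2:
        raise ValueError("need at least n = 2 elements")
    m = (n - 1).bit_length()                  # m = ceil(log2(n))

    # Upward phase: rootless segment tree; L[k][j] is the product of segment j.
    L = [list(g)]
    for k in range(1, m):
        prev = L[-1]
        row = [op(prev[2 * j], prev[2 * j + 1]) for j in range(len(prev) // 2)]
        if len(prev) % 2:                     # unpaired final element: copy it up
            row.append(prev[-1])
        L.append(row)

    # Downward phase: complementary products, stored with siblings swapped, so
    # bar[j] complements segment tau(j).  At the top, bar simply equals L[m-1].
    bar = list(L[m - 1])
    for k in range(m - 1, 0, -1):
        prev, n_k = L[k - 1], len(L[k])
        def aunt(j):                          # parent's sibling, clamped in bounds
            return min(j // 2 + (-1) ** (j // 2), n_k - 1)
        row = [op(bar[aunt(j)], prev[j]) for j in range(2 * (len(prev) // 2))]
        if len(prev) % 2:                     # unpaired final element: copy it down
            row.append(bar[aunt(len(prev) - 1)])
        bar = row

    # Transposition phase: swap adjacent pairs; an odd final element stays put.
    for j in range(0, n - 1, 2):
        bar[j], bar[j + 1] = bar[j + 1], bar[j]
    return bar
```

Any commutative `op` works unchanged: ordinary multiplication, the truncated convolution of the Background, bitwise “or”, or `min`, for example.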
Computational time and storage
Note \(n_{j} = \left\lceil {n2^{ - j} } \right\rceil\), so \(0 \le n_{j} - n2^{ - j} < 1\). To compute \(L_{k}\) from \(L_{k - 1}\) or to compute \(\bar{L}_{k}\) from \(L_{k}\) and \(\bar{L}_{k + 1}\), the Jackknife Product algorithm requires \(n_{k}\) products. For large \(n\), therefore, the upward phase computing the segment tree requires about \(\sum\nolimits_{j = 1}^{m - 1} {n_{j} } \approx \sum\nolimits_{j = 1}^{\infty } {n2^{ - j} } = n\) products; the downward phase, about \(\sum\nolimits_{j = 0}^{m - 2} {n_{j} } \approx 2n\) products. Likewise, if the downward and transposition phases compute in place by replacing \(L_{k}\) with \(\bar{L}_{k}\) and \(\bar{L}_{0}\) with \(\left\{ {\bar{g}_{j} } \right\}\), the algorithm storage is \(\sum\nolimits_{j = 0}^{m - 1} {n_{j} } \approx 2n\) semigroup elements. Each of the three estimates just given for products and storage has an error bounded by \(m = \left\lceil {\log_{2} n} \right\rceil\). Although the case of general \(n\) could be handled by the algorithm for binary powers \(n^{ * } = 2^{m}\) by setting \(m = \left\lceil {\log_{2} n} \right\rceil\) and \(g_{n} = g_{n + 1} = \ldots = g_{{n^{ * } - 1}} = e\), the truncated arrays in the Jackknife Product algorithm for general \(n\) save about a factor of \(1 \le n^{ * } /n < 2\) in both products and storage.
As written, the conditional copy statements at the end of the upward and downward phases replicate elements already in storage. If the downward phase of the Jackknife Product algorithm is implemented with in-place computation of \(\bar{L}_{k - 1} \left[ j \right]\) from \(L_{k - 1} \left[ j \right]\), the copy statements ensure that the algorithm never overwrites any array element it needs later. Some statements may copy some elements more than once (and therefore unnecessarily), but a negligible \(m = \left\lceil {\log_{2} n} \right\rceil\) copies at most are unnecessary.
The complementary segment tree in Fig. 2 implicitly indicates the nodes in the segment tree required to compute \(\bar{L}_{0} \left[ {\tau \left( j \right)} \right] = \bar{g}_{j}\) for each \(\bar{g}_{j}\), i.e., exactly one node in each row \(L_{k}\) (\(k = 0,1, \ldots ,m - 1\)). Alone, the segment tree therefore requires at least \(n\log_{2} n\) multiplications to compute \(\left\{ {\bar{g}_{j} } \right\}\).
Results
Appendix B gives the combinatorics relevant to the circRNA–miRNA application described in “Background” section. As is typical in combinatorial probability, the quantities \(\left\{ {g_{i} \left[ k \right]} \right\}\) were counts of configurations, here, the ways of placing miRNA motifs on circRNAs. The length of each motif was m = 7; the largest circRNA (hsa_circ_0003473) contained \(I\) = 158,565 nt, and the most abundant motif (CCCAGCU, for the m129star miRNA family) appeared \(K\) = 997 times, so the \(\left\{ {g_{i} \left[ k \right]} \right\}\) spanned thousands of orders of magnitude in Eq. (13) of Appendix B, from \(g_{i} \left[ 0 \right] = 1\) to \(g_{I} \left[ K \right] \approx 10^{2608}\). In Eq. (1), the dimension \(K\) controls the number of terms in the convolutions. In the application, over each miRNA motif examined, the maximum number of motif occurrences on the circRNAs was \(K = 997\). An Intel Core i7-3770 CPU computed the p-value relevant to the biological application on June 17, 2015. To compare later with estimated times for competing algorithms, the Jackknife Product algorithm with \(n = 3086\) computed the relevant p-values in about 45 min, requiring about \(3n\) products. In the application, therefore, \(n\) products required about 15 min.
The application of this article to circRNA–miRNA data appears elsewhere [1].
Discussion
This article has presented a Jackknife Product algorithm, which applies to any commutative semigroup \(\left( {G, \circ } \right)\). The biological application to a circRNA–miRNA system exemplifies a general statistical method in combinatorial probability. In turn, the application in combinatorial probability exemplifies an even more general statistical test for whether a term in a sum of independent counting variates (not necessarily identically distributed) is unusually large.
Many biological contexts lead naturally to sums of independent counting variates. Domain alignments of proteins from cancer patients, e.g., display point mutations in their columns. For a given domain, a column with an excess of mutations might be inferred to cause cancer [9]. The Background section gives the pattern: let \(X_{i}\) represent the mutation count in column \(i = 0,1, \ldots ,n - 1\), with total mutations \(S = \sum\nolimits_{i = 0}^{n - 1} {X_{i} }\). Given observed mutation counts \(\left\{ {X_{i} = k\left( i \right):i = 0,1, \ldots ,n - 1} \right\}\) with observed sum \(K = \sum\nolimits_{i = 0}^{n - 1} {k\left( i \right)}\), the conditional p-values \({\mathbb{P}}\left\{ {X_{i} \ge k\left( i \right)\left| {S = K} \right.} \right\}\) (\(i = 0,1, \ldots ,n - 1\)) can decide the question: “Does any column have an excess of mutations?” The actual application used other, very different statistical methods [9]. Unlike those methods, however, our methods can incorporate information from control (non-cancer) protein sequences to set column-specific background distributions for \(\left\{ {X_{i} } \right\}\).
The Benjamini–Hochberg procedure for controlling the false discovery rate in multiple tests requires either independent p-values [10] or dependent p-values with a positive regression dependency property [11]. Loosely, the positive regression dependency property means that the p-values tend to be small together, i.e., under the null hypothesis, given that one p-value is small, the other p-values tend to be smaller also. Inconveniently, our null hypothesis posits a fixed sum of independent counting variates, so if one variate is large and has a small p-value, it tends to reduce the other variates and increase their p-values. The circRNA–miRNA application therefore violates the statistical hypotheses of the Benjamini–Hochberg procedure. Fortunately, in the circRNA–miRNA application, a Bonferroni multiple-test correction [12] sufficed because empirically, any p-value was either close to 1 or extremely small.
The Results state that for \(n = 3086\), the Jackknife Product algorithm computed the relevant p-values in about 45 min, with \(n\) products requiring about 15 min of computation. In contrast, the naïve algorithm avoiding inverses and requiring \(n\left( {n - 2} \right)\) products would have taken about \(3086 \times 15\) min, i.e., about 1 month. As explained under the “Computational Time and Storage” heading in the Theory section, without exploiting the special form of the jackknife products \(\left\{ {\bar{g}_{j} } \right\}\), a segment tree requires about \(n\) products for its construction and at least \(n\log_{2} n\) products for the computation of the products \(\left\{ {\bar{g}_{j} } \right\}\). Alone, segment tree algorithms would therefore have taken a minimum of about \(\left( {1 + \log_{2} 3086} \right) \times 15\) min, i.e., about 3 h.
The convolutions in Eq. (1) might suggest that jackknife products are susceptible to computation with Fourier or Laplace transforms, which convert convolutions into products. “Results” section notes that in the biological application, however, \(\left\{ {g_{i} \left[ k \right]} \right\}\) in Eq. (1) spanned thousands of orders of magnitude, at least from \(g_{i} \left[ 0 \right] = 1\) to \(g_{I} \left[ K \right] \approx 10^{2608}\), obstructing the direct use of transforms (e.g., see [6]). On one hand, the widely varying magnitudes necessitated an internal logarithmic representation of \(\left\{ {g_{i} \left[ k \right]} \right\}\) in the computer, a minor inconvenience for direct computation with the Jackknife Product algorithm. On the other hand, they might have presented a substantial obstacle for transforms. The famous Feynman anecdote about Paul Olum’s \(\tan \left( {10^{100} } \right)\) problem indicates the reason [13]:
So Paul is walking past the lunch place and these guys are all excited. “Hey, Paul!” they call out. “Feynman’s terrific! We give him a problem that can be stated in ten seconds, and in a minute he gets the answer to 10 percent. Why don’t you give him one?” Without hardly stopping, he says, “The tangent of 10 to the 100th.” I was sunk: you have to divide by pi to 100 decimal places! It was hopeless.
The Jackknife Product algorithm also abstracts to any commutative semigroup \(\left( {G, \circ } \right)\), broadening its applicability enormously. As usual, abstraction eases debugging. Consider, e.g., the commutative semigroup consisting of all bit strings of length \(n\) under the bitwise “or” operation. If the bit string \(g_{j}\) has 1 in the \(j\)th position and 0s elsewhere, then the segment product \(g_{{\left[ {i,j} \right)}}\) equals the bit string with 1s in positions \(\left[ {i,j} \right) = \left\{ {i,i + 1, \ldots ,j - 1} \right\}\) and 0s elsewhere. Similarly, the complementary segment product \(g_{{\overline{{\left[ {i,j} \right)}} }} = g_{{\left[ {0,i} \right)}} g_{{\left[ {j,n} \right)}}\) equals the bit string with 0s in positions \(\left[ {i,j} \right)\) and 1s elsewhere. The Jackknife Product algorithm is easily debugged with output consisting of the segment and complementary segment trees for the bit strings.
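The debugging semigroup can be sketched as follows (the jackknife products are computed naïvely here, since the point is the easily checked pattern of the expected output):

```python
n = 8
g = [1 << j for j in range(n)]     # g_j: a 1 in position j, 0s elsewhere

def or_product(xs):
    """Bitwise 'or' over a list: a commutative semigroup with identity 0."""
    acc = 0
    for x in xs:
        acc |= x
    return acc

# Naive jackknife products; each should have 1s everywhere except position j:
jack = [or_product(g[:j] + g[j + 1:]) for j in range(n)]
all_ones = (1 << n) - 1
assert jack == [all_ones ^ (1 << j) for j in range(n)]
```

Because the expected output has this transparent all-ones-except-one-position pattern, any implementation of the algorithm can be checked against it at a glance.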
As a final note, even if a semigroup \(\left( {G, \circ } \right)\) lacks the Commutative property, the general product algorithm for a segment tree can still compute \(\left\{ {\bar{g}_{j} = g_{{\left[ {0,j} \right)}} g_{{\left[ {j + 1,n} \right)}} } \right\}\) in time \(O\left( {n\log n} \right)\). In a commutative semigroup \(\left( {G, \circ } \right)\), however, the downward phase of the Jackknife Product algorithm exploits the special form of the products \(\left\{ {\bar{g}_{j} } \right\}\) to decrease the time to \(O\left( n \right)\).
Conclusions
This article has presented a Jackknife Product algorithm, which applies to any commutative semigroup \(\left( {G, \circ } \right)\). The biological application to a circRNA–miRNA system uses a commutative semigroup of truncated convolutions to exemplify a specific application to combinatorial probabilities. In turn, the specific application in combinatorial probability exemplifies an even more general statistical test for whether a term in a sum of independent counting variates (not necessarily identically distributed) is unusually large. The general statistical test can evaluate the results of searching for a sequence or structure motif, or several motifs simultaneously. As “Discussion” section explains, the test violates the hypotheses of the Benjamini–Hochberg procedure for estimating false discovery rates, but fortunately the Bonferroni and other multiple-test corrections remain available to control familywise errors. Abstraction from convolutions to commutative semigroups broadens the algorithm’s applicability even further. If an application only requires jackknife products \(\left\{ {\bar{g}_{j} } \right\}\) and their number \(n\) is large enough, “Results” and “Theory” sections show that the linear time of the Jackknife Product algorithm can make it well worth the programming effort.
Availability of data and materials
A self-testing, annotated file “jls_jackknifeproduct.py” implementing an in-place Jackknife Product algorithm in Python is available without any restrictions at https://github.com/johnlspouge/jackknifeproduct. Data were previously published elsewhere [1].
References
Tagawa T, Gao SJ, Koparde VN, Gonzalez M, Spouge JL, Serquina AP, Lurain K, Ramaswami R, Uldrick TS, Yarchoan R, et al. Discovery of Kaposi’s sarcoma herpesvirus-encoded circular RNAs and a human antiviral circular RNA. Proc Natl Acad Sci USA. 2018;115(50):12805–10.
Siegel S. Nonparametric statistics for the behavioral sciences. 1st ed. New York: McGraw-Hill; 1956.
Freeman GH, Halton JH. Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika. 1951;38(1–2):141–9.
Fisher Exact Test Batch Processing. https://tinyurl.com/spougefisher.
Efron B, Stein C. The jackknife estimate of variance. Ann Stat. 1981;9(3):586–96.
Nagarajan N, Keich U. FAST: Fourier transform based algorithms for significance testing of ungapped multiple alignments. Bioinformatics. 2008;24(4):577–8.
Artin M. Algebra. Englewood Cliffs: Prentice-Hall; 1991.
Laaksonen A. Competitive Programmer’s Handbook; 2017.
Peterson TA, Gauran IIM, Park J, Park D, Kann MG. Oncodomains: a protein domaincentric framework for analyzing rare variants in tumor samples. PLoS Comput Biol. 2017;13(4):e1005428.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57(1):289–300.
Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29(4):1165–88.
Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75(4):800–2.
Feynman R, Leighton R, Hutchings E. Surely you’re joking, Mr. Feynman! New York: W.W. Norton & Company; 1985.
Acknowledgements
JLS would like to acknowledge useful conversations with Dr. Amir Manzour and Dr. DoHwan Park.
Funding
This research was supported by the Intramural Research Program of the NIH, National Library of Medicine, and by the Center for Cancer Research, National Cancer Institute.
Author information
Authors and Affiliations
Contributions
JLS developed the algorithms, performed computational and statistical analysis, and drafted the manuscript. JMZ provided the data stimulating and exemplifying the analysis. MG performed bioinformatics and sequence analysis. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
This appendix proves the correctness of the Jackknife Product algorithm in “Theory” section.
Let \(L_{k}\) have length \(l_{k}\); \(\bar{L}_{k}\), length \(\bar{l}_{k}\). Some observations about \(l_{k}\), \(\bar{l}_{k}\), and \(n_{k}\) facilitate later analysis. Because \(\lceil \lceil x \rceil /2 \rceil = \lceil x/2 \rceil\), \(\{ n_{k} \}\) satisfies the recursion \(n_{k} = \lceil n2^{-k} \rceil = \lceil n2^{-(k-1)}/2 \rceil = \lceil n_{k-1}/2 \rceil\), with initial value \(n_{0} = n\) and final value \(n_{m-1} = 2\).
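The ceiling identity and the level-size recursion lend themselves to a quick numerical check. The sketch below (Python; illustrative only, not from the paper) generates the sizes \(n_k\) by the recursion and compares them with the closed form:

```python
import math

def level_sizes(n):
    """Level sizes n_k, generated by the recursion n_k = ceil(n_{k-1} / 2),
    from n_0 = n down to the final level of size n_{m-1} = 2."""
    sizes = [n]
    while sizes[-1] > 2:
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes

# The recursion agrees with the closed form n_k = ceil(n * 2**-k) ...
for n in range(2, 200):
    for k, n_k in enumerate(level_sizes(n)):
        assert n_k == math.ceil(n / 2 ** k)

# ... because ceil(ceil(x)/2) == ceil(x/2), checked here on quarter-integers.
for q in range(1, 400):
    x = q / 4
    assert math.ceil(math.ceil(x) / 2) == math.ceil(x / 2)
```
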
Proposition 1
\(l_{k} = \bar{l}_{k} = n_{k}\) for \(0 \le k < m\).
Proof (by induction)
\(l_{0} = n = n_{0}\). If \(l_{k-1} = n_{k-1}\) for any \(1 \le k < m\), the upward phase of the Jackknife Product algorithm shows: (1) if \(n_{k-1}\) is even, \(l_{k} = \rho(n_{k-1})\); and (2) if \(n_{k-1}\) is odd, \(l_{k} = \rho(n_{k-1}) + 1\). In either case, \(l_{k} = \lceil n_{k-1}/2 \rceil = n_{k}\). Thus, \(l_{k} = n_{k}\) for \(0 \le k < m\).
Similarly, the downward phase shows: (1) if \(n_{k-1}\) is even, \(\bar{l}_{k-1} = 2\rho(n_{k-1}) = n_{k-1}\); (2) if \(n_{k-1}\) is odd, \(\bar{l}_{k-1} = 2\rho(n_{k-1}) + 1 = n_{k-1}\). It therefore initializes \(\bar{L}_{m-1}\) with \(\bar{l}_{m-1} = n_{m-1} = 2\) elements and assigns \(\bar{l}_{k-1} = n_{k-1}\) (\(1 \le k < m\)) elements to \(\bar{L}_{k-1}\). □
Proposition 1 and its proof ensure that, with the possible exception of \(\alpha_{k}(j)\) in the downward phase, all array indices in the Jackknife Product algorithm lie within the array bounds of \(L_{k}\) and \(\bar{L}_{k}\). Moreover, case-by-case analysis of the definition of \(\alpha_{k}\) shows that \(\alpha_{k}(j)\) (\(0 \le j < n_{k-1}\)) always falls within the array bounds \(0 \le \alpha_{k}(j) < n_{k}\) of \(\bar{L}_{k}\). Inspection of the upward and downward phases shows that they define every array element before using it. With array bounds and definitions in hand, to verify the Jackknife Product algorithm, it therefore suffices to check conditions satisfied by individual elements of \(L_{k}\) and \(\bar{L}_{k}\). We examine first the case of binary powers \(n = n^{*} = 2^{m}\), and afterwards the case of general \(n\).
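To make the index bookkeeping concrete, the following Python sketch checks the ordering identities and the array-bound claim for \(\alpha_k\). It assumes (consistent with the identities quoted in the proofs below, though the formal definitions appear in the main text) that \(\rho(j) = \lfloor j/2 \rfloor\) and that \(\tau\) transposes each adjacent pair, \(\tau(j) = j \oplus 1\):

```python
import math

def rho(j):              # parent index in the segment tree
    return j // 2

def tau(j):              # transposition of an adjacent pair: 0<->1, 2<->3, ...
    return j ^ 1

def alpha(j, n_k):       # accessory index, clamped to the bounds of L-bar_k
    return min(tau(rho(j)), n_k - 1)

# Ordering identities used in the proof of Proposition 3.
for j in range(100):
    if j % 2 == 0:       # even j
        assert 2 * rho(j) == j < j + 1 == tau(j) < tau(j) + 1 == 2 * (rho(j) + 1)
    else:                # odd j
        assert 2 * rho(j) == tau(j) < tau(j) + 1 == j < j + 1 == 2 * (rho(j) + 1)

# alpha_k(j) always lies within the array bounds 0 <= alpha_k(j) < n_k.
for n_prev in range(2, 64):              # n_{k-1}
    n_k = math.ceil(n_prev / 2)          # n_k = ceil(n_{k-1} / 2)
    for j in range(n_prev):
        assert 0 <= alpha(j, n_k) < n_k
```
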
Proof of Correctness for Binary Powers \(n^{ * } = 2^{m} \ge 2\)
In this subsection, some entities pertaining to binary powers \(n^{ * }\) receive stars (e.g., \(n^{ * }\), \(n_{k}^{ * }\), \(L_{k}^{ * }\), \(\bar{L}_{k}^{ * }\)), to distinguish them later from the corresponding entities for general \(n\).
For convenience in Appendix A only, drop “\(g\)” in the notation \(g_{[i,j)}\), and abbreviate the segment product \(g_{[i,j)} = g_{i} g_{i+1} \cdots g_{j-1}\) by the corresponding half-open interval \([i,j)\) and the complementary segment product \(g_{0} g_{1} \cdots g_{i-1} g_{j} \cdots g_{n^{*}-1}\) by \(\overline{[i,j)}\). The notation provides a mnemonic aid, used without comment below: for \(i < j < k\), \([i,j) \circ [j,k) = [i,k)\), \([i,j) \circ \overline{[i,k)} = \overline{[j,k)}\), and \([j,k) \circ \overline{[i,k)} = \overline{[i,j)}\); i.e., the product \(\circ\) on subproducts behaves like a set-theoretic union of the corresponding intervals, and complementary subproducts behave like the corresponding set-theoretic complements of intervals.
The Commutative property is required to justify the correspondence between set-theoretic operations and products, e.g., the equality \([i,j) \circ \overline{[i,k)} = \overline{[j,k)}\) commutes the segment products: examine, e.g., the second equality in the equation
\[
[i,j) \circ \overline{[i,k)} = [i,j) \circ [0,i) \circ [k,n^{*}) = [0,i) \circ [i,j) \circ [k,n^{*}) = [0,j) \circ [k,n^{*}) = \overline{[j,k)}.
\]
Proposition 2
For \(0 \le k < m\), \(L_{k}^{ * } \left[ j \right] = \left[ {j \times 2^{k} ,\left( {j + 1} \right) \times 2^{k} } \right)\) (\(0 \le j < n_{k}^{*}\)).
Proof
See Fig. 1. The array \(L_{0}^{*}\) at generation \(k = 0\) initializes the upward phase, where \(L_{0}^{*}[j] = [j, j+1)\) (\(0 \le j < n_{0}^{*} = n^{*}\)).
Thus, Proposition 2 is true for \(k = 0\). Given the array \(L_{k-1}^{*}\) at generation \(k-1\) in the upward phase, the array \(L_{k}^{*}\) at level \(k\) contains products of successive adjacent pairs of elements in \(L_{k-1}^{*}\):
\[
L_{k}^{*}[j] = L_{k-1}^{*}[2j] \circ L_{k-1}^{*}[2j+1] = [2j \times 2^{k-1}, (2j+2) \times 2^{k-1}) = [j \times 2^{k}, (j+1) \times 2^{k}) \quad (4)
\]
for \(0 \le j < n_{k}^{*} = n_{k-1}^{*}/2\). The upward phase terminates with \(L_{m-1}^{*} = ([0, 2^{m-1}), [2^{m-1}, 2^{m}))\), so Proposition 2 is true for \(0 \le k < m\). □
Proposition 3
For \(0 \le k < m\), \(\bar{L}_{k}^{ * } \left[ j \right] = \overline{{\left[ {\tau \left( j \right) \times 2^{k} ,\left( {\tau \left( j \right) + 1} \right) \times 2^{k} } \right)}}\) (\(0 \le j < n_{k}^{ * }\)).
Comment
Propositions 3 and 2 formalize the previously mentioned complementary symmetry between the upward and downward phases. Because \(\tau(\tau(j)) = j\) (i.e., transposition is an involution), \(\bar{L}_{0}^{*}[\tau(j)] = \overline{[j, j+1)} = \bar{g}_{j}\) for \(0 \le j < n_{0}^{*} = n^{*}\). Thus, \(\bar{L}_{0}^{*}\) contains all jackknife products.
Proof
See Fig. 2. For \(0 \le j < n_{m-1}^{*} = 2\), the first two lines of pseudocode in the downward phase and Proposition 2 for \(k = m-1\) show that for \(j \in \{0, 1\}\), \(\bar{L}_{m-1}^{*}[j] = L_{m-1}^{*}[j] = \overline{[\tau(j) \times 2^{m-1}, (\tau(j)+1) \times 2^{m-1})}\), so Proposition 3 holds for \(k = m-1\). We proceed by descending induction on \(k\).
For even \(j\) on one hand, \(2\rho(j) = j < j+1 = \tau(j) < \tau(j)+1 = 2(\rho(j)+1)\). For \(0 \le j < n_{k-1}^{*}\), therefore,
\[
[j \times 2^{k-1}, \tau(j) \times 2^{k-1}) \circ \overline{[2\rho(j) \times 2^{k-1}, 2(\rho(j)+1) \times 2^{k-1})} = \overline{[\tau(j) \times 2^{k-1}, 2(\rho(j)+1) \times 2^{k-1})}. \quad (6)
\]
For odd \(j\) on the other hand, \(2\rho \left( j \right) = \tau \left( j \right) < \tau \left( j \right) + 1 = j < j + 1 = 2\left( {\rho \left( j \right) + 1} \right)\), so Eq. (6) holds with the factors on the left reversed, an irrelevant difference in a commutative semigroup \(\left( {G, \circ } \right)\).
For \(0 \le j < n_{k-1}^{*}\), then, if \(1 \le k < m\), Proposition 2 for \(k-1\) and Proposition 3 for \(k\) yield
\[
\bar{L}_{k-1}^{*}[j] = L_{k-1}^{*}[j] \circ \bar{L}_{k}^{*}[\tau\rho(j)] = \overline{[\tau(j) \times 2^{k-1}, (\tau(j)+1) \times 2^{k-1})}. \quad (7)
\]
Thus, Proposition 3 for \(k\) implies Proposition 3 for \(k-1\) (\(1 \le k < m\)). □
The Jackknife Product algorithm therefore computes \(\bar{L}_{0}^{ * } \left[ {\tau \left( j \right)} \right] = \overline{{\left[ {j,j + 1} \right)}} = \bar{g}_{j}\), as desired.
Proof of correctness for general \(n \ge 2\): For general \(n \ge 2\), initialize the upward phase with \(L_{0} = (g_{0}, g_{1}, \ldots, g_{n-1})\). To apply the results of the previous subsection, let \(n^{*} = 2^{m}\) be the smallest binary power greater than or equal to \(n\), i.e., let \(m = \lceil \log_{2} n \rceil\). If \(n < n^{*}\), set \(g_{n} = g_{n+1} = \cdots = g_{n^{*}-1} = e\), with the arrays \(L_{k}^{*}\) and \(\bar{L}_{k}^{*}\) of length \(n_{k}^{*}\) (\(0 \le k < m\)) as above. For general \(n \ge 2\), consider the computation of the arrays \(L_{k}\) (\(0 < k < m\)) in the upward phase of the pseudocode above.
We prove Proposition \(P\left( k \right)\) (next) by induction on the level \(0 \le k < m\).
Proposition \(P\left( k \right)\)
\(L_{k} \left[ j \right] = L_{k}^{ * } \left[ j \right]\) for \(0 \le j < n_{k}\), and \(L_{k}^{ * } \left[ j \right] = e\) for \(n_{k} \le j < n_{k}^{ * }\).
Proof
See Fig. 1. By construction, \(P(0)\) holds. With \(P(k-1)\) in hand, the upward phase and Eq. (4) show that \(L_{k}[j] = L_{k}^{*}[j]\) for \(0 \le j < \rho(n_{k-1})\). On one hand, if \(n_{k-1}\) is even, \(\rho(n_{k-1}) = n_{k}\), yielding the first assertion of \(P(k)\) immediately. On the other hand, if \(n_{k-1}\) is odd, \(\rho(n_{k-1}) = \lceil n_{k-1}/2 \rceil - 1 = n_{k} - 1\), so
\[
L_{k}[n_{k}-1] = L_{k}[\rho(n_{k-1})] = L_{k-1}[n_{k-1}-1] = L_{k-1}^{*}[n_{k-1}-1] = L_{k-1}^{*}[n_{k-1}-1] \circ e = L_{k-1}^{*}[n_{k-1}-1] \circ L_{k-1}^{*}[n_{k-1}] = L_{k}^{*}[n_{k}-1], \quad (8)
\]
the second equality reflecting the copy of the final element of \(L_{k-1}\) in the pseudocode; the third and fifth, \(P(k-1)\); and the sixth, Eq. (4). Equation (8) completes the proof that \(L_{k}[j] = L_{k}^{*}[j]\) for \(0 \le j < n_{k}\). For the remaining indices \(n_{k} \le j < n_{k}^{*}\) of \(L_{k}^{*}\), note that \(n_{k-1} \le 2\lceil n_{k-1}/2 \rceil = 2n_{k} \le 2j < 2j+1 < 2n_{k}^{*} = n_{k-1}^{*}\). Then, \(P(k-1)\) and Eq. (4) show that \(L_{k}^{*}[j] = L_{k-1}^{*}[2j] \circ L_{k-1}^{*}[2j+1] = e \circ e = e\) for \(n_{k} \le j < n_{k}^{*}\). □
Note: \(n_{m-1} = 2 = n_{m-1}^{*}\), so \(P(m-1)\) shows that \(L_{m-1} = L_{m-1}^{*}\).
In the downward phase, the transposition function \(\tau\) facilitates in-place computation of \(\bar{L}_{k-1}^{*}[j]\) in Eq. (7). Similarly, the minimization in the accessory index \(\alpha_{k}(j) = \min\{\tau\rho(j), n_{k}-1\}\) avoids storing a superfluous element of \(\bar{L}_{k}^{*}\) beyond the final element of any truncated complementary array \(\bar{L}_{k}\) (see \(\bar{L}_{1}^{*}[3]\), the dotted circle in Fig. 2).
We prove Proposition \(\bar{P}\left( k \right)\) (next) by descending induction on the level \(0 \le k < m\).
Proposition \(\bar{P}\left( k \right)\)
\(\bar{L}_{k}[j] = \bar{L}_{k}^{*}[j]\) for \(0 \le j < n_{k}\), unless \(n_{k}\) is odd and \(j = n_{k}-1\), in which case \(\bar{L}_{k}[n_{k}-1] = \bar{L}_{k}^{*}[n_{k}]\).
Proof
The proposition \(\bar{P}(m-1)\) is true, because \(\bar{L}_{m-1} = L_{m-1} = L_{m-1}^{*} = \bar{L}_{m-1}^{*}\), and \(\bar{L}_{m-1}\) has even length \(n_{m-1} = 2\). We now show that \(\bar{P}(k)\) implies \(\bar{P}(k-1)\) for \(0 < k < m\).
If \(0 \le j < n_{k-1}\), then \(0 \le \rho(j) < \lceil n_{k-1}/2 \rceil = n_{k}\). For every \(j\), either: (1) \(n_{k}\) is odd and \(\rho(j) = n_{k}-1\); (2) \(n_{k}\) is odd and \(\rho(j) < n_{k}-1\); or (3) \(n_{k}\) is even. In Case 1, \(\tau\rho(j) = n_{k}\) but \(\alpha_{k}(j) = n_{k}-1\). Because \(n_{k}\) is odd, \(\bar{P}(k)\) implies \(\bar{L}_{k}[\alpha_{k}(j)] = \bar{L}_{k}[n_{k}-1] = \bar{L}_{k}^{*}[n_{k}] = \bar{L}_{k}^{*}[\tau\rho(j)]\). In Case 2, \(0 \le \alpha_{k}(j) = \tau\rho(j) < n_{k}-1\), or in Case 3, \(0 \le \alpha_{k}(j) = \tau\rho(j) < n_{k}\) and \(n_{k}\) is even. In either case, \(\bar{P}(k)\) implies \(\bar{L}_{k}[\alpha_{k}(j)] = \bar{L}_{k}[\tau\rho(j)] = \bar{L}_{k}^{*}[\tau\rho(j)]\). Thus, regardless of whether Case 1, 2, or 3 pertains, \(\bar{L}_{k}[\alpha_{k}(j)] = \bar{L}_{k}^{*}[\tau\rho(j)]\) for every \(0 \le j < n_{k-1}\).
For \(0 \le j < 2\rho(n_{k-1})\), the Jackknife Product algorithm in the downward phase and \(P(k-1)\) from the upward phase yield
\[
\bar{L}_{k-1}[j] = L_{k-1}[j] \circ \bar{L}_{k}[\alpha_{k}(j)] = L_{k-1}^{*}[j] \circ \bar{L}_{k}^{*}[\tau\rho(j)] = \bar{L}_{k-1}^{*}[j] \quad (9)
\]
if \(j < n_{k-1}\). On one hand, if \(n_{k-1}\) is even, \(n_{k-1} = 2\rho(n_{k-1})\), yielding \(\bar{P}(k-1)\) immediately. On the other hand, if \(n_{k-1}\) is odd, \(n_{k-1} = 2\rho(n_{k-1}) + 1\), so \(\bar{L}_{k-1}\) has an additional, final element copied from \(\bar{L}_{k}\):
\[
\bar{L}_{k-1}[n_{k-1}-1] = \bar{L}_{k}[\alpha_{k}(n_{k-1}-1)] = \bar{L}_{k}^{*}[\tau\rho(n_{k-1}-1)]. \quad (10)
\]
\(P(k-1)\) yields \(L_{k-1}^{*}[n_{k-1}] = e\), so moreover,
\[
\bar{L}_{k-1}^{*}[n_{k-1}] = L_{k-1}^{*}[n_{k-1}] \circ \bar{L}_{k}^{*}[\tau\rho(n_{k-1})] = \bar{L}_{k}^{*}[\tau\rho(n_{k-1})]. \quad (11)
\]
Because \(n_{k-1}-1\) is even, \(\rho(n_{k-1}-1) = \rho(n_{k-1})\). Equations (10) and (11) therefore yield \(\bar{L}_{k-1}[n_{k-1}-1] = \bar{L}_{k-1}^{*}[n_{k-1}]\), so \(\bar{P}(k-1)\) holds. □
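Propositions \(P(k)\) and \(\bar{P}(k)\) verify the pseudocode's two phases for general \(n\). The following Python transcription is a sketch only (the paper's pseudocode remains the authoritative statement); it assumes \(\rho(j) = \lfloor j/2 \rfloor\), \(\tau(j) = j \oplus 1\), and \(\alpha_{k}(j) = \min\{\tau\rho(j), n_{k}-1\}\), and works for any commutative, associative binary operation `op`:

```python
def jackknife_products(g, op):
    """Leave-one-out products gbar[j] = g[0]...g[j-1] g[j+1]...g[n-1] over a
    commutative semigroup, using about 3n applications of op and no inverses."""
    n = len(g)
    if n < 2:
        raise ValueError("need at least two elements")
    tau = lambda j: j ^ 1      # transpose the two children of a common parent
    rho = lambda j: j // 2     # parent index

    # Upward phase: a segment tree of products; level k has n_k = ceil(n/2**k) nodes.
    levels = [list(g)]
    while len(levels[-1]) > 2:
        prev = levels[-1]
        nxt = [op(prev[2 * j], prev[2 * j + 1]) for j in range(len(prev) // 2)]
        if len(prev) % 2:      # odd n_{k-1}: copy the final element upward
            nxt.append(prev[-1])
        levels.append(nxt)

    # Downward phase: bars[j] holds the complement of the segment indexed tau(j)
    # at the current level (Propositions 3 and P-bar(k)).
    bars = list(levels[-1])    # at the top level, L-bar equals L
    for k in range(len(levels) - 1, 0, -1):
        prev, n_k = levels[k - 1], len(levels[k])
        alpha = lambda j: min(tau(rho(j)), n_k - 1)
        bars = [op(prev[j], bars[alpha(j)]) if j < 2 * (len(prev) // 2)
                else bars[alpha(j)]          # odd n_{k-1}: final element copied down
                for j in range(len(prev))]

    # Extract gbar[j]; the min(...) clamp handles odd n, as alpha does above.
    return [bars[min(tau(j), n - 1)] for j in range(n)]
```

For example, with ordinary addition the result is the leave-one-out sums; with multiplication over the first few primes, the leave-one-out products, each obtained without division (i.e., without inverses).
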
Appendix B
This appendix gives the combinatoric calculations for the circRNA–miRNA application in the “Background” section. It is therefore of only peripheral interest to this article.
Motifs on a single circle
Consider \(r\) points equally spaced around a circle (a “ring”). Call a set of \(m\) consecutive points on the ring a “motif”. The following fixes \(m\), so the notation can leave it implicit. Let \(C_{r,k}\) count the ways of choosing \(k\) nonoverlapping motifs around the ring (i.e., the motifs have no point in common). Note: \(C_{r,k} = 0\) if \(r < mk\) or \(k < 0\). Define the factorial function \(n! = n(n-1) \cdots 1\) and the binomial (combinatorial or Pascal) coefficient
\[
\binom{n}{k} = \frac{n!}{k!(n-k)!} \quad (12)
\]
for \(0 \le k \le n\), and 0 otherwise.
Theorem
Clearly, \(C_{r,k} = 1\) when \(r = mk\). For \(r > mk\),
\[
C_{r,k} = \frac{r}{r-(m-1)k}\binom{r-(m-1)k}{k} \quad (13)
\]
counts the ways of placing \(k\) motifs around the ring. For convenience below and for consistency with Eq. (13), \(C_{r,0} = 1\) for \(r \ge 0\).
Proof
Consider a line segment containing \(r\) equally spaced points, and let \(L_{r,k}\) count the ways of choosing \(k\) nonoverlapping motifs, each of \(m\) consecutive points, on it. First, \(C_{r,k} = L_{r-1,k} + mL_{r-m,k-1}\), proved as follows. Number the ring points arbitrarily as positions \(1, 2, \ldots, r\), and place the \(k\) motifs as follows. Consider position 1, which might have no motif. If so, place the \(k\) motifs in a line consisting of the \(r-1\) positions \(2, 3, \ldots, r\) (\(L_{r-1,k}\) ways). Otherwise, place the first motif in one of the \(m\) positions in which it covers position 1, and then place the remaining \(k-1\) motifs in a line consisting of \(r-m\) positions (\(mL_{r-m,k-1}\) ways). Each such configuration corresponds to placing \(k\) leftmost end-positions on the line. For each of the \(k\) motifs, delete \(m-1\) positions on the ring, all but its leftmost end-position. Each of the configurations for \(k\) motifs therefore corresponds to choosing \(k\) positions from the \(r-(m-1)k\) positions remaining, yielding the remaining factors in Eq. (13). □
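The recursion \(C_{r,k} = L_{r-1,k} + mL_{r-m,k-1}\), with the line counts \(L_{r,k}\) given by a binomial coefficient as in the proof's final sentence, can be checked by brute-force enumeration. The Python sketch below (illustrative only; it covers the regime \(r > mk\) of the theorem's displayed formula) does so:

```python
from math import comb
from itertools import combinations

def line_count(r, k, m):
    """L_{r,k}: k nonoverlapping m-point motifs on a line of r points, i.e.
    choose k leftmost end-positions among the r - (m-1)k remaining positions."""
    if k < 0 or r - (m - 1) * k < 0:
        return 0
    return comb(r - (m - 1) * k, k)

def ring_count(r, k, m):
    """C_{r,k} for r > mk, via the recursion C_{r,k} = L_{r-1,k} + m L_{r-m,k-1}."""
    if k == 0:
        return 1
    return line_count(r - 1, k, m) + m * line_count(r - m, k - 1, m)

def ring_count_brute(r, k, m):
    """Brute force: enumerate k pairwise-disjoint sets of m consecutive ring points."""
    motifs = {frozenset((s + t) % r for t in range(m)) for s in range(r)}
    return sum(1 for c in combinations(motifs, k)
               if len(frozenset().union(*c)) == m * k)

# Recursion agrees with enumeration throughout the regime r > mk.
for m in (2, 3):
    for r in range(m + 1, 13):
        for k in range((r - 1) // m + 1):
            if r > m * k:
                assert ring_count(r, k, m) == ring_count_brute(r, k, m)
```
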
Motifs on several circles
Now, let \(C_{r(1),r(2),\ldots,r(N);K}\) count the ways of distributing \(K\) nonoverlapping motifs (all of \(m\) consecutive points) around several rings, the rings’ points numbering \(r(n)\) (\(n = 1, 2, \ldots, N\)). Without loss of generality, assume \(r(n) \ge m\) (\(n = 1, 2, \ldots, N\)). (Otherwise, discard all rings with \(r(n) < m\).) The following recursion holds: \(C_{r(1),r(2),\ldots,r(N);K} = 0\) for \(K < 0\) or \(K > \sum_{n=1}^{N} \lfloor r(n)/m \rfloor\), \(C_{r(1),r(2),\ldots,r(N);0} = 1\), and otherwise
\[
C_{r(1),r(2),\ldots,r(N);K} = \sum_{k} C_{r(N),k} \, C_{r(1),r(2),\ldots,r(N-1);K-k}, \quad (14)
\]
where on the right, the index of summation \(k\) runs from \(\max\{0, K - \sum_{n=1}^{N-1} \lfloor r(n)/m \rfloor\}\) up to \(\min\{K, \lfloor r(N)/m \rfloor\}\). Equation (13) initializes the convolution recursion in Eq. (14) with \(C_{r(1),K}\).
Relation to the Jackknife Product algorithm
To apply the Jackknife Product algorithm in the circRNA–miRNA application, let \(g_{i} = \left( {C_{r\left( i \right),0} ,C_{r\left( i \right),1} , \ldots ,C_{r\left( i \right),K} } \right)\) (\(i = 1, \ldots ,N\)) in the (commutative) semigroup \(\left( {G, \circ } \right)\) of nonnegative integer vectors with indices \(k = 0,1, \ldots ,K\), under the convolution operation.
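To make the construction concrete, the sketch below (Python; the count vectors are hypothetical placeholders, not computed from real rings) implements the truncated-convolution product of the semigroup and the naive quadratic leave-one-out computation that the linear-time Jackknife Product algorithm replaces:

```python
from functools import reduce

def convolve(u, v):
    """The semigroup product: convolution of count vectors, truncated at index K."""
    K = len(u) - 1
    w = [0] * (K + 1)
    for a in range(K + 1):
        for b in range(K + 1 - a):
            w[a + b] += u[a] * v[b]
    return w

# Hypothetical vectors g_i = (C_{r(i),0}, ..., C_{r(i),K}) with K = 2; a real
# application would fill these with single-ring counts from Eq. (13).
g = [[1, 4, 2], [1, 5, 5], [1, 3, 0], [1, 6, 9]]

# Naive quadratic jackknife: convolve all vectors except g[j], for each j.
# (The Jackknife Product algorithm computes the same vectors in about 3n products.)
jackknife = [reduce(convolve, [gi for i, gi in enumerate(g) if i != j])
             for j in range(len(g))]

# Invariant: g[j] convolved with its jackknife complement gives the full product.
total = reduce(convolve, g)
assert all(convolve(g[j], jackknife[j]) == total for j in range(len(g)))
```
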
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Spouge, J.L., Ziegelbauer, J.M. & Gonzalez, M. A linear-time algorithm that avoids inverses and computes Jackknife (leave-one-out) products like convolutions or other operators in commutative semigroups. Algorithms Mol Biol 15, 17 (2020). https://doi.org/10.1186/s13015-020-00178-x
Keywords
 Commutative semigroup
 Leave-one-out
 Jackknife products
 Segment tree
 Data structure