In this section we prove hardness of the Longest Run Subsequence (LRS) problem. More precisely, we show that dLRS, the decision version of the problem, is NP-complete. An instance of dLRS is given by a tuple (S, k), and the question is whether S has a run subsequence of length at least k.
Theorem 1
dLRS is NP-complete.
Proof
It is easy to see that dLRS is in NP, because it can be checked in polynomial time whether a given string \(s'\) is a solution, that is, whether \(s'\) is a run subsequence of S and \(|s'| \ge k\).
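The membership check can be made concrete with a small sketch. The function names `is_run_subsequence` and `verify_dlrs_certificate` are illustrative and not taken from the paper; the sketch assumes that a run subsequence is a subsequence in which every symbol occurs in at most one contiguous run.

```python
def is_run_subsequence(s, candidate):
    """Check that candidate is a subsequence of s with one run per symbol."""
    # Subsequence test: scan s once, matching candidate symbols in order.
    remaining = iter(s)
    if not all(symbol in remaining for symbol in candidate):
        return False
    # Run test: a symbol whose run has ended must never appear again.
    finished = set()
    previous = None
    for symbol in candidate:
        if symbol != previous:
            if symbol in finished:
                return False
            if previous is not None:
                finished.add(previous)
            previous = symbol
    return True


def verify_dlrs_certificate(s, k, candidate):
    """Polynomial-time certificate check for a dLRS instance (s, k)."""
    return len(candidate) >= k and is_run_subsequence(s, candidate)
```

Both tests are single linear scans, so the check runs in time linear in \(|S| + |s'|\), assuming constant-time set operations.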
To prove NP-hardness, we reduce from the Linear Ordering Problem (LOP), which has been shown to be NP-hard [9]. LOP takes a complete directed graph with edge weights and no self-loops as input and looks for an ordering of the vertices such that the total weight of the edges following this order (i.e., edges leading from lower ordered vertices to higher ordered vertices) is maximized.
We show that dLOP, the decision version of LOP, that is, the question whether a vertex ordering exists whose weight is at least a given threshold, can be polynomially reduced to dLRS. Let \(G=(V,E)\) be a complete digraph with \(|V|=n\). We denote the weight of \((v_i, v_j) \in E\) by \(w_{ij}\) and the sum of all weights of G by \(w_{\text {sum}}\). Without loss of generality we can assume that all edge weights are positive: exactly \(\frac{n(n-1)}{2}\) edges follow any linear order, so adding a sufficiently large offset to all weights increases the weight of every solution by the same fixed value without changing the core problem. This allows us to characterize LOP as finding an acyclic subgraph \(G'\) of maximum weight, because with positive weights a maximum-weight acyclic subgraph must contain either \((v_i, v_j)\) or \((v_j, v_i)\) for every pair of vertices \(v_i, v_j \in V\).
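The offset argument can be illustrated with a brute-force sketch. It assumes the weights are stored in a square matrix `w` with 0-based vertex indices and an ignored diagonal; the names `ordering_weight` and `shift_weights` are hypothetical helpers, not part of the paper.

```python
from itertools import permutations


def ordering_weight(w, order):
    """Total weight of the edges that lead from earlier to later vertices."""
    n = len(order)
    return sum(w[order[a]][order[b]] for a in range(n) for b in range(a + 1, n))


def shift_weights(w, offset):
    """Add the same offset to every off-diagonal weight."""
    n = len(w)
    return [[w[i][j] + offset if i != j else 0 for j in range(n)] for i in range(n)]


def weight_differences(w, offset):
    """Set of differences between shifted and original ordering weights."""
    shifted = shift_weights(w, offset)
    return {
        ordering_weight(shifted, order) - ordering_weight(w, order)
        for order in permutations(range(len(w)))
    }
```

For any weight matrix, `weight_differences` should contain the single value `offset * n * (n - 1) // 2`, which is why the ranking of the orderings is unaffected by the shift.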
The proof consists of two parts. First, we show how to transform G into a string S. Second, we show that G has a LOP solution of weight at least k if and only if S has a run subsequence of length at least
$$\begin{aligned} f_G(k) := (n-1) \cdot M + \frac{n(n-1)(n-2)}{3} \cdot M' + n(n-1) \cdot w_{\text {sum}} + 2k \end{aligned}$$
(1)
with \(M':= 4n^2 \cdot w_{\text {sum}}\) and \(M := M' \cdot n^3\).
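As a quick numeric illustration of Eq. 1, the following sketch evaluates \(M'\), M and \(f_G(k)\) for a weight matrix. The matrix representation and the function names `lop_constants` and `f_G` are assumptions made for this example.

```python
def lop_constants(w):
    """Return (n, w_sum, M', M) for a weight matrix w with an ignored diagonal."""
    n = len(w)
    w_sum = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    m_prime = 4 * n**2 * w_sum
    m = m_prime * n**3
    return n, w_sum, m_prime, m


def f_G(w, k):
    """Target length from Eq. 1 for the threshold k."""
    n, w_sum, m_prime, m = lop_constants(w)
    # n(n-1)(n-2) is a product of three consecutive integers, so it is divisible by 3.
    triangle_signs = n * (n - 1) * (n - 2) // 3
    return (n - 1) * m + triangle_signs * m_prime + n * (n - 1) * w_sum + 2 * k
```

For the 3-vertex matrix `[[0, 1, 2], [3, 0, 4], [5, 6, 0]]` this gives \(w_{\text {sum}} = 21\), \(M' = 756\), \(M = 20412\) and \(f_G(10) = 42482\).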
For the transformation, we define \(\Sigma\) using three different types of characters:
1. Separators \(\$_{i}\) for every vertex \(v_i \in V\).
2. Edge signs \(E_{\{i,j\}}\) for every pair \(v_i, v_j \in V\). Note that \(E_{\{i,j\}} = E_{\{j,i\}}\).
3. Triangle signs \(\Delta _{(i,j,k)}\) for every triangle in G. Note that triangles between three vertices have an orientation and can be rotated. Therefore \(\Delta _{(i,j,k)} = \Delta _{(j,k,i)} = \Delta _{(k,i,j)} \ne \Delta _{(i,k,j)} = \Delta _{(k,j,i)} = \Delta _{(j,i,k)}\).
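One way to realize these identifications in code is to map every index tuple to a canonical representative. The sketch below is only one possible encoding; the tuple tags `"E"` and `"T"` and the function names are hypothetical.

```python
from itertools import permutations


def edge_sign(i, j):
    """Unordered pair: E_{i,j} and E_{j,i} map to the same key."""
    return ("E", min(i, j), max(i, j))


def triangle_sign(i, j, k):
    """Oriented triangle: the three rotations map to the same key."""
    return ("T",) + min((i, j, k), (j, k, i), (k, i, j))


def count_triangle_signs(n):
    """Number of distinct triangle signs over n vertices."""
    return len({triangle_sign(*t) for t in permutations(range(n), 3)})
```

Taking the lexicographically smallest rotation keeps the orientation intact, so the two orientations of the same vertex triple stay distinct, and `count_triangle_signs(n)` equals \(\frac{n(n-1)(n-2)}{3}\).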
On the highest level, the string S is constructed as shown in Eq. 2. It consists of one large block per vertex, and consecutive vertex blocks are separated by a run of length M of the separator associated with the preceding vertex.
$$\begin{aligned} S = \underbrace{\overbrace{\text {[EB]}_{1,2}}^{\begin{array}{c} \text {edge block} \\ \text {for }(v_1,v_2) \end{array}} \text {[EB]}_{1,3} \ldots \text {[EB]}_{1,n}}_{\text {vertex block for }v_1} \$_{1}^M \text {[EB]}_{2,1} \ldots \text {[EB]}_{2,n} \$_{2}^M \ldots \$_{n-1}^M \text {[EB]}_{n,1} \ldots \text {[EB]}_{n,n-1} \end{aligned}$$
(2)
Each vertex block consists of a series of edge blocks (EB), which we define as follows:
$$\begin{aligned} \text {[EB]}_{i,j} = E_{\{i,j\}}^{w_{ij}+w_{\text {sum}}} \quad \Delta _{(i,j,1)}^{M'} \ldots \Delta _{(i,j,n)}^{M'} \quad E_{\{i,j\}}^{w_{ij}+w_{\text {sum}}} \end{aligned}$$
(3)
In the same way as the i-th vertex block is associated with vertex \(v_i\), the edge blocks in it are associated with the outgoing edges of \(v_i\). Note that there is one EB missing in every vertex block, as self-loops are not allowed. Finally, \(\text {[EB]}_{i,j}\) contains all triangle signs for triangles in which \((v_i, v_j)\) occurs, i.e., \(\{\Delta _{(i,j,k)} \mid 1 \le k \le n, k \ne i, k \ne j\}\), which, for the sake of notation, is written as \(\Delta _{(i,j,1)}^{M'} \ldots \Delta _{(i,j,n)}^{M'}\) in Eq. 3. The triangle signs are padded on both sides by a run of the edge sign for \((v_i, v_j)\). Every edge sign \(E_{\{i,j\}}\) occurs only in the two edge blocks \(\text {[EB]}_{i,j}\) and \(\text {[EB]}_{j,i}\). The length of an edge sign run depends on the weight of the corresponding edge (in either direction), rewarding the higher weighted edge. We also add \(w_{\text {sum}}\) to the length of every edge sign run \(E_{\{i,j\}}\).
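The construction of Eqs. 2 and 3 can be sketched as follows. Because M is very large, the sketch represents S as a list of (symbol, run length) pairs instead of a literal string; this representation, the 0-based vertex indices and the name `build_runs` are assumptions made for illustration.

```python
def build_runs(w):
    """Run-length encoding of S for a weight matrix w with an ignored diagonal."""
    n = len(w)
    w_sum = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    m_prime = 4 * n**2 * w_sum
    m = m_prime * n**3
    runs = []
    for i in range(n):                      # vertex block for v_i
        for j in range(n):                  # edge block [EB]_{i,j}
            if j == i:
                continue                    # no self-loops
            edge = ("E", min(i, j), max(i, j))
            runs.append((edge, w[i][j] + w_sum))
            for k in range(n):
                if k in (i, j):
                    continue
                # Canonical rotation, so that rotated triangles share one sign.
                triangle = ("T",) + min((i, j, k), (j, k, i), (k, i, j))
                runs.append((triangle, m_prime))
            runs.append((edge, w[i][j] + w_sum))
        if i < n - 1:                       # separator run after all but the last block
            runs.append((("$", i), m))
    return runs
```

For n = 3 the result has 20 runs: six edge blocks with three runs each and two separator runs.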
As for the numbers M and \(M'\), the latter is chosen to be larger than the combined length of all edge sign runs. This makes a single triangle sign run more profitable than any selection of edge sign runs. In the same manner, M is chosen to be larger than all triangle sign runs combined.
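Both bounds can be verified directly from the definitions \(M' = 4n^2 \cdot w_{\text {sum}}\) and \(M = M' \cdot n^3\), assuming \(w_{\text {sum}} > 0\). There are \(n(n-1)\) edge blocks, each with two edge sign runs and \(n-2\) triangle sign runs, so

$$\begin{aligned} \sum _{i \ne j} 2\,(w_{ij} + w_{\text {sum}}) &= 2\,w_{\text {sum}} + 2n(n-1) \cdot w_{\text {sum}} = 2(n^2-n+1) \cdot w_{\text {sum}} < 4n^2 \cdot w_{\text {sum}} = M', \\ n(n-1)(n-2) \cdot M' &< n^3 \cdot M' = M. \end{aligned}$$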
Using this construction, a valid solution \(G'=(V,E')\) for a dLOP instance (G, k), i.e., an acyclic subgraph of G with total weight of at least k, can be transformed into a valid solution for the dLRS instance \((S, f_G(k))\). First, all separator runs are selected, yielding a total length of \((n-1) \cdot M\). Second, for every edge in \(E'\), both edge sign runs in the corresponding edge block are selected. Since \(|E' | = \frac{n(n-1)}{2}\), this adds at least \(2 \cdot \left( \frac{n(n-1)}{2} \cdot w_{\text {sum}} +k \right)\) characters to the solution. Finally, \(G'\) is acyclic, so for every triangle in G, at least one of its edges is missing from \(G'\). Thus, by construction of S, one run can be selected for every triangle sign, namely in the edge block of such a missing edge, without interfering with the edge sign runs. This adds the missing \(\frac{n(n-1)(n-2)}{3} \cdot M'\) characters.
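The forward direction can be traced with a short sketch that selects runs for a given vertex ordering. It assumes 0-based vertex indices, a weight matrix `w` with an ignored diagonal, and returns the selection as (symbol, total length) pairs; the name `select_runs` and the choice to take each triangle sign in the first suitable edge block are illustrative decisions, not prescribed by the proof.

```python
def select_runs(w, order):
    """Run subsequence of S induced by a vertex ordering, as (symbol, length) pairs."""
    n = len(w)
    w_sum = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    m_prime = 4 * n**2 * w_sum
    m = m_prime * n**3
    position = {v: p for p, v in enumerate(order)}
    taken = set()                           # triangle signs already selected
    selected = []
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            if position[i] < position[j]:
                # (v_i, v_j) follows the order: take both edge sign runs and skip
                # the triangle signs between them, so the two runs merge into one.
                edge = ("E", min(i, j), max(i, j))
                selected.append((edge, 2 * (w[i][j] + w_sum)))
            else:
                # Edge not chosen: this block may host triangle sign runs.
                for k in range(n):
                    if k in (i, j):
                        continue
                    triangle = ("T",) + min((i, j, k), (j, k, i), (k, i, j))
                    if triangle not in taken:
                        taken.add(triangle)
                        selected.append((triangle, m_prime))
        if i < n - 1:
            selected.append((("$", i), m))  # every separator run is selected
    return selected
```

Every symbol appears at most once in the returned list, so the selection is a run subsequence, and its total length equals \(f_G(k')\), where \(k'\) is the weight of the given ordering.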
Given a solution \(S'\) for the dLRS instance \((S, f_G(k))\), we show how to obtain a subgraph \(G'\) of total weight at least k for the original dLOP instance. The subsequence \(S'\) must contain all separator runs and a run for every triangle sign, because otherwise it is (by choice of M and \(M'\)) impossible to reach length \(f_G(k)\) for any k. Because all separator runs are selected, the run of any edge sign in \(S'\) cannot span two vertex blocks and therefore lies within a single edge block. The idea is that the choice of selecting \(E_{\{i,j\}}\) either in \(\text {[EB]}_{i,j}\) or in \(\text {[EB]}_{j,i}\) corresponds to the choice of having either (i, j) or (j, i) in the DAG \(G'\) for the original LOP. Since we added \(w_{\text {sum}}\) to the length of every edge sign run and there are only \(\frac{n(n-1)}{2}\) edge signs in total (with n being the number of vertices in G), \(S'\) must contain both runs inside an edge block for every edge sign in order to reach length \(n(n-1) \cdot w_{\text {sum}}\) (the third summand in \(f_G(k)\)). Thus, either edge signs or triangle signs may be selected inside an edge block, but not both. \(G'\) is finally obtained by selecting an edge e if and only if the edge sign runs in the corresponding edge block are selected. This yields \(\frac{n(n-1)}{2}\) edges with a total weight of at least k. For every vertex pair \(v_i, v_j\), exactly one of the edges \((v_i,v_j)\) and \((v_j,v_i)\) is selected, because their corresponding edge blocks share the same edge sign.
It remains to be shown that the obtained subgraph \(G'\) is acyclic. We can directly conclude that \(G'\) contains no triangles, since every triangle sign \(\Delta _{(i,j,k)}\) has to be taken, which prevents at least one of the edges (i, j), (j, k) and (k, i) from being part of \(G'\). Assume that \(G'\) contains a cycle \(v_{i_1}, v_{i_2}, v_{i_3}, \ldots , v_{i_{l}}, v_{i_1}\) of length \(l \ge 4\). Then, either \((v_{i_1},v_{i_3})\) or \((v_{i_3},v_{i_1})\) must be in \(G'\). The latter would close the triangle \(v_{i_1}, v_{i_2}, v_{i_3}\), which we have already excluded from \(G'\). But \((v_{i_1},v_{i_3}) \in G'\) implies that a cycle of length \(l-1\) also exists in \(G'\), namely \(v_{i_1}, v_{i_3}, \ldots , v_{i_{l}}, v_{i_1}\). Repeated use of this argument implies that \(G'\) also has a cycle of length 3, which contradicts the exclusion of triangles. Thus, \(G'\) cannot contain a cycle of length 4 or greater and must be acyclic.
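The shortcut argument says that a subgraph containing exactly one edge per vertex pair is acyclic as soon as it has no directed triangle. For small n this can be checked exhaustively; the sketch below is such a sanity check, not part of the proof, and all function names are hypothetical.

```python
from itertools import combinations, product


def has_directed_triangle(n, edges):
    """True if some a -> b -> c -> a exists among the vertices 0..n-1."""
    return any(
        (a, b) in edges and (b, c) in edges and (c, a) in edges
        for a in range(n) for b in range(n) for c in range(n)
        if len({a, b, c}) == 3
    )


def is_acyclic(n, edges):
    """Kahn-style test: repeatedly remove vertices without incoming edges."""
    indegree = [0] * n
    for _, v in edges:
        indegree[v] += 1
    ready = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in range(n):
            if (u, v) in edges:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return removed == n


def all_orientations(n):
    """Yield every edge set that has exactly one direction per vertex pair."""
    pairs = list(combinations(range(n), 2))
    for bits in product((False, True), repeat=len(pairs)):
        yield {(a, b) if keep else (b, a) for (a, b), keep in zip(pairs, bits)}
```

On five vertices, each of the 1024 orientations is acyclic exactly when it has no directed triangle, in line with the argument above.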
In summary, the decision problem whether there is a solution for a dLOP instance (G, k) can be reduced to the decision problem whether a solution for the dLRS instance \((S, f_G(k))\) obtained from G exists. \(\square\)