All definitions and properties for the DCJ distance of balanced genomes presented so far hold in the general case, where genomes can be multichromosomal. However, as we will see in this section, to solve the DCJ distance problem we use an intermediate procedure whose inputs are strings. For this reason we restrict our inputs to *unichromosomal* genomes. Moreover, for the moment we consider only *linear* unichromosomal genomes, and discuss later how to deal with *circular* unichromosomal genomes. The extension to multichromosomal genomes is left as an open problem.

### Approximating the DCJ distance by cycles of length 2

As mentioned above, given two linear unichromosomal balanced genomes *A* and *B*, we have to find a consistent decomposition of \(\textit{AG}(A,B)\) to compute the DCJ distance according to Theorem 1. Recall that this is an NP-hard problem [4].

Given a consistent decomposition \(D \in \mathcal {D}\) of the adjacency graph \(\textit{AG}(A,B)\), we can see that

$$\begin{aligned} d_{D} = n - c_D = n - c_2 - c_>, \end{aligned}$$

where \(n = |adj (A)|= |adj (B)|\), \(c_2\) is the number of cycles of length 2, and \(c_>\) is the number of cycles of length longer than 2 in *D*.
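As a small illustration of this formula, the following sketch computes \(d_D\) from the lengths of the cycles of a decomposition. The function name `distance_of_decomposition` and the representation of *D* as a list of cycle lengths (counted in edges) are assumptions made only for this example.

```python
from typing import Iterable


def distance_of_decomposition(n: int, cycle_lengths: Iterable[int]) -> int:
    """Return d_D = n - c_2 - c_> for a decomposition given by its cycle lengths.

    n is the number of adjacencies of each genome; cycle lengths are
    assumed to be counted in edges (hypothetical input format).
    """
    lengths = list(cycle_lengths)
    c2 = sum(1 for length in lengths if length == 2)    # cycles of length 2
    c_gt = sum(1 for length in lengths if length > 2)   # longer cycles
    return n - c2 - c_gt
```

For instance, a decomposition with three cycles of length 2 and one cycle of length 4 over \(n = 5\) adjacencies gives \(d_D = 5 - 3 - 1 = 1\).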

Building a consistent decomposition by maximizing \(c_2\) as a first step is itself an NP-hard problem [12]. Furthermore, this strategy is not able to optimally solve the DCJ distance, as we can see in Fig. 2. Nevertheless, it allows us to approximate the DCJ distance:

###
**Lemma 2**

*A consistent decomposition* \(D'\) *of* \(\textit{AG}(A,B)\) *containing the maximum number of cycles of length 2 is a 2-approximation for the* DCJ-distance *problem.*

###
*Proof*

Let \(c^*_2\) and \(c^*_>\) be the number of cycles of length 2 and longer than 2, respectively, of an optimal consistent decomposition \(D^*\) of \(\textit{AG}(A,B)\). Let \(c'_2\) and \(c'_>\) be the numbers analogous to \(c^*_2\) and \(c^*_>\) with respect to the decomposition \(D'\). Since every cycle longer than 2 has length at least 4 and therefore covers at least two adjacencies of each genome, it is easy to see that \(c^*_2 + 2c^*_> \le n\), thus

$$\begin{aligned} 0&\le n - c^*_2 - 2c^*_> \\ n- c^*_2&\le n - c^*_2 - 2c^*_> + n - c^*_2 \\ n - c^*_2&\le 2(n - c^*_2 - c^*_>). \end{aligned} \tag{1}$$

Therefore, we have

$$\begin{aligned} \frac{d_{D'}}{d_{D^*}}&= \frac{n - c'_2 - c'_>}{n - c^*_2 - c^*_>} \\&\le \frac{n - c^*_2 - c'_>}{n - c^*_2 - c^*_>} \end{aligned} \tag{2}$$

$$\begin{aligned}&\le \frac{n - c^*_2}{n - c^*_2 - c^*_>} \\&\le \frac{2 (n - c^*_2 - c^*_>)}{n - c^*_2 - c^*_>}\end{aligned} \tag{3}$$

$$\begin{aligned}&= 2, \end{aligned} \tag{4}$$

where (2) holds since \(c'_2 \ge c^*_2\), and (3) is true from (1). \(\square \)
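Inequality (1) can also be checked numerically. The sketch below is only a sanity check and not part of the proof: it enumerates small values of \(n\), \(c^*_2\) and \(c^*_>\) satisfying \(c^*_2 + 2c^*_> \le n\) and confirms that \(n - c^*_2 \le 2(n - c^*_2 - c^*_>)\) in every case. The function name `check_inequality_1` is hypothetical.

```python
def check_inequality_1(max_n: int) -> bool:
    """Check n - c2 <= 2 * (n - c2 - c_gt) whenever c2 + 2 * c_gt <= n."""
    for n in range(1, max_n + 1):
        for c2 in range(n + 1):
            # A cycle longer than 2 uses at least two adjacencies of each
            # genome, so c_gt is at most (n - c2) // 2.
            for c_gt in range((n - c2) // 2 + 1):
                if n - c2 > 2 * (n - c2 - c_gt):
                    return False
    return True
```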

### Minimum common string partition

The main result of this work relies on a restricted version of the minimum common string partition (mcsp) problem [6, 9], described briefly as follows.

Given a string *s*, a *partition* of *s* is a sequence \(\mathcal {S} = [\mathcal {S}_1, \mathcal {S}_2, \ldots , \mathcal {S}_m]\) of substrings called *blocks* whose concatenation is *s*, i.e., \(\mathcal {S}_1\mathcal {S}_2 \cdots \mathcal {S}_m = s\), and *m* is the *size* of \(\mathcal {S}\).

Two strings *s* and *t* are *balanced* if any character has the same number of occurrences in *s* and in *t*, disregarding signs. Given two balanced strings *s* and *t* and partitions \(\mathcal {S} = [\mathcal {S}_1, \ldots , \mathcal {S}_m]\) of *s* and \(\mathcal {T} = [\mathcal {T}_1, \ldots , \mathcal {T}_m]\) of *t*, the pair \((\mathcal {S}, \mathcal {T})\) is a *common partition* of *s* and *t* if there exists a permutation *f* on \(\{1, \ldots , m\}\) such that \(\mathcal {S}_i = \mathcal {T}_{f(i)}\) for each \(i = 1, \ldots , m\). The minimum common string partition problem (mcsp) is to find a common partition \((\mathcal {S},\mathcal {T})\) of two balanced strings *s* and *t* with minimum size.
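The definition of a common partition can be made concrete with a small checker. This is a minimal sketch under stated assumptions: signed strings are encoded as sequences of nonzero integers (a negative integer stands for a reversed symbol), and a block is allowed to match either an identical block or its reverse, as in the signed setting used later in this work. The names `reverse_block`, `canonical` and `is_common_partition` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Block = Tuple[int, ...]  # signed symbols: 3 is a symbol, -3 its reversed copy


def reverse_block(block: Sequence[int]) -> Block:
    """Reverse the order of a block and flip the sign of every symbol."""
    return tuple(-x for x in reversed(block))


def canonical(block: Sequence[int]) -> Block:
    """Pick one fixed representative among a block and its reverse."""
    b = tuple(block)
    return min(b, reverse_block(b))


def is_common_partition(s: Sequence[int], t: Sequence[int],
                        part_s: List[Block], part_t: List[Block]) -> bool:
    # Each partition must concatenate back to its own string.
    if [x for b in part_s for x in b] != list(s):
        return False
    if [x for b in part_t for x in b] != list(t):
        return False
    # A permutation f with S_i = T_f(i) exists iff the block multisets agree.
    return (Counter(canonical(b) for b in part_s)
            == Counter(canonical(b) for b in part_t))
```

With the encoding \(a = 1\), \(b = 2\), the strings `[-2, -1, 1, 1, 2]` and `[1, 2, 1, 2, 1]` admit the common partition `[(-2, -1), (1,), (1, 2)]` and `[(1, 2), (1, 2), (1,)]` of size 3.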

We are interested in a restricted version of mcsp:

###
**Problem**

*k*-mcsp(*s*, *t*): Given two balanced strings *s* and *t* such that the number of occurrences of any character in *s* and *t* is bounded by *k*, find a common partition \((\mathcal {S},\mathcal {T})\) of *s* and *t* with minimum size.

Now let \(\textit{occ}(A) = \max _{g \in \mathcal {G}(A)} \{m_A(g)\}\) be the maximum number of occurrences of any gene in a genome *A*. If two genomes *A* and *B* are balanced, we have \(\textit{occ}(A) = \textit{occ}(B)\). For simplicity, in this case we use only \(\textit{occ}\).

For a given linear unichromosomal genome *A*, let the *index-free* string \(\widehat{A}\) be the gene sequence of the chromosome of *A* ignoring telomeres and gene indices. For example, for genome \(A = (\circ \;c_1\;\overline{a}_1\;d_1\;b_1\;c_2\;c_3\;\circ )\), we have \(\widehat{A}= c\overline{a}dbcc\).
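The two notions above can be sketched in a few lines. The representation of a gene as a tuple `(family, index, reversed)` and the function names `index_free` and `occ` are assumptions of this example; telomeres are simply not stored in the list.

```python
from collections import Counter
from typing import List, Tuple

Gene = Tuple[str, int, bool]  # (family, index, is_reversed), hypothetical encoding


def index_free(chromosome: List[Gene]) -> List[str]:
    """Drop gene indices; a leading '-' marks a reversed gene."""
    return [("-" if rev else "") + fam for fam, _idx, rev in chromosome]


def occ(chromosome: List[Gene]) -> int:
    """Maximum number of occurrences of any gene family."""
    return max(Counter(fam for fam, _idx, _rev in chromosome).values())


# Genome A = (o c1 -a1 d1 b1 c2 c3 o) from the example in the text.
A = [("c", 1, False), ("a", 1, True), ("d", 1, False),
     ("b", 1, False), ("c", 2, False), ("c", 3, False)]
```

Here `index_free(A)` yields the sequence *c*, \(\overline{a}\), *d*, *b*, *c*, *c* (with \(\overline{a}\) written as `-a`), and `occ(A)` is 3 because family *c* has three copies.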

### Finding consistent decompositions

In this section we present a linear time approximation algorithm Consistent-Decomposition, which receives two linear unichromosomal balanced genomes *A* and *B* with \(\textit{occ}= k\) and returns a consistent decomposition of \(\textit{AG}(A,B)\), which is an *O*(*k*)-approximation for the DCJ distance. The main steps of Consistent-Decomposition can be briefly described as follows.

First, from the input genomes *A* and *B*, we build their adjacency graph \(\textit{AG}(A,B)\). We can then find a consistent decomposition by computing an approximation for *k*-mcsp(\(\widehat{A},\widehat{B}\)), which gives an approximation for the number of breakpoints between genomes *A* and *B*. After that we remove the chosen cycles of length 2 from \(\textit{AG}(A,B)\). Next, we iteratively collect arbitrary cycles of length longer than 2, removing them from the remaining graph after each iteration. Finally, we return the set of collected cycles as a consistent decomposition of \(\textit{AG}(A,B)\). Pseudocode of Consistent-Decomposition is given in Algorithm 1. The individual steps are detailed in the following.

Step 1 of Consistent-Decomposition consists of building the adjacency graph of the given balanced genomes *A* and *B* as described previously. After that, Step 2 collects cycles of length 2 of \(\textit{AG}(A,B)\) using an *O*(*k*)-approximation algorithm for *k*-mcsp(\(\widehat{A},\widehat{B}\)) [9]. Step 3 removes from \(\textit{AG}(A,B)\) vertices covered by cycles in \(\mathcal {C}_2\) and edges incompatible with edges of cycles in \(\mathcal {C}_2\).

Step 4 constructs the set \(\mathcal {C}_>\) by decomposing the remaining graph into consistent cycles. Iteratively, it chooses a consistent cycle *C* and then removes from the remaining graph vertices covered by *C*. To find *C*, it can start with an empty path, choose some edge *e* from the remaining graph that extends the path and then remove from the remaining graph edges incompatible with *e* (just inspecting edges incident to vertices which are adjacent to *e* and to its sibling), repeating both edge selection and removal steps until the cycle is closed (it is easy to verify that this procedure will always close a consistent cycle). Hence the algorithm does not form an inconsistent cycle nor choose an inconsistent set of cycles. Further, this guarantees that for every edge in the decomposition, its sibling edge will also be in the decomposition. Note that \(\mathcal {C}_>\) may contain cycles of length 2 not collected in \(\mathcal {C}_2\).

A consistent decomposition of \(\textit{AG}(A,B)\) is then the set \(\mathcal {C}_2 \cup \mathcal {C}_>\), which is returned in Step 5.

To conclude this section, we present the following result which, together with the *O*(*k*)-approximation algorithm for *k*-mcsp from [9], establishes an approximation factor for DCJ-distance.

###
**Theorem 3**

*Let A and B be linear unichromosomal balanced genomes such that* \(\textit{occ}= k.\) *Let* \((\mathcal {A}, \mathcal {B})\) *be a common string partition with approximation factor* *O*(*k*) *for* *k*-mcsp(\(\widehat{A},\widehat{B}\)). *A consistent decomposition D of* \(\textit{AG}(A, B)\), *containing cycles of length 2 reflecting preserved adjacencies in* \((\mathcal {A}, \mathcal {B})\), *is an* *O*(*k*)*-approximation for the* DCJ-distance *problem.*

###
*Proof*

Let \(c_2^*\) and \(c_>^*\) be the number of cycles of length 2 and longer than 2, respectively, of an optimal consistent decomposition \(D^*\) of \(\textit{AG}(A, B)\). Let \(\mathcal {C}_2\) be the set of cycles of length 2 reflecting preserved adjacencies in \((\mathcal {A}, \mathcal {B})\), and let \(\mathcal {C}_>\) be an arbitrary consistent decomposition of cycles in \(\textit{AG}(A, B) \setminus \mathcal {C}_2\). Let \(D = \mathcal {C}_2 \cup \mathcal {C}_>\), a consistent decomposition, \(c_2 = |\mathcal {C}_2|\), and \(c_> = |\mathcal {C}_>|\). Since \((\mathcal {A}, \mathcal {B})\) is an *O*(*k*)-approximation of *k*-mcsp, it follows that \(n - c_2 \le \ell (n - c_2')\), where \(\ell = O(k)\) and \(c_2'\) is the number of cycles of length 2 in a consistent decomposition \(D'\) with maximum number of cycles of length 2. Hence,

$$\begin{aligned} \frac{d_{D}}{d_{D^*}}&=\frac{n - c_2 - c_>}{n - c^*_2 - c_>^*} \\&\le \frac{\ell \,(n - c'_2) - c_>}{n - c^*_2 - c_>^*} \\&\le \frac{\ell \,(n - c'_2)}{n - c^*_2 - c_>^*} \\&\le 2\ell \left( \frac{n - c_2' - c_>'}{n - c^*_2 - c_>^*}\right) \end{aligned} \tag{5}$$

$$\begin{aligned}&\le 4\ell , \end{aligned} \tag{6}$$

where (5) is analogous to (1) and (6) holds from (4), both in the proof of Lemma 2. \(\square \)

### Running time

Prior to addressing the running time of Consistent-Decomposition, we must consider one implicit but important step in the algorithm, which is to obtain the set \(\mathcal {C}_2\) given the output of the *k*-mcsp approximation algorithm [9]. This algorithm takes as input \(\widehat{A}\) and \(\widehat{B}\) and outputs a common string partition \((\mathcal {A}, \mathcal {B})\).

Both \(\mathcal {A}\) and \(\mathcal {B}\) are composed of the same set of substrings, in different orders and possibly reversed, e.g., \(\mathcal {A} = [\overline{ba}, a, ab]\) and \(\mathcal {B} = [ab, ab, a]\) for index-free strings \(\widehat{A}= \overline{ba}aab\) and \(\widehat{B}= ababa\). Each substring of length \(l > 1\) in \(\mathcal {A}\) and \(\mathcal {B}\) induces a sequence of \(l - 1\) preserved adjacencies in \(\widehat{A}\) and \(\widehat{B}\). Then we just have to map each substring in \(\mathcal {A}\) to the same substring in \(\mathcal {B}\) (in case of multiple occurrences, we choose any of them). Considering \(\mathcal {A}\) and \(\mathcal {B}\) in the example above, *ab* and \(\overline{ba}\) in \(\mathcal {A}\) could be mapped to the first and second occurrences of *ab* in \(\mathcal {B}\), respectively, since both *ab* and \(\overline{ba}\) contain exactly the same preserved adjacency \(a^hb^t\). In the next paragraphs we describe in detail the algorithm Substring-mapping (Algorithm 2) and how to use it to find such a mapping while preserving the linear time complexity of Consistent-Decomposition.
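The claim that *ab* and \(\overline{ba}\) contain the same preserved adjacency can be illustrated with a short sketch. It assumes a hypothetical encoding in which a block is a list of strings such as `"a"` or `"-a"` (reversed), and in which a gene read in forward direction presents its tail first and its head last; the function names are likewise assumptions.

```python
from typing import List, Tuple


def extremities(sym: str) -> Tuple[str, str]:
    """Return (left, right) extremities of a symbol as it is read.

    A forward gene 'a' is read tail-to-head; a reversed gene '-a' is read
    head-to-tail.
    """
    fam = sym.lstrip("-")
    if sym.startswith("-"):
        return fam + "h", fam + "t"
    return fam + "t", fam + "h"


def preserved_adjacencies(block: List[str]) -> List[Tuple[str, str]]:
    """List the l - 1 adjacencies induced by a block of length l."""
    adjacencies = []
    for x, y in zip(block, block[1:]):
        # An adjacency is an unordered pair, so sort its two extremities.
        adjacencies.append(tuple(sorted((extremities(x)[1], extremities(y)[0]))))
    return adjacencies
```

Both `preserved_adjacencies(["a", "b"])` and `preserved_adjacencies(["-b", "-a"])` yield the single adjacency `("ah", "bt")`, while a block of length 1 induces none.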

The nontriviality of finding such a mapping in linear time comes from the fact that alphabets of strings representing genomes are not constant-size alphabets. They can and most likely will be of size *O*(*n*).

Before describing the algorithm, some observations and preprocessing must be addressed. We assume that the value *v*(*g*) of each symbol (gene family) *g* in the alphabet \(\mathcal {G}\) is unique and in the range [1, *n*]. For reversed symbols we define \(v(\overline{g}) = v(g) + n\), therefore their values will be in the range \([n+1,2n]\). Given different strings \(s = s_1,\ldots ,s_\ell \) and \(t = t_1,\ldots ,t_\ell \) of the same length \(\ell \) such that *i* is the first position in which they differ, *s* is *lexicographically smaller* than *t* if \(v(s_i) < v(t_i)\). (Note that \(v(g) < v(\overline{g})\), therefore *g* comes before \(\overline{g}\) lexicographically for any symbol *g*.)

As preprocessing, we first create normalized versions \(\widetilde{\mathcal {A}}\) of \(\mathcal {A}\) and \(\widetilde{\mathcal {B}}\) of \(\mathcal {B}\), to ensure that for any substring *s*, only *s* or only its reverse \(\overline{s}\) occurs in \(\widetilde{\mathcal {A}} \cup \widetilde{\mathcal {B}}\). Therefore, for each string *s* in \(\mathcal {A}\) (respectively \(\mathcal {B}\)), the normalized partition \(\widetilde{\mathcal {A}}\) (respectively \(\widetilde{\mathcal {B}}\)) contains *s* itself, if *s* is lexicographically smaller than \(\overline{s}\), and \(\overline{s}\) otherwise. For instance, normalizing \(\mathcal {A} = [\overline{ba}, a, ab]\) would change it to \(\widetilde{\mathcal {A}} = [ab, a, ab]\).

Also as a preprocessing step, given that we must find the same substrings in \(\mathcal {A}\) and \(\mathcal {B}\), it only makes sense to compare substrings of the same length. Thus, if there are substrings of multiple lengths in \(\widetilde{\mathcal {A}}\) and \(\widetilde{\mathcal {B}}\), in one pass through them (i.e. in linear time) we can gather substrings of the same length in buckets. We define multisets \(\widetilde{\mathcal {A}}_l = \{s\text { in }\widetilde{\mathcal {A}} : |s| = l\}\) (analogously \(\widetilde{\mathcal {B}}_l\)) and the generic bucket (multiset) \(\widetilde{\mathcal {AB}}_l = \widetilde{\mathcal {A}}_l \cup \widetilde{\mathcal {B}}_l\) (also recording in some manner whether a string in \(\widetilde{\mathcal {AB}}_l\) comes from \(\mathcal {A}\) or \(\mathcal {B}\)), and run the algorithm Substring-mapping for each bucket \(\widetilde{\mathcal {AB}}_l\). See Fig. 3 for an example of this preprocessing step.
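The preprocessing can be sketched as follows. This is an illustration under assumptions, not the implementation used in the experiments: symbols are encoded as integers in [1, *n*] with a negative sign for reversed symbols, blocks are tuples, and the names `value`, `normalize` and `buckets_by_length` are hypothetical. Each bucket entry records the normalized block, its origin and its position in the partition.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, ...]
Entry = Tuple[Block, str, int]  # (normalized block, origin "A"/"B", position)


def value(sym: int, n: int) -> int:
    """v(g) for a forward symbol and v(g) + n for a reversed one."""
    return sym if sym > 0 else -sym + n


def normalize(block: Sequence[int], n: int) -> Block:
    """Return the lexicographically smaller of a block and its reverse."""
    fwd = tuple(block)
    rev = tuple(-x for x in reversed(block))
    if [value(x, n) for x in fwd] <= [value(x, n) for x in rev]:
        return fwd
    return rev


def buckets_by_length(part_a: List[Block], part_b: List[Block],
                      n: int) -> Dict[int, List[Entry]]:
    """Gather normalized blocks of both partitions by length in one pass."""
    buckets: Dict[int, List[Entry]] = defaultdict(list)
    for origin, part in (("A", part_a), ("B", part_b)):
        for pos, block in enumerate(part):
            norm = normalize(block, n)
            buckets[len(norm)].append((norm, origin, pos))
    return dict(buckets)
```

With \(a = 1\), \(b = 2\) and \(n = 5\), the partition `[(-2, -1), (1,), (1, 2)]` is normalized to `[(1, 2), (1,), (1, 2)]`, matching the example \(\widetilde{\mathcal {A}} = [ab, a, ab]\) above.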

The main idea of the algorithm Substring-mapping is as follows: given a set of strings of length *l*, for each *i* from 1 to *l* it obtains a set of buckets, each one containing strings that are equal up to the *i*th symbol, by splitting the buckets whose strings are equal up to the \((i-1)\)st symbol. At the end, each bucket holds equal strings and we just have to map them taking into account their origin, \(\mathcal {A}\) or \(\mathcal {B}\). See an example in Fig. 4. Of course, instead of working with the substrings themselves we work just with references.
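The bucket-splitting idea can be sketched as below. This is not Algorithm 2 itself: it replaces the preallocated array of buckets by Python dictionaries (so the time bound holds only in expectation), and the input format, a list of entries `(block, origin, position)` with blocks of equal length, as well as the name `substring_mapping`, are assumptions of this example.

```python
from collections import defaultdict
from typing import List, Tuple

Entry = Tuple[Tuple[int, ...], str, int]  # (block, origin "A"/"B", position)


def substring_mapping(bucket: List[Entry]) -> List[Tuple[int, int]]:
    """Pair equal blocks of A and B; returns (position in A, position in B)."""
    if not bucket:
        return []
    length = len(bucket[0][0])
    # Work with references (indices into bucket), not copies of the strings.
    groups = [list(range(len(bucket)))]
    for i in range(length):
        refined = []
        for group in groups:
            # Split a group whose strings agree up to symbol i - 1
            # according to their i-th symbol.
            split = defaultdict(list)
            for ref in group:
                split[bucket[ref][0][i]].append(ref)
            refined.extend(split.values())
        groups = refined
    pairs = []
    for group in groups:
        # Every final group holds equal strings; match them by origin.
        from_a = [bucket[r][2] for r in group if bucket[r][1] == "A"]
        from_b = [bucket[r][2] for r in group if bucket[r][1] == "B"]
        pairs.extend(zip(from_a, from_b))
    return pairs
```

Each string is inspected once per position, so the work is proportional to the sum of the lengths of the strings in the bucket.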

We shall demonstrate in the following lemma that this implicit mapping step can be performed in *O*(*n*) time:

###
**Lemma 4**

*The running time of* Substring-mapping *is proportional to the sum of lengths of strings in* \(\widetilde{\mathcal {AB}}_l\), *for some* *l*.

###
*Proof*

Operations in lines 5, 7 and 8 can be done in constant time and are performed at most once per symbol of strings in \(\widetilde{\mathcal {AB}}_l\). Operations in line 9 are performed *O*(1) times for each string in \(\widetilde{\mathcal {AB}}_l\). Therefore, the total running time of Substring-mapping is \(O(\sum _{s \in \widetilde{\mathcal {AB}}_l} |s|)\). \(\square \)

Since the buckets \(\widetilde{\mathcal {AB}}_l\) are disjoint, we also have:

###
**Lemma 5**

*The set* \(\mathcal {C}_2\) *can be obtained from the output of the* *k*-mcsp *approximation algorithm in* *O*(*n*) *time.*

###
*Proof*

Let \(\widetilde{\mathcal {S}} = \{\widetilde{\mathcal {AB}}_l {:}\) there exists at least one substring of length *l* in \(\widetilde{\mathcal {A}}\) (and therefore also in \(\widetilde{\mathcal {B}}\))\(\}\). To obtain \(\mathcal {C}_2\), we must call Substring-mapping for each \(\widetilde{\mathcal {AB}}_l \in \widetilde{\mathcal {S}}\), as noted before. The time complexity is the sum of time spent in all calls plus some extra preprocessing time. It is easy to see that \(\widetilde{\mathcal {S}}\) can be obtained in one pass through \(\widetilde{\mathcal {A}}\) and \(\widetilde{\mathcal {B}}\), therefore in linear time. The array of buckets \(w_{1.. 2n}\) can be defined in linear time once before calling Substring-mapping the first time and the buckets are empty at the end of each call. Finally, by Lemma 4 the running time of Substring-mapping for some \(\widetilde{\mathcal {AB}}_l\) is linear in the sum of lengths of strings in \(\widetilde{\mathcal {AB}}_l\), and the total sum of the lengths of strings in buckets \(\widetilde{\mathcal {AB}}_l \in \widetilde{\mathcal {S}}\) is 2*n* (each substring of \(\widetilde{\mathcal {A}}\) or \(\widetilde{\mathcal {B}}\) appears once in exactly one \(\widetilde{\mathcal {AB}}_l\)). Hence, the total time complexity is *O*(*n*). \(\square \)

Having established the running time of the implicit step of obtaining \(\mathcal {C}_2\) from the output of the *k*-mcsp approximation algorithm, we can now analyze the complexity of Consistent-Decomposition.

###
**Theorem 6**

*Given linear unichromosomal balanced genomes* *A* *and* *B* *such that* \(|A| = |B| = n\) *and* \(\textit{occ}= k\), *the running time of algorithm* Consistent-Decomposition *is linear in the size of the genomes, i.e.,* *O*(*n*).

###
*Proof*

First, note that \(\textit{AG}(A,B)\) is a bipartite graph composed of \(2 (n + 1)\) vertices and at most \(2kn + 4\) edges. This worst case occurs if there are \(\lfloor {}n/k\rfloor \) gene families of size *k*, yielding \(2k^2\) edges each (\(k^2\) for the gene heads and \(k^2\) for the gene tails), thus 2*kn* edges in total; plus 4 edges from the capping. Therefore, assuming *k* is a constant, \(\textit{AG}(A,B)\) is of size *O*(*n*).
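The counting argument above can be reproduced for arbitrary family sizes. The sketch below follows the same counting (a family with *m* copies contributes \(m^2\) edges for gene heads and \(m^2\) for gene tails, plus 4 capping edges) and compares the result with the bound \(2kn + 4\); the function name `edge_count_and_bound` and its input format are hypothetical.

```python
from typing import List, Tuple


def edge_count_and_bound(family_sizes: List[int], k: int) -> Tuple[int, int]:
    """Return (edges counted as in the argument above, bound 2kn + 4)."""
    if any(m > k for m in family_sizes):
        raise ValueError("every family must have at most k copies")
    n = sum(family_sizes)  # number of genes in each genome
    # m^2 head edges and m^2 tail edges per family, plus 4 capping edges.
    edges = sum(2 * m * m for m in family_sizes) + 4
    return edges, 2 * k * n + 4
```

For example, three families of size 3 (so \(n = 9\), \(k = 3\)) give 58 edges, which meets the bound exactly, whereas adding a singleton family gives 60 edges against a bound of 64.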

It is easy to see that Step 1 of Algorithm 1 has linear running time with respect to the size of \(\textit{AG}(A,B)\), i.e. *O*(*n*). Computing the *k*-mcsp approximation [9] in Step 2 (with suffix trees for integer alphabets [13]) takes *O*(*n*) time. The same holds for the implicit step described above. The running time of Step 3 is *O*(*n*) since we just have to traverse vertices and edges of the remaining adjacency graph. Step 4 consists of collecting cycles arbitrarily and its running time is also linear, since we just have to walk in the remaining graph finding cycles and this can be done looking at each edge and each vertex at most *O*(1) times. The last step (Step 5) has running time *O*(1). Therefore, Consistent-Decomposition has running time *O*(*n*). \(\square \)

### Extending to circular unichromosomal genomes

Meidanis et al. [14] showed that the problem of calculating the reversal distance for signed circular chromosomes without duplicate genes is essentially equivalent to the analogous problem for linear chromosomes (similar for transpositions in the unsigned case [15]). Therefore, any algorithm for the latter works for the former. The main idea is that each reversal on some region of a circular chromosome can be performed in two ways: reversing it directly or reversing everything else (Fig. 5).

In the following we show that similar ideas can also be applied to genomes with duplicate genes.

Let *A* and *B* be circular unichromosomal balanced genomes such that \(\textit{occ}= k\). For some gene family *g*, there are in both *A* and *B* genes \(g_1, g_2, \ldots , g_l\) with \(l \le k\). Gene \(g_1\) of *A* can be associated with *l* genes of *B*. We linearize *A* having \(g_1\) with positive sign in the first position and linearize *B* *l* times, each one of them having one of the genes \(g_1, g_2, \ldots , g_l\) with positive sign in the first position, associating it with \(g_1\) (and assuming that both already are in the correct position). Next, we run Consistent-Decomposition on each one of the *l* linearizations, taking into account only the sequence of genes from position 2 to position *n*, keeping the best result. Thus, the running time of this strategy is \(l \cdot O(n)\), that is, *O*(*n*) since \(l \le k = \text {const}\).
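The linearization step can be sketched as follows, assuming that a circular chromosome is stored as a list of signed integers (the absolute value is the gene family, a negative sign marks a reversed gene) read in an arbitrary starting position and direction. The names `linearize` and `linearizations` are hypothetical.

```python
from typing import List, Tuple


def linearize(circ: List[int], pos: int) -> List[int]:
    """Rotate (and flip if needed) a circular chromosome so that the gene at
    index pos comes first with positive sign."""
    if circ[pos] < 0:
        # Read the circle in the opposite direction: reverse order, flip signs.
        circ = [-x for x in reversed(circ)]
        pos = len(circ) - 1 - pos
    return circ[pos:] + circ[:pos]


def linearizations(a: List[int], b: List[int],
                   family: int) -> Tuple[List[int], List[List[int]]]:
    """One linearization of a and l linearizations of b for a gene family."""
    first_a = next(i for i, x in enumerate(a) if abs(x) == family)
    lin_a = linearize(a, first_a)
    lins_b = [linearize(b, i) for i, x in enumerate(b) if abs(x) == family]
    return lin_a, lins_b
```

Each pair formed by `lin_a` and one element of `lins_b`, with the first position dropped, would then be given to Consistent-Decomposition, and the smallest resulting distance kept.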

###
**Corollary 7**

*For circular unichromosomal genomes* *A* *and* *B*, *the strategy of keeping the minimum output of* Consistent-Decomposition *for one linearization of* *A* *and* *l* *linearizations of* *B* *as described above leads to an* *O*(*k*)*-approximation for problem* DCJ-distance.

###
*Proof*

Let *d* be the DCJ distance between *A* and *B* and let \(g_c\) be the copy of gene *g* in *B* that is associated with \(g_1\) in *A* in a gene association yielding *d*. One of the *l* linearizations of *B* associates \(g_c\) in *B* with \(g_1\) in *A*, approximating *d* with an *O*(*k*) factor by the Consistent-Decomposition algorithm. Clearly, the minimum output of all *l* linearizations will not be higher. \(\square \)

### Experimental results

We have implemented our approximation algorithm in C++, with the addition of a linear time greedy heuristic for the decomposition of cycles not induced by the *k*-mcsp approximation (available at https://git.facom.ufms.br/diego/k-dcj).

We compare our algorithm with Shao et al.’s ILP [4] (GREDU software package) on simulated datasets. Given two genomes, the ILP based experiments first build the adjacency graph, followed by capping of the telomeres, fixing some safe cycles of length two, and finally invoking an ILP solver to obtain an optimal solution with a time limit of 2 h. The experiments for both approaches were performed on an Intel i7 3.4GHz (4 cores) machine.

Following [4], we simulate artificial genomes with segmental duplications and DCJs. We uniformly select a position to start duplicating a segment of the genome and place the new copy at a new position. From a genome of *s* distinct genes, we generate an ancestor genome with 1.5*s* genes by randomly performing \(s/(2l)\) segmental duplications of length *l*, resulting in an average \(k = 1.5\). Then we simulate two extant genomes from the ancestor by randomly performing *r* DCJs (reversals) independently. Thus, the simulated evolutionary distance between the two extant genomes is 2*r*. For each gene copy in the extant genomes we keep track of which gene copy in the ancestor it corresponds to, establishing the *reference bijection*, allowing us to compute the *true positive rate*, that is, for two genomes *A* and *B*, the rate of matchings of gene occurrences in *A* and *B* corresponding to the same gene occurrence in the ancestor genome.
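A simulation of this kind can be sketched as follows. This is a simplified illustration and not the simulator used for the experiments: it does not track the reference bijection, the insertion position of a copy is chosen uniformly over the whole current genome, and the function names `simulate_ancestor` and `apply_reversals` are hypothetical.

```python
import random
from typing import List


def simulate_ancestor(s: int, l: int, rng: random.Random) -> List[int]:
    """Start from s distinct genes and perform s // (2 * l) segmental
    duplications of length l, giving about 1.5 * s genes."""
    genome = list(range(1, s + 1))
    for _ in range(s // (2 * l)):
        start = rng.randrange(len(genome) - l + 1)  # segment to copy
        segment = genome[start:start + l]
        target = rng.randrange(len(genome) + 1)     # where the copy goes
        genome[target:target] = segment
    return genome


def apply_reversals(genome: List[int], r: int, rng: random.Random) -> List[int]:
    """Apply r random reversals (a special case of DCJ) to a copy of genome."""
    g = list(genome)
    for _ in range(r):
        i, j = sorted(rng.sample(range(len(g) + 1), 2))
        g[i:j] = [-x for x in reversed(g[i:j])]     # reverse order, flip signs
    return g
```

Two extant genomes would be obtained by calling `apply_reversals` twice on the same ancestor, each time with *r* reversals.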

We first set \(s = 1000\), test three different lengths for segmental duplications (\(l = 1, 2, 5\)) and vary the *r* value over the range \(200, 220, \ldots , 500\). We also simulate genomes having \(s = 5000\), \(l = 1, 2, 5, 10\) and *r* over the range \(1000, 1100, \ldots , 2000\). Figures 6 and 9 show the average difference “*computed number of DCJs minus simulated evolutionary distance*”, taking as input three pairs of genomes for each combination of *l* and *r*; Figs. 7 and 10 show the *true positive rate*, and Figs. 8 and 11 show the average running times. Note that, although the DCJ distance is unknown, it is always less than or equal to the simulated evolutionary distance for these artificial genome pairs.

The difference of the number of DCJs (blue lines in Figs. 6, 9) calculated by our approximation algorithm remains very close to the simulated evolutionary distance for small values of *l*. Moreover, it remains roughly the same for the same value of *l* even for greater values of *r*. The values obtained by the ILP approach (red lines in Figs. 6, 9) are very close to those obtained by the approximation algorithm and to the simulated evolutionary distance from the simulations for \(l \le 2\) and smaller values of *r*. However, beyond some point the DCJ distance calculated by the ILP gets even lower than the simulated evolutionary distance, showing the limitations of parsimony for larger distance ranges.

While the true positive rate is higher than 95% for most datasets (Figs. 7, 10), the rate remains between 75 and 85% when \(l \ge 5\) for the approximation approach and even for the ILP approach in some cases. For \(s = 5000\) and \(l \ge 5\), the computed number of DCJs increases while the true positive rate decreases significantly beyond some point for the ILP results. Notice that the approximation algorithm results for the same sets have small rates of increase or decrease, even for greater values of *r*.

The running time of our implementation of Consistent-Decomposition increases slowly from \(\approx \)0.03 s (\(2r = 400\)) to \(\approx \)0.08 s (\(2r = 1000\)) on average, when \(s = 1000\), see Fig. 8a. The ILP approach takes \(\approx \)0.3 s for smaller values of *r* (where the preprocessing step fixes a considerable amount of cycles of length 2 in the adjacency graph), while always reaching the time limit of 2 h beyond some point, see Fig. 8b. A similar behavior is observed for \(s = 5000\) (Fig. 11).