Efficient edit distance with duplications and contractions

We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements that apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities than previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O(|Σ|MP(n)) and O(|Σ|n^2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP(n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n) = O(n^3 log^3 log n / log^2 n), due to an algorithm by Chan). For integer cost functions, the running time is further improved to O(|Σ|n^3 / log^2 n). In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are taken from a finite integer interval D. Choosing a constant 1/log_{|D|} n < λ < 1, the algorithm preprocesses an n × n matrix in O(n^{2+λ} / |D|) time and O(n^{2+λ} / (|D| λ^2 log^2_{|D|} n)) space. Then, it may multiply the matrix with any given n-length vector in O(n^2 / (λ^2 log^2_{|D|} n)) time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest O(n^3 / log^2 n) time algorithm for single-strand RNA folding with discrete cost functions.
Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O(|Σ|n MP(ñ)/ñ) time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.


Background
Comparing strings is a well-studied problem in computer science as well as in bioinformatics. Traditionally, string similarity is measured in terms of edit distance, which reflects the minimum-cost edit of one string to the other, based on the edit operations of substitutions (including matches) and deletions/insertions (indels). In this paper, we address the problem of string edit distance with the additional operations of duplications and contractions, motivated by the comparison of minisatellites: genomic regions composed of tandem repetitions of short motifs, whose composition varies among individuals in a population. Therefore, pairwise comparisons of minisatellites are typically applied in studying the evolution of populations.
A minisatellite map represents a minisatellite region, where each motif is encoded by a letter and handled as one entity (denoted unit). When comparing minisatellite maps, one has to consider that regions of the map have arisen as a result of duplication events from the neighboring units. The single copy duplication model, where only one unit can duplicate at a time, is the most popular and its biological validation was asserted for the MSY1 minisatellites [1,2]. According to this model, one unit can mutate to another unit via an indel or a mutation of a single nucleotide within it. Also, a unit can be duplicated, that is, an additional copy of the unit may appear next to the original one in the map (tandem repeat). Thus, when comparing minisatellite maps, two additional operations are considered: unit duplication and unit contraction.

The EDDC problem definition
The single copy duplication model of minisatellite maps gave rise to a new variant of the string edit distance problem, Edit Distance with Duplications and Contractions (EDDC), which allows five edit operations: insertion, deletion, mutation, duplication and contraction.
We start with some string notations. Let s be a string. Denote by s_i the i-th letter in s, starting at index 0, and by s_{i,j} the substring s_i s_{i+1} . . . s_{j−1} of s. A substring of the form s_{i,i} is an empty string, which will be denoted by ε. We use superscripts to denote substrings without an explicit indication of their start and end positions, and write e.g. s = s^a s^b to indicate that s is a concatenation of the two substrings s^a and s^b.
In the edit distance problem, one is given a source string s and a target string t over a finite alphabet Σ. An edit script from s to t is a series of strings ES = (s = u_0, u_1, u_2, . . . , u_r = t), where each intermediate string u_i is obtained by applying a single edit operation to the preceding string u_{i−1}. In the standard problem definition [3][4][5], the allowed edit operations are insertion of a letter at some position in an intermediate string u_i, deletion of a letter in u_i, and mutation of a letter in u_i to another letter. The single-copy EDDC problem variant adds two operations: duplication - inserting into u_i a letter in a position adjacent to a position that already contains the same letter, and contraction - deleting from u_i one copy of a letter where there are two consecutive copies of this letter. Denote by ins(α), dup(α) and del(α) the costs associated with the insertion, duplication and deletion operations applied to a letter α in the alphabet, respectively, by cont(α) the cost of contracting two consecutive occurrences of α into a single occurrence, and by mut(α, β) the cost of mutating α to a letter β. Define the cost of ES to be the summation of its implied operation costs, and the length |ES| = r of ES to be the number of operations performed in ES. Clearly, for every pair of strings s and t, there is some script transforming s to t, e.g. the script that first deletes all letters in s and then inserts all letters in t. An optimal edit script from s to t is one which has a minimum cost. The edit distance from s to t, denoted by ed(s, t), is the cost of an optimal edit script from s to t. The goal of the EDDC problem is, given strings s and t, to compute ed(s, t).
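Since all operation costs are nonnegative, ed(s, t) can be computed exactly on tiny instances by a shortest-path search over intermediate strings. The following sketch (ours, purely to illustrate the problem definition; its state space is exponential, so it is a sanity checker rather than an algorithm for the problem) treats strings as graph nodes and single edit operations as weighted edges:

```python
import heapq

def eddc_bruteforce(s, t, sigma, ins, dele, dup, cont, mut, max_len=None):
    """Exact EDDC on tiny inputs via Dijkstra over intermediate strings."""
    if max_len is None:
        # Heuristic cap on intermediate string length; assumes optimal
        # scripts never grow far beyond the inputs (true for small tests).
        max_len = len(s) + len(t) + 2
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        succ = []
        if len(u) < max_len:
            for i in range(len(u) + 1):          # insertion of any letter
                for a in sigma:
                    succ.append((ins[a], u[:i] + a + u[i:]))
            for i, a in enumerate(u):            # duplication: copy next to original
                succ.append((dup[a], u[:i] + a + u[i:]))
        for i, a in enumerate(u):
            succ.append((dele[a], u[:i] + u[i + 1:]))            # deletion
            for b in sigma:                                       # mutation
                if b != a:
                    succ.append((mut[a, b], u[:i] + b + u[i + 1:]))
            if i + 1 < len(u) and u[i + 1] == a:                  # contraction
                succ.append((cont[a], u[:i] + u[i + 1:]))
        for c, v in succ:
            if d + c < dist.get(v, float('inf')):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return float('inf')
```

For instance, with duplication cheaper than insertion, transforming ε into "aa" optimally uses one insertion followed by one duplication rather than two insertions.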
Property 1. The cost function satisfies mut(α, β) = ed(α, β), ins(α) = ed(ε, α), del(α) = ed(α, ε), dup(α) = ed(α, αα), and cont(α) = ed(αα, α), for every α, β ∈ Σ.

This assumption can be made, since in case one of the operation costs violates the assumption, such an operation can always be replaced by a series of operations that induces the same modification at a lower cost. For example, it cannot be that mut(α, β) < ed(α, β), since ed(α, β) is smaller than or equal to the cost of any script from α to β, among which is the script containing the single mutation operation of α into β. Moreover, if mut(α, β) > ed(α, β), then we can always replace any mutation of α into β by a series of operations that transforms α into β at cost ed(α, β). In this case, we may simply assume that mut(α, β) = ed(α, β), and interpret any such mutation appearing in a script as being implemented by the corresponding series of operations. In particular, Property 1 implies that mut(α, α) = 0 (since all operation costs are nonnegative, ed(w, w) = 0 for every string w), dup(α) ≤ ins(α) (since the cost of the script from α to αα that applies a single insertion of α is ins(α) ≥ ed(α, αα) = dup(α)), cont(α) ≤ del(α), and mut(α, β) ≤ mut(α, γ) + mut(γ, β) for every γ ∈ Σ.
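One of these consequences, the triangle inequality for mutation costs, amounts to an all-pairs shortest-path closure over the alphabet. A minimal sketch (the function name and dictionary representation are ours; this covers only the mutation-cost part of the normalization, not the indel/duplication/contraction interactions):

```python
def close_mut(sigma, mut):
    """Tighten mutation costs so that mut(a,b) <= mut(a,g) + mut(g,b) for
    every intermediate letter g, and mut(a,a) = 0, via Floyd-Warshall."""
    m = {(a, b): 0 if a == b else mut[a, b] for a in sigma for b in sigma}
    for g in sigma:          # intermediate letter
        for a in sigma:
            for b in sigma:
                if m[a, g] + m[g, b] < m[a, b]:
                    m[a, b] = m[a, g] + m[g, b]
    return m
```

For example, if mutating a to b directly costs 5 but a to c and c to b each cost 1, the closure lowers mut(a, b) to 2.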
Insertions and duplications are considered to be generating operations, increasing by one letter the length of the string. Similarly, deletions and contractions are considered to be reducing operations, decreasing by one letter the length of the string. An edit script containing no reducing operation is called a non-reducing script, and an edit script containing no generating operation is called a non-generating script.

Previous work
The EDDC problem was first defined by Bérard and Rivals [2], who suggested an O(n^4) time and O(n^3) space algorithm for the problem, where n is the length of the two input strings (for the sake of simplicity, we assume that both strings are of the same length). This was followed by the work of Behzadi and Steyaert [6], who gave an O(|Σ|n^3) time and O(|Σ|n^2) space algorithm for the problem, where |Σ| is the alphabet size (typically a few tens of unique units). Behzadi and Steyaert [7] improved their algorithm's complexity, based on run-length encoding, to O(n^2 + nñ^2 + |Σ|ñ^3 + |Σ|^2 ñ^2) time and O(|Σ|(n + ñ^2) + n^2) space, where ñ is the length of the run-length encoding of the input strings. Run-length encoding was also used by Bérard et al. [8], who proposed an O(n^3 + |Σ|ñ^3) time and O(n^2 + |Σ|ñ^2) space algorithm. Abouelhoda et al. [9] gave an algorithm with alphabet-size-independent time and space complexities of O(n^2 + nñ^2) and O(n^2), respectively. A detailed comparison between the different problem models appears in Section "A comparison with previous works".

Our contribution
This paper presents several algorithms for EDDC which are currently the most general and efficient for the problem.
1. We give an algorithm for EDDC for general nonnegative cost functions that is based on min-plus square matrix multiplication. This algorithm is an adaptation of the framework of [10] (see also [11]). For two input strings over an alphabet Σ and of length n each, the time and space complexities of this algorithm are O(|Σ|MP(n)) and O(|Σ|n^2), respectively, where MP(n) is the time complexity of a min-plus multiplication of two n × n matrices. Using the matrix multiplication algorithm of Chan [12], this algorithm runs in O(|Σ|n^3 log^3 log n / log^2 n) time (Section "A matrix multiplication based algorithm for EDDC"). Moreover, our algorithm imposes fewer restrictions on the cost function than previous algorithms, and is currently the only algorithm that works for the most general problem settings (Section "A comparison with previous works").

2. We describe a more efficient algorithm for EDDC when all operation costs are integers. This algorithm can also be applied in an online setting, where in each step a letter is added to one of the input strings. The time complexity of processing n letters of the input is O(|Σ|n^3 / log^2 n), where the base of the log function is determined by the range of cost values (Section "An online algorithm for EDDC using min-plus matrix-vector multiplication for discrete cost functions"). In order to achieve this, we obtained the following stepping-stone results, which are of interest in their own right.
(a) Let A be an n × m matrix for which differences between adjacent entries are within some finite integer interval D. For a constant 1/log_{|D|}(n + m) < λ < 1, we give an algorithm that preprocesses A in O(nm(n + m)^λ / |D|) time, and then computes the min-plus multiplication of A with any given m-length vector in O(nm / (λ^2 log^2_{|D|}(n + m))) time (Section "An efficient D-discrete min-plus matrix-vector multiplication algorithm"). The algorithm is an adaptation of Williams' matrix-vector multiplication algorithm over a finite semiring [13], with some notions similar to those in Frid and Gusfield's RNA folding algorithm [14].
(b) The manner in which the new matrix-vector multiplication algorithm is integrated into the EDDC algorithm can be generalized to algorithms for a family of related problems, denoted VMT problems [11], under certain discreteness assumptions. This family includes many important problems from the domains of RNA folding and CFG parsing. An example of such a problem is the single-strand RNA folding problem [15] under discrete scoring schemes. Our new matrix-vector multiplication algorithm can be integrated into an algorithm for the latter problem to yield an O(n^3 / log^2 n) time algorithm, improving the best previously known asymptotic time bound for the problem (see Section "Online VMT algorithms").
3. We extend our approach to exploit run-length encodings of the input strings, assuming some restrictions on the cost functions. This reduces the time and space complexities of the algorithm to O(|Σ|n^2 + |Σ|n MP(ñ)/ñ) and O(|Σ|nñ), respectively, where ñ is the length of the run-length encoding of the input (Section "Additional acceleration using run-length encoding").
The rest of the paper is organized as follows. In Section "A baseline algorithm for the EDDC problem", a recursive computation for EDDC and its implementation using dynamic programming (DP) are presented. Section "A matrix multiplication based algorithm for EDDC" shows how to accelerate the algorithm by incorporating efficient min-plus matrix multiplication subroutines. In Section "An online algorithm for EDDC using min-plus matrix-vector multiplication for discrete cost functions", an efficient min-plus matrix-vector multiplication algorithm is described for matrices and vectors for which differences between adjacent entries are taken from a finite integer interval. This algorithm can be used to obtain an accelerated online version of the EDDC algorithm, as well as to improve the time complexities of several related problems. Section "Additional acceleration using run-length encoding" describes a variant of the EDDC algorithm that exploits run-length encoding. A comparison between this and previous works is given in Section "A comparison with previous works", and Section "Conclusions and discussion" gives a concluding discussion. Additional proofs omitted from the main manuscript are given in the Appendix.

A baseline algorithm for the EDDC problem
In this section, we give a simple algorithm for the EDDC problem. We start by showing some recursive properties of the problem, and then formulate a straightforward dynamic programming implementation for the recursive computation.

The recurrence formula
Our recursive formulas resemble previous formulations [6,9], yet solve a slightly more general variant of the problem (see discussion in Section "A comparison with previous works"). Since the proof of correctness of these recursive formulas is similar to previous ones, we defer it to Appendix "Correctness of the recursive computation".
A (strict) partition of a string w of length at least 2 is a pair of strings (w^a, w^b), such that w = w^a w^b and w^a, w^b ≠ ε. Denote by P(w) the set of all partitions of w. For example, for w = abac, P(w) = {(a, bac), (ab, ac), (aba, c)}.
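In code, enumerating P(w) is a one-liner (the helper name is ours):

```python
def partitions(w):
    """All strict partitions (w_a, w_b) of w with both parts nonempty."""
    return [(w[:i], w[i:]) for i in range(1, len(w))]
```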
For a source string which is either empty or contains a single letter and a target string t, Equations 1 to 3 (Figure 1) describe a recursive EDDC computation. This computation interleaves, in a mutually recursive manner, the computation of an additional special value ed′(α, t), defined to be the minimum cost of a non-reducing edit script from α to t that does not start with a mutation (t is required to contain at least two letters).
Symmetrically, Equations 4 to 6 give the recursive computation for a source string s and a target string which is either empty or contains a single letter. Here, ed′(s, α) is defined as the minimum cost of a non-generating edit script from s to α which does not end with a mutation (s is required to contain at least two letters). In case both the source string s and the target string t are of length at least 2, Equation 7 can be used for computing ed(s, t) (Figure 2(a)). To allow efficient computation, Equation 7 can be replaced by Equations 8 and 9, which are computed in a mutually recursive manner to yield an equivalent computation (Figure 2(b)). All base cases of the above recursive equations are implied by Property 1 in a straightforward manner.

Theorem 1. EDDC is correctly solved by Equations 1-9.
The proof of Theorem 1 appears in Appendix "Correctness of the recursive computation".

A baseline dynamic-programming algorithm for EDDC
In this section, we describe a DP algorithm implementing the recursive EDDC computation given by Equations 1 to 9, which is the basis for improvements introduced later in this paper.
Let s and t be the input source and target strings, respectively, and for simplicity assume both strings are of length n. The algorithm maintains the following matrices for storing solutions to sub-instances of the input which occur along the recursive computation. All matrices are of size (n + 1) × (n + 1), with row and column indices in the range 0, 1, 2, . . . , n.
• For every α ∈ Σ, the algorithm maintains matrices S_α, S′_α, T_α, and T′_α, whose entries S_α[k, i], S′_α[k, i], T_α[l, j], and T′_α[l, j] are used for storing the values ed(s_{k,i}, α), ed′(s_{k,i}, α), ed(α, t_{l,j}), and ed′(α, t_{l,j}), respectively.
• Two matrices S_ε and T_ε, whose entries S_ε[k, i] and T_ε[l, j] are used for storing values of the forms ed(s_{k,i}, ε) and ed(ε, t_{l,j}), respectively.
• For every α ∈ Σ, a matrix EDT_α, whose entries EDT_α[i, j] are used for storing the values edt_α(s_{0,i}, t_{0,j}).
• A matrix ED, whose entries ED[i, j] are used for storing the values ed(s_{0,i}, t_{0,j}).
The algorithm consists of two stages: Stage 1 computes solutions to all sub-instances in which one of the substrings is either empty or single-lettered, applying Equations 1 to 6. Stage 2 uses the values computed in Stage 1 in order to compute all prefix-to-prefix solutions ed(s_{0,i}, t_{0,j}) and edt_α(s_{0,i}, t_{0,j}) according to Equations 8 and 9. In particular, Stage 2 computes the edit distance ed(s_{0,n}, t_{0,n}) = ed(s, t) between the two complete strings. The entries are traversed in an order which guarantees that upon computing each entry, all solutions to sub-instances appearing on the right-hand side of the relevant equation have already been computed and are contained in the corresponding entries. Algorithm 1 gives the pseudo-code for this computation.

Complexity analysis of Algorithm 1
The running time of Algorithm 1 is dictated by the total time required to compute all entries in the DP matrices. There are O(|Σ|n^2) entries overall, and each entry is computed by a minimization over O(n) partition points of the corresponding substring (Equations 1 to 9), yielding a total of O(|Σ|n^3) time and O(|Σ|n^2) space.

A matrix multiplication based algorithm for EDDC
In previous work by the authors [11], Vector Multiplication Templates (VMTs) were identified as templates for computational problems sustaining certain properties, such that algorithms for solving these problems can be accelerated using efficient matrix multiplication subroutines (similarly to Valiant's algorithm for CFG recognition [10]). Intuitively, standard algorithms for VMT problems perform computations that can be expressed in terms of vector multiplications, and these computations can be performed and combined more efficiently using fast matrix multiplication subroutines. In this section, we show that EDDC exhibits such VMT properties, and formulate a new algorithm that incorporates matrix-matrix min-plus multiplications. This algorithm yields a better running time than that of the baseline algorithm in the previous section.

Notations for matrices
For two integers p, q such that p ≤ q, I_{p,q} denotes the interval of integers I_{p,q} = [p, p + 1, . . . , q − 1]. We use the notation A_{n×m} to imply that the matrix A has n rows and m columns, and say that A has the dimensions n × m (row and column indices start at 0). For a subset of row indices I and a subset of column indices J, denote by I × J the region which contains all pairs of indices (i, j) such that i ∈ I and j ∈ J. Define A[I, J] to be the submatrix of A which is induced by all entries in the region I × J. When I contains a single row i or J contains a single column j, we simplify the notation and write A[i, J] or A[I, j], respectively. Denote the time complexity of a min-plus multiplication of two n × n matrices by MP(n). At present, the asymptotically fastest algorithm for min-plus square matrix multiplication is that of Chan [12], taking O(n^3 log^3 log n / log^2 n) time.
In the following observation, we point out how matrix multiplication can be computed as a composition of two parts, where each of the items (1-3) in the observation addresses a partitioning in one of the three dimensions. This will later be used by our recursive computation, which is based on such partitioning.

Observation 1. Let A_{n×k}, B_{k×m} and C_{n×m} be matrices, such that C = A ⊗ B (see Figure 3). Then:
1. For every partition of the row indices I_{0,n} into sets I_1 and I_2, C[I_1, I_{0,m}] = A[I_1, I_{0,k}] ⊗ B and C[I_2, I_{0,m}] = A[I_2, I_{0,k}] ⊗ B.
2. For every partition of the column indices I_{0,m} into sets J_1 and J_2, C[I_{0,n}, J_1] = A ⊗ B[I_{0,k}, J_1] and C[I_{0,n}, J_2] = A ⊗ B[I_{0,k}, J_2].
3. For every partition of the intermediate indices I_{0,k} into sets K_1 and K_2, C = min(A[I_{0,n}, K_1] ⊗ B[K_1, I_{0,m}], A[I_{0,n}, K_2] ⊗ B[K_2, I_{0,m}]), where min denotes the entry-wise minimum.
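For concreteness, a minimal sketch of the ⊗ (min-plus) product, together with a check that partitioning along the intermediate dimension and combining by entry-wise minimum reproduces the full product, as in Observation 1 (function name and representation are ours):

```python
def min_plus(A, B):
    """Min-plus (tropical) matrix product: C[i][j] = min_p (A[i][p] + B[p][j])."""
    return [[min(a + b for a, b in zip(row, col))
             for col in zip(*B)]          # zip(*B) iterates over columns of B
            for row in A]
```

Splitting the intermediate dimension into two halves, multiplying each pair of slices separately, and taking the entry-wise minimum of the two partial results gives the same matrix as a single full multiplication.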

EDDC expressed via min-plus vector multiplications
The key observation that enables a further improvement of the worst-case bounds of EDDC is that Equations 3, 6, 8, and 9 can be expressed in terms of min-plus vector multiplications. Under the assumption that all solutions to sub-instances appearing on the right-hand side of the equations are computed and stored in the corresponding entries, these equations can be rewritten as min-plus vector multiplications (Equations 10 to 13).

The algorithm

The new algorithm has the same two stages as the baseline algorithm. It can be observed that the computation of all matrices of the forms S_α, S′_α, S_ε, T_α, T′_α, and T_ε performed in Stage 1 of the baseline algorithm adheres to the Inside-VMT requirements as given in Definition 1 in [11]. The application of the generic Inside-VMT algorithm [11] to this computation is immediate, and therefore we focus only on adapting the method to the computation of the matrices of the form EDT_α and ED conducted in Stage 2 of the baseline algorithm. After allocating all dynamic programming matrices and performing Stage 1 of the algorithm, the COMPUTE-MATRIX procedure is used for implementing Stage 2 (see Algorithm 2 and Figure 4). This is a divide-and-conquer recursive procedure that accepts a region I × J and computes the values in all entries of ED and EDT_α within the region. The procedure partitions the given region into two parts and performs recursive calls on each part.

Figure 4. The tree of recursive calls to COMPUTE-MATRIX. Each call over a region containing more than one entry partitions the region either vertically or horizontally, and performs two recursive calls over the two sub-regions. After the first call has been performed (over the left or top sub-region for a vertical or horizontal partition, respectively), a matrix multiplication involving the computed region is applied in order to meet the precondition required for the computation of the sibling region.
In order to maintain a required precondition, the procedure applies min-plus matrix multiplication subroutines between recursive calls. The correctness proof of Algorithm 2 appears in Appendix "Correctness of Algorithm 2".

Algorithm 2: MATRIX-EDDC(s, t)
1 Let n be the length of s and t. Allocate (n + 1) × (n + 1) matrices S_α, S′_α, T_α, T′_α, and EDT_α for every α ∈ Σ, and three (n + 1) × (n + 1) matrices S_ε, T_ε, and ED.
// Stage 1
2 Fill all matrices T_ε, T_α, T′_α, S_ε, S_α, and S′_α, applying the generic Inside-VMT algorithm of [11], using Equations 1, 2, 10, 4, 5, and 11, correspondingly.
// Stage 2
3 Initialize entries in all matrices EDT_α and ED which correspond to base-case instances, as described in Algorithm 1.
This includes all entries in the first two rows and first two columns in these matrices.

Time complexity analysis
The time complexity of Algorithm 2 can be established by an analysis identical to that of the Inside-VMT algorithm of [11] (see Section 3.3.1 of [11]). For completeness, we repeat this analysis here for Stage 2 of the computation; the time complexity of Stage 1 can be inferred similarly. The complexity is expressed as a function of the bound MP(n) on the running time of a min-plus multiplication of two n × n matrices. Note that MP(n) = Ω(n^2), as the input and output matrices of the computation contain O(n^2) entries. We assume here that MP(n) = Ω(n^{2+δ}) for some constant 0 < δ ≤ 1, which is true for the current best bound on MP(n) [12].
In some of the expressions developed below, we avoid the "big O" notation and give explicit bounds over the number of operations, as constant factors that may be hidden due to this notation cannot be ignored in the analysis.
The initialization time of Stage 2 is dominated by the matrix multiplications and entry-wise min operations performed in line 4 of Algorithm 2. This initialization performs 2|Σ| multiplications of matrices of dimensions (n − 1) × 1 with matrices of dimensions 1 × (n − 1), which can naively be implemented in O(|Σ|n^2) time, and entry-wise min operations over the resulting matrices, which also take O(|Σ|n^2) time. The computation of the remaining entries in the ED and EDT_α matrices is done within recursive calls to the COMPUTE-MATRIX procedure. Observe that when COMPUTE-MATRIX is called over a region of dimensions r × r for some even integer r ≥ 2, the procedure applies a vertical partitioning and performs two recursive calls over regions of dimensions r × r/2 (lines 6 and 8). For a call over a region of dimensions r × r/2, the procedure applies a horizontal partitioning and performs two recursive calls over regions of dimensions r/2 × r/2 (lines 11 and 13). For simplicity, assume that n − 1 = 2^p for some integer p ≥ 0; it thus follows that the dimensions of all regions occurring as inputs in recursive calls are either 1 × 1, or of the form r × r or r × r/2 for some even integer r. Denote by T(x × y) an upper bound on the number of operations conducted when applying COMPUTE-MATRIX over a region of dimensions x × y.
From line 2 of the procedure, T(1 × 1) = O(1). Consider a region of dimensions r × r for an even integer r ≥ 2. For such a region, the code in lines 5-8 of COMPUTE-MATRIX is executed. In order to implement line 7 for some α ∈ Σ, it is necessary first to compute a min-plus matrix multiplication whose resulting matrix C is of dimensions r × r/2. Due to Observation 1, it is possible to compute independently the upper and lower halves of C, each via a multiplication of matrices of dimensions r/2 × r/2. The time required to conduct this computation is 2MP(r/2). Then, it is required to compute min(EDT_α[I_{i,k}, I_{h,l}], C) and to update EDT_α[I_{i,k}, I_{h,l}] to be the result of this operation, a computation which requires at most cr^2 operations for some constant c. Since line 7 is computed for every α ∈ Σ, the total number of operations applied due to this line is at most |Σ|(2MP(r/2) + cr^2). Besides line 7, two recursive calls are made in lines 6 and 8 over regions of dimensions r × r/2, and therefore we get T(r × r) ≤ 2T(r × r/2) + |Σ|(2MP(r/2) + cr^2). When the procedure is called over a region of dimensions r × r/2, the code in lines 10-13 is executed. Similarly as above, it can be shown that the computation in line 12 requires at most |Σ|(MP(r/2) + cr^2/4) operations, and due to the two recursive calls in lines 11 and 13 over regions of dimensions r/2 × r/2, we get T(r × r/2) ≤ 2T(r/2 × r/2) + |Σ|(MP(r/2) + cr^2/4). Therefore, T(r × r) ≤ 4T(r/2 × r/2) + |Σ|(4MP(r/2) + 3cr^2/2). The explicit form of the above recursive equation can be established by the Master Theorem (under the assumption that MP(n) = Ω(n^{2+δ}), see Chapter 4 in [16]), yielding the expression T(r × r) = O(|Σ|MP(r)). Thus, the time complexity of Stage 2 of the algorithm is O(|Σ|MP(n)). The time analysis of the Inside-VMT algorithm of [11], applied to implement Stage 1 of the algorithm, yields the same bound of O(|Σ|MP(n)), and thus O(|Σ|MP(n)) is the time complexity of the entire algorithm. Using the currently asymptotically fastest algorithm for min-plus matrix multiplication [12], for which MP(n) = O(n^3 log^3 log n / log^2 n), the total running time is O(|Σ|n^3 log^3 log n / log^2 n).

An online algorithm for EDDC using min-plus matrix-vector multiplication for discrete cost functions
In this section, we present an EDDC algorithm which is based on the general algorithm (given in Section "A matrix multiplication based algorithm for EDDC") and improves its time complexity by a factor of O(log^3 log n). This EDDC algorithm is intended for integer cost functions, but can also be applied to rational cost functions after they are scaled. It is an online algorithm: it can process the input strings letter by letter, with a guaranteed low time bound for any prefix of the input. The EDDC algorithm presented in this section is based on a D-discrete matrix-vector min-plus multiplication algorithm we developed, which is generic and may be applied to other problems as well.

D-discrete matrices and the EDDC problem with integer costs
Given a matrix of integers A_{n×m}, call a pair of entries of the form A[i − 1, j], A[i, j] (for 1 ≤ i < n and 0 ≤ j < m) vertically adjacent, and a pair of entries of the form A[i, j − 1], A[i, j] horizontally adjacent. For an integer interval D, the matrix A is D-discrete if the difference between every pair of adjacent entries is in D. Consider the EDDC problem in the case of integer costs for all edit operations. In Lemma 1, we show that in this case, all matrix multiplications applied by Algorithm 2 are between D-discrete matrices, with respect to a certain integer interval D. The proof is similar to that of Masek and Paterson for simple edit distance [17]. This allows conducting such matrix multiplications using a faster algorithm, described in Section "An efficient D-discrete min-plus matrix-vector multiplication algorithm".
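As a sanity check, D-discreteness can be verified directly. The sketch below (ours; it assumes the adjacency convention of differences within columns and within rows, which may differ in detail from the paper's exact definition) checks that all adjacent-entry differences lie in I_{a,b} = {a, ..., b−1}:

```python
def is_d_discrete(A, a, b):
    """True iff all vertically and horizontally adjacent entry
    differences of matrix A lie in the integer interval [a, b)."""
    n, m = len(A), len(A[0])
    col_ok = all(a <= A[i][j] - A[i - 1][j] < b
                 for i in range(1, n) for j in range(m))
    row_ok = all(a <= A[i][j] - A[i][j - 1] < b
                 for i in range(n) for j in range(1, m))
    return col_ok and row_ok
```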

Lemma 1. Given strings s and t and an integer cost function for EDDC, all matrix multiplications applied by
Algorithm 2 are over D-discrete matrices, where D = I_{a,b} is determined according to the cost function by a = −max{del(α) | α ∈ Σ} and b = max{ins(α) | α ∈ Σ} + 1.
The proof of Lemma 1 appears in Appendix "Proofs to lemmas corresponding to the EDDC algorithm for discrete cost functions".

D-discrete matrices and vectors
Here, we present some properties of D-discrete matrices and vectors that are similar to those previously observed in [14,17]. The following lemmas show that the set of D-discrete matrices is closed under the min-plus multiplication and entry-wise min operations. In what follows, let D = I_{a,b} be some integer interval. The proofs of the following lemmas appear in Appendix "Proofs to lemmas corresponding to the EDDC algorithm for discrete cost functions".

Lemma 3. Let X, Y and Z be matrices, such that X and Y contain only integer elements and Z = X ⊗ Y. If X and Y are D-discrete, then Z is D-discrete as well.
The following lemma implies that when the absolute difference between the first elements of two q-length D-discrete vectors x and y is sufficiently large, one of the vectors can be immediately taken as the result of the min(x, y) operation.
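This dominance property can be sketched as follows. In a q-length D-discrete vector, every entry deviates from the first entry by at most (q−1)·max|D|, so when the first entries differ by more than twice that amount, one vector is entry-wise smaller everywhere. The threshold used below is a conservative bound chosen for illustration (ours), not necessarily the paper's exact constant:

```python
def vec_min(x, y, d_abs_max):
    """Entry-wise min of two D-discrete vectors of equal length q.
    Entries deviate from the first entry by at most (q-1)*d_abs_max,
    so a large enough offset gap lets one vector be returned without
    inspecting the remaining entries."""
    q = len(x)
    span = (q - 1) * d_abs_max
    if x[0] + span <= y[0] - span:   # x dominates everywhere
        return list(x)
    if y[0] + span <= x[0] - span:   # y dominates everywhere
        return list(y)
    return [min(a, b) for a, b in zip(x, y)]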
In what follows, fix an integer q > 1. A q-length D-discrete vector x can be encoded as a pair (x_0, x̃), where x_0 is the first entry of x and x̃ = (x_1 − x_0, x_2 − x_1, . . . , x_{q−1} − x_{q−2}) ∈ D^{q−1} is its sequence of differences between consecutive entries; x̃ is called the canonical index of x. Note that for two q-length D-discrete vectors x = (x_0, x̃) and y = (y_0, ỹ), x̃ = ỹ if and only if for every 0 ≤ i < q, x_i − y_i = c for some constant c. In particular, x and y share the same encoding if and only if they are identical. Call a D-discrete vector of the form x = (0, x̃) (with an offset x_0 = 0) a canonical vector.
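A minimal sketch of this offset-plus-differences encoding (the names delta_encode and delta_decode are ours):

```python
def delta_encode(x):
    """Encode a D-discrete vector as (offset, tuple of consecutive differences)."""
    return x[0], tuple(x[i] - x[i - 1] for i in range(1, len(x)))

def delta_decode(x0, diffs):
    """Rebuild the vector from its offset and difference tuple."""
    out = [x0]
    for d in diffs:
        out.append(out[-1] + d)
    return out
```

Two vectors share the same difference tuple exactly when one is a constant shift of the other, which is what makes the canonical index a compact key for lookup tables.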
The next observations show that both operations of entry-wise min and min-plus multiplication, with respect to D-discrete matrices and vectors, can be expressed via canonical vectors.
Observation 3. Let B_{q×q} be a D-discrete matrix, x = (0, x̃) a q-length canonical D-discrete vector, and y = (y_0, ỹ) a q-length D-discrete vector, such that B ⊗ x = y. Then, for any number c it holds that B ⊗ (c, x̃) = (y_0 + c, ỹ).

An efficient D-discrete min-plus matrix-vector multiplication algorithm
Let A_{n×m} be a D-discrete matrix, and fix a constant 1/log_{|D|}(n + m) < λ < 1. We give an algorithm for min-plus D-discrete matrix-vector multiplication that, after preprocessing A in O(nm(n + m)^λ / |D|) time, multiplies A with any given m-length D-discrete vector in O(nm / (λ^2 log^2_{|D|}(n + m))) time under the RAM computational model. Our algorithm is an adaptation of Williams' algorithm [13] for finite semiring matrix-vector multiplications, with some notions similar to Frid and Gusfield's acceleration technique for RNA folding [14]. It follows the concept of the Four-Russians algorithm [18] (see also [14,17,19]), i.e. preprocessing recurring computations, tabulating their results in lookup tables, and retrieving these results in order to accelerate the general computation. Specifically, the algorithm stores preprocessed computations of two kinds: matrix-vector min-plus multiplications, and vector entry-wise minima, where vectors and matrices are of q-length and of q × q dimensions, respectively, for q = ⌊λ log_{|D|}(n + m)⌋. For conducting this preprocessing, we will assume that |D| ≤ n + m, as otherwise q = 0 and the multiplication cannot be accelerated using the suggested method. In addition, for simplicity of the analysis we assume that q^3 ≤ min(n, m). If this does not hold, a multiplication of the form A ⊗ x can be naively computed in the relatively efficient time complexity of O(max(n, m) log^3_{|D|}(n + m)). The space complexity of the preprocessing phase is higher than the O(nm) space complexity of the standard multiplication algorithm and depends on the constant λ, ranging between O(nm|D|) and O(nm(n + m)) for λ values between 1/log_{|D|}(n + m) and 1.

Preprocessing of matrix-vector ⊗ computations
Let n' = q⌈n/q⌉ and m' = q⌈m/q⌉, and note that A can be padded and decomposed into n'm'/q² blocks of dimensions q × q. The multiplication of a q × q block with a q-length vector can be done in O(q²) time in a straightforward manner, and the encoding of the resulting q-length vector requires additional O(q) time. There are n'm'/q² blocks in the decomposition of A, each is multiplied by |D|^(q−1) canonical vectors, and so the total time required for computing these multiplications is O(q²|D|^(q−1) · n'm'/q²) = O(nm|D|^(q−1)).

Let (y_0, Δy) be the encoding of some result y = B ⊗ x computed in the preprocessing of A as described above. Note that y_0 differs from the entry B_{0,0} by at most 2(q − 1)d_max, where d_max is the maximal absolute value in D. Therefore, the number of bits in the binary representation of y_0 is at most one plus the maximum number of bits required for the representations of B_{0,0} and 2(q − 1)d_max, and Δy can be represented in O(log(n + m)) bits. Thus, under the RAM computational model assumptions, each such encoding (y_0, Δy) requires O(1) space units and can be written and read in constant time, and therefore the overall space complexity for maintaining all MUL_B tables is O(|D|^(q−1) · n'm'/q²). In addition, given a canonical index Δx, it is possible to retrieve the encoding MUL_B[Δx] in constant time. Let x = (x_0, Δx) be some (not necessarily canonical) q-length D-discrete vector for which we wish to compute B ⊗ x. Due to Observation 3, the multiplication result can be obtained in constant time by retrieving (y_0, Δy) = MUL_B[Δx] and returning the encoding (y_0 + x_0, Δy).
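The preprocessing step above can be sketched as follows; build_mul_table and mul_lookup are hypothetical names, and the table is computed naively exactly as described (one O(q²) multiplication per canonical vector):

```python
from itertools import product

def build_mul_table(B, D):
    # MUL_B: for every canonical difference-tuple, store the Delta-encoding
    # of B (x) x, where x is the canonical vector with those differences.
    q = len(B)
    table = {}
    for diffs in product(D, repeat=q - 1):
        x = [0]
        for d in diffs:
            x.append(x[-1] + d)
        y = [min(b + e for b, e in zip(row, x)) for row in B]
        table[diffs] = (y[0], tuple(y[i + 1] - y[i] for i in range(q - 1)))
    return table

def mul_lookup(table, x0, diffs):
    # By Observation 3, B (x) (x0, diffs) = (y0 + x0, dy), in constant time.
    y0, dy = table[diffs]
    return y0 + x0, dy
```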

Preprocessing of vector entry-wise min computations
The algorithm constructs a lookup table MIN, storing entry-wise min calculations between every canonical q-length D-discrete vector x = (0, Δx) and every q-length D-discrete vector y = (y_0, Δy) such that abs(y_0) < q|D| (here abs(y_0) denotes the absolute value of y_0). For every such x and y, the table entry MIN[Δx, y_0, Δy] stores the encoding (z_0, Δz) of the vector z = min(x, y) (due to Lemma 3, z is D-discrete and can be encoded accordingly). There are O(q|D| · |D|^(2(q−1))) such table entries, each computable in O(q) time. Given two encoded q-length D-discrete vectors x = (x_0, Δx) and y = (y_0, Δy), the encoding (z_0, Δz) of the vector z = min(x, y) can now be obtained in constant time as follows: since min(x, y) = x_0 + min(x − x_0, y − x_0), retrieve (z_0, Δz) = MIN[Δx, y_0 − x_0, Δy] and return the encoding (z_0 + x_0, Δz).
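A reference implementation of the MIN-table semantics, assuming our own encoding names; a real implementation would replace the body by a single table lookup at index (Δx, y_0 − x_0, Δy):

```python
def decode(v0, dv):
    # rebuild a vector from its Delta-encoding
    v = [v0]
    for d in dv:
        v.append(v[-1] + d)
    return v

def entrywise_min_encoded(xe, ye):
    # Reference computation of the MIN-table semantics. The table itself is
    # indexed by (dx, y0 - x0, dy), exploiting min(x, y) = x0 + min(x-x0, y-x0).
    (x0, dx), (y0, dy) = xe, ye
    x, y = decode(x0, dx), decode(y0, dy)
    z = [min(a, b) for a, b in zip(x, y)]
    return z[0], tuple(z[i + 1] - z[i] for i in range(len(z) - 1))
```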

Computing matrix-vector multiplications
Given an m-length D-discrete vector x, and assuming the preprocessing of the matrix A of dimensions n × m was performed as described above, we next explain how to efficiently compute the vector y = A ⊗ x. Note that y is an n-length D-discrete vector, due to Lemma 2.
Our algorithm computes the multiplication y = A ⊗ x in parts of length q. First, for every 0 ≤ j < m/q, the algorithm computes the encoding (x_0^j, Δx^j) of the sub-vector x^j = x[Q_j] of x. These encodings can be obtained in a total time of O(m). Then, for every 0 ≤ i < n/q, the encoding (y_0^i, Δy^i) of the sub-vector y^i = y[Q_i] is computed as the entry-wise minimum of the terms z^{i,j} = B^{i,j} ⊗ x^j over all 0 ≤ j < m/q, where B^{i,j} denotes the q × q block of A at row interval Q_i and column interval Q_j. The encoded result (z_0^{i,j}, Δz^{i,j}) of each multiplication z^{i,j} = B^{i,j} ⊗ x^j can be obtained in constant time as explained in Section "Preprocessing of matrix-vector ⊗ computations". As there are m/q such terms to compute with respect to y^i, their total computation time is O(m/q). In addition, the entry-wise min over all these terms can be computed by initializing (y_0^i, Δy^i) ← (z_0^{i,0}, Δz^{i,0}), and iteratively updating (y_0^i, Δy^i) ← min((y_0^i, Δy^i), (z_0^{i,j}, Δz^{i,j})) via the MIN table, in constant time per update. Thus, each sub-vector y^i is computed in O(m/q) time, and the complete multiplication takes O(nm/q²) = O(nm/(λ² log²_|D|(n + m))) time.
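The blocked multiplication scheme can be sketched as follows, computing each block product and entry-wise min directly where the paper uses constant-time MUL_B and MIN lookups (the function name and the divisibility assumption on n and m are ours):

```python
import math

def blocked_minplus(A, x, q):
    # y restricted to block i = entry-wise min over j of (block(i,j) (x) x-block j).
    # In the paper, each block product is one MUL_B lookup and each entry-wise
    # min is one MIN lookup; here both are computed directly for clarity.
    n, m = len(A), len(x)
    assert n % q == 0 and m % q == 0
    y = []
    for bi in range(0, n, q):
        acc = [math.inf] * q
        for bj in range(0, m, q):
            z = [min(A[bi + r][bj + c] + x[bj + c] for c in range(q))
                 for r in range(q)]
            acc = [min(a, b) for a, b in zip(acc, z)]
        y.extend(acc)
    return y
```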
The above matrix-vector min-plus multiplication algorithm can be used as a fast square matrix-matrix multiplication algorithm in a straightforward manner. For two D-discrete matrices A and B of dimensions n × n, the computation of C = A ⊗ B can be conducted by first preprocessing A as described in Sections "Preprocessing of matrix-vector ⊗ computations" and "Preprocessing of vector entry-wise min computations", and then computing each column of C as the multiplication of A with the corresponding column of B, in O(n³/(λ² log²_|D| n)) total time.

Online preprocessing of D-discrete matrices
In the previous section, we assumed a setting in which a D-discrete matrix is given and is preprocessed once, prior to any multiplication operation. Next, we describe how to maintain the required lookup tables in the case where the input matrix is dynamic, acquiring additional rows and columns. Consider a streaming computational model, which begins with an initial empty matrix A^0 of dimensions 0 × 0. In each step r, the current matrix A^r of dimensions n_r × m_r is obtained from the previous matrix A^{r−1} of dimensions n_{r−1} × m_{r−1} by either adding an m_{r−1}-length vector as the last row, or adding an n_{r−1}-length vector as the last column of the matrix. Note that n_r + m_r = r, and therefore the preprocessing block length corresponding to A^r is q = ⌊λ log_|D|(n_r + m_r)⌋ = ⌊λ log_|D|(r)⌋. For the purpose of this analysis, we assume that λ ≤ 0.5 (note that this does not limit the asymptotic upper bounds of the running time). Let r_0 and r_1 be the smallest integers such that the block sizes corresponding to A^{r_0} and A^{r_1} are q and q + 1, respectively. The assumption λ ≤ 0.5 implies the following inequality:

r_1 ≥ 2r_0.     (14)

Lookup tables corresponding to intermediate matrices along the series can be maintained as follows. Assume that upon reaching A^{r_0} in the matrix sequence, all required lookup tables with respect to A^{r_0} are already computed. Along the series of steps r_0, r_0 + 1, . . . , r_1, we distribute two kinds of computations: (1) new MUL_B tables for accumulated q × q blocks in matrices A^r for r_0 ≤ r < r_1, and (2) a new MIN table, as well as new MUL_B tables, with respect to block length q + 1.
(1) Computing MUL_B tables for accumulated q × q blocks. Assume that for some r_0 ≤ r < r_1, a column was added to the matrix at step r so that the number of columns m_r in the intermediate matrix A^r is divisible by q. Thus, at most n_r/q ≤ r/q new complete q × q blocks are now available for preprocessing. The computation of the lookup tables of the form MUL_B corresponding to these new blocks will be equally distributed along the series of q consecutive steps r, r + 1, . . . , r + q − 1, during which it is guaranteed that no column addition introduces new complete q × q blocks in the matrix. As shown in Section "Preprocessing of matrix-vector ⊗ computations", the time required for processing a single q × q block is O(q²|D|^(q−1)), and so the total time for processing all O(r/q) blocks is O(qr|D|^(q−1)). Thus, in each of the q steps, there is a need to perform O(r|D|^(q−1)) = O(r^(1+λ)|D|) operations due to these computations. Symmetrically, computing lookup tables corresponding to new blocks added due to the accumulation of rows can be performed by conducting O(r^(1+λ)|D|) operations per step r.
(2) Computing the MIN table and MUL_B tables for block length q + 1. By Equation 14, r_1 ≥ 2r_0. In particular, r_2 = ⌈r_1/2⌉ satisfies r_0 < r_2 < r_1, and the computations related to block length q + 1 can be distributed over the steps r_2 ≤ r < r_1. The MUL_B tables are computed similarly as done in (1), starting with the (q + 1) × (q + 1) blocks already present in A^{r_2}, and continuing with blocks accumulated as the sequence progresses. The overall preprocessing time of all these blocks is O(r_1^(2+λ)|D|), distributed over the steps r_2, . . . , r_1 − 1. All in all, the time complexity due to computations of (1) and (2) in each step r_0 ≤ r < r_1 is O(r^(1+λ)|D|). In particular, the overall time complexity of preprocessing the n-size prefix A^0, A^1, . . . , A^n of the streamed matrices is O(n^(2+λ)|D|).
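The doubling guarantee of Equation 14 can be checked numerically; block_len and first_step_with_block_len are hypothetical helper names illustrating that with λ ≤ 0.5 the block length grows from q to q + 1 only after the stream at least doubles:

```python
import math

def block_len(r, d, lam):
    # q = floor(lambda * log_|D| r); zero while r is too small
    return math.floor(lam * math.log(r, d)) if r > 1 else 0

def first_step_with_block_len(q0, d, lam):
    # smallest r whose block length is q0 (hypothetical helper)
    return math.ceil(d ** (q0 / lam))

# With |D| = 2 and lambda = 0.5, the block length jumps 3 -> 4
# between steps 64 and 256, so r_1 >= 2 * r_0 holds.
r0 = first_step_with_block_len(3, 2, 0.5)
r1 = first_step_with_block_len(4, 2, 0.5)
assert r1 >= 2 * r0
```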

The EDDC algorithm based on efficient D-discrete min-plus matrix-vector multiplication
Consider the EDDC problem in cases where all edit operation costs are integers. As explained in Section "D-discrete matrices and the EDDC problem with integer costs", the EDDC DP tables can be considered D-discrete. This property allows for efficient min-plus square D-discrete matrix-vector multiplications, using the algorithm described in Section "An efficient D-discrete min-plus matrix-vector multiplication algorithm", to yield an online algorithm, as follows.

Assume that some pair of prefixes s_{0,i} and t_{0,j−1} was already processed, and all entries in the DP matrices corresponding to these prefixes are computed. We explain how to update the tables in the case where the next letter to arrive is the letter t_{j−1} in t (the case in which the arriving letter is from s is symmetric). The DP matrices are D-discrete, and we assume that lookup tables for efficient min-plus multiplications of these matrices are maintained as explained in the previous section. The addition of t_{j−1} requires updating all matrices of the forms T_ε and T_α, for which the j-th row and column should be added. In addition, it is required to add the j-th column to matrices of the forms ED and EDT_α.
In the first stage, the algorithm computes rows and columns j in all T matrices. The process is similar to the computation of these entries by the loop in lines 5 to 8 of Algorithm 1, with the following modification. Let q_j = ⌊λ log_|D|(2j)⌋, and let j' = q_j⌈j/q_j⌉. The algorithm first initializes the entries [j − 1, j] in all these matrices with the corresponding base-case values. The column is partitioned into intervals of length q_j, where, as before, Q_k denotes the interval I_{kq_j,(k+1)q_j}. Once an interval Q_k is computed (i.e., the loop was executed with respect to index l = kq_j), the encoding of the sub-vector T_α[Q_k, j] is computed and kept for later usage as a lookup index. In addition, upon starting to compute the entries within an interval Q_k (i.e., when l = (k + 1)q_j − 1), the multiplications yielding the vectors y^{α,k} are computed for every α ∈ Σ, combining previously computed rows and columns of T_α. Observe that all required entries for the computation of y^{α,k} are already computed and stored in T_α, and that, similarly as done in Section "Computing matrix-vector multiplications", y^{α,k} can be computed by performing O(j/q_j) constant-time lookup table queries.
Given the vectors y^{α,k}, the number of expressions that need to be examined in line 5 of the loop with respect to l = kq_j + x reduces to O(q_j) per entry (considering values of the index h between l and (k + 1)q_j, and between j' and j). Entries in matrices of the forms T_α and T_ε are computed exactly as done in lines 6 and 7 of Algorithm 1, respectively.
In the second stage, column j is computed in the matrices EDT_α and ED. This is achieved by extending Equations 13 and 12 to have an entire column on the left-hand side, so that each column is obtained via a min-plus matrix-vector multiplication. This completes the update of the DP tables due to the addition of the letter t_{j−1}.

Complexity analysis
After receiving n letters, the prefixes s_{0,i} and t_{0,j} of the input strings were processed for some i, j such that i + j = n. The maintenance of lookup tables for efficient D-discrete multiplications requires at most O(|Σ|n^(1+λ)|D|) operations per step among the first n steps, and O(|Σ|n^(2+λ)|D|) operations for all first n steps, as shown in Section "Online preprocessing of D-discrete matrices".
Adding a letter t_{j−1} to the instance, the time required for processing the entries in column j of the T matrices is O(|Σ|j²/(λ² log²_|D| j)), due to the O(j/q_j) lookup queries performed per interval and the O(q_j) expressions examined per entry, for every α ∈ Σ.

Online VMT algorithms
The online algorithm for EDDC presented in the previous section can be generalized to other problems with similar properties. Specifically, VMT problems [11], which utilize min-plus multiplications and for which it can be guaranteed that the computed DP matrices are D-discrete, can have their algorithms implemented using the same framework as presented above. Thus, in contrast to the general case for VMT problems, in which it is required that the complete input be available at the beginning of the algorithm's run, in the D-discrete case the input can be obtained in a streaming model. In addition, the asymptotic time complexity in such cases is slightly reduced with respect to the time complexity of the case of min-plus multiplication of general matrices. A concrete example of such a problem is the RNA base-pairing maximization problem [11,15], in which the difference between adjacent entries (in the single DP matrix the algorithm uses) is either 0 or 1. This property was previously exploited by Frid and Gusfield [14] to obtain an O(n³/log n) algorithm for the problem. Using the D-discrete min-plus multiplication algorithm presented here, this immediately implies an algorithm having the improved time bound of O(n³/log² n). Additional related problems from the domains of RNA folding and context-free grammar (CFG) parsing fall under the VMT framework, and it is likely that D-discreteness can be exploited for accelerating the computation of more problems within this family.

Additional acceleration using run-length encoding
Let w be a string. A maximal substring of w containing repeats of the same letter is called a run in w. The run-length encoding (RLE) of w is a representation of the string in which each run is encoded by the corresponding repeating letter α and its repeat count p (denoted α^p). For example, the string w = aabbbaccc is a concatenation of the four runs aa, bbb, a, and ccc, and its RLE is a²b³a¹c³. Denote by w̃ the compressed form of w, which replaces each run in w by a single occurrence of the corresponding letter. When n denotes the length of w, ñ will denote the length of the compressed form of w. The run index ĩ of a letter w_i in w is the index of the run in which w_i participates. It can be asserted that the compressed form of the substring w_{i,j} of w is the substring w̃_{ĩ,(j−1)̃+1} of w̃. In the above example, w̃ = abac, and therefore ñ = 4 (while n = 9). The run indices of all letters in w are given by the sequence [0, 0, 1, 1, 1, 2, 3, 3, 3], and the compressed form of w_{3,8} = bbacc is w̃_{3̃,7̃+1} = w̃_{1,4} = bac.
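The RLE notions above can be sketched directly (helper names are ours):

```python
from itertools import groupby

def rle(w):
    # run-length encoding: list of (letter, repeat count) pairs
    return [(a, len(list(g))) for a, g in groupby(w)]

def compressed(w):
    # w-tilde: one occurrence per run
    return ''.join(a for a, _ in rle(w))

def run_indices(w):
    # run index of every position of w
    out, idx = [], 0
    for i, a in enumerate(w):
        if i > 0 and a != w[i - 1]:
            idx += 1
        out.append(idx)
    return out
```

On the example string, rle("aabbbaccc") yields the four runs of the text, compressed("aabbbaccc") yields "abac", and run_indices reproduces the sequence [0, 0, 1, 1, 1, 2, 3, 3, 3].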
Previous works [7][8][9] showed how RLE can be exploited for improving the efficiency of EDDC algorithms. In these works it was required that the costs of duplications and contractions be less than the costs of all other operations (the requirement was implicit in [9], see discussion in Section "A comparison with previous works"). This requirement is somewhat unnatural for the application of minisatellite map comparison, since it assumes that mutations, which are typically common events, should cost more than the less common events of duplications and contractions. In this section, we adapt a similar RLE-based acceleration to our EDDC algorithm. The application of this acceleration requires the following constraint over cost functions:

Constraint 1. For every α, β ∈ Σ, dup(α) ≤ dup(β) + mut(β, α) ≤ ins(α), and cont(α) ≤ cont(β) + mut(α, β) ≤ del(α).
The constraint dup(β) + mut(β, α) ≤ ins(α) implies that it never costs more to replace an insertion of a letter α into a nonempty string by a duplication of a letter β adjacent to the insertion position, followed by a mutation of the new copy into α. Thus, we may assume w.l.o.g. that optimal edit scripts do not contain insertions (unless applied to empty strings), or in other words, that new letters can only be generated via duplications. Such an assumption is relatively reasonable in the context of minisatellite map comparison, considering the biological mechanisms that describe generative modifications.
The constraint dup(α) ≤ dup(β) + mut(β, α) can be intuitively understood through the example of generating a string of the form ααβ from a string of the form αβ. Due to the constraint, it costs the same or less to obtain ααβ by duplicating the α letter in αβ, rather than by duplicating the β letter and mutating its left copy into α. Again, such an assumption is relatively reasonable for the minisatellite map application. Symmetric arguments hold with respect to the constraint over contraction and deletion costs.
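Constraint 1 is a finite set of inequalities and can be checked mechanically; the following sketch assumes cost tables of our own design (dictionaries keyed by letters and letter pairs):

```python
def satisfies_constraint1(sigma, dup, cont, ins, dele, mut):
    # dup/cont/ins/dele map a letter to a cost; mut maps (from, to) to a cost.
    # Checks: dup(a) <= dup(b) + mut(b, a) <= ins(a) and
    #         cont(a) <= cont(b) + mut(a, b) <= del(a), for all a, b.
    for a in sigma:
        for b in sigma:
            if not dup[a] <= dup[b] + mut[b, a] <= ins[a]:
                return False
            if not cont[a] <= cont[b] + mut[a, b] <= dele[a]:
                return False
    return True
```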

Algorithm 3: RL-LETTER-TO-STRING(t)
1 Run Stage 1 of Algorithm 2 over the input string t̃. Denote the computed matrices T_α by T̃_α.
2 For n the length of t, allocate a vector DC of length (n + 1), and set its first two entries DC[0] and DC[1] to zeros.
3 For j = 2, 3, . . . , n do
4
The correctness of Observation 4 follows from the existence of a script from s to wββ whose cost is ed(s, wβ) + dup(β): this script first applies an optimal script to transform s into wβ at cost ed(s, wβ), and then duplicates the last β in wβ at cost dup(β).
Next, we show how to reduce the number of expressions that need to be considered in the EDDC recursive equations in case Constraint 1 applies. For a string w of length at least 2, denote by R(w) ⊆ P(w) the set of all partitions (w_a, w_b) of w such that the last letter in w_a is different from the first letter in w_b. For example, for w = aabbbcdddd, R(w) = {(aa, bbbcdddd), (aabbb, cdddd), (aabbbc, dddd)}. Observe that |R(w)| = ñ − 1.
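R(w) can be enumerated by scanning run boundaries; a minimal sketch with a name of our choosing:

```python
def run_boundary_partitions(w):
    # R(w): partitions (wa, wb) where the last letter of wa
    # differs from the first letter of wb
    return [(w[:i], w[i:]) for i in range(1, len(w)) if w[i - 1] != w[i]]
```

On the example above, the function returns exactly the three partitions listed in the text, matching |R(w)| = ñ − 1 for the four runs of w = aabbbcdddd.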
We start by describing how to improve the computation efficiency of EDDC for cases in which one of the input strings contains a single letter. Denote by dupcost(w) the cost of the edit script from w̃ to w which generates each run α^p in w by applying p − 1 duplication operations over the corresponding letter α in w̃. Similarly, denote by contcost(w) the cost of the edit script from w to w̃ which reduces each run α^p in w by applying p − 1 contraction operations over α. For example, for w = aabbbbaaccc, dupcost(w) = 2dup(a) + 3dup(b) + 2dup(c) and contcost(w) = 2cont(a) + 3cont(b) + 2cont(c). Note that dupcost(w) ≥ ed(w̃, w) and contcost(w) ≥ ed(w, w̃). It is simple to assert recursive relations for these quantities (Equation 19). The following lemma shows that when one of the input strings contains a single letter, the edit distance can be inferred from the edit distance between this letter and the compressed form of the second string.
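dupcost and contcost follow directly from the run decomposition; a minimal sketch assuming dictionary-based cost tables:

```python
from itertools import groupby

def dupcost(w, dup):
    # cost of generating each run a^p of w from one letter a via p-1 duplications
    return sum((len(list(g)) - 1) * dup[a] for a, g in groupby(w))

def contcost(w, cont):
    # cost of collapsing each run a^p of w to one letter via p-1 contractions
    return sum((len(list(g)) - 1) * cont[a] for a, g in groupby(w))
```

For w = aabbbbaaccc with the runs aa, bbbb, aa, ccc, this yields 2dup(a) + 3dup(b) + 2dup(c), as in the text.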
The following lemma shows that the cost of a given edit script from a string u is greater than or equal to the cost of applying the same script within a superstring of u.
An algorithm which is symmetric to Algorithm 3 can be described in order to preprocess a string s for queries of the form ed s i,j , α .
We continue by describing the improved computation in the case where both input strings s and t are of length at least 2. To do so, we first add some auxiliary notation. For an interval I_{x,y} of positions within a string w, denote by Ĩ_{x,y} the subsequence of indices in I_{x,y} which are start positions of runs in w. For example, for w = aabbbaacccc, the interval I_{1,7} = [1, 2, . . . , 6] contains all positions of letters within the substring w_{1,7} = abbbaa, and Ĩ_{1,7} = [2, 5] contains the start positions in w of the runs bbb and aa that are included in I_{1,7} (the first letter a in w_{1,7} belongs to a run in w that starts at position 0, and therefore position 1 is not included in Ĩ_{1,7}). This notation will be used for defining subsequences of rows and columns in DP matrices maintained by the algorithm, where some of these intervals are derived from the source string s and some from the target string t. We will assume that the string from which Ĩ_{x,y} was derived is clear from the context, and will not specify it explicitly. For example, when Ĩ_{x,y} defines rows in matrices ED or EDT_α, or either rows or columns in matrices S_α, the indices in Ĩ_{x,y} are derived from the source string s. When Ĩ_{x,y} defines columns in matrices ED or EDT_α, or either rows or columns in matrices T_α, the indices in Ĩ_{x,y} are derived from the target string t. Subsequences Ĩ_{x,y} will be used for defining sparse regions in matrices, i.e., regions containing sets of rows or columns which are not necessarily adjacent.
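The subsequence Ĩ_{x,y} can be computed by a single scan; a minimal sketch with a hypothetical name:

```python
def run_start_subsequence(w, x, y):
    # I-tilde_{x,y}: positions in [x, y) that start a run of w
    return [i for i in range(x, y) if i == 0 or w[i] != w[i - 1]]
```

On the example above, run_start_subsequence("aabbbaacccc", 1, 7) returns [2, 5], matching Ĩ_{1,7} in the text.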
Consider the computation of ed(s, t) as expressed in Equation 8. Assume first the special case where s ends with a run of length at least 2. In this case, s is of the form s = wββ for some string w and letter β. For every partition (t_a, t_b) of t, it is possible to combine an optimal script ES_1 from the prefix wβ of s to t_a and an optimal script ES_2 from the suffix β of s to t_b, and to obtain a script ES = ⟨ES_1(wββ), ES_2(t_aβ)⟩ from s to t, where ES_1 is applied to the prefix wβ of wββ and ES_2 to the suffix β of t_aβ. Therefore, ed(wββ, t) ≤ cost(ES) = cost(ES_1) + ed(β, t_b). In particular, ed(wββ, t) ≤ min{ed(wβ, t_a) + ed(β, t_b) | (t_a, t_b) ∈ P(t)} = edt_β(wβ, t) (Eq. 9). In addition, it is possible to compose an edit script from wββ to t by first contracting the last two letters to obtain the string wβ, and then applying an optimal script from wβ to t. The cost of such a script is ed(wβ, t) + cont(β), and therefore we get that ed(wββ, t) ≤ min{edt_β(wβ, t), ed(wβ, t) + cont(β)}.

ed(wββ, t) = min{edt_β(wβ, t), ed(wβ, t) + cont(β)}     (20)

Formulating Equation 20 with respect to the data structures defined in Section "A baseline dynamic-programming algorithm for EDDC" (under the assumption that all values appearing on the right-hand side of the equation are computed and stored in the corresponding entries), we get a corresponding DP update rule.

Now, consider the case where the last run in s is of length 1 (i.e., s is not of the form wββ). Assume first that the term that yields the minimum value on the right-hand side of Equation 8 is of the form edt_α(s_a, t) + ed(s_b, α) for some partition (s_a, s_b) ∈ P(s) and a letter α ∈ Σ. If (s_a, s_b) ∉ R(s), then there is some letter β ∈ Σ which is both the last letter of s_a and the first letter of s_b. In this case, we can write s_a = wβ and s_b = βu. Note that u ≠ ε (since s ≠ wββ by assumption), and so ed(s, t) = edt_α(wβ, t) + ed(βu, α) = min{ed(wβ, t_a) + ed(α, t_b) | (t_a, t_b) ∈ P(t)} + ed(βu, α) (Eq. 9). From the optimality of the partition (s_a, s_b) = (wβ, βu), it follows that ed(s, t) = edt_α(wββ, t) + ed(u, α). If u starts with β, this step can be repeated, and inductively we can apply such partition refinements until obtaining a partition (s_a', s_b') of s such that ed(s, t) = edt_α(s_a', t) + ed(s_b', α) and (s_a', s_b') ∈ R(s). Now, Equation 8 can be revised so that only partitions in R(s), rather than all of P(s), need to be considered, and the DP formulation is updated accordingly. Similarly as shown for the computation of ed(s, t), it is possible to revise the computation of edt_α(s, t) under Constraint 1 and obtain analogous equations.

Finally, we present Algorithm 4, which is an efficient version of Algorithm 2. Stage 1 of the new algorithm is accelerated by using Algorithm 3 to compute, for every α ∈ Σ, the distances from α to all substrings of s and the distances from all substrings of t to α. In Stage 2, we use the equations developed above in order to accelerate the computation.
The correctness of the algorithm follows from the correctness of the recursive equations, and can be asserted similarly as done for Algorithm 2.

Complexity analysis
Assume for simplicity that the compressed forms of both input strings s and t have the same length ñ. The recursive computation of RL-COMPUTE-MATRIX can be visualized as a tree (see Figure 4). Each node in the tree corresponds to a call to RL-COMPUTE-MATRIX over some region I_{i,k} × I_{j,l}, which is either a leaf, in case k = i + 1 and l = j + 1, or otherwise an internal node. In the latter case, the node has exactly two children, corresponding to the two recursive calls obtained from either a vertical (lines 11 and 13) or a horizontal (lines 16 and 18) partition of the region. For simplicity, assume that ñ' = |Ĩ_{2,n+1}| = 2^x for some integer x. It can be observed that the algorithm alternates between vertical and horizontal partitions along paths from the root of the tree, where the regions of two different nodes at the same depth y are disjoint, and the union of all regions of nodes at depth y covers the entire initial region I_{2,n+1} × I_{2,n+1} of the root node. For every 0 ≤ y ≤ x, there are two series of intervals I_{y,0}, I_{y,1}, . . . , I_{y,2^y−1} and J_{y,0}, J_{y,1}, . . . , J_{y,2^y−1}, such that the set of regions corresponding to all nodes at depth 2y is {I_{y,f} × J_{y,g} | 0 ≤ f < 2^y, 0 ≤ g < 2^y}, and the set of regions corresponding to all nodes at depth 2y + 1 is {I_{y,f} × J_{y+1,g} | 0 ≤ f < 2^y, 0 ≤ g < 2^{y+1}}. In addition, the corresponding subsequences Ĩ_{y,0}, . . . , Ĩ_{y,2^y−1} and J̃_{y,0}, . . . , J̃_{y,2^y−1} all have the same size 2^(x−y).
Algorithm 4: RL-MATRIX-EDDC(s, t)
// Stage 1
1 Let n denote the lengths of s and t. Run Algorithm 3 to preprocess s and t, and generate (n + 1) × (n + 1) matrices S_α and T_α for every α ∈ Σ as defined in Section "The algorithm" (applying Equation 19).
// Stage 2
2 Allocate (n + 1) × (n + 1) matrices EDT_α for every α ∈ Σ, and an (n + 1) × (n + 1) matrix ED. Initialize base-case entries in all matrices EDT_α and ED as described in Algorithm 1. This includes all entries in the first two rows and the first two columns of these matrices.
If |Ĩ_{j,l}| ≥ |Ĩ_{i,k}| (Eq. 25) then // vertical partitioning
10 Let j < h < l be the smallest index such that h̃ = (j̃ + l̃)/2.
12 Update ...

Consider a node of depth 2y whose corresponding region is I_{y,f} × J_{y,g}, and the two regions corresponding to its children, I_{y,f} × J_{y+1,2g} and I_{y,f} × J_{y+1,2g+1}. The computation time spent on the node is dominated by the matrix multiplications performed in line 12 of RL-COMPUTE-MATRIX. This includes |Σ| matrix multiplications between pairs of matrices, such that the dimensions of the first matrix in each pair are |I_{y,f}| × |J̃_{y+1,2g}| = |I_{y,f}| × 2^(x−y−1), and the dimensions of the second matrix in each pair are |J̃_{y+1,2g}| × |J̃_{y+1,2g+1}| = 2^(x−y−1) × 2^(x−y−1). Observe that each such rectangular multiplication decomposes into O(|I_{y,f}|/2^(x−y−1)) square multiplications of dimension 2^(x−y−1). Therefore, the time required for all matrix multiplications performed within nodes of depth 2y is O(|Σ| · 2^y · (n/2^(x−y−1)) · MP(2^(x−y−1))). Similarly, it is possible to show that the total time required for all matrix multiplications performed within nodes of depth 2y + 1 obeys the same bound, and so the total computation time of matrix multiplications throughout the entire algorithm run is O(|Σ| (n/ñ) Σ_{0≤y≤x} (ñ/2^y)² MP(2^y)). As in Section "The algorithm", using the Master Theorem [16], this summation evaluates to O(|Σ| n MP(ñ)/ñ). In addition to matrix multiplications, RL-COMPUTE-MATRIX performs O(|Σ|n²) operations in base computations (lines 2-7), and so the total time complexity of the complete algorithm is O(|Σ|n² + |Σ|n MP(ñ)/ñ). A simple implementation of Algorithm 4 can be done using the same O(|Σ|n²) space complexity as Algorithm 2. A more involved implementation can be obtained by observing that the algorithm only examines and updates entries in matrices of dimensions at most n × ñ or ñ × n when performing matrix multiplications, and in addition examines adjacent entries "to the left" of or "above" an entry in a base-case region. This observation can be used to reduce the space complexity of the algorithm to O(|Σ|nñ); the complete details of such an implementation are omitted from this text.

A comparison with previous works
In this section, we review the previous main algorithms for EDDC by Behzadi and Steyaert [7], Bérard et al. [8] and Abouelhoda et al. [9], and point out similarities and improvements made in our current work.
The main contribution of our work is in obtaining sub-cubic algorithms for EDDC, whereas all previous algorithms have cubic time complexities (for |Σ| the alphabet size, n the length of the input strings, and ñ the length of their RLE compressed forms, the algorithms of [7], [8], and [9] obtain the time complexities O(n² + nñ² + |Σ|ñ³ + |Σ|²ñ²), O(n³ + |Σ|ñ³), and O(n² + nñ²), respectively).
Notably, the algorithm of [9] eliminates a |Σ| factor that appears in the time complexities of the algorithms given in [7,8] and here. However, this improvement is confined to a constrained model of duplication histories. As we do not assume this model here, we could not use the representation of [9] that allows the elimination of the |Σ| time complexity factor.
In general, the frameworks of all algorithms in [7-9], as well as of the algorithms presented here, are similar. All these algorithms apply two phases, where the first phase computes costs corresponding to all substrings of each one of the input strings separately, and the second phase uses these precomputed costs in order to compute the edit distance between each pair of prefixes of the input strings (our online variant described in Section "An online algorithm for EDDC using min-plus matrix-vector multiplication for discrete cost functions" interleaves these two phases, yet each operation it conducts can be conceptually attributed to one of the phases). The recursive formulas are similar as well, where those for the first phase can be viewed as special kinds of weighted context-free grammar derivation rules.
Next, we address the cost function constraints. All algorithms assume that operation costs are nonnegative and apply additional assumptions similarly to those listed in our Property 1, which can be made without loss of generality.
In [8], operation costs were limited so that all duplications and contractions have the same constant cost (regardless of the letter to which they are applied), all deletions and insertions have the same constant cost, and all mutation costs are symmetric (i.e., mut(α, β) = mut(β, α) for every α, β ∈ Σ). While it was argued that these restrictions make the edit distance a metric, they limit the generality of the algorithm of [8], whereas the rest of the previous algorithms we mentioned can handle cost schemes that do not necessarily abide by these restrictions.
Both in [7] and in [8], it was required that all duplication and contraction costs be lower than every insertion, deletion, and mutation cost. This restriction is not explicitly stated in [9], yet seems to be required there as well. For the application of minisatellite map comparison, this requirement is somewhat unnatural, since it assumes that mutations, which are typically common events, should cost more than the less common events of duplications and contractions. Our algorithms can be applied even when this restriction does not hold. However, one of our algorithms, the RLE variant (Section "Additional acceleration using run-length encoding"), adds a requirement that was absent from those previous algorithms: it requires that for every α, β ∈ Σ, dup(α) ≤ dup(β) + mut(β, α) ≤ ins(α), and cont(α) ≤ cont(β) + mut(α, β) ≤ del(α) (our Constraint 1). On one hand, our Constraint 1 is more strict than the constraint of [7] and [8], in the sense that it implies nonnegative lower bounds on differences of the form ins(α) − dup(α) and del(α) − cont(α), while in [7] and [8] it was only required that these differences be nonnegative. On the other hand, our Constraint 1 does not require that the cost of mutations be higher than the costs of duplications and contractions.
We showed that our algorithms are more general with respect to the assumed constraints. We also claim that our algorithms are more precise with respect to the formal problem specification. All previous algorithms (excluding the first algorithm by Bérard and Rivals [2], which had an O(n⁴) running time and assumed a constant cost for all mutations, in addition to the restrictions in [8]) might output non-optimal solutions in certain cases, as demonstrated in the following example. Consider the input s = ab, t = ef, and the cost function in which all duplications and contractions cost 1, all deletions and insertions cost 20, and the symmetric mutation costs are as given in Table 1. It can be shown that all three algorithms in [7], [8], and [9] would output the value 18 as the edit distance between the input strings, reflecting one of the edit scripts ⟨ab, eb, ef⟩ or ⟨ab, af, ef⟩. Nevertheless, the correct value is 17, due to the script ⟨ab, cb, cc, c, d, dd, ed, ef⟩. Perhaps it could be possible to specify additional restrictions over the cost functions in order to guarantee that the algorithms in [7], [8], and [9] return optimal solutions for all instances.
Conclusions and discussion

scripts, where each script ES_i transforms some intermediate string w_{i−1} into a string w_i, and s = w_0 and t = w_q.
The correctness of the above observation follows from the fact that for a pair of optimal edit scripts ES_1 from s to w and ES_2 from w to t, the script ES = ⟨ES_1, ES_2⟩ from s to t satisfies ed(s, t) ≤ cost(ES) = cost(ES_1) + cost(ES_2) = ed(s, w) + ed(w, t).

Let s and t be strings. Call a pair of partitions (s_a, s_b) ∈ P(s) and (t_a, t_b) ∈ P(t) an optimal pairwise partition of s and t if ed(s, t) = ed(s_a, t_a) + ed(s_b, t_b). Say that an edit script ES from s to t is a shortest optimal script from s to t if ES is optimal and, for every other optimal script ES' from s to t, |ES| ≤ |ES'|. For a script ES = ⟨s = u_0, u_1, . . . , u_r = t⟩ from s to t and 0 ≤ i ≤ j ≤ r, denote by ES_{i,j} = ⟨u_i, u_{i+1}, . . . , u_j⟩ the partial script of ES from u_i to u_j.

Observation 6. Let ES = ⟨u_0, u_1, . . . , u_r⟩ be a shortest optimal edit script from u_0 to u_r. For every 0 ≤ i ≤ j ≤ r, the partial script ES_{i,j} is a shortest optimal edit script from u_i to u_j. Moreover, for any shortest optimal script ES*_{i,j} from u_i to u_j, the script ES* = ⟨ES_{0,i}, ES*_{i,j}, ES_{j,r}⟩ is a shortest optimal script from u_0 to u_r.
The correctness of the above observation is obtained by noting that if ES_{i,j} is not a shortest optimal script from u_i to u_j, then for some shortest optimal script ES*_{i,j} from u_i to u_j, the script ES* = ES_{0,i}, ES*_{i,j}, ES_{j,r} either has a lower cost than ES or is a shorter script of the same cost, in contradiction to ES being a shortest optimal script from u_0 to u_r.

Lemma 9.
Let ES = u_0, u_1, . . . , u_r be a shortest optimal edit script from u_0 to u_r. If there are two indices 0 ≤ i < j ≤ r such that u_i and u_j are strings of length 1, then j = i + 1. In addition, for every 0 < k < r, u_k ≠ ε.
Proof. Assume there are two indices 0 ≤ i < j ≤ r such that u_i and u_j are strings of length 1, i.e., u_i = α and u_j = β for some α, β ∈ Σ. From Observation 6, the partial script ES_{i,j} = α = u_i, u_{i+1}, . . . , u_j = β is a shortest optimal script from α to β. Since j > i, it must be that α ≠ β (otherwise the script ES′ = ES_{0,i}, ES_{j,r} is a shorter script from u_0 to u_r of no greater cost than ES, in contradiction to ES being a shortest optimal script from u_0 to u_r). From Property 1, ed(α, β) = mut(α, β), and so the edit script containing the single operation of mutating α to β is an optimal script from α to β, and it must be that j = i + 1.
In addition, assume by contradiction that there is some index 0 < k < r such that u_k = ε. The only edit operation which may yield an empty string is a deletion from a single-letter string, and therefore u_{k−1} = α for some letter α. Similarly, the only edit operation which may be applied to an empty string is an insertion, and therefore u_{k+1} = β for some letter β, in contradiction to the fact that two intermediate strings of length 1 must be consecutive along a shortest optimal script, as shown above.
Call an edit script ES from a string s to a string t simple if ES is a shortest optimal script from s to t in which no generating operation precedes a reducing operation. The following lemma generalizes Lemma 2 of [6], by considering indels in addition to contractions and duplications.

Lemma 10. For every pair of strings s and t, there exists a simple edit script from s to t.
Proof. Let s and t be two strings, and let r be the length of a shortest optimal script from s to t. When r ≤ 1, any shortest optimal script from s to t either contains no reducing operation or contains no generating operation, and in particular is a simple script. Otherwise, r > 1, and assume by induction that the lemma holds for every pair of strings such that the length of a shortest optimal script from the source string to the target string is less than r. Let ES = s = u_0, u_1, . . . , u_r = t be a shortest optimal script from s to t.
Case 1: The first operation in ES is not a generating operation. From Observation 6, the partial script ES_{1,r} is a shortest optimal script from u_1 to u_r, whose length is r − 1. From the inductive assumption, there is a simple script ES*_{1,r} from u_1 to u_r, and from Observation 6 the script ES* = ES_{0,1}, ES*_{1,r} is a shortest optimal script from s to t. As the first operation in ES* is non-generating (being the same first operation as in ES), ES* is simple, and the lemma follows.
Case 2: The first operation in ES is a generating operation. Similarly as above, by applying the inductive assumption we may assume w.l.o.g. that the partial script ES_{1,r} is simple. If this partial script is non-reducing, then ES is non-reducing, and in particular it is simple. Otherwise, let 1 ≤ i < r be the smallest index such that the transformation of u_i to u_{i+1} is by a reducing operation. Since neither generating nor reducing operations may precede this operation in the partial script ES_{1,i}, it follows that all operations in the partial script ES_{1,i} (if there are any) are mutations.
The generating operation transforming u_0 to u_1 is either an insertion or a duplication of some letter α in u_0. In both cases, we can write s = u_0 = vxw and u_1 = vx′w (v, x, x′, and w are strings), where in the former case x = ε and x′ = α, and in the latter case x = α and x′ = αα. As all operations in the partial script ES_{1,i} are mutations, each intermediate string u_j, for 1 ≤ j ≤ i, is of the form v_j x_j w_j, where v_j, x_j, and w_j are strings obtained by applying zero or more mutations to v, x′, and w, respectively. We argue that the reducing operation transforming u_i = v_i x_i w_i to u_{i+1} cannot be the deletion of a letter or a contraction involving at least one letter within the substring x_i. This is true since, in such a case, it would have been possible to avoid the first generating operation in ES (transforming x to x′), as well as all mutation operations over a reduced letter in x_i and the reducing operation from u_i to u_{i+1}. This would yield a script ES*_{0,i+1} from u_0 to u_{i+1} which is shorter and of no higher cost than ES_{0,i+1}, in contradiction to Observation 6. Hence, the reducing operation from u_i to u_{i+1} either deletes a letter or contracts two letters within one of the substrings v_i or w_i of u_i.
Consider first the case where the reducing operation over u_i is applied within its prefix v_i. Thus, we can write u_{i+1} = v_{i+1} x_{i+1} w_{i+1}, where v_{i+1} is the string obtained by applying the corresponding reducing operation to v_i, x_{i+1} = x_i, and w_{i+1} = w_i. The operations in ES_{0,i+1} can be assigned to two independent scripts: a script ES_v = v, . . . , v_{i+1}, obtained by merging each multiple occurrence of consecutive identical strings in the series v, v_1, v_2, . . . , v_{i+1} into a single occurrence, and similarly a script ES_{xw} = xw, . . . , x_{i+1} w_{i+1}. Each operation in ES_{0,i+1} corresponds to exactly one operation in either ES_v or ES_{xw}, where the costs of corresponding operations are equal, and therefore cost(ES_{0,i+1}) = cost(ES_v) + cost(ES_{xw}). Applying the operations of ES_v to the prefix v of u_0 yields a script ES_v(u_0) from u_0 to v_{i+1} xw, and applying the operations of ES_{xw} to the suffix xw of v_{i+1} xw yields a script ES_{xw}(v_{i+1} xw) from v_{i+1} xw to u_{i+1}. Thus, the script ES*_{0,i+1} = ES_v(u_0), ES_{xw}(v_{i+1} xw) is a script from u_0 to u_{i+1}. Since ES_v contains at least one operation (the reducing operation from v_i to v_{i+1}) and no generating operation (besides the reducing operation, ES_v may contain only mutations), ES*_{0,i+1} starts with a non-generating operation. In addition, cost(ES*_{0,i+1}) = cost(ES_v) + cost(ES_{xw}) = cost(ES_{0,i+1}). From Observation 6, ES_{0,i+1} is a shortest optimal script from u_0 to u_{i+1}, and so ES*_{0,i+1} is a shortest optimal script from u_0 to u_{i+1}. Applying Observation 6 again, the script ES* = ES*_{0,i+1}, ES_{i+1,r} is a shortest optimal script from s to t. Now, the lemma follows from Case 1 of this proof and from the fact that the first operation in ES* is not a generating operation. The case where the reducing operation over u_i is applied within its suffix w_i is handled symmetrically.
Lemma 11. For every α ∈ Σ and every nonempty string t, any simple script from α to t is non-reducing.
Proof. Let ES be a simple script from α to t, and assume by contradiction that ES contains a reducing operation. Since ES is simple, all reducing operations in ES occur prior to any generating operation, and in particular the first reducing operation is applied after applying zero or more mutations to α. Such a reducing operation must be a deletion from a string of length 1, resulting in an empty intermediate string, in contradiction to Lemma 9.

Lemma 12. Let w and t be strings and β a letter, such that w ≠ ε, t is of length at least 2, and there is a non-reducing simple script from wβ to t. Then, ed(wβ, t) = min{ed(w, t_a) + ed(β, t_b) | (t_a, t_b) ∈ P(t)}.
Proof. Let ES = wβ = u_0, u_1, . . . , u_r = t be a non-reducing simple script from wβ to t. For every 0 ≤ i ≤ r, construct a partition (u_{i,a}, u_{i,b}) of u_i which satisfies ed(wβ, u_i) ≥ ed(w, u_{i,a}) + ed(β, u_{i,b}), as follows. For i = 0, set (u_{0,a}, u_{0,b}) = (w, β), where by definition ed(wβ, u_0) = ed(w, u_{0,a}) + ed(β, u_{0,b}) = 0. Now, assume inductively for some 0 < i ≤ r and a partition (u_{i−1,a}, u_{i−1,b}) of u_{i−1} that ed(wβ, u_{i−1}) ≥ ed(w, u_{i−1,a}) + ed(β, u_{i−1,b}). If the non-reducing operation transforming u_{i−1} to u_i is a mutation, an insertion, or a duplication of a letter within the prefix u_{i−1,a}, then set u_{i,a} to be the string obtained by applying this operation to u_{i−1,a}, and set u_{i,b} = u_{i−1,b}. Otherwise, the operation is a mutation, an insertion, or a duplication of a letter within the suffix u_{i−1,b}, and in this case set u_{i,b} to be the string obtained by applying this operation to u_{i−1,b}, and u_{i,a} = u_{i−1,a}. Note that in both cases cost(ES_{i−1,i}) ≥ ed(u_{i−1,a}, u_{i,a}) + ed(u_{i−1,b}, u_{i,b}), and therefore we get from the inductive assumption and Observation 5 that ed(wβ, u_i) ≥ ed(w, u_{i,a}) + ed(β, u_{i,b}). The process above generates a partition (t*_a, t*_b) = (u_{r,a}, u_{r,b}) of t = u_r, for which ed(wβ, t) ≥ ed(w, t*_a) + ed(β, t*_b). In particular, ed(wβ, t) ≥ min{ed(w, t_a) + ed(β, t_b) | (t_a, t_b) ∈ P(t)}. On the other hand, from Lemma 8, ed(wβ, t) ≤ min{ed(w, t_a) + ed(β, t_b) | (t_a, t_b) ∈ P(t)}, and the lemma follows.
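Lemma 12 equates ed(wβ, t) with a minimum over two-part partitions of t. The identity can be sanity-checked by brute force on a tiny instance, under simple illustrative costs (the cost values below are assumptions for illustration, not the paper's cost function):

```python
import heapq

# Illustrative costs: every mutation costs 1, dup/cont cost 1, indels cost 3.
SIGMA, MUT, DUP, CONT, INDEL, MAXLEN = "ab", 1, 1, 1, 3, 4

def neighbors(s):
    """All strings reachable from s by one edit operation, with its cost."""
    for i, x in enumerate(s):
        for y in SIGMA:
            if y != x:
                yield s[:i] + y + s[i + 1:], MUT          # mutation
        yield s[:i] + s[i + 1:], INDEL                    # deletion
        if len(s) < MAXLEN:
            yield s[:i] + x + x + s[i + 1:], DUP          # duplication x -> xx
        if i + 1 < len(s) and s[i + 1] == x:
            yield s[:i] + s[i + 1:], CONT                 # contraction xx -> x
    if len(s) < MAXLEN:
        for i in range(len(s) + 1):
            for y in SIGMA:
                yield s[:i] + y + s[i:], INDEL            # insertion

def ed(src, dst):
    """Exact edit distance by Dijkstra over length-capped strings."""
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d <= dist[u]:
            for v, c in neighbors(u):
                if d + c < dist.get(v, float("inf")):
                    dist[v] = d + c
                    heapq.heappush(heap, (d + c, v))

w, beta, t = "a", "b", "aabb"
lhs = ed(w + beta, t)                 # ed(wβ, t)
rhs = min(ed(w, t[:i]) + ed(beta, t[i:]) for i in range(1, len(t)))
print(lhs, rhs)  # 2 2
```

Here ed("ab", "aabb") = 2 (duplicate a, then duplicate b), matching the best partition ("aa", "bb") with ed("a", "aa") + ed("b", "bb") = 1 + 1, as Lemma 12 predicts.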
Based on the above observations and lemmas, we now turn to prove the recursive computation given in Section "The recurrence formula", starting with Equation 7. Fix henceforth a pair of input strings s and t, each containing at least two letters. Note that for every α ∈ Σ and every pair of partitions (s_a, s_b) ∈ P(s) and (t_a, t_b) ∈ P(t), from Observation 5, ed(s, t) ≤ ed(s, α) + ed(α, t) and ed(s, t) ≤ ed(s_a, t_a) + ed(s_b, t_b). Thus, to prove the correctness of Equation 7, it remains to show that ed(s, t) is bounded from below by one of these terms. From Lemma 10, there is a simple script ES = s = u_0, u_1, . . . , u_r = t from s to t, and in particular, there is a string u_i along ES such that the partial script ES_{0,i} is non-generating and the partial script ES_{i,r} is non-reducing. Recall that ed(s, t) = cost(ES) = cost(ES_{0,i}) + cost(ES_{i,r}), which from Observation 6 equals ed(s, u_i) + ed(u_i, t). If u_i = β for some letter β, then ed(s, t) = ed(s, β) + ed(β, t) ≥ min{ed(s, α) + ed(α, t) | α ∈ Σ}. Otherwise, u_i contains at least two letters. In this case, we can write u_i = wβ, where β is the last letter in u_i and w is the nonempty prefix of u_i containing all letters except for the last one. From Lemma 12, ed(u_i, t) = min{ed(w, t_a) + ed(β, t_b) | (t_a, t_b) ∈ P(t)}, and a symmetric argument applied to the non-generating script ES_{0,i} gives the corresponding equality for ed(s, u_i); combining the two equalities with Observation 5 yields ed(s, t) ≥ min{ed(s_a, t_a) + ed(s_b, t_b) | (s_a, s_b) ∈ P(s), (t_a, t_b) ∈ P(t)}, concluding the proof of Equation 7.

We next continue to develop the recursive computation, considering the simpler cases where one of the input strings is either empty or contains a single letter, and the other string contains at least two letters. Let ES be a simple script from ε to t whose length is r. ES must start with an insertion of some letter α, and from Observation 6, the remainder ES_{1,r} of the script is an optimal script from α to t, implying the correctness of Equation 1. The correctness of Equation 4 is shown symmetrically. Now, consider the computation of ed(α, t), as expressed in the last term of Equation 2. Denote by ed′(α, t) the minimum cost of a script from α to t which starts with a generating operation. From Lemma 11, a simple script from α to t is non-reducing, and so the first operation in such a script is either a mutation of α or some generating operation. If there is such a script in which the first operation is generating, then ed(α, t) = ed′(α, t) = mut(α, α) + ed′(α, t) ≥ min{mut(α, β) + ed′(β, t) | β ∈ Σ}. Else, there is a simple script from α to t in which the first operation is the mutation of α into some letter β. Due to Lemma 9, the following operation must be a generating operation, and so the remainder of the script is an optimal script from β to t in which the first operation is generating, implying again that ed(α, t) ≥ min{mut(α, β) + ed′(β, t) | β ∈ Σ}. The other direction of the inequality is shown similarly as done above for Equation 7, concluding the correctness proof of Equation 2.
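Equation 7 itself is not reproduced in this appendix; for reference, the shape implied by the two upper bounds and the two lower-bound cases above is the following (a reconstruction, not a verbatim copy of the equation in the main text):

```latex
ed(s,t) \;=\; \min\left\{
  \min_{\alpha \in \Sigma} \bigl\{ ed(s,\alpha) + ed(\alpha,t) \bigr\},\;
  \min_{\substack{(s_a,s_b) \in P(s) \\ (t_a,t_b) \in P(t)}}
       \bigl\{ ed(s_a,t_a) + ed(s_b,t_b) \bigr\}
\right\}
```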
We now address the correctness of Equation 3. Consider the minimum cost of a script from α to t which starts with a generating operation. Let ES be such a script, and let r denote its length.
The cases where the first operation in ES is the insertion of some letter before α, or the duplication of α, are handled similarly, and imply the corresponding lower bound. The other direction of the inequality is shown similarly as for Equation 7, concluding the proof of Equation 3. The correctness of Equations 5 and 6 is shown symmetrically.

Correctness of Algorithm 2
We next show that when the precondition of COMPUTE-MATRIX holds with respect to its input region I_{i,k} × I_{j,l}, executing the procedure derives its postcondition, i.e., the procedure computes correctly all entries in the input region within the matrices EDT_α and ED. The base case of COMPUTE-MATRIX occurs when k = i + 1 and l = j + 1. In this case, I_{i,k} = {i} and I_{j,l} = {j}, and the precondition implies that the corresponding entries already contain the solutions for their sub-instances. Otherwise, consider the case where l − j ≥ k − i, in which the region is split at some index h (j < h < l) into the two halves I_{i,k} × I_{j,h} and I_{i,k} × I_{h,l}. All requirements of the precondition with respect to the region I_{i,k} × I_{j,h} are met, and the procedure is called recursively in line 6 over this region. From the postcondition of the recursive call, upon arriving at line 7 all entries in the region I_{i,k} × I_{j,h} in matrices EDT_α and ED contain the solutions for the corresponding sub-instances. In particular, it may be observed that at this point of the run, all requirements for the precondition to hold with respect to the region I_{i,k} × I_{h,l} are met, with the exception of the requirements regarding entries in the region I_{i,k} × I_{h,l} of the matrices EDT_α. These remaining requirements are established by the matrix multiplications conducted in line 7, and therefore after executing line 7, the precondition holds with respect to the region I_{i,k} × I_{h,l}. After returning from the recursive call in line 8, all entries in the region I_{i,k} × I_{j,l} are computed, and the postcondition of the procedure is met. The correctness of the computation conducted in lines 10-13 in the case where l − j < k − i is shown similarly.
Note that the initial call to COMPUTE-MATRIX from line 5 of Algorithm 2 is applied over the complete region I_{i,k} × I_{j,l} = I_{2,n+1} × I_{2,n+1}. It may be observed that after the initialization in lines 3 and 4 of Algorithm 2, the precondition of COMPUTE-MATRIX is met with respect to this region. Therefore, it follows from the postcondition that once the computation terminates, all entries in the matrices EDT_α and ED contain the solutions for the corresponding sub-instances. In particular, ED[n, n] holds the solution ed(s_{0,n}, t_{0,n}) = ed(s, t), and the value returned in line 6 of Algorithm 2 is correct.
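The multiplications applied in lines 7 and 12 of COMPUTE-MATRIX are min-plus (distance) products over sub-matrices. A minimal generic sketch of this operation follows (the naive cubic routine, not the accelerated algorithm behind the stated time bounds):

```python
INF = float("inf")

def min_plus(A, B):
    """Min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    assert m == len(A[0])
    C = [[INF] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            if a == INF:
                continue  # no finite path through index k
            for j in range(p):
                if a + B[k][j] < C[i][j]:
                    C[i][j] = a + B[k][j]
    return C

A = [[0, 3], [INF, 0]]
B = [[0, 1], [7, 0]]
print(min_plus(A, B))  # [[0, 1], [7, 0]]
```

In the entry C[0][1] above, the minimum min(0 + 1, 3 + 0) = 1 is exactly the "cheapest split point" computation that the algorithm performs for each region boundary.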

Proofs of lemmas corresponding to the EDDC algorithm for discrete cost functions
Proof of Lemma 1: Matrix multiplications computed along the run of Algorithm 2 occur in lines 7 and 12 of Procedure COMPUTE-MATRIX, and additional implicit such multiplications occur when the Inside-VMT algorithm is used in Stage 1 of the algorithm. Note that in such computations, all entries in the multiplied sub-matrices already contain the computed solutions for the corresponding sub-instances. In addition, matrix multiplications conducted by the Inside-VMT algorithm are applied only over sub-matrices A[I_{i1,i2}, I_{j1,j2}] such that i2 ≤ j1 (see [11]), and thus, D-discreteness of matrices computed in Stage 1 needs to be shown only with respect to adjacent entries A[i, j], A[i − 1, j] such that i ≤ j. In what follows, let 0 < i ≤ n and 0 ≤ j ≤ n be two integers, where n is the length of s and t.
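Lemma 1 concerns D-discreteness: differences between adjacent matrix entries fall within a finite integer interval D. The same flavor of property is easy to observe for the classic unit-cost edit distance table, where vertically adjacent entries always differ by at most 1 (an illustration of the concept, not the matrices EDT_α and ED themselves):

```python
def edit_matrix(s, t):
    """Standard unit-cost edit distance DP table of size (|s|+1) x (|t|+1)."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                D[i][j] = i + j
            else:
                D[i][j] = min(D[i - 1][j] + 1,
                              D[i][j - 1] + 1,
                              D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    return D

D = edit_matrix("duplication", "contraction")
# Collect all differences between vertically adjacent entries.
diffs = {D[i][j] - D[i - 1][j]
         for i in range(1, len(D)) for j in range(len(D[0]))}
print(diffs <= {-1, 0, 1})  # True: differences lie in a finite integer interval
```

It is this boundedness of adjacent differences that the matrix-vector min-plus multiplication algorithm described in the paper exploits.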

Proofs of lemmas corresponding to the run-length encoding based EDDC algorithm
Proof of lemma 5: We show that ed(α, βw) ≥ ed(α, w) + dup(β), where the other inequalities are proven similarly.
Let r be the length of a simple script from α to βw. Observe that r ≥ 1, since by definition βw ≠ α. When r = 1, the single operation applied to α must be a generating operation (since w ≠ ε). As discussed in Section "Additional acceleration using run-length encoding", we may assume this operation is the duplication of α, and so β = w = α. In this case, ed(α, βw) = dup(α) = ed(α, w) + dup(β).
Proof of Lemma 6: Note that if w = ε or w = β for some β ∈ Σ, then w̃ = w, dupcost(w) = contcost(w) = 0, and the lemma holds in a straightforward manner. Otherwise, w is of length at least 2, and we prove the lemma by induction over the length r of a simple script between α and w. Assume by induction that the lemma holds for every pair of input strings such that the length of a simple script from the source to the target string is less than r. We show that ed(α, w) = ed(α, w̃) + dupcost(w), where the proof that ed(w, α) = contcost(w) + ed(w̃, α) is symmetric.

Consider a partition (w_a, w_b) of w. When (w_a, w_b) ∈ R(w), the inductive assumption together with Observation 5 gives ed(α, w) ≥ ed(α, w̃) + dupcost(w). Else, (w_a, w_b) ∉ R(w), and so there is some letter β ∈ Σ such that w_a ends with β and w_b starts with β. In this case, there are some strings u_a, u_b and integers p, q > 0 such that w_a = u_a β^p, w_b = β^q u_b, u_a does not end with β, and u_b does not start with β. Moreover, w̃_a = ũ_a β, w̃_b = β ũ_b, and w̃ = ũ_a β ũ_b, and applying the inductive assumption to this case as well yields ed(α, w) ≥ ed(α, w̃) + dupcost(w).
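The run-length quantities used above can be sketched as follows. The per-letter cost tables are hypothetical, and dupcost/contcost follow the natural run-by-run definitions (w̃ denotes w with every maximal run collapsed to a single letter):

```python
from itertools import groupby

# Hypothetical per-letter duplication/contraction costs (illustration only).
dup = {"a": 1, "b": 2}
cont = {"a": 1, "b": 1}

def runs(w):
    """Run-length encoding of w as (letter, run length) pairs."""
    return [(x, len(list(g))) for x, g in groupby(w)]

def condensed(w):
    """w~ : every maximal run of w collapsed to a single letter."""
    return "".join(x for x, _ in runs(w))

def dupcost(w):
    """Cost of expanding w~ back to w by duplicating each run letter."""
    return sum((k - 1) * dup[x] for x, k in runs(w))

def contcost(w):
    """Cost of contracting w down to w~, run by run."""
    return sum((k - 1) * cont[x] for x, k in runs(w))

w = "aabbba"
print(condensed(w), dupcost(w), contcost(w))  # aba 5 3
```

For w = aabbba, the runs are (a, 2), (b, 3), (a, 1), so w̃ = aba, dupcost(w) = 1·dup(a) + 2·dup(b) = 5, and contcost(w) = 1·cont(a) + 2·cont(b) = 3 under the illustrative tables above.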