Listing all sorting reversals in quadratic time
 Krister M Swenson^{1, 2},
 Ghada Badr^{3, 4} and
 David Sankoff^{1}
https://doi.org/10.1186/1748-7188-6-11
© Swenson et al; licensee BioMed Central Ltd. 2011
Received: 16 August 2010
Accepted: 19 April 2011
Published: 19 April 2011
Abstract
We describe an average-case O(n^{2}) algorithm to list all reversals on a signed permutation π that, when applied to π, produce a permutation that is closer to the identity. This algorithm is optimal in the sense that the time it takes to write the list is Ω(n^{2}) in the worst case.
1 Introduction
In 1995 Hannenhalli and Pevzner [1] presented an algorithm to transform one genome into another in a minimum number of biologically plausible moves. They modeled a genome as a signed permutation and the move that they considered was the reversal: the order of a substring of the permutation is reversed, and the sign of each element in the substring is flipped. Since then many refinements and speed improvements have been developed [2–8].
In 2002, Siepel [10] and Ajana et al. [9] showed how to list every parsimonious scenario of reversals, each scenario being a proposed candidate for the true evolutionary history. Fundamental to their algorithms are O(n^{3}) techniques for finding all sorting reversals: the reversals that, at each step, produce a permutation that is closer to the target permutation than the last. Ajana et al. [9] used these results to support the replication-directed reversal hypothesis. Lefebvre et al. [11] and Sankoff et al. [12] used similar methodology to gain insight into the distribution of reversal lengths between genomes. Algorithms that attempt to represent all shortest-length scenarios more succinctly [13, 14] have also been developed.
In this paper we show how to list all sorting reversals in O(n^{2}) time on average. This algorithm is optimal in the sense that there are Ω(n^{2}) safe cycle-splitting reversals in the worst case. We later give a family of permutations that have Ω(n^{2}) unsafe reversals.
We implemented our algorithm in Java, and show experimentally that our algorithm is significantly faster than that of Siepel. This will afford a marked speedup of the aforementioned methods [9–14], since listing all sorting reversals is the kernel of repeated computation in each of them, especially when applied to permutations of sizes 3 × 10^{3} to 3 × 10^{5} (the size of bacterial or mammalian genomes).
After giving background material in Section 2 we introduce ominous substrings in Section 3. Section 4 describes how to detect the set of all ominous substrings of a permutation efficiently while Section 5 presents the algorithm. Section 6 shows the empirical speedup that our implementation affords. Finally, Section 7 gives a family of permutations that have Ω(n^{2}) unsafe reversals and discusses open problems.
2 Background
Given signed permutations π_{1} and π_{2}, the reversal distance d(π_{1}, π_{2}) is the smallest k such that π_{2} = π_{1} ○ ρ_{1} ○ ρ_{2} ○ ··· ○ ρ_{k}. Since d(π_{1}, π_{2}) = d(π_{2}^{−1} ○ π_{1}, I), we may assume that π_{2} = I = 1, 2, ..., n, the identity permutation.
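To make the notation concrete, here is a minimal Python sketch, assuming a linear signed permutation stored as a list of nonzero integers and 0-based inclusive positions (the paper's ρ(i, j) indexing may differ). The function names `apply_reversal` and `to_identity_problem` are ours, for illustration only. The first computes π ○ ρ(i, j); the second computes π_{2}^{−1} ○ π_{1} by replacing each element of π_{1} with the signed position of its absolute value in π_{2}, so that transforming π_{1} into π_{2} becomes sorting the result.

```python
def apply_reversal(perm, i, j):
    """Return perm ∘ ρ(i, j): reverse positions i..j (0-based, inclusive) and flip their signs."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


def to_identity_problem(pi1, pi2):
    """Return π2^{-1} ∘ π1 for signed permutations of 1..n given as lists.

    Each element x of pi1 becomes the (1-based) position of |x| in pi2, signed by
    the product of the signs x and |x| carry in pi1 and pi2, so pi2 itself maps
    to the identity 1, 2, ..., n.
    """
    where = {abs(y): (k + 1, 1 if y > 0 else -1) for k, y in enumerate(pi2)}
    result = []
    for x in pi1:
        position, sign_in_pi2 = where[abs(x)]
        sign_in_pi1 = 1 if x > 0 else -1
        result.append(position * sign_in_pi2 * sign_in_pi1)
    return result
```

For example, `apply_reversal([1, -3, -2, 4], 1, 2)` returns `[1, 2, 3, 4]`, and `to_identity_problem([-1, 2, 3], [2, -1, 3])` returns `[2, 1, 3]`; any reversal sequence sorting `[2, 1, 3]` turns `[-1, 2, 3]` into `[2, -1, 3]` when applied at the same positions.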
In this paper, we describe our methods using circular permutations (when written on a line, the leftmost element follows the rightmost element), as any sorting reversal on a circular permutation has its counterpart on a linear version of the permutation. Occasionally, however, we refer to the linearization of a permutation π; this is a linear version of π that maintains the same ordering as the clockwise ordering of π but has a leftmost and a rightmost element.
2.1 All Sorting Reversals
A reversal ρ(i, j) is said to act on the reality edges between π_{i−1} and π_{i} and between π_{j} and π_{j+1}, because these are the only edges in the breakpoint graph of π that are not in the graph of π ○ ρ(i, j). In Figure 1, the reversal ρ(6, 8) acts on reality edges (3^{−}, 1^{+}) and (6^{+}, 8^{−}). Two reality edges on the same cycle are convergent if a traversal of their cycle visits each edge in the same direction in the circular embedding; otherwise they are divergent. The following definitions classify the action of a reversal on the cycles of the breakpoint graph [1].
Definition 1 (cycle-splitting reversal) A reversal that acts on a pair of divergent reality edges splits the cycle to which the edges belong; such reversals are called cycle-splitting reversals.
Since a reversal acts on only two reality edges, it changes the number of cycles by at most one, and the breakpoint graph of the identity has n cycles; this gives the lower bound d(π, I) ≥ n − c(π), where c(π) is the number of cycles in the breakpoint graph. Most cycle-splitting reversals are sorting reversals [15], but not all sorting reversals are cycle-splitting reversals, which indicates a gap between this lower bound and the reversal distance.
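The cycle count and the resulting lower bound can be checked on small cases with a sketch like the following. It assumes a circular signed permutation of 1..n and uses a common node encoding in which element x becomes the node pair (2x − 1, 2x) when positive and (2x, 2x − 1) when negative; this numbering stands in for the x^{−}/x^{+} labels used above and, like the function names, is our assumption. Reality edges join consecutive elements (wrapping around), desire edges join 2x to 2x + 1 (and 2n to 1), and since every node has exactly one edge of each kind, connected components are exactly the alternating cycles. The `is_cycle_splitting` helper is a brute-force check of Definition 1's effect, not the divergent-edge test itself.

```python
def count_cycles(perm):
    """Number of alternating cycles c(π) in the breakpoint graph of a circular signed permutation."""
    n = len(perm)
    nodes = []
    for x in perm:
        a = abs(x)
        nodes += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]

    parent = list(range(2 * n + 1))  # union-find over nodes 1..2n (index 0 unused)

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    # Reality edges: right node of the element at position k to left node of position k+1 (circular).
    for k in range(n):
        union(nodes[2 * k + 1], nodes[(2 * k + 2) % (2 * n)])
    # Desire edges: join 2a to 2a+1, and close the circle by joining 2n to 1.
    for a in range(1, n + 1):
        union(2 * a, 2 * a + 1 if a < n else 1)
    return len({find(u) for u in range(1, 2 * n + 1)})


def cycle_lower_bound(perm):
    """Lower bound n - c(π) on the reversal distance to the identity."""
    return len(perm) - count_cycles(perm)


def is_cycle_splitting(perm, i, j):
    """Brute force: does reversing positions i..j (0-based, inclusive) add exactly one cycle?"""
    after = perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]
    return count_cycles(after) == count_cycles(perm) + 1
```

In this encoding the identity has n two-edge cycles, so its bound is 0, while `[1, -2, 3, 4]` has three cycles and a bound of 1.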
We must further explore structure in the permutation that allows us to predict the reversal distance when the lower bound is not realized.
Definition 2 (FCI [16]) A framed common interval (FCI) of a permutation (made circular by considering the first and last elements as being adjacent) is a substring of the permutation, a s_{1} s_{2} ... s_{k} b or −b s_{1} s_{2} ... s_{k} −a, such that
1. for each i, 1 ≤ i ≤ k, a < |s_{i}| < b,
2. for each l, a < l < b, there exists a j with |s_{j}| = l, and
3. it is not a concatenation of substrings satisfying the previous two properties.
So the substring s_{1} s_{2} ... s_{k} is a (possibly empty) signed permutation of the integers that are greater than a and less than b; a and b are the frame elements, while the elements s_{1} ... s_{k} are trunk elements if they are not trunk elements of a smaller FCI. The framed interval is said to be common in that it also exists as an interval (a (a + 1) (a + 2) ... b) in the identity permutation.
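Conditions 1 and 2 of Definition 2 are easy to test directly. The following hypothetical helper is a sketch that assumes a linear substring of signed integers and ordinary integer order, so it does not handle intervals that wrap around the circular permutation; condition 3, the non-concatenation requirement, is deliberately not checked.

```python
def is_framed_interval(sub):
    """Check conditions 1 and 2 of Definition 2 on a substring of signed integers.

    Accepts the forms  a s1 ... sk b  (0 < a < b)  and  -b s1 ... sk -a.
    Condition 3 (not a concatenation of smaller such intervals) is NOT checked.
    """
    if len(sub) < 2:
        return False
    first, last = sub[0], sub[-1]
    if 0 < first < last:
        a, b = first, last
    elif first < 0 and last < 0 and -last < -first:
        a, b = -last, -first
    else:
        return False
    # The trunk must be a signed permutation of exactly the integers strictly between a and b.
    trunk = sorted(abs(s) for s in sub[1:-1])
    return trunk == list(range(a + 1, b))
```

With this helper, `[2, 4, 5, 6, 3, 7]` and its negated reversal `[-7, 3, -5, 4, 6, -2]` qualify as framed intervals with frames 2 and 7, whereas `[2, 4, 6, 3, 7]` does not, because 5 is missing from its trunk.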
A component of a permutation consists of the trunk elements of an FCI that are not trunk elements of a shorter FCI, plus the frame elements. The permutation of Figure 1(b) has three components: one framed by elements 2 and 7, and another framed by 4 and 6. The third is an interval in the circular sense, framed by elements 7 and 2, with a trunk consisting of elements 8 and 1; in the circular sense we have 7 < 8 < 1 < 2 here.
Definition 3 (bad component [16]) A bad component of a permutation is a component with at least 4 elements, where the sign of every element is the same.
In Figure 1(b), the component (2 4 6 3 7) is bad. The existence of one or more bad components in a permutation indicates exactly those situations where the lower bound cannot be met [1]. Siepel's paper [10] describes in detail an O(n^{3}) algorithm for finding the set of sorting reversals when bad components exist. While further exploration of Siepel's characterization of sorting reversals in the presence of bad components could eventually lead to a worst-case O(n^{2}) algorithm, we do not address that issue here. Suffice it to say that the average-case complexity is O(n^{2}) even when the trivial O(n^{3}) algorithm (which applies each of the O(n^{2}) reversals in turn and checks in linear time [17] whether the distance has decreased) is used on permutations with bad components. The probability that a permutation chosen uniformly at random has a bad component is O(n^{−2}) [15, 18], and we can detect the presence of bad components in linear time [16, 17].
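Definition 3 itself translates into a one-line test on the list of a component's elements (frame elements included). A minimal sketch, with a function name of our choosing:

```python
def is_bad_component(component):
    """Definition 3: at least 4 elements, all carrying the same sign."""
    if len(component) < 4:
        return False
    return all(x > 0 for x in component) or all(x < 0 for x in component)
```

Here `is_bad_component([2, 4, 6, 3, 7])` returns `True`, in line with the component from Figure 1(b) discussed above.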
We focus on the bottleneck of sorting FCIs that do not correspond to bad components: cycle-splitting reversals that create bad components (cycle-splitting reversals that are not sorting reversals).
Definition 4 (bad reversal) A bad reversal is a reversal that creates a bad component.
Definition 5 (unsafe reversal [1]) An unsafe reversal is a cycle-splitting reversal that is bad.
In Figure 1(a), the reversal ρ(6, 8) is unsafe.
2.2 Outline
Known algorithms that list all sorting reversals check, one by one, whether each of the potentially Ω(n^{2}) cycle-splitting reversals is unsafe, by applying the reversal and then running a linear-time check of whether it produced a bad (unoriented) component [9, 10]. Instead of listing all cycle-splitting reversals and then checking them, we do the opposite: we predict which reversals may be unsafe (whether cycle-splitting or otherwise) and avoid listing them. We first characterize what we call ominous substrings of the permutation: those substrings that could be turned into a bad component with one reversal. Our algorithm searches for ominous substrings as follows: for each element of the permutation, we posit that it is the smallest element of a potential (after a reversal) bad component, and we scan the permutation to detect an ominous substring.
3 Ominous Substrings
Take any unsafe cycle-splitting reversal ρ on permutation π. Since it is unsafe, the permutation π ○ ρ has at least one bad component created by ρ. In this section we show that there exists in π a particular pattern, an ominous substring of π, indicating that ρ is unsafe. We first describe ominous substrings of permutations with a single component.
3.1 Permutations with a Single Component
An ominous substring is a substring of the permutation that matches one of the following templates:
1. (e A X −f B), where A, B, and X are substrings of the permutation; A has only positive elements, while B has only negative elements.
2. (e A B C f), where A, B, and C are substrings of the permutation; A and C have only positive elements, while B has only negative elements.
In both templates e < f, and A and B (and C, if it exists) consist of exactly those elements with absolute value i for e < i < f.
In template 2, there already exists an FCI with frame elements e and f; the reversal that acts on exactly the elements of B gives every element of the interval the same sign. In template 1, a new interval is created with e and f as the frame elements, and {f} ∪ B ∪ X are the elements reversed. For example, (7 13452 6) matches template 2 with the unsafe reversal acting upon the elements {2, 3, 4, 5}; A and C are empty in this case. (1 2 4 653) matches template 1 with the unsafe reversal acting upon the elements {3, 5, 6}; f = 5, B = {3}, and X = {6} in this case. (2684 1 5 7 93) matches the reverse of template 1 (B f XAe) with the unsafe reversal acting upon the elements {2, 3, 5, 7, 9} (or equivalently, on the circular permutation, {1, 4, 6, 8}); A = {6, 8}, B = {5, 7}, and X = {2, 3} in this case.
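As an illustration of template 2 alone, the following sketch tests whether a given substring has the form (e A B C f). It assumes the substring is passed as a Python list, and it requires B to be non-empty (otherwise no reversal is implied) and the substring to have at least 4 elements, as a bad component must; these choices, and the name `match_template2`, are ours. It returns the slice of the substring occupied by B, that is, the elements the bad reversal acts on.

```python
def match_template2(sub):
    """Return (start, stop) slice indices of B if `sub` matches (e A B C f), else None."""
    if len(sub) < 4:
        return None
    e, f = sub[0], sub[-1]
    if not 0 < e < f:
        return None
    inner = sub[1:-1]
    # A, B and C together must hold exactly the values strictly between e and f.
    if sorted(abs(x) for x in inner) != list(range(e + 1, f)):
        return None
    negatives = [k for k, x in enumerate(inner) if x < 0]
    # B must be a non-empty contiguous run; everything else (A and C) is then positive.
    if not negatives or negatives[-1] - negatives[0] + 1 != len(negatives):
        return None
    return negatives[0] + 1, negatives[-1] + 2  # shift by 1 for the leading e
```

For instance, `match_template2([1, -3, -2, 4])` returns `(1, 3)`: reversing `[-3, -2]` yields the all-positive (1 2 3 4).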
Lemma 1 There is a one-to-one correspondence between bad reversals and ominous substrings.
Proof By definition, there exists at least one reversal that creates a bad component from an ominous substring. Conversely, take a permutation π ○ ρ that has a bad component (with frame elements e and f) created by the reversal ρ. Suppose that the elements of the bad component are positive; then e is on the left and f is on the right. If ρ includes both e and f, then the bad component already exists in π, which is a contradiction. Now let us examine the other three possibilities. If ρ includes neither e nor f, then the ominous substring in π corresponds to template 2. If ρ includes only f, then the ominous substring in π corresponds to template 1. If ρ includes only e, then the ominous substring in π corresponds to the reverse of template 1, where ρ acts upon the substring XBf (or equivalently, A eY, Y being the substring of π not matched by the reverse of template 1). If the elements of the bad component are negative, then the negative analogue holds in each case. Since each ominous substring implies exactly one reversal, determined by A, B, C, and X, we have the bijection.
3.2 Permutations with Multiple Components
We have described ominous substrings for permutations with a single component. Since sorting reversals act only upon adjacencies in a single component [1], we adapt the techniques for single components to the case of multiple components in the following manner.
Consider a component of a permutation that contains the frame elements of some smaller FCIs. We obtain the condensed version of the component as follows: for each smaller FCI contained in it, with frame elements a and b (or −b and −a), we replace the FCI by a (resp. −a) and change the magnitude m of every element with m > b in the component to m − (b − a). The templates can then be applied directly to the condensed component. For example, take the component C = (2 4 6 3 7) in Figure 1(b), which contains the FCI (4 5 6). The condensed version of C is (2 4 3 5). The condensed version of any component can be computed in linear time.
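A direct transcription of the condensation rule, as a sketch: it assumes the component is given as its list of signed elements (the frame elements of each nested FCI included, its trunk omitted, as in the example), that the nested FCIs are supplied as (a, b) value pairs with a < b, and that their value ranges do not share endpoints. The name `condense` and this input format are ours. For clarity the shift is recomputed for every element, which costs O(n·k) for k nested FCIs; a prefix sum over values would give the linear time stated above.

```python
def condense(component, nested_frames):
    """Condense a component: each nested FCI (a ... b or -b ... -a) collapses onto a
    (resp. -a), and every magnitude m > b is lowered by b - a."""
    dropped = {b for _, b in nested_frames}  # the b frame of each nested FCI disappears
    condensed = []
    for x in component:
        m = abs(x)
        if m in dropped:
            continue
        shift = sum(b - a for a, b in nested_frames if m > b)
        new_m = m - shift
        condensed.append(new_m if x > 0 else -new_m)
    return condensed
```

`condense([2, 4, 6, 3, 7], [(4, 6)])` returns `[2, 4, 3, 5]`, the condensed version given above.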
4 Detecting Ominous Substrings
We now turn to the task of detecting an ominous substring associated with a smallest element e. The following methods can be adapted to detect the negative analogue of each template, so we only describe the detection of the templates as shown in Section 3.1. The general outline used in each of the following algorithms is the same: we visit the permutation starting with element e, proceeding to element e + 1, then e + 2 and so on. At each step we maintain enough information to check whether conditions that indicate we have found an ominous substring hold.
Call the set of elements that we visit through the first i steps S_{ i } (those with absolute value in the interval [e, e + i]). Now consider the linearization of the circular permutation such that e is the leftmost positive element. To check for each template at step i (f = e + i in this case) we maintain the indices of the following elements visited so far:

Rightmost positive element: rp = max{π^{−1}(j) | j ∈ S_{i}, j > 0}
Leftmost positive element: lp = min{π^{−1}(j) | j ∈ S_{i}, j > 0}
Rightmost negative element: rn = max{π^{−1}(j) | j ∈ S_{i}, j < 0}
Leftmost negative element: ln = min{π^{−1}(j) | j ∈ S_{i}, j < 0}
Template 1 is detected at step i when all of the following conditions hold:
1. lp = π^{−1}(e) (e is the leftmost element visited)
2. ln > rp (the negative elements are to the right of the positive ones)
3. rn − ln + rp − lp = i − 1 (the positive and the negative elements each form a contiguous block)
4. π^{−1}(e + i) = ln (the last element visited is the leftmost negative element)
5. i ≥ 3 (the FCI has at least 4 elements)
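The conditions for template 1 can be checked incrementally as the scan proceeds. The sketch below is ours, not the authors' Java code: it assumes the permutation has already been linearized and is given as a Python list holding each of the values 1..n once (with signs), uses 0-based positions, and represents "no negative element visited yet" by the sentinels ln = n and rn = −1. Because lp and ln can only decrease and rp can only increase, once condition 1 or 2 fails it can never hold again, so the scan stops there. It returns every right frame element f for which template 1 matches.

```python
def scan_template1(perm, e):
    """Return every f such that perm contains (e A X -f B), scanning e+1, e+2, ..."""
    n = len(perm)
    pos = {abs(x): k for k, x in enumerate(perm)}  # π^{-1} on absolute values
    if perm[pos[e]] < 0:
        return []  # e must be positive for template 1
    lp = rp = pos[e]
    ln, rn = n, -1  # sentinels: no negative element visited yet
    hits = []
    for f in range(e + 1, n + 1):
        i = f - e
        k = pos[f]
        if perm[k] > 0:
            lp, rp = min(lp, k), max(rp, k)
        else:
            ln, rn = min(ln, k), max(rn, k)
        if lp != pos[e] or ln <= rp:  # condition 1 or 2 broken for good
            break
        contiguous = (rn - ln) + (rp - lp) == i - 1  # condition 3
        f_is_leftmost_negative = k == ln             # condition 4
        if contiguous and f_is_leftmost_negative and i >= 3:  # condition 5
            hits.append(f)
    return hits
```

On the small permutation `[1, 2, 4, -6, -5, -3]` (our own example) with e = 1, it returns `[5, 6]`; both matches describe the same bad reversal, the one acting on the last three positions.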
Template 2 is detected at step i when all of the following conditions hold, where neg denotes the number of negative elements visited so far:
1. lp = π^{−1}(e) (e is the leftmost element visited)
2. ln > lp (the negative elements are to the right of some positive element)
3. rp > rn (the negative elements are to the left of some positive element)
4. rp − lp = i (we have visited a contiguous substring)
5. rn − ln = neg − 1 (the negative elements of B are contiguous)
6. π^{−1}(e + i) = rp (the last element visited is the rightmost element visited)
7. i ≥ 3 (the FCI has at least 4 elements)
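The conditions for template 2 can likewise be checked in a single left-to-right pass over the values e + 1, e + 2, .... This hedged sketch assumes a linearized permutation given as a Python list holding each value 1..n once (with signs), uses 0-based positions, uses the sentinels ln = n and rn = −1 until a negative element is visited, and keeps the counter neg. With no negatives visited, condition 5 fails automatically, so an already-bad interval is never reported. The name and the output format (f together with the positions ln..rn of B, i.e. the span the bad reversal acts on) are illustrative choices.

```python
def scan_template2(perm, e):
    """Return (f, ln, rn) for every f such that perm contains (e A B C f);
    positions ln..rn (0-based, inclusive) hold B, the span of the bad reversal."""
    n = len(perm)
    pos = {abs(x): k for k, x in enumerate(perm)}  # π^{-1} on absolute values
    if perm[pos[e]] < 0:
        return []  # e must be positive for template 2
    lp = rp = pos[e]
    ln, rn, neg = n, -1, 0  # sentinels: no negative element visited yet
    hits = []
    for f in range(e + 1, n + 1):
        i = f - e
        k = pos[f]
        if perm[k] > 0:
            lp, rp = min(lp, k), max(rp, k)
        else:
            ln, rn, neg = min(ln, k), max(rn, k), neg + 1
        if lp != pos[e] or ln <= lp:  # condition 1 or 2 broken for good
            break
        if (rp > rn                   # condition 3
                and rp - lp == i      # condition 4
                and rn - ln == neg - 1  # condition 5
                and k == rp           # condition 6
                and i >= 3):          # condition 7
            hits.append((f, ln, rn))
    return hits
```

For example, `scan_template2([1, -3, -2, 4], 1)` returns `[(4, 1, 2)]`: reversing positions 1..2 turns the interval framed by 1 and 4 into a bad component.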
Note that if, at some iteration i of the scan, condition 1 or 2 of a template is violated, then e can no longer match that template at any later step, so the scan for that template can stop.
5 The Algorithm
We begin by proving the following theorem.
Theorem 1 For a permutation without a bad component, there is an O(n^{2}) algorithm for listing all sorting reversals.
Proof Use the methods of Section 4 to obtain a blacklist of all ominous substrings associated with each possible smallest frame element e. Since the list of all ominous substrings associated with a single smallest frame element is obtained by a linear scan over all possible right endpoints f, the time to build the blacklist is O(n^{2}). Each element of the list is associated with a bad reversal, whose endpoints we mark in an n by n matrix; an entry r at row i and column j indicates that the bad reversal r acts on elements from position i to position j in the permutation. Obtain the list of all cycle-splitting reversals in O(n^{2}) time using the standard methods [1]. Finally, examine this list one reversal at a time, removing any reversal that has a corresponding entry marked in the matrix.
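The final filtering step of this proof can be sketched as follows, assuming that the cycle-splitting reversals and the bad reversals are both given as 0-based (i, j) position pairs under one canonical convention. Because a reversal on a circular permutation can also be written as the reversal of its complement, that shared convention is an assumption the caller must enforce; the function name is ours.

```python
def filter_bad_reversals(cycle_splitting, bad_reversals, n):
    """Drop every cycle-splitting reversal (i, j) that is marked as bad.

    Reversals are (i, j) pairs of 0-based inclusive positions; both inputs must use
    the same convention for a given reversal.
    """
    marked = [[False] * n for _ in range(n)]  # n-by-n matrix: O(n^2) time and space
    for i, j in bad_reversals:
        marked[i][j] = True
    return [(i, j) for i, j in cycle_splitting if not marked[i][j]]
```

Each lookup is constant time, so the filter is linear in the length of the cycle-splitting list, which is O(n^{2}).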
The methods described so far are applicable to permutations with no bad components. Permutations with bad components can be easily handled by combining our algorithm with that of Siepel [10] in the following way. First make a linear scan of the permutation to detect bad components [16, 17]. If there are bad components, use the O(n^{3}) algorithm of Siepel; otherwise, use our algorithm.
Theorem 2 If a signed permutation is chosen uniformly at random, the expected time the above algorithm takes to list all sorting reversals is O(n^{2}).
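Proof The probability of seeing a bad component in a permutation taken uniformly at random from the set of all signed permutations is O(n^{−2}) [15]. The bound follows since the O(n^{3}) fallback contributes O(n^{3}) × O(n^{−2}) = O(n) to the expected running time, which is dominated by the O(n^{2}) cost of our algorithm.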
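Written out, the expectation in this proof splits on the outcome of the linear-time test for bad components. With p = O(n^{−2}) the probability that a bad component is present, and the O(n) test absorbed into the O(n^{2}) term:

```latex
\mathbb{E}[T(n)] \le (1 - p)\, O(n^{2}) + p\, O(n^{3})
  \le O(n^{2}) + O(n^{-2}) \cdot O(n^{3})
  = O(n^{2}) + O(n) = O(n^{2}).
```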
6 Empirical Results
We implemented our ominous substring algorithm in Java (code available from the authors upon request). Preliminary experiments compared the performance of the Java implementation of Siepel's O(n^{3}) algorithm from the package baobabLUNA [19] with our average-case O(n^{2}) algorithm. All tests were performed on a 2.16 GHz Intel Core 2 Duo processor with 1 GB of 667 MHz DDR2 SDRAM.
7 Conclusions
There are m − 2 reversals that are unsafe where 1 is the left endpoint of a bad component that is created, there are m − 3 reversals that are unsafe where 3 is the left endpoint of a bad component that is created, and so on. So there are Ω(n^{2}) unsafe reversals for this permutation of length n. This does not rule out the possibility of an algorithm that runs in O(n + k) time, where k is the number of sorting reversals, although it is currently unclear how to modify our algorithm to obtain this bound.
Declarations
Acknowledgements
A preliminary version of this article appeared in the proceedings of WABI 2010. The authors would like to thank M.D.V. Braga for providing the Java source for the baobabLUNA software [19].
References
Hannenhalli S, Pevzner PA: Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals. J ACM. 1999, 46: 1-27. 10.1145/300515.300516
Bergeron A: A very elementary presentation of the Hannenhalli-Pevzner theory. Discrete Applied Mathematics. 2005, 146 (2): 134-145. 10.1016/j.dam.2004.04.010
Hannenhalli S, Pevzner P: Transforming mice into men (polynomial algorithm for genomic distance problems). Proc 36th Ann IEEE Symp Foundations of Comput Sci (FOCS'95). IEEE Press, Piscataway, NJ, 1995, 581-592.
Kaplan H, Shamir R, Tarjan R: Faster and simpler algorithm for sorting signed permutations by reversals. SIAM J Computing. 1999, 29 (3): 880-892.
Kaplan H, Verbin E: Efficient data structures and a new randomized approach for sorting signed permutations by reversals. Proc 14th Ann Symp Combin Pattern Matching (CPM'03), Volume 2676 of Lecture Notes in Computer Science, Springer Verlag, Berlin. 2003, 170-185.
Swenson KM, Rajan V, Lin Y, Moret BME: Sorting Signed Permutations by Inversions in O(n log n) Time. Proc 13th Ann Int'l Conf Comput Mol Biol (RECOMB'09), Volume 5541 of Lecture Notes in Computer Science, Springer. 2009, 386-399.
Tannier E, Bergeron A, Sagot MF: Advances on sorting by reversals. Disc Appl Math. 2007, 155 (6-7): 881-888. 10.1016/j.dam.2005.02.033
Tannier E, Sagot M: Sorting by reversals in subquadratic time. Proc 15th Ann Symp Combin Pattern Matching (CPM'04), Volume 3109 of Lecture Notes in Computer Science, Springer Verlag, Berlin. 2004, 1-13.
Ajana Y, Lefebvre JF, Tillier E, El-Mabrouk N: Exploring the set of all minimal sequences of reversals: an application to test the replication-directed reversal hypothesis. WABI '02: Proceedings of the Second International Workshop on Algorithms in Bioinformatics. London, UK: Springer-Verlag, 2002, 300-315.
Siepel A: An algorithm to find all sorting reversals. Proc 6th Ann Int'l Conf Comput Mol Biol (RECOMB'02). ACM Press, New York, 2002.
Lefebvre JF, El-Mabrouk N, Tillier E, Sankoff D: Detection and validation of single gene inversions. Proc 11th Int'l Conf on Intelligent Systems for Mol Biol (ISMB'03), Volume 19 of Bioinformatics. Oxford U Press, 2003, i190-i196.
Sankoff D, Lefebvre JF, Tillier ERM, Maler A, El-Mabrouk N: The Distribution of Inversion Lengths in Bacteria. Proc 1st Workshop Comp Genomics (RECOMB-CG'04), Volume 3388 of Lecture Notes in Computer Science, Springer. 2004, 97-108.
Baudet C, Dias Z: An improved algorithm to enumerate all traces that sort a signed permutation by reversals. SIGAPP '10: Proceedings of the Twenty Fifth Symposium On Applied Computing. 2010.
Braga M, Sagot M, Scornavacca C, Tannier E: The solution space of sorting by reversals. Bioinformatics Research and Applications: Proceedings from ISBRA 2007, Springer. 2007.
Swenson K, Lin Y, Rajan V, Moret B: Hurdles hardly have to be heeded. Proc 6th Workshop Comp Genomics (RECOMB-CG'08), Volume 5267 of Lecture Notes in Computer Science, Springer Verlag, Berlin. 2008, 239-249.
Bergeron A, Heber S, Stoye J: Common intervals and sorting by reversals: a marriage of necessity. Proc 2nd European Conf Comput Biol ECCB'02. 2002, 54-63.
Bader D, Moret B, Yan M: A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study. J Comput Biol. 2001, 8 (5): 483-491. [A preliminary version appeared in WADS'01, pp. 365-376], 10.1089/106652701753216503
Caprara A: On the tightness of the alternating-cycle lower bound for sorting by reversals. J Combin Optimization. 1999, 3: 149-182. 10.1023/A:1009838309166
Braga MDV: baobabLUNA: the solution space of sorting by reversals. Bioinformatics. 2009, 25 (14): 1833-1835. 10.1093/bioinformatics/btp285
Yang Y, Székely LA: On the Expectation and Variance of Reversal Distance. Acta Univ Sapientiae, Mathematica. 2009, 1: 5-20.
Sankoff D, Haque L: The Distribution of Genomic Distance between Random Genomes. Journal of Computational Biology. 2006, 13 (5): 1005-1012. 10.1089/cmb.2006.13.1005
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.