On the group theoretical background of assigning stepwise mutations onto phylogenies

  • Mareike Fischer1, 4,

    Affiliated with

    • Steffen Klaere2Email author,

      Affiliated with

      • Minh Anh Thi Nguyen3, 4 and

        Affiliated with

        • Arndt von Haeseler4

          Affiliated with

          Algorithms for Molecular Biology20127:36

          DOI: 10.1186/1748-7188-7-36

          Received: 17 October 2011

          Accepted: 10 December 2012

          Published: 15 December 2012



          Recently one step mutation matrices were introduced to model the impact of substitutions on arbitrary branches of a phylogenetic tree on an alignment site. This concept works nicely for the four-state nucleotide alphabet and provides an efficient procedure conjectured to compute the minimal number of substitutions needed to transform one alignment site into another. The present paper delivers a proof of the validity of this algorithm. Moreover, we provide several mathematical insights into the generalization of the OSM matrix to multi-state alphabets. The construction of the OSM matrix is only possible if the matrices representing the substitution types acting on the character states and the identity matrix form a commutative group with respect to matrix multiplication. We illustrate this approach by looking at Abelian groups over twenty states and critically discuss their biological usefulness when investigating amino acids.


          Maximum likelihood Maximum parsimony Substitution model Tree reconstruction Group theory


          Alignments of homologous sequences provide fundamental materials to the reconstruction of phylogenetic trees and many other sequence-based analyses (see, e.g., [1, 2]). Each alignment column (site) consists of character states that are assumed to have evolved from a common ancestral state by means of substitutions. Any combination of the character states in the aligned sequences at one alignment column represents a so-called character[3], which is sometimes also called site pattern[4]. Given a phylogenetic tree and an alignment that evolved along the tree, Klaere et al. [5] showed, for binary alphabets, how a character changes into another character if a substitution occurs on an arbitrary branch of the tree. The impact of such a substitution is summarized by the so-called One Step Mutation (OSM) matrix. The OSM matrix allows for analytical formulae to compute the posterior probability distribution of the number of substitutions on a given tree that give rise to a character [5].

          Nguyen et al. [4] extended the concept of the OSM matrix to the four-state nucleotide alphabet while developing a method, the Misfits algorithm, to evaluate the goodness of fit between models and data in phylogenetic inference. There, the OSM matrix is constructed based on the Kimura three parameter (K3ST) substitution model [6]. Nguyen et al. [4] illustrated how one can apply the Fitch algorithm [7] to compute the minimal number of substitutions required to change one character into another character under the OSM setting. In the present paper, we deliver a proof of the validity of this algorithm.

          In addition, the OSM matrix can be constructed only if the matrices representing the substitutions, the so-called substitution matrices, and the identity matrix form a commutative or Abelian group (see, e.g., [8]) with respect to matrix multiplication [4]. The link between Abelian groups in phylogenetic models has been studied before, most notably by Hendy et al. [9]. Further, an extension of nucleotide substitution models with an underlying Abelian group to joint states at the leaves of a tree has also been studied by other authors. Bashford et al. [10] introduced an approach very similar to OSM to study the multi-taxon tensor space. Bryant [11] also introduced a very similar framework to study the Hadamard transform of [12] in the light of multi-taxon processes.

          In this work, we first introduce standard phylogenetic notation. We then formalize the construction of the OSM matrix, and which part of its construction is used in the Misfits algorithm. We further present possible extensions of the OSM framework to arbitrary alphabets. We will show that the Misfits algorithm in fact computes the minimal number of substitutions needed to change one character into another character. Moreover, we discuss the extension of the algorithm to substitution models which do not have an underlying Abelian group. Finally, we discuss the Abelian groups available for amino acids.

          Notation and problem recapitulation


          Recall that a rooted binary phylogenetic X-tree is a tree T = ( V ( T ) , E ( T ) ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq1_HTML.gif with the following properties: There is one vertex ρ V ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq2_HTML.gif with indegree 0 and outdegree 1, which is called the root of T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq3_HTML.gif. All edges e E ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq4_HTML.gif are directed away from ρ, and all vertices v V ( T ) { ρ } http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq5_HTML.gif have indegree 1 and outdegree 0 or 2. Vertices with outdegree 0 are usually referred to as leaves of T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq6_HTML.gif. Remember that for an X-tree, there are exactly |X|=n leaves, which is why there is a bijection between the set of leaves of T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq7_HTML.gif and the taxon set X. Thus, when there is no ambiguity, we use the terms leaf and taxon synonymously. Moreover, we often just write “phylogenetic tree” or “tree” when referring to a rooted binary phylogenetic tree.

          Furthermore, recall that a character f is a function f : X C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq8_HTML.gif for some set C : = { c 1 , c 2 , c 3 , , c r } http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq9_HTML.gif of r character states ( r N http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq10_HTML.gif). We denote by C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq11_HTML.gif the set of all r n possible characters on C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq12_HTML.gif and n taxa. For instance, for the four-state DNA alphabet, C DNA = { A , G , C , T } http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq13_HTML.gif and the set C DNA n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq14_HTML.gif consists of all 4 n possible characters.

          An extension of f to V ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq15_HTML.gif is a map g : V ( T ) C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq16_HTML.gif such that g(i)=f(i) for all i in X. For such an extension g of f, we denote by l T ( g ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq17_HTML.gif the number of edges e={u v} in T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq18_HTML.gif on which a substitution occurs, i.e. where g(u)≠g(v). The parsimony score of f on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq19_HTML.gif, denoted by l T ( f ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq20_HTML.gif, is obtained by minimizing l T ( g ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq21_HTML.gif over all possible extensions g. Given a tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq22_HTML.gif and a character f on the same taxon set, one can easily calculate the parsimony score of f on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq23_HTML.gif with the famous Fitch algorithm [7]. Moreover, when a character state changes along one edge of the tree, we refer to this state change as substitution or mutation. As for our purposes only so-called manifest mutations are relevant, i.e. those mutations that can be observed and are not reversed, we do not distinguish between mutations and substitutions, which is why we use these terms synonymously.

          Construction of the OSM matrix

          We now introduce the OSM framework in a stepwise fashion. The aim of the OSM approach is to determine the effects a single mutation occurring on a rooted tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq24_HTML.gif has on a character evolving on that tree.

          The first task of this approach is to formalize the term mutation and its effects on a single character state in C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq25_HTML.gif. A mutation is an operation σ : C C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq26_HTML.gif which is bijective, i.e. it satisfies the following condition:

          C1. For all c i C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq27_HTML.gif there is a c j C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq28_HTML.gif such that σ(c i )=c j , and if σ(c i )=σ(c j ), then c i =c j .

          This guarantees that a mutation affects a character state in a unique fashion. It is well-known that any bijective function on a finite discrete state set is a permutation (e.g., [13]). Thus, a mutation is a specific instance of a permutation applied to a character.

          The next step is to select the set Σ of admissible permutations acting on C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq29_HTML.gif. It is mathematically convenient to select Σ such that it forms an Abelian group [9] with a regular (transitive and free) action on C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq30_HTML.gif. Hence, Σ satisfies the following conditions:

          C2. For every pair c i , c j C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq31_HTML.gif there is exactly one permutation σΣ such that σ (c i ) = c j , i.e., the action of Σ on C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq32_HTML.gif is regular.

          C3. For all σ1,σ2Σ also the product σ1σ2Σ. Mathematically speaking, Σ is closed with respect to concatenation of its permutations.

          C4. For all σ1,σ2Σ we have σ 1 σ 2 = σ 2 σ 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq33_HTML.gif Thus, Σ is commutative, and hence the order in which we assign permutations is irrelevant for the outcome.

          C5. There is an element σ0Σ such that for all σ1Σ we have σ 1 σ 0 = σ 0 σ 1 = σ 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq34_HTML.gif, i.e. there exists a so-called neutral element, namely the identity, in Σ. For all c i C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq35_HTML.gif only σ0(c i ) = c i , i.e. σ i is fixed point free for all σ i σ0.

          C6. For every σ1Σ there exists a σ2Σ such that σ1σ2 = σ0. Mathematically speaking, for every element of Σ there exists an inverse element. This guarantees that every permutation can be reversed within a single step.

          C7. For all σ1,σ2,σ3Σ we have σ1 ∘ (σ2σ3) = (σ1σ2) ∘ σ3 = σ1σ2σ3, i.e. the associative law holds.

          It should be noted that any set of permutations is associative, i.e. satifies C7. Thus, for a set of permutations Σ to be Abelian with a regular action on C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq36_HTML.gif it only needs to satisfy C1−C6.

          In the following, we consider the matrix representation of permutations. A permutation matrix over C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq37_HTML.gif is an r×r matrix such that σ c i c j = 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq38_HTML.gif if σ(c i )=c j , and 0 otherwise. We consider it equivalent to discuss a permutation or its corresponding matrix. Therefore, concatenation “∘” is equivalent to the matrix multiplication “·”. We use σ to denote a permutation or a permutation matrix, depending on the context.

          Example 1. In genetics, the most commonly used character state set is C DNA = { A , G , C , T } http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq39_HTML.gif. There are two different Abelian groups for four states, namely the Klein-Four-group Z 2 × Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq40_HTML.gifand the cyclic group Z 4 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq41_HTML.gif. The Klein-Four-group is constructed from the cyclic group Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq42_HTML.gifover two elements, the identity τ0and the flip τ1. These take the matrix form
          τ 0 = 1 0 0 1 , τ 1 = 0 1 1 0 . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equa_HTML.gif
          The Klein-Four-group consists of the four Kronecker products of these two matrices, i.e. s0 = τ0τ0, s1 = τ1τ0, s2 = τ0τ1, and s3 = τ1τ1. The Kronecker products here yield 4×4 matrices, e.g.,
          s 1 = τ 1 τ 0 = 0 τ 0 τ 0 0 = A C G T ( 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 ) A C G T . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equb_HTML.gif

          The set ΣK3ST:={s0s1s2s3} coincides with the substitution matrices under the Kimura 3ST model[6]. In particular, s1describes transitions within purines (A G) and pyrimidines (C T), s2represents transversions within pairs (A C) and (G T), and s3represents the remaining set of transversions within pairs (A T) and (C G).

          The second Abelian group over four states, the cyclic group Z 4 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq43_HTML.gif, is formed by selecting a 4-cycle, e.g., AGTCA and concatenating this cycle with itself. The resulting set of permutations Σ Z 4 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq44_HTML.gifcontains the following elements:
          s 1 = A C G T ( 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 ) A C G T , s 2 = s 1 2 = s 1 · s 1 , s 3 = s 1 3 , s 0 = s 1 4 . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equc_HTML.gif

          Note that there are actually six different four-cycles for C DNA http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq45_HTML.gif. These result in three distinguishable Abelian groups. Bryant[14]generates his cyclic group with the four-cycle ACGTA, and shows that the resulting set ΣK2STunderlies the Kimura 2ST model[15], where s 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq46_HTML.gifcorresponds to the transition within purines and pyrimidines, and s 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq47_HTML.gifand s 3 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq48_HTML.gifare the (not further distinguished) transversions.

          The next step in constructing the OSM matrix is to construct a set Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq49_HTML.gif of operations over C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq50_HTML.gif governed by T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq51_HTML.gif, and based on the permutation set Σ. To this end, we first define Σ n as a set of operations which work elementwise, i.e. for f = ( f 1 , , f n ) C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq52_HTML.gif and σΣ n we have
          σ ( f ) : = ( σ 1 ( f 1 ) , , σ n ( f n ) ) , σ i Σ. http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equd_HTML.gif
          This can also be described by the Kronecker product, i.e. equally
          σ ( f ) = σ 1 σ n ( f ) . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equ1_HTML.gif

          This means that there are r n different operators in Σ n =Σ⊗⋯⊗Σ.

          Remark 1. Therefore, for any pair of characters f , g C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq53_HTML.gifwe can find an operation σΣ n such that σ (f) = g.

          Another noteworthy consequence of using the Kronecker product is that the elements of Σ n are permutations over C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq54_HTML.gif[16, 17], and in fact Σ n satisfies our Conditions C1−C7, i.e. Σ n is an Abelian group over C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq55_HTML.gif.

          In the OSM framework we assume that the permutations acting on a character f C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq56_HTML.gif are derived from the underlying rooted tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq57_HTML.gif. If permutation σ i Σ acts on the pendant edge leading to taxon jX, then the associated permutation matrix σj,iacting on C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq58_HTML.gif has the form
          σ j , i : = l = 1 j 1 σ 0 σ i l = j + 1 n σ 0 . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Eque_HTML.gif
          If a permutation acts on an interior edge e, then it simultaneously acts on the states of all descendant taxa of e, i.e. all those taxa whose path to the root passes e. E.g., assume Taxa 1 and 2 form a cherry, i.e. their most recent common ancestor, 12, has no other descendants, and permutation σ i Σ, i=1,…,r−1 is acting on the edge leading to this ancestor. Then, we get the permutation
          σ 12 , i : = σ i σ i σ 0 σ 0 = σ 1 , i · σ 2 , i . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equ2_HTML.gif
          This shows in particular that a Kronecker product of some permutations acting on each character state is equivalent to the matrix product of the permutations acting on the entire character. The right hand side equation shows that a single permutation on an internal edge has the same effect as simultaneously applying the same permutation on the pendant edges of all descendant taxa. In other words, if de(e) denotes the set of descendants of edge e, and σ i Σ, then
          σ e , i = j de ( e ) σ j , i . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equ3_HTML.gif
          Note that the set Σ X of all permutations acting on the pendant edges is a generator of Σ n , i.e. the closure of Σ X contains all permutations in Σ n . Since Σ n contains a single permutation to transform character f C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq59_HTML.gif into g C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq60_HTML.gif, and since Σ X generates Σ n , there is a shortest chain of permutations in Σ X which transforms f into g. Σ X is also the set of permutations implied by the star tree for X. In general, the set of all permutations on tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq61_HTML.gif is
          Σ T = σ e , i : e E ( T ) , i { 0 , , r 1 } , http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equf_HTML.gif

          where r is the number of states in Σ.

          For every X-tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq62_HTML.gif we have Σ T Σ X http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq63_HTML.gif, and therefore Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq64_HTML.gif is a generator for Σ n , too. An illustration of such a generator set Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq65_HTML.gif over the character set C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq66_HTML.gif is the so-called Cayley graph[18], which has as vertices the characters of C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq67_HTML.gif, and two characters f , g C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq68_HTML.gif are connected if there is a permutation σ Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq69_HTML.gif such that σ(f)=g. In [5] Cayley graphs have been presented as alternative illustrations of the tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq70_HTML.gif over a binary state set C = { 0 , 1 } http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq71_HTML.gif.

          Example 2. Regard the K3ST model from Example 1 and the rooted two-taxon tree depicted in Figure1a. With this Σ K3ST T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq72_HTML.gifis given by the set
          s e 1 , 1 : = s 1 s 0 , s e 2 , 1 : = s 0 s 1 , s e 12 , 1 : = s 1 s 1 , s e 1 , 2 : = s 2 s 0 , s e 2 , 2 : = s 0 s 2 , s e 12 , 2 : = s 2 s 2 , s e 1 , 3 : = s 3 s 0 , s e 2 , 3 : = s 0 s 3 , s e 12 , 3 : = s 3 s 3 . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equg_HTML.gif
          Each permutation which acts on the characters is thus a symmetric 16×16 permutation matrix depicting a transition (se,1), transversion 1 (se,2), or transversion 2 (se,3) along edge e E ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq73_HTML.gif. Figures 1b-d display the permutation matrices for a transition on branch e1( s e 1 , 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq74_HTML.gif), e2( s e 2 , 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq75_HTML.gif) and e12( s e 12 , 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq76_HTML.gif), respectively. Figure 1e shows the Cayley graph associated with Σ K3ST T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq77_HTML.gif.
          Figure 1

          Construction of the OSM matrix. (a) A rooted tree with taxa 1 and 2. (b) A transition s1on the left branch e1(the red branch) changes a character into exactly one new character as depicted by the red horizontal stripe cells of the permutation matrix σ e 1 , s 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq78_HTML.gif. The matrix has 16 rows and 16 columns representing the possible characters for the alignment of two nucleotide sequences. The permutation matrices generated by s1for the right branch e2(blue) and for the branch leading to the “root” e12(green) are displayed in (c) and (d), respectively. The corresponding Cayley graph for the tree is illustrated in (e). The convex sum of all the weighted (by the relative branch length and the probability of the substitution type) permutation matrices generated by all substitution types for all branches is the OSM matrix of the tree ( M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq79_HTML.gif) as shown in (f). Horizontal stripe cells represent the probability of the transition s1; diagonal stripes the transversion s2; and thin reverse diagonal stripes the transversion s3. The colors of these cells indicate the relative branch lengths and follow the colors of the branches as in (a). Thus, these colors also depict the branch origin of the substitutions.

          We are now in a position to recall the definition of the OSM matrix M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq80_HTML.gif for a rooted binary phylogenetic tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq81_HTML.gif as explained in [5] and [19]. For an edge e E ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq82_HTML.gif we denote by p e the relative branch length of e, i.e. its actual branch length (expected number of substitutions per site) divided by the length of T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq83_HTML.gif (the sum of all branch lengths). Thus, one can view p e as the probability that a mutation is observed at edge e assuming that a single mutation occurred on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq84_HTML.gif. Clearly, e E ( T ) p e = 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq85_HTML.gif. Further, denote by αe,i the probability that this mutation on e is of type i∈{1,…,r−1} with i 1 r 1 α e , i = 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq86_HTML.gif for all e E ( T ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq87_HTML.gif. Then the OSM matrix is the convex sum of the elements in Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq88_HTML.gif, where each permutation σe,i is multiplied by αe,ip e , the probability of hitting the edge e with permutation σ i Σ. Thus, we obtain:
          M T = e E ( T ) i = 1 r 1 α e , i p e σ e , i . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equ4_HTML.gif

          M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq89_HTML.gif can be regarded as the weighted exchangeability matrix for all characters under the K3ST model assuming that a single substitution occurs on the tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq90_HTML.gif. Figure 1f depicts the OSM matrix for the tree in Figure 1a. Here, colors indicate relative branch lengths p e , and patterns denote permutation types α i . E.g., a blue square with horizontal lines indicates the product p e 2 α e 2 , 1 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq91_HTML.gif, i.e. the probability of observing a transition s1on edge e2.

          The transformation problem

          With the construction of Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq92_HTML.gif we have generated the tools needed to formally describe the computations in Step 4 of the Misfits algorithm [4]. Given a rooted tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq93_HTML.gif and two characters f and f d in C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq94_HTML.gif, we want to compute the minimal number of substitutions required on the tree to convert f into f d . [4] presented an efficient procedure to compute this minimal number of substitutions.

          Algorithm 1

          INPUT: rooted binary phylogenetic tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq95_HTML.gif on leaf set X, characters f and f d on X, Abelian group Σ.

          STEP 1: Using Remark 1, find the substitution type σ i which translates f j into f j d http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq96_HTML.gif for all positions j=1,…,|X|. Let σΣ n be the resulting operation, i.e. σ(f)=f d .

          STEP 2: Let c:=c1c1be a constant character on X with c 1 C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq97_HTML.gif. Let h:=σ(c).

          STEP 3: Calculate m : = l T ( h ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq98_HTML.gif.

          OUTPUT: m.

          We prove the correctness of our algorithm. In our framework, m corresponds to the minimum number of permutations σ1,…,σ m Σ such that σ1⊗⋯⊗σ m (f)=f d . In this form, m has multiple equivalent interpretations. It is the length of the shortest path between f and f d in the Cayley graph for Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq99_HTML.gif, where this path corresponds to σ1⊗⋯⊗σ m . Further, m corresponds to the minimum power (k) of M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq100_HTML.gif such that M T j ( f , f d ) = 0 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq101_HTML.gif for j<k and M T k ( f , f d ) > 0 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq102_HTML.gif, because a positive entry in M T k http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq103_HTML.gif means that there is a concatenation of k permutations connecting the associated characters.

          Example 3. Figure2demonstrates how Algorithm 4 works under the K3ST model, i.e. when the group is Σ = Σ K3ST (Figure2a). Consider the rooted five-taxon tree in Figure2band the character GTAGA at the leaves. Assume that the character GTAGA is to be converted into character ACCTC. By comparing the two characters position-wise, we need a substitution s1on the external branch leading to taxon 1 to convert G into A at the first position. Similarly, we need a substitution s1on the external branch leading to taxon 2, and a substitution s2on every external branch leading to taxa 3, 4, and 5. Thus, the operation s:=(s1s1s2s2s2) transfers the character GTAGA into the character ACCTC. As the operation s also translates the constant character AAAAA into GGCCC, converting GTAGA into ACCTC is equivalent to evolving the character state A at the root along the tree to obtain the character GGCCC at the leaves. The Fitch algorithm[7]applied to the character GGCCC with the constraint that the character state at the root is A produces a unique most parsimonious solution of two substitutions as depicted by Figure2c.
          Figure 2

          Computing the minimal number of substitutions to translate a character into another one. (a) depicts the Klein-four group ΣK3ST, which consists of the identity s0and the three substitution types s1,s2,s3from the K3ST model. (b) In order to convert the character GTAGA into ACCTC under ΣK3ST, we need to introduce the operation s:=(s1,s1,s2,s2,s2). As the operation s also translates the constant character AAAAA to GGCCC, converting GTAGA into ACCTC is equivalent to evolving the character state A at the root along the tree to obtain the character GGCCC at the leaves. The Fitch algorithm applied to the latter produces a unique most parsimonious solution of two substitutions as depicted by (c).


          The impact of parsimony on the estimation of substitutions

          In this section, we provide some mathematical insights into the role of maximum parsimony in the estimation of the number of substitutions needed to convert a character into another one as explained above. In particular, we deliver a proof for Algorithm 4.

          Theorem 1. Let T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq104_HTML.gifbe a rooted binary phylogenetic tree on taxon set X and let f be a character that evolved on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq105_HTML.gifdue to some evolutionary model and let f d be another character on X. Then, the minimum number of substitutions to be put on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq106_HTML.gifwhich change the evolution of f in such a way that f d is generated can be calculated with Algorithm 4.


          Let f, f d , X, T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq107_HTML.gif and Σ be as required for the input of Algorithm 4. Then, as defined in the algorithm, we have σ ̂ ( f ) = ( σ ̂ 1 ( f 1 ) , σ ̂ 2 ( f 2 ) , , σ ̂ n ( f n ) ) = f d http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq108_HTML.gif, where σ ̂ j Σ http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq109_HTML.gif refers to the substitution type needed to translate f j into f j d http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq110_HTML.gif.

          Considering the underlying tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq111_HTML.gif, we may assume σ ̂ 1 , , σ ̂ n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq112_HTML.gif act on the pending branches leading to taxa 1,…,n, respectively.

          Now we show that it is equivalent to consider σ ̂ ( c ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq113_HTML.gif, where c is a constant character, instead of σ ̂ ( f ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq114_HTML.gif. Let μ Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq115_HTML.gif be a transformation with μ(f)=f d . Then,
          σ ̂ 1 μ ( f ) = σ ̂ 1 ( f d ) = f. http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equ5_HTML.gif
          Next, let σ ~ Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq116_HTML.gif be such that σ ~ ( c ) = f http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq117_HTML.gif. Then, using (5), we have
          σ ̂ 1 μ σ ~ ( c ) = σ ̂ 1 μ ( f ) = f = σ ~ ( c ) . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equh_HTML.gif
          On the other hand, we can use the commutativity of the underlying Abelian group to derive
          σ ̂ 1 μ σ ~ ( c ) = σ ~ σ ̂ 1 μ ( c ) . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equi_HTML.gif
          So altogether we have
          σ ̂ 1 μ σ ~ ( c ) = σ ~ σ ̂ 1 μ ( c ) = σ ~ ( c ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equj_HTML.gif
          and therefore σ ̂ 1 μ ( c ) = c http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq118_HTML.gif and thus μ ( c ) = σ ̂ ( c ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq119_HTML.gif. As μ was arbitrarily chosen, this implies that any transformation which maps f to σ ̂ ( f ) = f d http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq120_HTML.gif also maps c to σ ̂ ( c ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq121_HTML.gif. Therefore, we have
          { ρ Σ T : ρ ( f ) = f d } = { ρ Σ T : ρ ( c ) = σ ̂ ( c ) } . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equk_HTML.gif

          The minimum number of substitutions to change f from f d on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq122_HTML.gif is just an element of the first set consisting of the fewest number of compositions. As the two sets are equal, we can investigate the second set rather than the first. So we need an element of the second set which consists of as few as possible compositions. Assuming that σ=σ1⊗⋯⊗σ n , we can assign σ1,…,σ n to the pending branches of T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq123_HTML.gif and treat them like character states to which we then apply the Fitch algorithm. This completes the proof. □

          Informally speaking, the idea is as follows: As there is exactly one path from the root ρ to any taxon xX, we wish to determine whether we can ‘pull up’ some of the operations along this path in order to affect more than one taxon and still give the same result. This idea has been described above (Equations (2) and (3)), and it coincides precisely with the idea of the parsimony principle.

          However, in order to avoid confusion regarding the operation σ as a character on which to apply parsimony, Algorithm 4 instead acts on the constant character. Clearly, in order to evolve the constant character c:=c1c1 on a tree with root state c1, the corresponding operation would be σ ~ : = σ 0 σ 0 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq124_HTML.gif. Note that σ(c)=h and σ(f)=f d , and that two character states in h are identical if and only if the corresponding substitutions in σ are identical, too. Therefore, it is possible to let MP act on h rather than directly on σ.

          By the definition of maximum parsimony, when applied to h on tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq125_HTML.gif with given root state c1, it calculates the minimum number m of substitutions to explain h on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq126_HTML.gif. This number m is therefore precisely the number of substitutions needed to generate h on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq127_HTML.gif rather than c. As σ(f)=f d , m also is the number of substitutions needed to generate f d from f on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq128_HTML.gif.

          The impact of different groups

          For any alphabet C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq129_HTML.gif, there might be more than one Abelian group. Different groups might result in different numbers of substitutions required to translate a character into another character. We illustrate this observation using the following example.

          Example 4. Recall the starting point of Example 3, i.e. regard the five-taxon tree T from Figure3b, and the characters f = GTAGA and f d = ACCTC. Now, instead of using ΣK3STwe use the permutations from the cyclic group Σ Z 4 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq130_HTML.gif. In this setting, we need a substitution s 3 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq131_HTML.gif(blue in Figure3a) on the external edge leading to taxon 1 to convert G into A at the first position, and so on. Thus, we get the operation s 2 : = ( s 3 , s 1 , s 3 , s 1 , s 3 ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq132_HTML.gifsuch that s (f) = f d . We immediately see, that s transforms the constant character c = AAAAA into h = CGCGC. The Fitch algorithm applied to the character CGCGC with the constraint that the character state at the root is A produces a unique most parsimonious solution of three substitutions as depicted by Figure3c. Thus, under the Σ c group we need one substitution more than under the ΣK3STgroup (cf. Example 3).
          Figure 3

          Converting one character into another character using the cyclic group. (a) depicts the cyclic group Σ c , which consists of the identity s 0 s 0 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq133_HTML.gif and the three substitution types s 1 , s 2 , s 3 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq134_HTML.gif for nucleotide character states. (b) In order to convert the character GTAGA into ACCTC using this group, we need to introduce the operation s : = ( s 3 , s 1 , s 3 , s 1 , s 3 ) http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq135_HTML.gif. As the operation s also transforms the constant character AAAAA to CGCGC, converting GTAGA into ACCTC is equivalent to evolving the character state A at the root along the tree such that the character CGCGC is attained at the leaves. The Fitch algorithm applied to the latter produces a unique most parsimonious solution of three substitutions as depicted by (c).

          Note that variation of the minimum number of substitutions needed to translate a character into another one between different groups is not surprising: As different substitution types are needed to translate one pattern into the other one, depending solely on the underlying group, one group might need the same substitution type for some neighboring branches in the tree and another group different ones. Informally speaking, this would imply that in the first case, the substitution could be “pulled up” by the Fitch algorithm to happen on an ancestral branch, whereas in the second case this would not be possible.

          The link between substitution models and permutation matrices

          In Examples 1 and 2 we have shown that the K3ST substitution model can be included into our framework. The connection between the Klein-Four-group and the K3ST model (as well as the one between the Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq136_HTML.gif group and symmetric 2-state model) were described in-depth in [9]. This section aims at discussing alternative models and how to identify their use (or lack thereof) for our approach.

          Most substitution models assume the independence of the different branches of a tree to compute the joint probability of the characters in C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq137_HTML.gif. Therefore, they use the probabilities for substitutions among the character states in C http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq138_HTML.gif along the edges of the tree T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq139_HTML.gif. We now establish a probabilistic link between Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq140_HTML.gif and C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq141_HTML.gif. This link is provided by Birkhoff’s theorem:

          Theorem 2 (Birkhoff’s theorem, e.g.,[20], Theorem 8.7.1). A matrix M is doubly stochastic, i.e., each column and each row of M sum to 1, if and only if for some N < ∞ there are permutation matrices σ1, …, σ N and positive scalars α1, …, α N [0,1] such that α1 + ⋯ + α n = 1 and M = α1σ1 + ⋯ + α N σ N .

          Therefore, the weighted sum of the permutation matrices in Σ T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq142_HTML.gif yields a doubly stochastic matrix M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq143_HTML.gif as introduced above. M T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq144_HTML.gif also describes a random walk on C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq145_HTML.gif governed by T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq146_HTML.gif where the single step in C n http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq147_HTML.gif is illustrated by the associated Cayley graph. Its stationary distribution is uniform, i.e. when we throw sufficiently many mutations on T http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq148_HTML.gif then we expect to see each character with probability 1/r n .

          Another, even more useful consequence of Birkhoff’s theorem is the fact that it tells us which substitution models are suited for the OSM approach. If the transition matrix associated with the substitution model is doubly stochastic, then we find a set of permutations which give rise to the model.

          Let us see how this influences the symmetric form of the general time reversible model (sGTR) with uniform stationary distribution. It has the transition probability matrix
          P sGTR = A C G T ( 1 a b c a b c A a 1 a d e d e C b d 1 b d f f G c e f 1 c e f ) T . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equl_HTML.gif
          Assigning permutation matrices to the respective parameters yields the set ΣsGTR with elements s0 (identity) and
          s a = 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 , s b = 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 s c = 0 0 0 1 0 1 0 0 0 0 1 0 1 0 0 0 , s d = 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 , s e = 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 , s f = 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 . http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equm_HTML.gif
          The weighted sum of the non-identity elements yields
          a s a + b s b + c s c + d s d + e s e + f s f = d + e + f a b c a b + c + f d e b d a + c + e f c e f a + b + d , http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_Equn_HTML.gif

          which is equal to PsGTRif a + b + c + d + e + f=1. Thus, the set ΣsGTRis to sGTR what ΣK3ST is to K3ST. However, ΣsGTR does not satisfy condition C5, because s a ,⋯, s f are not fixed point free. This can be seen as the main diagonal of s a ,⋯, s f does not only contain zeros. It is also not commutative (condition C4) as e.g. s a ·s c s c ·s a . And it is not closed under matrix multiplication (condition C3), which means that a concatenation of permutations in ΣsGTR might lead to a new permutation not in ΣsGTR, e.g., s a ·s f ΣsGTR. Other complex models like Tamura-Nei [21] do not even permit the decomposition of its transition matrix into the convex sum of permutation matrices. All of this shows why the overall applicability of complex models to the OSM approach is rather limited.

          There are other approaches to describe phylogenetic models based on the group structure of their substitution matrices. In particular, Sumner et al. [22] use Lie algebra to construct OSM type matrices for the general Markov model, and discuss shortcomings of the group structure for the general GTR model [23].

          Application to other biologically interesting sets

          As stated above, OSM-type models require an underlying Abelian group. Thus, the OSM setting is applicable not only to binary data or four-state (DNA or RNA) data, but also to alphabets of 16 (doublets), 64 (codons), and 20 characters (amino acids) respectively. We compare such extensions to existing biologically motivated binning approaches and discuss their relevance.

          As we have shown in the previous sections, the symmetric form of the Klein-Four-Group Z 2 × Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq149_HTML.gif is mathematically beautiful, computationally convenient and biologically relevant. Similar statements can be made about all powers of Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq150_HTML.gif, including the biologically relevant alphabets of 16 (doublets) and 64 (codons) letters.

          There are four Abelian groups for twenty-state alphabets, namely Z 2 × Z 2 × Z 5 , Z 4 × Z 5 , Z 2 × Z 10 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq151_HTML.gif, and the cyclic group Z 20 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq152_HTML.gif (see e.g., [24] for a complete list of all groups with up to 35 elements). Their construction is analogous to the construction of the Klein-Four-group in Example 1. For example, the elements of Z 4 × Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq153_HTML.gif are Kronecker products of one of the four permutations in the cyclic group Z 4 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq154_HTML.gif with one of the five permutations of the cyclic group Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq155_HTML.gif.

          Figure 4 shows a heat-map type visualization of an OSM-type matrix on a single-leaf tree where the coloring of the cells corresponds to the weights given to the 20 permutations in the respective groups. We see that the coloring pattern nicely reflects the four cosets of the subgroup Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq156_HTML.gif in Z 2 × Z 2 × Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq157_HTML.gif. This can also be interpreted as a binning of the 20 states in the underlying alphabet into four sets of five elements each. If the weighting corresponds to a convex combination of operations, then the visualized matrix is doubly stochastic.
          Figure 4

          Matrices illustrate the four Abelian groups for a twenty-state alphabet. (a) the Z 2 × Z 2 × Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq158_HTML.gif group, (b) Z 4 × Z 5 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq159_HTML.gif, (c) Z 2 × Z 10 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq160_HTML.gif, and (d) Z 20 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq161_HTML.gif. Each matrix visualizes the cosets of the subgroups of the depicted group and suggests an associated grouping of the 20 states.

          Binnings are also done for amino acids, using either biochemical properties or evolutionary divergence. An example of a biochemical binning is the hydrophobic index, where the 20 amino acids are binned into four groups, very hydrophobic, hydrophobic, neutral, and hydrophilic. Unfortunately, this binning does not correspond to any of the proposed Abelian groups. Moreover, it is difficult to derive transitions between these groups just from the biochemical properties.

          Transition matrices for evolutionary models for amino acid substitutions are usually generated by counting mutation types in the alignments (see, e.g., [25] for an overview). From these, optimal groupings can be obtained using clustering approaches [26]. The existence of estimates for the transition probability between all amino acids provides the possibility to get further information about between-group operations. These groupings could be forced to fit Abelian groups. However, as indicated in [26] a grouping into four groups of five amino acids each is rarely optimal.


          In this paper, we provide the necessary mathematical background for the OSM setting which was introduced and used previously [4, 19], but had not been analyzed mathematically for more than two character states. Moreover, the present paper also delivers new insight concerning the requirements for the OSM model to work: In fact, we were able to show that mathematically, it is sufficient to have an underlying Abelian group – which shows a generalization of the OSM concept that was believed to be impossible previously [4]. Therefore, we show that OSM is applicable to any number of states.

          However, note that the original intuition of the authors in [4] was biologically motivated: The authors supposed that the group not only has to be Abelian, but also symmetric in the sense that each operation can be undone by being applied a second time. Thinking about DNA, for instance, this works: For example, the transition from A to G can be reverted by another substitution of the same type, namely a transition from G to A. This symmetry condition is fulfilled by the Klein-Four-group, but not by the cyclic group on four states.

          While the OSM approach can be extended to any number of states, its biological relevance becomes somewhat obscure when there is no corresponding group which is a power of Z 2 http://static-content.springer.com/image/art%3A10.1186%2F1748-7188-7-36/MediaObjects/13015_2011_Article_171_IEq162_HTML.gif. In particular, there are four distinct Abelian groups for 20 states, but none fits a biologically meaningful binning of the 20 amino acids.



          SK thanks Marston Conder for fruitful discussions on the group theoretical background and Jessica Leigh for enlightening discussions on biochemical and evolutionary binnings of amino acids. This work is financially supported by the Wiener Wissenschafts-, Forschungs- and Technologiefonds (WWTF). AvH also acknowledges the funding from the DFG Deep Metazoan Phylogeny project, SPP (HA1628/9) and the support from the Austrian GEN-AU project Bioinformatics Integration Network III.

          Authors’ Affiliations

          Department for Mathematics und Computer Science, Ernst-Moritz-Arndt-University Greifswald
          Department of Statistics and School of Biological Sciences, University of Auckland
          Groningen Bioinformatics Centre, University of Groningen
          Center for Integrative Bioinformatics ViennaMax F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna


          1. Durbin R, Eddy SR, Krogh A, Mitchison G: Biological sequence analysis - Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press, 1998.View Article
          2. Mount DW: Bioinformatics: Sequence and Genome Analysis. New York: Cold Spring Harbor, 2004.
          3. Semple C, Steel M: Phylogenetics. New York: Oxford University Press, 2003.
          4. Nguyen MAT, Klaere S, von Haeseler A: MISFITS: evaluating the goodness of fit between a phylogenetic model and an alignment. Mol Biol Evol. 2011, 28: 143-152. 10.1093/molbev/msq180PubMedView Article
          5. Klaere S, Gesell T, von Haeseler A: The impact of single substitutions on multiple sequence alignments. Philos T R Soc B. 2008, 363: 4041-4047. 10.1098/rstb.2008.0140View Article
          6. Kimura M: Estimation of Evolutionary Distances between Homologous Nucleotide Sequences. P Natl Acad Sci USA. 1981, 78: 454-458. 10.1073/pnas.78.1.454View Article
          7. Fitch WM: Toward defining the course of evolution: Minimum change for a specific tree topology. Syst Zool. 1971, 20: 406-416. 10.2307/2412116View Article
          8. Humphreys JF: A course in group theory. New York: Oxford University Press, 1996.
          9. Hendy M, Penny D, Steel M: A discrete Fourier analysis for evolutionary trees. P Natl Acad Sci USA. 1994, 91: 3339-3343. 10.1073/pnas.91.8.3339View Article
          10. Bashford JD, Jarvis PD, Sumner JG, Steel MA: U(1)×U(1)×U(1) symmetry of the Kimura 3ST model and phylogenetic branching processes. J Phys A: Math Gen. 2004, 37: L81—L89.View Article
          11. Bryant D: Hadamard Phylogenetic Methods and the n-taxon process. Bull Math Biol. 2009, 71 (2): 339-351. 10.1007/s11538-008-9364-8PubMedView Article
          12. Hendy MD, Penny D: A framework for the quantitative study of evolutionary trees. Syst Zool. 1989, 38 (4): 297-309. 10.2307/2992396View Article
          13. MacLane S, Birkhoff G: Algebra. Chelsea: American Mathematical Society, 1999.
          14. Bryant D: Extending Tree Models to Split Networks. Algebraic Statistics for Computational Biology. Edited by: Pachter L, Sturmfels B. Cambridge: Cambridge University Press, 2005.
          15. Kimura M: A Simple Method for Estimating Evolutionary Rates of Base Substitutions through Comparative Studies of Nucleotide Sequences. J Mol Evol. 1980, 16: 111-120. 10.1007/BF01731581PubMedView Article
          16. Horn RA, Johnson CR: Topics in matrix analysis. New York: Oxford University Press, 1991.View Article
          17. Steeb WH, Hardy Y: Matrix Calculus and Kronecker Product: A Practical Approach to Linear and Multilinear Algebra. Singapore: World Scientific Publishing, 2011.View Article
          18. Cayley A: Desiderata and Suggestions: No. 2. The Theory of Groups: Graphical Representation. Am J Math. 1878, 1 (2): 174-176. 10.2307/2369306View Article
          19. Nguyen MAT, Gesell T, von Haeseler A: ImOSM: Intermittent Evolution and Robustness of Phylogenetic Methods. Mol Biol Evol. 2012, 29 (2): 663-673. 10.1093/molbev/msr220View Article
          20. Horn RA, Johnson CR: Matrix analysis. Cambridge: Cambridge University Press, 1990.
          21. Tamura K, Nei M: Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol. 1993, 10 (3): 512-526.PubMed
          22. Sumner JG, Holland BR, Jarvis PD: The Algebra of the General Markov Model on Phylogenetic Trees and Networks. Bull Math Biol. 2012, 74 (4): 858-880. 10.1007/s11538-011-9691-zPubMedView Article
          23. Sumner JG, Jarvis PD, Fernandez-Sanchez J, Kaine B, Woodhams M, Holland BR: Is the general time-reversible model bad for molecular phylogenetics?. Syst Biol. 2012, 61 (6): 1069-1074. 10.1093/sysbio/sys042PubMedView Article
          24. Keilen T: Endliche Gruppen. Eine Einführung mit dem Ziel der Klassifikation von Gruppen kleiner Ordnung. 2000, [http://​www.​mathematik.​uni-kl.​de/​wwwagag/​download/​scripts/​Endliche.​Gruppen.​pdf].
          25. Kosiol C, Goldman N: Different Versions of the Dayhoff Rate Matrix. Mol Biol Evol. 2005, 22 (2): 193-199.PubMedView Article
          26. Susko E, Roger AJ: On Reduced Amino Acid Alphabets for Phylogenetic Inference. Mol Biol Evol. 2007, 24 (9): 2139-2150. 10.1093/molbev/msm144PubMedView Article


          © Fischer et al.; licensee BioMed Central Ltd. 2012

          This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://​creativecommons.​org/​licenses/​by/​2.​0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.