Characterizing compatibility and agreement of unrooted trees via cuts in graphs

Abstract

Background

Deciding whether there is a single tree —a supertree— that summarizes the evolutionary information in a collection of unrooted trees is a fundamental problem in phylogenetics. We consider two versions of this question: agreement and compatibility. In the first, the supertree is required to reflect precisely the relationships among the species exhibited by the input trees. In the second, the supertree can be more refined than the input trees.

Testing for compatibility is an NP-complete problem; however, the problem is solvable in polynomial time when the number of input trees is fixed. Testing for agreement is also NP-complete, but it is not known whether it is fixed-parameter tractable. Compatibility can be characterized in terms of the existence of a specific kind of triangulation in a structure known as the display graph. Alternatively, it can be characterized as a chordal graph sandwich problem in a structure known as the edge label intersection graph. No characterization of agreement was known.

Results

We present a simple and natural characterization of compatibility in terms of minimal cuts in the display graph, which is closely related to compatibility of splits. We then derive a characterization for agreement.

Conclusions

Explicit characterizations of tree compatibility and agreement are essential to finding practical algorithms for these problems. The simplicity of the characterizations presented here could help to achieve this goal.

Background

A phylogenetic tree T is an unrooted tree whose leaves are bijectively mapped to a label set ℒ(T). Labels represent species and T represents the evolutionary history of these species. Let P be a collection of phylogenetic trees. We call P a profile, refer to the trees in P as input trees, and denote the combined label set of the input trees, ⋃_{T∈P} ℒ(T), by ℒ(P). A supertree of P is a phylogenetic tree whose label set is ℒ(P). The goal of constructing a supertree for a profile is to synthesize the information in the input trees in a larger, more comprehensive, phylogeny [1]. Ideally, a supertree should faithfully reflect the relationships among the species implied by the input trees. In reality, this is rarely achievable, because of conflicts among the input trees due to errors in constructing them or to biological processes such as lateral gene transfer and gene duplication.

We consider two classic versions of the supertree problem, based on the closely related notions of compatibility and agreement. Let S and T be two phylogenetic trees where ℒ(T)⊆ℒ(S); for our purposes, T would be an input tree and S a supertree. Let S′ be the tree obtained by suppressing any degree-two vertices in the minimal subtree of S connecting the labels in ℒ(T). We say that S displays T, or that T and S are compatible, if T can be derived from S′ by contracting edges. We say that tree T is an induced subtree of S, or that T and S agree, if S′ is isomorphic to T.

Let P be a profile. The tree compatibility problem asks if there exists a supertree for P that displays all the trees in P. If such a supertree S exists, we say that P is compatible and S is a compatible supertree for P. The agreement supertree problem asks if there exists a supertree for P that agrees with all the trees in P. If such a supertree S exists, we say that S is an agreement supertree (AST) for P.

Compatibility and agreement embody different philosophies about conflict. An agreement supertree must reflect precisely the evolutionary relationships exhibited by the input trees. In contrast, a compatible supertree is allowed to exhibit more fine-grained relationships among certain labels than those exhibited by an input tree. From a biological viewpoint, the differences between compatibility and agreement reflect different ways to treat polytomies —i.e., nodes of degree greater than three. Compatibility treats polytomies as soft facts: if an input tree node has degree four or more, it is not because there were multiple simultaneous speciation events, but because there is not enough information to resolve the sequence of speciation. Thus, if another input tree provides more refined information about speciation order, we can use it, provided the information is not contradicted by the remaining input trees. Agreement, in contrast, treats polytomies as hard facts. Note that compatibility and agreement are equivalent when the input trees are binary.

If all the input trees share a common label (which can be viewed as a root node), both tree compatibility and agreement are solvable in polynomial time [2, 3]. In general, however, the two problems are NP-complete, and remain so even when the trees are quartets; i.e., binary trees with exactly four leaves [4]. Nevertheless, Bryant and Lagergren showed that the tree compatibility problem is fixed-parameter tractable when parameterized by the number of trees [5]. It is unknown whether or not the agreement supertree problem has the same property.

To prove the fixed-parameter tractability of tree compatibility, Bryant and Lagergren first showed that a necessary (but not sufficient) condition for a profile to be compatible is that the tree-width of a certain graph —the display graph of the profile (see Section ‘Display graphs and edge label intersection graphs’)— be bounded by the number of trees. They then showed how to express compatibility as a bounded-size monadic second-order formula on the display graph. By Courcelle’s Theorem [6, 7], these two facts imply that compatibility can be decided in time linear in the size of the display graph. Unfortunately, Bryant and Lagergren’s argument amounts essentially to only an existential proof, as it is not clear how to obtain an explicit algorithm for unrooted compatibility from it.

A necessary step towards finding a practical algorithm for compatibility, and indeed for agreement, is to develop an explicit characterization of the problem. In earlier work [8], we made some progress in this direction, characterizing tree compatibility in terms of the existence of a legal triangulation of the display graph of the profile. Gysel et al. [9] provided an alternative characterization, based on a structure they call the edge label intersection graph (ELIG) (see Section ‘Display graphs and edge label intersection graphs’). Their formulation is in some ways simpler than that of [8], allowing Gysel et al. to express tree compatibility as a chordal sandwich problem. Neither [8] nor [9] deals with agreement.

Here, we show that the connection between separators in the ELIG and cuts in the display graph (explored in Section ‘Display graphs and edge label intersection graphs’) leads to a new, and natural, characterization of compatibility in terms of minimal cuts in the display graph (Section ‘Characterizing compatibility via cuts’). We then show how such cuts are closely related to the splits of the compatible supertree (Section ‘Splits and cuts’). Next, we give a characterization of agreement in terms of minimal cuts of the display graph (Section ‘Characterizing agreement via cuts’). To our knowledge, there was no previous characterization of the agreement supertree problem for unrooted trees. Lastly, we examine the connection between the triangulation-based and the cut-based perspectives on compatibility (Section ‘Relationship to legal triangulations’).

Preliminaries

Splits, compatibility, and agreement

A split of a label set L is a bipartition of L consisting of non-empty sets. We denote a split {X,Y} by X|Y. A split is non-trivial if neither of its sets is a singleton; otherwise, it is trivial. Let T be a phylogenetic tree. Let e be an edge of T. Deletion of e disconnects T into two subtrees T_1 and T_2. If L_1 and L_2 denote the sets of all labels in T_1 and T_2, respectively, then L_1|L_2 is a split of ℒ(T). We denote by σ_e(T) the split corresponding to edge e of T; if e is a leaf edge, then σ_e(T) is a trivial split. Let Σ(T) denote the set of all splits corresponding to internal edges of T and Σ_triv(T) denote the set of all (trivial) splits corresponding to leaf edges of T.
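As an illustration of these definitions, the following Python sketch computes Σ(T) and Σ_triv(T) by deleting each edge in turn. The edge-list encoding (leaves as strings, internal vertices as integers) and the name tree_splits are assumptions made for this example only.

```python
from collections import defaultdict


def tree_splits(edges):
    """Return (internal_splits, trivial_splits) of an unrooted tree.

    Assumed encoding: `edges` is a list of 2-tuples and the degree-one
    vertices are the leaves (labels). A split is a frozenset holding the
    two sides, each a frozenset of labels.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)

    def labels_on_side(start, blocked):
        # Leaves reachable from `start` without entering `blocked`,
        # i.e. the labels on start's side once edge {start, blocked} is deleted.
        seen, stack, found = {start, blocked}, [start], set()
        while stack:
            x = stack.pop()
            if x in leaves:
                found.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return frozenset(found)

    internal, trivial = set(), set()
    for u, v in edges:
        side = labels_on_side(u, v)
        split = frozenset({side, leaves - side})
        # Leaf edges give trivial splits; internal edges give Σ(T).
        (trivial if u in leaves or v in leaves else internal).add(split)
    return internal, trivial


# A quartet with internal edge {1, 2} separating {a, b} from {c, d}.
quartet = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
internal, trivial = tree_splits(quartet)
```

For the quartet above, the only internal split is ab|cd, and each of the four leaf edges contributes one trivial split.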

A tree T displays a split X|Y if there exists an internal edge e of T where σ_e(T)=X|Y. A set of splits of a label set L is compatible if there exists a tree that displays all the splits in the set. It is well-known that two splits A_1|A_2 and B_1|B_2 are compatible if and only if at least one of A_1∩B_1, A_1∩B_2, A_2∩B_1 and A_2∩B_2 is empty [10]. Note that a trivial split of L is compatible with every split of L.
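The pairwise criterion above translates directly into code. The sketch below is a minimal illustration in Python; the function names and the encoding of a split as a pair of label collections are assumptions of this example.

```python
from itertools import combinations


def compatible(split1, split2):
    """A1|A2 and B1|B2 are compatible iff some intersection Ai ∩ Bj is empty.

    Assumed encoding: a split is a pair of iterables of labels.
    """
    return any(not (set(a) & set(b)) for a in split1 for b in split2)


def pairwise_compatible(splits):
    """True if every two splits in the collection are compatible."""
    return all(compatible(s, t) for s, t in combinations(splits, 2))


# Single-character labels, so a string stands for a set of labels.
ab_cde = ("ab", "cde")
abc_de = ("abc", "de")   # compatible with ab|cde, since {a,b} ∩ {d,e} is empty
ac_bde = ("ac", "bde")   # conflicts with ab|cde: all four intersections are non-empty
```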

Theorem 1 (Splits-Equivalence Theorem [10, 11]).

Let Σ be a collection of splits of a label set X that includes all trivial splits. Then, Σ=Σ(T)∪Σ_triv(T) for some phylogenetic tree T with label set X if and only if the splits in Σ are pairwise compatible. Tree T is unique up to isomorphism.

Let S be a phylogenetic tree and let Y be a subset of ℒ(S). Then, S|Y denotes the tree obtained by suppressing any degree-two vertices in the minimal subtree of S connecting the labels in Y. Now, let T be a phylogenetic tree such that ℒ(T)⊆ℒ(S). Then, S displays T if and only if Σ(T)⊆Σ(S|ℒ(T)); T and S agree if and only if Σ(T)=Σ(S|ℒ(T)).

Cliques, separators, cuts, and triangulations

Let G be a graph. We represent the vertices and edges of G by V(G) and E(G) respectively. A clique of G is a complete subgraph of G. A clique H of G is maximal if there is no other clique H′ of G where V(H)⊂V(H′). For any U⊆V(G), G−U is the graph derived by removing vertices of U and their incident edges from G. For any F⊆E(G), G−F is the graph with vertex set V(G) and edge set E(G)∖F.

For any two nonadjacent vertices a and b of G, an a-b separator of G is a set U of vertices where U⊂V(G) and a and b are in different connected components of G−U. An a-b separator U is minimal if for every U′⊂U, U′ is not an a-b separator. A set U⊆V(G) is a minimal separator if U is a minimal a-b separator for some nonadjacent vertices a and b of G. We represent the set of all minimal separators of graph G by Δ_G. Two minimal separators U and U′ are parallel if G−U contains at most one component H where V(H)∩U′≠∅.

A connected component H of G−U is full if for every u∈U there exists some vertex v∈H where {u,v}∈E(G).

Lemma 1 ([12]).

For a graph G and any U⊂V(G), U is a minimal separator of G if and only if G−U has at least two full components.
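Lemma 1 gives a direct test for minimal separators. The following Python sketch counts the full components of G−U; the adjacency-dictionary encoding and the function name are assumptions of this illustration, and the empty set is rejected by convention.

```python
def is_minimal_separator(adj, U):
    """Test of Lemma 1: U is a minimal separator iff G−U has >= 2 full components.

    Assumed encoding: `adj` maps every vertex to the set of its neighbours.
    The empty set is treated as not being a separator.
    """
    U = set(U)
    if not U:
        return False
    seen, full = set(), 0
    for start in adj:
        if start in U or start in seen:
            continue
        # Grow the connected component of G−U that contains `start`.
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x] - U - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        # The component is full if every vertex of U has a neighbour in it.
        if all(adj[u] & comp for u in U):
            full += 1
    return full >= 2


# A 4-cycle w-x-y-z-w: {x, z} separates w from y, while {x} separates nothing.
cycle = {"w": {"x", "z"}, "x": {"w", "y"}, "y": {"x", "z"}, "z": {"y", "w"}}
# A path a-b-c-d: {b} is a minimal separator, {b, c} is a non-minimal one.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```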

A chord is an edge between two nonadjacent vertices of a cycle. A graph H is chordal if and only if every cycle of length four or greater in H has a chord. A chordal graph H is a triangulation of graph G if V(G)=V(H) and E(G)⊆E(H). The edges in E(H)∖E(G) are called fill-in edges of G. A triangulation is minimal if removing any fill-in edge yields a non-chordal graph.

A clique tree of a chordal graph H is a pair (T,B) where (i) T is a tree, (ii) B is a bijective function from vertices of T to maximal cliques of H, and (iii) for every vertex v∈H, the set of all vertices x of T where v∈B(x) induces a subtree in T. Property (iii) is called coherence.

Let ℱ be a collection of subsets of V(G). We represent by G_ℱ the graph derived from G by making the set of vertices of X a clique for every X∈ℱ. The next result summarizes basic facts about separators and triangulations (see [12-14]).

Theorem 2.

Let ℱ be a maximal set of pairwise parallel minimal separators of G and H be a minimal triangulation of G. Then, the following statements hold.

  1. G_ℱ is a minimal triangulation of G.

  2. Let (T,B) be a clique tree of G_ℱ. There exists a minimal separator F∈ℱ if and only if there exist two adjacent vertices x and y in T where B(x)∩B(y)=F.

  3. Δ_H is a maximal set of pairwise parallel minimal separators of G and G_{Δ_H}=H.

A cut in a connected graph G is a subset F of edges of G such that G−F is disconnected. A cut F is minimal if there does not exist F′⊂F where G−F′ is disconnected. Note that if F is minimal, G−F has exactly two connected components. Two minimal cuts F and F′ are parallel if G−F has at most one connected component H where E(H)∩F′≠∅.
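The definitions of minimal and parallel cuts can be checked mechanically on small graphs. The Python sketch below is one possible encoding, assuming a connected simple graph given as a vertex list and an edge list; all function names are hypothetical. Minimality is tested only on the subsets F∖{f}, which suffices because any superset of a cut is again a cut.

```python
def components(vertices, edges):
    """Connected components of an undirected graph, as a list of vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comps.append(comp)
    return comps


def norm(edges):
    """Represent undirected edges as frozensets so orientation is irrelevant."""
    return {frozenset(e) for e in edges}


def is_minimal_cut(vertices, edges, cut):
    """True if G−cut is disconnected and G−(cut∖{f}) is connected for every f."""
    edges, cut = norm(edges), norm(cut)
    if len(components(vertices, edges - cut)) < 2:
        return False
    return all(len(components(vertices, edges - (cut - {f}))) == 1 for f in cut)


def are_parallel(vertices, edges, cut1, cut2):
    """True if at most one component of G−cut1 contains an edge of cut2."""
    edges, cut1, cut2 = norm(edges), norm(cut1), norm(cut2)
    comps = components(vertices, edges - cut1)
    touched = {i for i, c in enumerate(comps) for e in cut2 - cut1 if e <= c}
    return len(touched) <= 1


# A 4-cycle w-x-y-z-w, where every pair of edges is a minimal cut.
V = ["w", "x", "y", "z"]
E = [("w", "x"), ("x", "y"), ("y", "z"), ("z", "w")]
crossing = are_parallel(V, E, [("w", "x"), ("y", "z")], [("x", "y"), ("z", "w")])
```

In the 4-cycle, the cuts {wx, yz} and {xy, zw} are not parallel, because the two edges of the second cut end up in different components once the first cut is removed.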

Display graphs and edge label intersection graphs

We now introduce the two main notions that we use to characterize compatibility and agreement: the display graph and the edge label intersection graph. We then present some known results about these graphs, along with new results on the relationships between them. Here and in the rest of the paper, [m] denotes the set {1,…,m}, where m is a positive integer. Since for any phylogenetic tree T there is a bijection between the leaves of T and ℒ(T), we refer to the leaves of T by their labels.

Let P={T_1,T_2,…,T_k} be a profile. We assume that for any i,j∈[k] such that i≠j, the sets of internal vertices of input trees T_i and T_j are disjoint. The display graph of P, denoted by G(P), is a graph whose vertex set is ⋃_{i∈[k]} V(T_i) and edge set is ⋃_{j∈[k]} E(T_j) (see Figure 1). A vertex v of G(P) is a leaf if v∈ℒ(P). Every other vertex of G(P) is internal. An edge of G(P) is internal if its endpoints are both internal.
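To make the construction concrete, the following Python sketch builds the display graph of a small hypothetical profile (it is not the profile of Figure 1). The encoding is an assumption of this example: each tree is an edge list, leaves are strings shared across trees, and internal vertices are integers that are distinct across trees.

```python
def display_graph(profile):
    """Return (vertices, edges) of the display graph of a profile.

    The vertex and edge sets are the unions of those of the input trees;
    identically labelled leaves are merged because they are equal strings.
    """
    vertices, edges = set(), set()
    for tree in profile:
        for u, v in tree:
            vertices |= {u, v}
            edges.add(frozenset((u, v)))
    return vertices, edges


def is_leaf(vertex):
    # Assumed convention of this sketch: leaves are strings.
    return isinstance(vertex, str)


def is_internal_edge(edge):
    """An edge is internal if both of its endpoints are internal vertices."""
    return not any(is_leaf(x) for x in edge)


# Hypothetical profile: a quartet ab|cd and a star tree on {b, c, e}.
T1 = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
T2 = [("b", 3), ("c", 3), ("e", 3)]
vertices, edges = display_graph([T1, T2])
internal_edges = {e for e in edges if is_internal_edge(e)}
```

Here the two trees share the leaves b and c, so the display graph is connected, and its only internal edge is {1,2}.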

Figure 1. Compatible trees. (i) First input tree. (ii) A second input tree, compatible with the first. (iii) Display graph of the input trees. (iv) Edge label intersection graph of the input trees; for each vertex, uv represents edge {u,v} of the display graph.

A triangulation G′ of G(P) is legal if it satisfies the following conditions.

  1. For every clique C of G′, if C contains an internal edge, then it contains no other edge of G(P).

  2. No fill-in edge in G′ has a leaf as an endpoint.

Theorem 3 (Vakati, Fernández-Baca [8]).

A profile P of unrooted phylogenetic trees is compatible if and only if G(P) has a legal triangulation.

In what follows, we assume that G(P) is connected. If it is not, the connected components of G(P) induce a partition of P into sub-profiles such that for each sub-profile P′, G(P′) is a connected component of G(P). It is easy to see that P is compatible if and only if each sub-profile is compatible.

The edge label intersection graph of P, denoted LG(P), is the line graph of G(P) [9]. That is, the vertex set of LG(P) is E(G(P)) and two vertices of LG(P) are adjacent if the corresponding edges in G(P) share an endpoint. (We should note that Gysel et al. [9] refer to LG(P) as the modified edge label intersection graph.) For an unrooted tree T, LG(T) denotes LG({T}).
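Since LG(P) is simply the line graph of the display graph, it can be built in a few lines. The Python sketch below assumes that the display graph is given as a collection of two-element edges; the function name and the toy edge list (a hypothetical profile, not the one of Figure 1) are illustrative only.

```python
from itertools import combinations


def line_graph(edges):
    """Adjacency map of the line graph: one node per edge of the input graph,
    two nodes adjacent when the corresponding edges share an endpoint."""
    nodes = {frozenset(e) for e in edges}
    adjacency = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if e & f:
            adjacency[e].add(f)
            adjacency[f].add(e)
    return adjacency


# Display graph of a hypothetical profile: a quartet ab|cd (internal
# vertices 1, 2) and a star tree on {b, c, e} (internal vertex 3).
display_edges = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2),
                 ("b", 3), ("c", 3), ("e", 3)]
lg = line_graph(display_edges)
neighbours_of_12 = lg[frozenset({1, 2})]
```

In this toy graph, the node for the internal edge {1,2} is adjacent to the nodes for the four leaf edges of the quartet and to nothing else.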

Observation 1.

Let F be a set of edges of G(P) and let {v_1,v_2,…,v_m}⊆V(G(P)) where m≥2. Then, v_1,v_2,…,v_m is a path in G(P)−F if and only if {v_1,v_2},…,{v_{m−1},v_m} is a path in LG(P)−F.

Thus, if G(P) is connected, so is LG(P). Hence, in what follows, we assume that LG(P) is connected.

A fill-in edge for LG(P) is valid if for every T∈P, at least one of the endpoints of the edge is not in LG(T). A triangulation H of LG(P) is restricted if every fill-in edge of H is valid.

Theorem 4 (Gysel et al. [9]).

A profile P of unrooted phylogenetic trees is compatible if and only if LG(P) has a restricted triangulation.

A minimal separator F of LG(P) is legal if for every T∈P, all the edges of T in F share a common endpoint; i.e., F∩E(T) is a clique in LG(T). The following theorem was mentioned in [9]. For future reference, we formally state it and prove it here.

Theorem 5.

A profile P is compatible if and only if there exists a maximal set ℱ of pairwise parallel minimal separators in LG(P) where every separator in ℱ is legal.

Proof.

Our approach is similar to the one used by Gusfield in [15]. Assume that P is compatible. From Theorem 4, there exists a restricted triangulation H of LG(P). We can assume that H is minimal (if it is not, simply delete fill-in edges repeatedly from H until it is minimal). Let ℱ=Δ_H. From Theorem 2, ℱ is a maximal set of pairwise parallel minimal separators of LG(P) and LG(P)_ℱ=H. Suppose ℱ contains a separator F that is not legal. Let {e,e′}⊆F where {e,e′}⊆E(T) for some input tree T and e∩e′=∅. The vertices of F form a clique in H. Thus, H contains the edge {e,e′}. Since {e,e′} is not a valid edge, H is not a restricted triangulation, a contradiction. Hence, every separator in ℱ is legal.

Conversely, let ℱ be a maximal set of pairwise parallel minimal separators of LG(P) where every separator in ℱ is legal. From Theorem 2, LG(P)_ℱ is a minimal triangulation of LG(P). If {e,e′}∈E(LG(P)_ℱ) is a fill-in edge, then e∩e′=∅ and there exists a minimal separator F∈ℱ where {e,e′}⊆F. Since F is legal, if {e,e′}⊆E(T) for some input tree T then e∩e′≠∅. Thus, e and e′ are not both from LG(T) for any input tree T. Hence, every fill-in edge in LG(P)_ℱ is valid, and LG(P)_ℱ is a restricted triangulation. By Theorem 4, P is compatible.

Let u be a vertex of some input tree. We write Inc(u) to denote the set of all edges of G(P) incident on u. Equivalently, Inc(u) is the set of all vertices e of LG(P) such that u∈e.

Let F be a cut of the display graph G(P). F is legal if for every tree T∈P, the edges of T in F are incident on a common vertex; i.e., if F∩E(T)⊆Inc(u) for some u∈V(T). F is nice if F is legal and each connected component of G(P)−F has at least one edge.
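The two conditions can be sketched in Python as follows. The profile below is hypothetical (it is not the one of Figure 1), trees are assumed to be given as edge lists, and the function names are illustrative. The sketch takes for granted that F is a cut; it does not test minimality.

```python
def components(vertices, edges):
    """Connected components of an undirected graph, as a list of vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comps.append(comp)
    return comps


def is_legal(profile, cut):
    """For every tree, its edges in the cut must share a common endpoint."""
    cut = {frozenset(e) for e in cut}
    for tree in profile:
        in_cut = [frozenset(e) for e in tree if frozenset(e) in cut]
        if in_cut and not frozenset.intersection(*in_cut):
            return False
    return True


def is_nice(profile, cut):
    """Legal, and every component of G(P)−F contains at least one edge."""
    cut = {frozenset(e) for e in cut}
    edges = {frozenset(e) for tree in profile for e in tree}
    vertices = set().union(*edges)
    comps = components(vertices, edges - cut)
    # A connected component has an edge exactly when it has >= 2 vertices.
    return is_legal(profile, cut) and all(len(c) > 1 for c in comps)


# Hypothetical profile: a quartet ab|cd and a star tree on {b, c, e}.
T1 = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
T2 = [("b", 3), ("c", 3), ("e", 3)]
nice_cut = [(1, 2), ("b", 3)]        # separates {a, b, 1} from the rest
isolating_cut = [("b", 1), ("b", 3)]  # legal, but leaves b as an isolated vertex
```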

Lemma 2.

Let F be a subset of E(G(P)). Then, F is a legal minimal separator of LG(P) if and only if F is a nice minimal cut of G(P).

To prove Lemma 2, we need two auxiliary lemmas and a corollary.

Lemma 3.

Let F be any minimal separator of LG(P) and u be any vertex of any input tree. Then, Inc(u)⊈F.

Proof.

Suppose F is a minimal a-b separator of LG(P) and u is a vertex of some input tree such that Inc(u)⊆F. Consider any vertex e∈Inc(u). Then, there exists a path π from a to b in LG(P) where e is the only vertex of F in π. If such a path π did not exist, then F∖{e} would still be an a-b separator, and F would not be minimal, a contradiction. Let e_1 and e_2 be the neighbors of e in π and let e={u,v}. Since Inc(u)⊆F, π does not contain any other vertex e′ where u∈e′. Thus, e∩e_1={v} and e∩e_2={v}. Let π=a,…,e_1,e,e_2,…,b. Then π′=a,…,e_1,e_2,…,b is also a path from a to b. But π′ does not contain any vertex of F, contradicting the assumption that F is a separator of LG(P). Hence, neither such a minimal separator F nor such a vertex u exist.

Lemma 4.

If F is a minimal separator of LG(P), then LG(P)−F has exactly two connected components.

Proof.

Assume that LG(P)−F has more than two connected components. By Lemma 1, LG(P)−F has at least two full components. Let H_1 and H_2 be two full components of LG(P)−F. Let H_3 be a connected component of LG(P)−F different from H_1 and H_2. By assumption LG(P) is connected. Thus, there exists an edge {e,e_3} in LG(P) where e∈F and e_3∈H_3. Since H_1 and H_2 are full components, there exist edges {e,e_1} and {e,e_2} in LG(P) where e_1∈V(H_1) and e_2∈V(H_2).

Let e={u,v}, and assume without loss of generality that u∈e∩e_3. Then, there is no vertex f∈V(H_1) where u∈e∩f. Thus, v∈e∩e_1. Similarly, there is no vertex f∈V(H_2) such that u∈f∩e or v∈f∩e. But then H_2 does not contain a vertex adjacent to e, so H_2 is not a full component, a contradiction.

Corollary 1.

If F is a minimal separator of LG(P), then LG(P)−F′ is connected for any F′⊂F.

Proof of Lemma 2.

We prove that if F is a legal minimal separator of LG(P) then F is a nice minimal cut of G(P). The proof for the other direction is similar and is omitted.

First, we show that F is a cut of G(P). Assume the contrary. Let {u,v} and {p,q} be vertices in different components of LG(P)−F. Since G(P)−F is connected, there is a path between vertices u and q in this graph. Also, {u,v}∉F and {p,q}∉F. Thus, by Observation 1 there is also a path between vertices {u,v} and {p,q} of LG(P)−F. This implies that {u,v} and {p,q} are in the same connected component of LG(P)−F, a contradiction. Thus F is a cut.

Next we show that F is a nice cut of G(P). For every T∈P all the vertices of LG(T) in F form a clique in LG(T). Thus, all the edges of T in F are incident on a common vertex, so F is legal. To complete the proof, assume that G(P)−F has a connected component with no edge and let u be the vertex in one such component. Then, Inc(u)⊆F. But F is a minimal separator of LG(P), and by Lemma 3, Inc(u)⊈F, a contradiction. Thus, F is a nice cut.

Lastly, we show that F is a minimal cut of G(P). Assume, on the contrary, that there exists F′⊂F where G(P)−F′ is disconnected. Since F′⊂F and every connected component of G(P)−F has at least one edge, every connected component of G(P)−F′ also has at least one edge. Let {u,v} and {p,q} be edges in different components of G(P)−F′. By Corollary 1, LG(P)−F′ is connected and thus, there is a path between {u,v} and {p,q} in LG(P)−F′. By Observation 1 there must also be a path between vertices u and p in G(P)−F′. Hence, edges {u,v} and {p,q} are in the same connected component of G(P)−F′, a contradiction. Thus, F is a minimal cut.

Lemma 5.

Two legal minimal separators F and F′ of LG(P) are parallel if and only if the nice minimal cuts F and F′ are parallel in G(P).

Proof.

Assume that separators F and F′ of LG(P) are parallel, but cuts F and F′ of G(P) are not. Then, there exists a set {{u,v},{p,q}}⊆F′ where {u,v} and {p,q} are in different components of G(P)−F. Since F and F′ are parallel separators in LG(P), and F does not contain {u,v} and {p,q}, there exists a path between vertices {u,v} and {p,q} in LG(P)−F. Then, by Observation 1 there also exists a path between vertices u and q in G(P)−F. Thus, {u,v} and {p,q} are in the same connected component of G(P)−F, a contradiction.

The other direction can be proved similarly, using Observation 1.

The next lemma, from [9], follows from the definition of restricted triangulation.

Lemma 6.

Let H be a restricted triangulation of LG(P) and let (T,B) be a clique tree of H. Let e={u,v} be any vertex in LG(P). Then, there does not exist a node x∈V(T) where B(x) contains vertices from both Inc(u)∖{e} and Inc(v)∖{e}.

Lemma 7.

Let T be a tree in P and suppose F is a minimal cut of G(P) that contains precisely one edge e of T. Then, the edges of the two subtrees of T−e are in different connected components of G(P)−F.

Proof.

Let e={u,v}. For each x∈e, let T_x denote the subtree containing vertex x in T−e. For each vertex x∈e, all the edges of T_x are in the same connected component of G(P)−F as x, because e is the only edge of T in F. Since F is a minimal cut of G(P), the endpoints of e are in different connected components of G(P)−F. Hence, the edges of T_u and T_v are also in different connected components of G(P)−F.

Characterizing compatibility via cuts

A set ℱ of cuts of G(P) is complete if, for every input tree T∈P and every internal edge e of T, there is a cut F∈ℱ where e is the only edge of T in F.
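Completeness is a purely combinatorial condition and is easy to test. The Python sketch below checks it for a hypothetical profile (not the one of Figure 1), assuming trees are edge lists in which leaves are strings and internal vertices are integers; it does not check that the given edge sets are minimal, legal, or pairwise parallel cuts.

```python
def is_complete(profile, cuts):
    """True if every internal edge e of every tree T is, for some cut F in
    `cuts`, the only edge of T that belongs to F."""
    cuts = [{frozenset(e) for e in cut} for cut in cuts]
    for tree in profile:
        tree_edges = {frozenset(e) for e in tree}
        for e in tree_edges:
            if any(isinstance(x, str) for x in e):
                continue  # leaf edge (assumed convention: leaves are strings)
            if not any((cut & tree_edges) == {e} for cut in cuts):
                return False
    return True


# Hypothetical profile: a quartet ab|cd and a star tree on {b, c, e}.
T1 = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
T2 = [("b", 3), ("c", 3), ("e", 3)]
profile = [T1, T2]

# {1,2} is the only internal edge of the profile, so one cut suffices.
good = is_complete(profile, [[(1, 2), ("b", 3)]])
# Here the cut holds two edges of T1, so {1,2} is not its only T1 edge.
bad = is_complete(profile, [[(1, 2), ("c", 2), ("c", 3)]])
```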

Lemma 8.

G(P) has a complete set of pairwise parallel nice minimal cuts if and only if it has a complete set of pairwise parallel legal minimal cuts.

Proof.

The “only if” part follows from the definition of a nice cut. Let ℱ be a complete set of pairwise parallel legal minimal cuts. Consider any minimal subset ℱ′ of ℱ that is also complete. Let F be a legal minimal cut of ℱ′. Since ℱ′ is minimal, there exists an internal edge e∈F of some input tree T such that e is the only edge of T in F. Also, since e is an internal edge, both subtrees of T−e have at least one edge each. Thus by Lemma 7, both connected components of G(P)−F have at least one edge each. Hence, F is a nice minimal cut of G(P). It follows that ℱ′ is a complete set of pairwise parallel nice minimal cuts of G(P).

We now characterize the compatibility of a profile in terms of minimal cuts in the display graph of the profile.

Theorem 6.

A profile P of unrooted phylogenetic trees is compatible if and only if there exists a complete set of pairwise parallel legal minimal cuts for G(P).

Example 1.

For the display graph of Figure 1, let ℱ={F_1,F_2,F_3,F_4}, where F_1={{1,2},{5,6}}, F_2={{2,3},{6,7},{5,6}}, F_3={{4,5},{1,2},{1,c}} and F_4={{6,7},{2,f}}. Then, ℱ is a complete set of pairwise parallel nice minimal cuts.

Theorem 6 has an analog in terms of LG(P). Let us say that a set ℱ of legal minimal separators of LG(P) is complete if for every internal edge e of an input tree T, there exists a separator F∈ℱ where e is the only vertex of LG(T) in F.

Theorem 7.

A profile P of unrooted phylogenetic trees is compatible if and only if there exists a complete set of pairwise parallel legal minimal separators for LG(P).

This result is a direct consequence of Theorem 6 and Lemmas 2, 5, and 8, so we omit its proof. Instead, we focus on the proof of Theorem 6, for which we need the next fact.

Lemma 9.

The following two statements are equivalent.

  (i) There exists a maximal set ℱ of pairwise parallel minimal separators of LG(P) where every separator in ℱ is legal.

  (ii) There exists a complete set of pairwise parallel nice minimal cuts for G(P).

Proof.

(i) ⇒ (ii): We show that for every internal edge e={u,v} of an input tree T there exists a minimal separator in ℱ that contains only vertex e from LG(T). Then it follows from Lemmas 2 and 5 that ℱ is a complete set of pairwise parallel nice minimal cuts for G(P).

As shown in the proof of Theorem 5, LG(P)_ℱ is a restricted minimal triangulation of LG(P). Let (S,B) be a clique tree of LG(P)_ℱ. By definition, the vertices in each of the sets Inc(u) and Inc(v) form a clique in LG(P). Consider any vertex p of S where Inc(u)⊆B(p) and any vertex q of S where Inc(v)⊆B(q). (Since (S,B) is a clique tree of LG(P)_ℱ, such vertices p and q must exist.) Also, by Lemma 6, p≠q, B(p)∩(Inc(v)∖{e})=∅ and B(q)∩(Inc(u)∖{e})=∅.

Let π=p,x_1,x_2,…,x_m,q be the path from p to q in S where m≥0. Let x_0=p and x_{m+1}=q. Let x_i be the vertex nearest to p in path π where i∈[m+1] and B(x_i)∩(Inc(u)∖{e})=∅. Let F=B(x_{i−1})∩B(x_i). Then by Theorem 2, F∈ℱ. Since Inc(u)∩Inc(v)={e}, by the coherence property, e∈B(x_j) for every j∈[m]. Thus, e∈F. By Lemma 6, B(x_{i−1})∩(Inc(v)∖{e})=∅. Since B(x_i)∩(Inc(u)∖{e})=∅, F∩Inc(u)={e} and F∩Inc(v)={e}. Thus, for every vertex e′∈LG(T) where e′≠e and e∩e′≠∅, e′∉F. Also, since every separator in ℱ is legal, we have f∉F for every vertex f∈LG(T) where f∩e=∅. Thus, e is the only vertex of LG(T) in F.

(ii) ⇒ (i): Consider any complete set of pairwise parallel nice minimal cuts ℱ′ of G(P). By Lemmas 2 and 5, ℱ′ is a set of pairwise parallel legal minimal separators of LG(P). There exists a maximal set ℱ of pairwise parallel minimal separators where ℱ′⊆ℱ.

Assume that ℱ∖ℱ′ contains a minimal separator F that is not legal. Then, there must exist a tree T∈P where at least two nonincident edges e_1={x,y} and e_2={x′,y′} of T are in F. Consider any internal edge e_3 in T where e_1 and e_2 are in different components of T−e_3. Such an edge exists because e_1 and e_2 are nonincident. Since ℱ′ is complete, there exists a cut F′∈ℱ′ where e_3 is the only edge of T in F′. Since F and F′ are in ℱ, they are parallel to each other and vertices e_1 and e_2 are in the same connected component of LG(P)−F′. Thus, by Observation 1, there exists a path between vertices x and x′ in G(P)−F′ and edges e_1 and e_2 are also in the same connected component of G(P)−F′. But by Lemma 7 that is impossible.

Thus, every separator of ℱ∖ℱ′ is legal and ℱ is a maximal set of pairwise parallel minimal separators of LG(P) where every separator in ℱ is legal.

Proof of Theorem 6.

By Theorem 5 and Lemma 9, profile P is compatible if and only if there exists a complete set of pairwise parallel nice minimal cuts for G(P). The rest follows from Lemma 8.

Splits and cuts

We first argue that for every nice minimal cut of G(P) we can derive a split of ℒ(P). We use the following notation: if H is a subgraph of G(P), then ℒ(H) represents the set of all leaves of H.

Lemma 10.

Let F be a nice minimal cut of G(P) and let G_1 and G_2 be the two connected components of G(P)−F. Then, ℒ(G_i)≠∅ for i∈{1,2}. In particular, ℒ(G_1)|ℒ(G_2) is a split of ℒ(P).

Proof.

Consider G_i for each i∈{1,2}. We show that ℒ(G_i) is non-empty. Since F is nice, G_i contains at least one edge e of G(P). If e is a non-internal edge, then ℒ(G_i) is non-empty. Assume that e={u,v} is an internal edge of some input tree T. If F does not contain an edge of T, then ℒ(T)⊆ℒ(G_i) and thus ℒ(G_i) is non-empty. Assume that F contains one or more edges of T. Let T_u, T_v be the two subtrees of T−e. Since F is a nice minimal cut, F contains edges from either T_u or T_v but not both. Without loss of generality assume that F does not contain edges from T_u. Then, every edge of T_u is in the same component as e. Since T_u contains at least one leaf, ℒ(G_i) is non-empty. Thus, ℒ(G_1)|ℒ(G_2) is a split of ℒ(P).

Let σ(F) denote the split of ℒ(P) induced by a nice minimal cut F. If ℱ is a set of nice minimal cuts of G(P), Σ(ℱ) denotes the set of all the non-trivial splits in ⋃_{F∈ℱ} {σ(F)}. The following result expresses the relationship between complete sets of nice minimal cuts and the compatibility of splits.
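The split σ(F) can be read off the two components of G(P)−F. The Python sketch below does this for a hypothetical profile (not the one of Figure 1), assuming trees are edge lists with string leaves and integer internal vertices; the function names are illustrative, and the cut is assumed to be a nice minimal cut.

```python
def components(vertices, edges):
    """Connected components of an undirected graph, as a list of vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x] - comp:
                comp.add(y)
                stack.append(y)
        seen |= comp
        comps.append(comp)
    return comps


def induced_split(profile, cut):
    """Return σ(F) as a frozenset of the two leaf sets of G(P)−F."""
    cut = {frozenset(e) for e in cut}
    edges = {frozenset(e) for tree in profile for e in tree}
    vertices = set().union(*edges)
    comps = components(vertices, edges - cut)
    if len(comps) != 2:
        raise ValueError("expected a minimal cut with exactly two components")
    # Keep only the leaves of each component (assumed convention: strings).
    return frozenset(
        frozenset(v for v in comp if isinstance(v, str)) for comp in comps
    )


# Hypothetical profile: a quartet ab|cd and a star tree on {b, c, e}.
T1 = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
T2 = [("b", 3), ("c", 3), ("e", 3)]
split = induced_split([T1, T2], [(1, 2), ("b", 3)])
```

For this toy profile, the cut {{1,2},{b,3}} induces the split ab|cde.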

Theorem 8.

If G(P) has a complete set of pairwise parallel nice minimal cuts ℱ, then Σ(ℱ) is compatible and any compatible tree for Σ(ℱ) is also a compatible tree for P.

Example 2.

For the complete set of pairwise parallel nice minimal cuts ℱ={F_1,F_2,F_3,F_4} for the display graph of Example 1, we have σ(F_1)=abc|defg, σ(F_2)=abcfg|de, σ(F_3)=ab|cdefg, and σ(F_4)=abcde|fg. Note that these splits are pairwise compatible.
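The pairwise compatibility of the four splits of Example 2 can be confirmed with the empty-intersection criterion for two splits. The short Python check below uses only the splits listed in the example; the dictionary keys and the helper name are illustrative.

```python
from itertools import combinations

# The four splits of Example 2; each string stands for a set of
# single-character labels.
splits = {
    "F1": ("abc", "defg"),
    "F2": ("abcfg", "de"),
    "F3": ("ab", "cdefg"),
    "F4": ("abcde", "fg"),
}


def compatible(split1, split2):
    """A1|A2 and B1|B2 are compatible iff some intersection Ai ∩ Bj is empty."""
    return any(not (set(a) & set(b)) for a in split1 for b in split2)


# Every split must partition the full label set {a,...,g}.
all_partition = all(
    set(x) | set(y) == set("abcdefg") and not (set(x) & set(y))
    for x, y in splits.values()
)
all_compatible = all(
    compatible(splits[i], splits[j]) for i, j in combinations(splits, 2)
)
```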

The proof of Theorem 8 uses the following lemma.

Lemma 11.

Let F_1 and F_2 be two parallel nice minimal cuts of G(P). Then, σ(F_1) and σ(F_2) are compatible.

Proof.

Let σ(F_1)=U_1|U_2 and σ(F_2)=V_1|V_2. Assume that σ(F_1) and σ(F_2) are incompatible. Thus, U_i∩V_j≠∅ for every i,j∈{1,2}. Let a∈U_1∩V_1, b∈U_1∩V_2, c∈U_2∩V_1 and d∈U_2∩V_2. Since {a,b}⊆U_1, there exists a path π_1 between leaves a and b in G(P)−F_1. But a and b are in different components of G(P)−F_2. Thus, an edge e_1 of path π_1 is in the cut F_2. Similarly, {c,d}⊆U_2 and there exists a path π_2 between labels c and d in G(P)−F_1. Since c and d are in different components of G(P)−F_2, cut F_2 contains an edge e_2 of path π_2. But π_1 and π_2 are in different components of G(P)−F_1, so edges e_1 and e_2 are in different components of G(P)−F_1. Since {e_1,e_2}⊆F_2, the cuts F_1 and F_2 are not parallel, a contradiction.

Proof of Theorem 8.

The compatibility of Σ(ℱ) follows from Lemma 11 and Theorem 1. Let S be a compatible tree for Σ(ℱ), let T be an input tree of P, let S′=S|ℒ(T), and let e be any internal edge of T. We show that S′ displays σ(e).

Let σ(e)=A|B. There exists a cut F∈ℱ where e is the only edge of T in F. By Lemma 7, since F is minimal, the leaves of sets A and B are in different components of G(P)−F. Thus, if σ(F)=A′|B′ then, up to renaming of sets, we have A⊆A′ and B⊆B′. Because S displays σ(F), S′ also displays σ(e). Since S′ displays all the splits of T, T can be obtained from S′ by contracting zero or more edges [10]. Thus, S displays T. Since S displays every tree in P, S is a compatible tree for P.

Characterizing agreement via cuts

The following characterization of agreement is similar to the one for tree compatibility given by Theorem 6, except for an additional restriction on the minimal cuts.

Theorem 9.

A profile P has an agreement supertree if and only if G(P) has a complete set ℱ of pairwise parallel legal minimal cuts where, for every cut F∈ℱ and for every T∈P, there is at most one edge of T in F.
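The additional restriction in Theorem 9 is straightforward to test once a candidate set of cuts is given. The Python sketch below checks only that restriction, on a hypothetical profile (not the one of Figure 2); completeness, minimality, legality, and parallelism would have to be verified separately. The function name and the edge-list encoding are assumptions of this example.

```python
def at_most_one_edge_per_tree(profile, cuts):
    """True if no cut contains two or more edges of the same input tree."""
    for cut in cuts:
        cut = {frozenset(e) for e in cut}
        for tree in profile:
            if sum(frozenset(e) in cut for e in tree) > 1:
                return False
    return True


# Hypothetical profile: a quartet ab|cd and a star tree on {b, c, e}.
T1 = [("a", 1), ("b", 1), (1, 2), ("c", 2), ("d", 2)]
T2 = [("b", 3), ("c", 3), ("e", 3)]
profile = [T1, T2]

# One edge of T1 and one edge of T2: the restriction holds.
ok = at_most_one_edge_per_tree(profile, [[(1, 2), ("b", 3)]])
# Two edges of T1 sharing vertex 1: legal, but excluded by the restriction.
violating = at_most_one_edge_per_tree(profile, [[("a", 1), ("b", 1)]])
```

The second cut illustrates the difference between the two characterizations: it is legal in the sense used for compatibility, yet it is ruled out for agreement.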

Example 3.

One can verify that the display graph of Figure 1 does not meet the conditions of Theorem 9 and, thus, the associated profile does not have an AST. On the other hand, for the display graph of Figure 2, let ℱ={F_1,F_2,F_3}, where F_1={{1,2},{4,5}}, F_2={{1,2},{5,6}} and F_3={{2,3},{6,d}}. For any given input tree T, every cut in ℱ has at most one edge of T. Also, ℱ is a complete set of pairwise parallel legal minimal cuts. Thus, by Theorem 9, the input trees of Figure 2 have an AST.

Figure 2. Agreeing trees. (i) First input tree. (ii) Second input tree, which agrees with the first. (iii) Display graph of the input trees. (iv) Edge label intersection graph of the input trees, where label uv represents edge {u,v} of the display graph.

The analogue of Theorem 9 for LG(P) stated next follows from Theorem 9 and Lemmas 2, 5, and 8.

Theorem 10.

A profile P has an agreement supertree if and only if LG(P) has a complete set ℱ of pairwise parallel legal minimal separators where, for every F∈ℱ and every T∈P, there is at most one vertex of LG(T) in F.

Theorem 9 follows from Lemma 8 and the next result.

Lemma 12.

A profile 𝒫 has an agreement supertree if and only if G(𝒫) has a complete set ℱ of pairwise parallel nice minimal cuts where, for every cut F∈ℱ and every T∈𝒫, there is at most one edge of T in F.

The rest of the section is devoted to the proof of Lemma 12.

Let S be an AST of 𝒫 and let e={u,v} be an edge of S. Let S_u and S_v be the subtrees of S−e containing u and v, respectively. Let L_u=ℒ(S_u) and L_v=ℒ(S_v). Thus, σ_e(S)=L_u|L_v. Assume that there exists an input tree T where ℒ(T)∩L_x≠∅ for each x∈{u,v}. Then there exists an edge f∈E(T) where, if σ_f(T)=A1|A2, then A1⊆L_u and A2⊆L_v. (If there were no such edge, S|ℒ(T) would contain a split that is not in T and would thus not be isomorphic to T.) We call e an agreement edge of S corresponding to edge f of T. Note that there does not exist any other edge f′ of T where e is also an agreement edge of S with respect to edge f′ of T.

The cut function of an AST S of 𝒫 is the mapping Ψ from E(S) to subsets of edges of G(𝒫) defined as follows. For every e∈E(S), an edge f of an input tree T is in Ψ(e) if and only if e is an agreement edge of S corresponding to edge f of T. Observe that Ψ is uniquely defined. Given an edge e∈E(S), we define a set V_x for each x∈e as follows. For every T∈𝒫, let V_{x,T} consist of all the vertices of the minimal subtree of T connecting the labels in ℒ(T)∩L_x. Then, V_x=⋃_{T∈𝒫} V_{x,T}. Note that if e={u,v} then {V_u,V_v} is a partition of V(G(𝒫)).
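The definition of the cut function can be made concrete with a small sketch. The code below is illustrative only: trees are assumed to be given as lists of vertex pairs, leaves are identified with their labels, and the toy trees `S`, `T1` and `T2` (and all vertex names) are hypothetical rather than taken from the figures. It assumes that `S` really is an AST of the profile, so each input tree with labels on both sides of an edge of `S` contributes exactly one edge.

```python
from collections import defaultdict

def adjacency(edges):
    """Adjacency map of an undirected tree given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def leaf_labels(edges):
    """Labels of a tree: its degree-one vertices."""
    adj = adjacency(edges)
    return frozenset(x for x in adj if len(adj[x]) == 1)

def side_labels(edges, u, v):
    """Labels in the component of (tree minus edge {u, v}) that contains u."""
    adj = adjacency(edges)
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen and {x, y} != {u, v}:
                seen.add(y)
                stack.append(y)
    return frozenset(seen) & leaf_labels(edges)

def cut_function(supertree, profile):
    """Map each edge e of the supertree to Psi(e), a set of input-tree edges."""
    psi = {}
    for u, v in supertree:
        l_u, l_v = side_labels(supertree, u, v), side_labels(supertree, v, u)
        cut = set()
        for tree in profile:
            labels = leaf_labels(tree)
            a_u, a_v = labels & l_u, labels & l_v
            if not a_u or not a_v:
                continue  # all labels of this tree lie on one side of e
            for p, q in tree:
                # f = {p, q} is in Psi(e) when its split equals a_u | a_v
                if side_labels(tree, p, q) in (a_u, a_v):
                    cut.add(frozenset((p, q)))
        psi[frozenset((u, v))] = cut
    return psi

# Hypothetical toy instance: a caterpillar supertree and two quartet trees.
S = [("a", "s1"), ("b", "s1"), ("s1", "s2"), ("c", "s2"),
     ("s2", "s3"), ("d", "s3"), ("e", "s3")]
T1 = [("a", "t1"), ("b", "t1"), ("t1", "t2"), ("c", "t2"), ("d", "t2")]
T2 = [("b", "r1"), ("c", "r1"), ("r1", "r2"), ("d", "r2"), ("e", "r2")]
psi = cut_function(S, [T1, T2])
```

In this toy instance, `psi[frozenset(("s1", "s2"))]` contains the internal edge `{t1, t2}` of `T1` and the leaf edge `{b, r1}` of `T2`.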

Lemma 13.

Let S be an AST of 𝒫 and let Ψ be the cut function of S. Then, for every edge e∈E(S),

(i) Ψ(e) is a cut of G(𝒫) and

(ii) Ψ(e) is a minimal cut of G(𝒫) if and only if G(𝒫)−Ψ(e) has exactly two connected components.

Proof.

(i) Let e={u,v}. We show that G(𝒫)−Ψ(e) does not contain an edge whose endpoints are in distinct sets of {V_u,V_v}. Assume the contrary. Let f={x,y} be an edge of G(𝒫)−Ψ(e) where x∈V_u and y∈V_v.

Since f is an edge of G(𝒫)−Ψ(e), f∉Ψ(e). Suppose f is an edge of input tree T. There are two cases.

  1. Ψ(e) does not contain an edge of T. Then, there exists an endpoint p of e where ℒ(T)⊆L_p. Without loss of generality, let u=p. Then, V(T)⊆V_u and thus y∈V_u, a contradiction.

  2. Ψ(e) contains an edge f′≠f of T. Let f′={r,s} and let L_r⊆L_u and L_s⊆L_v. Let x,r be the vertices of f and f′ where L_x⊆L_r. Since T is a phylogenetic tree, such vertices x and r exist. Since L_r⊆L_u, both the endpoints of f are in V_u, a contradiction.

Thus, G(𝒫)−Ψ(e) does not contain an edge whose endpoints are in different sets of {V_u,V_v}. Since V_u and V_v are non-empty, Ψ(e) is a cut of G(𝒫).

(ii) The “only if” part follows from the definition of a minimal cut. We now prove the “if” part. Let e={u,v}. Assume that G(𝒫)−Ψ(e) has exactly two connected components. From the proof of (i), V_u and V_v are the vertex sets of those two connected components. Consider any edge f∈Ψ(e). The endpoints of f are in different sets of {V_u,V_v} and thus are in different connected components of G(𝒫)−Ψ(e). Hence, G(𝒫)−(Ψ(e)∖{f}) is connected. Thus, if G(𝒫)−Ψ(e) has exactly two connected components, Ψ(e) is a minimal cut of G(𝒫).
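The argument in part (ii) rests on a general fact about connected graphs: an edge set is a minimal cut exactly when its removal leaves two connected components and every removed edge joins them. The following is a minimal sketch of that test, assuming the graph is connected and is given as a list of vertex pairs; the function names and the toy display graph are hypothetical.

```python
from collections import defaultdict

def component_ids(vertices, edges):
    """Map every vertex to a representative of its connected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    comp = {}
    for s in vertices:
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp[y] = s
                    stack.append(y)
    return comp

def is_minimal_cut(edges, cut):
    """Test whether `cut` is a minimal cut of the connected graph `edges`."""
    edges = {frozenset(e) for e in edges}
    cut = {frozenset(e) for e in cut}
    vertices = {x for e in edges for x in e}
    comp = component_ids(vertices, [tuple(e) for e in edges - cut])
    two_parts = len(set(comp.values())) == 2
    # every cut edge must join the two components, else it is redundant
    all_cross = all(len({comp[x] for x in e}) == 2 for e in cut)
    return bool(cut) and cut <= edges and two_parts and all_cross

# Hypothetical display graph of two quartet trees sharing the labels b, c, d.
G = [("a", "t1"), ("b", "t1"), ("t1", "t2"), ("c", "t2"), ("d", "t2"),
     ("b", "r1"), ("c", "r1"), ("r1", "r2"), ("d", "r2"), ("e", "r2")]
```

For this toy graph, `{t1,t2}` together with `{b,r1}` passes the test, while `{t1,t2}` alone does not disconnect the graph.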

The next observation summarizes two basic facts about cut functions.

Observation 2.

Let S be an AST of 𝒫. Then, the cut function Ψ of S has the following properties.

(i) For any two distinct edges e1 and e2 in E(S), Ψ(e1)≠Ψ(e2).

(ii) Let e={u,v} be an edge of S. For any input tree T where ℒ(T)∩L_v≠∅, all the labels of ℒ(T)∩L_v are in the same connected component of G(𝒫)−Ψ(e).

Let S be an AST of 𝒫 and let e be an edge of S. Although Lemma 13 shows that Ψ(e) is a cut of G(𝒫), Ψ(e) may not be minimal. We now argue that we can always construct an agreement supertree whose cut function gives minimal cuts.

Lemma 14.

If 𝒫 has an AST, then it has an AST S whose cut function Ψ satisfies the following: for every edge e∈E(S), Ψ(e) is a minimal cut of G(𝒫).

We prove Lemma 14 by arguing that any AST that fails to satisfy the required cut minimality property can be transformed into one that does, through repeated application of the “splitting” operation, defined next.

Suppose e={u,v} is an edge of S where Ψ(e) is not minimal. Let {L1,…,L_m} be the partition of L_v where for every i∈[m], L_i=ℒ(C)∩L_v for some connected component C in G(𝒫)−Ψ(e). We assume without loss of generality that m>1 (if not, we can just exchange the roles of u and v). Let R_v be the rooted tree derived from S_v by distinguishing vertex v as the root. Let R_{v,i} be the (rooted) tree obtained from the minimal subtree of R_v connecting the labels in L_i by distinguishing the vertex closest to v as the root and suppressing every other vertex that has degree two. To split edge e at u is to construct a new tree S′ from S in two steps: (i) delete the vertices of R_v from S and (ii) for every i∈[m], add an edge from u to the root of R_{v,i}.
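The splitting operation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tree is a list of vertex pairs, the partition {L1,…,L_m} is assumed to have been computed already and is passed in as `blocks`, and each retained internal vertex x of R_{v,i} is renamed to the hypothetical pair `(x, i)` so that different blocks never share a vertex.

```python
from collections import defaultdict

def split_edge(tree, u, v, blocks):
    """Edge list of the tree obtained by splitting edge {u, v} at u."""
    adj = defaultdict(set)
    for a, b in tree:
        adj[a].add(b)
        adj[b].add(a)

    new_edges = []

    def restrict(x, parent, block, tag):
        """Root of the subtree below x restricted to `block`, or None."""
        children = [c for c in adj[x] if c != parent]
        if not children:                      # x is a leaf
            return x if x in block else None
        kept = [r for r in (restrict(c, x, block, tag) for c in children)
                if r is not None]
        if len(kept) <= 1:
            # nothing below x, or a single branch: x is suppressed
            return kept[0] if kept else None
        node = (x, tag)                       # fresh copy of x for this block
        new_edges.extend((node, r) for r in kept)
        return node

    # Step (i): delete the vertices of R_v, keeping only the side of u.
    side_u, stack = {u}, [u]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in side_u and {x, y} != {u, v}:
                side_u.add(y)
                stack.append(y)
    result = [(a, b) for a, b in tree if a in side_u and b in side_u]

    # Step (ii): join u to the root of each R_{v,i}.
    for i, block in enumerate(blocks):
        result.append((u, restrict(v, u, set(block), i)))
    result.extend(new_edges)
    return result

# Hypothetical example: split edge {s1, s2} at s1 with blocks {c} and {d, e}.
S = [("a", "s1"), ("b", "s1"), ("s1", "s2"), ("c", "s2"),
     ("s2", "s3"), ("d", "s3"), ("e", "s3")]
S_split = split_edge(S, "s1", "s2", [{"c"}, {"d", "e"}])
```

In the example, vertex `s2` disappears: leaf `c` and the copy of `s3` (which still carries `d` and `e`) become neighbours of `s1`.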

Observation 3.

Let S be an AST of 𝒫 and let Ψ be the cut function of S. Let S′ be the tree derived by splitting edge e={u,v} at u. Consider any connected component C of G(𝒫)−Ψ(e) where ℒ(C)∩L_v≠∅. Then, for every X⊆(ℒ(C)∩L_v), S|X and S′|X are isomorphic.

The next observation follows from the definition of AST.

Observation 4.

Let S and T be two phylogenetic trees where ℒ(T)⊆ℒ(S) and T agrees with S. Then, T and S|U agree for every U such that ℒ(T)⊆U⊆ℒ(S).

Lemma 15.

Let S be an AST of 𝒫 and let e={u,v} be an edge of S. Let S′ be the tree derived by splitting edge e at u. Then, S′ is an AST of 𝒫.

Proof.

By construction, S′ is a phylogenetic tree over ℒ(𝒫). As before, let {L1,…,L_m} be the partition of L_v where for every i∈[m], L_i=ℒ(C)∩L_v for some connected component C in G(𝒫)−Ψ(e). Consider any input tree T of profile 𝒫. We prove that T and S′ agree. There are three cases.

Case 1: ℒ(T)⊆L_u. By Observation 4, T and S|L_u agree. By the definition of the split operation, trees S|L_u and S′|L_u are isomorphic. Thus, T and S′ agree.

Case 2: ℒ(T)⊆L_v. By Observation 2(ii), ℒ(T)⊆L_i for some i∈[m]. Since T and S agree and ℒ(T)⊆L_i, by Observation 4, T and S|L_i agree. By construction, trees S|L_i and S′|L_i are isomorphic. Thus, T and S′ agree.

Case 3: ℒ(T)∩L_u≠∅ and ℒ(T)∩L_v≠∅. By Observation 2(ii), ℒ(T)∩L_v⊆L_i for some i∈[m]. Since T and S agree and ℒ(T)⊆(L_u∪L_i), by Observation 4, T also agrees with S|(L_u∪L_i). By construction, trees S|(L_u∪L_i) and S′|(L_u∪L_i) are isomorphic. Thus, T and S′|(L_u∪L_i) agree. It follows that T and S′ agree.

Thus, S′ is an AST of 𝒫.

Observe that if S′ is the tree obtained by splitting edge e={u,v} of S at u, then the edges of E(S_u) are in both S and S′.

Therefore, E(S)∖E(S_u)=E(S)∖E(S′) and E(S′)∖E(S_u)=E(S′)∖E(S).

Lemma 16.

Let S be an AST of 𝒫 and let e={u,v} be an edge of S. Let S′ be the tree obtained by splitting e at u. Let Ψ, Ψ′ be the cut functions of S and S′ respectively. Consider any edge f∈E(S′)∖E(S). There exists an edge e′∈E(S)∖E(S′) where Ψ′(f)⊆Ψ(e′). Furthermore, if Ψ(e′) is a minimal cut of G(𝒫) then Ψ′(f)=Ψ(e′) and Ψ′(f) is a minimal cut of G(𝒫).

Proof.

Let f={x,y} and let x be the vertex of f where L_x⊆L_v. Let S_p be the minimal subtree of S connecting the labels in L_x. Let p be the vertex of S_p closest to u in S. Let q be the vertex adjacent to p in the path from p to u. Let e′={p,q}. Note that L_x⊆L_p. Since L_x⊆L_v, e′ is an edge of E(S)∖E(S′). Consider any tree T that has an edge f1 in Ψ′(f). We show that ℒ(T)∩L_x=ℒ(T)∩L_p. It then follows that f1∈Ψ(e′) and thus, Ψ′(f)⊆Ψ(e′).

Since L_x⊆L_p, (L_x∩ℒ(T))⊆(ℒ(T)∩L_p). By Observation 2(ii), all the labels in ℒ(T)∩L_v are in the same connected component of G(𝒫)−Ψ(e). Thus, all the labels in L_x∪(L_p∩ℒ(T)) are in the same connected component of G(𝒫)−Ψ(e). If (L_p∩ℒ(T))⊈(L_x∩ℒ(T)), then S|(L_x∪(L_p∩ℒ(T))) and S′|(L_x∪(L_p∩ℒ(T))) are not isomorphic, contradicting Observation 3. Thus, (L_p∩ℒ(T))⊆(L_x∩ℒ(T)).

Assume that Ψ(e′) is a minimal cut of G(𝒫). Then, all the labels in L_p are in the same connected component of G(𝒫)−Ψ(e′). By Observation 3, L_p=L_x. Thus, Ψ′(f) is also a minimal cut of G(𝒫).

Lemma 17.

Let S be an AST of 𝒫 and Ψ be the cut function of S. Let E0 be the set of all edges e of S such that Ψ(e) is not a minimal cut of G(𝒫). Choose any edge e={u,v}∈E0 such that |Ψ(e)|=max{|Ψ(e′)| : e′∈E0}. Let S′ be the tree obtained from S by splitting e at u and let Ψ′ be the cut function of S′. We have the following.

(i) For any edge f∈E(S′), if |Ψ′(f)|>|Ψ(e)| then Ψ′(f) is a minimal cut of G(𝒫).

(ii) Let P be the set of all edges x in S such that |Ψ(e)|=|Ψ(x)| and Ψ(x) is not a minimal cut. Let P′ be the set of all edges x in S′ such that |Ψ(e)|=|Ψ′(x)| and Ψ′(x) is not a minimal cut. Then, |P′|<|P|.

Proof.

(i) Consider any edge f∈E(S′) where |Ψ′(f)|>|Ψ(e)|. If f∈E(S′)∩E(S), then Ψ′(f)=Ψ(f). Since |Ψ(f)|>|Ψ(e)|, the maximality of |Ψ(e)| over E0 implies that Ψ(f) is a minimal cut of G(𝒫). Thus, Ψ′(f) is also a minimal cut of G(𝒫). Assume that f∈E(S′)∖E(S). By Lemma 16, there exists an edge e′∈E(S) where Ψ′(f)⊆Ψ(e′). Since |Ψ′(f)|>|Ψ(e)|, |Ψ(e′)|>|Ψ(e)|. Thus, again by the choice of e, Ψ(e′) is a minimal cut of G(𝒫). From Lemma 16, it follows that Ψ(e′)=Ψ′(f) and Ψ′(f) is a minimal cut of G(𝒫).

(ii) Let Q=P∩(E(S)∖E(S′)) and Q′=P′∩(E(S′)∖E(S)). It suffices to show that |Q′|<|Q|. Consider any edge f∈Q′. By Lemma 16, there exists an edge e′∈E(S)∖E(S′) where Ψ′(f)⊆Ψ(e′). Thus, |Ψ(e′)|≥|Ψ′(f)|. If |Ψ(e′)|>|Ψ′(f)|, then by the choice of e, Ψ(e′) is a minimal cut and thus by Lemma 16 |Ψ(e′)|=|Ψ′(f)|, a contradiction.

Thus, Ψ(e′)=Ψ′(f). Also, since Ψ′(f) is not a minimal cut, by Lemma 16, neither is Ψ(e′). If e′=e, then all vertices of V_v are in the same connected component of G(𝒫)−Ψ(e), contradicting the assumption that it is possible to split e at u. Thus, e′≠e. Hence, we can conclude that for every edge f∈Q′, there exists an edge e′∈(Q∖{e}), where Ψ′(f)=Ψ(e′).

Let f1 and f2 be any two distinct edges in Q′. Let e1 and e2 be the edges of Q∖{e} where Ψ′(f1)=Ψ(e1) and Ψ′(f2)=Ψ(e2). If e1=e2, then Ψ′(f1)=Ψ′(f2), contradicting Observation 2(i). Thus, e1≠e2. Since e∈Q and e∉Q′, it follows that |Q′|≤|Q|−1, and thus |Q′|<|Q|.

Proof of Lemma 14.

Let S be an AST of 𝒫 and Ψ be the cut function of S. Do the following while S contains an edge e such that Ψ(e) is not a minimal cut of G(𝒫): Pick an edge e satisfying the conditions of Lemma 17, and apply a split operation at e; let S′ be the resulting tree. By Lemma 15, S′ is also an AST of 𝒫. Let Ψ′ be the cut function of S′. Set S to S′ and Ψ to Ψ′.

We only need to prove that the total number of iterations, s, is finite. An AST of 𝒫 has at most 2|ℒ(𝒫)| vertices. Also, |Ψ(e)|≥1 for any edge e of S. It thus follows from Lemma 17 that s is finite.

Proof of Lemma 12.

(⇒) Assume that 𝒫 has an AST. Then, by Lemma 14, 𝒫 has an AST S whose cut function Ψ has the property that, for every edge e∈E(S), Ψ(e) is a minimal cut of G(𝒫). Let ℱ be the set of all Ψ(e) such that e is an internal edge of S. Then, ℱ is a set of minimal cuts of G(𝒫). Further, by definition of Ψ, for every F∈ℱ and for every T∈𝒫, F contains at most one edge of T. Thus every cut in ℱ is legal. We now prove that ℱ is a complete set of pairwise parallel nice minimal cuts of G(𝒫).

We first argue that every cut in ℱ is nice. Consider any F∈ℱ. Let e={u,v} be the internal edge of S where Ψ(e)=F. Let T be an input tree that has an internal edge f in Ψ(e). Since e is an internal edge, at least one such input tree exists; otherwise Ψ(e) is not a minimal cut. Now, by definition, f is the only edge of T in Ψ(e), so, by Lemma 7, each of the two connected components of G(𝒫)−Ψ(e) has at least one non-internal edge of T. Hence, F is a nice minimal cut of G(𝒫).

To prove that the cuts in ℱ are pairwise parallel, we argue that for any two distinct internal edges e1 and e2 of S, Ψ(e1) and Ψ(e2) are parallel. There exist vertices x∈e1 and y∈e2 where L_x⊆L_y. For every edge f∈Ψ(e1), we show that f∈Ψ(e2) or f⊆V_y. It then follows that Ψ(e1) and Ψ(e2) are parallel. Let f be an edge of input tree T. Then there exists z∈f where L_z⊆L_x. Thus, L_z⊆L_y and z∈V_y. By Lemma 13, all the vertices of V_y are in the same connected component of G(𝒫)−Ψ(e2). Thus, f∈Ψ(e2) or f⊆V_y.
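The parallel condition used in this step can be checked mechanically. The sketch below assumes the definition given earlier in the paper (outside this section) has the following form: two cuts F1 and F2 are parallel when at most one connected component of G−F1 contains an edge of F2. The function name and the toy graphs are hypothetical.

```python
from collections import defaultdict

def are_parallel(edges, cut1, cut2):
    """True if at most one component of (G minus cut1) contains an edge of cut2."""
    edges = {frozenset(e) for e in edges}
    cut1 = {frozenset(e) for e in cut1}
    cut2 = {frozenset(e) for e in cut2}
    adj = defaultdict(set)
    for e in edges - cut1:
        x, y = tuple(e)
        adj[x].add(y)
        adj[y].add(x)
    comp = {}
    for s in {x for e in edges for x in e}:
        if s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp[y] = s
                    stack.append(y)
    # an edge of cut2 that survives in G - cut1 lies inside one component
    hit = {comp[next(iter(e))] for e in cut2 - cut1}
    return len(hit) <= 1

# Hypothetical display graph of two quartet trees sharing the labels b, c, d.
G = [("a", "t1"), ("b", "t1"), ("t1", "t2"), ("c", "t2"), ("d", "t2"),
     ("b", "r1"), ("c", "r1"), ("r1", "r2"), ("d", "r2"), ("e", "r2")]
F1 = [("t1", "t2"), ("b", "r1")]
F2 = [("d", "t2"), ("r1", "r2")]

# A 4-cycle, in which the two "opposite" cuts cross each other.
C4 = [("w", "x"), ("x", "y"), ("y", "z"), ("z", "w")]
```

On the toy display graph the cuts `F1` and `F2` are parallel, whereas on the 4-cycle the cuts `{wx, yz}` and `{xy, zw}` are not.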

Lastly, we show that ℱ is complete. Consider any internal edge f={p,q} of some input tree T. Since S is an AST of 𝒫, there exists an edge e={u,v} where, up to relabeling of sets, L_p⊆L_u and L_q⊆L_v. Thus, e is an agreement edge of S corresponding to f, so f∈Ψ(e). Since f is an internal edge, e is also an internal edge of S and thus Ψ(e)∈ℱ. Hence, for every internal edge f of an input tree there is a cut F∈ℱ where f∈F. Thus, ℱ is complete.

(⇐) Assume that there exists a complete set ℱ of pairwise parallel nice minimal cuts of G(𝒫) where, for every F∈ℱ and every T∈𝒫, F contains at most one edge of T. By Theorem 18, Σ(ℱ) is compatible and, by Theorem 1, there exists an unrooted tree S where Σ(ℱ)=Σ(S). We prove that S is an AST of 𝒫 by showing that Σ(S|ℒ(T))=Σ(T) for every input tree T∈𝒫.

Consider an input tree T of 𝒫. Let X1|X2 be the non-trivial split of T corresponding to edge f∈E(T). Since ℱ is complete, there exists a cut F∈ℱ where f∈F. If σ(F)=Y1|Y2, by Lemma 7, up to relabeling of sets, X_i⊆Y_i for every i∈{1,2}. Since σ(F) is a split of S, this implies that Σ(T)⊆Σ(S|ℒ(T)).

Consider any non-trivial split P1|P2 of Σ(S) where P_i∩ℒ(T)≠∅ for each i∈{1,2}. Let Q_i=P_i∩ℒ(T) for each i∈{1,2}. Since Σ(S)=Σ(ℱ), there exists a cut F∈ℱ where σ(F)=P1|P2. Since P1 and P2 are in different connected components of G(𝒫)−F, Q1 and Q2 are also in different connected components of G(𝒫)−F. Thus, F contains an edge f of T. Since F does not contain any other edge of T, σ(f)=Q1|Q2. Thus, Σ(S|ℒ(T))⊆Σ(T).

Relationship to legal triangulations

Taken together, Theorems 3 and 6 say that G(P) has a complete set of pairwise parallel legal minimal cuts if and only if it has a legal triangulation. The connection between legal triangulations and complete sets of pairwise parallel legal minimal cuts is through the existence (or nonexistence) of a compatible tree. Here we make the connection explicit, showing how, from a set of pairwise parallel legal minimal cuts, one can construct a legal triangulation of G(P) without going through a compatible tree. We leave the other direction —going from a triangulation to a set of cuts— to the reader.

Let ℱ be a complete set of pairwise parallel legal minimal cuts of G(𝒫). We assume that the elements of ℱ are ordered in some arbitrary, but fixed, manner, and that no proper subset of ℱ is also complete. For each F∈ℱ, we build a pair (X_F,Y_F) where X_F and Y_F are vertex separators of G(𝒫), and X_F,Y_F⊆{u : u is the endpoint of some edge in F}. The collection of pairs {(X_F,Y_F) : F∈ℱ} is not unique, as it depends on the order in which ℱ is arranged. We say that a cut F∈ℱ differentiates an internal edge e={x,y} if x∈X_F and y∈Y_F.

For each F∈ℱ, let F_i=E(T_i)∩F for each i∈[k], and let F̂⊆F denote the set of all edges e such that e∈F_i for some i∈[k] with |F_i|=1. Note that if |F_i|>1, all edges in F_i must share a common endpoint. Let A_F and B_F denote the two connected components of G(𝒫)−F.

For each cut F in ℱ, we build (X_F,Y_F) as follows.

  1. For each internal edge e∈F̂:

    (a) If no cut preceding F differentiates e, add e∩V(A_F) to X_F and e∩V(B_F) to Y_F.

    (b) Otherwise, suppose cut I∈ℱ, which precedes F, differentiates e. Let Q be the connected component of G(𝒫)−I where E(Q)∩F≠∅. (Note that Q is unique, since I and F are parallel.) Let v be the unique endpoint of e in Q. Add v to X_F and Y_F.

  2. For each non-internal edge e∈F̂, add the non-leaf endpoint of e to both X_F and Y_F.

  3. For each i∈[k] such that |F_i|>1, add the common endpoint of the edges of F_i to both X_F and Y_F.

By construction and the properties of ℱ, every internal edge of G(𝒫) is differentiated by some cut F∈ℱ. Further, the sets X_F and Y_F have the form X_F={x1,…,x_m,z1,…,z_p} and Y_F={y1,…,y_m,z1,…,z_p}, where m>0, p≥0, and for every i∈[m], {x_i,y_i} is an internal edge of G(𝒫) that is differentiated by F. Let

\[ O_F = \bigl\{ \{x_1,\dots,x_j,\; y_j,\dots,y_m,\; z_1,\dots,z_p\} : j \in [m] \bigr\}. \]
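As a small illustration of the family O_F, the sketch below enumerates its m sets from the lists x1,…,x_m, y1,…,y_m and z1,…,z_p. The function name `staircase_sets` and the string vertex names are hypothetical.

```python
def staircase_sets(xs, ys, zs):
    """Return O_F as a list of frozensets, one for each j in [m]."""
    assert len(xs) == len(ys) and xs, "need m > 0 differentiated edges"
    m = len(xs)
    # the j-th set is {x_1..x_j} ∪ {y_j..y_m} ∪ {z_1..z_p}
    return [frozenset(xs[:j]) | frozenset(ys[j - 1:]) | frozenset(zs)
            for j in range(1, m + 1)]

# Hypothetical example with m = 2 and p = 1.
O_F = staircase_sets(["x1", "x2"], ["y1", "y2"], ["z1"])
```

With m=2 this yields {x1,y1,y2,z1} and {x1,x2,y2,z1}. No set contains both x2 and y1, which is the pattern exploited in the claim inside the proof of Lemma 20.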

We now state how to go from a complete set of pairwise parallel legal cuts to a legal triangulation. As in Section ‘Preliminaries’, given a graph G and a collection Δ of subsets of V(G), GΔ denotes the graph derived from G by making the set of vertices of X a clique for every X∈Δ.
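The operation that produces GΔ is simple to state in code. The following sketch assumes a graph given as a list of vertex pairs and Δ given as an iterable of vertex sets; `make_cliques` is a hypothetical name.

```python
from itertools import combinations

def make_cliques(edges, delta):
    """Edge set of the graph obtained by turning every set in delta into a clique."""
    filled = {frozenset(e) for e in edges}
    for X in delta:
        # add every missing pair inside X as a fill-in edge
        filled.update(frozenset(pair) for pair in combinations(X, 2))
    return filled

# Hypothetical example: a path a-b-c where {a, b, c} is made a clique.
path = [("a", "b"), ("b", "c")]
filled = make_cliques(path, [{"a", "b", "c"}])
fill_in = filled - {frozenset(e) for e in path}
```

Here the only fill-in edge is {a,c}.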

Theorem 11.

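Let Δ be the collection of subsets of V(G(𝒫)) given by

\[ \Delta = \bigl\{ N_{G(\mathcal{P})}(\ell) : \ell \text{ is a leaf in } G(\mathcal{P}) \bigr\} \;\cup\; \bigcup_{F \in \mathcal{F}} \bigl( \{X_F, Y_F\} \cup O_F \bigr). \tag{1} \]

Then, G(𝒫)Δ is a legal triangulation of G(𝒫).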

The proof of Theorem 11 relies on a series of auxiliary lemmas, for which we introduce some new notation. For each F∈ℱ, F^∪ denotes X_F∪Y_F and F^∩ denotes X_F∩Y_F. Also, we abbreviate G(𝒫)Δ to GΔ, where Δ is the set defined in Equation (1).

Lemma 18.

Let F and I be two distinct cuts of ℱ, and let x be a vertex of F^∪. Suppose x lies in the connected component of G(𝒫)−I that does not contain edges of F. Then, x∈I^∩.

Proof.

Let E_{F,x} be the set of all edges of F that contain x and let E_{I,x} be the set of all edges of I that contain x. We must have E_{F,x}⊆E_{I,x}⊆I. If |E_{I,x}|>1, then x∈I^∩. Thus, assume that |E_{I,x}|=1. Let E_{I,x}={e}, where e={x,y}. Since E_{F,x}⊆E_{I,x} and |E_{F,x}|≥1, E_{F,x}={e}. We can assume that y is not a leaf (since, otherwise, x∈I^∩). Let E_{I,y} be the set of edges of I with y as an endpoint. Vertex y lies in the component of G(𝒫)−F that does not contain I. Thus, every edge in E_{I,y} is also present in F. If |E_{I,y}|>1, then there is more than one edge in F with y as an endpoint and, by construction, x∉F^∪, a contradiction. Hence, |E_{I,y}|=1, and so E_{I,y}={e}.

Let J be the cut that differentiates e. If F=J then, by construction, x∈I^∩. Thus, assume that F≠J. If J is in the same connected component of G(𝒫)−F as I, then, by construction, x∉F^∪, which is a contradiction. Thus, J is in the connected component of G(𝒫)−F that does not contain I and, by construction, x∈I^∩.

Lemma 19.

Let F∈ℱ. For every edge {u,v} in GΔ, (i) if u∈V(A_F)∖F^∩, then v∉V(B_F)∖Y_F, and (ii) if u∈V(B_F)∖F^∩, then v∉V(A_F)∖X_F.

Proof.

Without loss of generality, we consider only the case where u∈V(A_F)∖F^∩. Suppose that v∈V(B_F)∖Y_F and let e={u,v}. If e∈E(G(𝒫)), then e∈F and hence, by construction, at least one of u and v is in F^∩. But v∉Y_F, so u∈F^∩, a contradiction.

Thus, e must be a fill-in edge. Since e⊈F^∪, there must be a cut I∈ℱ, I≠F, such that e⊆I^∪. If E(A_F)∩I≠∅, then by Lemma 18, v∈F^∩, a contradiction. Thus, assume that E(B_F)∩I≠∅. Then, by Lemma 18, u∈F^∩, another contradiction.

A clique of GΔ is illegal if it contains a fill-in edge with a leaf as an endpoint or it contains an internal edge along with any other edge of G(𝒫). An illegal clique violates one of the legal triangulation conditions (LT1) or (LT2) stated in Section ‘Display graphs and edge label intersection graphs’.

Lemma 20.

Let F be a cut of ℱ and let H be the subgraph of GΔ induced by vertices of F^∪. Then, H is triangulated and contains no illegal clique.

Proof.

Let X_F={x1,…,x_m,z1,…,z_p} and Y_F={y1,…,y_m,z1,…,z_p}, where for every i∈[m], x_i∈V(A_F), y_i∈V(B_F) and {x_i,y_i} is an internal edge of G(𝒫). Note that F^∩={z1,…,z_p}.

Claim. For every i,j∈[m] where i>j, e={x_i,y_j}∉E(H).

Proof.

Assume that e∈E(H). By construction of (X_F,Y_F), e is a fill-in edge. Since no set in O_F contains both x_i and y_j, there is a cut I≠F where e⊆I^∪. Since F and I are parallel, only one of the two sets I∩E(A_F) or I∩E(B_F) is non-empty. Assume that I∩E(A_F)≠∅. Then by Lemma 18, y_j∈F^∩, a contradiction. Similarly, if I∩E(B_F)≠∅, then by Lemma 18, x_i∈F^∩, a contradiction.

Let C be a chordless cycle of length at least four in H. Since X_F and Y_F are cliques in GΔ, if C contains more than two vertices from one of X_F or Y_F, then C must contain a chord. Hence, C has exactly four vertices, with exactly two vertices each from X_F and Y_F. We will first show that z_i∉C for any i∈[p]. Assume that z_i∈C for some i∈[p]. Then, of the remaining three vertices of C, at least two of them belong to one of X_F and Y_F. Let a, b be those two vertices. Without loss of generality assume that {a,b}⊆X_F. Since F^∩⊆X_F, vertices z_i, a, b form a clique in H. Thus, C is not chordless, a contradiction.

Let x_i, x_j be the vertices of X_F in C where 1≤i<j≤m. Similarly, let y_{i′}, y_{j′} be the vertices of Y_F in C where 1≤i′<j′≤m. Now, either i≤i′ or i>i′. If i≤i′, then {x1,…,x_i,y_i,…,y_m,z1,…,z_p}∈O_F and thus vertices x_i, y_{i′}, y_{j′} form a clique. Hence, C is not chordless, a contradiction. If i>i′, then from the above claim neither of the edges {x_i,y_{i′}} and {x_j,y_{i′}} can exist. Thus, vertex y_{i′} cannot be in C, a contradiction. Hence, H does not contain a chordless cycle and is triangulated.

Assume that H contains an illegal clique H′; that is, H′ contains two internal edges e and e′. By construction, F^∪ cannot contain a leaf. By the legality of F and the construction of F^∪, edges e and e′ are from different input trees and both are differentiated by F. Let e={x_i,y_i} for some i∈[m] and let e′={x_j,y_j} for some j∈[m]. Without loss of generality, assume that i<j. By the above claim, there is no edge between x_j and y_i in H; thus, H′ is not a clique, a contradiction.

Lemma 21.

GΔ is chordal.

Proof.

Assume the contrary. Let C be a chordless cycle of length at least four in GΔ. By construction, C cannot contain a leaf. There are two cases.

Case 1: There are vertices u,v∈V(C) and a cut F∈ℱ where u∈X_F∖F^∩ and v∈Y_F∖F^∩. We have two subcases.

  (a) Suppose C contains a vertex x∈F^∩. Then, there exists a path u,x,v in C. Because C is a cycle, there must exist an edge between a vertex u′∈V(A_F)∖{x} and a vertex v′∈V(B_F)∖{x}. Since C is chordless, u′∉F^∩ and v′∉F^∩. Thus, u′∈V(A_F)∖F^∩ and v′∈V(B_F)∖F^∩. By Lemma 19, if u′∈V(A_F)∖X_F then there is no edge between u′ and v′. Thus, u′∈X_F∖F^∩. Similarly, v′∈Y_F∖F^∩. If u≠u′ or v≠v′, C cannot be chordless. Thus, u=u′ and v=v′ and C has length three, a contradiction.

  (b) Suppose C does not contain a vertex of F^∩. Since u∈V(A_F)∖F^∩, v∈V(B_F)∖F^∩ and F is a cut, there must exist two edges e1={x1,y1} and e2={x2,y2} in C where {x1,x2}⊆V(A_F)∖F^∩ and {y1,y2}⊆V(B_F)∖F^∩. If x1∈V(A_F)∖X_F, then by Lemma 19 there cannot exist an edge between x1 and y1. Thus, x1∈X_F∖F^∩. Similarly, x2∈X_F∖F^∩ and {y1,y2}⊆Y_F∖F^∩. Since X_F and Y_F are cliques in GΔ, there exist edges {x1,x2} and {y1,y2}. Thus, there cannot exist any other vertex in C and hence V(C)⊆F^∪. But, by Lemma 20, the subgraph of GΔ induced by the vertices of F^∪ is triangulated. Thus, C is not chordless, a contradiction.

Case 2: There is no cut F∈ℱ with vertices u∈X_F∖F^∩ and v∈Y_F∖F^∩ such that u,v∈V(C). Thus, for every cut F∈ℱ at most two vertices of V(C) are in F^∪. Let x1,x2,x3,x4 be a path of length four in C. For every i∈{1,2,3}, let F(i)∈ℱ be the cut where {x_i,x_{i+1}}⊆F(i)^∪. We will first show that such cuts exist and are distinct.

Recall that every vertex in C is internal. Also, C does not contain any edge e={x,y} from G(𝒫); otherwise, there would be a cut F that differentiates e, contradicting the assumption for case 2. Since every edge in C is in GΔ, it must be the case that for every edge e in C there exists a cut F where e⊆F^∪. Also, at most two vertices of C are in F^∪. Thus the cuts F(1), F(2) and F(3) are distinct.

To simplify notation, for each i∈{1,2,3} let A_i=A_{F(i)} and B_i=B_{F(i)}. Without loss of generality, assume that E(A1)∩F(2)≠∅ and E(B2)∩F(1)≠∅. There are three possibilities.

  (a) Suppose F(3)∩E(A2)≠∅. If x1∈A2, then by Lemma 18, x1∈F(2)^∩ and C is not chordless, a contradiction. Thus, x1∈B2. Similarly, if x4∈B2, by Lemma 18, x4∈F(2)^∩ and C is not chordless, a contradiction. Thus, x4∈A2. Since C is a cycle, F(2) is a minimal cut and (F(2)^∪∖{x2,x3})∩V(C)=∅, there exists an edge {v1,v2} in C where v1∈V(A2)∖F(2)^∪ and v2∈V(B2)∖F(2)^∪. But, by Lemma 19, such an edge cannot exist.

  (b) Suppose F(3)∩E(A1)≠∅ and F(3)∩E(B2)≠∅. Without loss of generality, assume that A3, B3 contain F(2) and F(1) respectively. Assume that x2∈A3. Since x2∈F(1)^∪, by Lemma 18, x2∈F(3)^∩. Then, there exists an edge {x2,x4} and C is not chordless, a contradiction. Thus, x2∈B3. But x2∈F(2)^∪ and thus, by Lemma 18, x2∈F(3)^∩. Hence, there exists a chord {x2,x4} and C is not chordless, again a contradiction.

  (c) Suppose F(3)∩E(B1)≠∅. Renaming vertices x1, x2, x3 and x4 as x4, x3, x2 and x1, respectively, brings us back to subcase 2(b).

Thus, GΔ does not contain a chordless cycle of length four or greater; hence, GΔ is chordal.

Proof of Theorem 11.

Lemma 21 states that GΔ is triangulated. We now prove that GΔ is a legal triangulation; i.e., that it satisfies conditions (LT1) and (LT2) of Section ‘Display graphs and edge label intersection graphs’.

Condition (LT2) holds for GΔ, because our construction adds no fill-in edge incident on a leaf. Now suppose that GΔ violates (LT1); i.e., GΔ has a clique H with two internal edges e={x1,y1} and e′={x2,y2}. Let F be the cut that differentiates e. Assume that x1∈V(A_F) and y1∈V(B_F). By Lemma 20, F^∪ does not contain both endpoints of e′. Without loss of generality, assume that x2∉F^∪ and x2∈V(A_F). Since x2∉F^∪ and y1∉F^∩, by Lemma 19, there is no edge between x2 and y1 in GΔ. Thus, H is not a clique of GΔ, a contradiction. Hence, GΔ satisfies (LT1) and is therefore a legal triangulation of G(𝒫).

Conclusion

We have shown that the characterization of tree compatibility in terms of restricted triangulations of the edge label intersection graph transforms into a characterization in terms of minimal cuts in the display graph. These two characterizations are closely related to the legal triangulation characterization of [8]. We also derived characterizations of the agreement supertree problem in terms of minimal cuts and minimal separators of the display and edge label intersection graphs respectively.

It remains to be seen whether any of our characterizations can lead to explicit fixed-parameter algorithms for the tree compatibility and agreement supertree problems when parametrized by the number of trees. Indeed, as of yet, the fixed-parameter tractability of agreement remains open.

We close with some remarks on characterizations of two problems related to compatibility. A profile 𝒫 defines a tree S if S is the only compatible supertree for 𝒫. Profile 𝒫 identifies a tree S if S is a compatible supertree for 𝒫 and every other compatible supertree for 𝒫 displays S. Grunewald et al. [16] use quartet graphs to characterize when a profile consisting of quartet trees defines or identifies a tree. An interesting question is whether similar characterizations can be derived for arbitrary profiles using display graphs or edge label intersection graphs. Along these lines, we note a connection between complete sets of cuts and the question of whether a profile defines a tree, which was pointed out by one of the reviewers. To explain it, we need some definitions ([10], p. 131). Let T be a tree and let q=xy|wz be a quartet tree displayed by T. Quartet tree q distinguishes an interior edge e of T if e is the only interior edge such that {x,y} and {w,z} are in different connected components of T−e. Now, let S and T be two trees such that S displays T. An interior edge e of T distinguishes an interior edge f of S if there exists a quartet q such that e and f are both distinguished by q. Suppose 𝒫 is a profile in which there is at least one taxon in common among all input trees. Then, 𝒫 defines a tree S if and only if 𝒫 is compatible and every interior edge of S is distinguished by an interior edge of at least one tree in 𝒫 ([10], p. 133). Now, recall that if ℱ is a complete set of cuts of G(𝒫), then, for every tree T_i∈𝒫 and every internal edge e of T_i, there is some cut F∈ℱ in which e is the only edge of T_i. Thus, if 𝒫 is compatible, e must be a distinguishing edge for some internal edge of a supertree for 𝒫. This observation could lead to a cut-based characterization of definability analogous to known triangulation-based characterizations (see [10], p. 79).

References

  1. Gordon AD: Consensus supertrees: the synthesis of rooted trees containing overlapping sets of labelled leaves. J Classif. 1986, 9: 335-348.

  2. Aho A, Sagiv Y, Szymanski T, Ullman J: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J Comput. 1981, 10 (3): 405-421. 10.1137/0210030.

  3. Ng M, Wormald N: Reconstruction of rooted trees from subtrees. Discrete Appl Math. 1996, 69 (1–2): 19-31.

  4. Steel MA: The complexity of reconstructing trees from qualitative characters and subtrees. J Classif. 1992, 9: 91-116. 10.1007/BF02618470.

  5. Bryant D, Lagergren J: Compatibility of unrooted phylogenetic trees is FPT. Theor Comput Sci. 2006, 351: 296-302. 10.1016/j.tcs.2005.10.033.

  6. Courcelle B: The monadic second-order logic of graphs I, Recognizable sets of finite graphs. Inf Comput. 1990, 85: 12-75. 10.1016/0890-5401(90)90043-H.

  7. Arnborg S, Lagergren J, Seese D: Easy problems for tree-decomposable graphs. J Algorithms. 1991, 12 (2): 308-340. 10.1016/0196-6774(91)90006-K.

  8. Vakati S, Fernández-Baca D: Graph triangulations and the compatibility of unrooted phylogenetic trees. Appl Math Lett. 2011, 24 (5): 719-723. 10.1016/j.aml.2010.12.015.

  9. Gysel R, Stevens K, Gusfield D: Reducing problems in unrooted tree compatibility to restricted triangulations of intersection graphs. Algorithms in Bioinformatics – 12th International Workshop, WABI 2012 Ljubljana, Slovenia, September 10–12, 2012. Proceedings, Volume 7534 of Lecture Notes in Computer Science. Edited by: Raphael BJ, Tang J. 2012, 93-105. Heidelberg: Springer.

  10. Semple C, Steel M: Phylogenetics. 2003, Oxford Lecture Series in Mathematics, Oxford: Oxford University Press.

  11. Buneman P: The recovery of trees from measures of dissimilarity. Mathematics in the Archaeological and Historical Sciences. 1971, 387-395. Edinburgh: Edinburgh University Press.

  12. Parra A, Scheffler P: Characterizations and algorithmic applications of chordal graph embeddings. Discrete Appl Math. 1997, 79 (1–3): 171-188.

  13. Bouchitté V, Todinca I: Treewidth and minimum fill-in: grouping the minimal separators. SIAM J Comput. 2001, 31: 212-232. 10.1137/S0097539799359683.

  14. Heggernes P: Minimal triangulations of graphs: a survey. Discrete Math. 2006, 306 (3): 297-317. 10.1016/j.disc.2005.12.003.

  15. Gusfield D: The multi-state perfect phylogeny problem with missing and removable data: solutions via integer-programming and chordal graph theory. J Comput Biol. 2010, 17 (3): 383-399.

  16. Grunewald S, Humphries PJ, Semple C: Quartet compatibility and the quartet graph. Electron J Comb. 2008, 15: R103.

Acknowledgements

We thank Sylvain Guillemot for his valuable comments. We are also grateful to the reviewers for providing constructive criticism. This work was supported in part by the National Science Foundation under grants CCF-1017189 and DEB-0829674.

Author information

Corresponding author

Correspondence to David Fernández-Baca.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SV stated and proved the main results of the paper and wrote most of the first draft. DFB proposed the research topic to SV, supervised the research, contributed to the first draft, and was in charge of the final draft. Both authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Vakati, S., Fernández-Baca, D. Characterizing compatibility and agreement of unrooted trees via cuts in graphs. Algorithms Mol Biol 9, 13 (2014). https://doi.org/10.1186/1748-7188-9-13

  • DOI: https://doi.org/10.1186/1748-7188-9-13

Keywords