Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on Github at https://github.com/yuxilin51/GreedyRFS.


Introduction
Supertree construction (i.e., the combination of a collection of trees, each on a potentially different subset of the species, into a tree on the full set of species) is a natural algorithmic problem that has important applications to computational biology; see [1] for a 2004 book on the subject and [2][3][4][5][6][7][8][9] for some of the recent papers on this subject. Supertree methods are particularly important for large-scale phylogeny estimation, where it can be used as a final step in a divide-and-conquer pipeline [10]: the species set is divided into two or more overlapping subsets, unrooted leaf-labelled trees are constructed (possibly recursively) on each subset, and then these subset trees are combined into a tree on the full dataset, using the selected supertree method. Furthermore, provided that optimal supertrees are computed, divide-and-conquer pipelines can be provably statistically consistent under stochastic models of evolution: i.e., as the amount of input data (e.g., sequence lengths when estimating gene trees, or number of gene trees when estimating species trees) increases, the probability that the true tree is returned converges to 1 [11,12].
Unfortunately, the most accurate supertree methods are typically local-search heuristics for NP-hard optimization problems [2,3,7,[13][14][15][16][17], and are computationally intensive on large datasets. However, divide-and-conquer strategies, especially recursive ones, may only need to apply supertree methods to two trees at a time, and hence the computational complexity of supertree estimation given two trees is of interest. One optimization problem where optimal supertrees can be found on two trees is the NP-hard Maximum Agreement Supertree (SMAST) problem (also known as the Agreement Supertree Taxon Removal problem), which removes a minimum number of leaves so that the reduced trees have an agreement supertree [4,6]. Similarly, the Maximum Compatible Supertree (SMCT) problem, which removes a minimum number of leaves so that the reduced trees have a compatibility supertree [18,19], can also be solved

Open Access
Algorithms for Molecular Biology *Correspondence: yuxilin51@gmail.com 1 Amazon AWS, Seattle, USA Full list of author information is available at the end of the article in polynomial time on two trees (and note that SMAST and SMCT are identical when the input trees are fully resolved). Because SMAST and SMCT remove taxa, methods for these optimization problems are not true supertree methods, because they do not return a tree on the entire set of taxa. However, solutions to SMAST and SMCT could potentially be used as constraints for other supertree methods, where the deleted leaves are added into the computed SMAST or SMCT trees, so as to optimize the desired criterion.
When restricting to methods that return trees on the full set of taxa, much less seems to be understood about finding supertrees on two trees. However, if the two input trees are compatible (i.e., there is a supertree that equals or refines each tree when restricted to the respective leaf set), then finding that compatibility supertree is solvable in polynomial time, using (for example) the well-known BUILD algorithm [20], but more efficient algorithms exist (e.g., [19,21]).
Since compatibility is a strong requirement (rarely seen in biological datasets), optimization problems are more relevant. One optimization problem worth discussing is the Maximum Agreement Supertree Edge Contraction problem (which takes as input a set of rooted trees and seeks a minimum number of edges to collapse so that an agreement supertree exists). This problem is NP-hard, but the decision problem can be solved in O((2k) p kn 2 ) time when the input has k trees and p is the allowed number of number of edges to be collapsed [4]. Note that the algorithm for MAST-EC proposed by [4] may be exponential even for two trees, when the number of edges that must be collapsed is �(n) (e.g., imagine two caterpillar trees, where one is obtained from the other by moving the left-most leaf to the rightmost position).
In sum, while supertree methods are important and well studied, when restricted to the major optimization problems that do not remove taxa, polynomial time methods do not seem to be available, even for the special case where the input contains just two trees. This restriction has consequences for large-scale phylogeny estimation, as without good supertree methods, divideand-conquer pipelines are not guaranteed to be statistically consistent, are not fast, and do not have good scalability [12].
In this paper we examine the well-known Robinson-Foulds Supertree (RFS) problem [22], which seeks a supertree that minimizes the total Robinson-Foulds [23] distance to the input trees. Although RFS is NPhard [24], it has several desirable properties: it is closely related to maximum likelihood supertrees [25] and, as shown very recently, has good theoretical performance for species tree estimation in the presence of gene duplication and loss [26]. Because of its importance, there are several methods for RFS supertrees, including Plu-MiST [5], MulRF [27], and FastRFS [28]. A comparison between FastRFS and other supertree methods (MRL [2], ASTRAL, ASTRID [29], PluMiST, and MulRF) on simulated and biological supertree datasets showed that Fas-tRFS matched or improved on the other methods with respect to topological accuracy and RFS criterion scores [28]. Hence, FastRFS is currently the leading method for the RFS optimization problem.
The main contributions of this paper are: • We present Exact-2-RFS, a polynomial time algorithm for the Robinson-Foulds Supertree (RFS) of two trees, which establishes that RFS is solvable in O(n 2 |X|) time for two trees, where n is the number of leaves and X is the set of shared leaves (Theorem 1). We also show that RFS is NP-hard for three or more trees (Lemma 5). • We prove that divide-and-conquer pipelines using Exact-RFS-2 are statistically consistent methods for phylogenetic tree estimation (both gene trees and species trees) under standard evolutionary models (Theorem 2). • We establish the relationships between RFS and other supertree problems (Sect. 2.1), showing that it is equivalent to some other problems but not to all.
The remainder of the paper is organized as follows. In Sect. 1, we provide terminology and define the optimization problems we consider. We present the Exact-RFS-2 algorithm and establish theory related to the algorithm in Sect. 2. We conclude in Sect. 3 with a summary of our results and a discussion regarding future research directions.

Terminology and problem statements
We denotes the leaf set of t) and the output is a tree T where L(T) is the set of all species that appear as a leaf in at least one tree in A , which we will assume is all of S. We use the standard supertree terminology, and refer to the trees in A as "source trees" and the set A as a "profile". For a tree T, let V(T) and E(T) denote the set of vertices and edges of T, respectively.

Robinson-Foulds Supertree
Each edge e in a tree T defines a bipartition π e := [A|B] of the leaf set, and each tree is defined by the set C(T ) := {π e | e ∈ E(T )} . The Robinson-Foulds distance [23] (also called the bipartition distance) between trees T and T ′ with the same leaf set is RF(T , T ′ ) := |C(T )\C(T ′ )| + |C(T ′ )\C(T )| . We extend the definition of RF distance to allow for T and T ′ to have different leaf sets as follows: where X is the shared leaf set and t| X denotes the homeomorphic subtree of t induced by X. Letting T S denote the set of all phylogenetic trees such that L(T ) = S and T B S denote the binary trees in T S , then a Robinson-Foulds supertree [22] of a profile A is a binary tree We let RF(T , A) := i∈[N ] RF(T , T i ) denote the RFS score of T with respect to profile A . Thus, the Robinson-Foulds Supertree problem takes as input the profile A and seeks a Robinson-Foulds (RF) supertree for A , which we denote by RFS(A).
Split Fit Supertree The Split Fit (SF) Supertree problem was introduced in [30], and is based on optimizing the number of shared splits (i.e., bipartitions) between the supertree and the source trees. For two trees T, T ′ with the same leaf set, the split support is the number of shared bipartitions, i.e., SF(T , For trees with different leaf sets, we restrict them to the shared leaf set before calculating the split support. The Split Fit supertree for a profile A of source trees, denoted Thus, the split support score of T with respect to A is SF(T , A) := i∈[N ] SF(T , T i ) . The Split Fit Supertree (SFS) problem takes as input the profile A and seeks a Split Fit supertree (the supertree with the maximum split support score), which we denote by SFS(A).

Nomenclature for variants of RFS and SFS problems
• The relaxed versions of the problems where we do not require the output to be binary (i.e., we allow T ∈ T S ) are named Relax-RFS and Relax-SFS. • We append "-N" to the name to indicate that we assume there are N source trees. If no number is specified then the number of source trees is unconstrained. • We append "-B" to the name to indicate that the source trees are required to be binary; hence, we indicate that the source trees are allowed to be nonbinary by not appending -B.
Thus, the RFS problem with two binary input trees is RFS-2-B and the relaxed SFS problem with three (not necessarily binary) input trees is Relax-SFS-3.
Other notation For any v ∈ V (T ) , we let N T (v) denote the set of neighbors of v in T. A tree T ′ is a refinement of T iff T can be obtained from T ′ by contracting a set of edges. Two bipartitions π 1 and π 2 of the same leaf set are said to be compatible if and only if there exists a tree T such that π i ∈ C(T ), i = 1, 2 . A bipartition π = [A|B] restricted to a subset R is π| R = [A ∩ R|B ∩ R] . For a graph G and a set F of vertices or edges, we use G + F to represent the graph obtained from adding the set F of vertices or edges to G, and G − F is defined for deletions, similarly.

Theoretical results
In this section we establish the main theoretical results, including the relationship between supertree problems (Sect. 2.1), the proof that RFS for 3 trees is NP-hard (Sect. 2.2), the polynomial time Exact-2-RFS algorithm (Sect. 2.3), and the use of this algorithm within divideand-conquer pipelines for statistically consistent phylogeny estimation (Sect. 2.4).

Relationships between supertree problems
This section establishes the relationships between the different supertree problems. We establish that some supertree problems have the same optimal solutions, others do not, etc. We begin by establishing the equivalence between the RFSand SFSsupertree problems.

Lemma 1 Given an input set A of source trees, a tree T ∈ T B S is an optimal solution for RFS(A) if and only if it is an optimal solution for SFS(A).
Proof Let T 1 , T 2 , . . . , T N and S 1 , S 2 , . . . , S N be defined as from problem statement of RFS . Let T be any binary tree of leaf set S. Then T | S i is also binary and thus Taking the sum of the equations over i ∈ [N ] , we have which is a constant (i.e., it does not depend on the tree T). Therefore, for any binary tree T and any profile A of source trees, the sum of T's RFS score and twice T's split support score is the same, independent of T. This implies that minimizing the RFS score is the same as maximizing the split support score. Although this argument depends on the output tree being binary, it does not depend on the input trees being binary. Hence, we conclude that RFS and SFS have the same set of optimal supertrees.
In contrast with Lemma 1, we will show that Relax-RFS and Relax-SFS are not equivalent. (i.e., T has no internal edges). We note that , ′ contains all the nontrivial bipartitions from the trees in A ). Let T ′ be the caterpillar tree on leaf set [N] (i.e., T ′ is formed by taking a path of length N − 2 with vertices v 2 , v 3 , . . . , v N −1 in that order, and making leaf 1 adjacent to v 2 , leaf i adjacent to v i , and leaf N adjacent to v N −1 ). We note that T ′ is the unique tree such that C(T ′ ) = � [N ] ∪ � ′ and thus a compatibility supertree for A.
We will show that (1) T is an optimal solution for Relax -RFS(A ), but not an optimal solution for Relax-SFS (A ), and (2) that T ′ is an optimal solution for Relax-SFS (A ), but not an optimal solution for Relax-RFS(A ). (1) We first show that T is not an optimal solution for Relax -SFS(A ). Since T' is a compatibility supertree of trees in A , it achieves the maximum split support score possible. In particular, , we have for any N ≥ 5 . Therefore, T is not an optimal solution for Relax-SFS(A). (1) (2) Next, we show that T is an optimal solution to Relax-RFS(A ). Since |C(T )\C(T i )| + |C(T i )\C(T )| = 1 for all i ∈ [N − 3] , the RFS score of T is Now consider any tree t = T with leaf set [N], and suppose t contains p bipartitions in ′ and q bipartitions in where p, q ∈ N . Since t = T , at least one of p and q is nonzero. Therefore, Since N ≥ 5 and both p and q are non-negative with at least one of them nonzero, we know the RFS score of t is strictly greater than that of T. Therefore, T is an optimal solution to Relax-RFS(A).
For (2), the analysis above already shows that T ′ has the largest possible split support score. Hence, T ′ is an optimal solution to the relaxed Split Fit Supertree problem. However, the RFS score for the star tree T is N − 3 and the RFS score for T ′ is (N − 4)(N − 3) , which is strictly larger than N − 3 for N > 5 ; hence, T ′ is not an optimal solution for the relaxed RF supertree problem.
We show that the Split Fit Supertree problem and the Asymmetric Median Supertree ( AMS ) problem, which was introduced in [31] and which we will present below, have the same set of optimal solutions and thus the hardness of one implies hardness of another.
The Asymmetric Median Supertree problem takes a profile A = {T 1 , T 2 , . . . , T N } with leaf sets S i for T i and finds a binary tree T * on leaf set S := i∈[N ] S i such that In other words, the asymmetric median supertree T * minimizes the total number of bipartitions that are in the source trees and not in the supertree (equivalently, it minimizes the total number of false negatives).    Proof Let FN(T , A) = i∈[N ] |C(T i ) \ C(T | S i )| be the total number of false negatives of T with respect to A , and we refer to this as the false negative score of T. Then,

Lemma 3 Given a profile
Since the sum of the split support score of T and the false negative score of T is the same, regardless of T, minimizing the false negative score is the same as maximizing the split support score. Hence any tree T is an Asymmetric Median supertree if and only if it is a Split Fit supertree, for all profiles A .
Recall that the SMAST and SMCT problems seek trees that are obtained after deleting minimal numbers of leaves from the input trees so that an agreement supertree or compatible supertree can be constructed from the reduced input trees. Here, we examine the possibility of using these output trees as constraint trees on the search for RFS supertrees, so that the removed taxa could be introduced into the constraint trees. We show that exact solutions to the SMAST and SMCT (Maximum Agreement Supertree and Maximum Compatible Supertree) problems are not directly relevant to solving the Robinson-Foulds supertree problem.

Lemma 4
There exists a pair of binary trees T 1 and T 2 for which some optimal SMAST or SMCT supertree cannot be extended to any optimal RFS supertree through the insertion of missing taxa.
Proof Let T 1 and T 2 be unrooted trees, with T 1 given by the Newick string (A, ((B, x), ((C, y), (D, E)))) and T 2 given by (A, (C, (z, (B, (D, E))))). An RFS supertree for this pair (T 1 , T 2 ) is given by (A, ((C, y), (z, ((B, x), (D, E)) ))), and has total RF distance to T 1 and T 2 equal to 2. Note that at least one of A,B,C must be deleted to form an agreement supertree. Suppose C is deleted. Then ((A, z), ((B, x), (y, (D, E)))) is an optimal SMAST. Observe that any way of adding C into this tree produces a supertree that has total RFS score greater than 2. Hence, for this pair of input trees, for at least one optimal SMAST supertree, there is no way to extend that optimal supertree into an optimal RFS supertree. The same proof follows for the SMCT problem, since SMCT and SMAST (9) are identical when the input trees are fully resolved (binary).

NP-hardness results
We establish that some supertree problems are NP-hard.
Proof By Lemma 3 and Lemma 1, we know that for any profile A , the Robinson-Foulds, Split Fit, and Asymmetric Median supertrees all have the same set of optimal solutions. We also note that the Asymmetric Median Tree problem was shown to be NP-hard for three trees [32], which is the same as the Asymmetric Median Supertree problem when all three trees have the same set of species. Therefore, SFS-3 and RFS-3 are both NPhard. Since refining a tree never decreases its split support score, SFS-3 trivially reduces to Relax-SFS-3 , and thus Relax-SFS-3 is also NP-hard.

Solving RFS and SFS on two binary trees
The main result of this paper is Theorem 1 and the polynomial time algorithm, Exact-RFS-2, for RFS and SFS of two binary trees. The proof for Theorem 1 is provided later; here we present the algorithm, Exact-RFS-2, which we use to establish Theorem 1.
The input to Exact-RFS-2 is a pair of binary trees T 1 and T 2 . Let X denote the set of shared leaves. At a high level, Exact-RFS-2 constructs a tree T init that has a central node that is adjacent to every leaf in X and to the root of every "rooted extra subtree" (a term we define below under "Additional notation") so that T init contains all the leaves in S. It then modifies T init by repeatedly refining it to add specific desired bipartitions, to produce an optimal Split Fit (and optimal Robinson-Foulds) supertree (Fig. 3). The bipartitions that are added are defined by a maximum independent set in a bipartite "weighted incompatibility graph" we compute.
Additional notation Let 2 X denote the set of all bipartitions of X; any bipartition that splits a single leaf from the remaining |X| − 1 leaves will be called "trivial" and the others will be called "non-trivial". Let C(T 1 , T 2 , X) denote C(T 1 | X ) ∪ C(T 2 | X ) , and let Triv and NonTriv denote the sets of trivial and non-trivial bipartitions in C(T 1 , T 2 , X) , respectively. We refer to T i | X , i = 1, 2 as backbone trees (Fig. 2). Recall that we suppress degreetwo vertices when restricting a tree T i to a subset X of the leaves; hence, every edge e in T i | X will correspond to an edge or a path in T (see Fig. 1 for an example). We will let P(e) denote the path associated to edge e ∈ T i | X , and let w(e) := |P(e)| (the number of edges in P(e)). Finally, for π ∈ C(T i | X ) , we define e i (π ) to be the edge that induces π in T i | X (Fig. 1). The next concept we introduce is the set of extra subtrees, which are rooted subtrees of T 1 and T 2 , formed by deleting X and all the edges and vertices on paths between vertices in X (i.e., we delete and the extra subtree t is naturally seen as rooted at the unique vertex r(t) that is adjacent to a vertex in T i | X . Thus, Note that if X contains the leaves of T i then there are no extra subtrees associated to T i for X.
We can now define the initial tree T init computed by Exact-RFS-2: T init has a center node that is adjacent to every x ∈ X and also to the root r(t) for every extra subtree t ∈ Extra(T 1 ) ∪ Extra(T 2 ) . Note that T init has a leaf for every element in S, and that T init | S i is a contraction of T i , formed by collapsing all the edges in the backbone tree T i | X .
We say that an extra subtree t is attached to edge e ∈ E(T i | X ) if the root of t is adjacent to an internal node of P(e), and we let T R(e) denote the set of such extra subtrees attached to edge e. Similarly, if π ∈ C(T 1 , T 2 , X) , we let T R * (π ) refer to the set of extra subtrees that attach to edges in a backbone tree that induce π in either T 1 | X or T 2 | X . For example, if both trees T 1 and T 2 contribute extra subtrees to π , then T R * (π ) := i∈ [2] T R(e i (π )).
For any Q ⊆ X , we let BP i (Q) denote the set of bipartitions in C(T i | X ) that have one side being a strict subset of Q, and we let T RS i (Q) denote the set of extra subtrees associated with these bipartitions. In other words, Intuitively, T RS i (Q) denotes the set of extra subtrees in T i that are "on the side of Q". Note that for any Fig. 2 We show (a) T 1 | X , (b) T 2 | X , and (c) their incompatibility graph, based on the trees T 1 and T 2 in Fig. 1 (without the trivial bipartitions). Each π i is the bipartition induced by e i , and the weights for π 1 , . . . , π 8 are 3, 4, 1, 1, 2, 2, 2, 3, in that order. We note that π 1 and π 5 are the same bipartition, but they have different weights as they are induced by different edges; similarly for π 3 and π 7 . The maximum weight independent set in this graph has all the isolated vertices ( π 1 , π 3 , π 5 , π 7 ) and also π 2 , π 8 , and so has total weight 15 We give an example for these terms in Fig. 1.
The incompatibility graph of a set of trees, each on the same set of leaves, has one vertex for each bipartition in any tree (and note that bipartitions can appear more than once) and edges between bipartitions if they are incompatible (see [32]). We compute a weighted incompatibility graph for the pair of trees T 1 | X and T 2 | X , in which the weight of the vertex corresponding to bipartition π in 1 and the score contributed by bipartitions in 2 ; thus, the split support score of T with respect to T 1 , T 2 is p 1 (T ) + p 2 (T ).
As we will show, the two scores can be maximized independently and we can use this observation to refine T init so that it achieves the optimal total score.
Overview of Exact-RFS-2 Algorithm 1 Exact-RFS-2: Computing a Robinson-Foulds supertree of two trees (see Figure 3) Input: two binary trees T 1 , T 2 with leaf sets S 1 and S 2 where S 1 ∩ S 2 = X = ∅ Output: a binary supertree T on leaf set S = S 1 ∪ S 2 that maximizes the split support score

5: compute BP(A), BP(B), T RS(A), T RS(B)
, and T R * (π), 6: construct T as a star tree with leaf set X and center vertexv and with the root of each t ∈ Extra(T 1 ) ∪ Extra(T 2 ) connected tov by an edge let T init = T 7: construct the weighted incompatibility graph G of T 1 | X and T 2 | X 8: compute the maximum weight independent set I * in G 9: let I be the set of bipartitions associated with vertices in I * 10: for each π = [{a}|B] ∈ Triv do 11: detach all extra subtrees in T R * (π) fromv and attach them onto (v, a) such that T R(e 1 (π)) are attached first with their ordering matching their attachments on e 1 (π) and T R(e 2 (π)) are attached to the right of all subtrees in T R(e 1 (π)) with the ordering of them also matching their attachments on e 2 (π) letT = T after for loop 12: H(v) = NonTriv, set sv(π) =v for all π ∈ NonTriv 13: for each π ∈ NonTriv ∩ I (in any order) do 14: T ← Refine(T, π, H, sv) let T * = T after for loop 15: arbitrarily refine T to make it a binary tree 16: return T appearing in tree T i | X is w(e i (π )) , as defined previously. Thus, if a bipartition is common to the two trees, it produces two vertices in the weighted incompatibility graph, and each vertex has its own weight (Fig. 2).
Intuitively, 1 is the set of bipartitions from the input trees that are induced by edges in the minimal subtree of T 1 or T 2 spanning X, and 2 are all the other input tree bipartitions. We define p 1 (·) and p 2 (·) on trees T ∈ T S by: Note that p 1 (T ) and p 2 (T ) decompose the split support score of T into the score contributed by bipartitions Exact-RFS-2 (Algorithm 1) has four phases. In the pre-processing phase (lines 1-5), it computes the weight function w and the mappings T R, T R * , BP , and T RS for use in latter parts of Algorithm 1 and Algorithm 2. In the initial construction phase (line 6), it constructs a tree T init (as described earlier), and we note that T init maximizes p 2 (·) score (Lemma 7). In the refinement phase (lines 7-14), it refines T init so that it attains the maximum p 1 (·) score, without changing the p 2 (·) score. In the last phase (line 15), it arbitrarily refines T to make it binary. The refinement phase begins with the construction of a weighted incompatibility graph G of T 1 | X and T 2 | X (see Fig. 2). It then finds a maximum weight independent set of G that defines a set I ⊆ C(T 1 , T 2 , X) of compatible bipartitions of X. Finally, it uses these bipartitions of X in I to refine T init to achieve the optimal p 1 (·) score, by repeatedly applying Algorithm 2 for each π ∈ I (and we note that the order does not matter). See Fig. 3 for an example of Exact-RFS-2 given two input source trees.

Algorithm 2 Refine
Input: a tree T on leaf set S, a nontrivial bipartition π = [A|B] of X, two data structures H and sv Output: a tree T which is a refinement of T such that for both i = 1, 2, : detach all extra subtrees in T R * (π) from v and attach them onto (v a , v b ) such that T R(e 1 (π)) are attached first with their ordering matching their attachments on e 1 (π) and T R(e 2 (π)) are attached to the right of all subtrees in T R(e 1 (π)) with the ordering of them also matching their attachments on e 2 (π) if π ∈ BP(A) then 17: Algorithms 1 and 2 also use two data structures (functions) H and sv: (1) For a given node v ∈ V (T ) , H(v) ⊆ C(T 1 , T 2 , X) is the set of bipartitions of X that can be added to T | X by refining T | X at v, and (2) Given Formal proofs We start the formal theory and proofs with a natural relationship between edges in restricted trees and restricted bipartitions.  Algorithm 2 refines the given tree T on leaf set S with bipartitions on X from C(T 1 , T 2 , X) \ C(T | X ) . Given bipartition π = [A|B] on X, Algorithm 2 produces a refinement T ′ of T such that To do this, we first find the unique vertex v such that no component of T − v has leaves from both A and B. We create two new vertices v a and v b with an edge between them. We divide the neighbor set of v into three sets: N A is the set of neighbors that split v from leaves in A, N B is the set of neighbors that split v from leaves in B, and N other contains the remaining neighbors. Then, we make vertices in N A adjacent to v a and vertices in N B adjacent to v b . We note that N other = ∅ if X = S and thus there are no extra subtrees. In the case where X = S , N other contains the roots of the extra subtrees adjacent to v and we handle them in four different cases to make T ′ include the desired bipartitions: • those vertices that root extra subtrees in T R * (π ) are moved onto the edge (v a , v b ) (by subdividing the edge to create new vertices, and then making these vertices adjacent to the new vertices) • those vertices that root extra subtrees in T RS(A) are made adjacent to v a Proof Let T R be the minimal subtree of T that spans R. It follows that the leaf set of T R is R and T | R is obtained from T R by suppressing all degree-two vertices.
(Proof of 1) We first claim that if R ∩ A = ∅ or R ∩ B = ∅ , then e / ∈ E(T R ) . Assume by way of contradiction that e ∈ E(T R ) . There are then two non-empty components in T R − e . Since e induces [A|B] in T, the two components in T R − e have leaf sets R ∩ A and R ∩ B , which contradicts the assumption that one intersection is empty. Therefore, e / ∈ E(T R ) . Furthermore, every edge e ′ ∈ E(T | R ) comes from a path in T R . Since e / ∈ E(T R ) , then e / ∈ P(e ′ ) for any e ′ ∈ E(T | R ).
Thus, e is in any subtree of T spanning R; in particular, e ∈ E(T R ) . Fix any π ′ ∈ C(T | R ) induced by e ′ ∈ E(T | R ) . Note that the bipartition induced by P(e ′ ) in T R equals the bipartition induced by e ′ in T | R , i.e., π ′ . For one direction of the proof, suppose e ∈ P(e ′ ) . Because internal vertices of P(e ′ ) in T R are not adjacent to any leaves, the bipartition induced by the path P(e ′ ) in T R equals the bipartition induced by any of its edges (and hence, in particular, by e). Since e induces [A|B] On the other hand, if π| R = π ′ , then π ′ induces [R ∩ A|R ∩ B] in T | R . It follows that P(e ′ ) also induces [R ∩ A|R ∩ B] in T R . Suppose e ∈ P(e * ) for some edge e * ∈ E(T | R ) such that e * � = e ′ . Then, by the previous argument, π e * = [R ∩ A|R ∩ B] , which contradicts the assumption that e * and e ′ are different edges. Therefore, e ∈ P(e ′ ).
Lemma 7 formally states that the tree T init we build in line 6 of Exact-RFS-2 (Algorithm 1) maximizes the p 2 (·) score.
Proof Since T 1 and T 2 have different leaf sets, C(T 1 ) and C(T 2 ) are disjoint. Since � 2 ⊆ C(T 1 ) ∪ C(T 2 ) , C(T 1 ) ∩ � 2 and C(T 2 ) ∩ � 2 form a disjoint decomposition of 2 . By definition of p 2 (·) , for any tree T of leaf set S, Fix any π = [A|B] ∈ � 2 . Suppose π ∈ C(T i ) and is induced by e ∈ E(T i ) for some i ∈ [2] . By definition of 2 , either A ∩ X = ∅ or B ∩ X = ∅ . By Lemma 6, e / ∈ P(e ′ ) for any backbone edge e ′ ∈ E(T i | X ) . Therefore, either e is an internal edge in an extra subtree in Extra(T i ) , or e connects one extra subtree in Extra(T i ) to the backbone tree. In either case, the construction of T init ensures that e is also present in T init | S i and thus π ∈ C(T init | S i ) . Therefore, each bipartition π ∈ � 2 contributes 1 to |C(T init | S i ) ∩ C(T i ) ∩ � 2 | for exactly one index i ∈ [2] and thus it contributes 1 to p 2 (T init ) . Hence, p 2 (T init ) = |� 2 | .
Lemma 8 shows that w * (π ) represents the maximum potential increase in p 1 (·) as a result of adding bipartition π to T | X . The proof of Lemma 8 follows the idea that for any bipartition π of X, there are at most w * (π ) edges in either T 1 or T 2 whose induced bipartitions become π when restricted to X. Therefore, by only adding π to T | X , at most w * (π ) more bipartitions get included in C(T | S 1 ) or C(T | S 2 ) so that they contribute to the increase of p 1 (T ).

Lemma 8 Let π = [A|B]
be a bipartition of X where X ⊆ S . Let T ∈ T S be any tree with leaf set S such that π / ∈ C(T | X ) but π is compatible with C(T | X ) . Let T ′ be a refinement of T such that for all π ′ ∈ C( Proof By definition of p 1 (·), i∈ [2] w(e i (π )) else. Therefore, we only need to prove that We perform a case analysis, as follows: Case (1): π / ∈ C(T 1 , T 2 , X) , Case (2): π ∈ C(T 1 | X )�C(T 2 | X ) , and Case (3): π ∈ C(T 1 | X ) ∩ C(T 2 | X ).

Lemma 9
For any compatible set F of bipartitions on X where X ⊆ S , let T ∈ T S be any tree with leaf set S such that C(T | X ) = F . Then p 1 (T ) ≤ w * (F ).
The proof of Lemma 9 is straightforward, and uses Lemma 8 repeatedly by adding the compatible bipartitions to the tree in any selected order.
Proposition 1 Let T be the tree constructed after line 11 of Algorithm 1, then p 1 (T ) = w * (Triv).
The proof naturally follows by construction (Line 8 of Algorithm 1), and implies that the algorithm adds the trivial bipartitions of X (which are all in I) to T | X so that p 1 (T ) reaches the full potential of adding those trivial bipartitions.
Lemma 10 will show that the auxiliary data structures of Algorithm 1 and 2 are maintaining the desired information and that the algorithm can split the vertex and perform the detaching and reattaching of the extra subtrees correctly. These invariants are important to the proof of Lemma 11.

Lemma 10
At any stage of the Algorithm 1 after line 12, we have the following invariants of T and the auxiliary data structures H and sv: 1 For any bipartition π ∈ NonTriv , sv(π ) is the vertex to split to add π to C(T | X ) . For any internal vertex v, the set of bipartitions H(v) ⊆ NonTriv is the set of bipartitions which can be added to C(T | X ) by splitting v. i∈ [2] |(C(T ′ | S i )\C(T | S i )) ∩ C(T i ) ∩ � 1 | ≤ w(e 1 (π )) + w(e 2 (π )) = w * (π ). Proof We prove the invariants by induction on the number of refinement steps k performed on T. When k = 0 , we have T =T and T | X is a star with leaf set X and center vertex v . Thus all bipartitions in NonTriv are compatible with C(T | X ) . For any π ∈ NonTriv , the center vertex v is the vertex to refine in T | X in order to add π to C(T | X ) . Therefore, it is correct that sv(π ) =v for every π ∈ NonTriv and H (v) = NonTriv . The roots of all extra subtrees in T R * (π ) for any π ∈ NonTriv are all neighbors of v , so invariant 2 also holds. We note that at this point, C(T | X ) = Triv . For any trivial bipartition π ∈ C(T | X ) , let π = [{a}|B] . It is easy to see that since a is a leaf, Assume that all invariants hold after any k ′ < k steps of refinement. Let π = [A|B] be the bipartition to add in the kth refinement step. We will show that after the kth refinement step, i.e., one execution of Algorithm 2, the invariants still hold for the resulting tree T ′ . Since v = sv(π ) at the beginning of Algorithm 2, π can be added to C(T | X ) by splitting v. We can divide the set of neighbors of v in T | X into N A ∪ N B such that N A (or N B respectively) consists of neighbors of v that can reach vertices of A (or B) but not B (or A) in T | X − v . Then, the algorithm correctly finds N A and N B and connects N A to v a and N B to v b so the new edge (v a , v b ) induces the bipartition π = [A|B] in T | X . For any vertex u other than v and any bipartition π ′ ∈ H (u) , the invariants 1 and 2 still hold after Algorithm 2 as we do not change H(u), sv(π ′ ) , or the extra subtrees attached to u. For any bipartition π ′ ∈ H (v) such that π ′ � = π , if π ′ is not compatible with π , then it cannot be added to C(T ′ | X ) since π is added, so the algorithm correctly discards π ′ and does not add it to H(v a ) or H(v b ) . If π ′ is compatible with π , we will show that the invariants 1 and 2 still hold for π ′ .
Fix any π ′ = [A ′ |B ′ ] ∈ H (v) s.t. π ′ � = π and π ′ is compatible with π . One of A ′ and B ′ must be a subset of one side of [A|B]. Assume without loss of generality that A ′ ⊆ A (other cases are symmetric), so that B ⊆ B ′ . In this case, Algorithm 2 adds π ′ to H(v a ) and sets sv(π ) = v a . We will show that this step preserves the invariants. Since π ′ ∈ H(v) , before adding π we also could have split v to add π ′ to C(T | X ) . Then there exists a division of neighbors of v in T | X into N A ′ and N B ′ such that N A ′ (or N B ′ , respectively) consists of neighbors of v that can reach vertices of Such a division proves that v a is the correct vertex to refine in T ′ | X to add π ′ to C(T ′ | X ) after the kth refinement. Therefore, invariant 1 holds with respect to π ′ . Since π ′ ∈ H (v) before adding π , we also have for all t ∈ T R * (π ′ ) , the root of t is a neighbor of v before adding π . Since A ′ ⊆ A , π ′ ∈ BP(A) and thus T R * (π ) ⊆ T RS(A) . Then, Algorithm 2 correctly attaches roots of all trees in T R * (π ′ ) to v a . Therefore invariant 2 holds for π ′ .
We have shown that invariants 1 and 2 hold for the tree T ′ with the auxiliary data structures H and sv. Next, we show that invariant 3 holds. Since π is the only bipartition in C(T ′ | X ) that is not in C(T | X ) , we only need to show two things: i) for any π ′ ∈ C(T | X ) , the invariant 3 still holds, ii) invariant 3 holds for π . We first show i). Fix π ′ = [A ′ |B ′ ] ∈ C(T | X ) . Since π is compatible with π ′ , one of A ′ and B ′ must be a subset of one of A and B. We assume without loss of generality that A ′ ⊆ A . Therefore, B ⊆ B ′ . Let Comp(A ′ ) and Comp(B ′ ) be the components containing the leaves of During the refinement, v is split into v a and v b , both of which are still part of Comp(B ′ ) . Since all t ∈ T RS(B) are attached to an edge or a vertex in Comp(B ′ ) before refinement and any extra subtree attached to v before is now on either v a , or v b , or (v a , v b ) , all of which are part of Comp(B ′ ) , they are all still attached to an edge or a vertex in Comp(B ′ ) . Thus, the invariant 3 holds with respect to π ′ .
For ii), we show invariant 3(a) holds for π and 3(b) follows the same argument. For any extra subtree in t ∈ T RS(A) , if it was attached to v before refinement, then it is now attached to v a , which is in Comp(A). If it was not attached to v before refinement, then let N B be as defined from Algorithm 2. For any bipartition π ′ = [A ′ |B ′ ] induced by (v, u) where u ∈ N B . We know that (v, u) ∈ Comp(B) and thus either A ′ ⊆ B or B ′ ⊆ B . Assume without loss of generality that B ′ ⊆ B . Then we have BP(B ′ ) ∪ {π ′ } ⊆ BP(B) and thus T RS(B ′ ) ∪ T R * (π ′ ) ⊆ T RS(B) . We note that T RS(A) and T RS(B) are disjoint. Since t ∈ T RS(A) , we know t / ∈ T RS(B) , then t / ∈ T RS(B ′ ) ∪ T RS * (π ′ ) . Let Comp(A ′ ) and Comp(B ′ ) be the components containing the leaves of A ′ and B ′ in T | X − (v, u) , respectively. Then Comp(A ′ ) contains v and Comp(B ′ ) contains u. Since t / ∈ T R * (π ′ ) , it cannot be attached to (v, u). Also by the invariant 3 with respect to π ′ , t is not attached to any vertex or edge in Comp(B ′ ) . Since this is true for every neighbor of v in N B , t / ∈ Comp(B) as Comp(B) consists of only edges connecting v to a neighbor u ∈ N B and the component containing u. Since t was not attached to v before the refinement, t is not attached to (v a , v b ) or Comp(B) after the refinement, then t must be attached to some edge or vertex in Comp(A). This proves invariant 3(a) for π and thus the inductive proof.

Lemma 11
Let T be a supertree computed within Algorithm 1 at line 14 immediately before a refinement step. Let π = [A|B] ∈ NonTriv ∩ I . Let T ′ be a refinement of T obtained from running Algorithm 2 with supertree T, bipartition π , and the auxiliary data structures H and sv. Then, p 1 (T ′ ) − p 1 (T ) = w * (π ).
The idea for the proof of Lemma 11 is that for any nontrivial bipartition π ∈ I of X to be added to T | X , Algorithm 2 is able to split the vertex correctly and move extra subtrees around in a way such that each bipartition in T 1 or T 2 that is induced by an edge in P(e 1 (π )) or P(e 2 (π )) , which is not in T | S 1 or T | S 2 before the refinement, becomes present in T | S 1 or T | S 2 after the refinement. Since there are exactly w * (π ) such bipartitions, they increase p 1 (·) by w * (π ) . Now we give the formal proof.
Proof Since I corresponds to an independent set in the incompatibility graph G, all bipartitions in I are compatible. Since C(T | X ) ⊆ Triv ∪ (NonTriv ∩ I) = I , π ∈ NonTriv ∩ I must be compatible with C(T | X ) , then there is a vertex to split to add π to C(T | X ) . By invariant 1 of Lemma 10, v = sv(π ) is the vertex to split to add π to T | X and thus Algorithm 2 correctly splits v into v a and v b and connects them to appropriate neighbors such that in We abbreviate e 1 (π ) and e 2 (π ) by e 1 and e 2 . We number the extra subtrees attached to e 1 as t 1 , t 2 , . . . , t p , where p = w(e 1 ) − 1 and t 1 is the closest to A in T 1 . Similarly, we number the extra subtrees attached to e 2 as t ′ 1 , t ′ 2 , . . . , t ′ q , where q = w(e 2 ) − 1 and t ′ 1 is the closest to A in T 2 . For any set T of trees, let L(T ) denote the union of the leaf set of trees in T . We note that if e i exists, Thus, . For each k ∈ [w(e 1 )] , we define and for each k ∈ [w(e 2 )] , we define We know that for each k ∈ [w(e 1 )], Thus, for any k ∈ [w(e 1 )] , π k is the bipartition induced by the kth edge on P(e 1 ) in T 1 , where the edges are numbered from the side of A. Therefore, π k ∈ C(T 1 ) for any k ∈ [w(e 1 )] . Similarly, π ′ k ∈ C(T 2 ) for any k ∈ [w(e 2 )].
Since for any k ∈ [w(e 1 )] , A k ∩ X = A � = ∅ and (S 1 \A k ) ∩ X = B � = ∅ , we have π k | X = π and π k ∈ � 1 . Similarly, for each k ∈ [w(e 2 )] , π ′ k ∈ � 1 and π ′ k | X = π . We also know that since π / ∈ C(T | X ) , by Lemma 6, π k / ∈ C(T | S 1 ) for any k ∈ [w(e 1 )] and π ′ k / ∈ C(T | S 2 ) for any k ∈ [w(e 2 )] . We claim that π k ∈ C(T ′ | S 1 ) for all k ∈ [w(e 1 )] and π ′ k ∈ C(T ′ | S 2 ) for all k ∈ [w(e 2 )] . Then assuming the claim is true, we have and Now we only need to prove the claim. Fix k ∈ [w(e 1 )] , we will show that π k ∈ C(T ′ | S 1 ) . The claim of π ′ k ∈ C(T ′ | S 2 ) for any k ∈ [w(e 2 )] follows by symmetry. By invariant 2 of Lemma 10, we know that all extra subtrees of T R(e 1 ) were attached to v at the beginning of Algorithm 2 and thus the algorithm attaches them all onto (v a , v b ) in the order of t 1 , t 2 , . . . , t p , such that t 1 is closest to A. Let the attaching vertex of t i onto (v a , v b ) be u i for any i ∈ [w(e 1 )] . Then we note P ((v a , v b )) is the path from v a to u 1 , u 2 , . . . , u p and then to v b . For any t ∈ T RS 1 (A) , by invariant 3 of Lemma 10, t is attached to Comp(A), the . Therefore, if we delete any edge of P((v a , v b )) from T ′ , t is in the same component as A. Similarly, for any t ∈ T RS 1 (B) , t is in the same component as B if we delete any edge of P ((v a , v b )) from T. In particular, consider T ′ | S 1 − (u k−1 , u k ) . The component containing u k−1 and A contains all of T RS 1 (A) and {t i | i ∈ [k − 1]} , thus the leafset of that component is Therefore, the edge (u k−1 , u k ) induces the bipartition Proposition 2 Let G be the weighted incompatibility graph on T 1 | X and T 2 | X , and let I be the set of bipartitions associated with vertices in I * , which is a maximum weight independent set of G. Let F be any compatible subset of C(T 1 , T 2 , X) . Then w * (I) ≥ w * (F ).
Proof We extend the weight function of vertices in G and let weight(U ) denote the total weight of any set U of vertices of G. For any compatible subset of bipartitions F ⊆ C(T 1 , T 2 , X) , let V(F) be the set of vertices of G associated with the bipartitions in F. We first claim that w * (F ) = weight(V (F )) . For each π ∈ C(T 1 | X ) ∩ C(T 2 | X ) , there are two vertices associated with it in G with a total weight of w(e 1 (π )) + w(e 2 (π )) , which is exactly w * (π ) . For each π ∈ C(T i | X )\C(T j | X ) for i, j ∈ [2] and i = j , let v π be the only vertex for π in G, then weight(v π ) = w(e i (π )) = w * (π ) . Since , we have shown that the w * (π ) = weight(V ({π })) for any π ∈ C(T 1 , T 2 , X) . Then taking the sum over all π ∈ F , we have w * (F ) = weight(V (F )) . Since V (I) = I * , we have w * (I) = weight(V (I)) = weight(I * ) . Fix any compatible subset F of C(T 1 , T 2 , X) . Let We conclude that w * (F ) ≤ weight(V (F ′ ∪ (C(T 1 | X ) ∩ C(T 2 | X )))) ≤ weight(I * ) = w * (I) .
From Lemma 7 and the fact that a refinement of a tree never decreases p 1 (·) and p 2 (·) , we also know that p 2 (T * ) ≥ p 2 (T init ) ≥ p 2 (T ) for any tree T ∈ T S . Since for any T ∈ T S , SF(T , A) = p 1 (T ) + p 2 (T ) , T * achieves the maximum split support score with respect to A among all trees in T S . Thus, T * is a solution to Relax-SFS-2-B (Corollary 1). If T * is not binary, Algorithm 1 arbitrarily resolves every high degree node in T * until it is a binary tree and then returns a tree that achieves the maximum split support score among all binary trees of leaf set S.
= w * (Triv) + w * (NonTriv ∩ I) = w * (I). time by checking whether one side of π ′ is a subset of one side of π (assuming that labels of leaves in both sides of the bipartitions are stored as pre-processed sorted lists instead of sets). The rest of the algorithm takes constant time. Overall, Algorithm 2 runs in O(n + |X| 2 ) time.
Next we analyze the running time for Algorithm 1, i.e., Exact-RFS-2. Computing C(T 1 | X ) and C(T 2 | X ) in line 1 takes O(n 2 + n|X| 2 ) time as we need to compute π e | X for all e ∈ E(T 1 ) ∪ E(T 2 ) and then take the union. There are O(n) edges in E(T 1 ) ∪ E(T 2 ) . Computing π e | X for each edge takes O(n) time by running DFS on T i − e to obtain π e and then taking intersection of both sides of π e with X, separately. Together it takes O(n 2 ) time. Tak . In this step, we can always maintain a set of edges in T i for each bipartition π ∈ C(T 1 , T 2 , X) such that π e | X = π.
Lines 2 − 5 compute the mappings and values we need in latter part of the algorithm. We analyze the running time for each π = [A|B] ∈ C(T 1 , T 2 , X) first. We can compute the path P(e i (π )) by assembling the set of edges associated with π in T i from the last step into a path. This takes O(n 2 ) time by counting the times any vertex appear as an end vertex in the set of edges. The two vertices appearing once are the end vertices of the path while those appearing twice are internal vertices of the path. Then w(e i (π )) = |P(e i (π ))| can be found in constant time.
Then we can find T R(e i (π )) by DFS in T i − v for every internal node v of P(e i (π )) , starting the search from the unique neighbor u of v such that u does not appear in the path. This takes O(n) time. We compute BP i (A) and Strict Consensus Merger technique, and was proven statistically consistent when the subset trees are computed using statistically consistent methods. Before we prove that DACTAL-Exact-RFS-2 can enable statistically consistent pipelines, we begin with some definitions. Given a tree T and an internal edge e in T, the deletion of the edge e and its endpoints defines four subtrees. A short quartet around e is a set of four leaves, one from each subtree, selected to be closest to the edge. Note that due to ties, there can be multiple short quartets around some edges. The set of short quartets for a tree T is the set of all short quartets around the edges of T. The short quartet trees of T is the set of quartet trees on short quartets induced by T. It is well known that the short quartet trees of a tree T define T, and furthermore T can be computed from this set in polynomial time [33][34][35].

Lemma 12
Let T be a binary tree on leaf set S and let A, B ⊆ S . Let T A = T | A and T B = T | B (i.e., T A and T B are induced subtrees). If every short quartet tree is induced in T A or in T B , then T is the unique compatibility supertree for T A and T B and Exact-2-RFS(T A , T B ) = T.
Proof Because T A and T B are induced subtrees of T, it follows that T is a compatibility supertree for T A and T B . Furthermore, because every short quartet tree appears in at least one of these trees, T is the unique compatibility supertree for T A and T B (by results from [34,35], mentioned above). Finally, because T is a compatibility supertree, the RFS score of T with respect to T A , T B is 0, which is the best possible. Since Exact-2-RFS solves the RFS problem on two binary trees, Exact-2-RFS returns T given input T A and T B .
Thus, Exact-2-RFS is guaranteed to return the true tree when given two correct trees that have sufficient overlap (in that all short quartets are included). We continue with proving that these pipelines are statistically consistent.

Theorem 2
The DACTAL-Exact-RFS-2 pipeline is a statistically consistent method for estimating the unrooted tree topology under any model for which statistically consistent unrooted tree topology estimation methods exist.
Proof The proof is very similar to the proof given for the original DACTAL pipeline in [11]. Let be the stochastic evolution model. To establish statistical consistency of the DACTAL-Exact-RFS-2 pipeline (see above), we need to prove that as the amount of data increases the unrooted tree topology that is returned by the pipeline converge to the true unrooted tree topology. That is, we will show that for any ǫ > 0 , there is an amount of data so that the probability of returning the true tree topology given that amount of data is at least 1 − ǫ . Hence, let F be the method used to compute the starting tree, let G be the method used to compute the subset trees, and let ǫ > 0 be given. Because F is statistically consistent under , it follows that there is an amount of data so that the starting tree computed by F will have the true tree topology T with probability at least 1 − ǫ/2 . Now consider the decomposition into two sets produced by the algorithm produced by deleting edge e, applied to a tree with the true unrooted tree topology. Note that for any p ≥ 1 , all the leaves appearing in any short quartet around e are placed in the set P. Now, subset trees are computed using G on A ∪ P and B ∪ P , where π e = [A|B] , which we will refer to as T A and T B , respectively. Since G is statistically consistent, for a large enough amount of data, T A and T B will have the true tree topology on their leaf sets ( T | L(T A ) and T | L(T B ) , respectively) with probability at least 1 − ǫ/2 . When T A and T B are equal to the true trees on their leaf sets, then every short quartet tree of T is in T A or T B , so that by Lemma 12, T is the only compatibility supertree for T A and T B . Thus, under these conditions, Exact-2-RFS(T A , T B ) returns T. Hence, for a large enough amount of data, Exact-2-RFS(T A , T B ) returns T with probability at least 1 − ǫ , completing our proof.
Hence, DACTAL+Exact-2-RFS is statistically consistent under all standard molecular sequence evolution models and also under the MSC+GTR model [36,37] where gene trees evolve within species trees under the multi-species coalescent model (which addresses gene tree discordance due to incomplete lineage sorting [38]) and then sequences evolve down each gene tree under the GTR model.
Note that all that is needed to guarantee that the pipeline is statistically consistent is that the method F used to compute the starting tree and the method G used to compute the subset trees be statistically consistent. However, for the sake of improving empirical performance, F should be fast so that it can run on the full dataset but G can be more freely chosen, since it will only be run on smaller datasets. Indeed, the user can specify the size of the subsets that are analyzed, with smaller datasets enabling the use of more computationally intensive methods. Thus, when estimating gene trees under the GTR [39] sequence evolution model, F could be a fast distance-based method, such as neighbor joining [40] and G could be maximum likelihood, and when estimating species trees under the MSC+GTR model, then F could be ASTRID [29] or ASTRAL [41] (both polynomial time methods that are statistically consistent) while G could be an expensive method, such as StarBeast2 [42], a Bayesian method for co-estimating gene trees and species trees under the MSC+GTR model.
When empirical performance is the main objective and statistical consistency is not a requirement, then other options are available. For example, although heuristics for maximum likelihood are not provably statistically consistent (since they are not guaranteed to find optimal solutions), they can be highly accurate in practice. Therefore, when estimating trees under the GTR [39] model, F could be a distance-based method, such as neighbor joining [40], and G could be a good maximum likelihood heuristic, such as RAxML [43] or IQ-TREE 2 [?].

Conclusions
The main contribution of this paper is Exact-2-RFS, a polynomial time algorithm for the Robinson-Foulds Supertree (RFS) of two binary trees. We established equivalence between RFS and some other supertree problems, while also showing that solving the RFS of three or more trees is NP-hard. Finally, showed that this approach can be used in a divide-and-conquer pipeline to enable statistically consistent phylogeny estimation under sequence evolution models (e.g., GTR [39] and the hierarchical MSC+GTR model [37]).
This study advances the theoretical understanding of several established supertree problems, showing that supertree methods have the theoretical potential to be useful in phylogeny estimation on large datasets. Some prior studies (e.g., [11,44]) have evaluated supertree methods in the context of divide-and-conquer phylogeny estimation, which is an approach championed by Wilkinson [10]. Hence, a valuable direction for future work would evaluate these and other supertree methods in order to evaluate the utility and benefits of using such approaches on biological datasets.