Time-consistent reconciliation maps and forbidden time travel
 Nikolai Nøjgaard^{1, 2},
 Manuela Geiß^{5},
 Daniel Merkle^{2},
 Peter F. Stadler^{5, 6, 7, 8, 9, 10, 11},
 Nicolas Wieseke^{3} and
 Marc Hellmuth^{1, 4}
https://doi.org/10.1186/s13015-018-0121-8
© The Author(s) 2018
Received: 21 October 2017
Accepted: 20 January 2018
Published: 6 February 2018
Abstract
Background
In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer.
Results
We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an \(\mathcal {O}(|V(T)|\log (|V(S)|))\)-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparatively easy task of checking whether a small auxiliary graph is acyclic. The algorithms are implemented in C++ using the Boost Graph Library and are freely available at https://github.com/Nojgaard/tcrecon.
Significance
The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.
Keywords
Background
Modern molecular biology describes the evolution of species in terms of the evolution of the genes that collectively form an organism’s genome. In this picture, genes are viewed as atomic units whose evolutionary history by definition forms a tree. The phylogeny of species also forms a tree. This species tree is either interpreted as a consensus of the gene trees or it is inferred from other data. An interesting formal manner to define a species tree independent of genes and genetic data is discussed, e.g. in [1].
In this contribution, we assume that gene and species trees are given independently of each other. The relationship between gene and species evolution is therefore given by a reconciliation map that describes how the gene tree is embedded in the species tree: after all, genes reside in organisms, and thus at each point in time can be assigned to a species.
From a formal point of view, a reconciliation map \(\mu\) identifies vertices of a gene tree with vertices and edges in the species tree in such a way that (partial) ancestor relations given by the genes are preserved by \(\mu\). Vertices in the species tree correspond to speciation events. By definition, in a speciation event all genes are faithfully transmitted from the parent species into both (all) daughter species. Some of the vertices in the gene tree therefore correspond to speciation events. In gene duplications, two copies of a gene are formed from a single ancestral gene and then keep residing in the same species. In horizontal gene transfer (HGT) events, the original remains within the parental species, while the offspring copy “jumps” into a different branch of the species tree. Given a gene tree with event types assigned to its interior vertices, it is customary to define pairwise relations between genes depending on the event type of their last common ancestor [2–4].
Most of the literature on this topic assumes that both the gene tree and the species tree are known but no information is available about the types of events [5–8]. The aim is then to find a mapping of the gene tree T into the species tree S and, at least implicitly, an event-labeling on the vertices of the gene tree T. Here we take a different point of view and assume that T and the types of evolutionary events on T are known. This setting has ample practical relevance because event-labeled gene trees can be derived from the pairwise orthology relation [4, 9]. These relations in turn can be estimated directly from sequence data using a variety of algorithmic approaches that are based on the pairwise best match criterion and hence do not require any a priori knowledge of the topology of either the gene tree or the species tree, see e.g. [10–13].
Genes that share a common origin (homologs) can be classified into orthologs, paralogs, and xenologs depending on whether they originated by a speciation, duplication or horizontal gene transfer (HGT) event [2, 4]. Recent advances in mathematical phylogenetics [9, 14] have shown that the knowledge of these event-relations (orthologs, paralogs and xenologs) suffices to construct event-labeled gene trees and, in some cases, also a species tree [3, 15, 16].
Conceptually, both the gene tree and species tree are associated with a timing of each event. Reconciliation maps must preserve this timing information because there are biologically infeasible event-labeled gene trees that cannot be reconciled with any species tree. In the absence of HGT, biological feasibility can be characterized in terms of certain triples (rooted binary trees on three leaves) that are displayed by the gene trees [16]. In the presence of HGT such triples give at least necessary conditions for a gene tree being biologically feasible [15]. In particular, the timing information must be taken into account explicitly in the presence of HGT. That is, gene trees with HGT must be mapped to species trees in such a way that no gene travels back in time.
There have been several attempts in the literature to handle this issue, see e.g. [17] for a review. In [18, 19] a single HGT adds timing constraints to a time map for a reconciliation to be found. Time-consistency is then defined as the existence of a topological order of the digraph reflecting all the time constraints. In [20] NP-hardness was shown for finding a parsimonious time-consistent reconciliation based on a definition for time-consistency that in essence considers pairs of HGTs. However, the latter definitions are explicitly designed for binary gene trees and do not apply to non-binary gene trees, which are used here to model incomplete knowledge of the exact gene phylogenies. Different algorithmic approaches for tackling time-consistency exist [17] such as the inclusion of “time-zones” known for specific evolutionary events. It is worth noting that a posteriori modifications of time-inconsistent solutions will in general violate parsimony [18]. So far, no results have become available to determine the existence of time-consistent reconciliation maps given the (undated) species tree and the event-labeled gene tree.
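The notion of time-consistency as "existence of a topological order" can be illustrated with a minimal sketch. The constraint digraph below is hypothetical (an arc (a, b) is read as "a must happen before b"), and the function name is an assumption for illustration only; it is a generic acyclicity test via Kahn's algorithm, not the auxiliary graph or code of the cited works.

```python
from collections import deque

def has_topological_order(vertices, arcs):
    """Return True if the digraph (vertices, arcs) is acyclic (Kahn's algorithm)."""
    indeg = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for a, b in arcs:              # arc (a, b): event a must precede event b
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    placed = 0
    while queue:
        v = queue.popleft()
        placed += 1                # v gets the next position in the order
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Every vertex can be placed exactly when there is no directed cycle.
    return placed == len(indeg)

# Hypothetical timing constraints on three events.
consistent = has_topological_order("abc", [("a", "b"), ("b", "c")])
# Adding "c before a" closes a cycle: some event would have to precede itself.
inconsistent = has_topological_order("abc", [("a", "b"), ("b", "c"), ("c", "a")])
```

A cycle in such a constraint graph corresponds to the "forbidden time travel" of the title: no assignment of time stamps can respect all constraints at once.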
Here, we introduce an axiomatic framework for time-consistent reconciliation maps and characterize for given event-labeled gene trees T and species trees S whether there exists a time-consistent reconciliation map. We provide an \(\mathcal {O}(|V(T)|\log (|V(S)|))\)-time algorithm that constructs a time-consistent reconciliation map if one exists.
Notation and preliminaries
We consider rooted trees \(T=(V,E) \text{ (on } L_T \text{)}\) with root \(\rho _T \in V\) and leaf set \(L_T\subseteq V\). A vertex \({v}\in V\) is called a descendant of \({u}\in V\), \({v \preceq _T u}\), and u is an ancestor of v, \({u \succeq _T v}\), if u lies on the path from \(\rho _T\) to v. As usual, we write \({v \prec _T u}\) and \({u \succ _T v}\) to mean \({v \preceq _T u}\) and \(u\ne v\). The partial order \(\succeq _T\) is known as the ancestor order of T; the root is the unique maximal element w.r.t \(\succeq _T\). If \(u \preceq _T v\) or \(v \preceq _T u\) then u and v are comparable and otherwise, incomparable. We consider edges of rooted trees to be directed away from the root, that is, the notation for edges (u, v) of a tree is chosen such that \(u\succ _T v\). If (u, v) is an edge in T, then u is called parent of v and v child of u. It will be convenient for the discussion below to extend the ancestor relation \(\preceq _T\) on V to the union of the edge and vertex sets of T. More precisely, for the edge \(e=(u,v)\in E\) we put \(x \prec _T e\) if and only if \(x\preceq _T v\) and \(e \prec _T x\) if and only if \(u\preceq _T x\). For edges \(e=(u,v)\) and \(f=(a,b)\) in T we put \(e\preceq _T f\) if and only if \(v \preceq _T b\). For \(x\in V\), we write \(L_T(x):=\{y\in L_T \mid y\preceq _T x\}\) for the set of leaves in the subtree T(x) of T rooted in x.
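The ancestor order and the leaf sets \(L_T(x)\) can be made concrete with a small sketch. The child-to-parent dictionary, the example tree, and all function names are assumptions chosen for illustration; they are not taken from the authors' implementation.

```python
# Hypothetical rooted tree, stored as child -> parent (the root maps to None).
parent = {"r": None, "u": "r", "c": "r", "a": "u", "b": "u"}

def is_descendant_or_equal(parent, v, u):
    """True if v <=_T u, i.e. u lies on the path from the root to v."""
    while v is not None:
        if v == u:
            return True
        v = parent[v]
    return False

def comparable(parent, u, v):
    """u and v are comparable if one is an ancestor of the other."""
    return is_descendant_or_equal(parent, u, v) or is_descendant_or_equal(parent, v, u)

def leaves_below(parent, x):
    """L_T(x): all leaves y with y <=_T x."""
    inner = {p for p in parent.values() if p is not None}
    return {y for y in parent
            if y not in inner and is_descendant_or_equal(parent, y, x)}
```

With this tree, `a` and `c` are incomparable, while `leaves_below(parent, "u")` collects the two leaves of the subtree rooted in `u`.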
For a nonempty subset of leaves \(A\subseteq L_T\), we define \({\text {lca}}_T(A)\), or the least common ancestor of A, to be the unique \(\preceq _T\)-minimal vertex of T that is an ancestor of every vertex in A. In case \(A=\{u,v \}\), we put \({\text {lca}}_T(u,v):={\text {lca}}_T(\{u,v\})\). We have in particular \(u={\text {lca}}_T(L_T(u))\) for all \(u\in V\). We will also frequently use that for any two nonempty vertex sets A, B of a tree, it holds that \({\text {lca}}(A\cup B) = {\text {lca}}({\text {lca}}(A),{\text {lca}}(B))\).
A phylogenetic tree is a rooted tree such that no interior vertex \(v\in V\setminus L_T\) has degree two, except possibly the root. If \(L_T\) corresponds to a set of genes \(\mathbb {G}\) or species \(\mathbb {S}\), we call a phylogenetic tree on \(L_T\) gene tree or species tree, respectively. In this contribution we will not restrict the gene or species trees to be binary, although this assumption is made implicitly or explicitly in much of the literature on the topic. The more general setting allows us to model incomplete knowledge of the exact gene or species phylogenies. Of course, all mathematical results proved here also hold for the special case of binary phylogenetic trees.
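As a minimal sketch, the least common ancestor and the identity \({\text {lca}}(A\cup B) = {\text {lca}}({\text {lca}}(A),{\text {lca}}(B))\) can be checked on a toy tree. The tree and the function names are hypothetical, and the naive path-walking approach is chosen for readability only; it is not the efficient lca machinery an actual implementation would use.

```python
from functools import reduce

# Hypothetical tree: r has children u and w; u has leaves a, b; w has leaves c, d.
parent = {"r": None, "u": "r", "w": "r", "a": "u", "b": "u", "c": "w", "d": "w"}

def path_to_root(parent, v):
    """Vertices on the path from v up to the root, starting with v."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def lca2(parent, u, v):
    """Least common ancestor of two vertices."""
    ancestors_of_u = set(path_to_root(parent, u))
    for x in path_to_root(parent, v):   # first ancestor of v that is also above u
        if x in ancestors_of_u:
            return x
    raise ValueError("vertices do not belong to the same tree")

def lca(parent, vertices):
    """Least common ancestor of a nonempty collection of vertices."""
    return reduce(lambda x, y: lca2(parent, x, y), vertices)

A, B = {"a", "b"}, {"c"}
left = lca(parent, A | B)
right = lca2(parent, lca(parent, A), lca(parent, B))
```

Folding `lca2` over the set is itself an application of the identity above, which is why the set-valued `lca` can be reduced to the pairwise one.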
In our setting a gene tree \(T=(V,E)\) on \(\mathbb {G}\) is equipped with an event-labeling map \(t:V\cup E\rightarrow I\cup \{0,1\}\) with \(I=\{\bullet ,\square ,\triangle , \odot \}\) that assigns to each interior vertex v of T a value \(t(v)\in I\) indicating whether v is a speciation event (\(\bullet\)), duplication event (\(\square\)) or HGT event (\(\triangle\)). It is convenient to use the special label \(\odot\) for the leaves x of T. Moreover, to each edge e a value \(t(e)\in \{0,1\}\) is added that indicates whether e is a transfer edge (1) or not (0). Note, only edges (x, y) for which \(t(x)=\triangle\) might be labeled as transfer edges. We write \(\mathcal {E}= \{e\in E\mid t(e)=1\}\) for the set of transfer edges in T. We assume here that all edges labeled “0” transmit the genetic material vertically, that is, from an ancestral species to its descendants.
We remark that the restriction \(t_{|V}\) of t to the vertex set V coincides with the “symbolic dating maps” introduced in [21]; these have a close relationship with cographs [14, 22, 23]. Furthermore, there is a map \(\sigma :\mathbb {G}\rightarrow \mathbb {S}\) that assigns to each gene the species in which it resides. The set \(\sigma (M)\), \(M\subseteq \mathbb {G}\), is the set of species from which the genes M are taken. We write \((T;t,\sigma )\) for the gene tree \(T=(V,E)\) with event-labeling t and corresponding map \(\sigma\).
Removal of the transfer edges from \((T;t,\sigma )\) yields a forest \(T_{\mathcal {\overline{E}}}:=(V,E\setminus \mathcal {E})\) that inherits the ancestor order on its connected components, i.e., \(x\preceq _{T_{\mathcal {\overline{E}}}}y\) iff \(x\preceq _{T}y\) and x, y are in the same subtree of \(T_{\mathcal {\overline{E}}}\) [20]. Clearly \(\preceq _{T_{\mathcal {\overline{E}}}}\) uniquely defines a root for each subtree and the set of descendant leaf nodes \(L_{T_{\mathcal {\overline{E}}}}(x)\).
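The forest obtained by deleting transfer edges is easy to sketch when the tree is stored as a child-to-parent map: the head of every transfer edge simply becomes a new root. The example gene tree, the set `transfer`, and the function names are assumptions made for illustration.

```python
# Hypothetical gene tree: HGT vertex x with children y and c, where (x, c) is
# the transfer edge; y has the leaves a and b.
parent = {"x": None, "y": "x", "c": "x", "a": "y", "b": "y"}
transfer = {("x", "c")}

def forest_parent(parent, transfer):
    """Parent map of the forest: the head of each transfer edge becomes a root."""
    return {v: (None if (p, v) in transfer else p) for v, p in parent.items()}

def component_root(fparent, v):
    """Root of the connected component of the forest that contains v."""
    while fparent[v] is not None:
        v = fparent[v]
    return v

def forest_leaves(parent, transfer, x):
    """Childless vertices of the forest that lie below (or equal) x."""
    fparent = forest_parent(parent, transfer)
    has_child = {p for p in fparent.values() if p is not None}
    result = set()
    for y in fparent:
        if y in has_child:
            continue
        v = y
        while v is not None and v != x:   # climb until x or a root is passed
            v = fparent[v]
        if v == x:
            result.add(y)
    return result

fparent = forest_parent(parent, transfer)
```

In this example the leaf `c` is cut off from `x`, so `forest_leaves(parent, transfer, "x")` no longer contains it although `c` is a descendant of `x` in the original tree.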
In order to account for duplication events that occurred before the first speciation event, we need to add an extra vertex and an extra edge “above” the last common ancestor of all species in the species tree \(S=(V,E)\). Hence, we add an additional vertex to V (that is now the new root \(\rho _S\) of S) and the additional edge \((\rho _S,{\text {lca}}_S(\mathbb {S}))\) to E. Strictly speaking S is not a phylogenetic tree in the usual sense, however, it will be convenient to work with these augmented trees. For simplicity, we omit drawing the augmenting edge \((\rho _S,{\text {lca}}_S(\mathbb {S}))\) in our examples.
Observable scenarios

(O1) Every internal vertex v has degree at least 3, except possibly the root which has degree at least 2.

(O2) Every HGT node has at least one transfer edge, \(t(e)=1\), and at least one nontransfer edge, \(t(e)=0\);

(O3)

(a) If x is a speciation vertex, then there are at least two distinct children v, w of x such that the species V and W that contain v and w, resp., are incomparable in S.

(b) If (v, w) is a transfer edge in T, then the species V and W that contain v and w, resp., are incomparable in S.

Assuming that (O2) is satisfied, we obtain the following useful result:
Lemma 1
Let \(\mathcal {T}_1, \ldots , \mathcal {T}_k\) be the connected components of \(T_{\mathcal {\overline{E}}}\) with roots \(\rho _1, \ldots , \rho _k\), respectively. If (O2) holds, then, \(\{L_{T_{\mathcal {\overline{E}}}}(\rho _1), \ldots , L_{T_{\mathcal {\overline{E}}}}(\rho _k)\}\) forms a partition of \(\mathbb {G}\).
Proof
Since \(L_{T_{\mathcal {\overline{E}}}}(\rho _i)\subseteq V(T)\), it suffices to show that \(L_{T_{\mathcal {\overline{E}}}}(\rho _i)\) does not contain vertices of \(V(T)\setminus \mathbb {G}\). Note, \(x\in L_{T_{\mathcal {\overline{E}}}}(\rho _i)\) with \(x\notin \mathbb {G}\) is only possible if all edges (x, y) are removed.
Let \(x\in V\) with \(t(x) = \triangle\) such that all edges (x, y) are removed. Thus, all such edges (x, y) are contained in \(\mathcal {E}\). Therefore, every edge of the form (x, y) is a transfer edge; a contradiction to (O2). \(\square\)
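Lemma 1 can be checked on toy data. The sketch below uses a hypothetical gene tree and hypothetical function names; it collects, for every component of the forest, its childless vertices and tests whether these sets partition the gene set. The second call deliberately violates (O2) by turning every edge below the HGT vertex into a transfer edge.

```python
def forest_leaf_sets(parent, transfer):
    """Map each root of the forest to the childless vertices of its component."""
    fparent = {v: (None if (p, v) in transfer else p) for v, p in parent.items()}
    has_child = {p for p in fparent.values() if p is not None}
    sets = {}
    for v in fparent:
        root = v
        while fparent[root] is not None:
            root = fparent[root]
        sets.setdefault(root, set())
        if v not in has_child:
            sets[root].add(v)
    return sets

def leaf_sets_partition_genes(parent, transfer, genes):
    """The sets are disjoint by construction (each vertex has one root), so it
    remains to check that none is empty and that their union is the gene set."""
    sets = list(forest_leaf_sets(parent, transfer).values())
    return all(sets) and set().union(*sets) == set(genes)

# Hypothetical gene tree: HGT vertex x with children y and c; y has leaves a, b.
parent = {"x": None, "y": "x", "c": "x", "a": "y", "b": "y"}
genes = {"a", "b", "c"}

with_o2 = leaf_sets_partition_genes(parent, {("x", "c")}, genes)
# Both edges below x are transfer edges: x becomes an isolated non-gene "leaf".
without_o2 = leaf_sets_partition_genes(parent, {("x", "c"), ("x", "y")}, genes)
```

The failing case mirrors the proof: an interior vertex can only appear as a leaf of the forest if all of its outgoing edges were removed.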

(\({\varvec{\Sigma 1}}\)) If \(t(x)=\bullet\), then there are distinct children v, w of x in T such that \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))\cap \sigma (L_{T_{\mathcal {\overline{E}}}}(w)) = \emptyset\).

(\({\varvec{\Sigma 2}}\)) If \((v,w) \in \mathcal {E}\), then \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))\cap \sigma (L_{T_{\mathcal {\overline{E}}}}(w)) = \emptyset\).
Now consider a transfer edge \((v,w) \in \mathcal {E}\), i.e., \(t(v)=\triangle\). Then \(T_{\mathcal {\overline{E}}}(v)\) and \(T_{\mathcal {\overline{E}}}(w)\) are subtrees of distinct connected components of \(T_{\mathcal {\overline{E}}}\). Since HGT amounts to the transfer of genetic material across distinct species, the genes v and w must be contained in distinct species X and Y, respectively. Since no genetic material is transferred between contemporary species \(X'\) and \(Y'\) in \(T_{\mathcal {\overline{E}}}\), where \(X'\) and \(Y'\) are descendants of X and Y, respectively, we derive (\({\varvec{\Sigma 2}}\)).
Proposition 1
Conditions (O1)–(O3) imply (\({\varvec{\Sigma 1}}\)) and (\({\varvec{\Sigma 2}}\)).
Proof
Since (O2) is satisfied we can apply Lemma 1 and conclude that neither \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))=\emptyset\) nor \(\sigma (L_{T_{\mathcal {\overline{E}}}}(w))=\emptyset.\) Let \(x \in V(T)\) with \(t(x)=\bullet.\) By Condition (O1) x has (at least two) children. Moreover, (O3) implies that there are (at least) two children v and w in T that are contained in distinct species V and W that are incomparable in S. Note, the edges (x, v) and (x, w) remain in \(T_{\mathcal {\overline{E}}}\), since only transfer edges are removed. Since no transfer is contained in \(T_{\mathcal {\overline{E}}}\), the genetic material v and w of V and W, respectively, is always vertically transmitted. Therefore, for any leaf \(v'\in L_{T_{\mathcal {\overline{E}}}}(v)\) we have \(\sigma (v')\preceq _S V\) and for any leaf \(w'\in L_{T_{\mathcal {\overline{E}}}}(w)\) we have \(\sigma (w')\preceq _S W\) in S. Assume now for contradiction, that \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))\cap \sigma (L_{T_{\mathcal {\overline{E}}}}(w)) \ne \emptyset\). Let \(z_1\in L_{T_{\mathcal {\overline{E}}}}(v)\) and \(z_2\in L_{T_{\mathcal {\overline{E}}}}(w)\) with \(\sigma (z_1) = \sigma (z_2) = Z\). Since \(Z\preceq _S V,W\) and S is a tree, the species V and W must be comparable in S; a contradiction to (O3). Hence, Condition (\({\varvec{\Sigma 1}}\)) is satisfied.
To see (\({\varvec{\Sigma 2}}\)), note that since (O2) is satisfied we can apply Lemma 1 and conclude that neither \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))=\emptyset\) nor \(\sigma (L_{T_{\mathcal {\overline{E}}}}(w))=\emptyset\). Let \((v,w) \in \mathcal {E}\). By (O3) the species V and W that contain v and w, respectively, are incomparable in S. Now we can argue along the same lines as in the proof for (\({\varvec{\Sigma 1}}\)) to conclude that \(\sigma (L_{T_{\mathcal {\overline{E}}}}(v))\cap \sigma (L_{T_{\mathcal {\overline{E}}}}(w)) = \emptyset\). \(\square\)
From here on we simplify the notation a bit and write \(\sigma _{T_{\mathcal {\overline{E}}}}(u):=\sigma (L_{T_{\mathcal {\overline{E}}}}(u))\). We are aware of the fact that condition (O3) cannot be checked directly for a given event-labeled gene tree. In contrast, (\({\varvec{\Sigma 1}}\)) and (\({\varvec{\Sigma 2}}\)) are easily determined. Hence, in the remainder of this paper we consider the more general case, that is, gene trees that satisfy (O1), (O2), (\({\varvec{\Sigma 1}}\)), and (\({\varvec{\Sigma 2}}\)).
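To illustrate why (\({\varvec{\Sigma 1}}\)) and (\({\varvec{\Sigma 2}}\)) are easy to determine, here is a minimal sketch that tests both conditions directly from the gene tree, the transfer edges, and \(\sigma\). The data layout, the string labels, and the function names are assumptions; the sketch also assumes (O2), so that the leaves of the forest are exactly the genes.

```python
from itertools import combinations

def species_below(parent, transfer, sigma, x):
    """Species of all genes below x once transfer edges are removed."""
    species = set()
    for gene, sp in sigma.items():
        v = gene
        while v is not None and v != x:
            p = parent[v]
            v = None if (p, v) in transfer else p   # stop at a transfer edge
        if v == x:
            species.add(sp)
    return species

def satisfies_sigma1(parent, transfer, sigma, label):
    """Every speciation vertex has two children with disjoint species sets."""
    for x, lab in label.items():
        if lab != "speciation":
            continue
        kids = [v for v, p in parent.items() if p == x]
        if not any(species_below(parent, transfer, sigma, v)
                   .isdisjoint(species_below(parent, transfer, sigma, w))
                   for v, w in combinations(kids, 2)):
            return False
    return True

def satisfies_sigma2(parent, transfer, sigma):
    """Both ends of every transfer edge see disjoint species sets."""
    return all(species_below(parent, transfer, sigma, v)
               .isdisjoint(species_below(parent, transfer, sigma, w))
               for v, w in transfer)

# Hypothetical gene tree: HGT vertex x, speciation y, transfer edge (x, c).
parent = {"x": None, "y": "x", "c": "x", "a": "y", "b": "y"}
transfer = {("x", "c")}
label = {"x": "hgt", "y": "speciation", "a": "leaf", "b": "leaf", "c": "leaf"}
sigma = {"a": "A", "b": "B", "c": "C"}
```

Changing the species of a single gene is enough to break either condition, for instance placing `b` in species `A` (violating the first) or `c` in species `A` (violating the second).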
DTL-scenario and time-consistent reconciliation maps
In case that the event-labeling of T is unknown, but the gene tree T and a species tree S are given, the authors in [20, 24] provide an axiom set, called DTL-scenario, to reconcile T with S. This reconciliation is then used to infer the event-labeling t of T. Instead of defining a DTL-scenario as an octuple [20, 24], we use here the notation established above:
Definition 1

(I) For each leaf \(u\in \mathbb {G},\) \(\gamma (u) = \sigma (u)\).

(II) If \(u\in V(T)\setminus \mathbb {G}\) with children v, w, then
 (a) \(\gamma (u)\) is not a proper descendant of \(\gamma (v)\) or \(\gamma (w)\), and
 (b) at least one of \(\gamma (v)\) or \(\gamma (w)\) is a descendant of \(\gamma (u).\)

(III) (u, v) is a transfer edge if and only if \(\gamma (u)\) and \(\gamma (v)\) are incomparable.

(IV) If \(u\in V(T)\setminus \mathbb {G}\) with children v, w, then
 (a) \(t(u)=\triangle\) if and only if either (u, v) or (u, w) is a transfer edge,
 (b) if \(t(u)=\bullet\), then \(\gamma (u) = {\text {lca}}_S(\gamma (v),\gamma (w))\) and \(\gamma (v),\gamma (w)\) are incomparable,
 (c) if \(t(u)=\square\), then \(\gamma (u)\succeq {\text {lca}}_S(\gamma (v),\gamma (w)).\)
DTL-scenarios are explicitly defined for fully resolved binary gene and species trees. Indeed, Fig. 1 (right) shows a valid reconciliation between a gene tree T and a species tree S that is not consistent with a DTL-scenario. To see this, let us call the duplication vertex v. The vertex v and the leaf a are both children of the speciation vertex \(\rho _T\). Condition (IVb) implies that \(\gamma (a)\) and \(\gamma (v)\) must be incomparable. However, this is not possible since \(\gamma (v)\succeq _S {\text {lca}}_S(B,C)\) (Cond. (IVc)) and \(\gamma (a)=A\) (Cond. (I)) and therefore, \(\gamma (v)\succeq _S {\text {lca}}_S(B,C) = {\text {lca}}_S(A,B,C) \succ _S \gamma (a)\).
The problem of reconciliations between gene trees and species tree is formalized in terms of so-called DTL-scenarios in the literature [20, 24]. This framework, however, usually assumes that the event labels t on T are unknown, while a species tree S is given. The “usual” DTL axioms, furthermore, explicitly refer to binary, fully resolved gene and species trees. We therefore use a different axiom set here that is a natural generalization of the framework introduced in [16] for the HGT-free case:
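Definition 2
(Reconciliation map) Let \(S=(W,F)\) be a species tree on \(\mathbb {S}\) and \((T;t,\sigma )\) an event-labeled gene tree with \(T=(V,E)\) on \(\mathbb {G}\). A map \(\mu :V\rightarrow W\cup F\) is a reconciliation map from \((T;t,\sigma )\) to S if the following constraints are satisfied for all \(v\in V\):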

(M1) Leaf Constraint. If \(t(v)=\odot\), then \(\mu (v)=\sigma (v)\).

(M2) Event Constraint.
 (i) If \(t(v)=\bullet\), then \(\mu (v) = {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\).
 (ii) If \(t(v) \in \{\square , \triangle \}\), then \(\mu (v)\in F\).
 (iii) If \(t(v)=\triangle\) and \((v,w)\in \mathcal {E}\), then \(\mu (v)\) and \(\mu (w)\) are incomparable in S.

(M3) Ancestor Constraint.

Suppose \(v,w\in V\) with \(v\prec _{T_{\mathcal {\overline{E}}}} w\).

(i) If \(t(v),t(w)\in \{\square , \triangle \}\), then \(\mu (v)\preceq _S \mu (w)\),

(ii) Otherwise, i.e., at least one of t(v) and t(w) is a speciation \(\bullet\), \(\mu (v)\prec _S\mu (w)\).

For the special case that gene and species trees are binary, Definition 2 is equivalent to the definition of a DTL-scenario, which is summarized in the following
Theorem 1
For a binary gene tree \((T;t,\sigma )\) and a binary species tree S there is a DTL-scenario if and only if there is a reconciliation \(\mu\) for \((T;t,\sigma )\) and S.
Condition (M1) ensures that each leaf of T, i.e., an extant gene in \(\mathbb {G}\), is mapped to the species in which it resides. Conditions (M2.i) and (M2.ii) ensure that each inner vertex of T is either mapped to a vertex or an edge in S such that a vertex of T is mapped to an interior vertex of S if and only if it is a speciation vertex. Condition (M2.i) might seem overly restrictive, an issue to which we will return below. Condition (M2.iii) reflects condition (O3) and maps the two vertices of a transfer edge in such a way that they are incomparable in the species tree, since an HGT occurs between distinct (coexisting) species. It becomes void in the absence of HGT; thus Definition 2 reduces to the definition of reconciliation maps given in [16] for the HGT-free case. Importantly, condition (M3) refers only to the connected components of \(T_{\mathcal {\overline{E}}}\) since comparability w.r.t. \(\prec _{T_{\mathcal {\overline{E}}}}\) implies that the path between v and w in T does not contain transfer edges. It ensures that the ancestor order \(\preceq _T\) of T is preserved along all paths that do not contain transfer edges.
We will make use of the following bound that effectively restricts how close to the leaves the image of a vertex in the gene tree can be located.
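The constraints (M1)–(M3) can be turned into a direct checker. The following is a minimal sketch under several assumptions: trees are child-to-parent dictionaries, edges of S are `(parent, child)` tuples, event labels are plain strings, (O2) holds so that every speciation vertex has genes below it in the forest, and the order on vertices and edges of S follows the extension of the ancestor order given in the preliminaries. All names and the toy data are hypothetical.

```python
EDGE_EVENTS = ("duplication", "hgt")

def leq_vertex(parent, a, b):
    """True if a is a descendant of (or equal to) b in the tree `parent`."""
    while a is not None:
        if a == b:
            return True
        a = parent[a]
    return False

def leq(sparent, a, b):
    """Extended order on vertices and edges of S."""
    if a == b:
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        return leq_vertex(sparent, a[1], b[1])
    if isinstance(b, tuple):                      # vertex below an edge
        return leq_vertex(sparent, a, b[1])
    if isinstance(a, tuple):                      # edge below a vertex
        return leq_vertex(sparent, a[0], b)
    return leq_vertex(sparent, a, b)

def lca(sparent, vertices):
    vertices = list(vertices)
    x = vertices[0]
    while not all(leq_vertex(sparent, v, x) for v in vertices):
        x = sparent[x]                            # climb until x is above all
    return x

def is_reconciliation(tparent, transfer, label, sigma, sparent, mu):
    fparent = {v: (None if (p, v) in transfer else p) for v, p in tparent.items()}
    for v, lab in label.items():
        if lab == "leaf" and mu[v] != sigma[v]:                       # (M1)
            return False
        if lab == "speciation":                                       # (M2.i)
            species = {sigma[g] for g in sigma if leq_vertex(fparent, g, v)}
            if mu[v] != lca(sparent, species):
                return False
        if lab in EDGE_EVENTS:                                        # (M2.ii)
            e = mu[v]
            if not (isinstance(e, tuple) and sparent.get(e[1]) == e[0]):
                return False
    for v, w in transfer:                                             # (M2.iii)
        if leq(sparent, mu[v], mu[w]) or leq(sparent, mu[w], mu[v]):
            return False
    for v in tparent:                                                 # (M3)
        w = fparent[v]
        while w is not None:                      # all forest ancestors w of v
            if not leq(sparent, mu[v], mu[w]):
                return False
            strict = not (label[v] in EDGE_EVENTS and label[w] in EDGE_EVENTS)
            if strict and mu[v] == mu[w]:
                return False
            w = fparent[w]
    return True

# Hypothetical augmented species tree and event-labeled gene tree.
sparent = {"rho": None, "r": "rho", "AB": "r", "C": "r", "A": "AB", "B": "AB"}
tparent = {"x": None, "y": "x", "c": "x", "a": "y", "b": "y"}
transfer = {("x", "c")}
label = {"x": "hgt", "y": "speciation", "a": "leaf", "b": "leaf", "c": "leaf"}
sigma = {"a": "A", "b": "B", "c": "C"}
mu_ok = {"a": "A", "b": "B", "c": "C", "y": "AB", "x": ("r", "AB")}
mu_bad = dict(mu_ok, x=("rho", "r"))   # image of x is now comparable with C
```

In `mu_bad` the HGT vertex is placed on the edge above `r`, which lies above species `C`; the two ends of the transfer edge are then comparable and (M2.iii) fails.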
Lemma 2
If \(\mu : (T;t,\sigma )\rightarrow S\) satisfies (M1) and (M3), then \(\mu (u)\succeq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) for any \(u\in V(T).\)
Proof
If u is a leaf, then by Condition (M1) \(\mu (u)=\sigma (u)\) and we are done. Thus, let u be an interior vertex. By Condition (M3), \(z \preceq _S\mu (u)\) for all \(z\in \sigma _{T_{\mathcal {\overline{E}}}}(u)\). Hence, if \(\mu (u)\prec _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) or if \(\mu (u)\) and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) are incomparable in S, then there is a \(z\in \sigma _{T_{\mathcal {\overline{E}}}}(u)\) such that z and \(\mu (u)\) are incomparable; contradicting (M3). \(\square\)
Condition (M2.i) implies in particular the weaker property “(M2.i’) if \(t(v)=\bullet\) then \(\mu (v)\in W\)”. In the light of Lemma 2, \(\mu (v)={\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\) is the lowest possible choice for the image of a speciation vertex. Clearly, this restricts the possibly exponentially many reconciliation maps for which \(\mu (v)\succ _S{\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\) for a speciation vertex v to only those that satisfy (M2.i). However, the latter is justified by the observation that if v is a speciation vertex with children u, w, then there is only one unique piece of information given by the gene tree to place \(\mu (v)\), that is, the unique vertex x in S with children y, z such that \(\sigma _{T_{\mathcal {\overline{E}}}}(u) \subseteq L_S(y)\) and \(\sigma _{T_{\mathcal {\overline{E}}}}(w) \subseteq L_S(z).\) The latter argument easily generalizes to the case that v has more than two children in T. Moreover, any observable speciation node \(v'\succ _T v\) closer to the root than v must be mapped to a node ancestral to \(\mu (v)\) due to (M3.ii). Therefore, we require \(\mu (v) =x = {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\) here.
If S is a species tree for the gene tree \((T;t,\sigma )\) then there is no freedom in the construction of a reconciliation map \(\mu\) on the set \(\{x\in V(T)\mid t(x)\in \{\bullet , \odot \}\}\). The duplication and HGT vertices of T, however, can be placed differently. As a consequence there is a possibly exponentially large set of reconciliation maps from \((T;t,\sigma )\) to S.
Definition 3
(Time Map) The map \(\tau _T: V(T) \rightarrow \mathbb {R}\) is a time map for the rooted tree T if \(x\prec _T y\) implies \(\tau _T(x)>\tau _T(y)\) for all \(x,y\in V(T)\).
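Checking Definition 3 only requires looking at parent-child pairs, since the inequality along every edge implies it along every path. A minimal sketch, with a hypothetical tree and hypothetical time stamps:

```python
def is_time_map(parent, tau):
    """x <_T y must imply tau[x] > tau[y]; by transitivity it suffices
    to compare every vertex with its parent."""
    return all(tau[v] > tau[p] for v, p in parent.items() if p is not None)

# Hypothetical tree (child -> parent) and two candidate time stamp assignments.
parent = {"r": None, "u": "r", "c": "r", "a": "u", "b": "u"}
tau_good = {"r": 0.0, "u": 1.0, "a": 2.0, "b": 2.0, "c": 2.0}
tau_bad = {"r": 0.0, "u": 1.0, "a": 0.5, "b": 2.0, "c": 2.0}   # a is older than u
```

Note that time increases from the root towards the leaves in this convention, so descendants carry larger values than their ancestors.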
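Definition 4
(Time-consistent reconciliation map) A reconciliation map \(\mu\) from \((T;t,\sigma )\) to S is time-consistent if there are time maps \(\tau _T\) for T and \(\tau _S\) for S such that the following conditions hold for all \(u\in V(T)\):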

(C1) If \(t(u) \in \{\bullet , \odot \}\), then \(\tau _T(u) = \tau _S(\mu (u))\).

(C2) If \(t(u)\in \{\square ,\triangle \}\) and, thus \(\mu (u)=(x,y)\in E(S)\), then \(\tau _S(y)>\tau _T(u)>\tau _S(x)\).
Condition (C1) is used to identify the timepoints of speciation vertices and leaves u in the gene tree with the timepoints of their respective images \(\mu (u)\) in the species trees. In particular, all genes u that reside in the same species must be assigned the same time point \(\tau _T(u)=\tau _S(\sigma (u))\). Analogously, all speciation vertices in T that are mapped to the same speciation in S are assigned matching time stamps, i.e., if \(t(u)=t(v)=\bullet\) and \(\mu (u)=\mu (v)\) then \(\tau _T(u)=\tau _T(v)=\tau _S(\mu (u))\).
To understand the intuition behind (C2) consider a duplication or HGT vertex u. By construction of \(\mu\) it is mapped to an edge of S, i.e., \(\mu (u)=(x,y)\) in S. The time point of u must thus lie between the time points of x and y. Now suppose \((u,v)\in \mathcal {E}\) is a transfer edge. By construction, u signifies the transfer event itself. The node v, however, refers to the next (visible) event in the gene tree. Thus \(\tau _T(u)<\tau _T(v)\). In particular, \(\tau _T(v)\) must not be misinterpreted as the time of introducing the HGT-duplicate into the new lineage. While this time of course exists (and in our model coincides with the timing of the transfer event) it is not marked by a visible event in the new lineage, and hence there is no corresponding node in the gene tree T.
W.l.o.g. we fix the time axis so that \(\tau _T(\rho _T) = 0\) and \(\tau _S(\rho _S) = -1\). Thus, \(\tau _S(\rho _S)< \tau _T(\rho _T) < \tau _T(u)\) for all \(u\in V(T)\setminus \{\rho _T\}\).
Clearly, a necessary condition to have biologically feasible gene trees is the existence of a reconciliation map \(\mu\). However, not all reconciliation maps are time-consistent, see Fig. 2.
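Conditions (C1) and (C2) can be verified mechanically once candidate time maps are given. The sketch below assumes string event labels, edges of S stored as `(parent, child)` tuples, and hypothetical toy data; it checks a given pair of time maps and does not search for one.

```python
def is_time_map(parent, tau):
    """Descendants must carry strictly larger time stamps than their parents."""
    return all(tau[v] > tau[p] for v, p in parent.items() if p is not None)

def satisfies_c1_c2(label, mu, tau_T, tau_S):
    for u, lab in label.items():
        if lab in ("speciation", "leaf"):
            if tau_T[u] != tau_S[mu[u]]:                  # (C1)
                return False
        else:                                             # duplication or HGT
            x, y = mu[u]                                  # edge (x, y) of S
            if not (tau_S[y] > tau_T[u] > tau_S[x]):      # (C2)
                return False
    return True

# Hypothetical species tree, gene tree, reconciliation map and time maps.
sparent = {"rho": None, "r": "rho", "AB": "r", "C": "r", "A": "AB", "B": "AB"}
tparent = {"x": None, "y": "x", "c": "x", "a": "y", "b": "y"}
label = {"x": "hgt", "y": "speciation", "a": "leaf", "b": "leaf", "c": "leaf"}
mu = {"a": "A", "b": "B", "c": "C", "y": "AB", "x": ("r", "AB")}
tau_S = {"rho": -1, "r": -0.5, "AB": 1, "A": 2, "B": 2, "C": 2}
tau_T = {"x": 0, "y": 1, "a": 2, "b": 2, "c": 2}

ok = (is_time_map(sparent, tau_S) and is_time_map(tparent, tau_T)
      and satisfies_c1_c2(label, mu, tau_T, tau_S))
# Moving the transfer event after the speciation y breaks both the time map
# property on T and condition (C2) for x.
late = dict(tau_T, x=1.5)
```

Here the HGT vertex `x` sits at time 0 on the edge `(r, AB)`, whose end points carry the time stamps -0.5 and 1.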
Definition 5
An event-labeled gene tree \((T;t,\sigma )\) is biologically feasible if there exists a time-consistent reconciliation map from \((T;t,\sigma )\) to some species tree S.
Theorem 2

(D1) If \(\mu (u) = x\), for some \(u\in V(T)\), then \(\tau _T(u) = \tau _S(x)\).

(D2) If \(x \preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) for some \(u \in V(T)\) with \(t(u)\in \{\square , \triangle \}\), then \(\tau _S(x) > \tau _T(u)\).

(D3) If \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)\cup \sigma _{T_{\mathcal {\overline{E}}}}(v)) \preceq _S x\) for some \((u, v) \in \mathcal {E}\), then \(\tau _T(u) > \tau _S(x)\).
Proof
In what follows, x and u denote vertices in S and T, respectively.
Assume that there is a time-consistent reconciliation map \(\mu\) from \((T;t, \sigma )\) to S, and thus two time maps \(\tau _S\) and \(\tau _T\) for S and T, respectively, that satisfy (C1) and (C2).
To see (D1), observe that if \(\mu (u) = x\in V(S)\), then (M1) and (M2) imply that \(t(u) \in \{ \bullet , \odot \}\). Now apply (C1).
To prove the converse, assume that there exists a reconciliation map \(\mu\) that satisfies (D1)–(D3) for some time maps \(\tau _T\) and \(\tau _S\). In the following we will make use of \(\tau _S\) and \(\tau _T\) to construct a time-consistent reconciliation map \(\mu '\).
First we define “anchor points” by \(\mu '(v) =\mu (v)\) for all \(v \in V(T)\) with \(t(v) \in \{ \bullet , \odot \}\). Condition (D1) implies \(\tau _T(v) = \tau _S(\mu (v))\) for these vertices, and therefore \(\mu '\) satisfies (C1).
The next step will be to show that for each vertex \(u\in V(T)\) with \(t(u) \in \{ \square , \triangle \}\) there is a unique edge (x, y) along the path from \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) to \(\rho _S\) with \(\tau _S(x)<\tau _T(u)<\tau _S(y)\). We set \(\mu '(u) =(x,y)\) for these points. In the final step we will show that \(\mu '\) is a valid reconciliation map.
Consider the unique path \(\mathcal {P}_u\) from \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) to \(\rho _S\). By construction, \(\tau _S(\rho _S) <\tau _T(\rho _T) \le \tau _T(u)\) and by Condition (D2) we have \(\tau _T(u) < \tau _S({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)))\). Since \(\tau _S\) is a time map for S, every edge \((x,y)\in E(S)\) satisfies \(\tau _S(x)<\tau _S(y)\). Therefore, there is a unique edge \((x_u,y_u)\in E(S)\) along \(\mathcal {P}_u\) such that either \(\tau _S(x_u)<\tau _T(u)<\tau _S(y_u)\), \(\tau _S(x_u)=\tau _T(u)<\tau _S(y_u)\), or \(\tau _S(x_u)<\tau _T(u)=\tau _S(y_u)\). The addition of a sufficiently small perturbation \(\epsilon _u\) to \(\tau _T(u)\) does not violate the conditions for \(\tau _T\) being a time map for T. Clearly \(\epsilon _u\) can be chosen to break the equalities in the latter two cases in such a way that \(\tau _S(x_u)<\tau _T(u)<\tau _S(y_u)\) for each vertex \(u\in V(T)\) with \(t(u) \in \{ \square , \triangle \}\). We then continue with the perturbed version of \(\tau _T\) and set \(\mu '(u) =(x_u,y_u)\). By construction, \(\mu '\) satisfies (C2).
It remains to show that \(\mu '\) is a valid reconciliation map from \((T;t,\sigma )\) to S. Again, let \(\mathcal {P}_u\) denote the unique path from \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) to \(\rho _S\) for any \(u\in V(T)\).
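The placement step of this construction can be sketched as a walk from the lca towards the root that stops at the edge whose end points bracket the time stamp of u. The function name, the toy species tree, and the time stamps are hypothetical; the sketch returns `None` when the time stamp coincides with a vertex on the path, which is exactly the case that the perturbation argument above resolves.

```python
def place_on_edge(sparent, tau_S, start, t):
    """Walk from `start` towards the root of S and return the edge (x, y)
    with tau_S[x] < t < tau_S[y], or None if no edge brackets t strictly."""
    y = start
    while sparent[y] is not None:
        x = sparent[y]
        if tau_S[x] < t < tau_S[y]:
            return (x, y)
        y = x
    return None

# Hypothetical augmented species tree (child -> parent) with time stamps.
sparent = {"rho": None, "r": "rho", "AB": "r", "C": "r", "A": "AB", "B": "AB"}
tau_S = {"rho": -1, "r": -0.5, "AB": 1, "A": 2, "B": 2, "C": 2}

# A duplication or HGT vertex u whose genes below it reside in A and B, so the
# walk starts at lca_S({A, B}) = AB.
edge_for_0 = place_on_edge(sparent, tau_S, "AB", 0)
edge_for_early = place_on_edge(sparent, tau_S, "AB", -0.75)
edge_for_tie = place_on_edge(sparent, tau_S, "AB", -0.5)   # equals tau_S[r]
```

Because time stamps strictly decrease along the path to the root, at most one edge can bracket a given value, which reflects the uniqueness claimed in the proof.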
By construction, Conditions (M1), (M2i), (M2ii) are satisfied. To check condition (M2iii), assume \((u,v)\in \mathcal {E}\). The original map \(\mu\) is a valid reconciliation map, and thus, Lemma 2 implies that \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\prec _S \mu (u)\) and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\preceq _S \mu (v)\). Since \(\mu (u)\) and \(\mu (v)\) are incomparable in S and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)\cup \sigma _{T_{\mathcal {\overline{E}}}}(v))\) lies on both paths \(\mathcal {P}_u\) and \(\mathcal {P}_v\) we have \(\mu (u), \mu (v) \preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)\cup \sigma _{T_{\mathcal {\overline{E}}}}(v))=: x\). In particular, \(x\ne {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) and \(x\ne {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\).
Conditions (D1) and (D2) imply that \(\tau _S(x)<\tau _T(u)<\tau _S({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)))\) and \(\tau _S(x)< \tau _T(v) \le \tau _S({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v)))\). By construction of \(\mu '\), the vertex u is mapped to a unique edge \(e_u = (x_u,y_u)\) and v is mapped either to \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\ne x\) or to the unique edge \(e_v=(x_v,y_v)\), respectively. In particular, \(\mu '(u)\) lies on the path \(\mathcal {P}'\) from x to \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) and \(\mu '(v)\) lies one the path \(\mathcal {P}''\) from x to \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\). The paths \(\mathcal {P}'\) and \(\mathcal {P}''\) are edgedisjoint and have x as their only common vertex. Hence, \(\mu '(u)\) and \(\mu '(v)\) are incomparable in S, and (M2iii) is satisfied.
In order to show (M3), assume that \(u\prec _{T_{\mathcal {\overline{E}}}} v\). Since \(u\prec _{T_{\mathcal {\overline{E}}}} v\), we have \(\sigma _{T_{\mathcal {\overline{E}}}}(u)\subseteq \sigma _{T_{\mathcal {\overline{E}}}}(v)\). Hence, \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) \preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\preceq _S\rho _S\). In other words, \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\) lies on the path \(\mathcal {P}_u\) and thus, \(\mathcal {P}_v\) is a subpath of \(\mathcal {P}_u\). By construction of \(\mu '\), \(\mu '(u)\) and \(\mu '(v)\) are comparable in S. Moreover, since \(\tau _T(u)>\tau _T(v)\) and by construction of \(\mu '\), it immediately follows that \(\mu '(u) \preceq _S\mu '(v)\).
It is now an easy task to verify that (M3) is fulfilled by considering the distinct event labels in (M3i) and (M3ii), which we leave to the reader. \(\square\)
Interestingly, the existence of a time-consistent reconciliation map from a gene tree T to a species tree S can be characterized solely in terms of a time map defined on T.
Theorem 3

(T1) If \(t(u) = t(v) \in \{ \bullet ,\odot \}\) then

(a) If \(\mu (u)=\mu (v)\), then \(\tau _T(u) = \tau _T(v)\).

(b) If \(\mu (u)\prec _S \mu (v)\), then \(\tau _T(u) > \tau _T(v)\).


(T2) If \(t(u)\in \{\bullet , \odot \}\), \(t(v)\in \{\square ,\triangle \}\) and \(\mu (u)\preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\), then \(\tau _T(u)>\tau _T(v)\).

(T3) If \((u,v) \in \mathcal {E}\) and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u) \cup \sigma _{T_{\mathcal {\overline{E}}}}(v)) \preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(w))\) for some \(w\in V(T)\), then \(\tau _T(u) > \tau _T(w).\)
Proof
If \(x \in V(S)\) with \(a\in \mu ^{-1}(x)\), then (T2) implies (D2) [by (D1) and setting \(u=a\) in (T2)] and (T3) implies (D3) [by (D1) and setting \(w=a\) in (T3)]. Thus, (D2) and (D3) are satisfied for all \(x \in V(S)\) with \(\mu ^{-1}(x) \ne \emptyset\).
Using our choices \(\tau _T(\rho _T) = 0\) and \(\tau _S(\rho _S) = -1\) for the augmented root of S, we must have \(\mu ^{-1}(\rho _S) = \emptyset\). Thus, \(\rho _S \succ _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v))\) for any \(v\in V(T)\). Hence, (D2) is trivially satisfied for \(\rho _S\). Moreover, \(\tau _T(\rho _T) = 0\) implies \(\tau _T(u) > \tau _S(\rho _S)\) for any \(u\in V(T)\). Hence, (D3) is always satisfied for \(\rho _S\).
In summary, Conditions (D1)–(D3) are met for any vertex \(x\in V(S)\) that up to this point has been assigned a value, i.e., \(\tau _S(x)\ne *\).
From the algorithmic point of view, it is desirable to design methods that make it possible to check whether a reconciliation map is time-consistent. Moreover, given a gene tree T and a species tree S, we wish to decide whether there exists a time-consistent reconciliation map \(\mu\) and, if so, to construct \(\mu\).
To this end, observe that the constraints given by Definition 3, Theorem 2 (D2)–(D3), and Definition 4 (C2) can be expressed as a total order on \(V(S)\cup V(T)\), while the constraints (C1) and (D1) together suggest that we can treat the preimage of any vertex in the species tree as a “single vertex”. In fact, we can create an auxiliary graph in order to answer questions concerning time-consistent reconciliation maps.
Definition 6

(A1) For each \((u,v)\in E(T)\) we have \((u',v') \in E(A)\), where
$$\begin{aligned} u' = {\left\{ \begin{array}{ll} \mu (u) &{} \text {if } t(u) \in \{\odot , \bullet \} \\ u &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
and
$$\begin{aligned} v' = {\left\{ \begin{array}{ll} \mu (v) &{} \text {if } t(v)\in \{ \odot , \bullet \} \\ v &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

(A2) For each \((x,y)\in E(S)\) we have \((x,y) \in E(A)\).

(A3) For each \(u \in V(T)\) with \(t(u) \in \{ \square , \triangle \}\) we have \((u, {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))) \in E(A)\).

(A4) For each \((u,v) \in \mathcal {E}\) we have \(({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)\cup \sigma _{T_{\mathcal {\overline{E}}}}(v)),u) \in E(A)\).

(A5) For each \(u\in V(T)\) with \(t(u) \in \{ \triangle , \square \}\) and \(\mu (u) = (x,y) \in E(S)\) we have \((x,u)\in E(A)\) and \((u,y)\in E(A)\).
We note that the edge sets defined by conditions (A1) through (A5) are not necessarily disjoint. The mapping of vertices in T to edges in S is considered only in condition (A5). The following two theorems are the key results of this contribution.
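The edge rules (A1)–(A5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation. It assumes, as the proofs of Theorems 4 and 5 suggest, that \(A_1\) collects the edges from (A1), (A2) and (A5), and that \(A_2\) collects the edges from (A1)–(A4). All function names, the string event labels, and the dictionary-based tree encoding are hypothetical choices made for this sketch; the lca values are assumed to be precomputed and passed in.

```python
SPECIATION, LEAF, DUPLICATION, TRANSFER = "speciation", "leaf", "duplication", "transfer"


def aux_vertex(u, event, mu):
    """(A1): speciation vertices and leaves are replaced by their image in S."""
    return mu[u] if event[u] in (SPECIATION, LEAF) else u


def build_a2(gene_edges, species_edges, transfer_edges, event, mu, lca_of, lca_of_pair):
    """Edge set of A_2, assumed to consist of the edges from (A1)-(A4).

    lca_of[u]           stands for lca_S(sigma(u)) in the transfer-free forest,
    lca_of_pair[(u, v)] stands for lca_S(sigma(u) | sigma(v)) for a transfer edge (u, v).
    Vertex names of T and S are assumed to be disjoint.
    """
    edges = set()
    for u, v in gene_edges:                          # (A1)
        edges.add((aux_vertex(u, event, mu), aux_vertex(v, event, mu)))
    edges.update(species_edges)                      # (A2)
    for u, label in event.items():                   # (A3)
        if label in (DUPLICATION, TRANSFER):
            edges.add((u, lca_of[u]))
    for u, v in transfer_edges:                      # (A4)
        edges.add((lca_of_pair[(u, v)], u))
    return edges


def build_a1(gene_edges, species_edges, event, mu, mu_edge):
    """Edge set of A_1, assumed to consist of the edges from (A1), (A2) and (A5).

    mu_edge[u] = (x, y) is the edge of S onto which a duplication or
    transfer vertex u is mapped.
    """
    edges = set()
    for u, v in gene_edges:                          # (A1)
        edges.add((aux_vertex(u, event, mu), aux_vertex(v, event, mu)))
    edges.update(species_edges)                      # (A2)
    for u, (x, y) in mu_edge.items():                # (A5)
        edges.add((x, u))
        edges.add((u, y))
    return edges
```

Since the result is a set, edges produced by more than one rule are stored once, which mirrors the remark that the edge sets defined by (A1)–(A5) need not be disjoint.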
Theorem 4
Let \(\mu\) be a reconciliation map from \((T;t,\sigma )\) to S. The map \(\mu\) is time-consistent if and only if the auxiliary graph \(A_1\) is a directed acyclic graph (DAG).
Proof
Assume that \(\mu\) is time-consistent. By Theorem 2, there are two time maps \(\tau _T\) and \(\tau _S\) satisfying (C1) and (C2). Let \(\tau = \tau _T\cup \tau _S\) be the resulting map \(V(T)\cup V(S)\rightarrow \mathbb {R}\). Let \(A'\) be the directed graph with \(V(A') = V(S) \cup V(T)\) and set for all \(x,y\in V(A')\): \((x,y)\in E(A')\) if and only if \(\tau (x) < \tau (y)\). By construction, \(A'\) is a DAG since \(\tau\) provides a topological order on \(A'\) [25].
We continue to show that \(A'\) contains all edges of \(A_1\).
To see that (A1) is satisfied for \(E(A')\) let \((u,v)\in E(T)\). Note, \(\tau (v)>\tau (u)\), since \(\tau _T\) is a time map for T and by construction of \(\tau\). Hence, all edges \((u,v)\in E(T)\) are also contained in \(A'\), independent of the respective event labels t(u), t(v). Moreover, if u or v is a speciation vertex or a leaf, then (C1) implies that \(\tau _S(\mu (u)) = \tau _T(u) < \tau _T(v)\) or \(\tau _T(u) < \tau _T(v) = \tau _S(\mu (v))\). By construction of \(\tau\), all edges satisfying (A1) are contained in \(E(A')\). Since \(\tau _S\) is a time map for S, all edges as in (A2) are contained in \(E(A')\). Finally, (C2) implies that all edges satisfying (A5) are contained in \(E(A')\).
Although \(A'\) might have more edges than required by (A1), (A2) and (A5), the graph \(A_1\) is a subgraph of \(A'\). Since \(A'\) is a DAG, \(A_1\) is a DAG as well.
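Deciding whether an auxiliary graph is a DAG amounts to finding a topological order. The following sketch uses Kahn's algorithm (repeated removal of vertices with no incoming edge), which is a standard alternative to a depth first search and also runs in time linear in the number of vertices and edges. The function name and the edge-set input format are assumptions of this sketch.

```python
from collections import defaultdict, deque


def topological_order(edges):
    """Return the vertices in a topological order, or None if the graph has a cycle.

    `edges` is an iterable of directed pairs (x, y), read as "x must come before y".
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    vertices = set()
    for x, y in edges:
        successors[x].append(y)
        indegree[y] += 1
        vertices.update((x, y))

    # Start with all vertices that have no incoming edge.
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)

    # Vertices on a cycle never reach in-degree zero and are never emitted.
    return order if len(order) == len(vertices) else None
```

A return value of `None` corresponds to the case in which the auxiliary graph contains a cycle; otherwise the position of a vertex in the returned list can serve as a time stamp with pairwise distinct values.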
Theorem 5
Assume there is a reconciliation map \(\mu\) from \((T;t,\sigma )\) to S. There is a time-consistent reconciliation map, possibly different from \(\mu\), from \((T;t, \sigma )\) to S if and only if the auxiliary graph \(A_2\) (defined on \(\mu\)) is a DAG.
Proof
Let \(\mu\) be a reconciliation map for \((T;t,\sigma )\) and S and \(\mu '\) be a time-consistent reconciliation map for \((T;t,\sigma )\) and S. Let \(A_2\) and \(A'_2\) be the auxiliary graphs that satisfy Definition 6 (A1)–(A4) for \(\mu\) and \(\mu '\), respectively. Since \(\mu (u) = \mu '(u)\) for all \(u\in V(T)\) with \(t(u) \in \{\odot , \bullet \}\) and (A2)–(A4) do not rely on the explicit reconciliation map, it is easy to see that \(A_2=A'_2\).
Now we can reuse similar arguments as in the proof of Theorem 4. Assume there is a time-consistent reconciliation map from \((T;t, \sigma )\) to S. By Theorem 2, there are two time maps \(\tau _T\) and \(\tau _S\) satisfying (D1)–(D3). Let \(\tau\) and \(A'\) be defined as in the proof of Theorem 4.
Analogously to the proof of Theorem 4, we show that \(A'\) contains all edges of \(A_2\). Application of (D1) immediately implies that all edges satisfying (A1) and (A2) are contained in \(E(A')\). Condition (D2) yields \((u,{\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))) \in E(A')\), and (D3) implies \(({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)\cup \sigma _{T_{\mathcal {\overline{E}}}}(v)),u)\in E(A')\). We conclude by the same arguments as before that the graph \(A_2\) is a DAG.
For the converse, assume we are given the directed acyclic graph \(A_2\). As before, there is a topological order \(\tau\) on \(A_2\) such that \(\tau (x) < \tau (y)\) whenever \((x,y) \in E(A_2)\). The time maps \(\tau _T\) and \(\tau _S\) are given as in the proof of Theorem 1.
By construction, it follows that (D1) is satisfied. Again, by construction and the Properties (A1) and (A2), \(\tau _S\) and \(\tau _T\) are valid time maps for S and T, respectively.
Thus, \(\tau _T\) and \(\tau _S\) are valid time maps satisfying (D1)–(D3). \(\square\)
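The step from a topological order to a pair of time maps can be illustrated as follows. This is a sketch under the assumption that positions in the topological order are used as time stamps and that speciation vertices and leaves inherit the time of their image in S, so that (D1) holds by construction. The function name, the string event labels, and the argument layout are hypothetical.

```python
REPRESENTED_BY_IMAGE = ("speciation", "leaf")


def time_maps(order, gene_vertices, species_vertices, event, mu):
    """Derive (tau_T, tau_S) from a topological order of the auxiliary graph.

    order            : list of auxiliary-graph vertices in topological order
    gene_vertices    : vertices of T
    species_vertices : vertices of S (all assumed to occur in `order`)
    event[u]         : event label of the gene-tree vertex u
    mu[u]            : image in V(S) of a speciation vertex or leaf u
    """
    tau = {x: position for position, x in enumerate(order)}  # pairwise distinct values
    tau_S = {x: tau[x] for x in species_vertices}
    tau_T = {}
    for u in gene_vertices:
        if event[u] in REPRESENTED_BY_IMAGE:
            tau_T[u] = tau_S[mu[u]]   # speciations and leaves share the time of mu(u)
        else:
            tau_T[u] = tau[u]         # duplications and transfers are vertices of A themselves
    return tau_T, tau_S
```

Using list positions is only one possible choice; any strictly increasing relabeling of the order would serve equally well.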
Naturally, Theorems 4 and 5 can be used to devise algorithms for deciding time-consistency. To this end, the efficient computation of \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) for all \(u\in V(T)\) is necessary. This can be achieved with Algorithm 2 in \(O(V(T)\log (V(S)))\) time. More precisely, we have the following statement:
Lemma 3
For a given gene tree \((T=(V,E);t,\sigma )\) and a species tree \(S=(W,F),\) Algorithm 2 correctly computes \(\ell (u) = {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) for all \(u \in V(T)\) in \(O(V\log (W))\) time.
Proof
Since T is a tree and the algorithm is in effect a depth first search through T, the while loop runs at most \(O(V(T)+ E(T))\) times, that is, O(V(T)) times.
The only nonconstant operation within the while loop is the computation of \({\text {lca}}_S\) in Line (10). Clearly, \({\text {lca}}_S\) of a set of vertices \(C = \{ c_1, c_2, \ldots , c_k \}\) with \(c_i \in V(S)\) can be computed as a sequence of \({\text {lca}}_S\) operations taking two vertices: \({\text {lca}}_S(c_1,{\text {lca}}_S(c_2, \ldots {\text {lca}}_S(c_{k-1}, c_k)))\), each taking \(O(\log (V(S)))\) time. Note, however, that since Line (10) is executed exactly once for each vertex in T, at most E(T) such two-vertex \({\text {lca}}_S\) operations are performed throughout the entire algorithm. Hence, the total time complexity is \(O(V(T)\log (V(S)))\). \(\square\)
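The reduction of a set-valued \({\text {lca}}_S\) to a chain of two-vertex queries can be sketched as below. The text does not say which data structure answers a single query in logarithmic time; this sketch assumes binary lifting, where each vertex stores its ancestors at distances 1, 2, 4, and so on. The class name and the parent-dictionary encoding of S are hypothetical, and building the tables costs additional preprocessing time that is not part of the bound stated above.

```python
from functools import reduce


class BinaryLiftingLca:
    """Two-vertex lca queries on a rooted tree given as {child: parent}, root -> None."""

    def __init__(self, parent):
        root = next(v for v, p in parent.items() if p is None)
        children = {v: [] for v in parent}
        for v, p in parent.items():
            if p is not None:
                children[p].append(v)
        self.depth = {root: 0}
        self.up = {root: []}          # up[v][k] is the ancestor of v at distance 2**k
        stack = [root]
        while stack:
            v = stack.pop()
            for c in children[v]:
                self.depth[c] = self.depth[v] + 1
                table = [v]
                k = 0
                # The 2**(k+1)-th ancestor of c is the 2**k-th ancestor of table[k].
                while k < len(self.up[table[k]]):
                    table.append(self.up[table[k]][k])
                    k += 1
                self.up[c] = table
                stack.append(c)

    def lca(self, a, b):
        if self.depth[a] < self.depth[b]:
            a, b = b, a
        diff = self.depth[a] - self.depth[b]
        k = 0
        while diff:                   # lift the deeper vertex to the depth of the other
            if diff & 1:
                a = self.up[a][k]
            diff >>= 1
            k += 1
        if a == b:
            return a
        for k in reversed(range(len(self.up[a]))):
            # Jump only while the two ancestors are still different.
            if k < len(self.up[a]) and self.up[a][k] != self.up[b][k]:
                a, b = self.up[a][k], self.up[b][k]
        return self.up[a][0]

    def lca_of_set(self, vertices):
        """Fold the two-vertex lca over a non-empty collection of vertices."""
        return reduce(self.lca, vertices)
```

A set of k vertices needs k - 1 two-vertex queries, which is the counting argument used in the proof of Lemma 3.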
Let S be a species tree for \((T;t,\sigma )\), that is, there is a valid reconciliation between the two trees. Algorithm 1 describes a method to construct a time-consistent reconciliation map for \((T;t,\sigma )\) and S if one exists; otherwise, “No time-consistent reconciliation map exists” is returned. First, an arbitrary reconciliation map \(\mu\) that satisfies the conditions of Definition 2 is computed. Second, Theorem 5 is used to check whether the auxiliary graph \(A_2\) is a DAG; if it is not, no time-consistent map \(\mu\) exists for \((T;t,\sigma )\) and S. Finally, if \(A_2\) is a DAG, we adjust \(\mu\) to become time-consistent. The latter is based on Theorem 2; see the proofs of Theorems 2 and 6 for details.
Theorem 6
Let \(S= (W,F)\) be a species tree for the gene tree \((T=(V,E);t,\sigma )\). Algorithm 1 correctly determines whether there is a time-consistent reconciliation map \(\mu\) and, in the positive case, returns such a \(\mu\) in \(O(V\log (W))\) time.
Proof
In order to produce a time-consistent reconciliation map, we first construct some valid reconciliation map \(\mu\) from \((T;t,\sigma )\) to S. Using the \({\text {lca}}\)-map \(\ell\) from Algorithm 2, \(\mu\) will be adjusted to become time-consistent, if possible.
By assumption, there is a reconciliation map from \((T;t,\sigma )\) to S. The for-loop (Lines (3)–(5)) ensures that each vertex \(u\in V\) obtains a value \(\mu (u)\). We continue to show that \(\mu\) is a valid reconciliation map satisfying (M1)–(M3).
Now assume that \(u,v\in V\) and \(u \prec _{T_{\mathcal {\overline{E}}}} v\). Note that \(\sigma _{T_{\mathcal {\overline{E}}}}(u) \subseteq \sigma _{T_{\mathcal {\overline{E}}}}(v)\). It follows that \(\ell (u) = {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) \preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(v)) = \ell (v)\). By construction, (M3) is satisfied. Thus, \(\mu\) is a valid reconciliation map.
By Theorem 5, two time maps \(\tau _T\) and \(\tau _S\) satisfying (D1)–(D3) exist only if the auxiliary graph A built in Line (7) is a DAG. Thus, if \(A:=A_2\) contains a cycle, no such time maps exist and the statement “No time-consistent reconciliation map exists.” is returned (Line (7)). On the other hand, if A is a DAG, the construction in Lines (8)–(11) is identical to the construction used in the proof of Theorem 5. Hence, correctness of this part of the algorithm follows directly from the proof of Theorem 5.
Finally, we adjust \(\mu\) to become a time-consistent reconciliation map. By the latter arguments, \(\tau _T\) and \(\tau _S\) satisfy (D1)–(D3) w.r.t. \(\mu\). Note that \(\mu\) is chosen to be the “lowest point” where a vertex \(u \in V\) with \(t(u) \in \{ \square , \triangle \}\) can be mapped, that is, \(\mu (u)\) is set to (p(x), x) where \(x ={\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\). However, by the arguments in the proof of Theorem 2, there is a unique edge \((y,z)\in F\) on the path from x to \(\rho _S\) such that \(\tau _S(y)< \tau _T(u) < \tau _S(z)\). The latter is ensured by choosing a different value for distinct vertices in V(A), see the comment in Line (9). Hence, Line (14) ensures that \(\mu (u)\) is mapped to the correct edge such that (C2) is satisfied. It follows that the adjusted \(\mu\) is a valid time-consistent reconciliation map.
We are now concerned with the time complexity. By Lemma 3, computation of \(\ell\) in Line (1) takes \(O(V\log (W))\) time and the for-loop (Lines (3)–(5)) takes O(V) time. We continue to show that the auxiliary graph A (Line (6)) can be constructed in \(O(V\log (W))\) time.
Since we know \(\ell (u) = {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\) for all \(u\in V\) and since T and S are trees, the subgraph with edges satisfying (A1)–(A3) can be constructed in \(O(V+W + E + F) = O(V+W)\) time. To ensure (A4), we must compute for each transfer edge \((u,v)\in \mathcal {E}\) the vertex \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u) \cup \sigma _{T_{\mathcal {\overline{E}}}}(v))\), which can be done in \(O(\log (W))\) time. Note that the number of transfer edges is bounded by the number of possible transfer events, which is O(V). Hence, generating all edges satisfying (A4) takes \(O(V\log (W))\) time. In summary, computing A can be done in \(O(V+W+V\log (W)) = O(V\log (W))\) time.
To detect whether A contains cycles, one has to determine whether there is a topological order \(\tau\) on V(A), which can be done via depth first search in \(O(V(A)+E(A))\) time. Since \(V(A) = V+W\) and \(O(E(A)) = O(F+E+W+V)\) and S, T are trees, the latter task can be done in \(O(V + W)\) time. Clearly, Lines (10)–(11) can be performed in \(O(V + W)\) time.
Finally, we have to adjust \(\mu\) according to \(\tau _T\) and \(\tau _S\). Note that for each \(u\in V\) with \(t(u) \in \{ \square , \triangle \}\) (Line (12)) we possibly have to adjust \(\mu\) to the next edge (p(x), x). However, the number of possible choices for (p(x), x) is bounded by the height of S, which is in the worst case \(\log (W)\). Hence, the for-loop in Line (12) has total time complexity \(O(V\log (W))\).
In summary, the overall time complexity of Algorithm 1 is \(O(V\log (W))\). \(\square\)
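The final adjustment step, moving a duplication or transfer vertex from the lowest admissible edge up to the edge whose end points bracket its time stamp, can be sketched as follows. The function name and the parent-dictionary encoding of S are hypothetical; the sketch assumes that the time maps already satisfy (D1)–(D3) and take pairwise distinct values on distinct vertices of the auxiliary graph.

```python
def place_on_edge(u, lca_u, parent_S, tau_T, tau_S):
    """Return the edge (y, z) of S with tau_S(y) < tau_T(u) < tau_S(z).

    lca_u    : the vertex lca_S(sigma(u)), the lowest point below which u may not be placed
    parent_S : {vertex: parent} for S, with the root mapped to None
    """
    if not tau_T[u] < tau_S[lca_u]:
        raise ValueError("expected tau_T(u) < tau_S(lca_u), cf. (D2)")
    z = lca_u
    y = parent_S.get(z)
    # Climb towards the root while the upper end point is still younger than u.
    while y is not None and tau_S[y] > tau_T[u]:
        z, y = y, parent_S.get(y)
    if y is None or tau_S[y] == tau_T[u]:
        raise ValueError("no edge above lca_u brackets tau_T(u)")
    return (y, z)
```

The number of loop iterations for a single vertex is the number of edges climbed, so the cost per vertex is bounded by the length of the path from the starting vertex to the root of S.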
So far, we have shown how to find a time consistent reconciliation map \(\mu\) given a species tree S and a single gene tree T. In practical applications, however, one often considers more than one gene family, and thus, a set of gene trees \(F=\{(T_1;t_1,\sigma _1),\ldots ,(T_n;t_n,\sigma _n)\}\) that has to be reconciled with one and the same species tree S.
Finding a time consistent reconciliation for a species tree S and a set of gene trees F then corresponds to finding a time map \(\tau _S\) for S and a time map \(\tau _T\) for the aggregated gene tree \((T;t,\sigma )\), such that (D1)–(D3) are satisfied.
If there exists a time-consistent reconciliation map \(\mu\) from \((T;t,\sigma )\) to S then, by Theorem 2, there exist two time maps \(\tau _T\) and \(\tau _S\) that satisfy (D1)–(D3). But then \(\tau _T\) and \(\tau _S\) also satisfy (D1)–(D3) w.r.t. any \((T_i;t_i,\sigma _i) \in F\) and therefore, \(\mu\) immediately gives a time-consistent reconciliation map for each \((T_i;t_i,\sigma _i) \in F\).
Outlook and summary
We have characterized here whether a given event-labeled gene tree \((T; t,\sigma )\) and species tree S can be reconciled in a time-consistent manner in terms of two auxiliary graphs \(A_1\) and \(A_2\) that must be DAGs. These are defined in terms of given reconciliation maps. This condition yields an \(O(V\log (W))\)-time algorithm to check whether a given reconciliation map \(\mu\) is time-consistent, and an algorithm with the same time complexity for the construction of a time-consistent reconciliation map, provided one exists.
Our results depend on three conditions on the event-labeled gene trees that are motivated by the fact that event labels can be assigned to internal vertices of gene trees only if there is observable information on the event. The question of which event-labeled gene trees are actually observable given an arbitrary, true evolutionary scenario deserves further investigation in future work. Here we have used conditions that arguably are satisfied when gene trees are inferred using sequence comparison and synteny information. A more formal theory of observability is still missing, however.
Our results point to an efficient way of deciding whether a given pair of gene and species trees can be time-consistently reconciled. Such gene and species trees can be obtained from genomic sequence data using the following workflow: (i) Estimate putative orthologs and HGT events using e.g. one of the methods detailed in [11, 12, 26–38]. Importantly, this step uses only sequence data as input and does not require the construction of either gene or species trees. (ii) Correct these estimates in order to derive “biologically feasible” homology relations as described in [15, 16, 26, 39–44]. The results of this step are (not necessarily fully resolved) gene trees together with event labels. (iii) Extract “informative triples” from the event-labeled gene tree. These imply necessary conditions for gene trees to be biologically feasible [15, 16].
In general, there will be exponentially many putative species trees. This raises the question whether there is at least one species tree S for a gene tree and, if so, how to construct S. In the absence of HGT, the answer is known: time-consistent reconciliation maps are fully characterized in terms of “informative triples” [16]. Hence, the central open problem that needs to be addressed in further research is to find sufficient conditions for the existence of a time-consistent species tree given an event-labeled gene tree with HGT.
Proof of Theorem 1
We show that Definition 2 is equivalent to the traditional definition of a DTL-scenario [20, 24] in the special case that both the gene tree and the species tree are binary. To this end we establish a series of lemmas detailing some useful properties of reconciliation maps.
Lemma 4
1. If \(v,w\in V(T)\) are in the same connected component of \(T_{\mathcal {\overline{E}}}\), then \(\mu ({\text {lca}}_{T_{\mathcal {\overline{E}}}}(v,w)) \succeq _S {\text {lca}}_S(\mu (v),\mu (w))\).
Let u be an arbitrary interior vertex of T with children v, w. Then:
2. \(\mu (u)\) and \(\mu (v)\) are incomparable in S if and only if \((u,v)\in \mathcal {E}\).
3. If \(t(u)=\bullet\), then \(\mu (v)\) and \(\mu (w)\) are incomparable in S.
4. If \(\mu (v), \mu (w)\) are comparable or \(\mu (u)\succ _S {\text {lca}}_S(\mu (v),\mu (w))\), then \(t(u)=\square\).
Proof
We prove Items 1–4 separately. Recall that Lemma 1 implies \(\sigma (L_{T_{\mathcal {\overline{E}}}}(x)) \ne \emptyset\) for all \(x\in V(T)\).
Proof of Item 1: Let v and w be distinct vertices of T that are in the same connected component of \(T_{\mathcal {\overline{E}}}\). Consider the unique path P connecting w with v in \(T_{\mathcal {\overline{E}}}\). This path P is uniquely subdivided into a path \(P'\) and a path \(P''\) from \({\text {lca}}_{T_{\mathcal {\overline{E}}}}(v,w)\) to v and w, respectively. Condition (M3) implies that the images of the vertices of \(P'\) and \(P''\) under \(\mu\), resp., are ordered in S with regards to \(\preceq _S\) and hence, are contained in the intervals \(Q'\) and \(Q''\) that connect \(\mu ({\text {lca}}_{T_{\mathcal {\overline{E}}}}(v,w))\) with \(\mu (v)\) and \(\mu (w)\), respectively. In particular, \(\mu ({\text {lca}}_{T_{\mathcal {\overline{E}}}}(v,w))\) is the largest element (w.r.t. \(\preceq _S\)) in the union of \(Q'\cup Q''\) which contains the unique path from \(\mu (v)\) to \(\mu (w)\) and hence also \({\text {lca}}_S(\mu (v),\mu (w))\).
Proof of Item 2: If \((u,v)\in \mathcal {E}\) then, \(t(u)=\triangle\) and (M2iii) implies that \(\mu (u)\) and \(\mu (v)\) are incomparable.
To see the converse, let \(\mu (u)\) and \(\mu (v)\) be incomparable in S. Item (M3) implies that for any edge \((x,y)\in E(T_{\mathcal {\overline{E}}})\) we have \(\mu (y)\preceq _S \mu (x)\). However, since \(\mu (u)\) and \(\mu (v)\) are incomparable it must hold that \((u,v)\notin E(T_{\mathcal {\overline{E}}})\). Since (u, v) is an edge in the gene tree T, \((u,v)\in \mathcal {E}\) is a transfer edge.
Proof of Item 3: Let \(t(u)=\bullet\). Since none of (u, v) and (u, w) are transferedges, it follows that both edges are contained in \(T_{\mathcal {\overline{E}}}\).
Then, since T is a binary tree, it follows that \(L_{T_{\mathcal {\overline{E}}}}(u) = L_{T_{\mathcal {\overline{E}}}}(v)\cup L_{T_{\mathcal {\overline{E}}}}(w)\) and therefore, \(\sigma _{T_{\mathcal {\overline{E}}}}(u) = \sigma _{T_{\mathcal {\overline{E}}}}(v)\cup \sigma _{T_{\mathcal {\overline{E}}}}(w).\)
Proof of Item 4: Let \(\mu (v), \mu (w)\) be comparable in S. Item 3 implies that \(t(u)\ne \bullet\). Assume for contradiction that \(t(u) = \triangle\). Since by (O2) only one of the edges (u, v) and (u, w) is a transfer edge, we have either \((u,v)\in \mathcal {E}\) or \((u,w)\in \mathcal {E}\). W.l.o.g. let \((u,v)\in \mathcal {E}\) and \((u,w)\in E(T_{\mathcal {\overline{E}}})\). By Condition (M3), \(\mu (u)\succeq _S\mu (w)\). However, since \(\mu (v)\) and \(\mu (w)\) are comparable in S, also \(\mu (u)\) and \(\mu (v)\) are comparable in S; a contradiction to Item 2. Thus, \(t(u)\ne \triangle\). Since each interior vertex is labeled with one event, we have \(t(u) = \square\).
Lemma 5
Proof
We first emphasize that, by construction, \(\mu (u)\succeq _S \gamma (u)\) for all \(u\in V(T)\). Moreover, \(\mu (u) =\mu (v)\) implies that \(\gamma (u) =\gamma (v)\), and \(\gamma (u) =\gamma (v)\) implies that \(\mu (u)\) and \(\mu (v)\) are comparable. Furthermore, \(\mu (u)\prec _S \mu (v)\) implies \(\gamma (u)\preceq _S \gamma (v)\), while \(\gamma (u)\prec _S \gamma (v)\) implies that \(\mu (u)\prec _S \mu (v)\). Thus, \(\mu (u)\) and \(\mu (v)\) are comparable if and only if \(\gamma (u)\) and \(\gamma (v)\) are comparable.
Item (I) and (M1) are equivalent.
For Item (II) let \(u\in V(T)\setminus \mathbb {G}\) be an interior vertex with children v, w. If \((u,w) \notin \mathcal {E}\), then \(w\prec _{T_{\mathcal {\overline{E}}}} u\). Applying Condition (M3) yields \(\mu (w)\preceq _{S} \mu (u)\) and thus, by construction, \(\gamma (w)\preceq _{S} \gamma (u)\). Therefore, \(\gamma (u)\) is not a proper descendant of \(\gamma (w)\) and \(\gamma (w)\) is a descendant of \(\gamma (u)\). If one of the edges, say (u, v), is a transfer edge, then \(t(u) = \triangle\) and by Condition (M2iii) \(\mu (u)\) and \(\mu (v)\) are incomparable. Hence, \(\gamma (u)\) and \(\gamma (v)\) are incomparable. Therefore, \(\gamma (u)\) is no proper descendant of \(\gamma (v)\). Note that (O2) implies that for each vertex \(u\in V(T)\setminus \mathbb {G}\) at least one of its outgoing edges must be a nontransfer edge, which implies that \(\gamma (w)\preceq _{S} \gamma (u)\) or \(\gamma (v)\preceq _{S} \gamma (u)\) as shown before. Hence, Item (IIa) and (IIb) are satisfied.
For Item (III) assume first that \((u,v) \in \mathcal {E}\) and therefore \(t(u) = \triangle\). Then, (M2iii) implies that \(\mu (u)\) and \(\mu (v)\) are incomparable and thus, \(\gamma (u)\) and \(\gamma (v)\) are incomparable. Now assume that (u, v) is an edge in the gene tree T and \(\gamma (u)\) and \(\gamma (v)\) are incomparable. Therefore, \(\mu (u)\) and \(\mu (v)\) are incomparable. Now, apply Lemma 4(2).
Lemma 6
Proof
Let \(\gamma :V(T)\rightarrow V(S)\) be a map of a DTL-scenario for the binary gene tree \((T;t,\sigma )\) and the species tree S.
Condition (M1) is equivalent to (I).
For (M3) assume that \(v \preceq _{T_{\mathcal {\overline{E}}}} w\). The path P from v to w in \(T_{\mathcal {\overline{E}}}\) does not contain transfer edges. Thus, by (III) all vertices along P are comparable. Moreover, by (IIa) we have that \(\gamma (w)\) is not a proper descendant of the image of its child in S, and therefore, by repeating these arguments along the vertices x in \(P_{wv}\), we obtain \(\gamma (v)\preceq _S\gamma (x)\preceq _S\gamma (w)\).
If \(\gamma (v)\prec _S\gamma (w)\), then by construction of \(\mu\), it follows that \(\mu (v)\prec _S\mu (w)\). Thus, (M3) is satisfied whenever \(\gamma (v)\prec _S\gamma (w)\). Assume now that \(\gamma (v)=\gamma (w)\). If \(t(v),t(w) \in \{\square ,\triangle \}\), then \(\mu (v)=(x,\gamma (v))=(x,\gamma (w))=\mu (w)\) and thus (M3i) is satisfied. If \(t(v)=\bullet\) and \(t(w)\ne \bullet\), then \(\mu (v)=\gamma (v)\) and \(\mu (w) = (x,\gamma (w))\), and thus \(\mu (v) \prec _S \mu (w)\).
Now assume that \(\gamma (v) =\gamma (w)\) and w is a speciation vertex. Since \(t(w) = \bullet\), for its two children \(w'\) and \(w''\) the images \(\gamma (w')\) and \(\gamma (w'')\) must be incomparable due to (IVb). W.l.o.g. assume that \(w'\) is a vertex of \(P_{wv}\). Since \(\gamma (v)\preceq _S\gamma (x)\preceq _S\gamma (w)\) for any vertex x along \(P_{wv}\) and \(\gamma (v) =\gamma (w)\), we obtain \(\gamma (w') =\gamma (w)\). However, since \(\gamma (w'') \preceq _S \gamma (w)\), the vertices \(\gamma (w')\) and \(\gamma (w'')\) are comparable in S; contradicting (IVb). Thus, whenever w is a speciation vertex, \(\gamma (w') =\gamma (w)\) is not possible. Therefore, \(\gamma (v)\preceq _S \gamma (w') \prec _S\gamma (w)\) and, by construction of \(\mu\), \(\mu (v) \prec _S\mu (w)\). Thus, (M3ii) is satisfied.
Finally, we show that (M2) is satisfied. To this end, observe first that (M2ii) is fulfilled by construction of \(\mu\) and (M2iii) is an immediate consequence of (III). Thus, it remains to show that (M2i) is satisfied. That is, for a given speciation vertex u we need to show that \(\mu (u)={\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\). By construction, \(\mu (u) = \gamma (u)\). Note that \(T_{\mathcal {\overline{E}}}\) does not contain transfer edges. Applying (III) implies that for all edges (x, y) in \(T_{\mathcal {\overline{E}}}\) the images \(\gamma (x)\) and \(\gamma (y)\) must be comparable. The latter and (IIa) imply that for all edges (x, y) in \(T_{\mathcal {\overline{E}}}\) we have \(\gamma (y) \preceq _S \gamma (x)\). Taken together, \(\sigma (z)=\gamma (z) \preceq _S\gamma (u)\) for any leaf \(z\in L_{T_{\mathcal {\overline{E}}}}(u)\). Therefore \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) \preceq _S\gamma (u) = \mu (u)\). Assume for contradiction that \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) \prec _S\gamma (u) = \mu (u)\). Consider the two children \(u'\) and \(u''\) of u in \(T_{\mathcal {\overline{E}}}\). Since neither \((u,u') \in \mathcal {E}\) nor \((u,u'') \in \mathcal {E}\) and T is a binary tree, it follows that \(L_{T_{\mathcal {\overline{E}}}}(u) = L_{T_{\mathcal {\overline{E}}}}(u')\cup L_{T_{\mathcal {\overline{E}}}}(u'')\) and we obtain that \(\sigma _{T_{\mathcal {\overline{E}}}}(u) = \sigma _{T_{\mathcal {\overline{E}}}}(u') \cup \sigma _{T_{\mathcal {\overline{E}}}}(u'')\). Moreover, reusing the arguments above, \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u')) \preceq _S\gamma (u')\) and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u'')) \preceq _S\gamma (u'')\). By the arguments we used in the proof for (M3), we have \(\gamma (u')\prec _S \gamma (u)\) and \(\gamma (u'')\prec _S \gamma (u)\).
In particular, \(\gamma (u')\) and \(\gamma (u'')\) must be contained in the subtree of S that is rooted in the child a of \(\gamma (u)\) in S with \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\preceq _S a\), as otherwise, \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u')) \not \preceq _S\gamma (u')\) or \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u'')) \not \preceq _S\gamma (u'')\). Moreover, neither \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u'))\) nor \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u''))\) is possible since then \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u')) \preceq _S\gamma (u')\) and \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u'')) \preceq _S\gamma (u'')\) implies that \(\gamma (u')\) and \(\gamma (u'')\) would be comparable; contradicting (IVb). Hence, there remains only one way to locate \(\gamma (u')\) and \(\gamma (u'')\), that is, they must be located in the subtree of S that is rooted in \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u))\). But then we have \({\text {lca}}_S(\gamma (u'), \gamma (u''))\preceq _S {\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) \prec _S \gamma (u)\); a contradiction to (IVb) \(\gamma (u) = {\text {lca}}_S(\gamma (u'), \gamma (u''))\). Therefore, \({\text {lca}}_S(\sigma _{T_{\mathcal {\overline{E}}}}(u)) = \gamma (u) = \mu (u)\) and (M2i) is satisfied. \(\square\)
Declarations
Authors' contributions
NN and MH designed the study. NN implemented and designed the algorithms. All authors collaborated in research and the writing of the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank the organizers of the 32nd TBI Winterseminar 2017 in Bled (Slovenia), where the authors met and jointly drafted the main ideas of this paper with the help of an unknown number of cold and tasty cans of red Union, or was it green Laško?
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
Supported in part by the Danish Council for Independent Research, Natural Sciences, Grants DFF132300247 and DFF701400041.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Dress A, Moulton V, Steel M, Wu T. Species, clusters and the ‘tree of life’: a graph-theoretic perspective. J Theor Biol. 2010;265:535–42.
 Fitch WM. Homology: a personal view on some of the problems. Trends Genet. 2000;16:227–31.
 Hellmuth M, Stadler PF, Wieseke N. The mathematics of xenology: di-cographs, symbolic ultrametrics, 2-structures and tree-representable systems of binary relations. J Math Biol. 2016;75(1):199–237. https://doi.org/10.1007/s0028501610843.
 Hellmuth M, Wieseke N. From sequence data including orthologs, paralogs, and xenologs to gene and species trees. In: Pontarotti P, editor. Evolutionary Biology: convergent evolution, evolution of complex traits, concepts and methods. Cham: Springer; 2016. p. 373–92.
 Guigó R, Muchnik I, Smith T. Reconstruction of ancient molecular phylogeny. Mol Phylogenet Evol. 1996;6:189–213.
 Page RDM, Charleston MA. Trees within trees: phylogeny and historical associations. Trends Ecol Evol. 1998;13:356–9.
 Zmasek C, Eddy S. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics. 2001;17:821–8.
 Vernot B, Stolzer M, Goldman A, Durand D. Reconciliation with non-binary species trees. J Comput Biol. 2008;15:981–1006. https://doi.org/10.1089/cmb.2008.0092.
 Hellmuth M, Wieseke N, Lechner M, Lenhof HP, Middendorf M, Stadler PF. Phylogenomics with paralogs. Proc Natl Acad Sci. 2015;112(7):2058–63. https://doi.org/10.1073/pnas.1412770112.
 Roth ACJ, Gonnet GH, Dessimoz C. Algorithm of OMA for large-scale orthology inference. BMC Bioinf. 2008;9:518.
 Altenhoff AM, Dessimoz C. Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput Biol. 2009;5:1000262.
 Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thévenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF. Orthology detection combining clustering and synteny for very large datasets. PLoS ONE. 2014;9(8):105015.
 Altenhoff AM, Boeckmann B, Capella-Gutierrez S, Dalquen DA, DeLuca T, Forslund K, Huerta-Cepas J, Linard B, Pereira C, Pryszcz LP, Schreiber F, da Silva AS, Szklarczyk D, Train CM, Bork P, Lecompte O, von Mering C, Xenarios I, Sjölander K, Jensen LJ, Martin MJ, Muffato M, Gabaldón T, Lewis SE, Thomas PD, Sonnhammer E, Dessimoz C. Standardized benchmarking in the quest for orthologs. Nat Methods. 2016;13:425–30.
 Hellmuth M, HernandezRosales M, Huber KT, Moulton V, Stadler PF, Wieseke N. Orthology relations, symbolic ultrametrics, and cographs. J Math Biol. 2013;66(1–2):399–420.View ArticlePubMedGoogle Scholar
 Hellmuth M. Biologically feasible gene trees, reconciliation maps and informative triples. Algorithms Mol Biol. 2017;12(1):23.View ArticlePubMedPubMed CentralGoogle Scholar
 HernandezRosales M, Hellmuth M, Wieseke N, Huber KT, Moulton V, Stadler PF. From eventlabeled gene trees to species trees. BMC Bioinf. 2012;13(Suppl 19):6.Google Scholar
 Doyon JP, Ranwez V, Daubin V, Berry V. Models, algorithms and programs for phylogeny reconciliation. Brief Bioinf. 2011;12(5):392.View ArticleGoogle Scholar
 Merkle D, Middendorf M. Reconstruction of the cophylogenetic history of related phylogenetic trees with divergence timing information. Theor Biosci. 2005;4:277–99.View ArticleGoogle Scholar
 Charleston MA. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math Biosci. 1998;149(2):191–223.View ArticlePubMedGoogle Scholar
 Tofigh A, Hallett M, Lagergren J. Simultaneous identification of duplications and lateral gene transfers. IEEE/ACM Trans Comput Biol Bioinf. 2011;8(2):517–35.View ArticleGoogle Scholar
 Böcker S, Dress AWM. Recovering symbolically dated, rooted trees from symbolic ultrametrics. Adv Math. 1998;138:105–25.View ArticleGoogle Scholar
 Hellmuth M, Wieseke N. On symbolic ultrametrics, cotree representations, and cograph edge decompositions and partitions., Proceedings COCOON 2015Cham: Springer; 2015. p. 609–23.Google Scholar
 Hellmuth M, Wieseke N. On tree representations of relations and graphs: Symbolic ultrametrics and cograph edge decompositions. J Comb Optim. 2017; https://doi.org/10.1007/s1087801701117.Google Scholar
 Bansal MS, Alm EJ, Kellis M. Efficient algorithms for the reconciliation problem with gene duplication, horizontal transfer and loss. Bioinformatics. 2012;28(12):283–91.View ArticleGoogle Scholar
 Kahn AB. Topological sorting of large networks. Commun ACM. 1962;5(11):558–62.View ArticleGoogle Scholar
 Altenhoff AM, Gil M, Gonnet GH, Dessimoz C. Inferring hierarchical orthologous groups from orthologous gene pairs. PLoS ONE. 2013;8(1):53786.View ArticleGoogle Scholar
 Altenhoff AM, et al. The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res. 2015;43(D1):240–9.View ArticleGoogle Scholar
 Chen F, Mackey AJ, Stoeckert CJ, Roos DS. OrthoMCLdb: querying a comprehensive multispecies collection of ortholog groups. Nucleic Acids Res. 2006;34(S1):363–8.View ArticleGoogle Scholar
 Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ. Proteinortho: detection of (co)orthologs in largescale analysis. BMC Bioinf. 2011;12:124.View ArticleGoogle Scholar
 Östlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, Frings O, Sonnhammer ELL. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010;38(suppl 1):196–203.View ArticleGoogle Scholar
 Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genomescale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28(1):33–6.View ArticlePubMedPubMed CentralGoogle Scholar
 Trachana K, Larsson TA, Powell S, Chen WH, Doerks T, Muller J, Bork P. Orthology prediction methods: a quality assessment using curated protein families. BioEssays. 2011;33(10):769–80.View ArticlePubMedPubMed CentralGoogle Scholar
 Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2008;36:13–21.View ArticleGoogle Scholar
 Clarke GDP, Beiko RG, Ragan MA, Charlebois RL. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. J Bacteriol. 2002;184(8):2072–80.View ArticlePubMedPubMed CentralGoogle Scholar
 Dessimoz C, Margadant D, Gonnet GH. DLIGHT—lateral gene transfer detection using pairwise evolutionary distances in a statistical framework. In: Proceedings RECOMB 2008, pp. 315–330. Springer, Berlin; 2008.Google Scholar
 Lawrence JG, Hartl DL. Inference of horizontal genetic transfer from molecular data: an approach using the bootstrap. Genetics. 1992;131(3):753–60.PubMedPubMed CentralGoogle Scholar
 Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA. 1999;96(8):4285–8.View ArticlePubMedPubMed CentralGoogle Scholar
 Ravenhall M, Škunca N, Lassalle F, Dessimoz C. Inferring horizontal gene transfer. PLoS Comput Biol. 2015;11(5):1004095.View ArticleGoogle Scholar
 Dondi R, Lafond M, ElMabrouk N. Approximating the correction of weighted and unweighted orthology and paralogy relations. Algorithms Mol Biol. 2017;12(1):4.View ArticlePubMedPubMed CentralGoogle Scholar
 Lafond M, ElMabrouk N. Orthology and paralogy constraints: satisfiability and consistency. BMC Genom. 2014;15(6):12.View ArticleGoogle Scholar
 Lafond M, ElMabrouk N. Orthology relation and gene tree correction: complexity results. In: International workshop on algorithms in bioinformatics, Berlin: Springer; 2015. p. 66–79.Google Scholar
 Dondi R, ElMabrouk N, Lafond M. Correction of weighted orthology and paralogy relationscomplexity and algorithmic results. In: International workshop on algorithms in bioinformatics, Berlin: Springer; 2016. p. 121–36.Google Scholar
 Dondi R, Mauri G, Zoppis I. Orthology correction for gene tree reconstruction: Theoretical and experimental results. Procedia Computer Science. International Conference on Computational Science, ICCS 2017, 1214 June 2017, Zurich, Switzerland. p. 1115–24.Google Scholar
 Lafond M, Dondi R, ElMabrouk N. The link between orthology relations and gene trees: a correction perspective. Algorithms Mol Biol. 2016;11(1):1.View ArticleGoogle Scholar