Biologically feasible gene trees, reconciliation maps and informative triples

Background: The history of gene families, which are equivalent to event-labeled gene trees, can be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are biologically feasible, that is, whether there is a possible true history that would explain a given gene tree. In practice, this problem is reduced to finding a reconciliation map, also known as DTL-scenario, between the event-labeled gene trees and a (possibly unknown) species tree.
Results: In this contribution, we first characterize whether there is a valid reconciliation map for binary event-labeled gene trees T that contain speciation, duplication and horizontal gene transfer events and some unknown species tree S in terms of "informative" triples that are displayed in T and provide information on the topology of S. These informative triples are used to infer the unknown species tree S for T. We obtain a similar result for non-binary gene trees. To this end, however, the reconciliation map needs to be further restricted. We provide a polynomial-time algorithm to decide whether there is a species tree for a given event-labeled gene tree, and in the positive case, to construct the species tree and the respective (restricted) reconciliation map. However, informative triples as well as DTL-scenarios have their limitations when they are used to explain the biological feasibility of gene trees. While reconciliation maps imply biological feasibility, we show that the converse is not true in general. Moreover, we show that informative triples provide enough information to characterize neither "relaxed" DTL-scenarios nor non-restricted reconciliation maps for non-binary biologically feasible gene trees.


Background
The evolutionary history of genes is intimately linked with the history of the species in which they reside. Genes are passed from generation to generation to the offspring. Some of those genes are frequently duplicated, mutate, or get lost, a mechanism that also ensures that new species can evolve. In particular, genes that share a common origin (homologs) can be classified according to the type of their "evolutionary event relationship", namely orthologs, paralogs and xenologs [1,2]. Two homologous genes are orthologous if at their most recent point of origin the ancestral gene is transmitted to two daughter lineages; a speciation event happened. They are paralogous if the ancestor gene at their most recent point of origin was duplicated within a single ancestral genome; a duplication event happened. Horizontal gene transfer (HGT) refers to the transfer of genes between organisms in a manner other than traditional reproduction and across different species, and yields so-called xenologs. In contrast to orthology and paralogy, the definition of xenology is less well established and by no means consistent in the biological literature. One definition stipulates that two genes are xenologs if their history since their common ancestor involves horizontal transfer of at least one of them [2,3]. The mathematical framework for evolutionary event-relations in terms of symbolic ultrametrics, cographs and two-structures [4][5][6][7], on the other hand, naturally accommodates more than two types of events associated with the internal nodes of the gene tree. We follow the notion in [1,6] and call two genes xenologous whenever their least common ancestor was an HGT event.
Hellmuth Algorithms Mol Biol (2017) 12:23
The knowledge of evolutionary event relations such as orthology, paralogy or xenology is of fundamental importance in many fields of mathematical and computational biology, including the reconstruction of evolutionary relationships across species [8][9][10][11][12], as well as functional genomics and gene organization in species [13][14][15]. The type of event relationship is determined by the true history of the genes and species. However, events of the past cannot be observed directly and hence must be inferred from the genomic data available today. Tree-reconciliation methods are widely studied in the literature [9,16–31] and provide one way to address this problem. Here, a gene tree is mapped into a species tree such that certain optimization criteria are fulfilled. This mapping, eventually, identifies inner vertices of the gene tree as a duplication, speciation or HGT. These methods usually require a gene and species tree as input. In most practical applications, however, neither the gene tree nor the species tree can be determined unambiguously. Intriguingly, there are methods to infer orthologs [14,32–40] or to detect HGT [41][42][43][44][45] without the need to construct gene or species trees. Given empirically estimated event-relations one can infer the history of gene families, which are equivalent to event-labeled gene trees [5,6,11,46–48].
The crucial point is the following important result: For (tree-free estimated) event-relations there is an event-labeled gene tree that represents this estimate if and only if the respective event-relations are directed cographs [5,6]. Usually, estimated event-relations violate this condition and must, therefore, be corrected [33,36,46–51]. Such corrected event-relations can, in most cases, be represented by an event-labeled gene tree. However, these trees can still be error-prone in the sense that there is no species tree on which they can evolve. The latter strongly depends on the applied correction method, the presence or absence of HGT events and, in particular, the theoretical model that is used to define that "a gene tree evolves along a species tree" (reconciliation map). The method ParaPhylo [11] already uses many of the ideas mentioned above for the reconstruction of species trees and event-labeled gene trees without HGT-events. ParaPhylo is based on the knowledge of estimated orthology relations, which are cleaned up to the closest cograph and, afterwards, corrected to obtain biologically feasible gene trees.
For an event-labeled gene tree to be biologically feasible there must be a putative "true" history that can explain the inferred gene tree. However, in practice it is not possible to observe the entire evolutionary history as, e.g., gene losses eradicate the entire information on parts of the history. Therefore, the problem of determining whether an event-labeled gene tree is biologically feasible is reduced to the problem of finding a valid reconciliation map, also known as DTL-scenario [29,31], between the event-labeled gene trees and an arbitrary (possibly unknown) species tree. DTL-scenarios and their variants have been extensively studied [22,29,52–54] and also have applications in the context of the host-parasite cophylogeny problem [55][56][57][58][59][60][61][62].
In this contribution, we assume that we have a given event-labeled gene tree T and wish to answer the question: Is T biologically feasible and how much information about the unknown species tree S and the reconciliation between T and S is already contained in the gene tree T?
To this end, we first provide a mathematical definition of the term "biologically feasible" and two types of reconciliation maps: DTL-scenarios (as used in, e.g. [29,31,63]) and a restricted version (as used in, e.g. [12,48]). Given the event-labeled gene trees, it is possible to derive "informative" triples that are displayed in the gene tree T and provide information on the topology of the species tree S. In particular, we prove that consistency of informative triple sets characterizes whether there are DTL-scenarios and restricted maps for binary and non-binary gene trees, respectively. The latter generalizes results established for binary gene trees that do not contain HGT-events by Hernandez-Rosales et al. [10]. Furthermore, we provide a polynomial-time algorithm to decide whether there is a species tree for a given event-labeled gene tree and, in the positive case, to construct the species tree and a respective (restricted) reconciliation map.
In addition to the established results, we discuss limitations of reconciliation maps to explain biological feasibility of gene trees. While any (restricted) reconciliation map gives an idea of a putative true history that can explain the given gene tree, the converse is in general not true. We provide simple examples that show that not all biologically feasible gene trees can be explained by (restricted) DTL-scenarios. This immediately raises the question whether generalizations of reconciliation maps might be used to explain biological feasibility. We briefly discuss a mild generalization, so-called "relaxed" reconciliation maps. However, as it turns out, such general maps cannot be characterized by informative triples. We close this contribution with a couple of open problems.

Preliminaries
A rooted tree $T = (V, E)$ (on $L$) is an acyclic connected simple graph with leaf set $L \subseteq V$, set of edges $E$, and set of interior vertices $V^0 = V \setminus L$ such that there is one distinguished vertex $\rho_T \in V$, called the root of $T$.
A vertex $v \in V$ is called a descendant of $u \in V$, $v \preceq_T u$, and $u$ is an ancestor of $v$, $u \succeq_T v$, if $u$ lies on the path from $\rho_T$ to $v$. As usual, we write $v \prec_T u$ and $u \succ_T v$ to mean $v \preceq_T u$ and $u \neq v$. If $u \preceq_T v$ or $v \preceq_T u$ then $u$ and $v$ are comparable and otherwise, incomparable. For $x \in V$, we write $L_T(x) := \{y \in L \mid y \preceq_T x\}$ for the set of leaves in the subtree $T(x)$ of $T$ rooted in $x$.
Remark 1 It will be convenient to use a notation for edges $e$ that implies which of the vertices in $e$ is closer to the root. Thus, the notation for edges $(u, v)$ of a tree is always chosen such that $u \succ_T v$.
For our discussion below we need to extend the ancestor relation $\preceq_T$ on $V$ to the union of the edge and vertex sets of $T$. More precisely, for the edge $e = (u, v) \in E$ we put $x \prec_T e$ if and only if $x \preceq_T v$ and $e \prec_T x$ if and only if $u \preceq_T x$. For edges $e = (u, v)$ and $f = (a, b)$ in $T$ we put $e \preceq_T f$ if and only if $v \preceq_T b$. In the latter case, the edges $e$ and $f$ are called comparable.
For a non-empty subset of leaves $A \subseteq L$, we define $\operatorname{lca}_T(A)$, or the least common ancestor of $A$, to be the unique $\preceq_T$-minimal vertex of $T$ that is an ancestor of every vertex in $A$. In case $A = \{x, y\}$, we put $\operatorname{lca}_T(x, y) := \operatorname{lca}_T(\{x, y\})$ and if $A = \{x, y, z\}$, we put $\operatorname{lca}_T(x, y, z) := \operatorname{lca}_T(\{x, y, z\})$. We will make frequent use of the fact that for two non-empty vertex sets $A, B$ of a tree, it always holds that $\operatorname{lca}(A \cup B) = \operatorname{lca}(\operatorname{lca}(A), \operatorname{lca}(B))$.
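The least common ancestor and the identity for unions can be illustrated with a minimal sketch. The parent-map encoding and the toy tree below are assumptions made for illustration only and are not taken from the paper.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, vertices):
    """Least common ancestor of a non-empty collection of vertices."""
    vertices = list(vertices)
    common = set(ancestors(parent, vertices[0]))
    for v in vertices[1:]:
        common &= set(ancestors(parent, v))
    # Walking upwards from any vertex, the first common ancestor is the lca.
    for u in ancestors(parent, vertices[0]):
        if u in common:
            return u


# Hypothetical toy tree: root r with children u and c; u has children a and b.
parent = {"r": None, "u": "r", "c": "r", "a": "u", "b": "u"}

A, B = ["a", "b"], ["c"]
left = lca(parent, A + B)
right = lca(parent, [lca(parent, A), lca(parent, B)])
```

Here `left` and `right` coincide, as the identity for the lca of a union requires.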
A phylogenetic tree T (on L) is a rooted tree T = (V , E) (on L) such that no interior vertex v ∈ V 0 has degree two, except possibly the root ρ T . If L corresponds to a set of genes G or species S, we call a phylogenetic tree on L gene tree and species tree, respectively. The restriction T |L ′ of a phylogenetic tree T to L ′ ⊆ L is the rooted tree with leaf set L ′ obtained from T by first forming the minimal subtree of T with leaf set L ′ and then by suppressing all vertices of degree two with the exception of the root ρ T |L ′ . By construction, In other words, ℓ |L ′ keeps the vertex-labels of all nonsuppressed vertices and assigns the edge-label of the edge (u, v) in T to the edge (u, v) in T |L ′, if v = b and otherwise, to the edge (u, b) in T |L ′, where b is the first nonsuppressed vertex that lies on the unique path from v to b in T.
Rooted triples are phylogenetic trees on three leaves with precisely two interior vertices. They constitute an important concept in the context of supertree reconstruction [64][65][66] and will also play a major role here. A rooted tree $T$ on $L$ displays a triple $(xy|z)$ if $x, y, z \in L$ and the path from $x$ to $y$ does not intersect the path from $z$ to the root $\rho_T$, and thus $\operatorname{lca}_T(x, y) \prec_T \operatorname{lca}_T(x, y, z)$. We denote by $R(T)$ the set of all triples that are displayed by the rooted tree $T$. A set $R$ of triples is consistent if there is a rooted tree $T$ on $L_R = \bigcup_{r \in R} L_r(\rho_r)$ such that $R \subseteq R(T)$ and thus, $T$ displays each triple in $R$. Of course, not all sets of triples are consistent. Nevertheless, given a triple set $R$ there is a polynomial-time algorithm, referred to in [64,67] as BUILD, that either constructs a phylogenetic tree $T$ that displays $R$ or recognizes that $R$ is not consistent [68]. The runtime of BUILD is $O(|L_R||R|)$ [64]. Further practical implementations and improvements have been discussed in [69][70][71][72].
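A compact sketch of the BUILD idea may help to make the consistency test concrete. It follows the usual recursive scheme: connect $x$ and $y$ for every triple $(xy|z)$ on the current leaf set, split into connected components, and recurse. The nested-tuple tree encoding, the function names and the union-find helper are assumptions of this sketch. It is not tuned to reach the $O(|L_R||R|)$ bound, and leaf labels are assumed to be strings.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying all triples, or None if inconsistent.

    leaves  : iterable of leaf labels (strings)
    triples : iterable of (a, b, c), read as the rooted triple (ab|c)
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triples whose three leaves all lie in the current leaf set.
    local = [(a, b, c) for (a, b, c) in triples if {a, b, c} <= leaves]
    # Auxiliary graph: join a and b for every triple (ab|c); union-find tracks components.
    rep = {x: x for x in leaves}

    def find(x):
        while rep[x] != x:
            rep[x] = rep[rep[x]]
            x = rep[x]
        return x

    for a, b, _ in local:
        rep[find(a)] = find(b)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # connected auxiliary graph on >= 2 leaves: inconsistent
    subtrees = []
    for comp in components.values():
        sub = build(comp, local)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


def leaf_set(tree):
    """Set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return set().union(*(leaf_set(child) for child in tree))


def displays(tree, triple):
    """True if the nested-tuple tree displays the triple (ab|c)."""
    a, b, c = triple
    if not isinstance(tree, tuple):
        return False
    for child in tree:
        ls = leaf_set(child)
        if {a, b, c} <= ls:
            return displays(child, triple)
        if {a, b} <= ls and c not in ls:
            return c in leaf_set(tree)
    return False


tree = build("ABCD", [("A", "C", "D"), ("B", "C", "D")])
clash = build("ACD", [("A", "C", "D"), ("C", "D", "A")])
```

With the two species triples (AC|D) and (BC|D), the sketch returns a tree that displays both but leaves A, B and C unresolved, while the contradicting pair (AC|D), (CD|A) yields `None`.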
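We will consider rooted trees $T = (V, E)$ from which particular edges are removed. Let $\mathcal{E} \subseteq E$ and consider the forest $T_{\overline{\mathcal{E}}} := (V, E \setminus \mathcal{E})$. We can preserve the order $\preceq_T$ for all vertices within one connected component of $T_{\overline{\mathcal{E}}}$ and define $\preceq_{T_{\overline{\mathcal{E}}}}$ as follows: $x \preceq_{T_{\overline{\mathcal{E}}}} y$ iff $x \preceq_T y$ and $x, y$ are in the same connected component of $T_{\overline{\mathcal{E}}}$. Since each connected component $T'$ of $T_{\overline{\mathcal{E}}}$ is a tree, the ordering $\preceq_{T_{\overline{\mathcal{E}}}}$ also implies a root for each $T'$. For a vertex $x$, we write $L_{T_{\overline{\mathcal{E}}}}(x)$ for the set of leaves in $T_{\overline{\mathcal{E}}}$ that are reachable from $x$. Hence, all $y \in L_{T_{\overline{\mathcal{E}}}}(x)$ must be contained in the same connected component of $T_{\overline{\mathcal{E}}}$. We say that the forest $T_{\overline{\mathcal{E}}}$ displays a triple $r$ if $r$ is displayed by one of its connected components. Moreover, $R(T_{\overline{\mathcal{E}}})$ denotes the set of all triples that are displayed by the forest $T_{\overline{\mathcal{E}}}$.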

Biologically feasible and observable gene trees
A gene tree arises through a series of events (speciation, duplication, HGT, and gene loss) along a species tree. In a "true history" the gene tree $\hat{T} = (\hat{V}, \hat{E})$ on a set of genes $\hat{G}$ is equipped with an event-labeling map $\hat{t} : \hat{V} \cup \hat{E} \to \hat{I} \cup \{0, 1\}$ with $\hat{I} = \{\mathfrak{s}, \mathfrak{d}, \mathfrak{t}, \odot, \mathsf{x}\}$ that assigns to each vertex $v$ of $\hat{T}$ a value $\hat{t}(v) \in \hat{I}$ indicating whether $v$ is a speciation event ($\mathfrak{s}$), duplication event ($\mathfrak{d}$), HGT event ($\mathfrak{t}$), extant leaf ($\odot$) or a loss event ($\mathsf{x}$). Note, in the figures we omitted the symbol $\odot$ and used •, □ and △ for $\mathfrak{s}$, $\mathfrak{d}$ and $\mathfrak{t}$, respectively.
Horizontal gene transfer is intrinsically a directional event, i.e., there is a clear distinction between the horizontally transferred "copy" and the "original" that continues to be vertically transferred. To this end, the edges in the gene tree are annotated by associating a label to the edge that points from the horizontal transfer event to the next event in the history of the copy. To be more precise, to each edge $e$ a value $\hat{t}(e) \in \{0, 1\}$ is assigned that indicates whether $e$ is a transfer edge (1) or not (0). Hence, $e = (x, y)$ and $\hat{t}(e) = 1$ iff $\hat{t}(x) = \mathfrak{t}$ and the genetic material is transferred from the species containing $x$ to a species containing $y$. We remark that the restriction of $\hat{t}$ to the vertex set $\hat{V}$ was introduced as "symbolic dating map" in [4] and that there is a close relationship to so-called cographs [5,73,74]. Let $G \subseteq \hat{G}$ be the set of all extant genes in $\hat{T}$, i.e., $G$ contains all genes $v$ of $\hat{G}$ with $\hat{t}(v) \neq \mathsf{x}$. Hence, there is a map $\sigma : G \to S$ that assigns to each extant gene the extant species in which it resides.
We assume that the gene tree and its event labels are inferred from (sequence) data, i.e., $T$ is restricted to those labeled trees that can be constructed at least in principle from observable data. Gene losses eradicate the entire information on parts of the history and thus, cannot directly be observed from extant sequences. Hence, in our setting the (observable) gene tree $T = (V, E)$ is the restriction $\hat{T}_{|G}$ to the set of extant genes equipped with the event-label $t = \hat{t}_{|G}$, see Fig. 1. Since all leaves of $T$ are extant genes in $G$ we do not need to specially label the leaves in $G$, and thus simplify the event-labeling map $t : V^0 \cup E \to I \cup \{0, 1\}$ by assigning only to the interior vertices an event in $I = \{\mathfrak{s}, \mathfrak{d}, \mathfrak{t}\}$. We assume here that all non-transfer edges transmit the genetic material vertically, that is, from an ancestral species to its descendants.
We write $(T; t, \sigma)$ for a gene tree $T = (V, E)$ with event-labeling $t$ and corresponding map $\sigma$. The set $\mathcal{E} = \{e \in E \mid t(e) = 1\}$ will always denote the set of transfer edges in $(T; t, \sigma)$.
Additionally, we consider gene trees $(T = (V, E); t, \sigma)$ from which the transfer edges have been removed, resulting in the forest $T_{\overline{\mathcal{E}}} = (V, E \setminus \mathcal{E})$ in which we preserve the event-labeling $t$ of all vertices.
We call a gene tree $(T; t, \sigma)$ on $G$ biologically feasible if there is a true scenario, with true gene tree $\hat{T}$ and event-labeling $\hat{t}$, such that $T = \hat{T}_{|G}$ and $t = \hat{t}_{|G}$, that is, there is a true history that can explain $(T; t, \sigma)$. By way of example, the gene tree in Fig. 1 (right) is biologically feasible. However, so far it is unknown whether there are gene trees $(T; t, \sigma)$ that are not biologically feasible. Answering the latter might be a hard task, as many HGT or duplication vertices followed by losses can be inserted into $T$ that may result in a putative true history that explains the event-labeled gene tree.
Following Nøjgaard et al. [63], we additionally restrict the set of observable gene trees (T ; t, σ ) to those gene trees that satisfy the following observability axioms:
(O1) Every internal vertex v has degree at least three, except possibly the root which has degree at least two.
(O2) Every HGT node has at least one transfer edge, t(e) = 1, and at least one non-transfer edge, t(e) = 0.
Condition (O1) is justified by the restriction of the true binary gene tree to the set of extant genes G, since this restriction T is always a phylogenetic tree. In particular, (O1) ensures that every event leaves a historical trace in the sense that there are at least two children that have survived in at least two of its subtrees. Condition (O2) ensures that for an HGT event a historical trace remains of both the transferred and the non-transferred copy.
Condition (O3.a) is a consequence of (O1), (O2) and a stronger Condition (O3.a') claimed in [63]: If x is a speciation vertex, then there are at least two distinct children v, w of x such that the species V and W that contain v and w, resp., are incomparable in S. Note, a speciation vertex x cannot be observed from data if it does not "separate" lineages, that is, there are two leaf descendants of distinct children of x that are in distinct species. Condition (O3.a') is even weaker and ensures that any "observable" speciation vertex x separates at least locally two lineages. As a result of (O3.a') one can obtain (O3.a) [63]. Intuitively, (O3.a) is satisfied since within a connected component of $T_{\overline{\mathcal{E}}}$ no genetic material is exchanged between non-comparable nodes. Thus, a gene separated in a speciation event necessarily ends up in distinct species in the absence of the transfer edges.
Condition (O3.b) is a consequence of (O1), (O2) and a stronger Condition (O3.b') claimed in [63]: If $(v, w)$ is a transfer edge in $T$, then $t(v) = \mathfrak{t}$ and the species V and W that contain v and w, resp., are incomparable in S. Note, if $(v, w) \in \mathcal{E}$ then v signifies the transfer event itself but w refers to the next (visible) event in the gene tree T. In a "true history" v is contained in a species V that transmits its genetic material (maybe along a path of transfers) to a contemporary species Z that is an ancestor of the species W containing w. In order to have evidence that this transfer happened, Condition (O3.b') is used and as a result one obtains (O3.b). The intuition behind (O3.b) is as follows: observe that $T_{\overline{\mathcal{E}}}(x)$ and $T_{\overline{\mathcal{E}}}(y)$ are subtrees of distinct connected components of $T_{\overline{\mathcal{E}}}$ whenever $(x, y) \in \mathcal{E}$. Since HGT amounts to the transfer of genetic material across distinct species, the genes x and y are in distinct species, cf. (O3.b). Moreover, $T_{\overline{\mathcal{E}}}$ does not contain transfer edges and thus, there is no genetic material transferred across distinct species between distinct connected components in $T_{\overline{\mathcal{E}}}$. We refer to [63] for further details.

Fig. 1 Left: an example of a "true" history of a gene tree that evolves along the (tube-like) species tree. The set of extant genes G comprises a, a′, b, b′, c, c′, c″ and e, and σ maps each gene in G to the species (capitals below the genes) A, B, C, E ∈ σ(G). For simplicity all speciation events followed by a loss along the path from v to a′ in the true history are omitted. Right: the observable gene tree (T ; t, σ ) is shown. Since there is a true scenario which explains (T ; t, σ ), the gene tree is biologically feasible. In particular, (T ; t, σ ) satisfies (O1), (O2) and (O3).
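We simplify the notation a bit and write $\sigma_{T_{\overline{\mathcal{E}}}}(u) := \sigma(L_{T_{\overline{\mathcal{E}}}}(u))$. Based on Axiom (O2) the following result was established in [63].
Lemma 3.1 particularly implies that $\sigma_{T_{\overline{\mathcal{E}}}}(x) \neq \emptyset$ for all $x \in V(T)$. Note, $T_{\overline{\mathcal{E}}}$ might contain interior vertices (distinct from the root) that have degree two. Nevertheless, for each $x \preceq_{T_{\overline{\mathcal{E}}}} y$ in $T_{\overline{\mathcal{E}}}$ we have $x \preceq_T y$ in $T$. Hence, partial information (that in particular is "undisturbed" by transfer edges) on the partial ordering of the vertices in $T$ can be inferred from $T_{\overline{\mathcal{E}}}$.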

Reconciliation map
Before we define a reconciliation map that "embeds" a given gene tree into a given species tree we need a slight modification of the species tree. In order to account for duplication events that occurred before the first speciation event, we need to add an extra vertex and an extra edge "above" the last common ancestor of all species: hence, we add an additional vertex to W (that is now the new root ρ S of S) and the additional edge (ρ S , lca S (S)) ∈ F . Note that strictly speaking S is not a phylogenetic tree anymore. In case there is no danger of confusion, we will from now on refer to a phylogenetic tree on S with this extra edge and vertex added as a species tree on S.
Definition 2 Suppose that $S = (W, F)$ is a species tree, that $T = (V, E)$ is a gene tree with leaf set G and that σ : G → S and t are the maps described above. Then we say that S is a species tree for (T ; t, σ ) if there is a reconciliation map $\mu : V \to W \cup F$ from (T ; t, σ ) to S, that is, a map that satisfies Conditions (M1), (M2.i), (M2.ii), (M2.iii) and (M3). Condition (M3) concerns vertices $x, y \in V$ with $x \prec_{T_{\overline{\mathcal{E}}}} y$. Note, the latter implies that the path connecting x and y in T does not contain transfer edges.

Definition 2 is a natural generalization of the map defined in [10], that is, in the absence of horizontal gene transfer, Condition (M2.iii) vanishes and thus, the proposed reconciliation map precisely coincides with the one given in [10]. In case that the event-labeling of T is unknown, but a species tree S is given, the authors in [31,54] gave an axiom set, called DTL-scenario, to reconcile T with S. This reconciliation is then used to infer the event-labeling t of T. The "usual" DTL axioms explicitly refer to binary, fully resolved gene and species trees. We therefore use a different axiom set that is, nevertheless, equivalent to DTL-scenarios in case the considered gene trees are binary [63].
Condition (M1) ensures that each leaf of T, i.e., an extant gene in G, is mapped to the species in which it resides. Conditions (M2.i) and (M2.ii) ensure that each vertex of T is mapped either to a vertex or to an edge in S such that a vertex of T is mapped to an interior vertex of S if and only if it is a speciation vertex. We will discuss (M2.i) in further detail below. Condition (M2.iii) maps the vertices of a transfer edge in a way that they are incomparable in the species tree and is used to satisfy axiom (O3). Condition (M3) refers only to the connected components of $T_{\overline{\mathcal{E}}}$ and is used to preserve the ancestor order.
It needs to be discussed why one should map a speciation vertex x to $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(x))$ as required in (M2.i). The next lemma shows that one can put $\mu(x) = \operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(x))$.
Hence, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(x))$ is the lowest possible choice for the image of a speciation vertex x. Note that there are possibly exponentially many reconciliation maps whenever $\mu(x) \succ_S \operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(x))$ is allowed for speciation vertices x. First, we restrict our attention to those maps that satisfy (M2.i) only. In particular, as we shall see in "Binary gene trees" section, there is a neat characterization of maps that satisfy (M2.i) that does, however, not work for maps with "relaxed" (M2.i), as discussed in "Limitations of informative triples and reconciliation maps" section.
Moreover, we have the following result, which is a mild generalization of [63].
, µ(w)) and we are done. Now assume that v and w are incomparable in $T_{\overline{\mathcal{E}}}$. Consider the unique path P connecting w with v in $T_{\overline{\mathcal{E}}}$. This path P is uniquely subdivided into a path P′ and a path P″ from $\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v, w)$ to v and w, respectively. Condition (M3) implies that the images of the vertices of P′ and P″ under µ, resp., are ordered in S with regard to $\preceq_S$ and hence, are contained in the intervals Q′ and Q″ that connect $\mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v, w))$ with µ(v) and µ(w), respectively. In particular, $\mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v, w))$ is the largest element (w.r.t. $\preceq_S$) in the union Q′ ∪ Q″, which contains the unique path from µ(v) to µ(w) and hence also $\operatorname{lca}_S(\mu(v), \mu(w))$.
Item 2 was already proven in [63]. Assume now that there is a reconciliation map µ from (T ; t, σ ) to S. From a biological point of view, however, it is necessary to reconcile a gene tree with a species tree such that genes do not "travel through time"; see Fig. 4 for an example.
Definition 4 A reconciliation map µ from (T ; t, σ ) to S is time-consistent if there are time maps $\tau_T$ for T and $\tau_S$ for S satisfying the following conditions for all $u \in V(T)$:
(T1) If u is a speciation vertex or a leaf, then $\tau_T(u) = \tau_S(\mu(u))$.
(T2) If u is a duplication or HGT vertex, and thus µ(u) = (x, y) is an edge of S, then $\tau_T(u)$ lies between $\tau_S(x)$ and $\tau_S(y)$.
Condition (T1) is used to identify the time-points of speciation vertices and leaves u in the gene tree with the time-points of their respective images µ(u) in the species trees. Moreover, duplication or HGT vertices u are mapped to edges µ(u) = (x, y) in S and the time point of u must thus lie between the time points of x and y, which is ensured by Condition (T2). Nøjgaard et al. [63] designed an O(|V (T )| log(|V (S)|))-time algorithm to check whether a given reconciliation map µ is time-consistent, and an algorithm with the same time complexity for the construction of a time-consistent reconciliation map, provided one exists. Clearly, a necessary condition for the existence of time-consistent reconciliation maps from (T ; t, σ ) to S is the existence of some reconciliation map from (T ; t, σ ) to S. In the next section, we first characterize the existence of reconciliation maps and discuss open time-consistency problems.

From gene trees to species trees
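Since a gene tree T is uniquely determined by its induced triple set R(T ), it is reasonable to expect that a lot of information on the species tree(s) for (T ; t, σ ) is contained in the images of the triples in R(T ), or more precisely their leaves under σ. However, not all triples in R(T ) are informative, see Fig. 2 for an illustrative example. In the absence of HGT, it has already been shown by Hernandez-Rosales et al. [10] that the informative triples r ∈ R(T ) are precisely those that are rooted at a speciation event and where the genes in r reside in three distinct species. However, in the presence of HGT we need to further subdivide the informative triples as follows. We define $R_\sigma(T_{\overline{\mathcal{E}}}) := \{(ab|c) \in R(T_{\overline{\mathcal{E}}}) : \sigma(a), \sigma(b), \sigma(c) \text{ are pairwise distinct}\}$ as the subset of all triples displayed in $T_{\overline{\mathcal{E}}}$ such that the leaves are from pairwise distinct species. Let $R_0(T_{\overline{\mathcal{E}}}) := \{(ab|c) \in R_\sigma(T_{\overline{\mathcal{E}}}) : t(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(a, b, c)) = \mathfrak{s}\}$ be the set of triples in $R_\sigma(T_{\overline{\mathcal{E}}})$ that are rooted at a speciation event.
For each transfer edge $e_i = (x, y) \in \mathcal{E}$ define $R_i(T_{\overline{\mathcal{E}}}) := \{(ab|c) : \sigma(a), \sigma(b), \sigma(c) \text{ are pairwise distinct and either } a, b \in L_{T_{\overline{\mathcal{E}}}}(x),\, c \in L_{T_{\overline{\mathcal{E}}}}(y) \text{ or } c \in L_{T_{\overline{\mathcal{E}}}}(x),\, a, b \in L_{T_{\overline{\mathcal{E}}}}(y)\}$. The informative triples of T are comprised in the set $R(T; t, \sigma) = \bigcup_{i=0}^{h} R_i(T_{\overline{\mathcal{E}}})$. Finally, we define the informative species triple set $\mathcal{S}(T; t, \sigma) := \{(\sigma(a)\sigma(b)|\sigma(c)) : (ab|c) \in R(T; t, \sigma)\}$ that can be inferred from the informative triples of (T ; t, σ ).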
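The following sketch shows one way the informative species triples could be enumerated. It assumes that triples of the first kind are rooted at a speciation vertex of the forest obtained by deleting the transfer edges, that triples of the second kind are separated by a transfer edge (two leaves on one side, one on the other), and that the three genes must reside in pairwise distinct species. The dictionary encoding, the function name and the toy tree are hypothetical; the toy tree was chosen only so that it reproduces the triple sets quoted in the caption of Fig. 2 and need not match the tree drawn there.

```python
from itertools import combinations


def informative_species_triples(children, event, transfer, sigma):
    """Species triples (A, B, C), read as (AB|C) with A < B, induced by a gene tree.

    children : dict vertex -> list of children (leaves map to [])
    event    : dict interior vertex -> 's', 'd' or 't'
    transfer : set of transfer edges (parent, child)
    sigma    : dict leaf -> species
    """
    def reach(x):
        # Leaves reachable from x without crossing a transfer edge.
        if not children[x]:
            return {x}
        kids = [y for y in children[x] if (x, y) not in transfer]
        return set().union(*(reach(y) for y in kids))

    def triples(pair_side, single_side):
        # All (ab|c) with a, b on one side and c on the other, three distinct species.
        out = set()
        for a, b in combinations(sorted(pair_side), 2):
            for c in single_side:
                if len({sigma[a], sigma[b], sigma[c]}) == 3:
                    out.add(tuple(sorted((sigma[a], sigma[b]))) + (sigma[c],))
        return out

    result = set()
    # Triples rooted at a speciation vertex of the forest.
    for x, label in event.items():
        if label != "s":
            continue
        kids = [y for y in children[x] if (x, y) not in transfer]
        for v in kids:
            for w in kids:
                if v != w:
                    result |= triples(reach(v), reach(w))
    # Triples separated by a transfer edge (x, y).
    for x, y in transfer:
        result |= triples(reach(x), reach(y)) | triples(reach(y), reach(x))
    return result


# Hypothetical toy gene tree: r and u, v are speciations, h is an HGT vertex.
children = {"r": ["u", "h"], "u": ["a", "c1"], "h": ["d", "v"], "v": ["b", "c2"],
            "a": [], "b": [], "c1": [], "c2": [], "d": []}
event = {"r": "s", "u": "s", "h": "t", "v": "s"}
transfer = {("h", "v")}
sigma = {"a": "A", "b": "B", "c1": "C", "c2": "C", "d": "D"}

species_triples = informative_species_triples(children, event, transfer, sigma)
```

For this toy input the speciation at `r` contributes (AC|D) and the transfer edge contributes (BC|D), so `species_triples` contains exactly these two triples.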

Binary gene trees
In this section, we will be concerned only with binary, i.e., "fully resolved" gene trees, if not stated differently. This is justified by the fact that a speciation or duplication event instantaneously generates exactly two offspring. However, we will also allow non-binary species trees to model incomplete knowledge of the exact species phylogeny. Non-binary gene trees are discussed in "Non-binary gene trees" section. Hernandez-Rosales et al. [10] established the following characterization for the HGT-free case.
Theorem 5.1 There is a species tree on S = σ (G) for (T ; t, σ ) if and only if the triple set $\mathcal{S}(T; t, \sigma)$ is consistent.
We emphasize that the results established in [10] are only valid for binary gene trees, although this was not explicitly stated. For an example that shows that Theorem 5.1 is not always satisfied for non-binary gene trees see Fig. 3. Lafond and El-Mabrouk [12,48] established a similar result as in Theorem 5.1 by using only species triples that can be obtained directly from a given orthology/paralogy-relation. However, they require a stronger version of axiom (O3.a), that is, the images of all children of a speciation vertex must be pairwise incomparable in the species tree. We, too, will use this restriction in "Non-binary gene trees" section.
In what follows, we generalize the latter result and show that consistency of $\mathcal{S}(T; t, \sigma)$ characterizes whether there is a species tree S for (T ; t, σ ) even if (T ; t, σ ) contains HGT.

First assume that $(ab|c) \in R_0$, that is, $(ab|c)$ is displayed in $T_{\overline{\mathcal{E}}}$ and $t(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(a, b, c)) = \mathfrak{s}$. For simplicity set $u = \operatorname{lca}_{T_{\overline{\mathcal{E}}}}(a, b, c)$ and let x, y be its children in $T_{\overline{\mathcal{E}}}$. Since $(ab|c) \in R_0$, we can assume w.l.o.g. that $a, b \in L_{T_{\overline{\mathcal{E}}}}(x)$ and $c \in L_{T_{\overline{\mathcal{E}}}}(y)$. Hence, $x \succeq_{T_{\overline{\mathcal{E}}}} \operatorname{lca}_{T_{\overline{\mathcal{E}}}}(a, b)$ and $y \succeq_{T_{\overline{\mathcal{E}}}} c$. Condition (M3) implies that $\mu(y) \succeq_S \mu(c) = \sigma(c)$. Moreover, Condition (M3) and Lemma 4.2(1) imply that $\mu(x) \succeq_S \mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(a, b)) \succeq_S \operatorname{lca}_S(\mu(a), \mu(b)) = \operatorname{lca}_S(\sigma(a), \sigma(b))$. Since $t(u) = \mathfrak{s}$, we can apply Lemma 4.2(2) and conclude that µ(x) and µ(y) are incomparable in S. Hence, $\sigma(c)$ and $\operatorname{lca}_S(\sigma(a), \sigma(b))$ are incomparable. Thus, the triple $(\sigma(a)\sigma(b)|\sigma(c))$ must be displayed in S.

Now assume that $(ab|c) \in R_i$ for some transfer edge $e_i = (x, y) \in \mathcal{E}$. For $e_i = (x, y)$ we either have $a, b \in L_{T_{\overline{\mathcal{E}}}}(x)$ and $c \in L_{T_{\overline{\mathcal{E}}}}(y)$ or $c \in L_{T_{\overline{\mathcal{E}}}}(x)$ and $a, b \in L_{T_{\overline{\mathcal{E}}}}(y)$. W.l.o.g. let $a, b \in L_{T_{\overline{\mathcal{E}}}}(x)$ and $c \in L_{T_{\overline{\mathcal{E}}}}(y)$.

Fig. 2 Left: an example of a "true" history of a gene tree that evolves along the (tube-like) species tree (taken from [11]). The set of extant genes G comprises $a$, $b$, $c_1$, $c_2$ and $d$, and σ maps each gene in G to the species (capitals below the genes) A, B, C, D ∈ S. Upper right: the observable gene tree (T ; t, σ ) is shown. To derive $\mathcal{S}(T; t, \sigma)$ we cannot use the triples $R_0(T)$, that is, we need to remove the transfer edges. To be more precise, if we considered $R_0(T)$ we would obtain the triples $(ac_1|d)$ and $(c_2d|a)$, which lead to the two contradicting species triples (AC|D) and (CD|A). Thus, we restrict $R_0$ to $T_{\overline{\mathcal{E}}}$ and obtain $R_0(T_{\overline{\mathcal{E}}}) = \{(ac_1|d)\}$. However, this triple alone would not provide enough information to obtain a species tree such that a valid reconciliation map µ can be constructed. Hence, we take $R_1(T_{\overline{\mathcal{E}}}) = \{(bc_2|d)\}$ into account and obtain $\mathcal{S}(T; t, \sigma) = \{(AC|D), (BC|D)\}$. Lower right: a least resolved species tree S (obtained with BUILD) that displays all triples in $\mathcal{S}(T; t, \sigma)$ together with the reconciled gene tree (T ; t, σ ) is shown. Although S does not display the triple (AB|C) as in the true history, this tree S does not pretend a higher resolution than actually supported by (T ; t, σ ). Clearly, the more gene trees (gene families) are available, the more information about the resolution of the species tree can be provided.

Lemma 5.3
Let $S = (W, F)$ be a species tree on S. Then there is a reconciliation map µ from a gene tree (T ; t, σ ) to S whenever S displays all triples in $\mathcal{S}(T; t, \sigma)$.

Fig. 3 Consider the "true" history (left) that is also shown in Fig. 1. The center-left gene tree (T ; t, σ ) is biologically feasible and obtained as the observable part of the true history. There is no reconciliation map for (T ; t, σ ) to any species tree according to Def. 2 because $\mathcal{S}(T; t, \sigma)$ is inconsistent (cf. Thm. 5.4). The graph in the lower-center depicts the orthology-relation that comprises all pairs (x, y) of vertices for which $t(\operatorname{lca}(x, y)) = \mathfrak{s}$. The center-right gene tree (T′ ; t, σ ) is non-binary and can directly be computed from the orthology-relation. Although $\mathcal{S}(T'; t, \sigma)$ is inconsistent, there is a valid reconciliation map µ to a species tree for (T′ ; t, σ ) according to Def. 2 (right). Note, both trees (T ; t, σ ) and (T′ ; t, σ ) satisfy axioms (O1)-(O3) and even (O3.A). However, the reconciliation map µ does not satisfy the extra Condition (M2.iv), since µ(z) and µ(a′) = A are comparable, although z and a′ are children of a common speciation vertex. Therefore, Axioms (O1)-(O3) and (O3.A) do not imply (M2.iv). Moreover, Thm. 5.7 implies that there is no restricted reconciliation map for (T ; t, σ ) as well as (T′ ; t, σ ) and any species tree, since $\mathcal{S}(T; t, \sigma)$ and $\mathcal{S}(T'; t, \sigma)$ are inconsistent. See text for further details.

Fig. 4 From the binary gene tree (T ; t, σ ) (right) we obtain the species triples $\mathcal{S}(T; t, \sigma) = \{(AB|D), (AC|D)\}$. Shown are two (tube-like) species trees (left and middle) that display $\mathcal{S}(T; t, \sigma)$. The respective reconciliation maps for T and S are given implicitly by drawing T within the species tree S. The left tree S is least resolved for $\mathcal{S}(T; t, \sigma)$. Although there is even a unique reconciliation map from T to S, this map is not time-consistent. Thus, no time-consistent reconciliation between T and S exists. On the other hand, for T and the middle species tree S′ (that is a refinement of S) there is a time-consistent reconciliation map. Fig. 2 provides an example that shows that also least-resolved species trees can have a time-consistent reconciliation map with gene trees.

Proof Recall that G is the leaf set of T = (V , E) and, by Lemma 3.1, of $T_{\overline{\mathcal{E}}}$. In what follows, we write $L(u)$ instead of the more complicated $L_{T_{\overline{\mathcal{E}}}}(u)$ and, for consistency and simplicity, we also often write $\sigma(L(u))$ instead of $\sigma_{T_{\overline{\mathcal{E}}}}(u)$. Put $S = (W, F)$ and $\mathcal{S} = \mathcal{S}(T; t, \sigma)$. We first consider the subset $U = \{x \in V \mid x \in G \text{ or } t(x) = \mathfrak{s}\}$ of V comprising the leaves and speciation vertices of T.
In what follows we will explicitly construct µ : V → W ∪ F and verify that µ satisfies Conditions (M1), (M2) and (M3). To this end, we first set for all x ∈ U:

(S1) µ(x) = σ(x), if x ∈ G,
(S2) µ(x) = lca S (σ(L(x))), if t(x) = s.

Conditions (S1) and (M1), as well as (S2) and (M2.i), are equivalent.
Note, y must be an interior vertex, since x ≺ T E y.
We continue to extend µ to the entire set V. To this end, observe first that if t(x) ∈ {t, d} then we wish to map x on an edge µ(x) = (u, v) ∈ F such that Lemma 4.1 is satisfied: v ⪰ S lca S (σ(L(x))). Such an edge exists for v = lca S (σ(L(x))) in S by construction. Every speciation vertex y with y ≻ T E x therefore necessarily maps on the vertex u or above, i.e., µ(y) ⪰ S u must hold. Thus, we set:

µ(x) = (u, lca S (σ(L(x)))), if t(x) ∈ {t, d},

which now makes µ a map from V to W ∪ F.
We proceed to show that (M3) is satisfied.
If both x and y are speciation vertices, then we can apply Claim 1 to conclude that µ(x) ≺ S µ(y). If x is a leaf, then we argue similarly as in the proof of Claim 1 to conclude that µ(x) ⪯ S µ(y). Now assume that both x and y are interior vertices of T and at least one of x, y is not a speciation vertex. Since x ≺ T E y, we have L(x) ⊆ L(y) and thus, σ(L(x)) ⊆ σ(L(y)).
Lemma 5.2 implies that consistency of the triple set S(T; t, σ) is necessary for the existence of a reconciliation map from (T; t, σ) to a species tree on S. Lemma 5.3, on the other hand, establishes that this is also sufficient. Thus, we have

Theorem 5.4
There is a species tree on S = σ(G) for a binary gene tree (T; t, σ) on G if and only if the triple set S(T; t, σ) is consistent.
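The construction of µ used in the proof of Lemma 5.3 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions and not the paper's Algorithm 1: the function name `build_mu`, the dictionary encoding of both trees, the planted root `"0"` of the species tree and the small example trees are all hypothetical choices made here. Leaves are mapped to their species, speciation vertices to lca S (σ(L(x))), and duplication or transfer vertices to the edge of S that ends in this lca.

```python
from functools import reduce

def build_mu(gene_children, event, sigma, transfer_edges, species_parent):
    """Sketch of the map mu constructed in the proof of Lemma 5.3.

    gene_children  : dict vertex -> list of children in T (leaves map to [])
    event          : dict interior vertex -> 's', 'd' or 't'
    sigma          : dict gene (leaf of T) -> species (leaf of S)
    transfer_edges : set of edges (x, y) of T that are transfer edges
    species_parent : dict vertex of S -> parent (planted root maps to None)
    """
    def ancestors(v):
        # Path from v up to the planted root of S, starting with v itself.
        path = []
        while v is not None:
            path.append(v)
            v = species_parent[v]
        return path

    def lca(u, v):
        anc_u = set(ancestors(u))
        return next(w for w in ancestors(v) if w in anc_u)

    def species_below(x):
        # sigma(L(x)): species of all leaves reachable from x in T_E,
        # i.e. without walking along transfer edges.
        if not gene_children[x]:
            return {sigma[x]}
        kids = [y for y in gene_children[x] if (x, y) not in transfer_edges]
        return set().union(*(species_below(y) for y in kids))

    mu = {}
    for x in gene_children:
        if not gene_children[x]:
            mu[x] = sigma[x]                      # leaves: mu(x) = sigma(x)
            continue
        v = reduce(lca, species_below(x))         # lca_S(sigma(L(x)))
        if event[x] == "s":
            mu[x] = v                             # speciation: a vertex of S
        else:
            mu[x] = (species_parent[v], v)        # duplication/transfer: an edge of S
    return mu

# Hypothetical species tree ((A,B),C) with planted root "0".
species_parent = {"0": None, "r": "0", "AB": "r", "A": "AB", "B": "AB", "C": "r"}
# Hypothetical HGT-free gene tree: s1 = speciation(d1, c), d1 = duplication(s2, a2),
# s2 = speciation(a1, b1).
gene_children = {"s1": ["d1", "c"], "d1": ["s2", "a2"], "s2": ["a1", "b1"],
                 "a1": [], "b1": [], "a2": [], "c": []}
event = {"s1": "s", "d1": "d", "s2": "s"}
sigma = {"a1": "A", "b1": "B", "a2": "A", "c": "C"}

mu = build_mu(gene_children, event, sigma, set(), species_parent)
```

In this toy example the duplication vertex `d1` is placed on the edge above the species-tree vertex `AB`, directly above the image of the speciation vertex `s2`. The sketch only builds the map; it does not verify (M1)-(M3) and assumes that the species tree displays all informative triples.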

Non-binary gene trees
Now, we consider arbitrary, possibly non-binary gene trees that might be used to model incomplete knowledge of the exact gene phylogeny. Consider the "true" history of a gene tree that evolves along the (tube-like) species tree in Fig. 3 (left). The observable gene tree (T; t, σ) is shown in Fig. 3 (center-left). Since (ab|c), (b ′ c ′ |a ′) ∈ R 0 , we obtain a set of species triples S(T; t, σ) that contains the pair of inconsistent species triples (AB|C), (BC|A). Thus, there is no reconciliation map for (T; t, σ) and any species tree, although (T; t, σ) is biologically feasible. Consider now the "orthology" graph G (shown below the gene trees) that has G as its vertex set and in which two genes x, y are connected by an edge if lca(x, y) is a speciation vertex. Such graphs can be obtained from orthology inference methods [14, 36–38] and the corresponding non-binary gene tree (T ′; t, σ) (center-right) is constructed from such estimates (see [5–7] for further details). Still, we can see that S(T ′; t, σ) contains the two inconsistent species triples (AB|C), (BC|A). However, there is a reconciliation map µ according to Definition 2 and a species tree S, as shown in Fig. 3 (right). Thus, consistency of S(T ′; t, σ) does not characterize whether there is a valid reconciliation map for non-binary gene trees.
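That the two species triples (AB|C) and (BC|A) cannot be displayed by a common tree can be checked by brute force, since there are only four rooted trees on three leaves. The following Python sketch does this; the function names `displays` and `trees_on_three_leaves` and the parent-dictionary encoding of trees are illustrative assumptions, not notation from the paper. A triple (ab|c) is taken to be displayed if lca(a, b) lies strictly below lca(a, b, c).

```python
def ancestors(parent, v):
    # Path from v up to the root (root has parent None), starting with v.
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def lca(parent, u, v):
    anc_u = set(ancestors(parent, u))
    return next(w for w in ancestors(parent, v) if w in anc_u)

def displays(parent, triple):
    """True if the rooted tree displays (ab|c), given as the tuple (a, b, c)."""
    a, b, c = triple
    ab = lca(parent, a, b)
    abc = lca(parent, ab, c)
    return ab != abc          # lca(a, b) is a strict descendant of lca(a, b, c)

def trees_on_three_leaves(leaves):
    """Yield the star tree and the three binary rooted trees on three leaves."""
    x, y, z = leaves
    yield {"r": None, x: "r", y: "r", z: "r"}
    for (p, q), single in (((x, y), z), ((x, z), y), ((y, z), x)):
        yield {"r": None, "u": "r", p: "u", q: "u", single: "r"}

triples = [("A", "B", "C"), ("B", "C", "A")]   # (AB|C) and (BC|A)
compatible = [t for t in trees_on_three_leaves("ABC")
              if all(displays(t, tr) for tr in triples)]
```

Each of the two triples is displayed by exactly one of the four trees, and these trees differ, so `compatible` stays empty.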
In order to obtain a similar result as in Theorem 5.4 for non-binary gene trees we have to strengthen observability axiom (O3.a) to (O3.A) and to add an extra event constraint to Definition 2:

(M2.iv) Let v 1 , . . . , v k be the children of the speciation vertex x. Then, µ(v i ) and µ(v j ) are incomparable in S for all i ≠ j.

We call a reconciliation map that additionally satisfies (M2.iv) a restricted reconciliation map. Such restricted reconciliation maps satisfy the condition as required in [12,48] for the HGT-free case. It can be shown that restricted reconciliation maps imply Condition (O3.A); however, the converse is not true in general, see Fig. 3. Hence, we cannot use the axioms (O1)-(O3) and (O3.A) to derive Condition (M2.iv), similarly to Lemma 4.2(2), and thus need to require it explicitly. In particular, Condition (M2.iv) forbids ancestral relationships of the images µ(v i ) and µ(v j ) in S for any two distinct children v i and v j of a speciation vertex x. In Fig. 3 (right) a map µ is shown that violates Condition (M2.iv). Here, the images µ(z) and µ(a ′) are comparable. The latter might happen if there are unrecognized HGT events followed by a loss. Condition (M2.iv) is a quite strong restriction; however, it is indispensable for the characterization of reconciliation maps for non-binary gene trees in terms of informative triples, as we shall see soon.
It is now straightforward to obtain the next result.
First assume that (ab|c) ∈ R 0 , that is (ab|c) is displayed in T E and t(lca T E (a, b, c)) = s. For simplicity set u = lca T E (a, b, c). Hence, there are two children x, y of u in T E such that w.l.o.g. a, b ∈ L T E (x) and c ∈ L T E (y). Now we can argue analogously as in the proof of Lemma 5.2 after replacing "we can apply Lemma 4.2(2)" by "we can apply Condition (M2.iv)". The proof for (ab|c) ∈ R i remains the same as in Lemma 5.2.

Lemma 5.6
Let S be a species tree on S. Then, there is a restricted reconciliation map µ from a gene tree (T; t, σ) that satisfies also (O3.A) to S whenever S displays all triples in S(T; t, σ).
Proof The proof is similar to the proof of Lemma 5.3. However, note that a speciation vertex might have more than two children. In these cases, one simply has to apply Axiom (O3.A) instead of Axiom (O3.a) to conclude that (M1), (M2.i)-(M2.iii), (M3) are satisfied.
It remains to show that (M2.iv) is satisfied. To this end, let x be a speciation vertex in T and let Ch(x) = {v 1 , . . . , v k } be the set of its children. Consider the following partition of Ch(x) into Ch 1 and Ch 2 that contain all vertices v i with |σ T E (v i )| = 1 and |σ T E (v i )| > 1, respectively. By construction of µ, for all vertices v i , v j ∈ Ch 1 , i ≠ j, we have that µ(v i ) ∈ {σ(v i ), (u, σ(v i ))} and µ(v j ) ∈ {σ(v j ), (u ′, σ(v j ))} are incomparable.

Now let v i ∈ Ch 1 and v j ∈ Ch 2 . Thus, there are A, B ∈ σ T E (v j ) and σ(v i ) = C. Hence, (AB|C) ∈ S(T; t, σ). Therefore, lca S (A, B) must be incomparable to C in S. Since the latter is satisfied for all species in σ T E (v j ), lca S (σ T E (v j )) and C must be incomparable in S. Again, by construction of µ, we see that µ(v i ) and µ(v j ) are incomparable in S.

Finally, let v i , v j ∈ Ch 2 , i ≠ j. Then all triples (AB|C) and (CD|A) for all A, B ∈ σ T E (v i ) and C, D ∈ σ T E (v j ) are contained in S(T; t, σ) and thus, displayed by S. Hence, lca S (σ T E (v i )) and lca S (σ T E (v j )) must be incomparable in S. Again, by construction of µ, we obtain that µ(v i ) ∈ {lca S (σ T E (v i )), (u, lca S (σ T E (v i )))} and µ(v j ) ∈ {lca S (σ T E (v j )), (u ′, lca S (σ T E (v j )))} are incomparable in S. Therefore, (M2.iv) is satisfied.
As in the binary case, we obtain

Theorem 5.7
There is a restricted reconciliation map for a gene tree (T; t, σ) on G that satisfies also (O3.A) and some species tree on S = σ(G) if and only if the triple set S(T; t, σ) is consistent.

Algorithm
The proofs of Lemmas 5.3 and 5.6 are constructive, and we summarize these findings in Algorithm 1; see Fig. 2 for an illustrative example.

Lemma 5.8
Algorithm 1 returns a species tree S for a binary gene tree (T; t, σ) and a reconciliation map µ in polynomial time, if one exists, and otherwise returns that there is no species tree for (T; t, σ).
If (T; t, σ) is non-binary but satisfies Condition (O3.A), then Algorithm 1 returns a species tree S for (T; t, σ) and a restricted reconciliation map µ in polynomial time, if one exists, and otherwise returns that there is no species tree for (T; t, σ).

Proof Theorem 5.4 and the construction of µ in the proofs of Lemmas 5.3 and 5.6 imply the correctness of the algorithm.
For the runtime, observe that all tasks, namely computing S(T; t, σ), running the BUILD algorithm [64,68] and constructing the map µ [10, Cor.7], can be done in polynomial time.
In our examples, the species tree that displays S(T; t, σ) is computed using the O(|L R ||R|) time algorithm BUILD, which either constructs a tree S that displays all triples in a given triple set R or recognizes that R is not consistent. However, any other supertree method might be conceivable, see [65] for an overview. The tree T returned by BUILD is least resolved, i.e., if T ′ is obtained from T by contracting an edge, then T ′ does not display R anymore. However, the trees generated by BUILD do not necessarily have the minimum number of internal vertices, i.e., the trees may resolve multifurcations in an arbitrary way that is not implied by any of the triples in R. Thus, depending on R, not all trees consistent with R can be obtained from BUILD. Nevertheless, in [11, Prop. 2(SI)] the following result was established.

Lemma 5.9
Let R be a consistent triple set. If the tree T obtained with BUILD applied on R is binary, then T is the unique tree on L R that displays R, i.e., for any tree T ′ on L R that displays R we have T ′ ≃ T.
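The role of BUILD and of the binary case in Lemma 5.9 can be illustrated with a compact Python sketch of the classical recursive procedure: connect a and b for every triple (ab|c) whose three leaves are all present, split the leaf set into connected components, and recurse. This is a plain, unoptimized illustration and not the implementation used for the paper's examples; the names `build` and `is_binary` and the nested-tuple encoding of trees are assumptions made here.

```python
def build(triples, leaves):
    """Return a tree (nested tuples, leaves as strings) that displays all
    triples (a, b, c) = (ab|c), or None if the triple set is inconsistent."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    # Auxiliary graph: join a and b for each triple (ab|c) inside the leaf set.
    adj = {x: set() for x in leaves}
    for a, b, c in triples:
        if a in adj and b in adj and c in adj:
            adj[a].add(b)
            adj[b].add(a)
    # Connected components via depth-first search.
    components, seen = [], set()
    for x in leaves:
        if x in seen:
            continue
        comp, stack = set(), [x]
        while stack:
            y = stack.pop()
            if y not in comp:
                comp.add(y)
                stack.extend(adj[y] - comp)
        seen |= comp
        components.append(comp)
    if len(components) == 1:
        return None                      # connected graph: inconsistent triples
    subtrees = []
    for comp in components:
        sub = build(triples, comp)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

def is_binary(tree):
    """True if every interior vertex of the nested-tuple tree has two children."""
    if not isinstance(tree, tuple):
        return True
    return len(tree) == 2 and all(is_binary(t) for t in tree)

# Species triples of Fig. 4: {(AB|D), (AC|D)}.
fig4 = build([("A", "B", "D"), ("A", "C", "D")], "ABCD")
# The inconsistent pair (AB|C), (BC|A).
bad = build([("A", "B", "C"), ("B", "C", "A")], "ABC")
# A single triple (AB|C) yields a binary tree.
single = build([("A", "B", "C")], "ABC")
```

For the Fig. 4 triples the sketch returns the non-binary tree `(("A", "B", "C"), "D")`, so Lemma 5.9 does not apply and refinements of this tree may also display the triples. For the single triple (AB|C) the result is binary, and by Lemma 5.9 it is then the only tree on these leaves that displays the triple set.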
So far, we have shown that event-labeled gene trees (T; t, σ) for which a species tree exists can be characterized by a set of species triples S(T; t, σ) that is easily constructed from a subset of triples displayed in T. From a biological point of view, however, it is necessary to reconcile a gene tree with a species tree such that genes do not "travel through time". In [63], the authors gave algorithms to check whether a given reconciliation map µ is time-consistent and to construct a time-consistent reconciliation map, provided one exists. These algorithms require as input an event-labeled gene tree and a species tree. Hence, a necessary condition for the existence of time-consistent reconciliation maps is given by consistency of the species triple set S(T; t, σ) derived from (T; t, σ). However, there are possibly exponentially many species trees that are consistent with S(T; t, σ), some of which have a time-consistent reconciliation map with T and some of which do not, see Fig. 4. The question therefore arises whether there is at least one species tree S with a time-consistent map, and if so, how to construct S.

Limitations of informative triples and reconciliation maps
In the "Non-binary gene trees" section we have already discussed that consistency of S(T; t, σ) cannot be used to characterize whether there is a reconciliation map that does not need to satisfy (M2.iv) for some non-binary gene tree, see Fig. 3. In particular, Fig. 3 shows a biologically feasible binary gene tree (center-left) for which, however, neither a reconciliation map nor a restricted reconciliation map exists.
A further simple example is given in Fig. 5. Consider the "true" history of the gene tree that evolves along the (tube-like) species tree in Fig. 5 (left). The set of extant genes G comprises a, a ′, b, b ′, c and c ′, and σ maps each gene in G to the species (capitals below the genes) A, B, C ∈ S. For the observable gene tree (T; t, σ) in Fig. 5 (center) we observe that R 0 = {(ab|c), (b ′ c ′ |a ′)} and thus, one obtains the inconsistent species triples S(T; t, σ) = {(AB|C), (BC|A)}. Hence, Theorem 5.4 implies that there is no species tree for (T; t, σ). Note, (T; t, σ) satisfies also Condition (O3.A). Hence, Theorem 5.7 implies that no restricted reconciliation map to any species tree exists for (T; t, σ). Nevertheless, (T; t, σ) is biologically feasible as there is a true scenario that explains the gene tree.

Now consider the gene tree (T; t, σ) in Fig. 6 (right). The set S(T; t, σ) is consistent. Both the species tree S that displays all informative triples and the reconciliation map µ from (T; t, σ) to S are unique. However, µ is not time-consistent. Uniqueness of S and µ implies that there is no time-consistent reconciliation map for (T; t, σ) to any species tree. Thus, consistency of S(T; t, σ) does not imply the existence of time-consistent reconciliation maps. It can be shown that (T; t, σ) is biologically feasible.

Fig. 5
Shown is a binary and biologically feasible gene tree (T ; t, σ ) (center) that is obtained as the observable part of the true scenario (left). However, there is no reconciliation map for (T ; t, σ ) to any species tree according to Def. 2 because S(T ; t, σ ) is inconsistent. Nevertheless, a relaxed reconciliation map µ between (T ; t, σ ) and the species tree exists (right). However, this map does not satisfy Lemma 4.2(2) since µ(a ′ ) = A and µ(lca T E (b ′ , c ′ )) are comparable. See text for further details

Conclusion and open problems
Event-labeled gene trees can be obtained by combining the reconstruction of gene phylogenies with methods for orthology and HGT detection. We showed that event-labeled gene trees (T; t, σ) for which a species tree exists can be characterized by a set of species triples S(T; t, σ) that is easily constructed from a subset of triples displayed in T.
We have shown that biological feasibility of gene trees cannot be explained in general by reconciliation maps, that is, there are biologically feasible gene trees for which no reconciliation map to any species tree exists.
Fig. 6
Shown is a (tube-like) species tree S with reconciled gene tree (T; t, σ) (taken from [63]). The informative triple set S(T; t, σ) is consistent and application of Lemma 5.9 shows that S is unique. Moreover, the reconciliation map µ is unique, however, not time-consistent. Thus, although S(T; t, σ) is consistent, there is no time-consistent reconciliation map for (T; t, σ) and S. Nevertheless, it can be shown that (T; t, σ) is biologically feasible

We close this contribution by stating some open problems that need to be solved in future work.
1. Are all event-labeled gene trees (T; t, σ) biologically feasible? If not, how are biologically feasible gene trees characterized and what is the computational complexity to recognize them?
2. The results established here are based on informative triples provided by the gene trees. If it is desired to find "non-restricted" reconciliation maps (those for which Condition (M2.iv) is not required) for non-binary gene trees the following question needs to be answered: How much information of a non-restricted reconciliation map and a species tree is already contained in non-binary event-labeled gene trees (T; t, σ)? The latter might also be generalized by considering relaxed reconciliation maps (those for which µ(x) ≻ S lca S (σ T E (x)) for speciation vertices x or any other relaxation is allowed).
3. Our results depend on three axioms (O1)-(O3) on the event-labeled gene trees that are motivated by the fact that event-labels can be assigned to internal vertices of gene trees only if there is observable information on the event. The question which event-labeled gene trees are actually observable given an arbitrary, true evolutionary scenario deserves further investigation in future work, since a formal theory of observability is still missing.
4. The definition of reconciliation maps is by no means consistent in the literature. For the results established here we considered three types of reconciliation maps, that is, the "usual" map as in Def. 2 (as used in, e.g. [10,31,54,63]), a restricted version (as used in, e.g. [12,48]) and a relaxed version. However, a unified framework for reconciliation maps is desirable and might be linked with a formal theory of observability.
5. "Satisfiable" event-relations R 1 , . . . , R k are those for which there is a representing gene tree (T; t, σ) such that (x, y) ∈ R i if and only if t(lca(x, y)) = i. They are