We now give a fast heuristic for Shortest Hyperpaths that always finds an s,t-hyperpath if one exists. While the heuristic is not guaranteed to find a shortest s,t-hyperpath in general, our later comprehensive experiments over the two standard cell-signaling databases in the literature show that it quickly finds a hyperpath that is optimal or remarkably close to optimal on the vast majority of instances. Furthermore, we will prove that the heuristic is guaranteed to find a shortest s,t-hyperpath for the class of singleton-tail hypergraphs, where the tail-sets of all hyperedges are single vertices.
We present the heuristic by providing detailed pseudocode at a level that can be directly implemented, as the heuristic is carefully designed and many of its component algorithms are surprisingly tricky to implement correctly. After describing the heuristic, we give a time analysis that shows it is always efficient, prove its feasibility, and then show that it finds optimal hyperpaths for singleton-tail hypergraphs.
While at a high level the heuristic has some aspects in common with Dijkstra’s algorithm for single-source shortest paths in an ordinary directed graph (see [26, pp. 658–659]), in that the heuristic maintains a heap of elements prioritized by estimated path lengths, it has significant differences. In contrast to Dijkstra’s algorithm, the heuristic is edge-based, rather than vertex-based, and the heap maintains hyperedges e prioritized by the length of the shortest known hyperpath from the source s to edge e, which will be formally defined later. Also in contrast to Dijkstra’s algorithm, maintaining a single in-edge to a vertex no longer suffices for recovering a path back to source s; instead, recovering an s,t-hyperpath now requires the heuristic to maintain a set of in-edges to each hyperedge e that are candidates for the final edges on the path from s to e. Furthermore, the total length of a hyperpath P to e is no longer a simple function (like a minimum or a sum) of the lengths of hyperpaths to the in-edges of e in P that cover the tail of e, since the constituent hyperpaths within P to these in-edges of e can have arbitrarily complicated sharing of hyperedges. Simply determining the length of the best recovered hyperpath for a hyperedge e on the heap, using these stored in-edges to each hyperedge, is itself now a hard combinatorial problem, which the heuristic tackles by a carefully constructed greedy procedure.
The overall structure of the heuristic is a breadth-first search over the hyperedges e reachable from source s, ordered by the estimated length of the shortest hyperpath from s to e. (Admittedly a shortest s,t-hyperpath P is not necessarily composed of shortest hyperpaths from s to individual hyperedges e in P, which is partly why this approach is a heuristic.) The search repeatedly invokes a greedy procedure to recover the currently best-known hyperpath to e in order to evaluate its length. As hyperpaths are by definition minimal superpaths, to determine minimality this greedy recovery procedure repeatedly tests reachability of hyperedges. Moreover, for efficiency, the overall breadth-first search proceeds over a smaller subgraph of the input hypergraph that only contains hyperedges that are reachable both from source s and in reverse from sink t. Hence at base, the heuristic builds upon fast algorithms for computing reachability in a hypergraph.
Accordingly, to present the heuristic, we first give pseudocode for these fundamental algorithms for directed reachability. These algorithms use the following terminology of forward-reachable, backward-traceable, and doubly-reachable, which we define next.
Definition 4
(Reachability and Traceability) Vertex v is forward reachable from source s in hypergraph G if there is an s,v-superpath in G. Hyperedge e is forward reachable from s if all vertices \(v \in \mathrm {tail}(e)\) are forward reachable from s.
Vertex v is backward traceable from sink t if \(v = t\), or recursively if \(v \in \mathrm {tail}(e)\) for an edge e where some \({w \in \mathrm {head}(e)}\) is backward traceable from t. Hyperedge e is backward traceable from t if some \({v \in \mathrm {head}(e)}\) is backward traceable from t.
A vertex v or hyperedge e is doubly reachable if v or e, respectively, is both forward reachable from s and backward traceable from t.\({\square}\)
To describe the heuristic, it will also be convenient to extend the definitions of superpath and hyperpath to a path from a source s to a hyperedge e.
Definition 5
(Superpath and Hyperpath from Source to Hyperedge) An s,e-superpath is an edge subset S with \(e \in S\) where all vertices in \(\mathrm {tail}(e)\) are forward reachable from source s using hyperedges in S. An s,e-hyperpath is a minimal s,e-superpath.\({\square}\)
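Definition 5 can be made concrete with a small sketch. The representation below (a dictionary mapping hypothetical edge names to pairs of tail and head sets) and the function names `is_superpath` and `is_hyperpath` are assumptions for illustration and are not part of the pseudocode in the figures. The sketch assumes that source s is itself forward reachable, and it tests minimality by single-edge removals, which suffices because adding hyperedges to an s,e-superpath always yields an s,e-superpath.

```python
def is_superpath(edges, subset, s, e):
    """Return True if `subset` is an s,e-superpath: it contains e, and every
    vertex of tail(e) is forward reachable from s using only edges in subset."""
    if e not in subset:
        return False
    reached = {s}
    changed = True
    while changed:  # naive fixpoint; simple rather than linear time
        changed = False
        for name in subset:
            tail, head = edges[name]
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return edges[e][0] <= reached


def is_hyperpath(edges, subset, s, e):
    """Return True if `subset` is a minimal s,e-superpath."""
    if not is_superpath(edges, subset, s, e):
        return False
    # Monotonicity: if no single edge can be dropped, no proper subset works.
    return not any(is_superpath(edges, subset - {f}, s, e) for f in subset)


# Hypothetical example hypergraph: name -> (tail set, head set).
example_edges = {
    "a": ({"s"}, {"u", "v"}),
    "b": ({"u", "v"}, {"t"}),
    "d": ({"v"}, {"y"}),
}
print(is_superpath(example_edges, {"a", "b", "d"}, "s", "b"))
print(is_hyperpath(example_edges, {"a", "b", "d"}, "s", "b"))
print(is_hyperpath(example_edges, {"a", "b"}, "s", "b"))
```

In this example the set {a, b, d} is an s,b-superpath but not an s,b-hyperpath, since d can be dropped, while {a, b} is minimal.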
The pseudocode that we present accesses a hypergraph G through the fields G.vertices and G.edges. We access the tail-set and head-set of a hyperedge e through the fields e.tail and e.head. We access the sets of in-edges and out-edges of a vertex v through the fields v.in and v.out. For a list Q that is handled as a queue, the operation Q.Put(x) appends item x to the rear of Q, while the operation Q.Get() removes and returns the item at the front of Q. For a min-heap H, the operation H.Insert(x, k) inserts item x with key k into H, and returns a reference p to the heap node containing this pair (x, k) in H; the operation H.Extract() removes and returns the item in H with minimum key; and the operation H.Decrease(p, k) takes a reference p to a heap node in H and decreases its key to k if k is smaller than its current key. All functions assume hypergraph G is passed by reference.
Figure 2 gives pseudocode for the two functions ForwardReachable and BackwardTraceable. Function ForwardReachable returns the set of all hyperedges that are forward reachable from source s, while function BackwardTraceable returns the set of hyperedges that are backward traceable from sink t. Function ForwardReachable uses the Boolean vertex field v.reached, and the integer edge field e.count, which it assumes have already been initialized to the values \(v.\text {reached} = \textsf {false}\) for all \(v \in V\) and \(e.\text {count} = \bigl |\mathrm {tail}(e)\bigr |\) for all \(e \in E\). Function BackwardTraceable also uses the Boolean edge field e.marked, which it similarly assumes is initialized to false for all e. (This initialization will be done once for hypergraph G in the shortest hyperpath heuristic, which allows these functions when called repeatedly to run in time bounded by just the size of the forward-reachable or backward-traceable subgraphs.) Function ForwardReachable uses the field e.count to detect when all vertices in \(\mathrm {tail}(e)\) have been reached from s, and hence e is now reached from s. Function BackwardTraceable performs a similar but simpler computation in reverse from sink t. The worst-case time for both these functions is linear in the size of the subgraph they explore, as analyzed in the following section on the time complexity of the heuristic.
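The counter-based idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of Figure 2: the dictionary representation, the names `forward_reachable` and `backward_traceable`, and the example hypergraph are hypothetical, and for simplicity the sketch reinitializes its counters on every call rather than relying on the one-time initialization that gives the output-sensitive bound.

```python
from collections import deque


def forward_reachable(edges, s):
    """Return the names of hyperedges forward reachable from source s.
    `edges` maps an edge name to a pair (tail set, head set)."""
    out = {}  # vertex -> names of edges with that vertex in their tail
    for name, (tail, _head) in edges.items():
        for v in tail:
            out.setdefault(v, []).append(name)
    count = {name: len(tail) for name, (tail, _head) in edges.items()}
    reached, queue, result = {s}, deque([s]), set()
    while queue:
        v = queue.popleft()
        for name in out.get(v, []):
            count[name] -= 1          # one more tail vertex has been reached
            if count[name] == 0:      # whole tail reached, so the edge is reached
                result.add(name)
                for w in edges[name][1]:
                    if w not in reached:
                        reached.add(w)
                        queue.append(w)
    return result


def backward_traceable(edges, t):
    """Return the names of hyperedges backward traceable from sink t."""
    into = {}  # vertex -> names of edges with that vertex in their head
    for name, (_tail, head) in edges.items():
        for v in head:
            into.setdefault(v, []).append(name)
    traced, queue, result = {t}, deque([t]), set()
    while queue:
        v = queue.popleft()
        for name in into.get(v, []):
            if name not in result:    # a single traced head vertex suffices
                result.add(name)
                for w in edges[name][0]:
                    if w not in traced:
                        traced.add(w)
                        queue.append(w)
    return result


# Hypothetical example hypergraph.
example_edges = {
    "a": ({"s"}, {"u", "v"}),
    "b": ({"u", "v"}, {"t"}),
    "c": ({"u", "x"}, {"t"}),   # x is never reached, so c is not forward reachable
    "d": ({"v"}, {"y"}),        # y does not lead to t, so d is not backward traceable
}
forward = forward_reachable(example_edges, "s")
backward = backward_traceable(example_edges, "t")
doubly = forward & backward
print(sorted(forward), sorted(backward), sorted(doubly))
```

Intersecting the two returned sets gives the doubly-reachable hyperedges, here a and b.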
Figure 3 gives pseudocode for the function ShortestHyperpathHeuristic, our heuristic. Like Dijkstra’s shortest path algorithm for ordinary graphs, this function maintains a heap H, but in contrast to Dijkstra’s algorithm this is now a heap of hyperedges e (rather than a heap of vertices), which are prioritized by keys that are the best known estimate of the length of a shortest s,e-hyperpath. We refer to this estimate as the current path length for e. The heuristic starts from the out-edges of source s, and in a reaching computation repeatedly extracts from heap H the hyperedge e with minimum key. When hyperedge e is removed from H, the estimated path length for e is evaluated, and stored in field e.length. To compute this length estimate, it must construct the best s,e-hyperpath it can find, and evaluate its total weight. Of course, computing an optimal s,e-hyperpath is NP-complete, so it uses a greedy heuristic to construct this path by the function RecoverShortHyperpath. This greedy path-construction heuristic consists of two steps: (1) recovering an s,e-superpath by recursively backward-traversing hyperedges that enter \(\mathrm {tail}(e)\), followed by (2) finding a minimal subset of this superpath that is an s,e-hyperpath while attempting to minimize its total weight.
Figure 4 gives pseudocode for the function RecoverShortHyperpath that implements this greedy path-construction heuristic. For the first step, recovering the s,e-superpath S is done by recursively backward-traversing what we call in-edges: those hyperedges whose head-sets intersect the tail-set of a given hyperedge. Function ShortestHyperpathHeuristic maintains for a hyperedge e the field e.inedges, which stores the subset of in-edges f to e whose field f.length has been determined.
For the second step, function RecoverShortHyperpath attempts to find the minimum-weight subset of S that is still a superpath by greedily considering hyperedges \(f \in S\) for removal in decreasing order of f.length, which is the estimated total length of a shortest s,f-hyperpath. (Note this is more sophisticated than a naive greedy approach that simply removes hyperedges f in decreasing order of their edge-weight \(\omega (f)\), which would degenerate to removing edges in random order in real cell-signaling networks where hyperedges typically all have unit weight, and hence would all be tied for removal.) This greedy process for trimming superpath S repeatedly tests whether \(\mathrm {tail}(e)\) is still reachable from s on removing f by calling Boolean function IsReachable. Pseudocode for IsReachable is not given, but it simply implements a version of function ForwardReachable that halts and returns true as soon as it adds e to the set of hyperedges reachable from s, or returns false after collecting the entire reachable set without encountering e.
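The trimming step just described can be sketched as follows. This is an illustrative sketch only, not the pseudocode of Figure 4: the function names, the dictionary representation, the `length` dictionary standing in for the fields f.length, and the tie-breaking by edge name are all assumptions made here. The reachability test is a naive fixpoint rather than the early-halting variant of ForwardReachable described above.

```python
def is_reachable(edges, subset, s, e):
    """Return True if every vertex of tail(e) is forward reachable from s
    using only the hyperedges in `subset` (naive fixpoint, for clarity)."""
    reached = {s}
    changed = True
    while changed:
        changed = False
        for name in subset:
            tail, head = edges[name]
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return edges[e][0] <= reached


def trim_superpath(edges, superpath, length, s, e):
    """Greedily trim an s,e-superpath to an s,e-hyperpath, considering
    hyperedges for removal in decreasing order of their length estimate."""
    kept = set(superpath)
    # Ties are broken by edge name only to make the sketch deterministic.
    order = sorted(superpath, key=lambda f: (length[f], f), reverse=True)
    for f in order:
        if f == e:
            continue  # the target hyperedge must stay in the path
        if is_reachable(edges, kept - {f}, s, e):
            kept.remove(f)
    return kept


# Hypothetical example: two routes from s to w, then e from w to t.
example_edges = {
    "a": ({"s"}, {"u"}),
    "b": ({"s"}, {"v"}),
    "c": ({"u"}, {"w"}),
    "d": ({"v"}, {"w"}),
    "e": ({"w"}, {"t"}),
}
# Assumed length estimates, consistent with weights a=1, b=2, c=1, d=1, e=1.
example_length = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 3}
trimmed = trim_superpath(example_edges, set(example_edges), example_length, "s", "e")
print(sorted(trimmed))
```

A single pass suffices for minimality in this sketch: a hyperedge that cannot be removed from the current set cannot become removable after the set shrinks further. In the example, the longer route through b and d is discarded and the hyperpath {a, c, e} remains.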
We note that most of the computation of the shortest hyperpath heuristic proceeds over a much smaller subgraph of the input hypergraph G: namely the subgraph induced by the hyperedges \({D \subseteq E}\) that are doubly reachable (both forward reachable from s and backward traceable from t). This preserves correctness, since hyperedges that are not doubly reachable cannot be on an s,t-hyperpath and can safely be ignored (as argued in the later section on feasibility of the heuristic in the proof of Theorem 2).
To summarize, the shortest hyperpath heuristic proceeds greedily like Dijkstra’s algorithm, but with some important differences: it maintains a heap of hyperedges prioritized by estimated shortest path lengths to tail-sets, records a set of potential in-edges to a given hyperedge used for recovering a hyperpath to the hyperedge, and recovering such a hyperpath now involves another greedy heuristic to find a minimal superpath of small total weight.
Our later section on experimental results shows this heuristic is remarkably close to optimal on real cell-signaling hypergraphs. Given that no practical exact algorithm exists for general shortest hyperpaths, we determine the optimum by enumerating all s,t-hyperpaths and taking the minimum of their lengths, using an algorithm we develop in the later section on tractably generating all source-sink hyperpaths.
We note for this heuristic that the inapproximability of the shortest hyperpath problem [16], together with the polynomial time analysis of the next subsection, implies that unless \(\text {P} = \text {NP}\), the heuristic cannot be a constant-factor approximation algorithm for shortest hyperpaths.
In the following subsections, we first analyze the running time of the heuristic; then show it always finds a feasible solution whenever one exists; and finally prove it actually finds an optimal solution for the class of singleton-tail hypergraphs.
Time complexity of the heuristic
We now bound the time complexity of the shortest hyperpath heuristic. Our analysis is in terms of the following parameters measured on a hypergraph, or an induced subgraph. For a hypergraph G with vertices V and hyperedges E, we denote its number of vertices and hyperedges by
$$\begin{aligned} n\,\,:=\,\, & {} \,|V| \, , \\ m\,\,:=\,\, & {} \,|E| \, . \end{aligned}$$
We also use the size parameter
$$\begin{aligned} \ell \;\;:=\;\; \sum _{e \,\in \, E} \Bigl ( \bigl |\mathrm {tail}(e)\bigr | \;+\; \bigl |\mathrm {head}(e)\bigr | \Bigr ) \, , \end{aligned}$$
and degree parameter
$$\begin{aligned} d \;\;:=\;\; \max _{v \,\in \, V} \, \Bigl \{\, \bigl |\text {in}(v)\bigr |, \, \bigl |\text {out}(v)\bigr | \,\Bigr \} \, . \end{aligned}$$
Note that in general, the space required to represent all hyperedges is \(\Theta (\ell )\). We assume all tail and head sets are nonempty, and every vertex is touched by a hyperedge, which implies \(m + n = O(\ell )\). When we need to refer to these measures for a particular hypergraph G, such as on an induced subgraph, we explicitly subscript the parameters by the specific hypergraph, such as \(n_G, \ldots , d_G\), where these parameters are then measured in terms of the vertices and edges of the subscripted hypergraph G.
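As a small worked illustration of these parameters, the following sketch computes n, m, the size parameter, and the degree parameter for a toy hypergraph. The function name `size_parameters`, the dictionary representation, and the example hypergraph are assumptions for illustration; the sketch assumes at least one hyperedge and that all tail and head sets are nonempty, as in the text.

```python
def size_parameters(edges):
    """Return (n, m, ell, d) for a hypergraph given as a dictionary
    mapping an edge name to a pair (tail set, head set)."""
    vertices = set()
    in_deg, out_deg = {}, {}
    ell = 0
    for tail, head in edges.values():
        vertices |= tail | head
        ell += len(tail) + len(head)          # size parameter: sum of |tail| + |head|
        for v in tail:
            out_deg[v] = out_deg.get(v, 0) + 1  # the edge is an out-edge of v
        for v in head:
            in_deg[v] = in_deg.get(v, 0) + 1    # the edge is an in-edge of v
    n, m = len(vertices), len(edges)
    d = max(max(in_deg.get(v, 0), out_deg.get(v, 0)) for v in vertices)
    return n, m, ell, d


# Hypothetical example hypergraph.
example_edges = {
    "a": ({"s"}, {"u", "v"}),
    "b": ({"u", "v"}, {"t"}),
    "c": ({"u", "x"}, {"t"}),
    "d": ({"v"}, {"y"}),
}
n, m, ell, d = size_parameters(example_edges)
print(n, m, ell, d)  # 6 4 11 2
```

Here every hyperedge contributes at least 2 to the size parameter and every vertex is touched by some hyperedge, so m + n = 10 is indeed bounded by a constant multiple of the size parameter 11.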
The running time of the shortest hyperpath heuristic may be expressed as a function of parameters measured on both the input hypergraph and its doubly-reachable subgraph (induced by the hyperedges that are simultaneously forward reachable from the source and backward traceable from the sink).
Theorem 1
(Time complexity of the heuristic) The time complexity of the shortest hyperpath heuristic, in terms of the number of hyperedges m and size parameter \(\ell\) for both the input hypergraph G and its doubly-reachable subgraph H, is
$$\begin{aligned} O\Bigl ( \ell _G \,\,+\,\, \ell _H \, m_H^2 \Bigr ) \, . \end{aligned}$$
Proof
To bound the running time of the function ShortestHyperpathHeuristic, we analyze in turn its component functions ForwardReachable, BackwardTraceable, and RecoverShortHyperpath. The running time of the reachability computations ForwardReachable and BackwardTraceable (in Fig. 2) can be expressed in an output-sensitive way in terms of the size of the edge sets they return.
For ForwardReachable, let \(R \subseteq V\) be the set of vertices reachable from source s, and \(F \subseteq E\) be the set of hyperedges reachable from s that are returned. The total time for ForwardReachable is dominated by the time for its main while-loop, which takes time \(\Theta \bigl ( \, \sum _{v \in R} \, \bigl |\text {out}(v)\bigr | \,\,+\,\, \sum _{e \in F} \, \bigl |\mathrm {head}(e)\bigr | \, \bigr )\), or equivalently,
$$\begin{aligned} \Theta \biggl ( \, \sum _{e \,\in \, E} \, \bigl |\mathrm {tail}(e) \,\cap \, R\bigr | \,\,\,+\,\,\, \sum _{f \,\in \, F} \, \bigl |\mathrm {head}(f)\bigr | \, \biggr ) \,\,\,=\,\,\, O\bigl ( \ell _G \bigr ) \, . \end{aligned}$$
For BackwardTraceable, let \(B \subseteq V\) be the set of vertices it reaches from sink t, and \(F \subseteq E\) be the set of hyperedges traceable from t that are returned. A similar analysis shows the time for BackwardTraceable is
$$\begin{aligned} \Theta \biggl ( \, \sum _{e \,\in \, E} \, \bigl |\mathrm {head}(e) \,\cap \, B\bigr | \,\,\,+\,\,\, \sum _{f \,\in \, F} \, \bigl |\mathrm {tail}(f)\bigr | \, \biggr ) \,\,\,=\,\,\, O\bigl ( \ell _G \bigr ) \, . \end{aligned}$$
So the time for both ForwardReachable and BackwardTraceable on the input hypergraph G is \(O\bigl (\ell _G\bigr )\), but can be bounded more tightly in terms of the subgraph of G they actually explore.
For the function RecoverShortHyperpath (in Fig. 4), when it is called by ShortestHyperpathHeuristic, all its computations are performed on G restricted to the edge subset \(D \subseteq E\) of doubly-reachable hyperedges. We denote by hypergraph H the doubly-reachable subgraph induced by D.
In RecoverShortHyperpath, the time to recover s,e-superpath S by tracing back from e is at most
$$\begin{aligned} O\biggl ( \, \sum _{f \,\in \, S} \,\, \sum _{v \,\in \, \mathrm {tail}(f)} \,\, \bigl | \text {in}(v) \bigr | \, \biggr ) \,\,\,=\,\,\, O\Bigl ( d_H \, \ell _H \Bigr ) \, . \end{aligned}$$
The time to greedily trim superpath S to s,e-hyperpath \(P \subseteq S\), in terms of cardinality \(k = |S|\), is at most
$$\begin{aligned} O\Bigl ( m_H \,\,+\,\, k \,\log \, k \,\,+\,\, k \, \ell _H \Bigr ) \,\,\,=\,\,\, O\Bigl ( k \, \ell _H \Bigr ) \, . \end{aligned}$$
Thus the total time for RecoverShortHyperpath is
$$\begin{aligned} O\Bigl ( d_H \, \ell _H \Bigr ) \,\,+\,\, O\Bigl ( k \, \ell _H \Bigr ) \,\,\,=\,\,\, O\Bigl ( \ell _H \, m_H \Bigr ) \, . \end{aligned}$$
For the function ShortestHyperpathHeuristic (in Fig. 3), we break its time down into the following components. The time for the initialization, collecting the doubly-reachable edges D by calling ForwardReachable and BackwardTraceable, and restricting G to its subgraph H induced by D, is \(O\bigl (\ell _G\bigr )\). The main while-loop executes for \(m_H\) iterations, and spends \(O\bigl (m_H \, \log \, m_H\bigr )\) time for all Extracts. The total time across all iterations to compute s,e-hyperpath P for all extracted edges e by calling RecoverShortHyperpath is \(O\bigl (\ell _H \, m_H^2\bigr )\). The total time to collect the out-edges F for the extracted e across all iterations is \(O\bigl ( \sum _{e \in D} \, \sum _{v \in \mathrm {head}(e)} \, \bigl | \text {out}(v) \bigr | \bigr ) \,=\, O\bigl (d_H \, \ell _H\bigr )\). The total time across all iterations for Decrease and Insert, which take O(1) amortized time per edge in F using a Fibonacci heap (see [26, pp. 510–522]), is also \(O\bigl (d_H \, \ell _H\bigr )\). The time to recover the best s,t-hyperpath \(P^*\) is \(O\bigl (d_H \, \ell _H \, m_H \bigr )\).
Finally, adding up the bounds for the above components, the total time for the shortest hyperpath heuristic is
$$\begin{aligned} O\bigl (\ell _G\bigr ) \,\,+\,\, O\bigl (m_H \,\log \, m_H\bigr ) \,\,+\,\, O\bigl (\ell _H \, m_H^2\bigr ) \,\,+\,\, O\bigl (d_H \, \ell _H\bigr ) \,\,+\,\, O\bigl (d_H \, \ell _H \, m_H\bigr ) \, , \end{aligned}$$
which is in turn \(O\bigl (\ell _G \,+\, \ell _H \, m_H^2\bigr )\), since \(d_H \le m_H\) and \(m_H \log m_H \le \ell _H \, m_H^2\).\({\square}\)
Notice that the overall running time of the heuristic is dominated by the total time to recover short hyperpaths, which requires invoking RecoverShortHyperpath whenever the path length to a hyperedge is updated. This is necessary in hypergraphs, since in contrast to ordinary graphs the length of the hyperpath to a hyperedge can no longer be expressed as a simple function (such as a minimum or a sum) of the lengths of the hyperpaths to its in-edges.
As demonstrated in our later section on experimental results, for real biological instances the size of the doubly-reachable subgraph H is significantly smaller than the full input hypergraph G, so designing the heuristic to compute mainly over the much smaller hypergraph H yields a significant performance speedup in practice.
Next we show the heuristic always finds a feasible solution.
Feasibility of the heuristic
The most basic property that a heuristic for a combinatorial optimization problem should satisfy is feasibility: that it always returns a feasible solution whenever one exists. In the context of Shortest Hyperpaths, a feasible solution is any s,t-hyperpath, while an optimal solution is a feasible solution of minimum total edge-weight.
For the hyperpath heuristic, we now show feasibility.
Theorem 2
(Feasibility of the heuristic) The shortest hyperpath heuristic finds a source-sink hyperpath whenever one exists.
Proof
Function ShortestHyperpathHeuristic (in Fig. 3) first restricts the input hypergraph G to its doubly-reachable subgraph, consisting of the hyperedges D that are both forward reachable from source s and backward traceable from sink t. Note that functions ForwardReachable and BackwardTraceable (in Fig. 2) together correctly collect these doubly-reachable hyperedges D: function ForwardReachable explores breadth-first the hyperedges that are forward reachable from s, maintaining a counter for each hyperedge e that records the number of vertices in its tail that have not yet been reached from s, and detecting when e is reached by this counter hitting zero; while function BackwardTraceable directly implements Definition 4 of backward traceability from t.
Furthermore, we claim that when restricting to the doubly-reachable subgraph \({\widetilde{G}}\), the heuristic does not lose any hyperedges on source-sink hyperpaths. Note that any hyperedge e on an s,t-hyperpath P in the input hypergraph G is forward reachable from s: consider the ordering of hyperedges in P from Definition 1, and take the prefix of this ordering up through e; this prefix is an s,e-superpath, so e is by definition forward reachable from s. Note also that any e on P in G is backward traceable from t as well: if \({t \in \mathrm {head}(e)}\), backward traceability immediately holds; otherwise, in the ordering of P there must be a hyperedge f following e with nonempty \(\mathrm {head}(e) \,\cap \, \mathrm {tail}(f)\) (else e can be removed from P, contradicting minimality); applying this same process again at f yields a subsequence of the ordering of P that ends in a hyperedge whose head contains t; considering this subsequence in reverse order satisfies Definition 4 for backward traceability of e from t. Hence restricting to the doubly-reachable subgraph \({\widetilde{G}}\) is safe.
To show the implication of the theorem, notice ShortestHyperpathHeuristic explores all hyperedges that are forward reachable from s in \({\widetilde{G}}\), inserting hyperedge e into heap H when e is initially reached, again detecting when traversing e causes another hyperedge f to be first reached using counter f.count, and recording in field f.inedges all such e that have reached f. So if an s,t-hyperpath exists in G, which implies sink t has an in-edge e that is forward reachable from s in \({\widetilde{G}}\), this e will eventually be inserted into H, making e.node non-nil, and at the end of the heuristic causing RecoverShortHyperpath to be called on e.
We claim that when function RecoverShortHyperpath (in Fig. 4) is ultimately called on an in-edge to sink t, phase (I) first recovers an edge set S that is an s,t-superpath in G. Considering the hyperedges of S in reverse order of their removal from queue Q, they satisfy the three conditions for an s,t-superpath in Definition 1: the last hyperedge removed from Q solely has s in its tail, each hyperedge in S (other than this last one) has its tail set covered by hyperedges removed later from Q, and the first edge removed has t in its head.
Function RecoverShortHyperpath in phase (II) then trims S to a minimal s,t-superpath, yielding an s,t-hyperpath. Finally, ShortestHyperpathHeuristic returns the shortest such hyperpath found.
Thus whenever a source-sink hyperpath exists, the heuristic finds one.\({\square}\)
Next we prove the heuristic actually solves Shortest Hyperpaths when the input is a singleton-tail hypergraph.
Optimality of the heuristic for singleton-tail hypergraphs
While our heuristic does not necessarily find shortest hyperpaths in general hypergraphs, we can prove that it does find optimal solutions for the following class of hypergraphs.
A singleton-tail hypergraph is a directed hypergraph G where every hyperedge e in G has \(\bigl |\mathrm {tail}(e)\bigr | = 1\). (The head sets of hyperedges can be arbitrary.) In other words, in singleton-tail hypergraphs, the tails of hyperedges are single vertices.
At a high level, the optimality argument for singleton-tail hypergraphs first shows that shortest source-sink hyperpaths are composed of shortest s,e-hyperpaths; then argues that the heuristic’s greedy superpath trimming recovers shortest s,e-hyperpaths when the hyperedge fields hold shortest hyperpath lengths; and finally proves that the heuristic computes exact shortest s,e-hyperpath lengths.
The following characterization states that in singleton-tail hypergraphs, a shortest s,t-hyperpath is composed of shortest s,e-hyperpaths to its constituent hyperedges. This does not hold for general hypergraphs, and is partly why the special case of shortest singleton-tail hyperpaths is polynomial-time solvable.
Lemma 1
(Characterizing shortest singleton-tail hyperpaths) In singleton-tail hypergraphs with nonnegative edge weights, every shortest s,t-hyperpath can be ordered as a sequence \(e_1 \cdots e_k\) of hyperedges where

(i)
each \(\mathrm {head}(e_i) \,\supseteq \, \mathrm {tail}(e_{i+1})\), and

(ii)
every prefix \(e_1 \cdots e_i\) is a shortest \(s,e_i\)-hyperpath.
Proof
Consider a shortest s,t-hyperpath P in a singleton-tail hypergraph. By definition, P is a minimal s,t-superpath, so its edges can be ordered as a sequence \({e_1 \cdots e_k}\) where \(\mathrm {tail}(e_1) = \{s\}\), \({\mathrm {head}(e_k) \supseteq \{t\}}\), and since tail sets contain a single vertex, for every hyperedge \(e_j\) in this sequence other than the first one, there is a prior hyperedge \(e_i\) with \({\mathrm {head}(e_i) \,\supseteq \, \mathrm {tail}(e_j)}\).
Starting from the last hyperedge \(e_k\), and repeatedly picking a prior hyperedge whose head covers the tail of the current hyperedge until reaching tail \(\{s\}\), yields a subsequence \(f_1 \cdots f_\ell\) specifying subset \({Q \,=\, \{f_1, \ldots , f_\ell \} \,\subseteq \, P}\), where again \({\mathrm {tail}(f_1) = \{s\}}\), \({\mathrm {head}(f_\ell ) \supseteq \{t\}}\), and now \({\mathrm {head}(f_i) \,\supseteq \, \mathrm {tail}(f_{i+1})}\) for \({1 \!\le \! i \!<\! \ell }\). Furthermore \({Q = P}\), otherwise P is not minimal. So subsequence \(f_1 \cdots f_\ell\) is exactly sequence \(e_1 \cdots e_k\).
Clearly every prefix \(e_1 \cdots e_i\) is an \(s,e_i\)-superpath. Moreover this prefix must be a minimal \(s,e_i\)-superpath, otherwise P is not minimal. Thus every prefix ending in \(e_i\) is an \(s,e_i\)-hyperpath.
Finally, every prefix \(e_1 \cdots e_i\) must be a shortest \(s,e_i\)-hyperpath. Otherwise, replacing this prefix by a shortest \(s,e_i\)-hyperpath yields an s,t-superpath S of total weight less than P. Furthermore, trimming S to a minimal s,t-superpath under nonnegative edge weights yields an s,t-hyperpath of total weight less than P, contradicting the optimality of P.\({\square}\)
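Lemma 1 says that a shortest hyperpath in a singleton-tail hypergraph is a chain of hyperedges, which suggests an independent way to cross-check optimal lengths on such inputs. The sketch below is not the heuristic of this paper: it is ordinary Dijkstra on the directed graph obtained by expanding each hyperedge with tail {u} into arcs from u to every head vertex, each carrying the hyperedge's weight. It rests on the assumption, which follows from the chain structure under nonnegative weights, that the resulting distance to t equals the shortest s,t-hyperpath length. The function name, representation, and example are hypothetical, and the sketch assumes s and t are distinct.

```python
import heapq


def singleton_tail_distance(edges, weight, s, t):
    """Shortest s,t-hyperpath length in a singleton-tail hypergraph, or None
    if t is unreachable. `edges` maps a name to (tail set, head set), and
    `weight` maps a name to a nonnegative weight."""
    out = {}  # tail vertex -> names of hyperedges leaving it
    for name, (tail, _head) in edges.items():
        if len(tail) != 1:
            raise ValueError(f"hyperedge {name!r} does not have a singleton tail")
        (u,) = tail
        out.setdefault(u, []).append(name)
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == t:
            return d
        for name in out.get(u, []):
            nd = d + weight[name]
            for w in edges[name][1]:  # every head vertex is reached at cost nd
                if nd < dist.get(w, float("inf")):
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
    return None


# Hypothetical singleton-tail example.
example_edges = {
    "a": ({"s"}, {"u", "v"}),
    "b": ({"u"}, {"t"}),
    "c": ({"v"}, {"x"}),
    "d": ({"x"}, {"t"}),
}
example_weight = {"a": 2, "b": 5, "c": 1, "d": 1}
print(singleton_tail_distance(example_edges, example_weight, "s", "t"))  # 4
```

In the example the chain a, c, d has total weight 4, which beats the chain a, b of weight 7.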
In the following, the distance of hyperedge e from source s is the total weight of a shortest s,e-hyperpath, which we denote by d(e). Recall that function ShortestHyperpathHeuristic (in Fig. 3) maintains the field e.length, that holds the total weight of the best-known s,e-hyperpath, which upper bounds d(e).
The next lemma states that in singleton-tail hypergraphs, given two key conditions, the greedy superpath trimming that is used by the heuristic to recover a hyperpath to hyperedge e in fact finds a shortest s,e-hyperpath.
Lemma 2
(Recovering hyperpaths in singleton-tail hypergraphs) In a singleton-tail hypergraph with nonnegative edge weights, when the hyperpath heuristic recovers a hyperpath from source s to hyperedge e, suppose

(i)
field e.inedges contains among its hyperedges an in-edge to e from a shortest s,e-hyperpath, and

(ii)
in the s,e-superpath S found when recovering a hyperpath to e, for all hyperedges \(f \in S - \{e\}\), field f.length holds distance d(f).
Then the hyperpath to e that the heuristic recovers is a shortest s,e-hyperpath.
Proof
We first claim that under the assumptions of the lemma, when the hyperpath heuristic calls RecoverShortHyperpath (in Fig. 4) on a hyperedge e, its first phase recovers an s,e-superpath S that contains a shortest s,e-hyperpath. By assumption (i), field e.inedges contains a hyperedge f on a shortest s,e-hyperpath, and f will be in superpath S, hence by assumption (ii), the value of f.length is d(f). This value came from a shortest s,f-hyperpath Q that was found in a prior call to RecoverShortHyperpath on f, by trimming an s,f-superpath T. Notice that Q followed by e is an s,e-superpath \({\widetilde{P}}\), as \(\mathrm {head}(f) \supseteq \mathrm {tail}(e)\). Now trim \({\widetilde{P}}\) to an s,e-hyperpath P, and let \(P^*\) be a shortest s,e-hyperpath containing f that exists by assumption (i). By Lemma 1 and minimality of hyperpaths, \(P^*\) must consist of a shortest s,f-hyperpath \(Q^*\) followed by e. Under nonnegative edge weights,
$$\begin{aligned} \omega (P) \,\,\le & \,\,\, \omega ({\widetilde{P}}) \\ \,\,= & \,\,\, \omega (Q) + \omega (e) \\ \,\,= & \,\,\, \omega (Q^*) + \omega (e)\\ \,\,= & \,\,\, \omega (P^*) \, . \end{aligned}$$
Thus P is also a shortest s,e-hyperpath. Since f is in e.inedges, tracing back from e recovers the superpath
$$\begin{aligned} S \,\,\,\supseteq \,\,\, T \cup \{e\} \,\,\,\supseteq \,\,\, Q \cup \{e\} \,\,=\,\, {\widetilde{P}} \,\,\supseteq \,\, P \, , \end{aligned}$$
so the claim holds.
We next claim that when RecoverShortHyperpath in its second phase greedily trims superpath S, the resulting superpath \(T \subseteq S\) still contains a shortest hyperpath. To show this, we prove that each superpath \(S_i\) that remains after i iterations of greedy trimming contains a shortest s,e-hyperpath, by induction on i. For the basis at \(i \!=\! 0\), the initial superpath \(S_0\) before any trimming contains a shortest hyperpath by our first claim on S. For the induction at \(i \!>\! 0\), let P be a shortest s,e-hyperpath that superpath \(S_{i-1}\) contains by our hypothesis, and let f be the hyperedge removed from \(S_{i-1}\) at iteration i. If \(f \not \in P\), then \(S_i \,=\, S_{i-1} - \{f\}\) trivially contains P. So we assume \(f \in P\). In the following, the core of hyperpath P consists of the tail vertices of its hyperedges.
In an ordering of shortest hyperpath P that satisfies Lemma 1, consider the hyperedges in the suffix of P that begins with f. As edge weights are nonnegative, by Lemma 1 the distances of these hyperedges must be at least d(f), so by assumption (ii) the values of the length field for these hyperedges must be at least f.length. Greedy trimming proceeds in decreasing order of length-field values, so the hyperedges in this suffix of P must either have been already considered for trimming before f, or not yet considered due to being tied with f (from having zero edge-weight). If they were considered before f, then since they were not trimmed, there must be no alternate s,e-hyperpath in \(S_{i-1}\) that enters their head vertices on the core of P. If they were not considered yet, then since f can be removed from \(S_{i-1}\), there must be an alternate s,e-hyperpath \(Q \subseteq S_i\) distinct from P that enters one of the core head-vertices of the hyperedges in this suffix of P whose length field is tied with f. Moreover, this alternate hyperpath Q must enter P with the same length-field value as the edge of P sharing this core head-vertex. (If Q enters P at a smaller length-value, then P is not a shortest s,e-hyperpath; if Q enters at a greater length-value, hyperedge f would not be the next hyperedge removed, as instead a hyperedge from Q of greater length would be.) Since Q enters P at the same length-value, hyperpath Q is also a shortest s,e-hyperpath. Hence \(S_i \supseteq Q\) still contains a shortest hyperpath, which proves the second claim.
So the final trimmed s,e-superpath T returned by RecoverShortHyperpath contains a shortest s,e-hyperpath \(P \subseteq T\). Since T is minimal (as no further edges could be trimmed), and P by definition is minimal, we must have \(T = P\), which proves the lemma. \({\square}\)
We now show that the hyperpath heuristic solves Shortest Hyperpaths for singleton-tail hypergraphs.
Theorem 3
(Optimality of the heuristic on singleton-tail hypergraphs) For singleton-tail hypergraphs with nonnegative edge weights, the hyperpath heuristic finds a shortest source-sink hyperpath.
Proof
The key to proving optimality is showing that in singletontail hypergraphs, the estimates that the heuristic computes for shortest hyperpath lengths are exact. Recall that when function ShortestHyperpathHeuristic (in Fig. 3) removes hyperedge e from heap H, it calls RecoverShortHyperpath on e to recover an s,ehyperpath P, and sets the field \(e.\text {length}\) to \(\omega (P)\), the total weight of P.
We claim that when this assignment occurs, field e.length holds distance d(e), the total weight of a shortest s,e-hyperpath. We now prove this claim by induction on the number of heap extractions. At a high level, the argument is similar to that for Dijkstra’s shortest-path algorithm (see [26, pp. 659–661]) on ordinary directed graphs.
For the basis, the first hyperedge extracted has \(\mathrm {tail}(e) = \{s\}\) and \(e.\text {key} = \omega (e)\), which equals d(e), as e itself is a shortest s,e-hyperpath (since all edge weights are nonnegative). The recovered s,e-hyperpath will consist of e (as e.inedges is empty), so after the assignment field e.length holds d(e).
For the induction, let e be the next hyperedge to be removed from the heap, and assume for all hyperedges h extracted prior to e that h.length holds d(h). Now consider a shortest s,e-hyperpath P, and in the ordering of P given by Lemma 1, let f be the first hyperedge in P that has not yet been removed from the heap. Note that f exists, as e has not been removed yet.
We first show \(f.\text {key} = d(f)\). In the special case where f is the first edge of P, notice \(d(f) = \omega (f)\) by the same reasoning as in the basis. Furthermore \(f.\text {key} = \omega (f)\), as f.key starts at \(\omega (f)\), never increases, and cannot decrease below this minimum value. So \({f.\text {key} = d(f)}\) in this special case.
In the general case where f is not the first edge of P, let g be the in-edge to f on P, and \(Q \subseteq P\) be the prefix of P ending in f, as illustrated in Fig. 5. Notice g has already been extracted from the heap (by the definition of f), so g is in f.inedges (as when a hyperedge is extracted, for all its out-edges h it is added to h.inedges). Furthermore Q is a shortest s,f-hyperpath by Lemma 1, so g is on a shortest hyperpath to f. For all hyperedges h extracted before e, by the induction hypothesis \(h.\text {length} = d(h)\), and only extracted hyperedges h add themselves to the field inedges of their out-edges. Hence when g was extracted, added itself to f.inedges, and updated f.key by recovering an s,f-hyperpath, all hyperedges h in the s,f-superpath S first found during recovery had \(h.\text {length} = d(h)\). Thus by Lemma 2, the recovered s,f-hyperpath was a shortest hyperpath, so this updated f.key to d(f), and as argued before in the special case, this key will not change. So again \(f.\text {key} = d(f)\).
We next show,
$$\begin{aligned} e.\text {key} \;&\le \; f.\text {key} &&\text {(1)} \\ &=\; d(f) &&\text {(2)} \\ &\le \; d(e) &&\text {(3)} \\ &\le \; e.\text {key} \,. &&\text {(4)} \end{aligned}$$
In the above, inequality (1) holds since e and f are both on the heap (as f was inserted in the heap either during initialization or when g was extracted), but e is removed before f. Equation (2) is from our prior analysis of f. Inequality (3) holds as Q and P are shortest s,f- and s,e-hyperpaths respectively, while \({Q \subseteq P}\) and edge weights are nonnegative. Lastly, inequality (4) holds since the key of e while it is on the heap is the total weight of some s,e-hyperpath. Thus relations (1)–(4) must all be equalities, so \(e.\text {key} = d(e)\).
We now argue \(e.\text {length} = d(e)\) after e is extracted. Since \(e.\text {key} = d(e)\) is the weight of a hyperpath recovered earlier for e, notice (i) there was an in-edge to e on a shortest s,e-hyperpath in e.inedges; moreover (ii) all hyperedges h in the s,e-superpath collected while recovering a hyperpath for e were extracted earlier, and hence by the induction hypothesis had \(h.\text {length} = d(h)\). Furthermore, hyperedges are never removed from the field inedges, and h.length never changes after h is extracted. Thus the assumptions in Lemma 2 are still met upon extraction of e, so when ShortestHyperpathHeuristic assigns to e.length the total weight of the hyperpath P recovered for e, by Lemma 2 this recovered P will again be a shortest s,e-hyperpath, hence \(e.\text {length} = d(e)\). This completes the inductive proof of our claim.
So for every hyperedge h in the doubly-reachable subgraph explored by ShortestHyperpathHeuristic, after extracting h from the heap, the relation \(h.\text {length} = d(h)\) holds. Finally, when recovering the best s,t-hyperpath at the end of the heuristic by examining the in-edges e to sink t, for each such hyperedge e the assumptions of Lemma 2 are still met (by the same reasoning as above), so the hyperpaths P obtained from calling RecoverShortHyperpath on these sink in-edges e are again shortest s,e-hyperpaths. Since a shortest s,t-hyperpath consists of doubly-reachable hyperedges (by the proof of Theorem 2), and is a shortest s,e-hyperpath for some in-edge e to sink t, the best of these recovered hyperpaths P, which is the hyperpath returned by the heuristic, is a shortest s,t-hyperpath. \({\square}\)
Theorem 3 (in combination with Theorem 1) shows that, while Shortest Hyperpaths is NP-complete for singleton-head hypergraphs [14], it is polynomial-time solvable for singleton-tail hypergraphs.
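To make the singleton-tail case concrete, the following is a minimal sketch of an edge-based, Dijkstra-style computation of the distances d(e). It is not the heuristic of Fig. 3: it assumes every tail is a single vertex, so that a minimal hyperpath is a simple chain of hyperedges and the recovery step can be replaced by the update d(g) + \(\omega (f)\) for an in-edge g of f. The function names and the dictionary-based input format are hypothetical.

```python
import heapq


def singleton_tail_distances(edges, source):
    """edges: dict name -> (tail_vertex, head_set, weight), weights nonnegative.
    Returns dict name -> d(e) for every hyperedge e reachable from `source`."""
    # Index hyperedges by their single tail vertex.
    out_edges = {}
    for name, (tail, _head, _weight) in edges.items():
        out_edges.setdefault(tail, []).append(name)

    # Hyperedges leaving the source start with key equal to their own weight.
    key = {n: w for n, (tail, _h, w) in edges.items() if tail == source}
    heap = [(k, n) for n, k in key.items()]
    heapq.heapify(heap)

    length = {}  # plays the role of the length field, fixed on extraction
    while heap:
        k, g = heapq.heappop(heap)
        if g in length or k > key[g]:
            continue  # stale heap entry
        length[g] = k
        for v in edges[g][1]:              # each head vertex of g
            for f in out_edges.get(v, []):  # g is an in-edge of f
                if f in length:
                    continue
                candidate = k + edges[f][2]
                if candidate < key.get(f, float("inf")):
                    key[f] = candidate
                    heapq.heappush(heap, (candidate, f))
    return length


def shortest_hyperpath_weight(edges, source, sink):
    """Weight of a shortest source-sink hyperpath, or None if none exists."""
    length = singleton_tail_distances(edges, source)
    candidates = [length[n] for n, (_t, head, _w) in edges.items()
                  if sink in head and n in length]
    return min(candidates, default=None)
```

As a usage example under these assumptions, take hyperedges e1 = (s, {a, b}) of weight 2, e2 = (a, {t}) of weight 5, e3 = (b, {c}) of weight 1, and e4 = (c, {t}) of weight 1. The sketch extracts e1, e3, e4, e2 in that order and reports weight 4 for the chain e1, e3, e4, which beats the weight-7 chain e1, e2.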