Yen's K-Shortest Paths

Enumerate the K shortest loopless paths between two nodes in increasing weight order

Category: pathfinding

Syntax

SELECT * FROM graph_yens(table_name, source_id, target_id [, k])

Description

Yen's algorithm enumerates the K shortest loopless paths between a source and a target in non-decreasing order of total weight. The first row is always the globally shortest path; each subsequent row is the cheapest path that differs from every preceding path by at least one edge. Use this when a single shortest-path answer is not enough: route diversification, alternative-route planning, backup-link analysis, top-K explanation queries, and any scenario that asks for the second-best, third-best, or kth-best traversal. The function loads edge data from the registered Delta table into a Compressed Sparse Row (CSR) representation. Source, target, and (optional) weight columns are auto-detected from standard column-name conventions. A_1 is computed by a Dijkstra search from source to target. For each subsequent A_k, the algorithm walks every spur position along A_{k-1}, runs a Dijkstra subproblem from the spur node with the previously committed prefix excluded as visited nodes and with the next-edge of every previously committed path sharing the same prefix excluded as a forbidden transition, and pushes the resulting full path into a candidate heap. The cheapest non-duplicate candidate is then promoted to A_k. The process terminates when k paths have been emitted or the candidate heap is exhausted, whichever comes first. Time complexity is O(K * V * (E + V log V)) in the worst case: K outer iterations, V spur positions per iteration, and a full Dijkstra inside each spur. The implementation matches Neo4j GDS `gds.shortestPath.yens` semantics for ties: when multiple candidate paths share the same total cost, the one whose CSR-index sequence is lexicographically smaller at the first divergence wins, which makes the output deterministic across repeated runs on the same input. GPU acceleration is not available for the SQL surface of this function. Yen's algorithm is structurally serial along the outer K loop because A_k depends on the entire set {A_1, ..., A_{k-1}} to build per-spur exclusion masks. The spur subproblems themselves are parallelisable, but they are bounded by V and individually cheap; the candidate heap is small (at most k * (V - 1) entries) and is maintained on the host. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology so repeated queries with different source/target/k arguments do not reload the edge table.

Parameters

Name	Type	Description
`table_name`		Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). If a weight column is present, it is used for path cost; otherwise every edge contributes 1.0 to the total cost. Edge direction is respected; symmetrise the input table if undirected semantics are required.
`source_id`		Specify the starting node ID for the path enumeration. The value must match an ID present in the edge table. If the node does not exist or the table is empty, an empty result set is returned.
`target_id`		Specify the destination node ID. The value must match an ID present in the edge table. If source == target the function returns a single rank-1 row with cost 0 and a trivial single-node path. If no path connects source to target the result is empty.
`k`		Set the maximum number of distinct shortest paths to return. Defaults to 3 when omitted. Must be a positive integer. The function returns at most k rows. When the graph contains fewer than k loopless paths between source and target, the result set is shorter than k without warning. Practical values are typically 1 to 16; the algorithmic cost grows linearly with k.

Examples

CREATE TABLE grid_edges AS
SELECT * FROM VALUES
  (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0),
  (1, 5, 1.0), (5, 6, 1.0), (6, 4, 1.0),
  (1, 7, 2.0), (7, 8, 1.0), (8, 4, 2.0),
  (2, 5, 3.0), (3, 6, 3.0)
AS t(src, dst, weight);

SELECT rank, total_cost, nodes
FROM graph_yens('grid_edges', 1, 4, 3)
ORDER BY rank;

SELECT * FROM graph_yens('grid_edges', 1, 4);

SELECT total_cost AS backup_cost, nodes AS backup_route
FROM graph_yens('grid_edges', 1, 4, 2)
WHERE rank = 2;

WITH paths AS (
  SELECT rank, total_cost
  FROM graph_yens('grid_edges', 1, 4, 5)
)
SELECT
  MIN(total_cost) AS best,
  MAX(total_cost) AS kth_worst,
  MAX(total_cost) - MIN(total_cost) AS spread
FROM paths;

SELECT
  CASE WHEN COUNT(*) < 2 THEN 'fragile' ELSE 'redundant' END AS resilience
FROM graph_yens('grid_edges', 1, 4, 2);

Pitfalls

The result set is at most k rows but may be shorter. When fewer than k loopless paths exist (target unreachable, graph too small, or all routes share required edges and exhaust quickly), the function returns whatever it found and does not raise an error or emit a sentinel. Always check COUNT(*) before assuming you received exactly k alternatives.
Edge weights must be non-negative. Yen's algorithm is built on Dijkstra subproblems, which produce undefined behaviour on negative weights. Validate the weight column before invocation; use a Bellman-Ford-shaped approach for graphs with legitimate negative weights.
Edge direction is respected. An undirected problem must be encoded by inserting both (a, b) and (b, a) rows in the source table; otherwise the second-shortest path may be missed when the only alternative traverses an edge in the reverse direction.
The cost spread between A_1 and A_K can be very large on sparse graphs. The Kth path is the cheapest among non-duplicates, not the cheapest within a corridor of the global optimum. If only near-optimal alternatives are useful, apply a WHERE total_cost <= 1.5 * (SELECT MIN(total_cost) FROM ...) filter on the result.
Paths are returned as a Utf8 string in the nodes column. Downstream code that needs row-per-node output must parse this string (strip brackets, split on comma) or unnest it explicitly; the function does not emit one row per hop.
GPU acceleration is not wired through the SQL table function. The ON GPU execution hint is silently ignored for graph_yens; computation always runs CPU-only. The Cypher CALL surface exposes a separate GPU dispatch for top-K shortest paths via the cuGraph-pattern port, but that path is not selected for the table-function form.