Delta-Stepping Single-Source Shortest Paths

Parallel single-source shortest paths via bucketed light/heavy edge relaxation

Category: pathfinding

Syntax

SELECT * FROM graph_deltastepping(table_name, source_id [, delta])

Description

graph_deltastepping implements the Meyer-Sanders 2003 delta-stepping single-source shortest path algorithm: a bucketed Dijkstra variant whose per-bucket work is data-parallel. Tentative distances are bucketed by floor(dist[v] / delta). Edges with weight <= delta are called light and are relaxed eagerly within the current bucket; edges with weight > delta are called heavy and are deferred to a later bucket. Each bucket can be processed in parallel because relaxations within it never bridge buckets in a way that requires sequential ordering. This makes delta-stepping the standard parallel SSSP algorithm: on multi-core CPUs it parallelises well via rayon, and on GPUs it maps directly to the near-far pile pattern (Davidson et al. 2014) used by cuGraph. The CPU implementation keeps buckets in a BTreeMap keyed by the integer floor(dist/delta), so it always processes the lowest-key bucket next. Inside a bucket, light-edge relaxations are repeated until the bucket is empty (light edges may add nodes back into the same bucket), then heavy edges are flushed once. Relaxations only ever decrease tentative distances; concurrent same-bucket discoveries are reconciled by the standard min-update rule. The GPU implementation in delta-forge-graph-gpu/src/algorithms/delta_stepping.rs is a direct port of cuGraph's near-far pile from cugraph/cpp/src/traversal/sssp_impl.cuh. It uses four compute passes: relax (edge-parallel atomicMin on tentative distance), classify (route updated vertices into next-iteration near or far based on whether their distance is below cur_threshold), snapshot (commit near to the working frontier), and refill (advance cur_threshold by delta until at least one far vertex falls below it, then demote far -> near). Bucket compaction is implicit because the buckets are bitmaps; the host loop terminates when both bitmaps are empty. The single-GPU code path omits cuGraph's near-near / near-far subpartitioning (an SM-saturation tuning, not a correctness invariant). The delta parameter is the key tuning knob. Optimal delta depends on the edge-weight distribution. A reasonable starting point is the average edge weight, which keeps roughly half the edges in the light class. Too small a delta places almost every edge in the heavy class, and the bucket queue degenerates into a chain of single-element buckets - effectively a slow Dijkstra with high constant overhead. Too large a delta lumps every node into one or two buckets, which collapses the algorithm to a Bellman-Ford-style mass relaxation with high duplicate work and no parallel benefit. The default of 3.0 is a sensible mid-range value for unit-to-double-digit edge-weight ranges and should be retuned for graphs whose weights live at very different scales. Output shape is one row per node, including unreachable nodes (distance = +Infinity) and the source (distance 0). Schema matches graph_bellmanford so the two single-source shortest-path functions are interchangeable on the read side; use graph_deltastepping when all weights are non-negative and you want parallelism, graph_bellmanford when negatives may occur, and graph_shortest_path when you only need one source-target pair.

Parameters

Name	Type	Description
`table_name`		Name of the registered edge table. Source and target columns are auto-detected (src/source/src_id, dst/target/dst_id). A weight column, if present, is used as the edge cost; otherwise every edge has weight 1.0. Weights must be finite and non-negative or the search is rejected at runtime.
`source_id`		Starting node id. Must exist in the graph. If the id is out of range the function returns an empty result set; if the graph is empty the function returns an empty result set without dispatching any work.
`delta`		Bucket width that separates light edges (weight <= delta, processed eagerly within a bucket) from heavy edges (weight > delta, deferred to a later bucket). Must be strictly positive; a delta of zero or negative raises InvalidArgument. The value rules the parallelism vs. work tradeoff: too small collapses delta-stepping into a chain of Dijkstra-like serial passes, and too large degrades it into Bellman-Ford-style many-redundant-relaxations.

Examples

CREATE TABLE links AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 2.5),
  (2, 3, 0.5),
  (2, 4, 4.0),
  (3, 4, 1.0),
  (3, 5, 7.0),
  (4, 5, 2.0),
  (4, 6, 5.0),
  (5, 6, 1.5)
AS t(src, dst, weight);

SELECT node_id, distance
FROM graph_deltastepping('links', 1)
ORDER BY distance;

WITH stats AS (
  SELECT AVG(weight) AS avg_w FROM links
)
SELECT node_id, distance
FROM graph_deltastepping('links', 1, (SELECT avg_w FROM stats))
ORDER BY node_id;

SELECT 'delta=0.5' AS choice, MAX(distance) AS max_dist FROM graph_deltastepping('links', 1, 0.5)
UNION ALL
SELECT 'delta=3.0', MAX(distance) FROM graph_deltastepping('links', 1, 3.0)
UNION ALL
SELECT 'delta=20.0', MAX(distance) FROM graph_deltastepping('links', 1, 20.0);

SELECT node_id, distance
FROM graph_deltastepping('links', 1, 2.0)
WHERE distance < 5.0
ORDER BY distance;

SELECT node_id, distance FROM graph_deltastepping('links', 1, 3.0) /*+ ON GPU */;

Pitfalls

Delta tuning is non-trivial and graph-dependent. A delta that is too small relative to the typical edge weight collapses the algorithm into single-bucket behaviour (equivalent to Dijkstra with extra bookkeeping overhead). A delta that is too large lumps the entire frontier into one bucket, which causes excessive re-relaxation churn and erodes any parallel benefit. Start near the average edge weight; benchmark a few values if performance is critical.
Negative or non-finite edge weights are rejected at runtime. The CPU implementation raises InvalidArgument the first time it inspects such an edge; the GPU dispatch rejects them on the host before launching any kernel. Delta-stepping is mathematically undefined for negative weights because the bucket assignment assumes monotonically non-decreasing tentative distances. Use graph_bellmanford when negatives may legitimately occur.
Delta must be strictly positive. A delta of zero or negative raises InvalidArgument with the message "delta_stepping: delta must be positive". The default of 3.0 is used only when the caller omits the argument entirely; an explicit zero or negative value is a hard error, not a fallback.
Edge direction is respected. To search an undirected graph, insert both (a, b) and (b, a) rows in the source table; otherwise the algorithm only relaxes edges in the forward direction and may report +Infinity for nodes that are reachable in the undirected sense.
The result includes every node, including unreachable ones with distance = +Infinity. Add WHERE distance < 'Infinity'::DOUBLE if your downstream pipeline cannot tolerate infinite values.
Unlike graph_shortest_path, delta-stepping does not terminate early when one specific target is found. It computes distances to every node from the source. If you only need a single source-target pair on a non-negative-weight graph, graph_shortest_path with its early-exit Dijkstra is strictly faster in practice.
The GPU dispatch is the same algorithmic pattern (near-far pile) as the CPU but uses atomic min on the bit-pattern of f32 distances. Very small differences in result values (sub-ULP) are possible between CPU (f64) and GPU (f32) runs on graphs with extreme weight ratios; choose a delta that keeps the practical distance range within f32's precision band if exact agreement matters.