Depth-First Search

Traverse the graph in depth-first order from a source node

Category: pathfinding

Syntax

SELECT * FROM graph_dfs(table_name, source_id [, max_depth])

Description

Depth-first search (DFS) traverses a graph by exploring as far as possible along each branch before backtracking. Starting from a specified source node, DFS follows outgoing edges to unvisited nodes recursively, building a spanning tree that captures the traversal order. DFS is used for topological sorting, cycle detection, reachability analysis, and constructing spanning trees. The function returns the DFS tree (node, depth, parent) rather than discovery and finish timestamps; downstream edge classification must be derived from the parent-child relationships. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. The traversal begins at the specified source_id and proceeds via a stack-based (or recursive) exploration of neighbors. Each discovered node records its depth in the DFS tree and its parent node. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). If the source_id does not exist in the graph, an empty result set is returned. The time complexity is O(V + E), where V is the number of vertices and E is the number of edges reachable from the source. When max_depth is set, the traversal prunes branches beyond the specified depth. Note that unlike BFS, DFS does not guarantee shortest paths; the depth value reflects the DFS tree structure, not the minimum hop distance. GPU acceleration is not currently available for DFS. All computation runs on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology, so repeated DFS calls on the same table with different source nodes reuse the cached graph.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). Edge weights are ignored; DFS uses only the graph topology.
source_idSpecify the node ID from which to start the depth-first traversal. This must be a valid node ID present in the edge table. If the node ID does not exist in the graph, an empty result set is returned.
max_depthSet the maximum traversal depth from the source node. Default is unlimited, meaning the entire reachable graph is traversed. Valid range is 1 or greater.

Examples

CREATE TABLE call_graph AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 1.0),
  (2, 4, 1.0),
  (2, 5, 1.0),
  (3, 6, 1.0),
  (4, 7, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_dfs('call_graph', 1);
SELECT node_id, depth, parent_id
FROM graph_dfs('call_graph', 1, 2);
SELECT d.node_id, d.depth
FROM graph_dfs('call_graph', 1) d
LEFT JOIN graph_dfs('call_graph', 1) child
  ON child.parent_id = d.node_id
WHERE child.node_id IS NULL;
SELECT depth, COUNT(*) AS node_count
FROM graph_dfs('call_graph', 1)
GROUP BY depth
ORDER BY depth;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →