Breadth-First Search

Traverse the graph in breadth-first order from a source node

Category: pathfinding

Syntax

SELECT * FROM graph_bfs(table_name, source_id [, max_depth])

Description

Breadth-first search (BFS) traverses a graph layer by layer starting from a specified source node. It discovers all nodes at distance 1 (direct neighbors) before moving to distance 2, then distance 3, and so on. This level-order traversal guarantees that each node is reached via the shortest hop-count path from the source. BFS is fundamental for computing unweighted shortest paths, measuring graph diameter, and exploring node neighborhoods up to a specified depth. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. The traversal begins at the specified source_id and proceeds outward using a queue-based frontier expansion. Each discovered node records its distance from the source and its parent in the BFS spanning tree. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). If the source_id does not exist in the graph, an empty result set is returned. The time complexity is O(V + E), where V is the number of vertices and E is the number of edges reachable from the source. When max_depth is specified, only the portion of the graph within that many hops is visited, which can significantly reduce computation for large graphs. BFS is the most efficient algorithm for unweighted shortest-path queries. GPU acceleration is not currently available for BFS. All computation runs on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology, so repeated BFS calls from different source nodes on the same table reuse the cached graph without reloading.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). Edge weights are ignored; BFS treats all edges as having unit cost.
source_idSpecify the node ID from which to start the breadth-first traversal. This must be a valid node ID present in the edge table. If the node ID does not exist in the graph, an empty result set is returned.
max_depthSet the maximum traversal depth (number of hops from the source). Default is unlimited, meaning the entire reachable graph is traversed. Set a lower value (e.g., 2 or 3) to restrict results to the immediate neighborhood of the source node. Valid range is 1 or greater.

Examples

CREATE TABLE road_network AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 1.0),
  (2, 4, 1.0),
  (3, 4, 1.0),
  (4, 5, 1.0),
  (5, 6, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_bfs('road_network', 1);
SELECT node_id, distance
FROM graph_bfs('road_network', 1, 2);
-- BFS guarantees shortest hop-count paths; trace parent_id chain
SELECT node_id, distance, parent_id
FROM graph_bfs('road_network', 1)
ORDER BY distance;
SELECT distance, COUNT(*) AS node_count
FROM graph_bfs('road_network', 1)
GROUP BY distance
ORDER BY distance;
SELECT b.node_id, b.distance, n.name
FROM graph_bfs('road_network', 1, 3) b
JOIN locations n ON b.node_id = n.id
ORDER BY b.distance, n.name;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →