Strongly Connected Components

Find strongly connected components in a directed graph

Category: community-detection

Syntax

SELECT * FROM graph_scc(table_name)

Description

Strongly connected components (SCC) identifies maximal subsets of a directed graph where every node can reach every other node within the same subset by following directed edges. Unlike weakly connected components (graph_components), SCC respects edge direction. This algorithm is essential for detecting circular dependencies in software module graphs, finding feedback loops in causal networks, and analyzing mutual reachability in transportation or communication networks. The function loads edge data from a registered Delta table as a directed graph in Compressed Sparse Row (CSR) format. The algorithm uses Tarjan's depth-first-search-based approach, which identifies SCCs in a single pass through the graph by tracking discovery times and low-link values on a stack. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). The time complexity is O(V + E), where V is the vertex count and E is the edge count. Tarjan's algorithm is optimal for SCC detection and runs in linear time. It is suitable for graphs of any size that fit in memory, including directed graphs with millions of edges. GPU acceleration is not currently available for SCC computation. All processing runs on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology, so repeated SCC queries on the same table avoid redundant graph construction.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). The graph is loaded as directed, preserving edge direction. Edge weights are ignored for SCC computation.

Examples

CREATE TABLE dependency_edges AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (2, 3, 1.0),
  (3, 1, 1.0),
  (3, 4, 1.0),
  (4, 5, 1.0),
  (5, 4, 1.0),
  (6, 7, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_scc('dependency_edges');
SELECT COUNT(DISTINCT scc_id) AS num_sccs
FROM graph_scc('dependency_edges');
SELECT scc_id, COUNT(*) AS size
FROM graph_scc('dependency_edges')
GROUP BY scc_id
HAVING COUNT(*) > 1
ORDER BY size DESC;
SELECT s.scc_id, m.module_name, m.id AS node_id
FROM graph_scc('dependency_edges') s
JOIN modules m ON s.node_id = m.id
WHERE s.scc_id IN (
  SELECT scc_id
  FROM graph_scc('dependency_edges')
  GROUP BY scc_id
  HAVING COUNT(*) > 1
)
ORDER BY s.scc_id, m.module_name;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →