Detect communities using the Louvain modularity optimization algorithm
SELECT * FROM graph_louvain(table_name [, resolution])
The Louvain algorithm detects communities in a graph by iteratively optimizing modularity, a measure of how densely connected nodes within a community are compared to a random graph with the same degree sequence. The algorithm proceeds in two alternating phases: a local phase where individual nodes are moved to neighboring communities to maximize modularity gain, and an aggregation phase where the discovered communities are collapsed into super-nodes to form a coarser graph. This process repeats until no further modularity improvement is possible. The function loads edge data from a registered Delta table as an undirected graph in Compressed Sparse Row (CSR) format. Edge weights, if present, influence the modularity calculation. The algorithm uses a convergence threshold of 0.0001 and a maximum of 10 aggregation levels. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). The time complexity is approximately O(E) per iteration level, with the total cost depending on the number of hierarchical levels before convergence (typically 3 to 8 levels for real-world graphs). The algorithm is near-linear in practice and handles graphs with millions of edges efficiently. The resolution parameter allows tuning the community size distribution without modifying the underlying data. GPU acceleration is available for Louvain community detection. When a GPU accelerator is registered, the engine attempts parallel modularity optimization on the GPU and falls back to CPU if GPU resources are unavailable. GPU execution is most beneficial for dense graphs exceeding 500,000 edges. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology for reuse across repeated invocations.
| Name | Type | Description |
|---|---|---|
table_name | Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). If a weight column is present, it is used to weight the modularity computation. The graph is loaded as undirected for community detection. | |
resolution | Set the resolution parameter that controls the granularity of detected communities. Default is 1.0. Values greater than 1.0 produce more, smaller communities; values less than 1.0 produce fewer, larger communities. Valid range is any positive number. A value of 0.5 tends to merge small clusters, while 2.0 tends to split large ones. |
CREATE TABLE friendships AS
SELECT * FROM VALUES
(1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
(4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
(3, 4, 0.5)
AS t(src, dst, weight);
SELECT * FROM graph_louvain('friendships');
SELECT * FROM graph_louvain('friendships', 2.0);
SELECT COUNT(DISTINCT community_id) AS num_communities
FROM graph_louvain('friendships');
SELECT community_id, COUNT(*) AS size
FROM graph_louvain('friendships')
GROUP BY community_id
ORDER BY size DESC;
SELECT
l.community_id,
p.node_id,
p.score AS pagerank
FROM graph_louvain('friendships') l
JOIN graph_pagerank('friendships') p
ON l.node_id = p.node_id
WHERE p.rank = (
SELECT MIN(p2.rank)
FROM graph_louvain('friendships') l2
JOIN graph_pagerank('friendships') p2
ON l2.node_id = p2.node_id
WHERE l2.community_id = l.community_id
)
ORDER BY l.community_id;