Louvain Community Detection

Detect communities using the Louvain modularity optimization algorithm

Category: community-detection

Syntax

SELECT * FROM graph_louvain(table_name [, resolution])

Description

The Louvain algorithm detects communities in a graph by iteratively optimizing modularity, a measure of how densely connected nodes within a community are compared to a random graph with the same degree sequence. The algorithm proceeds in two alternating phases: a local phase where individual nodes are moved to neighboring communities to maximize modularity gain, and an aggregation phase where the discovered communities are collapsed into super-nodes to form a coarser graph. This process repeats until no further modularity improvement is possible. The function loads edge data from a registered Delta table as an undirected graph in Compressed Sparse Row (CSR) format. Edge weights, if present, influence the modularity calculation. The algorithm uses a convergence threshold of 0.0001 and a maximum of 10 aggregation levels. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). The time complexity is approximately O(E) per iteration level, with the total cost depending on the number of hierarchical levels before convergence (typically 3 to 8 levels for real-world graphs). The algorithm is near-linear in practice and handles graphs with millions of edges efficiently. The resolution parameter allows tuning the community size distribution without modifying the underlying data. GPU acceleration is available for Louvain community detection. When a GPU accelerator is registered, the engine attempts parallel modularity optimization on the GPU and falls back to CPU if GPU resources are unavailable. GPU execution is most beneficial for dense graphs exceeding 500,000 edges. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology for reuse across repeated invocations.

Parameters

Name	Type	Description
`table_name`		Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). If a weight column is present, it is used to weight the modularity computation. The graph is loaded as undirected for community detection.
`resolution`		Set the resolution parameter that controls the granularity of detected communities. Default is 1.0. Values greater than 1.0 produce more, smaller communities; values less than 1.0 produce fewer, larger communities. Valid range is any positive number. A value of 0.5 tends to merge small clusters, while 2.0 tends to split large ones.

Examples

CREATE TABLE friendships AS
SELECT * FROM VALUES
  (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
  (4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
  (3, 4, 0.5)
AS t(src, dst, weight);

SELECT * FROM graph_louvain('friendships');

SELECT * FROM graph_louvain('friendships', 2.0);

SELECT COUNT(DISTINCT community_id) AS num_communities
FROM graph_louvain('friendships');

SELECT community_id, COUNT(*) AS size
FROM graph_louvain('friendships')
GROUP BY community_id
ORDER BY size DESC;

SELECT
  l.community_id,
  p.node_id,
  p.score AS pagerank
FROM graph_louvain('friendships') l
JOIN graph_pagerank('friendships') p
  ON l.node_id = p.node_id
WHERE p.rank = (
  SELECT MIN(p2.rank)
  FROM graph_louvain('friendships') l2
  JOIN graph_pagerank('friendships') p2
    ON l2.node_id = p2.node_id
  WHERE l2.community_id = l.community_id
)
ORDER BY l.community_id;

Pitfalls

Louvain is non-deterministic. Node ordering during the local move phase can produce different community assignments across runs even on the same graph. Use a fixed sort order in the source edge table when reproducibility is required, and expect minor variation between runs.
The algorithm suffers from the well-known resolution limit: communities smaller than a threshold tied to the total edge count may be merged regardless of the resolution parameter. For fine-grained community discovery on large graphs, consider multi-resolution analysis or Leiden as a preferred alternative when available.
The graph is always loaded as undirected for modularity computation. Directed edges are collapsed into undirected pairs, which discards role asymmetries (e.g. follower vs followee). If directional semantics matter, use graph_scc or a directed community detection approach instead.
The level column exposes the final hierarchical level only; intermediate dendrogram levels are not returned. Applications that need to explore multiple resolution scales must call the function repeatedly with different resolution values.
Edge weights influence modularity computation, so edge-list tables containing duplicate edges produce different results than deduplicated tables with summed weights. Normalise the edge table before invocation if the semantics of parallel edges matter.
GPU acceleration falls back silently when the graph exceeds device memory. Verify device selection through execution hints or EXPLAIN when the ON GPU latency is critical.