Detect well-connected communities by combining Louvain local moves with a refinement phase
SELECT * FROM graph_leiden(table_name [, gamma] [, max_iterations] [, tolerance] [, max_levels])
Leiden (Traag, Waltman, and van Eck 2019) is a community detection algorithm that extends Louvain with a refinement phase to repair Louvain's biggest known defect: arbitrarily badly-connected communities at small resolutions. Each level of the Leiden hierarchy runs three phases. Phase 1 (local move) sweeps every node and moves it to the neighbouring community that yields the largest positive modularity gain, identical to a Louvain pass and seeded from the previous level's Louvain partition (Neo4j's seededCommunitiesForNextIteration behavior). Phase 2 (refinement) re-partitions each Louvain community by running a constrained, singleton-restart local move that may only merge sub-communities of nodes that are themselves well-connected to the rest of the parent Louvain community; gamma controls the connectedness threshold. Phase 3 (aggregation) collapses the refined sub-communities, not the Louvain communities, into super-nodes and feeds the next level. The hierarchy terminates when max_levels is reached, when a Phase 1 sweep produces no movement, when refinement leaves the partition uncoarsenable, or when the modularity ratchet flattens within tolerance. The function loads edge data from a registered Delta table as an undirected graph in Compressed Sparse Row format, auto-detecting source and destination columns from standard conventions (src/source/src_id, dst/target/dst_id). Edge weights, when present, scale the modularity computation. Output is the per-original-node community at the topmost level after walking the dendrogram top-down; community ids are renumbered contiguously from 0, and the level column reports the number of aggregation rounds actually run. The SQL surface for graph_leiden runs on the CPU only. The dispatcher in delta-forge-graph/src/table_funcs/extended_funcs.rs deliberately omits the GPU branch (the comment in the source file notes that Leiden's refinement phase is sequential at the SQL surface). A separate GPU port lives in delta-forge-graph-gpu/src/algorithms/leiden.rs and implements the same three-phase pipeline (Louvain local move, refinement with cuGraph's four well-connectedness predicates, aggregation via bitonic sort plus Hillis-Steele scan plus CAS-add compact under shaders leiden_agg_emit, leiden_agg_bitonic_step, leiden_agg_mark_boundaries, leiden_agg_scan_step, leiden_agg_compact_and_sum, and leiden_agg_build_row_ptr); it is consumed by the Cypher columnar executor's algorithms path under CALL algo.leiden, not by graph_leiden. CPU complexity is approximately O(E) per inner sweep, with the level count typically 3 to 8 on real graphs.
| Name | Type | Description |
|---|---|---|
table_name | Specifies the name of the registered Delta table containing edge data. The table must expose source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column influences modularity; the graph is treated as undirected for community detection. | |
gamma | Sets the resolution parameter that scales the modularity null model. Default is 1.0, matching Neo4j GDS gds.leiden. Values above 1.0 favour finer-grained communities; values below 1.0 favour larger merged communities. Gamma also governs the well-connectedness gate in the refinement phase: a refined sub-community is admissible only when its cut to the rest of its Louvain community exceeds gamma * vol(sub) * (vol(parent) - vol(sub)) / 2m. | |
max_iterations | Sets the inner-loop iteration cap for the Phase 1 local-move sweep at every level. Default is 10. The local-move loop also stops early once no node moves in a sweep or once the modularity ratchet flattens. | |
tolerance | Sets the modularity improvement threshold between successive local-move iterations. Default is 1e-4. The Phase 1 outer loop continues only while the new modularity Q exceeds the current Q by more than this amount, matching cuGraph's louvain_impl.cuh ratchet. | |
max_levels | Sets the maximum number of hierarchy levels (local-move plus refine plus aggregate cycles). Default is 10. The hierarchy also terminates early when a level produces no movement, when refinement leaves every node in its own sub-community (no coarsening possible), or when modularity flattens. |
CREATE TABLE collab AS
SELECT * FROM VALUES
(1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (2, 4, 1.0), (3, 4, 1.0),
(5, 6, 1.0), (5, 7, 1.0), (6, 7, 1.0), (6, 8, 1.0), (7, 8, 1.0),
(4, 5, 0.2)
AS t(src, dst, weight);
SELECT * FROM graph_leiden('collab');
SELECT * FROM graph_leiden('collab', 1.5, 20, 1e-6, 15);
SELECT * FROM graph_leiden('collab', 0.5);
SELECT community_id, COUNT(*) AS members
FROM graph_leiden('collab')
GROUP BY community_id
ORDER BY members DESC, community_id;
WITH parts AS (
SELECT node_id, community_id
FROM graph_leiden('collab')
)
SELECT *
FROM graph_modularity(
'collab',
(SELECT STRING_AGG(node_id::TEXT || ':' || community_id::TEXT, ',')
FROM parts)
);