Detect communities by iterative neighbor-label voting until labels stabilize
SELECT * FROM graph_labelpropagation(table_name [, max_iterations])
Label Propagation (Raghavan, Albert, and Kumara 2007) assigns each node the community label that occurs most frequently among its neighbours, iterating until labels stabilize. Each node starts in its own community. On every round every node tallies a weighted vote across its neighbours (and, on directed graphs, its in-neighbours as well), then adopts the label with the highest total vote; ties break to the smallest label id for determinism, matching the Neo4j GDS gds.labelPropagation tie-break rule. The implementation uses a synchronous Gauss-Jacobi update: every node in a round reads the previous round's labels, so the round result does not depend on traversal order. Communities are renumbered contiguously starting from 0 before being returned. The function loads edge data from a registered Delta table as a graph in Compressed Sparse Row format. Column names are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). When an edge weight column is present, weights scale each vote contribution; otherwise every neighbour contributes 1.0. Time complexity is O(max_iterations times E) for E edges, dominated by a single sparse pass per round; the CPU path parallelizes the per-vertex vote tally with Rayon over a per-thread AHashMap accumulator. GPU acceleration is available. The dispatcher routes to GPU when the active accelerator accepts the graph and the node count meets or exceeds 5,000 (delta-forge-graph-gpu thresholds::LABEL_PROPAGATION); smaller graphs run on the CPU path because the GPU transfer overhead dominates. The GPU kernel selects a per-vertex strategy by degree: a private fixed-capacity dedup table for degree at or below 32 (LOW_DEG path), and a sampled aggregation that draws up to 1,024 deterministic per-vertex samples per round for higher-degree vertices (SAMPLE_CAP path). The sampling cap exists specifically to prevent Windows TDR timeouts on hub-heavy graphs where an O(deg^2) dedup scan over a multi-thousand-degree vertex would otherwise stall a single workgroup past the 2-second driver budget; the dispatch is further sliced into 4,096-vertex chunks so each submission is bounded in wall time regardless of how concentrated the hub workload is. The GPU path also adds a parity-gated oscillation guard (only vertices whose id parity matches the iteration parity are permitted to flip labels in a given round), mirroring cuGraph's Louvain up_down flag, which prevents the well-known two-cycle that synchronous LPA produces on bipartite substructure.
| Name | Type | Description |
|---|---|---|
table_name | Specifies the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column scales each neighbor's vote when present; on directed graphs both outgoing and incoming neighbours contribute to the vote, matching Neo4j GDS gds.labelPropagation default orientation behavior. | |
max_iterations | Sets the maximum number of synchronous voting rounds. Default is 10, matching the Neo4j GDS gds.labelPropagation default. The loop exits early once an entire round produces no label changes. Real graphs typically converge in 3 to 5 rounds, so increasing this value rarely changes the final labelling. |
CREATE TABLE coauthors AS
SELECT * FROM VALUES
(1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
(4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
(3, 4, 0.2)
AS t(src, dst, weight);
SELECT * FROM graph_labelpropagation('coauthors');
SELECT * FROM graph_labelpropagation('coauthors', 3);
SELECT COUNT(DISTINCT community_id) AS num_communities
FROM graph_labelpropagation('coauthors');
SELECT community_id, COUNT(*) AS members
FROM graph_labelpropagation('coauthors')
GROUP BY community_id
ORDER BY members DESC, community_id;
SELECT l.community_id, COUNT(*) AS researchers, MAX(p.affiliation) AS sample_affiliation
FROM graph_labelpropagation('coauthors') l
JOIN people p ON l.node_id = p.id
GROUP BY l.community_id
ORDER BY researchers DESC;