Modularity

Compute the modularity score Q for a fixed node-to-community partition

Category: community-detection

Syntax

SELECT * FROM graph_modularity(table_name, community_property)

Description

Modularity (Newman and Girvan 2004) measures how much more densely connected nodes within the same community are compared to a configuration-model null, where degree is preserved but edges are otherwise random. For a partition of the vertex set into communities, the score is Q = sum over communities c of [ e_c / m - (a_c / 2m)^2 ], where e_c is the sum of edge weights with both endpoints in community c, a_c is the sum of incident edge weights of nodes in c, and m is the total edge weight. The function evaluates Q for a partition the caller supplies; it does not search for communities itself. The partition is provided as a comma-separated string of integer community ids, one per node, in the engine's internal node-id order (the order produced by graph_louvain, graph_leiden, graph_label_propagation, graph_components, graph_degree, and similar table functions on the same edge table). The length of the list must exactly match the node count of the loaded graph; a mismatch is rejected with a clear error. Community ids are arbitrary i64 values and are densely remapped to a contiguous range internally before reduction. The CPU implementation walks the doubled-CSR adjacency once, accumulating sigma_in[c] and sigma_tot[c] in two hash maps, and emits Q in O(E) time and O(C) memory, where C is the number of distinct communities. The Neo4j GDS convention is followed: GDS halves the textbook Newman value so that Zachary's karate-club two-faction split returns 0.1791 rather than 0.3582; both quantities rank partitions identically. The GPU implementation is a wgpu port of cuGraph's compute_modularity reduction. It dense-remaps the partition to u32, scales edge weights into u32 fixed-point (2^20 scale, ~6 decimal digits of precision for weights in [0, 4096)), and runs a single per-vertex compute pass that atomicAdds sigma_in and sigma_tot into per-community buckets keyed by the source vertex's community. The final Newman sum is reconstructed on the host in f64. The dispatch refuses inputs that would overflow the u32 fixed-point envelope (per-edge weight at or above 4096, or aggregate edge weight that could overflow a single community bucket); in those cases the dispatch reports used_gpu = false and the engine falls back to CPU rather than returning a corrupted scalar. GPU acceleration is most useful for quickly verifying the quality of large partitions produced by Louvain or Leiden; the partition itself is computed by those algorithms, not by graph_modularity. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology across repeated invocations, so scoring many candidate partitions for the same edge table reuses a single load.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). If a weight column is present, its values are used as edge weights in the modularity sum. The graph is loaded as undirected for the modularity computation.
community_propertyProvide a comma-separated list of integer community ids, one per node, in the same node-id order the engine uses internally (the order returned by graph_degree or graph_louvain on the same edge table). The list length must equal the node count of the loaded graph; mismatched lengths surface as 'modularity: partition length N != node count M'. Community ids are arbitrary i64 values and are densely remapped internally, so non-contiguous ids (1, 7, 42) work the same as contiguous ones.

Examples

CREATE TABLE friendships AS
SELECT * FROM VALUES
  (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0),
  (4, 5, 1.0), (4, 6, 1.0), (5, 6, 1.0),
  (3, 4, 0.5)
AS t(src, dst, weight);
WITH partition AS (
  SELECT node_id, community_id
  FROM graph_louvain('friendships')
)
SELECT modularity
FROM graph_modularity(
  'friendships',
  (SELECT string_agg(CAST(community_id AS VARCHAR), ',' ORDER BY node_id) FROM partition)
);
SELECT 'louvain' AS algorithm, modularity FROM graph_modularity(
  'friendships',
  (SELECT string_agg(CAST(community_id AS VARCHAR), ',' ORDER BY node_id) FROM graph_louvain('friendships'))
)
UNION ALL
SELECT 'leiden' AS algorithm, modularity FROM graph_modularity(
  'friendships',
  (SELECT string_agg(CAST(community_id AS VARCHAR), ',' ORDER BY node_id) FROM graph_leiden('friendships'))
);
SELECT modularity
FROM graph_modularity(
  'friendships',
  '0,1,0,1,0,1'
);

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →