Compute the per-node local clustering coefficient and triangle count
SELECT * FROM graph_lcc(table_name)
The local clustering coefficient (Watts and Strogatz 1998) measures how tightly a node's neighbourhood is interconnected. For a node v with deg(v) distinct undirected neighbours, the engine computes LCC(v) = 2 * triangles(v) / (deg(v) * (deg(v) - 1)), the fraction of neighbour pairs of v that are themselves connected. The result is bounded in [0.0, 1.0]: 1.0 means every neighbour of v is connected to every other neighbour (a complete subgraph on v's neighbourhood); 0.0 means no neighbour pair is connected. Nodes with deg(v) < 2 have no neighbour pairs, so LCC is defined as 0.0 by convention. This matches Neo4j GDS gds.localClusteringCoefficient. The function loads edge data from a registered Delta table as an undirected graph in Compressed Sparse Row (CSR) format. The CPU implementation computes a global triangle count first (per-edge intersection of endpoint neighbour sets), then divides by deg * (deg - 1) / 2. For directed inputs, the neighbour set is taken as the union of out- and in-neighbours so the LCC matches an undirected interpretation, which is what GDS does. Column names for source and target are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). Edge weights are ignored. The GPU implementation uses the Chiba-Nishizeki degree-ordered DAG to avoid the O(deg^2) per-vertex enumeration cost that the naive shape would incur on hubs. Each triangle is oriented so that the vertex with the smallest (degree, id) tuple owns the enumeration step; the other two vertices are reached by a two-pointer sorted intersection of their adjacency slices. Each triangle is therefore visited exactly once, and the per-thread cost is bounded by min-degree rather than degree, which prevents a single celebrity hub from serialising the whole pass. Triangle counts are accumulated as atomic<u32> on device, and a separate normalise pass divides by deg * (deg - 1) on the GPU before readback. The dispatch also collapses parallel edges per row on the host before upload (CSR's within-bucket sort guarantees duplicates cluster contiguously), so the simple-graph contract that LCC requires holds regardless of whether the input edge table contains duplicate (src, dst) pairs. A warning is logged when any duplicates are dropped. The per-vertex u32 triangle counter is safe up to a post-dedup degree of 92681 (a hub with a complete neighbourhood at that degree saturates u32). The dispatch emits a tracing warning above that threshold; WGSL has no portable atomic<u64> so widening is not possible today, and clustering results on adversarial hubs near that bound should be treated as suspect. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology across repeated invocations.
| Name | Type | Description |
|---|---|---|
table_name | Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). The graph is loaded as undirected for triangle enumeration. Edge weights are ignored: clustering coefficient is a topological quantity. |
CREATE TABLE social_network AS
SELECT * FROM VALUES
(1, 2, 1.0),
(2, 3, 1.0),
(3, 1, 1.0),
(3, 4, 1.0),
(4, 5, 1.0),
(5, 3, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_lcc('social_network')
ORDER BY coefficient DESC;
SELECT node_id, triangle_count, coefficient
FROM graph_lcc('social_network')
WHERE coefficient >= 0.5
ORDER BY coefficient DESC;
SELECT AVG(coefficient) AS avg_clustering
FROM graph_lcc('social_network');
SELECT l.node_id, l.coefficient, d.total_degree
FROM graph_lcc('social_network') l
JOIN graph_degree('social_network') d
ON l.node_id = d.node_id
WHERE l.coefficient = 0.0
ORDER BY d.total_degree;