K-Core Decomposition

Assign each node its coreness number, the largest k for which the node belongs to a k-core

Category: community-detection

Syntax

SELECT * FROM graph_kcore(table_name)

Description

K-core decomposition labels every node with the largest integer k for which the node still belongs to a subgraph where every member has at least k connections. The k-core itself is the maximal subgraph in which every vertex has degree at least k inside that subgraph. Coreness is a robust, parameter-free density measure: nodes deep inside a tight cluster get high coreness values, peripheral nodes get low values, and isolated nodes get 0. Use this for graph summarisation, dense-subgraph mining, identifying central or 'core' actors in a network, pruning low-importance vertices before applying a more expensive community-detection algorithm, and visual layout where central nodes should be placed first. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. Source and target column names are auto-detected from standard conventions. The CPU implementation uses the linear-time Batagelj-Zaversnik bucket-peeling algorithm: nodes are initially placed in degree-indexed buckets, and the algorithm repeatedly removes the lowest-degree node, records its current degree as its coreness, and decrements the bucket position of each surviving neighbour. Each edge is touched once, giving an O(V + E) total cost. The output matches Neo4j GDS `gds.kcore` semantics: directed graphs are treated with combined in plus out degree (the UNDIRECTED projection). GPU acceleration is available for k-core decomposition. The GPU implementation follows cuGraph's level-synchronous peeling pattern: for each k starting at 0, all currently-alive vertices with live degree at most k are dispatched as a frontier, each frontier thread peels its vertex and atomically decrements the live degree of every surviving neighbour, and the threads that pushed a neighbour from above-k to at-or-below-k claim that neighbour for the next-round frontier. The loop advances k and repeats until every vertex is peeled. Self-loops are filtered, multi-edges are counted. GPU execution is most beneficial for graphs exceeding a few hundred thousand edges; smaller graphs are dominated by host-to-device transfer cost and may run faster on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology across repeated invocations.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). Edge weights, if present, are ignored: k-core operates strictly on graph topology. For directed graphs the node degree is computed as in-degree plus out-degree, matching the Neo4j GDS UNDIRECTED projection semantics.

Examples

CREATE TABLE coauthor AS
SELECT * FROM VALUES
  (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
  (4, 5), (5, 6), (5, 7), (6, 7),
  (8, 9)
AS t(src, dst);
SELECT node_id, coreness
FROM graph_kcore('coauthor')
ORDER BY coreness DESC, node_id;
WITH cores AS (
  SELECT node_id, coreness FROM graph_kcore('coauthor')
)
SELECT node_id
FROM cores
WHERE coreness = (SELECT MAX(coreness) FROM cores)
ORDER BY node_id;
SELECT coreness, COUNT(*) AS node_count
FROM graph_kcore('coauthor')
GROUP BY coreness
ORDER BY coreness;
SELECT node_id
FROM graph_kcore('coauthor')
WHERE coreness < 2
ORDER BY node_id;
SELECT
  l.community_id,
  AVG(k.coreness) AS mean_coreness,
  MAX(k.coreness) AS peak_coreness
FROM graph_louvain('coauthor') l
JOIN graph_kcore('coauthor') k ON l.node_id = k.node_id
GROUP BY l.community_id
ORDER BY peak_coreness DESC;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →