K-Core Decomposition

Assign each node its coreness number, the largest k for which the node belongs to a k-core

Category: community-detection

Syntax

SELECT * FROM graph_kcore(table_name)

Description

K-core decomposition labels every node with the largest integer k for which the node still belongs to a subgraph where every member has at least k connections. The k-core itself is the maximal subgraph in which every vertex has degree at least k inside that subgraph. Coreness is a robust, parameter-free density measure: nodes deep inside a tight cluster get high coreness values, peripheral nodes get low values, and isolated nodes get 0. Use this for graph summarisation, dense-subgraph mining, identifying central or 'core' actors in a network, pruning low-importance vertices before applying a more expensive community-detection algorithm, and visual layout where central nodes should be placed first. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. Source and target column names are auto-detected from standard conventions. The CPU implementation uses the linear-time Batagelj-Zaversnik bucket-peeling algorithm: nodes are initially placed in degree-indexed buckets, and the algorithm repeatedly removes the lowest-degree node, records its current degree as its coreness, and decrements the bucket position of each surviving neighbour. Each edge is touched once, giving an O(V + E) total cost. The output matches Neo4j GDS `gds.kcore` semantics: directed graphs are treated with combined in plus out degree (the UNDIRECTED projection). GPU acceleration is available for k-core decomposition. The GPU implementation follows cuGraph's level-synchronous peeling pattern: for each k starting at 0, all currently-alive vertices with live degree at most k are dispatched as a frontier, each frontier thread peels its vertex and atomically decrements the live degree of every surviving neighbour, and the threads that pushed a neighbour from above-k to at-or-below-k claim that neighbour for the next-round frontier. The loop advances k and repeats until every vertex is peeled. Self-loops are filtered, multi-edges are counted. GPU execution is most beneficial for graphs exceeding a few hundred thousand edges; smaller graphs are dominated by host-to-device transfer cost and may run faster on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology across repeated invocations.

Parameters

Name	Type	Description
`table_name`		Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). Edge weights, if present, are ignored: k-core operates strictly on graph topology. For directed graphs the node degree is computed as in-degree plus out-degree, matching the Neo4j GDS UNDIRECTED projection semantics.

Examples

CREATE TABLE coauthor AS
SELECT * FROM VALUES
  (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
  (4, 5), (5, 6), (5, 7), (6, 7),
  (8, 9)
AS t(src, dst);

SELECT node_id, coreness
FROM graph_kcore('coauthor')
ORDER BY coreness DESC, node_id;

WITH cores AS (
  SELECT node_id, coreness FROM graph_kcore('coauthor')
)
SELECT node_id
FROM cores
WHERE coreness = (SELECT MAX(coreness) FROM cores)
ORDER BY node_id;

SELECT coreness, COUNT(*) AS node_count
FROM graph_kcore('coauthor')
GROUP BY coreness
ORDER BY coreness;

SELECT node_id
FROM graph_kcore('coauthor')
WHERE coreness < 2
ORDER BY node_id;

SELECT
  l.community_id,
  AVG(k.coreness) AS mean_coreness,
  MAX(k.coreness) AS peak_coreness
FROM graph_louvain('coauthor') l
JOIN graph_kcore('coauthor') k ON l.node_id = k.node_id
GROUP BY l.community_id
ORDER BY peak_coreness DESC;

Pitfalls

The output shape is identical to a centrality result (one row per node, one numeric value per node), but the value is a coreness number, not a score, rank, or probability. Coreness is an integer in the range [0, max_degree] and is a structural label rather than a relative ranking; treating it like a PageRank score will produce misleading downstream comparisons.
Coreness depends only on topology, not on edge weights. A weight column on the input table is silently ignored. If a weighted notion of core membership is required, prefilter the edge table to remove low-weight edges before invocation; the function will then operate on the surviving subgraph.
On directed graphs the implementation treats degree as in-degree plus out-degree, matching the Neo4j GDS UNDIRECTED projection. This means a directed graph and its symmetrised undirected version produce coreness values that are off by exactly the multiplicity of bidirectional edges per node. If single-direction semantics matter, projection-shape the input table before invocation.
Maximum coreness is bounded by the densest subgraph, not by the number of nodes. On sparse real-world graphs the top coreness is typically much smaller than V; do not assume a wide range of distinct values when bucketing for visualisation or for thresholds.
Coreness is a monotone hierarchy: every k-core is contained in every (k-1)-core. Filtering by `coreness >= k` therefore extracts the exact k-core subgraph, but the result is the node set, not the edge-induced subgraph. To materialise the edges of the k-core, join the result back to the original edge table on both endpoints.
GPU execution silently falls back to CPU when the GPU accelerator is not registered or when the graph does not meet the engine's minimum-size threshold for GPU dispatch. Use EXPLAIN ANALYZE or the execution hints surface if the ON GPU latency is critical.