Closeness Centrality

Compute closeness centrality, which measures how close a node is to all other nodes in the graph.

Category: centrality

Syntax

SELECT * FROM graph_closeness(table_name)

Description

Closeness centrality measures how near a node is to all other nodes in the graph by computing the reciprocal of the average shortest-path distance. Nodes with high closeness can reach every other node in fewer hops or with lower total weight, making them ideal candidates for facility placement, information dissemination origins, or efficient relay points in communication networks.

The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. For each node, a single-source shortest-path computation (BFS for unweighted graphs, Dijkstra for weighted graphs) determines the distances to all other reachable nodes. The closeness score is then the inverse of the mean distance. In disconnected graphs, only the reachable portion of the graph contributes to the score; isolated nodes receive a closeness of 0.0.

The time complexity is O(V * (V + E)) for unweighted graphs and O(V * (E + V log V)) for weighted graphs, where V is the vertex count and E is the edge count. Like betweenness centrality, this requires an all-pairs computation and can be expensive on large graphs. For graphs exceeding 50,000 nodes, consider restricting the computation to a subgraph or using PageRank as a less costly proxy for centrality.

GPU acceleration is not wired through the graph_closeness table function, so an ON GPU hint on a SQL query that uses it is silently ignored and it runs on the CPU. The Cypher CALL surface (USE ... ON GPU ... CALL algo.closeness) does expose a GPU dispatch, returning the same node_id, score, closeness, and rank columns as the CPU path.

The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology between calls, so repeated closeness queries on the same table reuse the cached graph structure without reloading from the Delta table.
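The score described in this section, the inverse of the mean shortest-path distance over reachable nodes, can be written out as below. This is a sketch of that definition only: the symbols R(v) and d(v, u) are notation introduced here, and the function may apply further scaling that this page does not describe.

```latex
% Assumed notation (not from the function's documentation):
%   R(v)    = set of nodes reachable from v, excluding v itself
%   d(v, u) = shortest-path distance from v to u
%             (hop count if unweighted, total edge weight if weighted)
C(v) =
\begin{cases}
  \dfrac{|R(v)|}{\sum_{u \in R(v)} d(v, u)} & \text{if } |R(v)| > 0, \\[1ex]
  0 & \text{if } v \text{ is isolated.}
\end{cases}

% Worked example on a hypothetical unweighted path graph a - b - c:
%   b: distances 1 and 1, mean 1,   so C(b) = 1
%   a: distances 1 and 2, mean 1.5, so C(a) = 2/3
```

The middle node b scores highest because its mean distance to the other nodes is smallest, which matches the intuition that closeness rewards central positions.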

Parameters

Name	Type	Description
`table_name`		Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column is used for weighted distance computation if present.

Examples

-- Build a small weighted edge list with columns (src, dst, weight).
CREATE TABLE city_routes AS
SELECT * FROM VALUES
  (1, 2, 10.0),
  (1, 3, 15.0),
  (2, 3, 5.0),
  (2, 4, 20.0),
  (3, 4, 8.0),
  (4, 5, 12.0)
AS t(src, dst, weight);

-- Compute closeness for every node in the graph.
SELECT * FROM graph_closeness('city_routes');

-- Find the single node with the highest closeness.
SELECT node_id, score
FROM graph_closeness('city_routes')
ORDER BY score DESC
LIMIT 1;

-- Compare closeness and betweenness side by side for each node.
SELECT
  c.node_id,
  c.score AS closeness,
  b.score AS betweenness
FROM graph_closeness('city_routes') c
JOIN graph_betweenness('city_routes') b
  ON c.node_id = b.node_id
ORDER BY c.score DESC;

-- List the nodes whose closeness is above the graph-wide average.
SELECT node_id, score, rank
FROM graph_closeness('city_routes')
WHERE score > (
  SELECT AVG(score) FROM graph_closeness('city_routes')
)
ORDER BY score DESC;

Pitfalls

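Isolated nodes, which cannot reach any other node, receive a closeness score of 0.0 rather than an error. In a disconnected graph, every other node is scored only against the nodes it can reach, so scores from different components are not directly comparable. Analyses that rank nodes globally should split the graph into components first, or at least filter out zero-score nodes so that unreachable nodes are not mistaken for peripheral ones.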
Closeness does not apply the Wasserman-Faust normalisation by default. When the graph has multiple components, raw classical closeness under-penalises nodes in small components relative to nodes in a single large component. Compute per-component closeness if cross-component comparison is required.
Time complexity is O(V * (V + E)) for unweighted graphs and O(V * (E + V log V)) for weighted. Graphs above roughly 50,000 nodes produce long runtimes with no sampling approximation available; budget compute accordingly or use PageRank as a cheaper proxy.
Weighted closeness uses Dijkstra with positive edge weights only. Negative or zero weights produce undefined behaviour and silently distort results. Pre-filter the weight column to exclude non-positive values before invoking the function.
GPU acceleration is not wired through the graph_closeness table function, so an ON GPU hint on a SQL query that uses it is silently ignored and it runs on the CPU. The Cypher CALL surface (USE ... ON GPU ... CALL algo.closeness) does expose a GPU dispatch, returning the same node_id, score, closeness, and rank columns as the CPU path.
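Two of the pitfalls above, non-positive weights and zero-score nodes, can be guarded against with plain SQL filters. The sketch below assumes the city_routes table from the Examples section; city_routes_positive is a hypothetical table name chosen for illustration.

```sql
-- Assumption: city_routes has columns (src, dst, weight), as in the Examples section.
-- city_routes_positive is a hypothetical name, not part of the product.

-- Keep only strictly positive weights before running weighted closeness.
CREATE TABLE city_routes_positive AS
SELECT src, dst, weight
FROM city_routes
WHERE weight > 0;

-- Exclude zero-score nodes so they are not ranked alongside
-- nodes that are merely peripheral.
SELECT node_id, score
FROM graph_closeness('city_routes_positive')
WHERE score > 0
ORDER BY score DESC;
```

Dropping edges changes the topology: a node that appears only on removed edges disappears from the result, and removing an edge can split a component. It may be worth checking how many rows the weight filter removes before relying on the scores.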