Compute closeness centrality measuring how close a node is to all others
SELECT * FROM graph_closeness(table_name)
Closeness centrality measures how near a node is to all other nodes in the graph by computing the reciprocal of the average shortest-path distance. Nodes with high closeness can reach every other node in fewer hops or with lower total weight, making them ideal candidates for facility placement, information dissemination origins, or efficient relay points in communication networks.

The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) representation. For each node, a single-source shortest-path computation (BFS for unweighted graphs, Dijkstra for weighted graphs) determines the distances to all other reachable nodes. The closeness score is then the inverse of the mean distance. In disconnected graphs, only the reachable portion of the graph contributes to the score; isolated nodes receive a closeness of 0.0.

The time complexity is O(V * (V + E)) for unweighted graphs and O(V * (E + V log V)) for weighted graphs, where V is the vertex count and E is the edge count. Like betweenness centrality, this requires an all-pairs computation and can be expensive on large graphs. For graphs exceeding 50,000 nodes, consider restricting the computation to a subgraph or using PageRank as a less costly proxy for centrality.

GPU acceleration is not currently available for closeness centrality; all computation runs on the CPU. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology between calls, so repeated closeness queries on the same table reuse the cached graph structure without reloading from the Delta table.
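The per-node computation described above can be sketched in a few lines of Python. This is an illustration of the general technique, not the function's actual implementation: the helper name closeness, the treatment of edges as undirected, and the absence of any rescaling for disconnected graphs are all assumptions, so the scores returned by graph_closeness may differ. The sketch uses Dijkstra throughout; with unit edge costs it visits nodes in the same order as BFS.

```python
import heapq
from collections import defaultdict

# Same edges as the city_routes example table: (src, dst, weight).
city_routes = [
    (1, 2, 10.0), (1, 3, 15.0), (2, 3, 5.0),
    (2, 4, 20.0), (3, 4, 8.0), (4, 5, 12.0),
]

def closeness(edges, weighted=True):
    """Return {node: reachable_count / sum_of_shortest_distances}.

    Assumptions (not confirmed by the documentation): edges are
    undirected, and no rescaling is applied for disconnected graphs.
    """
    adj = defaultdict(list)
    nodes = set()
    for src, dst, weight in edges:
        cost = weight if weighted else 1.0  # unit cost = hop count
        adj[src].append((dst, cost))
        adj[dst].append((src, cost))
        nodes.update((src, dst))

    scores = {}
    for start in nodes:
        # Single-source shortest paths from `start` (Dijkstra).
        dist = {start: 0.0}
        heap = [(0.0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry, a shorter path was found
            for v, cost in adj[u]:
                nd = d + cost
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the start node itself
        # Inverse of the mean distance; 0.0 when nothing is reachable.
        scores[start] = reachable / total if total > 0 else 0.0
    return scores

scores = closeness(city_routes)
print(max(scores, key=scores.get))  # 3 under the undirected assumption
```

Under these assumptions node 3 has the smallest total distance to the other four nodes (15 + 5 + 8 + 20 = 48), so its score is 4 / 48. The outer loop runs one shortest-path search per node, which is where the O(V * (E + V log V)) cost comes from.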
| Name | Type | Description |
|---|---|---|
| table_name | string | Name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). If an optional weight column is present, it is used for weighted distance computation. |
-- Create a small weighted edge table: (src, dst, weight)
CREATE TABLE city_routes AS
SELECT * FROM VALUES
    (1, 2, 10.0),
    (1, 3, 15.0),
    (2, 3, 5.0),
    (2, 4, 20.0),
    (3, 4, 8.0),
    (4, 5, 12.0)
AS t(src, dst, weight);

-- Compute closeness centrality for every node
SELECT * FROM graph_closeness('city_routes');

-- Find the single most central node
SELECT node_id, score
FROM graph_closeness('city_routes')
ORDER BY score DESC
LIMIT 1;

-- Compare closeness with betweenness for each node
SELECT
    c.node_id,
    c.score AS closeness,
    b.score AS betweenness
FROM graph_closeness('city_routes') c
JOIN graph_betweenness('city_routes') b
    ON c.node_id = b.node_id
ORDER BY c.score DESC;

-- List nodes whose closeness is above the graph-wide average
SELECT node_id, score, rank
FROM graph_closeness('city_routes')
WHERE score > (
    SELECT AVG(score) FROM graph_closeness('city_routes')
)
ORDER BY score DESC;