Compute betweenness centrality measuring how often a node lies on shortest paths
SELECT * FROM graph_betweenness(table_name)
Betweenness centrality quantifies the extent to which a node lies on the shortest paths between other pairs of nodes in the graph. A node with high betweenness centrality acts as a bridge or broker, controlling the flow of information or resources between otherwise distant parts of the network. This metric is widely used in social network analysis to identify influential intermediaries, in transportation networks to detect critical junctions, and in biological networks to find key regulatory genes. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) graph representation. The algorithm computes all-pairs shortest paths using Brandes' algorithm, which accumulates dependency scores via a BFS-based traversal from each source node. Column names for source and target nodes are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). If a weight column is present, weighted shortest paths are used. The time complexity is O(V * E) for unweighted graphs and O(V * E + V^2 * log V) for weighted graphs, where V is the vertex count and E is the edge count. Because the algorithm requires an all-pairs computation, it is significantly more expensive than PageRank or degree centrality. For large graphs (over 50,000 nodes), consider sampling strategies or restrict analysis to subgraphs. GPU acceleration is available for this algorithm. When a GPU accelerator is registered, the engine attempts GPU-based parallel BFS traversals first and falls back to CPU if GPU memory is insufficient or unavailable. The graph cache (256 MB default, LRU eviction, 10-minute idle timeout) stores the CSR topology so that repeated betweenness computations on the same table avoid redundant graph construction.
| Name | Type | Description |
|---|---|---|
table_name | Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column is used for weighted shortest-path computation if present. |
CREATE TABLE social_edges AS
SELECT * FROM VALUES
(1, 2, 1.0),
(1, 3, 1.0),
(2, 4, 1.0),
(3, 4, 1.0),
(4, 5, 1.0),
(5, 6, 1.0),
(5, 7, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_betweenness('social_edges');
SELECT node_id, score
FROM graph_betweenness('social_edges')
WHERE score > 0
ORDER BY score DESC
LIMIT 5;
SELECT
b.node_id,
b.score AS betweenness,
d.total_degree
FROM graph_betweenness('social_edges') b
JOIN graph_degree('social_edges') d
ON b.node_id = d.node_id
WHERE b.score > 0
ORDER BY b.score DESC;
SELECT b.node_id, u.name, b.score, b.rank
FROM graph_betweenness('social_edges') b
JOIN users u ON b.node_id = u.id
ORDER BY b.rank;