Betweenness Centrality

Compute betweenness centrality measuring how often a node lies on shortest paths

Category: centrality

Syntax

SELECT * FROM graph_betweenness(table_name)

Description

Betweenness centrality quantifies the extent to which a node lies on the shortest paths between other pairs of nodes in the graph. A node with high betweenness centrality acts as a bridge or broker, controlling the flow of information or resources between otherwise distant parts of the network. This metric is widely used in social network analysis to identify influential intermediaries, in transportation networks to detect critical junctions, and in biological networks to find key regulatory genes. The function loads edge data from a registered Delta table into a Compressed Sparse Row (CSR) graph representation. The algorithm computes all-pairs shortest paths using Brandes' algorithm, which accumulates dependency scores via a BFS-based traversal from each source node. Column names for source and target nodes are auto-detected from standard conventions (src/source/src_id, dst/target/dst_id). If a weight column is present, weighted shortest paths are used. The time complexity is O(V * E) for unweighted graphs and O(V * E + V^2 * log V) for weighted graphs, where V is the vertex count and E is the edge count. Because the algorithm requires an all-pairs computation, it is significantly more expensive than PageRank or degree centrality. For large graphs (over 50,000 nodes), consider sampling strategies or restrict analysis to subgraphs. GPU acceleration is available for this algorithm. When a GPU accelerator is registered, the engine attempts GPU-based parallel BFS traversals first and falls back to CPU if GPU memory is insufficient or unavailable. The graph cache (256 MB default, LRU eviction, 10-minute idle timeout) stores the CSR topology so that repeated betweenness computations on the same table avoid redundant graph construction.

Parameters

Name	Type	Description
`table_name`		Specify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column is used for weighted shortest-path computation if present.

Examples

CREATE TABLE social_edges AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 1.0),
  (2, 4, 1.0),
  (3, 4, 1.0),
  (4, 5, 1.0),
  (5, 6, 1.0),
  (5, 7, 1.0)
AS t(src, dst, weight);

SELECT * FROM graph_betweenness('social_edges');

SELECT node_id, score
FROM graph_betweenness('social_edges')
WHERE score > 0
ORDER BY score DESC
LIMIT 5;

SELECT
  b.node_id,
  b.score AS betweenness,
  d.total_degree
FROM graph_betweenness('social_edges') b
JOIN graph_degree('social_edges') d
  ON b.node_id = d.node_id
WHERE b.score > 0
ORDER BY b.score DESC;

SELECT b.node_id, u.name, b.score, b.rank
FROM graph_betweenness('social_edges') b
JOIN users u ON b.node_id = u.id
ORDER BY b.rank;

Pitfalls

Betweenness is an all-pairs computation with O(V * E) time and O(V) memory per source BFS. Graphs above roughly 100,000 nodes produce long runtimes (minutes to hours). Use sampling-based variants or restrict analysis to subgraphs for large inputs.
Scores are normalised by the total number of node pairs ((n-1)(n-2)), so values are comparable across graphs of different sizes. Raw, unnormalised scores are not exposed, which can surprise users comparing against external implementations that return absolute counts.
Directionality affects results. A node that is a critical bridge in the undirected projection may have near-zero directed betweenness if shortest paths rarely traverse it along the allowed direction. Always verify the graph directionality matches the question being asked.
Weighted betweenness uses Dijkstra-style shortest paths when a weight column is present. Zero or negative weights produce undefined behaviour and may cause incorrect shortest-path selection. Filter out non-positive weights before calling the function.
GPU fallback is silent. When the graph exceeds GPU memory the engine runs the CPU path without raising an error, so users expecting GPU speed-up may observe CPU-level latency without feedback. Check execution hints via EXPLAIN to confirm the chosen device.
Rank ties are broken by node_id order, not score order. For graphs with many symmetric nodes, two runs on the same data return identical ranks, but small topological changes can flip ranks between near-equal nodes in ways that look like instability.