PageRank

Compute PageRank centrality scores for all nodes in a graph

Category: centrality

Syntax

SELECT * FROM graph_pagerank(table_name [, damping] [, iterations])

Description

PageRank computes a ranking of nodes in a directed graph based on the structure of incoming links. Originally developed for ranking web pages, it assigns each node a score proportional to the number and quality of edges pointing to it. The algorithm models a random walker that follows outgoing edges with probability equal to the damping factor and teleports to a uniformly random node with the remaining probability. Nodes that receive links from other high-scoring nodes accumulate higher scores. The function reads edge data from a Delta table registered in the current session. The table is loaded into an in-memory Compressed Sparse Row (CSR) representation, which provides O(1) neighbor lookups per node. If a weight column is present, edge weights influence the transition probabilities. The graph loader auto-detects standard column naming conventions (src/source/src_id for source, dst/target/dst_id for destination). PageRank uses a power-iteration approach with time complexity of O(iterations * (V + E)), where V is the number of nodes and E is the number of edges. Each iteration scans the full adjacency structure once. For graphs with millions of edges, 20 iterations typically suffice for convergence. Increasing the iteration count beyond 40 rarely changes the relative ranking of nodes. GPU acceleration is available for this algorithm. When a GPU accelerator is registered with the session, the engine attempts GPU execution first and falls back to CPU automatically if the GPU path is unavailable or if the graph exceeds GPU memory. GPU execution is most beneficial for graphs exceeding 100,000 edges. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR topology between repeated invocations, so successive PageRank calls on the same table skip the CSR construction phase.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table that contains edge data. The table must have source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column is used if present.
dampingSet the damping factor that controls the probability of following an outgoing edge versus teleporting to a random node. Valid range is 0.0 to 1.0. Default is 0.85. Lower values increase the teleportation probability and reduce the influence of graph structure on final scores.
iterationsSet the number of power-iteration rounds to execute. Default is 20. Higher values improve convergence accuracy at the cost of additional computation time. Most graphs converge well within 20 to 40 iterations.

Examples

CREATE TABLE web_links AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 1.0),
  (2, 3, 1.0),
  (3, 1, 1.0),
  (4, 3, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_pagerank('web_links');
SELECT * FROM graph_pagerank('web_links', 0.90, 50);
SELECT node_id, score
FROM graph_pagerank('web_links')
ORDER BY score DESC
LIMIT 10;
SELECT p.node_id, n.page_url, p.score, p.rank
FROM graph_pagerank('web_links') p
JOIN pages n ON p.node_id = n.id
ORDER BY p.rank;
SELECT
  a.node_id,
  a.score AS score_085,
  b.score AS score_050
FROM graph_pagerank('web_links', 0.85, 20) a
JOIN graph_pagerank('web_links', 0.50, 20) b
  ON a.node_id = b.node_id
ORDER BY a.score DESC;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →