HITS (Hubs and Authorities)

Compute mutually reinforcing hub and authority scores for every node

Category: centrality

Syntax

SELECT * FROM graph_hits(table_name [, iterations] [, tolerance])

Description

HITS (Hyperlink-Induced Topic Search, Kleinberg 1999) computes two complementary per-node scores on a directed graph: a hub score that grows with the authority scores of the nodes a vertex points to, and an authority score that grows with the hub scores of the nodes pointing in. The two scores are co-iterated to a mutual fixed point, then independently L2-normalized at every step. The function follows Neo4j GDS gds.alpha.hits semantics exactly: authority[v] equals the sum of hub[u] over all incoming edges u to v, hub[v] equals the sum of authority[w] over all outgoing edges v to w, with hubs at step t consuming the previous step's authority vector in classic Jacobi style. Convergence is declared once both the per-component L-infinity delta on the hub vector and on the authority vector fall below tolerance. The function loads edge data from a registered Delta table as a directed graph in Compressed Sparse Row format. The CSR loader auto-detects source and destination columns from standard names (src/source/src_id, dst/target/dst_id). On undirected projections the hub and authority vectors evolve identically by symmetry and reduce to eigenvector centrality up to scale; for directed inputs the two scores diverge and expose distinct roles (curator versus referenced source). Time complexity is O(iterations times (V + E)) for V vertices and E edges. Each iteration performs four full vector traversals (incoming-edge sum, outgoing-edge sum, two L2 normalizations, plus a convergence check). The CPU path parallelizes every per-vector pass with Rayon. Most real graphs converge in well under 20 iterations. GPU acceleration is available. The dispatcher routes to GPU when the active accelerator registration accepts the graph and the node count meets or exceeds 1,000 (delta-forge-graph-gpu thresholds::HITS); below that the CPU path runs directly because the GPU transfer overhead dominates the compute. The GPU kernel mirrors the cuGraph link_analysis/hits_impl.cuh recipe: pull-style incoming-edge transform-reduce for authority, pull-style outgoing-edge transform-reduce for hub, two two-stage partial-sum-of-squares reductions for the L2 norms, and a two-stage max-abs-delta reduction for convergence, so per-iteration host involvement collapses to an 8-byte readback rather than a full vector round-trip. When the GPU's per-iteration L2 norm is zero on either vector (fully disconnected snapshot) the dispatch surfaces an algorithm error rather than silently returning all zeros, matching cuGraph's CUGRAPH_EXPECTS(hubs_norm > 0) semantic.

Parameters

NameTypeDescription
table_nameSpecifies the name of the registered Delta table containing edge data. The table must expose source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). An optional weight column is consumed when present. Edge direction is preserved: incoming edges feed authority scores, outgoing edges feed hub scores.
iterationsSets the cap on power-iteration rounds. Default is 20, matching the Neo4j GDS gds.alpha.hits default. The loop exits early once the per-iteration L-infinity delta on both the hub and authority vectors falls below the tolerance.
toleranceSets the L-infinity convergence threshold applied independently to the hub and authority vectors after every iteration. Default is 1e-7. The loop continues only while either vector's maximum per-component change still exceeds this value.

Examples

CREATE TABLE citations AS
SELECT * FROM VALUES
  (1, 2, 1.0),
  (1, 3, 1.0),
  (2, 3, 1.0),
  (4, 3, 1.0),
  (4, 5, 1.0),
  (5, 3, 1.0)
AS t(src, dst, weight);
SELECT * FROM graph_hits('citations');
SELECT * FROM graph_hits('citations', 100, 1e-9);
SELECT node_id, authority
FROM graph_hits('citations')
ORDER BY authority DESC
LIMIT 5;
SELECT node_id, hub
FROM graph_hits('citations')
ORDER BY hub DESC
LIMIT 5;
SELECT
  node_id,
  hub,
  authority,
  (authority - hub) AS role_skew
FROM graph_hits('citations')
ORDER BY role_skew DESC;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →