FastRP Node Embeddings

Compute Fast Random Projection node embeddings via iterated sparse projections of the row-normalised adjacency matrix

Category: embeddings

Syntax

SELECT * FROM graph_fastrp(table_name [, embedding_dim [, normalization_strength [, seed]]])

Description

FastRP (Fast Random Projection, Chen, Hamilton, Sun and Sosic 2019) generates dense node embeddings cheaply by repeatedly projecting the row-normalised adjacency matrix through a sparse random matrix. The algorithm matches Neo4j GDS `gds.fastRP` semantics and produces embeddings that are competitive with DeepWalk or node2vec at a small fraction of the cost: linear time in the edge count and the embedding dimension, with no iterative training loop. The pipeline has three stages. First, an initial sparse projection R_0 is constructed: each row of R_0 is a d-dimensional Achlioptas-style ±√(3/d) vector with density 1/3 (two nonzeros per row on average), with element values drawn deterministically from `(node_id, dimension, seed)` via SplitMix64. Second, for each iteration `t = 1..k` the engine computes `R_t = L2_normalise_rows(W · R_{t-1})`, where `W` is the transition probability matrix derived from the row-normalised adjacency (directed graphs walk the reverse CSR matching Neo4j's NATURAL projection; undirected graphs walk the forward CSR). Each neighbour contribution is scaled by `out_degree(u)^normalization_strength` before aggregation, then divided by the aggregating degree of `v` and L2-normalised per row. Third, the final embedding is the weighted sum `E = Σ_t iteration_weights[t] · R_t` with the engine-fixed schedule `[0.0, 1.0, 1.0, 1.0]` (skip identity, equal-weight 3-hop), giving four iterations total. The time complexity is O(k · (E + V · d)), linear in edges and in the embedding dimension; the per-iteration memory footprint is two `V × d` f64 buffers plus the accumulator. CPU execution uses Rayon to parallelise the per-node aggregation loop; the projection-matrix construction is also parallel. GPU acceleration is available and is typically a substantial win on dense graphs and on high embedding dimensions. The GPU port (one workgroup per (v, d) cell in `fastrp_aggregate.wgsl`) splits the dispatch into a 2-D grid using the adapter's advertised per-axis workgroup limit at runtime, so the dispatch scales with whatever the device exposes rather than a hardcoded 65535 ceiling. Init, neighbour aggregation, L2 row normalisation, and weighted accumulation are four separate pipelines chained on the device; only the final embedding is read back. The graph cache (256 MB default budget, LRU eviction, 10-minute idle timeout) retains the CSR so repeated invocations skip graph construction.

Parameters

NameTypeDescription
table_nameSpecify the name of the registered Delta table containing edge data. The table must include source and target columns (auto-detected as src/source/src_id and dst/target/dst_id). Edge weights are honoured when present and are used as the transition weight in the neighbour-aggregation step. The graph is loaded directed when the source table is directed (the reverse CSR drives the propagation kernel) and undirected otherwise.
embedding_dimSet the embedding dimension d. Every node receives a d-wide f64 vector serialised as a JSON-style string. Matches the Neo4j GDS default of 128. Larger values capture more structural variation at the cost of linear memory in d. Setting `embedding_dim = 0` returns an empty result (no nodes are emitted). Useful small values for debugging or for human-readable output sit in the 4 to 16 range.
normalization_strengthSet the exponent applied to each node's aggregating degree when blending neighbour contributions. The contribution of neighbour `u` into node `v` is scaled by `out_degree(u)^normalization_strength` (or in-degree for the directed-NATURAL projection). Zero (the Neo4j default) collapses to uniform weighting. Positive values bias the embedding toward high-degree neighbours; negative values bias it toward low-degree neighbours. Typical range is `-1.0` to `+1.0`.
seedSet the seed for the SplitMix64 PRNG that builds the sparse Achlioptas projection matrix R_0. Embeddings are deterministic for a fixed `(table_name, embedding_dim, normalization_strength, seed)` tuple. Change the seed to draw an independent embedding for the same graph (useful for variance estimates or ensemble downstream models).

Examples

CREATE TABLE frp_demo AS
SELECT * FROM VALUES
  (1, 2, 1.0), (2, 3, 1.0), (3, 1, 1.0),
  (4, 5, 1.0), (5, 6, 1.0), (6, 4, 1.0),
  (3, 4, 0.5)
AS t(src, dst, weight);
SELECT * FROM graph_fastrp('frp_demo', 4) ORDER BY node_id;
SELECT * FROM graph_fastrp('frp_demo');
SELECT * FROM graph_fastrp('frp_demo', 32, 0.5);
SELECT * FROM graph_fastrp('frp_demo', 32, 0.0, 1234);
CREATE TABLE node_embeddings AS
SELECT node_id, embedding
FROM graph_fastrp('frp_demo', 64);

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →