CREATE GRAPHCSR

Pre-builds and writes the Compressed Sparse Row (CSR) topology for a named graph to disk, enabling subsequent Cypher queries to load the graph in approximately 200ms instead of rebuilding from Delta tables.

Category: graph (DeltaForge extension)

Syntax

CREATE GRAPHCSR <graph_name>

Description

## Overview

CREATE GRAPHCSR explicitly builds the Compressed Sparse Row (CSR) topology for an existing graph definition and writes it as a binary .dcsr file to disk. This command is the manual counterpart to the automatic CSR caching that occurs when AUTO CACHE CSR is enabled (the default).

The CSR format is a compact adjacency representation where all neighbors of a given vertex are stored in a contiguous memory region. This layout enables O(1) neighbor lookups and is critical for the performance of graph algorithms such as PageRank, BFS, shortest path, and community detection.

## When to Use

Use CREATE GRAPHCSR in the following scenarios:

- **After bulk data loads**: When ingesting large batches of edges, disable auto-caching with NO AUTO CACHE CSR during the load, then run CREATE GRAPHCSR once after the load completes. This avoids rebuilding the CSR after each intermediate write.
- **Controlled refresh cycles**: In high-frequency write workloads, the CSR cache becomes stale after every Delta table modification. Rather than rebuilding automatically on each query, operators can schedule CREATE GRAPHCSR at defined intervals (e.g., hourly, after nightly ETL).
- **Pre-warming compute nodes**: Run CREATE GRAPHCSR during deployment or startup to ensure the first Cypher query executes at full speed without an initial rebuild penalty.

## Execution Steps

1. The engine resolves the graph name to its definition in the graph definition registry (by simple name or fully qualified entity reference).
2. A CypherExecutor is instantiated with the session registries and graph buffer pool.
3. The full CSR topology is built by scanning the vertex and edge Delta tables. All vertex properties and edge properties requested by the query requirements are included.
4. The CSR is written as a compressed .dcsr file to `{edge_table_path}/_deltaforge/{graph_name}.dcsr`. The file includes the current Delta table versions for both the vertex and edge tables.
5. The engine reports the node count, edge count, and build time.

## Disk Format

The .dcsr file uses a versioned binary format with zstd compression (format v2). It stores:

- A magic header for format identification.
- The Delta table version numbers for both the vertex table and edge table. These versions are checked at load time; a version mismatch triggers a rebuild.
- The CSR offset array (one entry per vertex plus one sentinel), which encodes where each vertex's neighbor list begins.
- The CSR target array (one entry per edge), which stores the target vertex index for each edge.
- Optional edge weight and label arrays.

The file is co-located with the edge table data under the `_deltaforge/` directory. This ensures any compute node that can read the edge table also finds the CSR cache, including when tables are stored on cloud object storage (S3, GCS, Azure Blob).

## Performance Characteristics

| Operation | Typical Latency |
|-----------|----------------|
| CSR build from Delta tables (full rebuild) | 6-14 seconds for large graphs |
| CSR load from .dcsr disk file | ~200ms |
| CSR load from memory buffer pool | sub-millisecond |

CREATE GRAPHCSR always writes to disk, even when the graph definition has NO AUTO CACHE CSR set. This is the intended behavior: the command gives operators explicit control over when the cache is populated.

## Access Control

| Privilege | Object | Notes |
|-----------|--------|-------|
| READ or higher | Vertex table | Required to scan vertex data during CSR construction. |
| READ or higher | Edge table | Required to scan edge data during CSR construction. |
| WRITE | Edge table storage path | Required to write the .dcsr file to the _deltaforge directory. |

## Compatibility

CREATE GRAPHCSR is a DeltaForge extension. There is no equivalent in standard SQL or openCypher. It complements the CREATE GRAPH command by separating graph definition from physical topology materialization.
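## CSR Layout Sketch

The offset and target arrays described under Disk Format can be illustrated with a minimal Python sketch. This is not DeltaForge's implementation and says nothing about the actual .dcsr byte layout. The function names `build_csr` and `neighbors` are hypothetical, and the sketch assumes vertex IDs have already been mapped to dense indices `0..n-1`.

```python
from typing import List, Sequence, Tuple


def build_csr(
    num_vertices: int, edges: Sequence[Tuple[int, int]]
) -> Tuple[List[int], List[int]]:
    """Build CSR (offsets, targets) arrays for a directed edge list.

    Assumption: vertices are dense integer indices 0..num_vertices-1.
    """
    # offsets has one entry per vertex plus one sentinel at the end.
    offsets = [0] * (num_vertices + 1)

    # Count the out-degree of each source vertex, shifted by one slot
    # so that the prefix sum below yields start positions.
    for src, _dst in edges:
        offsets[src + 1] += 1

    # Prefix sum: offsets[v] becomes the start of v's neighbor slice.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    # targets has one entry per edge.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slice copy: next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1

    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Return v's neighbors; the slice bounds are found with two array reads."""
    return targets[offsets[v]:offsets[v + 1]]


# Small usage example: 4 vertices, 4 directed edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
print(offsets)                          # [0, 2, 3, 3, 4]
print(targets)                          # [1, 2, 2, 0]
print(neighbors(offsets, targets, 0))   # [1, 2]
```

Each vertex's neighbors occupy one contiguous slice of `targets`, so locating them costs two reads of `offsets` no matter how large the graph is. That is the property the Overview refers to as O(1) neighbor lookup.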

Parameters

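| Name | Type | Description |
|------|------|-------------|
| graph_name | | Name of the graph, either simple or fully qualified. The graph must already exist. |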

Examples

-- Build the CSR cache for a graph after initial creation
CREATE GRAPH social_network
    VERTEX TABLE gold.social.persons ID COLUMN person_id
    EDGE TABLE gold.social.friendships SOURCE COLUMN src TARGET COLUMN dst
    NO AUTO CACHE CSR
    DIRECTED;

CREATE GRAPHCSR social_network;

-- Refresh the CSR cache after loading new edge data
INSERT INTO gold.social.friendships
SELECT src, dst, weight FROM staging.new_connections;

CREATE GRAPHCSR social_network;

-- Build CSR for a fully qualified graph name
CREATE GRAPHCSR gold.social.social_network;

-- Typical workflow: disable auto-cache, load data, build CSR once
CREATE GRAPH IF NOT EXISTS batch_graph
    VERTEX TABLE gold.etl.nodes ID COLUMN node_id
    EDGE TABLE gold.etl.edges SOURCE COLUMN src TARGET COLUMN dst
    NO AUTO CACHE CSR
    DIRECTED;

-- ... load millions of rows into edges table ...

-- Build the CSR once, after all data is loaded
CREATE GRAPHCSR batch_graph;
