CREATE GRAPH

Creates a named property graph definition that maps a vertex table and an edge table from the Delta lakehouse into a queryable graph structure.

Category: graphDeltaForge extension

Syntax

CREATE GRAPH [IF NOT EXISTS] <name>
  VERTEX TABLE <table> ID COLUMN <col> [NODE TYPE COLUMN <col>] [NODE NAME COLUMN <col>]
  EDGE TABLE <table> SOURCE COLUMN <col> TARGET COLUMN <col>
    [WEIGHT COLUMN <col>] [EDGE TYPE COLUMN <col>]
  [DIRECTED|UNDIRECTED]
  [VERTEX PROPERTIES ...] [EDGE PROPERTIES ...]

Description

## Overview CREATE GRAPH defines a named property graph by mapping a vertex table and an edge table from the Delta lakehouse into a graph topology. The graph definition is registered in the session for immediate Cypher query use and persisted to the catalog so it appears in the Graph Tables UI and survives session restarts. Graphs in DeltaForge follow the property graph model: vertices carry properties (columns or JSON fields), edges carry properties, and both vertices and edges can have categorical labels. The underlying Delta tables are never copied or modified; the graph definition is a logical overlay that references existing table data. ## Execution Steps 1. The vertex table configuration (ID column, optional type/name columns, property mode) is registered in the session graph config registry. 2. The edge table configuration (source/target columns, optional weight/label columns, property mode) is registered in the session graph config registry. 3. A graph definition record is created in the graph definition registry, which resolves the entity reference (zone.schema.graph_name) for Cypher USE clause lookups. 4. The definition is persisted to the catalog via the catalog router (best-effort). If the catalog is unavailable, the graph remains usable within the current session. ## CSR Topology When a Cypher query first references the graph, the engine builds a Compressed Sparse Row (CSR) representation from the Delta tables. The CSR is an adjacency structure that enables O(1) neighbor lookups per vertex. The build process has three tiers: 1. **Memory cache** -- If the CSR is already in the session graph buffer pool, it is reused immediately (sub-millisecond). 2. **Disk cache** -- If a .dcsr file exists at the edge table path and the Delta table versions match, the CSR is loaded from disk (approximately 200ms). 3. **Full rebuild** -- The engine scans the vertex and edge Delta tables, constructs the CSR in memory, and (when AUTO CACHE CSR is enabled) writes the result to disk for future queries. ## Property Storage Modes Three modes control how vertex and edge properties are resolved during Cypher property access (e.g., n.department, r.weight): - **Flattened** (default): Every column in the table is available as a named property. This mode benefits from predicate pushdown and columnar scan optimizations. - **JSON**: A single STRING column contains a JSON object. Property access extracts values at query time using JSON path evaluation. This is the most flexible mode for schema-on-read workloads. - **Hybrid**: A set of frequently accessed columns are available as flattened properties, while additional properties overflow into a JSON column. This balances query performance with schema flexibility. ## Cypher Integration Once defined, the graph is referenced via the Cypher USE clause: ```sql USE zone.schema.graph_name MATCH (a)-[r]->(b) RETURN a.id, b.id, r.weight ``` The full suite of graph algorithms is available through CALL statements: pageRank, betweenness, closeness, louvain, connectedComponents, scc, shortestPath, allShortestPaths, bfs, dfs, triangleCount, mst, knn, similarity, and degree. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | READ or higher | Vertex table | Required to scan vertex data during CSR build and Cypher queries. | | READ or higher | Edge table | Required to scan edge data during CSR build and Cypher queries. | ## Compatibility CREATE GRAPH is a DeltaForge extension. It implements the SQL/PGQ (Property Graph Queries) concept of named graph definitions mapped to relational tables. The syntax is designed for interoperability with openCypher query semantics.

Parameters

NameTypeDescription
nameSpecifies the identifier for the graph definition. May be a simple name (e.g., social_network) or a fully qualified three-part name (zone.schema.graph_name). When a simple name is used, the zone and schema are derived from the vertex table reference. The graph name is used as the target of the Cypher USE clause.
vertex_tableVertex table (fully qualified).
vertex_id_columnVertex ID column.
edge_tableEdge table (fully qualified).
source_columnSource column for edges.
target_columnTarget column for edges.
directedWhether the graph is directed (default: true).

Examples

-- Create a basic directed graph from person and friendship tables
CREATE GRAPH social_network
    VERTEX TABLE gold.social.persons ID COLUMN person_id
    EDGE TABLE gold.social.friendships SOURCE COLUMN src TARGET COLUMN dst
    DIRECTED;
-- Create a graph with vertex labels, edge weights, and edge types
CREATE GRAPH IF NOT EXISTS org_chart
    VERTEX TABLE gold.hr.employees ID COLUMN id
        NODE TYPE COLUMN department
        NODE NAME COLUMN full_name
    EDGE TABLE gold.hr.reports_to SOURCE COLUMN employee_id TARGET COLUMN manager_id
        WEIGHT COLUMN influence_score
        EDGE TYPE COLUMN relationship
    DIRECTED;
-- Create an undirected graph for symmetric relationships
CREATE GRAPH collaboration_net
    VERTEX TABLE gold.research.authors ID COLUMN author_id
    EDGE TABLE gold.research.coauthorships SOURCE COLUMN author_a TARGET COLUMN author_b
    UNDIRECTED;
-- Create a graph with hybrid property storage (core columns plus JSON extras)
CREATE GRAPH customer_graph
    VERTEX TABLE gold.crm.customers ID COLUMN cust_id
        NODE TYPE COLUMN segment
        NODE NAME COLUMN name
    EDGE TABLE gold.crm.transactions SOURCE COLUMN buyer_id TARGET COLUMN seller_id
        WEIGHT COLUMN amount
        EDGE TYPE COLUMN txn_type
    DIRECTED
    VERTEX PROPERTIES HYBRID COLUMNS (name, segment) JSON COLUMN extras
    EDGE PROPERTIES HYBRID COLUMNS (amount, txn_type) JSON COLUMN metadata;
-- Create a graph with JSON-only property storage
CREATE GRAPH flexible_graph
    VERTEX TABLE gold.analytics.nodes ID COLUMN node_id
        NODE TYPE COLUMN category
    EDGE TABLE gold.analytics.links SOURCE COLUMN from_id TARGET COLUMN to_id
    DIRECTED
    VERTEX PROPERTIES JSON COLUMN props
    EDGE PROPERTIES JSON COLUMN props;
-- Create a graph with automatic CSR caching disabled for frequent writes
CREATE GRAPH IF NOT EXISTS live_feed
    VERTEX TABLE gold.events.entities ID COLUMN entity_id
    EDGE TABLE gold.events.interactions SOURCE COLUMN src TARGET COLUMN dst
    DIRECTED
    NO AUTO CACHE CSR;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →