Creates a named property graph definition that maps a vertex table and an edge table from the Delta lakehouse into a queryable graph structure.
CREATE GRAPH [IF NOT EXISTS] <name>
VERTEX TABLE <table> ID COLUMN <col> [NODE TYPE COLUMN <col>] [NODE NAME COLUMN <col>]
EDGE TABLE <table> SOURCE COLUMN <col> TARGET COLUMN <col>
[WEIGHT COLUMN <col>] [EDGE TYPE COLUMN <col>]
[DIRECTED|UNDIRECTED]
[VERTEX PROPERTIES ...] [EDGE PROPERTIES ...]
## Overview CREATE GRAPH defines a named property graph by mapping a vertex table and an edge table from the Delta lakehouse into a graph topology. The graph definition is registered in the session for immediate Cypher query use and persisted to the catalog so it appears in the Graph Tables UI and survives session restarts. Graphs in DeltaForge follow the property graph model: vertices carry properties (columns or JSON fields), edges carry properties, and both vertices and edges can have categorical labels. The underlying Delta tables are never copied or modified; the graph definition is a logical overlay that references existing table data. ## Execution Steps 1. The vertex table configuration (ID column, optional type/name columns, property mode) is registered in the session graph config registry. 2. The edge table configuration (source/target columns, optional weight/label columns, property mode) is registered in the session graph config registry. 3. A graph definition record is created in the graph definition registry, which resolves the entity reference (zone.schema.graph_name) for Cypher USE clause lookups. 4. The definition is persisted to the catalog via the catalog router (best-effort). If the catalog is unavailable, the graph remains usable within the current session. ## CSR Topology When a Cypher query first references the graph, the engine builds a Compressed Sparse Row (CSR) representation from the Delta tables. The CSR is an adjacency structure that enables O(1) neighbor lookups per vertex. The build process has three tiers: 1. **Memory cache** -- If the CSR is already in the session graph buffer pool, it is reused immediately (sub-millisecond). 2. **Disk cache** -- If a .dcsr file exists at the edge table path and the Delta table versions match, the CSR is loaded from disk (approximately 200ms). 3. **Full rebuild** -- The engine scans the vertex and edge Delta tables, constructs the CSR in memory, and (when AUTO CACHE CSR is enabled) writes the result to disk for future queries. ## Property Storage Modes Three modes control how vertex and edge properties are resolved during Cypher property access (e.g., n.department, r.weight): - **Flattened** (default): Every column in the table is available as a named property. This mode benefits from predicate pushdown and columnar scan optimizations. - **JSON**: A single STRING column contains a JSON object. Property access extracts values at query time using JSON path evaluation. This is the most flexible mode for schema-on-read workloads. - **Hybrid**: A set of frequently accessed columns are available as flattened properties, while additional properties overflow into a JSON column. This balances query performance with schema flexibility. ## Cypher Integration Once defined, the graph is referenced via the Cypher USE clause: ```sql USE zone.schema.graph_name MATCH (a)-[r]->(b) RETURN a.id, b.id, r.weight ``` The full suite of graph algorithms is available through CALL statements: pageRank, betweenness, closeness, louvain, connectedComponents, scc, shortestPath, allShortestPaths, bfs, dfs, triangleCount, mst, knn, similarity, and degree. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | READ or higher | Vertex table | Required to scan vertex data during CSR build and Cypher queries. | | READ or higher | Edge table | Required to scan edge data during CSR build and Cypher queries. | ## Compatibility CREATE GRAPH is a DeltaForge extension. It implements the SQL/PGQ (Property Graph Queries) concept of named graph definitions mapped to relational tables. The syntax is designed for interoperability with openCypher query semantics.
| Name | Type | Description |
|---|---|---|
name | Specifies the identifier for the graph definition. May be a simple name (e.g., social_network) or a fully qualified three-part name (zone.schema.graph_name). When a simple name is used, the zone and schema are derived from the vertex table reference. The graph name is used as the target of the Cypher USE clause. | |
vertex_table | Vertex table (fully qualified). | |
vertex_id_column | Vertex ID column. | |
edge_table | Edge table (fully qualified). | |
source_column | Source column for edges. | |
target_column | Target column for edges. | |
directed | Whether the graph is directed (default: true). |
-- Create a basic directed graph from person and friendship tables
CREATE GRAPH social_network
VERTEX TABLE gold.social.persons ID COLUMN person_id
EDGE TABLE gold.social.friendships SOURCE COLUMN src TARGET COLUMN dst
DIRECTED;
-- Create a graph with vertex labels, edge weights, and edge types
CREATE GRAPH IF NOT EXISTS org_chart
VERTEX TABLE gold.hr.employees ID COLUMN id
NODE TYPE COLUMN department
NODE NAME COLUMN full_name
EDGE TABLE gold.hr.reports_to SOURCE COLUMN employee_id TARGET COLUMN manager_id
WEIGHT COLUMN influence_score
EDGE TYPE COLUMN relationship
DIRECTED;
-- Create an undirected graph for symmetric relationships
CREATE GRAPH collaboration_net
VERTEX TABLE gold.research.authors ID COLUMN author_id
EDGE TABLE gold.research.coauthorships SOURCE COLUMN author_a TARGET COLUMN author_b
UNDIRECTED;
-- Create a graph with hybrid property storage (core columns plus JSON extras)
CREATE GRAPH customer_graph
VERTEX TABLE gold.crm.customers ID COLUMN cust_id
NODE TYPE COLUMN segment
NODE NAME COLUMN name
EDGE TABLE gold.crm.transactions SOURCE COLUMN buyer_id TARGET COLUMN seller_id
WEIGHT COLUMN amount
EDGE TYPE COLUMN txn_type
DIRECTED
VERTEX PROPERTIES HYBRID COLUMNS (name, segment) JSON COLUMN extras
EDGE PROPERTIES HYBRID COLUMNS (amount, txn_type) JSON COLUMN metadata;
-- Create a graph with JSON-only property storage
CREATE GRAPH flexible_graph
VERTEX TABLE gold.analytics.nodes ID COLUMN node_id
NODE TYPE COLUMN category
EDGE TABLE gold.analytics.links SOURCE COLUMN from_id TARGET COLUMN to_id
DIRECTED
VERTEX PROPERTIES JSON COLUMN props
EDGE PROPERTIES JSON COLUMN props;
-- Create a graph with automatic CSR caching disabled for frequent writes
CREATE GRAPH IF NOT EXISTS live_feed
VERTEX TABLE gold.events.entities ID COLUMN entity_id
EDGE TABLE gold.events.interactions SOURCE COLUMN src TARGET COLUMN dst
DIRECTED
NO AUTO CACHE CSR;