EXTRACT LINEAGE

Parses a SQL script and extracts table-level data lineage, returning inbound (read) and outbound (write) dependencies with catalog, schema, table, operation type, and dependency classification.

Category: utilityDeltaForge extension

Syntax

EXTRACT LINEAGE FROM '<sql_script>' [SOURCE '<filename>']
EXTRACT LINEAGE FROM FILE '<path>'

Description

## Overview EXTRACT LINEAGE statically analyzes SQL text to determine which tables are read from (inbound dependencies) and which tables are written to (outbound dependencies). The analysis is purely syntactic, parsing the SQL without executing it, and is used for dependency graphs, impact analysis, and catalog lineage metadata. ## Behavior The command operates in two variants: ### Inline SQL The SQL script is provided as a string literal. The engine parses all statements and extracts table references from FROM clauses (inbound) and INSERT INTO, CREATE TABLE AS, MERGE INTO targets (outbound). ### FROM FILE The engine reads the contents of the specified file path, then performs the same analysis as inline SQL. The file path is used as the source identifier in the result. ### Result Set The result contains one row per dependency with eight columns: | Column | Type | Description | |--------|------|-------------| | direction | UTF8 | "INBOUND" (table is read) or "OUTBOUND" (table is written). | | catalog | UTF8, nullable | The catalog (zone) portion of the table reference. | | schema | UTF8, nullable | The schema portion of the table reference. | | table | UTF8 | The table name. | | alias | UTF8, nullable | The table alias used in the query, if any. | | physical_path | UTF8, nullable | The resolved physical storage path, if available. | | operations | UTF8 | Comma-separated list of operations performed (e.g., SELECT, INSERT, MERGE). | | dependency_type | UTF8 | Classification of the dependency (e.g., Direct, Transitive). | The execution result message reports the source identifier and counts of inbound and outbound dependencies. ## Compatibility EXTRACT LINEAGE is a DeltaForge extension. There is no equivalent in standard SQL. The static analysis covers DeltaForge SQL, standard SQL DML, and Cypher MATCH statements.

Parameters

NameTypeDescription
sqlSQL script content or file path.
source_filenameSource filename for identification.
from_fileIf true, sql is a file path.

Examples

-- Extract lineage from inline SQL
EXTRACT LINEAGE FROM 'INSERT INTO gold.summary.daily_orders SELECT order_date, COUNT(*) FROM silver.sales.orders GROUP BY order_date';
-- Extract lineage with a source identifier
EXTRACT LINEAGE FROM 'SELECT * FROM t1 JOIN t2 ON t1.id = t2.id' SOURCE 'etl/transform.sql';
-- Extract lineage from a file on disk
EXTRACT LINEAGE FROM FILE '/workspace/pipelines/daily_etl.sql';
-- Analyze a multi-statement script
EXTRACT LINEAGE FROM '
  INSERT INTO target_a SELECT * FROM source_a;
  INSERT INTO target_b SELECT * FROM source_a JOIN source_b ON source_a.id = source_b.id;
';

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →