DELETED

Returns rows that existed in a Delta table at one version but not at a later version, by diffing two time-travel snapshots. One of the DELETED / UPDATED / INSERTED change-extraction trio.

Category: time-travelReturns: SETOF row-of(table)Dialect: DeltaForgePrivilege: readDeltaForge extension

Syntax

DELETED(table_name TEXT [, from_version BIGINT, to_version BIGINT [, key_array TEXT[]]])

Description

## Overview Returns rows that existed in a Delta table at `from_version` but not at `to_version`. Implemented as a snapshot diff: the engine loads two time-travel snapshots of the same table (at the two versions) and computes the set difference between them. The result schema matches the source table's row schema, so the function composes naturally with regular SELECT, JOIN, INSERT, and DELETE. DELETED is one of the **DELETED / UPDATED / INSERTED** change-extraction trio. Together they let you partition any commit (or version range) into deletes, updates, and inserts using the same KEY columns; each operator surfaces exactly one slice and the three slices are non-overlapping. This is the standard primitive for incremental ETL on Delta tables. Unlike CDF-based change extraction, DELETED works on every Delta table without requiring `delta.enableChangeDataFeed = true` to have been set at write time. The cost is a scan of both versions' active file sets, which DataFusion can partition-prune and projection-prune in the usual way. ## Behavior - The 1-arg form `DELETED('schema.table')` returns rows removed by the most recent commit (versions `latest - 1` → `latest`). - The 3-arg form `DELETED('schema.table', from, to)` returns rows present at `from` but absent at `to`. With no KEY argument and no declared PRIMARY KEY, the diff is a full-row EXCEPT. - The 4-arg form `DELETED('schema.table', from, to, ARRAY['id'])` uses anti-join semantics on the named key columns. A row is returned if its key value existed at `from` and has no matching key at `to`. This distinguishes hard deletes from UPDATE pre-images. - A declared PRIMARY KEY constraint is auto-used when the user does not pass an explicit KEY argument. Explicit KEY always wins. - Versions older than the table's log-retention horizon are not readable. After VACUUM removes log files past the retention period, those snapshots become inaccessible and DELETED raises a planning error. ## Identity resolution The function resolves identity in this order: 1. Explicit `KEY` argument: `DELETED(..., ARRAY['col1', 'col2'])`. Always wins. 2. The source table's PRIMARY KEY constraint (declared via `delta.constraints.pk` or `ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY`). 3. None: the function falls back to full-row EXCEPT. Full-row EXCEPT is structurally correct (`(SELECT * FROM t VERSION AS OF a) EXCEPT (SELECT * FROM t VERSION AS OF b)`) but treats two duplicate rows as a single row, so the count of returned rows can be smaller than the number of physically-removed rows on a non-unique schema. ## Versioning semantics Version bounds are inclusive on both ends. `DELETED('t', 5, 5)` returns rows removed by the single commit that produced version 5 (i.e. present at version 4, absent at version 5; this is the difference between commits 4 and 5). ## Composition with the trio DELETED, UPDATED, and INSERTED share an identical surface (table_name, from_version, to_version, key_array) and identical identity-resolution rules. Used together on the same version range and the same KEY they cleanly partition every change in the commit window: - A row whose key exists at `from` and not at `to` → DELETED. - A row whose key exists at both versions, with at least one non-key column differing → UPDATED. - A row whose key exists at `to` and not at `from` → INSERTED. - A row whose key exists at both versions with identical non-key columns → no operator returns it (genuinely unchanged). This property is exploited by the canonical incremental-ETL template: run all three against `(:last_seen, :current)`, route each result set to the corresponding sink action (DELETE, MERGE/UPSERT, INSERT), advance `:last_seen` to `:current`, repeat. ## Compatibility DeltaForge-native. The semantic equivalent in standard SQL is `(SELECT * FROM t VERSION AS OF :from) EXCEPT (SELECT * FROM t VERSION AS OF :to)` for the no-key path, or `SELECT v_from.* FROM t VERSION AS OF :from v_from LEFT ANTI JOIN t VERSION AS OF :to v_to ON v_from.id = v_to.id` for the key path. DELETED is the readable shorthand. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | read | Table | Required on the source table. The function reads two snapshots of the same table; a single read grant suffices. |

Parameters

NameTypeDescription
table_nameSpecifies the source Delta table as a string literal. Accepts a one-, two-, or three-part name (table, schema.table, or zone.schema.table). The table must be a Delta table; external tables and in-memory tables are rejected because they have no version history.
from_versionInclusive lower-bound version of the range to diff. The function compares the snapshot at this version against the snapshot at to_version and returns rows present at from_version but absent at to_version. When omitted (1-arg form), defaults to latest - 1, so the function returns rows removed by the most recent commit.
to_versionInclusive upper-bound version of the range to diff. Must be greater than or equal to from_version. When omitted (1-arg form), defaults to the table's current latest version.
key_arrayIdentity columns used for anti-join semantics. When supplied, the function returns each V_from row whose key has no match at V_to (a hard delete; UPDATE pre-images are excluded because the key still exists at V_to). When omitted and the table has a declared PRIMARY KEY constraint, the constraint columns are used. When omitted and there is no PRIMARY KEY, the function falls back to full-row EXCEPT, which is correct on schemas where row identity is structural but collapses duplicate rows.

Examples

-- Audit which customers were removed in commit 17
SELECT * FROM DELETED('silver.customer', 17, 17);
-- Latest commit shorthand: rows removed in the most recent commit
SELECT * FROM DELETED('silver.events');
-- ETL pattern: propagate upstream deletes to a downstream warehouse mirror
DELETE FROM warehouse.dim_customer
 WHERE id IN (
     SELECT id FROM DELETED('silver.customer', 100, 105, ARRAY['id'])
 );
-- Anti-join with explicit key: distinguishes hard deletes from update pre-images
SELECT *
  FROM DELETED('silver.customer', 100, 110, ARRAY['customer_id']);
-- Compose with a JOIN to enrich the audit trail
SELECT d.customer_id, r.region
  FROM DELETED('silver.customer', 1, 2, ARRAY['customer_id']) d
  JOIN silver.region r ON d.region_id = r.id;
-- Canonical incremental-ETL template using the full DELETED / UPDATED / INSERTED trio.
-- Each operator covers one slice of the commit's change set; together they form a
-- complete, non-overlapping partition.
DELETE FROM warehouse.dim_customer
 WHERE id IN (SELECT id FROM DELETED('silver.customer', :last_seen, :current, ARRAY['id']));

INSERT INTO warehouse.stage_updates
 SELECT * FROM UPDATED('silver.customer', :last_seen, :current, ARRAY['id']);

INSERT INTO warehouse.stage_inserts
 SELECT * FROM INSERTED('silver.customer', :last_seen, :current, ARRAY['id']);

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →