Updates column values in rows of a Delta table that match an optional predicate, rewriting affected data files and recording the changes in the transaction log.
UPDATE <table> SET <col1> = <val1> [, <col2> = <val2> ...] [WHERE <condition>] [RETURNING <columns>]
## Overview Modifies column values in rows of a Delta table that satisfy the optional WHERE predicate. If no predicate is provided, all rows in the table are updated. Each UPDATE produces a new table version in the Delta transaction log, preserving previous versions for time travel. ## Behavior Delta Lake does not modify Parquet data files in place. UPDATE uses a copy-on-write strategy: 1. The engine identifies data files containing rows that match the predicate. 2. For each affected file, the engine reads all rows, applies the SET assignments to matching rows, and writes a new Parquet file. 3. The transaction log records `remove` actions for the old files and `add` actions for the new files. 4. Unaffected data files are not touched. When the predicate references partition columns, Delta applies partition pruning to skip entire partitions that cannot contain matching rows. ### Assignment Expressions Each SET clause assignment is parsed into a column name and a value expression. The expression is evaluated per row and may reference any column in the table. Supported expression types include literals, column references, arithmetic operations, CASE expressions, COALESCE, ROUND, and other scalar functions. Column name prefixes (table name or alias) are stripped automatically. For example, `SET t.price = t.price * 1.1` and `SET price = price * 1.1` are equivalent when updating table `t`. ### FROM Clause (Join Updates) The parser supports PostgreSQL-style UPDATE ... FROM syntax, where the UPDATE target is an alias resolved from the FROM clause. This enables join-based updates where values come from a secondary table. The engine resolves the real table name from the FROM clause aliases and strips alias prefixes from assignments. ### Subquery Optimization When the WHERE clause contains a subquery (e.g., `WHERE id IN (SELECT ...)`), the executor evaluates the subquery first and selects an execution strategy: - **Small results (fewer than 1,000 rows):** The subquery result is materialized as an IN-list predicate, and the standard UPDATE path executes. - **Large results (1,000 rows or more):** The operation is rewritten internally as a MERGE with a hash join for efficient execution. ### Auto-Compaction If the table property `delta.autoOptimize.autoCompact` is set to `true`, the engine runs a compaction pass after the UPDATE completes, combining small files produced by the rewrite into larger, more efficient files. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | write | Table | Required. Permits INSERT, UPDATE, and DELETE on the table. | ## Compatibility UPDATE SET ... WHERE follows standard SQL semantics. DeltaForge extends it with the FROM clause for join updates, alias prefix stripping, automatic subquery optimization, and `arrow_cast` expression rewriting for timestamp and date literals.
| Name | Type | Description |
|---|---|---|
table | Specifies the name or path of the Delta table to update. The table must already be registered via CREATE DELTA TABLE or OPEN DELTA TABLE. Supports fully qualified names (zone.schema.table). An alias may follow the table name (with or without the AS keyword). | |
assignments | Specifies one or more column = expression pairs separated by commas. Each assignment sets the named column to the result of the expression for every row matching the predicate. Expressions may reference other columns in the same table, use scalar functions (ROUND, COALESCE, CASE), and include arithmetic operators. Column names may be qualified with the table name or alias prefix, which is stripped automatically. | |
condition | Specifies a SQL predicate that identifies which rows to update. Supports comparison operators, logical operators (AND, OR, NOT), IN lists, BETWEEN, IS NULL, LIKE, and subqueries. When omitted, all rows in the table are updated. | |
columns | Specifies the columns to return from updated rows via the RETURNING clause. An empty column list returns all columns (RETURNING *). This clause is parsed for forward compatibility but is not yet evaluated. |
-- Update a single column with a simple predicate
UPDATE warehouse.sales.customers
SET tier = 'gold'
WHERE lifetime_value > 10000;
-- Update multiple columns in one statement
UPDATE warehouse.sales.orders
SET status = 'shipped',
shipped_at = CURRENT_TIMESTAMP
WHERE batch_id = 42;
-- Update with arithmetic expression referencing another column
UPDATE warehouse.inventory.products
SET price = ROUND(price * 1.10, 2)
WHERE category = 'Electronics';
-- Update with CASE expression for conditional logic
UPDATE warehouse.sales.orders
SET discount = CASE
WHEN amount > 1000 THEN 0.15
WHEN amount > 500 THEN 0.10
ELSE 0.05
END
WHERE status = 'pending';
-- Update all rows (no WHERE clause)
UPDATE warehouse.staging.raw_events
SET processed = true;
-- Update with a subquery in the predicate
UPDATE warehouse.sales.orders
SET status = 'flagged'
WHERE customer_id IN (
SELECT customer_id
FROM warehouse.sales.customers
WHERE fraud_score > 0.9
);