Deletes rows from a Delta table matching an optional predicate, recording file-level remove and add actions in the transaction log.
DELETE FROM <table> [WHERE <condition>] [RETURNING <columns>]
## Overview Removes rows from a Delta table that match the optional WHERE predicate. If no predicate is provided, all rows are removed. Each DELETE produces a new table version in the Delta transaction log, preserving previous versions for time travel. ## Behavior Delta Lake does not modify Parquet data files in place. Instead, DELETE uses a copy-on-write strategy: 1. The engine identifies data files containing rows that match the predicate. 2. For each affected file, a new Parquet file is written that excludes the deleted rows. 3. The transaction log records `remove` actions for the old files and `add` actions for the new files. 4. Unaffected data files are not touched. When the predicate references partition columns, Delta applies partition pruning to skip entire partitions that cannot contain matching rows. This significantly reduces I/O for partitioned tables. If no predicate is supplied, the operation removes all data files via `remove` actions without rewriting any files, which is equivalent to TRUNCATE TABLE. ### Subquery Optimization When the WHERE clause contains a subquery (e.g., `WHERE id IN (SELECT ...)`), the executor evaluates the subquery first and selects an execution strategy based on the result size: - **Small results (fewer than 1,000 rows):** The subquery result is materialized as an IN-list predicate, and the standard DELETE path executes with the resolved predicate. - **Large results (1,000 rows or more):** The operation is rewritten internally as a MERGE with a hash join, which is more efficient for large sets and can leverage deletion vectors for metadata-only updates. ### Auto-Compaction If the table property `delta.autoOptimize.autoCompact` is set to `true`, the engine runs a compaction pass after the DELETE completes, combining small files produced by the rewrite into larger, more efficient files. ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | write | Table | Required. Permits INSERT, UPDATE, and DELETE on the table. | ## Compatibility DELETE FROM is standard SQL. DeltaForge extends it with the USING clause (for PostgreSQL-style join deletes), alias support, and subquery optimization with automatic MERGE rewriting. The RETURNING clause is parsed for forward compatibility but is not yet evaluated.
| Name | Type | Description |
|---|---|---|
table | Specifies the name or path of the Delta table from which to delete rows. The table must already be registered via CREATE DELTA TABLE or OPEN DELTA TABLE. Supports fully qualified names (zone.schema.table). | |
condition | Specifies a SQL predicate that identifies which rows to delete. Supports comparison operators, logical operators (AND, OR, NOT), IN lists, BETWEEN, IS NULL, LIKE, and subqueries. When omitted, all rows in the table are deleted (equivalent to TRUNCATE TABLE). Partition columns in the predicate enable partition pruning, limiting the number of data files scanned. | |
columns | Specifies the columns to return from deleted rows via the RETURNING clause. An empty column list returns all columns (RETURNING *). This clause is parsed but reserved for future use. |
-- Delete rows matching a simple predicate
DELETE FROM warehouse.sales.orders
WHERE status = 'cancelled';
-- Delete all rows from a table (unconditional delete)
DELETE FROM staging.temp_data;
-- Delete with a compound predicate using AND
DELETE FROM warehouse.sales.orders
WHERE status = 'cancelled'
AND order_date < '2024-01-01';
-- Delete using IN list to target specific records
DELETE FROM warehouse.inventory.products
WHERE product_id IN (101, 202, 303);
-- Delete with a subquery (automatically optimized)
DELETE FROM warehouse.sales.orders
WHERE customer_id IN (
SELECT customer_id
FROM warehouse.sales.customers
WHERE is_deactivated = true
);
-- Delete rows older than a threshold from a partitioned table
-- Partition pruning skips files in partitions outside the date range
DELETE FROM warehouse.events.clickstream
WHERE event_date < '2023-06-01';