Compacts small files into larger files and optionally applies Z-ordering for improved read performance.
OPTIMIZE <table> [WHERE <condition>] [ZORDER BY (<columns>)] [TARGET_SIZE <n>]
## Overview OPTIMIZE compacts small data files in a Delta table into larger files to improve read performance. Small files accumulate from streaming ingestion, frequent updates, high-cardinality partitioning, or partition pruning during deletes. Each small file incurs per-file overhead (file open, metadata read, footer parsing) that degrades query throughput. ## Compaction Strategy OPTIMIZE uses a bin-packing algorithm: 1. **File selection**: Identifies files smaller than the target size within each partition. 2. **Bin packing**: Groups selected files into bins that approximate the target output file size. 3. **Rewrite**: Reads all rows from each bin's files, writes a single compacted output file, and records AddFile/RemoveFile actions in the transaction log. 4. **Commit**: Commits the new version atomically. Files already at or above the target size are not rewritten. ## Deletion Vector Handling OPTIMIZE is deletion-vector-aware. When processing files that have associated deletion vectors: - **Low deletion ratio (at or below 5%)**: Deletion vectors are preserved. All physical rows are written to the output file and a new merged DV is created that remaps deleted row positions to their new locations. - **High deletion ratio (above 5%)**: Deleted rows are filtered out during compaction. The output file contains only live rows and no deletion vector is attached. This threshold (5%) balances storage efficiency against rewrite cost and is the widely adopted default for Delta Lake implementations. ## Z-Ordering When ZORDER BY is specified, OPTIMIZE applies Z-order (Morton curve) data layout within each partition. Z-ordering maps multi-dimensional column values to a single sort key that preserves spatial locality. After Z-ordering, each output file's min/max statistics for the Z-order columns form tight ranges, enabling the query engine to skip files that cannot contain matching rows. Z-ordering is most effective when queries commonly filter on two or more of the specified columns simultaneously. For single-column filtering, standard sorting (via CLUSTER BY) may be more efficient. ## Idempotency Running OPTIMIZE on an already-optimized table is safe. If no small files are found, the operation completes with zero files compacted and does not create a new table version. ## Result Set Returns a result set with three rows: | metric | value | |--------|-------| | files_removed | Number of input files compacted | | files_added | Number of output files written | | version | New table version after compaction | ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | Ownership or write | Table | Required to rewrite data files and commit new versions. | ## Compatibility OPTIMIZE is a DeltaForge extension implementing Delta Lake file compaction with Z-order support. The TARGET_SIZE clause is a DeltaForge enhancement.
| Name | Type | Description |
|---|---|---|
table | Specifies the name or path of the Delta table to optimize. The table must be registered in the session. Fully qualified names (zone.schema.table) are supported. | |
where_clause | Filter partitions to optimize. | |
zorder_columns | Columns to Z-order by. | |
target_size | Target file size in bytes. |
-- Basic compaction with default settings
OPTIMIZE orders;
-- Optimize only recent partitions
OPTIMIZE orders WHERE date >= '2024-01-01';
-- Compact and Z-order by two columns for multi-dimensional query performance
OPTIMIZE orders ZORDER BY (customer_id, product_id);
-- Optimize a specific partition with Z-ordering and a custom target file size
OPTIMIZE events
WHERE year = 2024 AND month = 3
ZORDER BY (user_id, event_type)
TARGET_SIZE 268435456;
-- Optimize using the optional TABLE keyword
OPTIMIZE TABLE warehouse.sales.line_items;