ZORDER

Applies Z-order data layout optimization to a Delta table for efficient multi-dimensional data skipping.

Category: maintenanceDeltaForge extension

Syntax

ZORDER <table> BY (<columns>) [WHERE <condition>]

Description

## Overview ZORDER is a standalone command that applies Z-order (Morton curve) data layout optimization to a Delta table. It is functionally equivalent to OPTIMIZE ... ZORDER BY but can be invoked independently when file compaction is not the primary goal. ## How Z-Ordering Works Z-ordering maps multi-dimensional data to a single dimension using a space-filling curve. For each row, the bit representations of the Z-order column values are interleaved to produce a single Z-value. Rows are then sorted by this Z-value and written to output files. The result is that rows with similar values across all Z-order columns are stored physically close together in the same file or adjacent files. Each output file's min/max statistics for the Z-order columns form tight, non-overlapping ranges. ## Data Skipping Benefit When a query filters on one or more of the Z-order columns, the query engine reads the per-file min/max statistics and skips files whose ranges do not overlap with the query predicate. Because Z-ordering creates tight ranges across multiple columns simultaneously, queries that filter on any combination of the Z-order columns benefit from data skipping. Without Z-ordering, column values are typically spread across many files, and few files can be skipped. After Z-ordering, the same query may skip 80-95% of files. ## File Rewriting ZORDER reads all data from the selected files (filtered by partition predicate if specified), sorts by the Z-value, and writes new files. The old files are logically removed in the transaction log and become eligible for physical deletion by VACUUM after the retention period. ## Column Selection Guidelines - Z-order columns should be high-cardinality and frequently used together in query predicates. - Limit the number of Z-order columns to 2-4. Each additional column reduces the locality benefit for the other columns. - Do not Z-order by partition columns. Partition pruning already handles those; Z-ordering them provides no additional benefit. - Columns with very low cardinality (boolean, small enums) are poor candidates because they do not produce meaningful ranges. ## Result Set Returns a result set with two rows: | metric | value | |--------|-------| | files_added | Number of output files written | | bytes_processed | Total bytes of data processed | ## Access Control | Privilege | Object | Notes | |-----------|--------|-------| | Ownership or write | Table | Required to rewrite files and commit new versions. | ## Compatibility ZORDER is a DeltaForge extension. The underlying Z-order algorithm implements the Morton curve (Z-curve) space-filling technique used in Delta Lake's OPTIMIZE ZORDER BY.

Parameters

NameTypeDescription
tableSpecifies the name or path of the Delta table to Z-order. The table must be registered in the session. Fully qualified names (zone.schema.table) are supported.
columnsSpecifies one or more columns to Z-order by. Columns can be listed with or without parentheses. The Z-order algorithm interleaves the bit representations of the specified column values to create a space-filling curve (Morton curve) that preserves multi-dimensional locality.
where_clauseFilter partitions.

Examples

-- Z-order a table by two frequently queried columns
ZORDER orders BY (customer_id, product_id);
-- Z-order a specific partition
ZORDER orders BY (customer_id, product_id) WHERE year = 2024;
-- Z-order by a single high-cardinality column
ZORDER events BY (user_id);
-- Z-order by three columns for complex analytical queries
ZORDER telemetry BY (device_id, metric_type, region)
  WHERE date >= '2024-01-01';

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →