CREATE INDEX

Creates a row-level sorted index (PGM or B+ tree) over a Delta table to accelerate equality and range lookups on the indexed prefix.

Category: indexingDeltaForge extension

Syntax

CREATE INDEX [IF NOT EXISTS] <name> ON TABLE <table> (<columns>) [USING <algorithm>] [WITH (<key> = <value>, ...)]

Description

## Overview CREATE INDEX builds a row-level sorted index over a Delta table. The index materializes a child Delta table that maps the indexed key prefix to row pointers (file path plus row offset). Subsequent equality and range lookups on the indexed prefix are answered by the index instead of a full table scan. ## Behavior - The index is materialized at the parent table's current version. The index records that version so DESCRIBE INDEXES can report whether it is up to date. - The child index Delta table is created under the parent at a derived path (`<parent_path>/_delta_indexes/<index_id>`). - With `auto_update = false` (default) the index becomes stale on the next parent commit. Stale indexes are still queryable but the planner falls back to scan for any rows changed after the indexed version. REBUILD INDEX or `auto_update = true` brings it back in sync. - With `auto_update = true` parent writes trigger an incremental maintenance pass that re-indexes the changed files only. Maintenance is part of the parent commit, so writes pay the cost. - Algorithm choice is fixed at creation time. Switching from PGM to B+ tree (or vice versa) requires DROP INDEX followed by CREATE INDEX. - Composite indexes (multiple columns) honor the prefix rule: predicates on the leading columns benefit, predicates that skip leading columns do not. ## Algorithms ### PGM (default) A learned, recursive piecewise model. Smaller than a B+ tree for most distributions, with O(log log n) lookup. Best general-purpose choice. ### B+ tree Classical B+ tree. Larger memory footprint than PGM but predictable performance across all key distributions. Choose this if PGM model size grows pathologically for your data, or for workloads dominated by range scans. ## Access Control No specific privilege is required beyond table-level access in the standalone SQL layer. In a Control Plane deployment, the table's MODIFY privilege governs index creation alongside other schema changes. ## Compatibility DeltaForge extension. The PGM and B+ tree row indexes are not part of the Delta Lake protocol; they are stored as a sibling Delta table that the DeltaForge planner consults during query planning.

Parameters

NameTypeDescription
nameSpecifies the index name. Must be unique per table. Used in DROP, ALTER, and REBUILD INDEX.
tableSpecifies the parent Delta table. The table must already exist and be registered in the session (CREATE DELTA TABLE or OPEN DELTA TABLE).
columnsSpecifies the indexed columns in declared prefix order. The leftmost column is the primary key for ordering; lookups that supply only the leftmost prefix benefit from the index.
algorithmSpecifies the index data structure. Valid values: `pgm` (default, learned-index recursive piecewise model), `btree` (classical B+ tree). The parser also accepts `default`, `b-tree`, `b+tree`, and `b+` as aliases.
auto_updateSpecifies whether parent writes maintain the index. When `false` (default) the index becomes stale on the next parent commit and must be refreshed with REBUILD INDEX. When `true`, every parent commit triggers a maintenance pass.
if_not_existsSkip creation if an index with the same name already exists on the table. The existing index is left unchanged.
descriptionOptional human-readable description stored with the index definition. Surfaced by DESCRIBE INDEXES.
optionsAlgorithm-specific options forwarded to the index builder. Use this clause for tuning knobs that do not have first-class syntax.

Examples

-- Default PGM index on a single column
CREATE INDEX idx_customer ON TABLE orders (customer_id);
-- B+ tree index for workloads with frequent point lookups
CREATE INDEX idx_order_btree ON TABLE orders (order_id) USING btree;
-- Composite index covering a (customer_id, order_date) prefix
CREATE INDEX idx_customer_date ON TABLE orders (customer_id, order_date);
-- Auto-maintained index, refreshed on every parent commit
CREATE INDEX idx_auto ON TABLE orders (order_id) WITH (auto_update = true);
-- Idempotent bootstrap that skips if the index already exists
CREATE INDEX IF NOT EXISTS idx_order_btree ON TABLE orders (order_id) USING btree;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →